genbank.pm: in "_read_GenBank_Species", adjusted a regex to

not match 'str.' or 'var.' when deciding whether a line belongs
to CLASSIFICATION instead of ORGANISM. The misclassification happened with
NC_021815 and 2 other plasmids because of their unusually long ORGANISM
descriptions, which wrap onto a second line:
  ORGANISM  Salmonella enterica subsp. enterica serovar Typhimurium var. 5-
            str. CFSAN001921
            Bacteria; Proteobacteria; Gammaproteobacteria; Enterobacteriales;
            Enterobacteriaceae; Salmonella.
1 parent dfb457a commit 443d3990d43558d27870591b94dd737410aa171f @fjossandon fjossandon committed Jun 27, 2014
Showing with 2 additions and 1 deletion.
  1. +2 −1 Bio/SeqIO/genbank.pm
3 Bio/SeqIO/genbank.pm
@@ -1551,7 +1551,8 @@ sub _read_GenBank_Species {
chomp $data;
$tag = 'CLASSIFICATION' if ( $tag ne 'CLASSIFICATION'
and $tag eq 'ORGANISM'
- and $line =~ m{[;\.]+});
+ # Don't match "str." or "var." (NC_021815)
+ and $line =~ m{(?<!\bstr|\bvar)[;\.]+});
}
(exists $ann->{$tag}) ? ($ann->{$tag} .= ' '.$data) : ($ann->{$tag} .= $data);
$line = undef;
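
The effect of the change can be checked in isolation. The following is a minimal sketch, not part of the commit: the two regexes are copied from the diff above, the sample lines come from the commit message, and the helper name `looks_like_classification` is hypothetical. It only exercises the pattern match, not the surrounding parser state in `_read_GenBank_Species`.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Regexes copied from the diff: the old one matches any ';' or '.',
# the new one skips a '.' that directly follows the word "str" or "var".
my $old_re = qr{[;\.]+};
my $new_re = qr{(?<!\bstr|\bvar)[;\.]+};

# Hypothetical helper (not in genbank.pm): returns 1 if a continuation
# line under ORGANISM would be switched to the CLASSIFICATION tag.
sub looks_like_classification {
    my ($line, $re) = @_;
    return $line =~ $re ? 1 : 0;
}

# Continuation lines from the NC_021815 example in the commit message.
my @continuation_lines = (
    '            str. CFSAN001921',
    '            Bacteria; Proteobacteria; Gammaproteobacteria; Enterobacteriales;',
);

for my $line (@continuation_lines) {
    printf "old=%d new=%d  %s\n",
        looks_like_classification($line, $old_re),
        looks_like_classification($line, $new_re),
        $line;
}
```

With the old regex both lines are treated as CLASSIFICATION, so "str. CFSAN001921" is cut off from the organism name. With the new regex only the "Bacteria; ..." line matches. Note that the lookbehind covers only "str" and "var": a wrapped line containing another abbreviation such as "subsp." would still match.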
