Cope with malformed EMBL (or GenBank) files where the features are over-indented (Bug 3062)
peterjc committed Apr 27, 2010
1 parent 6229185 commit 73caa40
Showing 4 changed files with 63 additions and 1 deletion.
4 changes: 3 additions & 1 deletion Bio/GenBank/Scanner.py
@@ -167,7 +167,9 @@ def parse_features(self, skip=False):
             line = self.handle.readline()
             while line[:self.FEATURE_QUALIFIER_INDENT] == self.FEATURE_QUALIFIER_SPACER \
             or line.rstrip() == "" : # cope with blank lines in the midst of a feature
-                feature_lines.append(line[self.FEATURE_QUALIFIER_INDENT:].rstrip())
+                #Use strip to remove any harmless trailing white space AND leading
+                #white space (e.g. out of spec files with too much indentation)
+                feature_lines.append(line[self.FEATURE_QUALIFIER_INDENT:].strip())
                 line = self.handle.readline()
             features.append(self.parse_feature(feature_key, feature_lines))
     self.line = line
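The effect of swapping `rstrip()` for `strip()` can be illustrated outside the scanner. The sketch below is not the Biopython code: `qualifier_text` and the two sample lines are hypothetical, and the indent of 21 columns with an `FT` plus 19 spaces spacer is an assumption about the EMBL scanner's class attributes, which this diff does not show.

```python
# Assumed values standing in for the scanner's class attributes
# (FEATURE_QUALIFIER_INDENT / FEATURE_QUALIFIER_SPACER are not defined in this diff).
FEATURE_QUALIFIER_INDENT = 21
FEATURE_QUALIFIER_SPACER = "FT" + " " * (FEATURE_QUALIFIER_INDENT - 2)


def qualifier_text(line, fixed=True):
    """Hypothetical helper: return the qualifier text of an FT continuation
    line, or None if the line does not start with the expected spacer."""
    if line[:FEATURE_QUALIFIER_INDENT] != FEATURE_QUALIFIER_SPACER:
        return None
    body = line[FEATURE_QUALIFIER_INDENT:]
    # fixed=True mimics the new behaviour (strip), fixed=False the old (rstrip).
    return body.strip() if fixed else body.rstrip()


# Made-up sample lines: the second carries four extra spaces of indentation.
in_spec = "FT" + " " * 19 + "/partial\n"
over_indented = "FT" + " " * 23 + "/partial\n"

print(repr(qualifier_text(in_spec, fixed=False)))        # '/partial'
print(repr(qualifier_text(over_indented, fixed=False)))  # '    /partial'
print(repr(qualifier_text(over_indented, fixed=True)))   # '/partial'
```

An over-indented line still passes the spacer check, because its first 21 characters are unchanged. Only the sliced remainder differs, which is why stripping leading white space there is enough to recover the qualifier.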
36 changes: 36 additions & 0 deletions Tests/EMBL/A04195_bad_indent.embl
@@ -0,0 +1,36 @@
ID A04195 IMGT/LIGM annotation : by annotators; RNA; SYN; 51 BP.
XX
AC A04195;
XX
DT 15-MAY-1995 (Rel. 2, arrived in LIGM-DB )
DT 20-APR-1999 (Rel. 11, Last updated, Version 4)
XX
DE Artificial Ig lambda-chain mRNA ;
DE RNA; rearranged configuration; Ig-Light-Lambda; regular.
XX
KW antigen receptor; Immunoglobulin superfamily (IgSF);
KW Immunoglobulin (IG); IG-Light; IG-Light-Lambda; cDNA; rearranged.
XX
OS synthetic construct
OC other sequences; artificial sequences.
XX
RN [1]
RP 1-51
RA ;
RT ;
RL Patent number WO8403712-A/4, 27-SEP-1984.
XX
DR EMBL; A04195.
XX
FH Key Location/Qualifiers
FH
FT L-REGION 10..51
FT /partial
FT /organism="Artificial sequence"
FT /product="Ig Lambda"
FT /translation="MQAVMTQESALTTS"
FT INIT-CODON 10..12
XX
SQ Sequence 51 BP; 15 A; 13 C; 10 G; 13 T; 0 other;
gattgatcaa tgcaggctgt tatgactcag gaatctgcac tcaccacatc a 51
//
23 changes: 23 additions & 0 deletions Tests/output/test_SeqIO
@@ -1829,6 +1829,29 @@ Testing reading embl format file EMBL/Human_contigs.embl
Checking can write/read as 'genbank' format
Checking can write/read as 'qual' format
Failed: No suitable quality scores found in letter_annotations of SeqRecord (id=AL954800.2).
Testing reading embl format file EMBL/A04195_bad_indent.embl
ID and Name='A04195',
Seq='GATTGATCAATGCAGGCTGTTATGACTCAGGAATCTGCAC...CACATCA', length=51
Checking can write/read as 'fasta' format
Checking can write/read as 'clustal' format
Checking can write/read as 'phylip' format
Checking can write/read as 'stockholm' format
Checking can write/read as 'embl' format
Checking can write/read as 'fastq' format
Failed: No suitable quality scores found in letter_annotations of SeqRecord (id=A04195).
Checking can write/read as 'fastq-illumina' format
Failed: No suitable quality scores found in letter_annotations of SeqRecord (id=A04195).
Checking can write/read as 'fastq-solexa' format
Failed: No suitable quality scores found in letter_annotations of SeqRecord (id=A04195).
Checking can write/read as 'genbank' format
Checking can write/read as 'phd' format
Failed: No suitable quality scores found in letter_annotations of SeqRecord (id=A04195).
Checking can write/read as 'qual' format
Failed: No suitable quality scores found in letter_annotations of SeqRecord (id=A04195).
Checking can write/read as 'sff' format
Failed: Missing SFF flow information
Checking can write/read as 'tab' format
Checking can write/read as 'nexus' format
Testing reading stockholm format file Stockholm/simple.sth
ID and Name='AP001509.1',
Seq='UUAAUCGAGCUCAACACUCUUCGUAUAUCCUC-UCAAUAU...UUAAUGU', length=104
1 change: 1 addition & 0 deletions Tests/test_SeqIO.py
@@ -139,6 +139,7 @@ def send_warnings_to_stdout(message, category, filename, lineno,
("embl", False, 'EMBL/AAA03323.embl', 1), # 2008, PA line but no AC
("embl", False, 'EMBL/AE017046.embl', 1), #See also NC_005816.gb
("embl", False, 'EMBL/Human_contigs.embl', 2), #contigs, no sequences
("embl", False, 'EMBL/A04195_bad_indent.embl', 1), # features over indented
("stockholm", True, 'Stockholm/simple.sth', 2),
("stockholm", True, 'Stockholm/funny.sth', 5),
#Following PHYLIP files are currently only used here and in test_AlignIO.py,