GFF.write fails when using a single SeqRecord. #51

Closed
mercutio22 opened this Issue Mar 10, 2012 · 4 comments

In [6]: seqTP53

Out[6]: SeqRecord(seq=Seq('TGGTTCAAGTAATTCTCCTGCCTCAGACTCCAGAGTAGCTGGGATTACAGGCGC...CCC', IUPACAmbiguousDNA()), id='NG_017013.1', name='NG_017013', description='Homo sapiens tumor protein p53 (TP53), RefSeqGene on chromosome 17.', dbxrefs=[])

with open('tp53.gff', 'w') as file:
    GFF.write(seqTP53, file)

ERROR: An unexpected error occurred while tokenizing input
The following traceback may be corrupted or invalid

The error message is: ('EOF in multi-line statement', (8, 0))

AttributeError Traceback (most recent call last)
/home/merc/gitcode/mirna-django/src/scripts/ in ()
1 with open('tp53.gff', 'w') as file:
----> 2 GFF.write(seqTP53, file)
3

/usr/local/lib/python2.7/dist-packages/bcbio-0.1-py2.7.egg/BCBio/GFF/GFFOutput.pyc in write(recs, out_handle, include_fasta)
183 """
184 writer = GFF3Writer()
--> 185 return writer.write(recs, out_handle, include_fasta)

/usr/local/lib/python2.7/dist-packages/bcbio-0.1-py2.7.egg/BCBio/GFF/GFFOutput.pyc in write(self, recs, out_handle, include_fasta)
74 fasta_recs = []
75 for rec in recs:
---> 76 self._write_rec(rec, out_handle)
77 self._write_annotations(rec.annotations, rec.id, out_handle)
78 for sf in rec.features:

/usr/local/lib/python2.7/dist-packages/bcbio-0.1-py2.7.egg/BCBio/GFF/GFFOutput.pyc in _write_rec(self, rec, out_handle)
99 def _write_rec(self, rec, out_handle):
100 # if we have a SeqRecord, write out optional directive

--> 101 if len(rec.seq) > 0:
102 out_handle.write("##sequence-region %s 1 %s\n" % (rec.id, len(rec.seq)))
103

AttributeError: 'str' object has no attribute 'seq'

I just realized GFF.write expects a <generator object parse at 0x2c70d70>. Would you please make it also accept SeqRecord objects?

It would be useful when fetching and parsing with SeqIO.read. But maybe I shouldn't be doing that in the first place.
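For anyone landing here with the same traceback: the `'str' object has no attribute 'seq'` message suggests that the writer's `for rec in recs` loop walked over the letters of the sequence rather than over records. Below is a minimal sketch of that behaviour, not the library code. `MiniRecord` and `sequence_regions` are hypothetical stand-ins that only mimic the relevant parts of a SeqRecord and of the loop shown in the traceback.

```python
class MiniRecord:
    """Hypothetical stand-in for a SeqRecord: iterating it yields sequence letters."""

    def __init__(self, id, seq):
        self.id = id
        self.seq = seq

    def __iter__(self):
        return iter(self.seq)


def sequence_regions(recs):
    """Mimic the writer's loop: one ##sequence-region line per non-empty record."""
    lines = []
    for rec in recs:
        if len(rec.seq) > 0:
            lines.append("##sequence-region %s 1 %s" % (rec.id, len(rec.seq)))
    return lines


rec = MiniRecord("NG_017013.1", "TGGTTCAAGT")

# Passing the bare record: the loop sees 'T', 'G', ... and fails on rec.seq.
try:
    sequence_regions(rec)
    failure = None
except AttributeError as err:
    failure = str(err)

# Wrapping the record in a list hands the loop exactly one record.
lines = sequence_regions([rec])
```

With the real library, the equivalent caller-side workaround is to pass a one-element list, `GFF.write([seqTP53], file)`, which needs no change to the writer.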

class GFF3Writer:

    ...

    def write(self, recs, out_handle, include_fasta=False):
        """Write the provided records to the given handle in GFF3 format.
        """
        id_handler = _IdHandler()
        self._write_header(out_handle)
        fasta_recs = []
        # New code starts here
        try:
            recs = iter(recs)
        except TypeError:
            # A non-iterable is a single record, so put it in a list
            recs = [ recs ]
        # New code ends here
        for rec in recs:
            self._write_rec(rec, out_handle)
            self._write_annotations(rec.annotations, rec.id, out_handle)
            for sf in rec.features:
                sf = self._clean_feature(sf)
                id_handler = self._write_feature(sf, rec.id, out_handle,
                        id_handler)
            if include_fasta and len(rec.seq) > 0:
                fasta_recs.append(rec)
        if len(fasta_recs) > 0:
            self._write_fasta(fasta_recs, out_handle)
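One caveat, offered tentatively: the traceback above hints that a lone record is itself iterable, since the loop received a `str`. If so, `iter(recs)` would succeed and the `except TypeError` branch would never wrap it. Here is a sketch of a stricter normalisation step, assuming a single record can be recognised by having both `seq` and `features` attributes. `as_record_iter` and `MiniRecord` are hypothetical names for illustration and are not part of BCBio or of the committed fix.

```python
def as_record_iter(recs):
    """Return an iterator of records, accepting one record or many.

    Assumption: any object exposing both `seq` and `features` is one record.
    """
    if hasattr(recs, "seq") and hasattr(recs, "features"):
        return iter([recs])
    try:
        return iter(recs)
    except TypeError:
        # A non-iterable is treated as a single record.
        return iter([recs])


class MiniRecord:
    """Hypothetical stand-in for a SeqRecord that iterates over its letters."""

    def __init__(self, id, seq):
        self.id = id
        self.seq = seq
        self.features = []

    def __iter__(self):
        return iter(self.seq)


single = MiniRecord("NG_017013.1", "TGGTTC")

ids_single = [r.id for r in as_record_iter(single)]
ids_list = [r.id for r in as_record_iter([single, single])]
ids_gen = [r.id for r in as_record_iter(r for r in [single])]
```

Checking for attributes first keeps lists and generators working unchanged, at the cost of relying on duck typing rather than an explicit `isinstance` test.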

chapmanb closed this in 5352c68 Mar 10, 2012

Owner

chapmanb commented Mar 10, 2012

Hugo and Ryan;
Thanks for reporting the problem and for the fix. I checked this in:

5352c68

so if you pull from git it should be working as expected. Thanks again.

Thanks a lot. Really appreciate it.
