Genbank LOCUS "circular" topology and molecule type ignored #363
This is a duplicate of https://redmine.open-bio.org/issues/2578 which I will now close in favour of this issue since we are (slowly) moving to GitHub issues instead.
This fixes issue biopython#363 Signed-off-by: Kai Blin <kblin@biosustain.dtu.dk>
I almost have a patch for it, just got some of the padding wrong.
Reposting these from the old BugZilla/Redmine tracker regarding how http://lists.open-bio.org/pipermail/biosql-l/2011-July/001774.html
Wow, eerie. A colleague of mine also just talked to me about this bug. So what would be needed to make progress on this? Change the patch to work with a global SeqRecord feature
The pragmatic short-term solution is just to add some code in our BioSQL wrapper to ignore the new annotation. The real solution requires mapping to however BioPerl etc. have stored the circular attribute in the database schema; I filed biosql/biosql#5
Not sure if I want to follow BioPerl here, though. BioPerl basically ignores whatever the chromosome topology is, unless it's circular. |
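A minimal sketch of why a single bit loses information, assuming a hypothetical is_circular boolean column alongside a free-text topology annotation (neither name is taken from the BioSQL or BioPerl schemas): once the value is squeezed into one bit, a record marked "linear" and a record with no stated topology come back identical.

```python
from typing import Optional


# Hypothetical mapping between a free-text topology annotation and a
# single boolean flag, in the spirit of a binary "is circular" column.
def topology_to_flag(topology: Optional[str]) -> bool:
    # Only "circular" sets the flag; "linear" and "not stated" both map to False.
    return topology == "circular"


def flag_to_topology(is_circular: bool) -> Optional[str]:
    # A False flag cannot tell us whether the record was linear or unspecified.
    return "circular" if is_circular else None


for original in ("circular", "linear", None):
    restored = flag_to_topology(topology_to_flag(original))
    print(f"{original!r} -> {restored!r}")
```

Returning "linear" for a False flag instead would simply move the loss onto records that never stated a topology.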
With a binary flag, that's your only choice, of course. But personally I'd prefer if

    from Bio import SeqIO
    recs = SeqIO.parse(infile, 'genbank')
    SeqIO.write(recs, outfile, 'genbank')

would leave me with an
I agree: no loss of data on parsing/writing via
To be clear, I'm happy with the
Rebased from pull request biopython#812 to address biopython#363, with minor changes as discussed on the pull request.
I shouldn't have closed this, as it only recorded the topology; I'm about to open a pull request to record the molecule type as well.
As of #1005, the molecule type should be explicitly recorded in the record annotations.
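To make the "no loss of data on parsing/writing" requirement concrete, here is a simplified round-trip sketch. parse_locus and format_locus are hypothetical helpers, not Biopython functions, and the dictionary key names are illustrative. The sketch assumes the whitespace-separated layout of the example LOCUS line from the original report, with the topology always present; real GenBank files pad fields to fixed columns and older files may omit the topology entirely.

```python
# Simplified LOCUS round trip: whitespace-separated tokens only, no
# column-exact padding, and topology assumed always present.
def parse_locus(line: str) -> dict:
    # Unpacking fails with ValueError if any of the eight tokens is missing.
    tag, name, length, unit, mol_type, topology, division, date = line.split()
    if tag != "LOCUS" or unit != "bp":
        raise ValueError(f"not a nucleotide LOCUS line: {line!r}")
    return {
        "name": name,
        "length": int(length),
        "molecule_type": mol_type,
        "topology": topology,
        "data_file_division": division,
        "date": date,
    }


def format_locus(fields: dict) -> str:
    # Single spaces only; a real GenBank writer pads fields to fixed columns.
    return " ".join([
        "LOCUS", fields["name"], str(fields["length"]), "bp",
        fields["molecule_type"], fields["topology"],
        fields["data_file_division"], fields["date"],
    ])


line = "LOCUS NC_014275 15386 bp DNA circular INV 01-JUL-2010"
fields = parse_locus(line)
assert format_locus(fields) == line  # "circular" survives the round trip
print(fields["topology"], fields["molecule_type"])
```

The point of the sketch is that the writer can only reproduce what the parser kept; had parse_locus discarded the topology token, the final assertion would fail.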
I first noticed this when reading in a GenBank file, writing it out again, and then looking at the diff. All was well, except that the word "circular" from the LOCUS line had disappeared.
This is what I think is going on. When reading GenBank files that have "circular" in the first line, e.g.

    LOCUS NC_014275 15386 bp DNA circular INV 01-JUL-2010

using SeqIO, we eventually get to Bio/GenBank/__init__.py and the _FeatureConsumer.residue_type(self, type) method. At that point, type contains the string "DNA circular". The method contains only one line:

    self._seq_type = type.strip()

However, storing the value there means it gets lost.
I suspect the information about whether the sequence is circular should be put in self.data, e.g. in the self.data.annotations dictionary. The reason is that the Bio.GenBank.Scanner.InsdcScanner.parse() method returns consumer.data (i.e. self.data above), and so does not return consumer._seq_type, which contains the circular topology information.
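A minimal sketch of the fix suggested above, using stand-in classes rather than Biopython's real _FeatureConsumer, SeqRecord or scanner (MiniRecord, MiniConsumer and parse here are hypothetical). residue_type keeps its original behaviour but also splits the string and writes the pieces into data.annotations, so whatever parse() hands back carries the topology with it. The key names "topology" and "molecule_type" are assumed to match the annotations discussed later in the thread.

```python
from dataclasses import dataclass, field


@dataclass
class MiniRecord:
    # Stand-in for a SeqRecord: only the annotations dictionary matters here.
    annotations: dict = field(default_factory=dict)


class MiniConsumer:
    def __init__(self) -> None:
        self.data = MiniRecord()
        self._seq_type = ""

    def residue_type(self, type_: str) -> None:
        # Original behaviour: keep the raw string on a private attribute.
        self._seq_type = type_.strip()
        # Suggested addition: split off "circular"/"linear" and record it on
        # the object that parse() actually returns.
        tokens = self._seq_type.split()
        if tokens and tokens[-1].lower() in ("circular", "linear"):
            self.data.annotations["topology"] = tokens.pop().lower()
        if tokens:
            self.data.annotations["molecule_type"] = " ".join(tokens)


def parse(residue_type_string: str) -> MiniRecord:
    # Mimics the scanner's parse(): only consumer.data is handed back.
    consumer = MiniConsumer()
    consumer.residue_type(residue_type_string)
    return consumer.data


record = parse("DNA circular")
print(record.annotations)
```

Because _seq_type is left untouched, any other code reading it still sees the raw string; the new annotations are purely additive.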