Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Genbank LOCUS "circular" topology and molecule type ignored #363

Closed
pgfoster opened this issue Sep 12, 2014 · 12 comments · Fixed by #1005
Closed

Genbank LOCUS "circular" topology and molecule type ignored #363

pgfoster opened this issue Sep 12, 2014 · 12 comments · Fixed by #1005

Comments

@pgfoster
Copy link

I first noticed this when reading in a genbank file, writing it out again, and then looking at the diff. All was well, except that the word "circular" from the LOCUS line had disappeared.

This is what I think is going on. When reading genbank files that have "circular" in the first line, eg
LOCUS NC_014275 15386 bp DNA circular INV 01-JUL-2010
using SeqIO we eventually get to
Bio/GenBank/__init__.py
and the _FeatureConsumer.residue_type(self, type) method.
At that point, type contains the string "DNA circular". The method contains only one line ---
self._seq_type = type.strip()
However, putting it there makes it get lost.
I suspect the info about whether it is circular should be put in self.data, eg in the self.data.annotations dictionary. The reason is because the Bio.GenBank.Scanner.InsdcScanner.parse() method returns consumer.data (ie self.data above), and so does not return consumer._seq_type containing the circular topology info.

@peterjc
Copy link
Member

peterjc commented Sep 12, 2014

This is a duplicate of https://redmine.open-bio.org/issues/2578 which I will now close in favour of this issue since we are (slowly) moving to GitHub issues instead.

@peterjc peterjc changed the title Genbank file "circular" topology ignored Genbank LOCUS "circular" topology and molecule type ignored Jan 18, 2016
kblin added a commit to kblin/biopython that referenced this issue Apr 21, 2016
This fixes issue biopython#363

Signed-off-by: Kai Blin <kblin@biosustain.dtu.dk>
@kblin
Copy link
Contributor

kblin commented Apr 21, 2016

I almost have a patch for it, just got some of the padding wrong.

kblin added a commit to kblin/biopython that referenced this issue Apr 21, 2016
This fixes issue biopython#363

Signed-off-by: Kai Blin <kblin@biosustain.dtu.dk>
@peterjc
Copy link
Member

peterjc commented Apr 21, 2016

Reposting these from the old BugZilla/Redmine tracker regarding how is_circular is/should be stored in BioSQL, see these old threads:

http://lists.open-bio.org/pipermail/biosql-l/2011-July/001774.html
http://lists.open-bio.org/pipermail/bioperl-l/2011-July/035433.html

@kblin
Copy link
Contributor

kblin commented Jul 27, 2016

Wow, eerie. A colleague of mine also just talked to be about this bug. So what would be needed to progress on this? Change the patch to work with a global SeqRecord feature is_circular like BioPerl?

@peterjc
Copy link
Member

peterjc commented Jul 27, 2016

The pragmatic short term solution is just add some code in our BioSQL wrapper to ignore the new annotation. Real solution requires mapping to however BioPerl etc all have stored the circular attribute in the database schema - I filed biosql/biosql#5

@kblin
Copy link
Contributor

kblin commented Jul 27, 2016

Not sure if I want to follow BioPerl here, though. BioPerl basically ignores whatever the chromosome topology is, unless it's circular.

@kblin
Copy link
Contributor

kblin commented Jul 27, 2016

With a binary flag, that's your only choice, of course. But personally I'd prefer if

from Bio import SeqIO
recs = SeqIO.parse(infile, 'genbank')
SeqIO.write(recs, outfile, 'genbank')

would leave me with an outfile that is identical to my infile.

@peterjc
Copy link
Member

peterjc commented Jul 27, 2016

I agree no loss of data on parsing/writing via SeqIO is the goal. If we have to do anything nasty for BioSQL and matching BioPerl, let's isolate that to our BioSQL code.

@peterjc
Copy link
Member

peterjc commented Jul 27, 2016

To be clear, I'm happy with the record.annotations["topology"] approach outlined on #812.

peterjc pushed a commit to peterjc/biopython that referenced this issue Jul 27, 2016
Rebased from pull request biopython#812 to address biopython#363, with
minor changes as discussed on the pull request.
@peterjc
Copy link
Member

peterjc commented Jul 29, 2016

Closed via PR #903, thanks to @kblin in particular, but also everyone else who has pushed at this over the years. Sorry it has taken so long.

@peterjc peterjc closed this as completed Jul 29, 2016
@peterjc
Copy link
Member

peterjc commented Nov 23, 2016

I shouldn't have closed this as it only recorded the topology - I'm about to open a pull request to record the molecule type as well.

@peterjc
Copy link
Member

peterjc commented Nov 24, 2016

As of #1005, the molecule type should be explicitly recorded in the record annotations.

MarkusPiotrowski pushed a commit to MarkusPiotrowski/biopython that referenced this issue Oct 31, 2017
Rebased from pull request biopython#812 to address biopython#363, with
minor changes as discussed on the pull request.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants