Explicitly record EMBL/GenBank molecule type #1005

Merged
merged 1 commit into from Nov 24, 2016

Conversation

peterjc
Member

@peterjc peterjc commented Nov 23, 2016

This builds on my recent commits improving the EMBL/GenBank topology parsing (see fd7b171 and the subsequent commits, which refactored the code to cover some corner cases and added more tests), and will properly close #363.

Note that this means record.annotations["topology"] and record.annotations["molecule_type"] together essentially replace the legacy handling of the composite residue type, which dates from early GenBank/EMBL files where this was a single field.
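To illustrate the idea of recording the two values separately, here is a minimal standard-library sketch. It is not the parser code from this pull request: the function name `split_locus_line` and the whitespace-token approach are assumptions made for illustration, and it assumes a nucleotide LOCUS line that ends with a division code and a date.

```python
TOPOLOGIES = ("linear", "circular")


def split_locus_line(line):
    """Return a dict with separate topology and molecule_type entries.

    Hypothetical helper for illustration only. Assumes the LOCUS line
    has a "bp" length unit and ends with a division code and a date.
    """
    tokens = line.split()
    if not tokens or tokens[0] != "LOCUS":
        raise ValueError("Not a LOCUS line: %r" % line)
    annotations = {}
    if "bp" not in tokens:
        # No nucleotide length unit found, so record nothing.
        return annotations
    unit_index = tokens.index("bp")
    # Fields between the length unit and the trailing division and date.
    for token in tokens[unit_index + 1:-2]:
        if token in TOPOLOGIES:
            annotations["topology"] = token
        else:
            annotations["molecule_type"] = token
    return annotations


circular = split_locus_line(
    "LOCUS       NC_005816               9609 bp    DNA     circular BCT 21-JUL-2008"
)
no_topology = split_locus_line(
    "LOCUS       SCU49845     5028 bp    DNA             PLN       21-JUN-1999"
)
```

In this sketch a LOCUS line without a topology keyword simply yields no "topology" key, rather than a default value.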

@peterjc
Member Author

peterjc commented Nov 23, 2016

TravisCI is happy except for pypy3, which is failing in general.

@peterjc
Member Author

peterjc commented Nov 23, 2016

@kblin could you cast your eyes over this please?

@codecov-io

codecov-io commented Nov 23, 2016

Current coverage is 80.39% (diff: 78.78%)

Merging #1005 into master will increase coverage by <.01%

@@             master      #1005   diff @@
==========================================
  Files           319        319          
  Lines         48762      48792    +30   
  Methods           0          0          
  Messages          0          0          
  Branches          0          0          
==========================================
+ Hits          39199      39224    +25   
- Misses         9563       9568     +5   
  Partials          0          0          

Powered by Codecov. Last update eec4a79...169cf73

@kblin
Contributor

kblin commented Nov 28, 2016

Sorry, missed that notification again. I really need a better setup where my GitHub notifications go in my mail client, as GitHub is unable to send them to the right address.

The only worry I have is that I see quite a few GenBank files that have neither "linear" nor "circular", and the way I read the check in the GenBank parser, those would trigger the ParserFailureError("Unexpected topology %r should be linear or circular" % topology) error.

@peterjc
Member Author

peterjc commented Nov 28, 2016

Sorry, I could have waited but ended up working on a bunch of related EMBL/GenBank bits at the end of last week and wanted to merge it sooner rather than later.

[Update - reworded after double checking the code:]
Regarding the error for values other than linear and circular: the check also allows a blank value, in which case nothing is recorded in the annotations dictionary at all. See the line above the error, if topology:, and the pattern in the scanner of stripping white space before calling the consumer method.

Perhaps I need to reword the exception slightly?
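For readers following along, the behaviour described in this comment can be sketched roughly as follows. This is a simplified stand-in, not the actual parser code: the function `record_topology` is hypothetical, and the `ParserFailureError` class here is a local placeholder for the parser's exception of the same name. Only the error message is taken from the thread.

```python
class ParserFailureError(Exception):
    """Local placeholder for the parser's exception of the same name."""


def record_topology(annotations, raw_field):
    """Store the topology only when the stripped field is non-blank.

    Hypothetical helper: the white space stripping stands in for what
    the scanner does before calling the consumer method.
    """
    topology = raw_field.strip()
    if topology:
        if topology not in ("linear", "circular"):
            raise ParserFailureError(
                "Unexpected topology %r should be linear or circular" % topology
            )
        annotations["topology"] = topology
    # A blank field falls through, leaving the dictionary untouched.
    return annotations


blank_result = record_topology({}, "        ")
circular_result = record_topology({}, " circular ")
```

Under these assumptions, a file with no topology keyword gives an empty field, so no exception is raised and no "topology" key is added.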

@kblin
Contributor

kblin commented Dec 1, 2016

Nah, it's fine. I just looked over the code too quickly. Looking good. Belated 👍 from me :)

@peterjc
Member Author

peterjc commented Dec 1, 2016

Thanks :)

Development

Successfully merging this pull request may close these issues.

Genbank LOCUS "circular" topology and molecule type ignored
3 participants