final merge: Offline SBOL3 <-> GenBank Converter; Integration with previous online Converter #186
Co-authored-by: Tom Mitchell <tcmitchell@users.noreply.github.com>
add: initial simple conversion gb/sbol3 with sample files
Co-authored-by: Tom Mitchell <tcmitchell@users.noreply.github.com>
…logging; del: tmp scripts; formatting
add: genbank conversion support for multiple records and features
… legacy converter
I have begun testing with the iGEM distribution and have run into an error. Converted GenBank files appear to be getting a 0-based "RangeStart" value rather than a 1-based one, which is both incorrect and results in invalid SBOL when the range should start at 1. An example of such a file: This was made by converting GenBank files such as the following:
The 0-based range error needs to be corrected.
This turned out to stem from BioPython, which parses the start locations in GenBank files as 0-indexed rather than 1-indexed. I have created a PR for the fix: we convert the start value to 1-indexed when converting from GenBank to SBOL3, and back to 0-indexed in the opposite direction.
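The index shift described above can be sketched as follows. This is only an illustration, not the code from the fix PR: the function names are hypothetical, and it assumes the parsed span is 0-based with an exclusive end while the SBOL3 Range is 1-based with an inclusive end, so only the start value moves.

```python
from typing import Tuple


def parsed_to_sbol_range(start0: int, end0: int) -> Tuple[int, int]:
    """Convert a 0-based, end-exclusive span to a 1-based, end-inclusive one.

    Hypothetical helper: under the stated assumption only the start shifts,
    because an exclusive 0-based end equals an inclusive 1-based end.
    """
    if start0 < 0 or end0 <= start0:
        raise ValueError(f"invalid 0-based span: [{start0}, {end0})")
    return start0 + 1, end0


def sbol_to_parsed_range(start1: int, end1: int) -> Tuple[int, int]:
    """Inverse conversion, for the SBOL3 -> GenBank direction."""
    if start1 < 1 or end1 < start1:
        raise ValueError(f"invalid 1-based range: {start1}..{end1}")
    return start1 - 1, end1
```

A feature covering the first ten bases is `(0, 10)` in the 0-based form and `(1, 10)` in the 1-based form, and the two helpers round-trip each other.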
fix: 0-indexed parsing of start locations by BioPython bug
It appears that something changed between biopython 1.79 and 1.81 that is causing the tests to fail. This will need investigation.
SeqFeature is no longer hashable as of BioPython 1.80. Try an alternate implementation that avoids using SeqFeature instances as keys in a dict. Store the loc_positions on a new property of SeqFeature. Akin to monkey patching.
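A minimal sketch of the workaround pattern described in this commit message, using a stand-in class rather than the real SeqFeature. The `Feature` class, the `attach_positions` helper, and the integer values stored are all hypothetical; only the attribute name `loc_positions` comes from the message above.

```python
from typing import List, Sequence


class Feature:
    """Hypothetical stand-in for a class that became unhashable.

    Defining __eq__ without __hash__ makes Python set __hash__ to None,
    so instances can no longer be used as dict keys.
    """

    def __init__(self, name: str) -> None:
        self.name = name

    def __eq__(self, other: object) -> bool:
        return isinstance(other, Feature) and self.name == other.name


def attach_positions(features: Sequence[Feature],
                     positions: Sequence[List[int]]) -> None:
    """Store per-feature data on the object itself instead of in a dict.

    Replaces a mapping like {feature: positions}, which needs hashable keys.
    """
    for feature, feature_positions in zip(features, positions):
        # New attribute added at runtime, akin to monkey patching.
        feature.loc_positions = list(feature_positions)
```

The data then travels with each feature and is read back as `feature.loc_positions`, so no lookup keyed on the feature is needed.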
Fix SeqFeature not hashable error
Hey @jakebeal, can this one be merged now? The tests are all passing after fixing the BioPython Hashable change.
…s be 1) add a non-standard sbol#locationPosition type, and 2) reference to an unrecognized Location from a SequenceFeature. Temporarily address this by reverting to the prior behavior of truncating fuzzy ranges into ranges.
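To illustrate what truncating a fuzzy range into a plain range means, here is a small sketch that works on GenBank location text. It is not the code from this commit: `defuzz_range` is a hypothetical helper, and it assumes fuzziness is expressed only with `<` and `>` markers on a simple `start..end` location.

```python
import re
from typing import Tuple

# Matches simple locations such as "5..10", "<1..200", or "<1..>200".
_SIMPLE_RANGE = re.compile(r"^[<>]?(\d+)\.\.[<>]?(\d+)$")


def defuzz_range(location: str) -> Tuple[int, int]:
    """Drop the fuzzy markers and keep the numeric 1-based endpoints."""
    match = _SIMPLE_RANGE.match(location.strip())
    if match is None:
        raise ValueError(f"unsupported location: {location!r}")
    start, end = int(match.group(1)), int(match.group(2))
    if start > end:
        raise ValueError(f"start after end in location: {location!r}")
    return start, end
```

The truncation loses the information that the true boundary lies outside the stated position, which is why the message above calls it a temporary measure.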
I am working through some errors that come up in its usage with the iGEM workflow. Once all of those are either resolved or sufficiently identified to defer into other tickets, I will merge this.
Kludge defuzz fuzzy features
…vel. The CustomReferenceProperty and CustomStructuredCommentProperty classes in the direct GenBank conversion were set up as TopLevel classes when they should have been Identified objects. This caused a great many problems, since their identities were still those of Identified objects. I have also changed some incorrectly plural properties to singular names, following the standard SBOL convention (e.g., the "accessions" variable is listed as a set of individual "accession" properties). After this commit, I will follow with others that clean up now-unused code and style issues.
…mTopLevel classes into behaving like CustomIdentified
In the direct SBOL3->GenBank converter, there were a number of fields only filled from saved data from GenBank->SBOL conversions.
Change these to default to taking information from SBOL as well:
- 'molecule_type' annotation defaults based on Component.types (SBO material)
- 'topology' annotation defaults based on Component.types (SO topology)
- 'sequence_version' defaults to 1
- 'label' qualifier for features defaults to name
Likewise, do not attempt to use missing GenBank information if it is not present. This affects annotations 'date', 'data_file_division', 'molecule_type', 'organism', 'source', 'topology', 'accessions', and 'sequence_version', and feature qualifier 'label'.
Other improvements:
- tolerate deprecated values for orientations
- search multiple roles in component to find SO role for conversion
- workaround patch for tyto URI normalization
Also update tests:
- Confirm that version>1 round-trips correctly
- Add an sbol#inline value for an orientation
- Provide diffs when GenBank conversions fail
- More comprehensive test of ignoring non-converting SBOL properties
- Version should always be set
- No auto-generated names in converted feature (displayID is sufficient, and generating names just confuses round-tripping)
Finally, style was cleaned up in various places.
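The defaulting rules in this commit message can be sketched roughly as below. This is an illustration under assumptions, not the converter's code: the function names are hypothetical, saved GenBank values are assumed to take priority when present, and the type URIs are assumed to be the identifiers.org forms of the SBO DNA/RNA and SO linear/circular terms.

```python
from typing import Any, Dict, Iterable, Optional

# Assumed URIs for the SBO material types and SO topology types.
SBO_DNA = "https://identifiers.org/SBO:0000251"
SBO_RNA = "https://identifiers.org/SBO:0000250"
SO_LINEAR = "https://identifiers.org/SO:0000987"
SO_CIRCULAR = "https://identifiers.org/SO:0000988"


def default_annotations(component_types: Iterable[str],
                        saved: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Build GenBank annotations, falling back to SBOL data where needed."""
    types = set(component_types)
    annotations = dict(saved or {})  # saved GenBank values win if present
    if "molecule_type" not in annotations:
        if SBO_DNA in types:
            annotations["molecule_type"] = "DNA"
        elif SBO_RNA in types:
            annotations["molecule_type"] = "RNA"
    if "topology" not in annotations:
        if SO_CIRCULAR in types:
            annotations["topology"] = "circular"
        elif SO_LINEAR in types:
            annotations["topology"] = "linear"
    annotations.setdefault("sequence_version", 1)
    return annotations


def feature_label(saved_label: Optional[str], name: Optional[str]) -> Optional[str]:
    """Use the saved 'label' qualifier if any, otherwise the feature name."""
    return saved_label if saved_label is not None else name
```

Keys with no saved value and no SBOL-derived default (for example 'date' or 'organism') are simply left out of the result, matching the rule of not using GenBank information that is not present.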
Once PR #203 and PR #206 are merged, this branch will be ready to merge. There is still one major outstanding issue that will need to be addressed, which is the "flattening" of hierarchical Component objects to include the features of their SubComponents. I have filed that as #207, and it can be addressed separately from this PR.
Fill in missing GenBank information from SBOL fields
Identified genbank customs
About the Project:
⇒ Description:
⇒ Project Repository:
https://github.com/SynBioDex/SBOL-utilities
⇒ Project Proposal:
[google-summer-of-code-2022-proposal.pdf](https://s3-us-west-2.amazonaws.com/secure.notion-static.com/bfc495bd-2261-4700-bf4c-82fe3770af02/google-summer-of-code-2022-proposal.pdf)
⇒ Project Profile:
[Google Summer of Code](https://summerofcode.withgoogle.com/programs/2022/projects/IvKQhkk9)
⇒ Project Description:
nrnb/GoogleSummerOfCode#183
⇒ Constituent Pull Requests:
https://github.com/SynBioDex/SBOL-utilities/pulls?q=label%3Asbol3-genbank+
Project Blog:
Credits: