RefSeqGPFFParser update #336

Closed
@tgrego wants to merge 3 commits

@tgrego commented Nov 14, 2018

Description

Updates RefSeqGPFFParser as part of the xref sprint (see ENSCORESW-2898).
The GenBank parser from ensembl-io is now used to parse the source files instead of a custom parser, so this PR requires Ensembl/ensembl-io#69 to be merged first.
The new parser differs from the original in a few ways. For instance, only RefSeq IDs with the prefixes defined in $refseq_sources are considered. Previously, for peptide files, all other accession types were also accepted and treated as RefSeq_peptide.
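
To illustrate the prefix filtering described above, here is a minimal Python sketch (the parser itself is Perl). The `REFSEQ_SOURCES` mapping, its keys and values, and the function names are all assumptions made for illustration; the actual contents of $refseq_sources in this PR may differ.

```python
from typing import Dict, Iterable, List, Optional, Tuple

# Hypothetical prefix-to-source mapping, standing in for $refseq_sources.
# The real hash in the Perl parser may list different prefixes and source names.
REFSEQ_SOURCES: Dict[str, str] = {
    "NM": "RefSeq_mRNA",
    "NR": "RefSeq_ncRNA",
    "NP": "RefSeq_peptide",
    "XM": "RefSeq_mRNA_predicted",
    "XR": "RefSeq_ncRNA_predicted",
    "XP": "RefSeq_peptide_predicted",
}


def source_for_accession(accession: str) -> Optional[str]:
    """Return the source name for an accession, or None if its prefix is not listed."""
    prefix, separator, _rest = accession.partition("_")
    if not separator:
        # No underscore, so this is not in PREFIX_number form.
        return None
    return REFSEQ_SOURCES.get(prefix)


def filter_accessions(accessions: Iterable[str]) -> List[Tuple[str, str]]:
    """Keep only accessions with a listed prefix, paired with their source name."""
    kept = []
    for accession in accessions:
        source = source_for_accession(accession)
        if source is not None:
            kept.append((accession, source))
    return kept
```

Under these assumptions, an accession whose prefix is not in the mapping is skipped rather than falling back to RefSeq_peptide, which is the behavioural change noted above.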

Testing

No unit tests.
Tested with a subset of the rat data; however, related xrefs were absent.
Testing with the full dataset is ongoing.

@nerdstrike commented

Your build requires ensembl-io as a dependency.

@tgrego commented Nov 20, 2018

Because of performance issues when using ensembl-io, #338 has been submitted with the refactoring changes but without the ensembl-io dependency.
