Fix GFF3 parser to handle trailing FASTA #2037

Closed
benwbooth opened this issue Aug 30, 2018 · 2 comments

Comments

benwbooth commented Aug 30, 2018

The GFF3 spec allows FASTA sequences to be appended to a GFF3 file after a ##FASTA line. ADAM's GFF3 parser should either stop parsing the file when it encounters the ##FASTA line, or possibly also return a NucleotideFragmentRDD containing the parsed FASTA sequences.
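
To illustrate the first option, here is a minimal Python sketch of a reader that treats the ##FASTA line as the end of the annotation section. It is not ADAM code: the function name `split_gff3` and the sample records are hypothetical, and the sketch assumes the ##FASTA directive appears on a line of its own.

```python
def split_gff3(lines):
    """Split GFF3 lines into (feature_lines, fasta_lines) at the ##FASTA line.

    Hypothetical helper for illustration only; not part of ADAM.
    """
    features, fasta = [], []
    in_fasta = False
    for raw in lines:
        line = raw.rstrip("\r\n")
        if in_fasta:
            # Everything after ##FASTA is sequence data, not features.
            fasta.append(line)
        elif line.strip() == "##FASTA":
            in_fasta = True
        elif line and not line.startswith("#"):
            # Skip blank lines, comments, and other ## directives.
            features.append(line)
    return features, fasta


# Made-up sample file contents.
sample = [
    "##gff-version 3\n",
    "ctg1\t.\tgene\t1\t8\t.\t+\t.\tID=gene1\n",
    "##FASTA\n",
    ">ctg1\n",
    "ACGTTTGA\n",
]
feature_lines, fasta_lines = split_gff3(sample)
```

A feature parser that only ever sees `feature_lines` would never attempt to interpret the sequence lines as tab-delimited records.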

heuermh (Member) commented Aug 30, 2018

Hello @benwbooth!

Yes, loadFeatures should not choke on a GFF3 file that contains FASTA sequences. We don't really have any methods in our API that return more than one GenomicDataset type from the same file, so we'd probably want to add GFF3 support to loadSequences. Let me know what you think.
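
As a rough picture of what a sequence-oriented read of a GFF3 file could look like, the Python sketch below ignores everything before the ##FASTA line and collects the records that follow it. This is an assumption about the behavior, not ADAM's implementation: `sequences_from_gff3` is a hypothetical name, and the sketch takes the first whitespace-delimited token of each `>` header as the sequence name.

```python
def sequences_from_gff3(lines):
    """Return {name: sequence} for FASTA records after the ##FASTA line.

    Hypothetical helper for illustration only; not part of ADAM.
    """
    parts = {}
    name = None
    in_fasta = False
    for raw in lines:
        line = raw.strip()
        if not in_fasta:
            # Ignore features and directives until ##FASTA is seen.
            in_fasta = line == "##FASTA"
            continue
        if not line:
            continue
        if line.startswith(">"):
            # Header line: first token is the name, the rest is description.
            tokens = line[1:].split()
            name = tokens[0] if tokens else ""
            parts[name] = []
        elif name is not None:
            # Sequence lines may wrap, so collect and join them later.
            parts[name].append(line)
    return {n: "".join(chunks) for n, chunks in parts.items()}


# Made-up sample file contents.
gff3_lines = [
    "##gff-version 3",
    "ctg1\t.\tgene\t1\t8\t.\t+\t.\tID=gene1",
    "##FASTA",
    ">ctg1 example contig",
    "ACGT",
    "TTGA",
]
sequences = sequences_from_gff3(gff3_lines)
```

With this split of responsibilities, the feature reader and the sequence reader can each return a single dataset type from the same file.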

benwbooth (Author) commented Aug 30, 2018

That seems sensible. Thanks!
