Fix GFF3 parser to handle trailing FASTA #2037

Closed
benwbooth opened this Issue Aug 30, 2018 · 2 comments

benwbooth commented Aug 30, 2018

The GFF3 spec allows FASTA sequences to be appended to the end of a GFF3 file, preceded by a ##FASTA line. ADAM's GFF3 parser should either stop parsing when it encounters the ##FASTA line, or also return the parsed FASTA sequences somehow, possibly as a NucleotideFragmentRDD.
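A minimal sketch of the first option, written in Python for illustration only (ADAM itself is written in Scala, and this is not its parser). The helper `gff3_feature_rows` is hypothetical. It assumes the file is already available as an iterable of lines, splits tab-delimited feature lines into their nine columns, and stops at `##FASTA` so the trailing sequence lines never reach the feature parser.

```python
from typing import Iterable, List


def gff3_feature_rows(lines: Iterable[str]) -> List[List[str]]:
    """Parse GFF3 feature lines into 9-column rows, stopping at ##FASTA.

    Hypothetical helper for illustration; not part of ADAM's API.
    """
    rows: List[List[str]] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith("##FASTA"):
            # Everything after this directive is FASTA, not features.
            break
        if not line.strip() or line.startswith("#"):
            # Skip blank lines, comments, and other directives.
            continue
        columns = line.split("\t")
        if len(columns) != 9:
            raise ValueError(f"expected 9 tab-separated columns: {line!r}")
        rows.append(columns)
    return rows


example = [
    "##gff-version 3",
    "ctg123\t.\tgene\t1000\t9000\t.\t+\t.\tID=gene00001",
    "##FASTA",
    ">ctg123",
    "cttctgggcgtacccgattctcggagaacttgccgcaccattccgccttg",
]
rows = gff3_feature_rows(example)
print(len(rows), rows[0][2])  # 1 gene
```

Without the `break`, the `>ctg123` line would fail the column check and raise, which is exactly the parser choking on trailing FASTA.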

heuermh (Member) commented Aug 30, 2018

Hello @benwbooth!

Yes, loadFeatures should not choke on a GFF3 file that contains FASTA sequences. We don't really have any methods in our API that return more than one GenomicDataset type from the same file, so we'd probably want to add GFF3 support to loadSequences for reading the trailing sequences. Let me know what you think.
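For the loadSequences direction, the same file would be read separately and only the part after `##FASTA` kept. Below is a minimal Python sketch, assuming plain FASTA records follow the directive. `gff3_trailing_sequences` is a hypothetical name, and returning a plain dict stands in for whatever sequence dataset a real loader would produce.

```python
from typing import Dict, Iterable, List


def gff3_trailing_sequences(lines: Iterable[str]) -> Dict[str, str]:
    """Collect FASTA records that follow the ##FASTA directive.

    Hypothetical helper for illustration; not part of ADAM's API.
    Maps each record name (first word of the header) to its
    concatenated sequence.
    """
    chunks: Dict[str, List[str]] = {}
    in_fasta = False
    name = None
    for raw in lines:
        line = raw.strip()
        if not in_fasta:
            # Ignore the feature section until the directive appears.
            in_fasta = line.startswith("##FASTA")
            continue
        if line.startswith(">"):
            header = line[1:].split()
            name = header[0] if header else ""
            chunks.setdefault(name, [])
        elif line and name is not None:
            chunks[name].append(line)
    return {n: "".join(parts) for n, parts in chunks.items()}


example = [
    "##gff-version 3",
    "ctg123\t.\tgene\t1000\t9000\t.\t+\t.\tID=gene00001",
    "##FASTA",
    ">ctg123 assembled contig",
    "ACGTACGT",
    "TTGGCC",
    ">ctg456",
    "GGGG",
]
print(gff3_trailing_sequences(example))
# {'ctg123': 'ACGTACGTTTGGCC', 'ctg456': 'GGGG'}
```

Keeping features and sequences behind two separate loaders preserves the convention that each load call returns a single GenomicDataset type.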

benwbooth commented Aug 30, 2018

That seems sensible. Thanks!