
GFF3 formatted features written as single file must include gff-version pragma #1169

Closed
heuermh opened this issue Sep 12, 2016 · 9 comments

Member

@heuermh heuermh commented Sep 12, 2016

"##gff-version 3.2.1

The GFF version follows the format of 3.#.# in this spec. This directive must be present, must be the topmost line of the file. The version number always begins with 3, the second and third numbers are optional and indicate a major revision and a minor revision respectively."

https://github.com/The-Sequence-Ontology/Specifications/blob/master/gff3.md
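A minimal Python sketch of the check the quoted spec text implies: the topmost line must be the pragma, and the version must be `3` optionally followed by a major and a minor revision number. The function name `has_gff3_pragma` and the exact whitespace handling are assumptions for illustration, not part of ADAM or of any GFF3 library.

```python
import re

# Pattern derived from the quoted spec text: "3", optionally followed by a
# major and a minor revision number (3, 3.1, 3.2.1, ...).
# Assumption: one or more spaces/tabs separate the directive from the version.
GFF3_PRAGMA = re.compile(r"^##gff-version[ \t]+3(\.\d+){0,2}[ \t]*$")


def has_gff3_pragma(text: str) -> bool:
    """Return True if the topmost line of `text` is a GFF3 version pragma (hypothetical helper)."""
    first_line = text.split("\n", 1)[0].rstrip("\r")
    return GFF3_PRAGMA.match(first_line) is not None


if __name__ == "__main__":
    feature = "ctg123\t.\tgene\t1000\t9000\t.\t+\t.\tID=gene00001"
    print(has_gff3_pragma("##gff-version 3.2.1\n" + feature + "\n"))  # pragma on top
    print(has_gff3_pragma(feature + "\n##gff-version 3\n"))           # pragma not on top
```

A file whose pragma appears anywhere other than the first line fails the check, which is why a single merged file needs the header written before any feature lines.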

Member

@fnothaft fnothaft commented Sep 12, 2016

That's a nice, simple header to write.

TBH though, we could do a mapPartitionsWithIndex where we add that at the start of the first partition and then write it as text at the start of the first shard. Actually, when we write shards, we should probably put that at the start of each shard.

Does this apply for GFF2 as well?
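The `mapPartitionsWithIndex` idea in the comment above can be modeled locally in plain Python, with each partition represented as a list of already-formatted GFF3 lines. This is a sketch under stated assumptions, not ADAM's implementation: `header_on_first_partition`, `header_on_every_partition` and `map_partitions_with_index` are hypothetical names, and the last one only stands in for Spark's `RDD.mapPartitionsWithIndex`, whose function argument likewise receives `(partition index, iterator)`.

```python
from typing import Callable, Iterable, Iterator, List

GFF3_HEADER = "##gff-version 3.2.1"


def header_on_first_partition(index: int, lines: Iterator[str]) -> Iterator[str]:
    """Prepend the pragma to partition 0 only (shards later merged into one file)."""
    if index == 0:
        yield GFF3_HEADER
    yield from lines


def header_on_every_partition(index: int, lines: Iterator[str]) -> Iterator[str]:
    """Prepend the pragma to every partition, so each shard is standalone GFF3."""
    yield GFF3_HEADER
    yield from lines


def map_partitions_with_index(
    partitions: List[List[str]],
    f: Callable[[int, Iterator[str]], Iterable[str]],
) -> List[List[str]]:
    """Local stand-in for RDD.mapPartitionsWithIndex: apply f(index, iterator) per partition."""
    return [list(f(i, iter(p))) for i, p in enumerate(partitions)]


if __name__ == "__main__":
    shards = [["feature1", "feature2"], ["feature3"]]
    print(map_partitions_with_index(shards, header_on_first_partition))
    print(map_partitions_with_index(shards, header_on_every_partition))
```

The first function suits writing a single merged file, where only the start of partition 0 becomes the top of the file; the second matches the per-shard option, where every shard gets its own header. Because partition 0 still runs even when it holds no features, the first variant emits the pragma regardless.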

Member Author

@heuermh heuermh commented Sep 12, 2016

I think it only needs to be present when writing as single file.

I don't believe it applies for GFF2, only GFF3.

Member

@fnothaft fnothaft commented Sep 12, 2016

> I think it only needs to be present when writing as single file.

Wouldn't it make sense to add it when writing shards that aren't intended to be merged? Then each shard would be a valid GFF3.

> I don't believe it applies for GFF2, only GFF3.

SGTM!

Member Author

@heuermh heuermh commented Sep 13, 2016

> Wouldn't it make sense to add it when writing shards that aren't intended to be merged? Then each shard would be a valid GFF3.

We don't do that for other things (IntervalList features, VCF files, BAM files, etc.), right? I admit to only half understanding how we move headers around.

Member

@fnothaft fnothaft commented Sep 13, 2016

> Wouldn't it make sense to add it when writing shards that aren't intended to be merged? Then each shard would be a valid GFF3.
>
> We don't do that for other things (IntervalList features, VCF files, BAM files, etc.), right? I admit to only half understanding how we move headers around.

We write the header on each shard for sharded SAM/BAM and VCF/BCF/BGZIP-VCF. We don't do that for IntervalList because support for even writing the header at all is pretty new. We should probably open a ticket and write it for each IntervalList shard.

Member Author

@heuermh heuermh commented Sep 13, 2016

Ah I see. In that case, yes, we should do that for IntervalList and GFF3 formats.

@fnothaft fnothaft added this to the 0.23.0 milestone Mar 3, 2017
@heuermh heuermh added this to Triage in Release 0.23.0 Mar 8, 2017
Member

@fnothaft fnothaft commented May 12, 2017

@heuermh can you take this for 0.23.0?

Member

@fnothaft fnothaft commented May 15, 2017

Ping @heuermh

Member Author

@heuermh heuermh commented May 15, 2017

I'd say push to 0.24.0

fnothaft added three commits to fnothaft/adam that referenced this issue May 15, 2017
heuermh added a commit that referenced this issue May 16, 2017
@fnothaft fnothaft moved this from Triage to Completed in Release 0.23.0 May 24, 2017
2 participants