GFF3 formatted features written as single file must include gff-version pragma #1169

Closed
heuermh opened this Issue Sep 12, 2016 · 9 comments

2 participants
@heuermh
Member

heuermh commented Sep 12, 2016

"##gff-version 3.2.1

The GFF version follows the format of 3.#.# in this spec. This directive must be present, must be the topmost line of the file. The version number always begins with 3, the second and third numbers are optional and indicate a major revision and a minor revision respectively."

https://github.com/The-Sequence-Ontology/Specifications/blob/master/gff3.md
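As a rough illustration of the rule quoted above, the check below accepts a first line of `##gff-version 3`, `##gff-version 3.2`, or `##gff-version 3.2.1` and rejects anything else. The function name `has_gff3_pragma` and the regular expression are illustrative assumptions, not part of the spec or of this project. The whitespace handling in particular is a guess.

```python
import re

# Hypothetical pattern: "##gff-version", whitespace, then 3 with optional
# ".major" and ".minor" parts, as described in the quoted spec text.
GFF3_PRAGMA = re.compile(r"^##gff-version\s+3(\.\d+(\.\d+)?)?\s*$")


def has_gff3_pragma(lines):
    """Return True if the first line of `lines` is a gff-version 3 pragma."""
    first = next(iter(lines), None)
    return first is not None and GFF3_PRAGMA.match(first) is not None
```

A file whose pragma appears anywhere other than the topmost line fails this check, which matches the "must be the topmost line" wording.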

Member

fnothaft commented Sep 12, 2016

That's a nice, simple header to write.

TBH though, we could do a mapPartitionsWithIndex where we add that at the start of the first partition and then write it as text at the start of the first shard. Actually, when we write shards, we should probably put that at the start of each shard.

Does this apply for GFF2 as well?
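The `mapPartitionsWithIndex` idea above could be sketched roughly as follows. This is a plain-Python stand-in rather than the project's actual Spark code: `add_pragma`, `map_partitions_with_index`, and the `every_shard` flag are hypothetical names, and partitions are modeled as lists of already-formatted GFF3 lines.

```python
GFF3_PRAGMA = "##gff-version 3.2.1"


def add_pragma(index, lines, every_shard=False):
    """Per-partition function: emit the pragma before the partition's lines.

    With every_shard=False only partition 0 gets the pragma, which suits
    shards that are later concatenated into a single file. With
    every_shard=True each shard starts with the pragma.
    """
    if index == 0 or every_shard:
        yield GFF3_PRAGMA
    yield from lines


def map_partitions_with_index(partitions, fn):
    """Stand-in for Spark's mapPartitionsWithIndex over plain lists."""
    return [list(fn(i, iter(part))) for i, part in enumerate(partitions)]


if __name__ == "__main__":
    shards = [["chr1\t.\tgene"], ["chr2\t.\tgene"]]
    for shard in map_partitions_with_index(shards, add_pragma):
        print(shard)
```

In PySpark, `add_pragma` has the `(index, iterator)` shape that `RDD.mapPartitionsWithIndex` expects, so the same function could in principle be passed to it directly. The per-shard variant would need a small wrapper to set `every_shard=True`.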

Member

heuermh commented Sep 12, 2016

I think it only needs to be present when writing as single file.

I don't believe it applies for GFF2, only GFF3.

Member

fnothaft commented Sep 12, 2016

> I think it only needs to be present when writing as single file.

Wouldn't it make sense to add it when writing shards that aren't intended to be merged? Then each shard would be a valid GFF3.

> I don't believe it applies for GFF2, only GFF3.

SGTM!

Member

heuermh commented Sep 13, 2016

> Wouldn't it make sense to add it when writing shards that aren't intended to be merged? Then each shard would be a valid GFF3.

We don't do that for other things (IntervalList features, VCF files, BAM files, etc.), right? I admit to only half understanding how we move headers around.

Member

fnothaft commented Sep 13, 2016

> > Wouldn't it make sense to add it when writing shards that aren't intended to be merged? Then each shard would be a valid GFF3.
>
> We don't do that for other things (IntervalList features, VCF files, BAM files, etc.), right? I admit to only half understanding how we move headers around.

We write the header on each shard for sharded SAM/BAM and VCF/BCF/BGZIP-VCF. We don't do that for IntervalList because support for even writing the header at all is pretty new. We should probably open a ticket and write it for each IntervalList shard.

Member

heuermh commented Sep 13, 2016

Ah I see. In that case, yes, we should do that for IntervalList and GFF3 formats.

@fnothaft fnothaft added this to the 0.23.0 milestone Mar 3, 2017

@heuermh heuermh added this to Triage in Release 0.23.0 Mar 8, 2017

Member

fnothaft commented May 12, 2017

@heuermh can you take this for 0.23.0?

Member

fnothaft commented May 15, 2017

Ping @heuermh

Member

heuermh commented May 15, 2017

I'd say push to 0.24.0
