Tabix gff #670

Merged
merged 31 commits into from Feb 23, 2016

4 participants
@cmdcolin
Contributor

cmdcolin commented Dec 14, 2015

This pull request adds the ability to open GFF files that are Tabix indexed (yay).

I was able to test it out on some NCBI data for Apis mellifera, and I could use the indexed FASTA and the Tabix GFF together for speedy loading.

I also adapted the existing GFF3 Jasmine tests for GFF3Tabix, just to ensure some testing on the new data source!

[Image: tabix_gff]

Users can open the Tabix GFF by selecting the bgzipped gff and the tbi file, similar to opening the VCF, and the track type will be recommended. Non-indexed GFF is still supported as well!

Also see #265

@enuggetry

Contributor

enuggetry commented Dec 15, 2015

Nice work. Thanks!

@cmdcolin

Contributor

cmdcolin commented Dec 16, 2015

Thanks! For a while I hadn't even realized that tabix handled GFF, but once I saw on the command line that it did, I got motivated to make this...

Also, I just remembered that it would be good to integrate tabix GFF with generate-names.pl (tabix VCF is already supported by generate-names), but I wouldn't consider that a showstopper. It's just way more efficient to use the tabix GFF than the unindexed one :)

@cmdcolin

Contributor

cmdcolin commented Dec 16, 2015

Also, the best way to create the tabix gff is to first sort it and "tidy" it with http://github.com/genometools/genometools

After that, the bgzip/tabix workflow is standard:

brew install genometools htslib
gt gff3 -sortlines -tidy myinput.gff3 > myoutput.sorted.gff3
bgzip myoutput.sorted.gff3
tabix -p gff myoutput.sorted.gff3.gz
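
For anyone without genometools, a rough alternative is the plain sort recipe along the lines of the one in the tabix manual page: header lines first, then features sorted by sequence name and start. This is only a sketch. It assumes a tab-delimited GFF3 with no embedded `##FASTA` section, the helper name `sort_gff` and the file names are placeholders, and it does none of the `-tidy` cleanup that `gt` performs.

```shell
#!/bin/sh
# sort_gff: print header lines first, then feature lines sorted by
# sequence name (column 1) and numeric start (column 4).
# Hypothetical helper name; not part of tabix, htslib, or JBrowse.
sort_gff() {
    tab=$(printf '\t')
    # Comment/directive lines; "|| true" because a file may have none.
    grep '^#' "$1" || true
    # Feature lines, sorted the way tabix expects for "-p gff".
    grep -v '^#' "$1" | LC_ALL=C sort -t "$tab" -k1,1 -k4,4n
}

# Compress and index only if the input exists and the htslib tools are installed.
# "myinput.gff3" is a placeholder file name.
if [ -f myinput.gff3 ] && command -v bgzip >/dev/null 2>&1 \
        && command -v tabix >/dev/null 2>&1; then
    sort_gff myinput.gff3 | bgzip > myoutput.sorted.gff3.gz
    tabix -p gff myoutput.sorted.gff3.gz
fi
```

Once the index exists, a region query such as `tabix myoutput.sorted.gff3.gz chr1:1-10000` (with a sequence name that is actually in the file) is a quick way to check that it works.
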
@zhjilin

Contributor

zhjilin commented Dec 18, 2015

I don't know if this is the right place to raise my question.

I just 'copied' a similar BEDTabix function for the 6-column BED format (chr/start/end/name/score/strand). However, I still have one question specific to my own needs: I have some features associated with each 'name', and I put them as headers in the bed.gz (like ##name). Since a name can appear tens of thousands of times at different locations, I don't want to add an additional column. Is there a simple way to map features to names when handling the loaded lines, before the Store prepares the features?

BED file I used:
##name1 certain feature
##name2 certain feature
chr1 500000 500200 name1 1000 +
chr1 500050 500250 name2 1000 +
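
One way to picture the lookup being asked about: read the `##name description` header lines into a table once, then attach the description to each feature through its name column. The following is only a sketch of that idea in awk, not JBrowse store code. It assumes whitespace-separated feature lines with the name in column 4 and headers of exactly the form `##name description`; the helper name `annotate_bed` is made up for illustration.

```shell
#!/bin/sh
# annotate_bed: append the header description matching each feature's name.
# Hypothetical helper; header lines look like "##name1 certain feature".
annotate_bed() {
    awk '
        /^##/ {
            line = substr($0, 3)              # drop the leading "##"
            name = line
            sub(/[ \t].*$/, "", name)         # first word is the name
            desc = substr(line, length(name) + 1)
            sub(/^[ \t]+/, "", desc)          # trim the separator
            lookup[name] = desc               # remember name -> description
            next
        }
        NF >= 4 {
            # Column 4 is the BED name; use "." when no header matches.
            d = ($4 in lookup) ? lookup[$4] : "."
            print $0 "\t" d
        }
    ' "$1"
}
```

Running `annotate_bed` on the uncompressed BED text would print each feature line with its description appended as an extra tab-separated field, so the file itself never needs the additional column.
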

cmdcolin added some commits Dec 14, 2015

@cmdcolin

Contributor

cmdcolin commented Feb 23, 2016

Added a couple of things:

  • Made an example track for volvox; it should appear the same as the regular GFF3 in-memory adaptor track
  • Re-enabled the continuous integration tabix tests (previously they got tripped up by the bgzip files)
  • Use generated IDs for features that don't specify an ID themselves, using info from the whole GFF line to reduce the chance of collision
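
To illustrate the generated-ID idea in the last bullet: when a feature has no `ID=` attribute, an identifier can be derived from the entire line, so two features differ in ID whenever any column differs. The sketch below uses a CRC from `cksum` as a stand-in; the thread does not say how the merged code derives its IDs, so the `auto_` prefix, the checksum choice, and the helper name `add_ids` are all assumptions.

```shell
#!/bin/sh
# add_ids: print "<id><TAB><original line>" for each feature line of a GFF3 file.
# Uses the ID= attribute when present, otherwise a checksum of the whole line.
# Hypothetical helper; the CRC is a stand-in, not the method used in the PR.
add_ids() {
    grep -v '^#' "$1" | while IFS= read -r line; do
        [ -n "$line" ] || continue                 # skip blank lines
        attrs=$(printf '%s\n' "$line" | cut -f9)   # column 9: attributes
        # Pull out the value of ID=...; empty when the attribute is absent.
        id=$(printf ';%s\n' "$attrs" | sed -n 's/.*;ID=\([^;]*\).*/\1/p')
        if [ -z "$id" ]; then
            # Checksum over every column, so near-identical lines still differ.
            crc=$(printf '%s' "$line" | cksum | cut -d ' ' -f 1)
            id="auto_$crc"
        fi
        printf '%s\t%s\n' "$id" "$line"
    done
}
```

Because the checksum depends only on the line's content, the same feature gets the same generated ID on every load, while lines that differ in any column are very unlikely to collide.
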

enuggetry added a commit that referenced this pull request Feb 23, 2016

@enuggetry enuggetry merged commit ababa9f into master Feb 23, 2016

1 check passed

continuous-integration/travis-ci/push The Travis CI build passed

@nathandunn nathandunn removed the in progress label Feb 23, 2016

@cmdcolin

Contributor

cmdcolin commented Feb 23, 2016

Woohoo, thanks everyone

@cmdcolin

Contributor

cmdcolin commented Feb 23, 2016

Also, I am not sure what you all think, but #672 is very similar. It needs some code fixes, but if there is interest, it can probably be fixed up too. Other users seemed to give positive feedback on it.

@cmdcolin cmdcolin deleted the tabix_gff branch Feb 29, 2016
