Support for indexed fasta files as reference sequences #495

Closed
wants to merge 3 commits into
from

Contributor

bpow commented Jul 16, 2014

Addressing issue #317

This could be implemented in many different ways. I chose to place the fasta file and its index (<fasta_file>.fai) in the seq directory; the fai is then used to determine the byte offsets for loading the sequence.
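As a rough illustration of how a .fai entry can be turned into byte offsets, here is a minimal sketch. It assumes the standard five-column faidx layout (name, length, offset, bases per line, bytes per line); the function names are hypothetical and are not taken from this pull request.

```javascript
// Parse one tab-separated .fai line: name, length, offset, lineBases, lineBytes.
function parseFaiLine(line) {
  const [name, length, offset, lineBases, lineBytes] = line.split('\t');
  return {
    name,
    length: Number(length),       // total bases in the sequence
    offset: Number(offset),       // byte offset of the first base in the file
    lineBases: Number(lineBases), // bases per full line
    lineBytes: Number(lineBytes), // bytes per full line, including the newline
  };
}

// Byte position in the fasta file of a 0-based base coordinate.
function byteOffset(entry, pos) {
  const fullLines = Math.floor(pos / entry.lineBases);
  return entry.offset + fullLines * entry.lineBytes + (pos % entry.lineBases);
}

// Byte range [start, end) that covers the half-open base range [start, end).
function byteRange(entry, start, end) {
  const s = Math.max(0, start);
  const e = Math.min(entry.length, end);
  if (e <= s) return null; // nothing to fetch
  return { start: byteOffset(entry, s), end: byteOffset(entry, e - 1) + 1 };
}

// The fetched bytes still contain line breaks, which must be stripped.
function stripNewlines(text) {
  return text.replace(/[\r\n]/g, '');
}
```

The returned byte range can be handed to any ranged reader; newlines inside the range are removed afterwards rather than being skipped during the fetch.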

There is no explicit chunk size, but this uses XHRBlob, which downloads in 64 KB chunks.
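To make the chunking idea concrete, the sketch below shows one plausible way a requested byte range could be mapped onto fixed 64 KB chunks. This is an assumption for illustration only: `chunksForRange` is a hypothetical helper, and nothing here is taken from XHRBlob's actual source or API.

```javascript
const CHUNK_SIZE = 64 * 1024; // 64 KB, the figure mentioned above

// List the fixed-size chunks that overlap the half-open byte range [start, end).
function chunksForRange(start, end, chunkSize = CHUNK_SIZE) {
  if (end <= start) return [];
  const first = Math.floor(start / chunkSize);
  const last = Math.floor((end - 1) / chunkSize);
  const chunks = [];
  for (let i = first; i <= last; i++) {
    // Each chunk is described by its index and its own byte range.
    chunks.push({ index: i, start: i * chunkSize, end: (i + 1) * chunkSize });
  }
  return chunks;
}
```

Under this scheme a small sequence request may download more bytes than it needs, but repeated requests for neighbouring regions can reuse chunks that were already fetched.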

There is no support for compressed fasta files. Adding it would be doable, but it would require additional preparation: specifically, bgzipping the fasta file and making a file that contains offsets to the bgzf blocks.
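A file of bgzf block offsets could be used roughly as follows. This is only a sketch under assumptions: the index is taken to be an already-loaded array of `{ compressed, uncompressed }` pairs sorted by offset, with the first block at 0/0, and `findBlock` is a hypothetical name. Decompressing the block itself is out of scope here.

```javascript
// Find the bgzf block holding a given uncompressed byte position.
// `index` is a sorted array of { compressed, uncompressed } block start offsets.
function findBlock(index, pos) {
  if (index.length === 0) throw new Error('empty block index');
  let lo = 0;
  let hi = index.length - 1;
  // Binary search for the last block whose uncompressed start is <= pos.
  while (lo < hi) {
    const mid = Math.ceil((lo + hi) / 2);
    if (index[mid].uncompressed <= pos) lo = mid;
    else hi = mid - 1;
  }
  const block = index[lo];
  return {
    compressed: block.compressed,     // where to start reading the file
    within: pos - block.uncompressed, // offset inside the inflated block
  };
}
```

Combined with the fai, the lookup would go from base coordinate to uncompressed byte offset, and from there to a compressed block start plus an offset inside the inflated block.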

bpow added some commits Jul 8, 2014

Remove seqChunkSize from JBrowse/View/Track/Sequence.js
Not every SeqFeature store for refSeqs will have a
seqChunkSize (e.g. IndexedFasta), so Sequence doesn't
need to know about it (seqChunkSize is an implementation
detail of SequenceChunks)

Contributor

cmdcolin commented Jul 18, 2014

Hi Brad, thanks for the contribution! I will try to review this soon. I have a couple of questions, but I'll try to do some research before bugging you about them.

Thanks again!

Contributor

cmdcolin commented Jul 18, 2014

Eh, while I was thinking about it, it looks like the query.seqChunkSize was removed. Is this unused or not compatible or something else?

Contributor

bpow commented Jul 18, 2014

That particular part of the git branch may not be necessary, and I probably should have commented on it in the pull request. I'm not sure where else it is used, since the query itself shouldn't specify the chunk size (if I understand it correctly): that is an implementation detail of SequenceChunks, and not more generally of other things that might extend Sequence. It wasn't clear to me how a query could set a different chunk size if the chunk size was fixed when the refSeqs were prepared on the server side.

So removing query.seqChunkSize isn't necessary for using IndexedFasta as the refSeq, but the reasoning is that query.seqChunkSize isn't needed by the more general classes of things that might implement Sequence. I could easily be missing something, though, as I am new to this code base.

bp
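The design point in this comment can be sketched as follows. The class and function names are hypothetical and do not reproduce JBrowse's actual store classes; the sketch only shows that a caller can request a coordinate range without knowing whether the store behind it uses chunks.

```javascript
// Hypothetical store that serves sequence from pre-cut fixed-size chunks.
class ChunkedStore {
  constructor(sequence, chunkSize) {
    this.chunkSize = chunkSize; // internal detail, never exposed to callers
    this.chunks = [];
    for (let i = 0; i < sequence.length; i += chunkSize) {
      this.chunks.push(sequence.slice(i, i + chunkSize));
    }
  }

  // Return bases in the half-open range [start, end).
  getSequence(start, end) {
    const first = Math.floor(start / this.chunkSize);
    const last = Math.floor((end - 1) / this.chunkSize);
    const joined = this.chunks.slice(first, last + 1).join('');
    const base = first * this.chunkSize;
    return joined.slice(start - base, end - base);
  }
}

// Hypothetical store with no notion of chunks at all.
class WholeSequenceStore {
  constructor(sequence) {
    this.sequence = sequence;
  }

  getSequence(start, end) {
    return this.sequence.slice(start, end);
  }
}

// The caller passes only coordinates; no chunk size appears in the request.
function renderRegion(store, start, end) {
  return store.getSequence(start, end);
}
```

Both stores answer `renderRegion(store, 3, 9)` identically, which is the sense in which the chunk size need not be part of the query.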


Contributor

cmdcolin commented Jul 18, 2014

Yes, I wasn't sure myself whether it is unused or not. Thanks for checking that out, though!


Contributor

cmdcolin commented Dec 2, 2015

This was actually first merged into #647 and then merged into master. Big thanks for the help @bpow!

cmdcolin closed this Dec 3, 2015

Contributor

keiranmraine commented May 14, 2016

How likely is it that compressed fasta will be supported? The later samtools packages provide the tools to compress the reference, and the compressed file could be presented to prepare-refseqs.pl instead of the uncompressed one.

bgzip genome.fa
samtools faidx genome.fa.gz

Should #317 still be open?
