Add cacheMismatches option which greatly speeds up long reads processing #860

Merged
merged 1 commit into from Mar 22, 2017

cmdcolin commented Mar 3, 2017

The "advent" of long reads is upon us.... and you can try out open datasets like https://github.com/nickloman/massive-nanopore-silliness !

The fasta file NC_000913.fna and bam file gt350kb.split.sorted.bam are pretty good test files in that repo.

If these are added to JBrowse, an Alignments2 track, for example, will work, but it is slow to scroll around. That seems to be because the mismatches are recalculated every time the read is rendered (and of course, mismatches are numerous over 50kb of read length), so this PR adds the ability to "cache" the mismatches on the feature object, which makes scrolling around faster.
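
To illustrate the idea, here is a minimal sketch of caching mismatches on the feature object. It is not the actual JBrowse code: `computeMismatches`, `getMismatches`, the `_cachedMismatches` property, and the shape of the `read` object are all hypothetical names chosen for this example.

```javascript
// Hypothetical stand-in for the expensive per-read mismatch computation.
// It compares read bases to reference bases position by position.
let computeCalls = 0;
function computeMismatches(feature) {
  computeCalls += 1;
  const mismatches = [];
  for (let i = 0; i < feature.seq.length; i++) {
    if (feature.seq[i] !== feature.ref[i]) {
      mismatches.push({ start: i, base: feature.seq[i], type: 'mismatch' });
    }
  }
  return mismatches;
}

// Returns the mismatches for a feature. When config.cacheMismatches is true,
// the result is stored on the feature object and reused on later renders.
// The `_cachedMismatches` property name is an assumption for illustration.
function getMismatches(feature, config) {
  if (config.cacheMismatches && feature._cachedMismatches) {
    return feature._cachedMismatches;
  }
  const mismatches = computeMismatches(feature);
  if (config.cacheMismatches) {
    feature._cachedMismatches = mismatches;
  }
  return mismatches;
}

// Simulate rendering the same read twice, as happens while scrolling.
const read = { seq: 'ACGTTGCA', ref: 'ACGATGCA' };
getMismatches(read, { cacheMismatches: true });
getMismatches(read, { cacheMismatches: true }); // reuses the cached array
```

The trade-off in a sketch like this is memory: every cached read keeps its mismatch array alive for as long as the feature object does, which is presumably why the behavior is opt-in.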

@cmdcolin cmdcolin removed the in progress label Mar 3, 2017

cmdcolin commented Mar 22, 2017

Planning on merging. I think the idea was that in the future it could be determined heuristically when to enable cacheMismatches, but a config option is probably good enough for now.

@cmdcolin cmdcolin merged commit 8f56a47 into master Mar 22, 2017

2 checks passed

continuous-integration/travis-ci/pr The Travis CI build passed
continuous-integration/travis-ci/push The Travis CI build passed

@cmdcolin cmdcolin deleted the cache_mismatches branch Mar 22, 2017

billzt commented May 4, 2017

Well, since this has been released in 1.12.3, how do I use it in the trackList.json file?

cmdcolin commented May 5, 2017

You can just enable this config option on a BAM track :)

"cacheMismatches": true
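
For context, a track entry in trackList.json with this option might look roughly like the sketch below. Only `"cacheMismatches": true` comes from this thread. The `label` and `key` values are made-up placeholders, and the `urlTemplate` assumes the test BAM file mentioned above sits next to trackList.json, so adjust them to your own setup.

```json
{
  "tracks": [
    {
      "label": "nanopore_long_reads",
      "key": "Nanopore long reads",
      "type": "JBrowse/View/Track/Alignments2",
      "storeClass": "JBrowse/Store/SeqFeature/BAM",
      "urlTemplate": "gt350kb.split.sorted.bam",
      "cacheMismatches": true
    }
  ]
}
```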

colindaven commented May 9, 2017

Thanks for this, guys, it looks very useful. I have noticed slow performance on (very deep) long-read datasets lately.

cmdcolin commented May 10, 2017

Good to hear. I think if you have deep sequencing and long reads you might still see quite a bit of slowness, e.g. on initial load.

I imagine even more performance improvements can be added, though.
