
Optimize some view as pairs circumstances #9

Closed
cmdcolin opened this issue Nov 29, 2018 · 6 comments

Comments

@cmdcolin
Contributor

There are some areas where use of viewAsPairs generates excessive CPU usage. The algorithm for determining which redispatch requests to make is sort of a brute-force tool: it calls getEntriesForRange for each unmatched read and then de-duplicates the results.

In one area of a long-insert-size test file, I find that this region of code https://github.com/GMOD/cram-js/blob/master/src/indexedCramFile.js#L94-L113 comes up with:

- 541,996 chunks before de-duplication
- 32 after de-duplication

This issue also applies to the bam-js code.
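For context, here is a minimal sketch of the brute-force pattern described above. This is not the actual cram-js implementation; the read fields (`mateSeq`, `matePos`), the index API shape, and the chunk key are assumptions for illustration only.

```js
// Sketch only: one getEntriesForRange call per unmatched read, followed
// by de-duplication of the resulting chunks. Names are hypothetical.
async function redispatchForUnmatchedMates(index, unmatchedReads) {
  const chunks = []
  // brute force: a separate index lookup for every unmatched read
  for (const read of unmatchedReads) {
    const entries = await index.getEntriesForRange(
      read.mateSeq,
      read.matePos,
      read.matePos + 1,
    )
    chunks.push(...entries)
  }
  // de-duplicate the collected chunks before fetching them
  const seen = new Set()
  return chunks.filter(chunk => {
    const key = `${chunk.start}-${chunk.end}` // hypothetical chunk shape
    if (seen.has(key)) return false
    seen.add(key)
    return true
  })
}
```

In a region like the one above, the loop produces hundreds of thousands of (mostly identical) chunks before the de-duplication step collapses them to a handful.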

@cmdcolin
Contributor Author

Note that this large case is uncommon; most of the time there are very few unmatched reads, and we get only about 30-70 chunks before de-duplication and 1-5 after.

@cmdcolin
Contributor Author

This happens in a sort of "fountain" region:

[screenshot of the "fountain" region, 2018-11-29]

@cmdcolin
Contributor Author

Even if I bucket-ize the getEntriesForRange requests, the returned results still amount to about 30,000 slices, roughly a gigabyte of memory usage, breaking the 60 MB fetchSizeLimit.
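A sketch of one possible bucket-izing approach, merging nearby mate positions into a few larger ranges before calling getEntriesForRange; the bucket size and function name are made up for illustration:

```js
// Sketch only: collapse sorted mate positions into merged ranges so that
// nearby mates share a single getEntriesForRange call. Names and the
// default bucket size are hypothetical.
function bucketizeMatePositions(positions, bucketSize = 10000) {
  const sorted = [...positions].sort((a, b) => a - b)
  const ranges = []
  for (const pos of sorted) {
    const last = ranges[ranges.length - 1]
    if (last && pos - last.end <= bucketSize) {
      // close enough to the previous bucket: extend it
      last.end = pos + 1
    } else {
      // too far away: start a new bucket
      ranges.push({ start: pos, end: pos + 1 })
    }
  }
  return ranges
}
```

This cuts the number of index lookups, but as noted above it does not help with the total volume of data the merged ranges cover.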

@rbuels
Contributor

rbuels commented Nov 30, 2018

The only thing I can think of to do would be to have some kind of hard limit that will throw an exception.
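For illustration, a sketch of that kind of guard, mirroring the fetchSizeLimit setting discussed in this thread; the chunk `size` field and the default limit are assumptions:

```js
// Sketch only: sum the estimated fetch sizes of the de-duplicated chunks
// and throw before downloading anything if a hard limit is exceeded.
function checkFetchSizeLimit(chunks, fetchSizeLimit = 60 * 1024 * 1024) {
  // `size`: estimated bytes to fetch for a chunk (hypothetical field)
  const totalSize = chunks.reduce((sum, chunk) => sum + chunk.size, 0)
  if (totalSize > fetchSizeLimit) {
    throw new Error(
      `fetch of ${totalSize} bytes exceeds fetchSizeLimit of ${fetchSizeLimit}`,
    )
  }
  return totalSize
}
```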

@cmdcolin
Contributor Author

It seems like the fetchSizeLimit might be a reasonable defense against this. There seem to be about 300,000 reads in this 700 bp region. Potentially the algorithm that I referred to above as brute force could be optimized, but I don't know how much more we can do in our case; it seems like we're doing our best (it just takes a minute to download the results).

@cmdcolin
Contributor Author

samtools view out.sorted.bam 1:44388266-44389062 | wc -l = 303130 😮
