Optimize some view as pairs circumstances #9
Comments
Note that this large case is uncommon; most of the time there are very few unmatched reads, and we get only about 30-70 chunks before de-duplication and 1-5 after.
Even if I bucket-ize the getEntriesForRange requests, the returned results still amount to 30,000 slices, about a gigabyte of memory usage, breaking the 60 MB fetchSizeLimit.
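The bucket-izing mentioned above could look something like this minimal sketch: coalescing the mate positions of unmatched reads into merged ranges, so one getEntriesForRange call covers many reads instead of one call per read. The function name `coalescePositions` and the `maxGap` parameter are illustrative, not part of cram-js.

```javascript
// Hypothetical sketch: merge nearby mate positions into buckets so that a
// single index query can serve many unmatched reads at once.
function coalescePositions(matePositions, maxGap = 10000) {
  const sorted = [...matePositions].sort((a, b) => a - b)
  const ranges = []
  for (const pos of sorted) {
    const last = ranges[ranges.length - 1]
    if (last && pos - last.end <= maxGap) {
      // extend the current bucket instead of starting a new one
      last.end = Math.max(last.end, pos)
    } else {
      ranges.push({ start: pos, end: pos })
    }
  }
  return ranges
}

// e.g. reads clustered in a small region collapse into a single range
const ranges = coalescePositions([100, 250, 700, 50000, 50010])
// → [{ start: 100, end: 700 }, { start: 50000, end: 50010 }]
```

As the comment above notes, this reduces the number of requests but not necessarily the number of slices returned, since a dense region still maps to many index entries.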
The only thing I can think of to do would be to have some kind of hard limit that throws an exception.
It seems like the fetchSizeLimit might be a reasonable defense against this. There appear to be 300,000 reads in a 700 bp region. The algorithm I refer to above as brute force could potentially be optimized, but I'm not sure how much more we can do in our case; it seems like we're doing our best (it just takes a minute to download the results).
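The hard-limit defense discussed above could be sketched as a pre-fetch size check: sum the bytes a mate-fetch would pull down and throw before downloading if the total exceeds a configured limit. The `fetchSizeLimit` name mirrors the option mentioned in this thread; the chunk shape and error class are illustrative assumptions.

```javascript
// Hypothetical guard: refuse to start a mate fetch whose chunks would
// exceed fetchSizeLimit bytes in total.
class FetchSizeLimitError extends Error {}

function checkFetchSize(chunks, fetchSizeLimit = 60 * 1024 * 1024) {
  // illustrative chunk shape: { start, end } byte offsets into the file
  const totalBytes = chunks.reduce((sum, c) => sum + (c.end - c.start), 0)
  if (totalBytes > fetchSizeLimit) {
    throw new FetchSizeLimitError(
      `mate fetch of ${totalBytes} bytes exceeds fetchSizeLimit of ${fetchSizeLimit}`,
    )
  }
  return totalBytes
}
```

Throwing before the download starts turns a silent gigabyte-scale fetch into an explicit, catchable error that callers can handle.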
There are some areas where use of viewAsPairs generates excessive CPU usage. The algorithm for determining which redispatch requests to make is essentially brute force: it calls getEntriesForRange for each unmatched read and then de-duplicates the results.
In one region of a long-insert-size test file, this code path (https://github.com/GMOD/cram-js/blob/master/src/indexedCramFile.js#L94-L113) comes up with:
- 541,996 chunks before de-duplication
- 32 after de-duplication
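The de-duplication step described above can be sketched as keying each returned chunk on its byte coordinates, since many unmatched reads resolve to the same index chunks. The `containerStart`/`sliceStart` field names are illustrative, not cram-js's actual chunk type.

```javascript
// Minimal sketch: collapse duplicate index chunks by keying on their
// coordinates, keeping the first occurrence of each unique chunk.
function dedupeChunks(chunks) {
  const seen = new Map()
  for (const chunk of chunks) {
    const key = `${chunk.containerStart}:${chunk.sliceStart}`
    if (!seen.has(key)) seen.set(key, chunk)
  }
  return [...seen.values()]
}
```

This is how roughly 541,996 raw chunks can collapse to 32 unique ones; the cost is that all 541,996 must still be produced and hashed first, which is where the CPU time goes.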
This issue also applies to the bam-js code.