Paired read matcher can use enormous memory for large input with many chimeric reads #69

bolosky · 2016-02-17T01:05:15Z

The paired read matcher reads an input SAM/BAM file in order and emits matched pairs of reads. For coordinate-sorted input with many chimerically mapped reads, the two ends of a pair can be far apart in the file. Until the mate shows up, SNAP holds the first end in memory, not only uncompressed but in a format that wastes a lot of buffer space.
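
Here is a minimal Python sketch of order-based pair matching that shows where the memory goes. It is not SNAP's code. Each read is assumed to be a hypothetical `(qname, payload)` tuple, every query name is assumed to appear exactly twice, and `match_pairs` and `peak_pending` are illustrative names. The number of entries waiting in `pending` is the memory cost, and it grows with how far apart mates sit in the file.

```python
from typing import Dict, Iterable, Iterator, Set, Tuple

# Hypothetical record type: (query name, rest of the SAM line).
Read = Tuple[str, str]


def match_pairs(reads: Iterable[Read]) -> Iterator[Tuple[Read, Read]]:
    """Yield mate pairs from a stream of reads in file order."""
    pending: Dict[str, Read] = {}
    for read in reads:
        qname = read[0]
        mate = pending.pop(qname, None)
        if mate is None:
            pending[qname] = read  # first end: buffer it until the mate shows up
        else:
            yield (mate, read)     # second end: emit the pair and free the entry
    # Anything still in `pending` here is an unpaired end and is dropped.


def peak_pending(reads: Iterable[Read]) -> int:
    """Largest number of unmatched first ends held at the same time."""
    pending: Set[str] = set()
    peak = 0
    for qname, _ in reads:
        if qname in pending:
            pending.remove(qname)
        else:
            pending.add(qname)
            peak = max(peak, len(pending))
    return peak
```

When mates are adjacent (`a, a, b, b`) the peak is 1. When every mate arrives only after all the other first ends (`a, b, c, a, b, c`), the peak is 3, and in a large chimeric-heavy file that number can approach the pair count.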

This can use an inordinate amount of memory for large input files with a high chimeric read fraction. We will need to find some way to mitigate this, probably by spilling to disk.
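
One way the spill-to-disk idea could work is sketched below in Python. It is a hypothetical design and not the fix that shipped in 1.0, which the issue does not describe. Pending first ends are capped at `max_pending`. When the cap is exceeded, the oldest one is written to one of `partitions` temporary files chosen by hashing its query name. Only the name stays in memory so that its mate can follow it to the same file. A second pass then pairs each partition on its own, which bounds memory by the cap plus one partition (and the set of spilled names). As before, reads are assumed to be `(qname, payload)` tuples with exactly two records per name and no newlines in the payload. All names here are illustrative.

```python
import os
import tempfile
import zlib
from typing import Dict, Iterable, Iterator, Set, Tuple

Read = Tuple[str, str]  # hypothetical: (query name, rest of the SAM line)


def match_pairs_spilling(reads: Iterable[Read], max_pending: int = 1_000_000,
                         partitions: int = 16) -> Iterator[Tuple[Read, Read]]:
    """Pair reads by name, spilling old unmatched ends to hashed temp files."""
    pending: Dict[str, Read] = {}  # first ends still held in memory
    spilled: Set[str] = set()      # names whose first end was written to disk
    with tempfile.TemporaryDirectory() as tmp:
        paths = [os.path.join(tmp, f"part{i}.tsv") for i in range(partitions)]
        files = [open(p, "w") for p in paths]

        def spill(read: Read) -> None:
            qname, payload = read
            # Same name -> same partition, so both ends of a pair meet there.
            part = zlib.crc32(qname.encode()) % partitions
            files[part].write(f"{qname}\t{payload}\n")

        # First pass: match in memory while possible, otherwise go to disk.
        for read in reads:
            qname = read[0]
            if qname in pending:
                yield (pending.pop(qname), read)
            elif qname in spilled:
                spill(read)  # its mate is already on disk; send this end too
            else:
                pending[qname] = read
                if len(pending) > max_pending:
                    oldest = next(iter(pending))  # dicts keep insertion order
                    spill(pending.pop(oldest))
                    spilled.add(oldest)
        for f in files:
            f.close()

        # Second pass: each partition holds both ends of its spilled pairs.
        for path in paths:
            waiting: Dict[str, Read] = {}
            with open(path) as f:
                for line in f:
                    qname, payload = line.rstrip("\n").split("\t", 1)
                    read = (qname, payload)
                    if qname in waiting:
                        yield (waiting.pop(qname), read)
                    else:
                        waiting[qname] = read
```

Pairs that went through disk come out after all the in-memory pairs, so output order differs from input order. Unpaired ends left over at the end are silently dropped in this sketch.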

bolosky · 2020-11-24T00:56:23Z

Fixed in 1.0.

bolosky assigned rnpandya Feb 17, 2016

bolosky closed this as completed Nov 24, 2020