Enhance UMI counting RAM usage

**Currently, UMI counting stores all UMIs in RAM, for all genes**

I wrote a DNA hashing function and used the more optimized THashSet (from the Trove package) to store the UMIs. However, hashing and storing them still takes some time, and RAM usage becomes noticeable when the BAM file is large enough.
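For illustration, here is a minimal sketch of one common way to hash a DNA string: packing each base into 2 bits of a `long`. This is not necessarily the hashing function mentioned above; the class name `UmiEncoder`, the 31-base limit and the `-1` return value for non-ACGT characters are assumptions made for the example.

```java
/** Hypothetical helper: packs a UMI into a long, 2 bits per base. */
final class UmiEncoder {
    private UmiEncoder() {}

    /**
     * Encodes an A/C/G/T string of at most 31 bases.
     * Returns -1 if the UMI contains any other character (e.g. N).
     */
    static long encode(String umi) {
        if (umi.length() > 31) {
            throw new IllegalArgumentException("UMI longer than 31 bases: " + umi);
        }
        // A leading 1 bit acts as a sentinel so that UMIs of different
        // lengths (e.g. "AA" and "AAA") never map to the same value.
        long code = 1L;
        for (int i = 0; i < umi.length(); i++) {
            int bits;
            switch (umi.charAt(i)) {
                case 'A': bits = 0; break;
                case 'C': bits = 1; break;
                case 'G': bits = 2; break;
                case 'T': bits = 3; break;
                default: return -1L; // ambiguous base, caller decides what to do
            }
            code = (code << 2) | bits;
        }
        return code;
    }
}
```

Values packed this way could be kept in a primitive set (Trove's `TLongHashSet`, for instance) instead of a set of `String` objects, which should lower the per-UMI memory cost.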

We could check whether the BAM is sorted and, in that case, keep in memory only the UMIs of the genes overlapping the locus currently being processed, instead of storing all UMIs for all genes.
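A rough sketch of that idea, assuming reads arrive in coordinate order on a single contig and that each read has already been assigned to a gene with a known end coordinate. The class `SortedUmiCounter` and its method names are hypothetical, and BAM parsing is left out on purpose.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Iterator;
import java.util.Map;
import java.util.Set;

/** Hypothetical streaming counter: keeps UMIs only for genes still "open". */
final class SortedUmiCounter {
    private static final class ActiveGene {
        final int end;
        final Set<String> umis = new HashSet<>();
        ActiveGene(int end) { this.end = end; }
    }

    private final Map<String, ActiveGene> active = new HashMap<>();
    private final Map<String, Integer> counts = new HashMap<>();

    /** Registers one read; reads must be passed in increasing readStart order. */
    void add(String gene, int geneEnd, int readStart, String umi) {
        // Any gene ending before this read cannot receive further reads,
        // so its UMI set is reduced to a count and dropped from memory.
        Iterator<Map.Entry<String, ActiveGene>> it = active.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, ActiveGene> e = it.next();
            if (e.getValue().end < readStart) {
                counts.put(e.getKey(), e.getValue().umis.size());
                it.remove();
            }
        }
        active.computeIfAbsent(gene, g -> new ActiveGene(geneEnd)).umis.add(umi);
    }

    /** Call at the end of a contig or of the file to finalize the open genes. */
    void flushAll() {
        for (Map.Entry<String, ActiveGene> e : active.entrySet()) {
            counts.put(e.getKey(), e.getValue().umis.size());
        }
        active.clear();
    }

    int activeGenes() { return active.size(); }

    Map<String, Integer> counts() { return counts; }
}
```

With this layout, peak memory depends on the number of genes overlapping the current position rather than on the total number of genes in the BAM.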

Another possibility would be to use temporary files to store all UMI barcodes per gene. I don't know how long it would take to write them, and then to read them back at the end to compile the final count matrix.
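To get a feeling for the write / read cost, something like the following could be timed on a real dataset. It is only a sketch: the class `TempFileUmiStore` is hypothetical, it writes one UMI per line to one temporary file per gene, and it reopens the file on every append, which is simple but probably slower than keeping buffered writers open.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;

/** Hypothetical on-disk store: one temporary file of UMIs per gene. */
final class TempFileUmiStore {
    private final Map<String, Path> files = new HashMap<>();

    /** Appends one UMI (one line) to the temporary file of the given gene. */
    void append(String gene, String umi) {
        try {
            Path file = files.get(gene);
            if (file == null) {
                file = Files.createTempFile("umi_", ".txt");
                file.toFile().deleteOnExit(); // safety net if compileCounts() is never called
                files.put(gene, file);
            }
            Files.write(file, (umi + "\n").getBytes(StandardCharsets.UTF_8),
                        StandardOpenOption.APPEND);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Reads each file back, counts distinct UMIs per gene, then deletes the files. */
    Map<String, Integer> compileCounts() {
        Map<String, Integer> counts = new HashMap<>();
        try {
            for (Map.Entry<String, Path> e : files.entrySet()) {
                // Only one gene's UMIs are held in memory at a time.
                HashSet<String> distinct =
                        new HashSet<>(Files.readAllLines(e.getValue(), StandardCharsets.UTF_8));
                counts.put(e.getKey(), distinct.size());
                Files.delete(e.getValue());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        files.clear();
        return counts;
    }
}
```

One limitation to keep in mind: a file per gene can mean tens of thousands of small files, so grouping genes into a limited number of files may be needed in practice.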

Enhance UMI counting RAM usage #2
