Enhance UMI counting RAM usage #2

@VincentGardeux

Currently, UMI counting stores all UMIs in RAM, for all genes.

I wrote a DNA hashing function and used the more optimized THashSet (from the Trove package) to store them. But hashing and storing still take some time, and RAM usage can become noticeable if the BAM is big enough.
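To make the memory trade-off concrete, here is a minimal sketch of one common way to hash DNA strings: packing each base into 2 bits of a `long`. This is not necessarily the hashing function mentioned above. The class and method names (`UmiEncoder`, `encode`, `countDistinct`) are hypothetical, UMIs are assumed to be at most 31 bases, and `java.util.HashSet` stands in for the Trove set so the example has no external dependency.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

class UmiEncoder {
    /**
     * Packs an A/C/G/T string (at most 31 bases) into a long, 2 bits per base.
     * A leading 1 bit acts as a sentinel so that "A" and "AA" get different codes.
     * Returns -1 if the UMI contains any other character (e.g. N).
     */
    static long encode(String umi) {
        if (umi.length() > 31) {
            throw new IllegalArgumentException("UMI longer than 31 bases: " + umi);
        }
        long code = 1L; // sentinel bit
        for (int i = 0; i < umi.length(); i++) {
            int bits;
            switch (umi.charAt(i)) {
                case 'A': bits = 0; break;
                case 'C': bits = 1; break;
                case 'G': bits = 2; break;
                case 'T': bits = 3; break;
                default: return -1L; // ambiguous base, caller decides what to do
            }
            code = (code << 2) | bits;
        }
        return code;
    }

    /** Counts distinct valid UMIs; UMIs with ambiguous bases are skipped. */
    static int countDistinct(Iterable<String> umis) {
        Set<Long> seen = new HashSet<>(); // stand-in for a primitive long set
        for (String umi : umis) {
            long code = encode(umi);
            if (code >= 0) {
                seen.add(code);
            }
        }
        return seen.size();
    }

    public static void main(String[] args) {
        System.out.println(countDistinct(Arrays.asList("ACGT", "ACGT", "TTTT", "ACNT")));
    }
}
```

With an encoding like this, a set of primitive `long` values would avoid both the `String` objects and the boxing overhead, but every UMI of every gene still lives in RAM until the end of the run.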

We could check whether the BAM is sorted and, if so, not store all UMIs for all genes, but only the UMIs of the genes at the locus currently being processed.
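A rough sketch of the sorted-BAM idea, under several simplifying assumptions: reads arrive in coordinate order on a single chromosome, each read is already annotated with its gene and that gene's end coordinate, and the cell barcode dimension is ignored. The `Read` type and `SortedUmiCounter` class are hypothetical and not part of the existing code. A gene's UMI set is released as soon as the current read starts past the gene's end, so only genes overlapping the current locus are held in memory.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

class SortedUmiCounter {
    /** Hypothetical, already-annotated read. */
    static final class Read {
        final String gene;
        final int geneEnd; // last coordinate of the gene
        final int start;   // alignment start of the read
        final String umi;

        Read(String gene, int geneEnd, int start, String umi) {
            this.gene = gene;
            this.geneEnd = geneEnd;
            this.start = start;
            this.umi = umi;
        }
    }

    /** Returns the number of distinct UMIs per gene; input must be sorted by start. */
    static Map<String, Integer> count(Iterable<Read> sortedReads) {
        Map<String, Set<String>> active = new HashMap<>(); // genes still open
        Map<String, Integer> activeEnds = new HashMap<>();
        Map<String, Integer> counts = new LinkedHashMap<>();

        for (Read r : sortedReads) {
            // Flush every gene that ends before the current position:
            // in a sorted file, no later read can belong to it.
            Iterator<Map.Entry<String, Integer>> it = activeEnds.entrySet().iterator();
            while (it.hasNext()) {
                Map.Entry<String, Integer> e = it.next();
                if (e.getValue() < r.start) {
                    counts.put(e.getKey(), active.remove(e.getKey()).size());
                    it.remove();
                }
            }
            active.computeIfAbsent(r.gene, g -> new HashSet<>()).add(r.umi);
            activeEnds.put(r.gene, r.geneEnd);
        }
        // Flush whatever is still open at the end of the input.
        for (Map.Entry<String, Set<String>> e : active.entrySet()) {
            counts.put(e.getKey(), e.getValue().size());
        }
        return counts;
    }
}
```

Overlapping genes are the reason for keeping a map of open genes rather than a single "current gene": reads from two genes can interleave in a coordinate-sorted file. An unsorted BAM would still need the current store-everything path as a fallback.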

Another possibility would be to use temporary files to store all UMI barcodes per gene. I don't know how long it would take to write them and then read them back at the end to compile the final count matrix.
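The temporary-file variant could look roughly like the sketch below, which would also be a starting point for measuring the write/read cost. Everything here is hypothetical (`UmiSpill`, the `.umi` suffix, one file per gene), and it assumes gene identifiers are safe to use as file names. Each UMI is appended to its gene's file, and at the end the files are read back one gene at a time, so only one gene's UMI set is in RAM at once.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

class UmiSpill {
    private final Path dir;

    UmiSpill() {
        try {
            dir = Files.createTempDirectory("umi_spill");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Appends one UMI to the gene's file (opens and closes the file each call). */
    void add(String gene, String umi) {
        try {
            Files.write(dir.resolve(gene + ".umi"),
                    (umi + "\n").getBytes(StandardCharsets.UTF_8),
                    StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Reads each gene file back, counts distinct UMIs, then deletes the files. */
    Map<String, Integer> countAndCleanup() {
        Map<String, Integer> counts = new TreeMap<>();
        try {
            List<Path> files = new ArrayList<>();
            try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, "*.umi")) {
                for (Path f : stream) {
                    files.add(f);
                }
            }
            for (Path f : files) {
                Set<String> unique = new HashSet<>(Files.readAllLines(f, StandardCharsets.UTF_8));
                String name = f.getFileName().toString();
                counts.put(name.substring(0, name.length() - ".umi".length()), unique.size());
                Files.delete(f);
            }
            Files.delete(dir);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return counts;
    }
}
```

Opening a file for every appended UMI is the slowest possible way to do this. A real implementation would likely buffer writes per gene or spill in batches, which is exactly the cost that would need benchmarking.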

Labels: enhancement
