Skip to content

OOM in generateMatrix: dense float64 matrix wastes 8x memory on high-coverage data (e.g. 10x multiome) #28

@chlee-tabin

Description

@chlee-tabin

Summary

generateMatrix in AMULET.py allocates a dense cell × peak indicator matrix as np.zeros((n_peaks, n_cells)) (line 47), which defaults to float64 (8 bytes/cell). The matrix only ever stores 0/1 binary peak-in-cell indicators, so dtype=np.int8 is sufficient — that's an 8× memory reduction.

The same np.zeros call is reached again later in main flow for the repfiltered_matrix. Both matrices are built per AMULET run.

Why this matters

For libraries with high per-cell coverage and 25,000+ cells, the dense float64 matrix is the binding constraint on AMULET's memory footprint. The union peak set scales with cell count and per-cell fragment depth.

Concretely, on the duck (Anas platyrhynchos) snATAC multiome dataset I am working with (12 cellranger-arc 2.1.0 libraries, 4 K – 35 K cells/lib, median ~25,000 unique HQ fragments/cell):

Library n_cells Default (float64) With dtype=np.int8
ARC08 29,523 OOM at peak >240 GB 34 GB peak
PILOTARC4A 34,622 OOM at peak >246 GB 20 GB peak

(SLURM MaxRSS from sacct; full-cohort array on an HMS-O2 short partition node, 257 GB max cap.)

The default-dtype version is unrunnable on standard cluster nodes for these libraries — there is nowhere to bump memory to. The int8 version completes in ~1 hour.

Why some users don't hit it

Sci-ATAC-seq3 and similar low-coverage assays (~4 K reads/cell, e.g. Qiu et al. 2026, bioRxiv 10.64898/2026.04.07.717039, 4.3 M-cell mouse atlas) do not hit it — the per-cell fragment count is too low to make the union-peak set large enough to OOM. The bug specifically bites:

  • 10x Chromium scATAC / multiome (~25,000+ reads/cell)
  • ≥25 K cells/library
  • Standard 256 GB cluster nodes

Proposed fix (1-line change)

-    matrix = np.zeros((len(unionoverlaps), len(cellids)))
+    matrix = np.zeros((len(unionoverlaps), len(cellids)), dtype=np.int8)

The matrix only ever has matrix[oi, celliddict[cellid]] = 1 writes. All downstream operations (np.sum(matrix, axis=1) in inferRepeats, etc.) work unchanged on int8 — the sum returns int64 by default. Zero algorithmic change.

I will open a PR with this patch shortly. Verified end-to-end: doublet rates produced from float64 vs int8 matrices are identical (3.42–7.76% range across 12 libraries, q < 0.01).

Even better long-term would be scipy.sparse.csr_matrix (the indicator is >99 % zeros for typical scATAC data), but that's a bigger refactor — the int8 dtype change is a strictly-necessary first step that unblocks AMULET on modern high-coverage 10x data without changing any other code.

This issue blocks 10x multiome users entirely. Happy to help iterate on the PR.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions