## Summary
`generateMatrix` in `AMULET.py` allocates a dense peak × cell indicator matrix as `np.zeros((n_peaks, n_cells))` (line 47), which defaults to float64 (8 bytes per entry). The matrix only ever stores 0/1 binary peak-in-cell indicators, so `dtype=np.int8` is sufficient: an 8× memory reduction.
The same `np.zeros` call is reached again later in the main flow for the `repfiltered_matrix`. Both matrices are built per AMULET run.
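For context, a quick back-of-envelope calculation (sizes here are illustrative, not taken from any specific AMULET run) shows what the dtype alone costs:

```python
import numpy as np

# Illustrative sizes only; in AMULET the real (n_peaks, n_cells) come
# from len(unionoverlaps) and len(cellids) at runtime.
n_peaks, n_cells = 300_000, 30_000

bytes_f64 = n_peaks * n_cells * np.dtype(np.float64).itemsize
bytes_i8 = n_peaks * n_cells * np.dtype(np.int8).itemsize

print(f"float64: {bytes_f64 / 1e9:.0f} GB")  # 72 GB
print(f"int8:    {bytes_i8 / 1e9:.0f} GB")   # 9 GB
```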
## Why this matters
For libraries with high per-cell coverage and 25,000+ cells, the dense float64 matrix is the binding constraint on AMULET's memory footprint. The union peak set scales with cell count and per-cell fragment depth.
Concretely, on the duck (Anas platyrhynchos) snATAC multiome dataset I am working with (12 cellranger-arc 2.1.0 libraries, 4K–35K cells per library, median ~25,000 unique HQ fragments/cell):
| Library | n_cells | Default (float64) | With `dtype=np.int8` |
| --- | --- | --- | --- |
| ARC08 | 29,523 | OOM at peak >240 GB | 34 GB peak |
| PILOTARC4A | 34,622 | OOM at peak >246 GB | 20 GB peak |
(SLURM `MaxRSS` from `sacct`; full-cohort array on an HMS-O2 `short` partition node, 257 GB max cap.)
The default-dtype version is unrunnable on standard cluster nodes for these libraries — there is nowhere to bump memory to. The int8 version completes in ~1 hour.
## Why some users don't hit it
Sci-ATAC-seq3 and similar low-coverage assays (~4 K reads/cell, e.g. Qiu et al. 2026, bioRxiv 10.64898/2026.04.07.717039, 4.3 M-cell mouse atlas) do not hit it — the per-cell fragment count is too low to make the union-peak set large enough to OOM. The bug specifically bites:
- 10x Chromium scATAC / multiome (~25,000+ reads/cell)
- ≥25 K cells/library
- Standard 256 GB cluster nodes
## Proposed fix (1-line change)
```diff
- matrix = np.zeros((len(unionoverlaps), len(cellids)))
+ matrix = np.zeros((len(unionoverlaps), len(cellids)), dtype=np.int8)
```
The matrix only ever has `matrix[oi, celliddict[cellid]] = 1` writes. All downstream operations (`np.sum(matrix, axis=1)` in `inferRepeats`, etc.) work unchanged on int8; the sum promotes to int64 by default. Zero algorithmic change.
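A minimal self-contained sanity check (small made-up sizes, not AMULET's actual arrays) that the 0/1 writes and the downstream row-sum behave identically on float64 and int8:

```python
import numpy as np

# Made-up dimensions and random (row, col) write targets; duplicates are
# fine because writing 1 twice still leaves 1, same as in AMULET.
rng = np.random.default_rng(0)
rows = rng.integers(0, 50, size=200)
cols = rng.integers(0, 10, size=200)

m_f64 = np.zeros((50, 10))                  # current default dtype
m_i8 = np.zeros((50, 10), dtype=np.int8)    # proposed dtype
m_f64[rows, cols] = 1
m_i8[rows, cols] = 1

sums_f64 = np.sum(m_f64, axis=1)
sums_i8 = np.sum(m_i8, axis=1)

assert np.array_equal(sums_f64, sums_i8)         # identical per-row counts
assert np.issubdtype(sums_i8.dtype, np.integer)  # sum promotes past int8
```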
I will open a PR with this patch shortly. Verified end-to-end: doublet rates from the float64 and int8 matrices are identical (3.42–7.76% across 12 libraries at q < 0.01).
Even better long-term would be `scipy.sparse.csr_matrix` (the indicator is >99% zeros for typical scATAC data), but that's a bigger refactor; the int8 dtype change is a strictly necessary first step that unblocks AMULET on modern high-coverage 10x data without changing any other code.
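For reference, a sketch of what the sparse variant could look like. Variable names follow the issue text (`unionoverlaps`, `celliddict`); the sizes and index arrays here are made up, and this is not code from AMULET:

```python
import numpy as np
from scipy import sparse

# Hypothetical small sizes; in AMULET these would be
# len(unionoverlaps) and len(cellids).
n_peaks, n_cells = 1_000, 200
rows = np.array([3, 3, 17, 999])   # peak indices (oi)
cols = np.array([0, 5, 5, 199])    # cell indices (celliddict[cellid])

# Build from (row, col) pairs instead of writing 1s into a dense array;
# only the nonzero entries are stored.
matrix = sparse.csr_matrix(
    (np.ones(len(rows), dtype=np.int8), (rows, cols)),
    shape=(n_peaks, n_cells),
)

# Row sums still work; .sum(axis=1) returns a dense column, so flatten it.
per_peak = np.asarray(matrix.sum(axis=1)).ravel()
```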
This issue blocks 10x multiome users entirely. Happy to help iterate on the PR.