OOM in generateMatrix: dense float64 matrix wastes 8x memory on high-coverage data (e.g. 10x multiome)

## Summary

`generateMatrix` in `AMULET.py` allocates a dense cell × peak indicator matrix as `np.zeros((n_peaks, n_cells))` (line 47), which defaults to **float64** (8 bytes/cell). The matrix only ever stores 0/1 binary peak-in-cell indicators, so **`dtype=np.int8`** is sufficient — that's an 8× memory reduction.

The same `np.zeros` call is reached again later in main flow for the `repfiltered_matrix`. Both matrices are built per AMULET run.

## Why this matters

For libraries with high per-cell coverage and 25,000+ cells, the dense float64 matrix is the binding constraint on AMULET's memory footprint. The union peak set scales with cell count and per-cell fragment depth.

Concretely, on the duck (Anas platyrhynchos) snATAC multiome dataset I am working with (12 cellranger-arc 2.1.0 libraries, 4 K – 35 K cells/lib, median ~25,000 unique HQ fragments/cell):

| Library | n_cells | Default (float64) | With dtype=np.int8 |
|---|---|---|---|
| ARC08 | 29,523 | **OOM** at peak >240 GB | 34 GB peak |
| PILOTARC4A | 34,622 | **OOM** at peak >246 GB | 20 GB peak |

(SLURM `MaxRSS` from sacct; full-cohort array on an HMS-O2 `short` partition node, 257 GB max cap.)

The default-dtype version is **unrunnable** on standard cluster nodes for these libraries — there is nowhere to bump memory to. The int8 version completes in ~1 hour.

## Why some users don't hit it

Sci-ATAC-seq3 and similar low-coverage assays (~4 K reads/cell, e.g. Qiu et al. 2026, [bioRxiv 10.64898/2026.04.07.717039](https://www.biorxiv.org/content/10.64898/2026.04.07.717039v1), 4.3 M-cell mouse atlas) do not hit it — the per-cell fragment count is too low to make the union-peak set large enough to OOM. The bug specifically bites:

- 10x Chromium scATAC / multiome (~25,000+ reads/cell)
- ≥25 K cells/library
- Standard 256 GB cluster nodes

## Proposed fix (1-line change)

```diff
-    matrix = np.zeros((len(unionoverlaps), len(cellids)))
+    matrix = np.zeros((len(unionoverlaps), len(cellids)), dtype=np.int8)
```

The matrix only ever has `matrix[oi, celliddict[cellid]] = 1` writes. All downstream operations (`np.sum(matrix, axis=1)` in `inferRepeats`, etc.) work unchanged on int8 — the sum returns int64 by default. Zero algorithmic change.

I will open a PR with this patch shortly. Verified end-to-end: doublet rates produced from float64 vs int8 matrices are identical (3.42–7.76% range across 12 libraries, q < 0.01).

Even better long-term would be `scipy.sparse.csr_matrix` (the indicator is >99 % zeros for typical scATAC data), but that's a bigger refactor — the int8 dtype change is a strictly-necessary first step that unblocks AMULET on modern high-coverage 10x data without changing any other code.

This issue blocks 10x multiome users entirely. Happy to help iterate on the PR.


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

OOM in generateMatrix: dense float64 matrix wastes 8x memory on high-coverage data (e.g. 10x multiome) #28

Summary

Why this matters

Why some users don't hit it

Proposed fix (1-line change)

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Library	n_cells	Default (float64)	With dtype=np.int8
ARC08	29,523	OOM at peak >240 GB	34 GB peak
PILOTARC4A	34,622	OOM at peak >246 GB	20 GB peak

OOM in generateMatrix: dense float64 matrix wastes 8x memory on high-coverage data (e.g. 10x multiome) #28

Description

Summary

Why this matters

Why some users don't hit it

Proposed fix (1-line change)

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions