Reading one chunk loads the entire manifest shard into JS (~70× blowup) — OOMs on large virtual stores

## Summary

Reading a **single chunk** from an array causes icechunk-js to load and parse that array's **entire manifest shard** into JS objects, with very high per-ref overhead (~70× the compressed on-disk size). For stores whose fine-level manifests are large (e.g. a global virtual GeoZarr at ~1.19 m resolution, backed by COGs), one tile read balloons the heap to multiple GB and OOM-crashes browser tabs (and Node).

`maxManifestCacheSize` does not help, because a *single* manifest already exceeds the memory budget.
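To illustrate why a count bound cannot help here, the following is a toy sketch of an LRU that limits the number of entries only. `CountBoundedCache` and the `approxBytes` field are hypothetical stand-ins for illustration, not the icechunk-js cache or its API.

```js
// Toy LRU bounded by entry count only (hypothetical, not the icechunk-js cache).
class CountBoundedCache {
  constructor(maxEntries) {
    this.maxEntries = maxEntries;
    this.map = new Map(); // Map preserves insertion order, oldest first
  }

  set(key, value) {
    this.map.delete(key); // re-inserting moves the key to the newest position
    this.map.set(key, value);
    while (this.map.size > this.maxEntries) {
      this.map.delete(this.map.keys().next().value); // evict the oldest entry
    }
  }

  // Sum of the (assumed) per-entry size estimates currently retained.
  totalBytes() {
    let sum = 0;
    for (const v of this.map.values()) sum += v.approxBytes;
    return sum;
  }
}

const cache = new CountBoundedCache(4);
// One parsed manifest of ~4 GB: well within the count limit of 4 entries.
cache.set("1x", { approxBytes: 4e9 });
console.log(cache.map.size, cache.totalBytes());
```

The cache never evicts its only entry, so the retained size is whatever that single manifest costs, regardless of the configured limit.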

## Environment

- `icechunk-js@0.5.0`
- Spec version 2 (virtual chunk refs)
- Reproduced in Node 23 and in Chrome (deck.gl tile rendering)

## Repro

Public store: a global canopy-height model (Meta DINOv3 CHM v2) repackaged as a 7-level multiscale GeoZarr over Icechunk, with virtual chunk refs into COGs (`1x` native ≈ 1.19 m … `64x`):

`https://data.source.coop/tge-labs/meta-chm-v2/zarr/chm.zarr.icechunk`

```js
import { HttpStorage, Repository, IcechunkStore } from "icechunk-js";
import * as zarr from "zarrita";

const url = "https://data.source.coop/tge-labs/meta-chm-v2/zarr/chm.zarr.icechunk";
const repo = await Repository.open({ storage: new HttpStorage(url) });
const session = await repo.checkoutBranch("main", { maxManifestCacheSize: 4 });
const ice = await IcechunkStore.open(session, {
  withRangeCoalescing: zarr.withRangeCoalescing,
  // (a fetchClient that rewrites the s3:// virtual locations to a CORS host is
  //  needed in the browser/Node — see "secondary issue" below)
});
const group = await zarr.open.v3(ice, { kind: "group" });

// Read ONE 512x512 tile at each pyramid level and print heap after GC.
// Run with `node --expose-gc`; otherwise global.gc is undefined and no GC is forced.
for (const scale of ["64x","32x","16x","8x","4x","2x","1x"]) {
  const a = await zarr.open.v3(group.resolve(`${scale}/chm`), { kind: "array" });
  await zarr.get(a, [zarr.slice(0, 512), zarr.slice(0, 512)]); // any populated region
  global.gc?.();
  console.log(scale, (process.memoryUsage().heapUsed/1e6).toFixed(0)+"MB");
}
```

## Measurements

One tile read per level, in the order of the repro loop. Each value is `heapUsed` after `global.gc()` and is cumulative across levels (`maxManifestCacheSize: 4`):

| Level | ~Resolution | heapUsed after 1 tile |
|-------|-------------|-----------------------|
| 64x   | ~76 m       | 219 MB                |
| 32x   | ~38 m       | 1,050 MB              |
| 16x   | ~19 m       | 1,250 MB              |
| 8x    | ~9.5 m      | 2,020 MB              |
| 4x    | ~4.8 m      | 4,170 MB              |
| 2x    | ~2.4 m      | 7,418 MB              |
| 1x    | ~1.19 m     | 11,313 MB             |

The finest manifest shards are ≤ ~58 MB **compressed on disk**, yet a single one parses to **~4 GB of JS heap** — roughly a 70× blowup.
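As a rough cross-check of that ratio, the sketch below derives the heap added by each level from the cumulative table. It assumes that nothing was evicted or freed between reads and that the ~58 MB upper bound applies to the shard loaded at `1x`; both are assumptions, so the per-level figures are only approximate.

```js
// Cumulative heapUsed (MB) copied from the table above, coarsest to finest.
const cumulative = [
  ["64x", 219], ["32x", 1050], ["16x", 1250], ["8x", 2020],
  ["4x", 4170], ["2x", 7418], ["1x", 11313],
];

// Heap added by each level = this row minus the previous row.
// The first row also includes the baseline heap of the runtime itself.
const deltas = cumulative.map(([level, mb], i) => ({
  level,
  addedMB: mb - (i === 0 ? 0 : cumulative[i - 1][1]),
}));

// Blowup for the finest level, using the ~58 MB upper bound on shard size.
const compressedMB = 58;
const blowup = deltas[deltas.length - 1].addedMB / compressedMB;

console.table(deltas);
console.log(`1x: +${deltas[deltas.length - 1].addedMB} MB, about ${blowup.toFixed(0)}x the compressed size`);
```

Under those assumptions the `1x` read adds 3,895 MB, about 67× the compressed size, which is consistent with the ~70× figure.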

## Root cause (from a heap snapshot)

The retained graph is:

```
_IcechunkStore.session
  → ReadSession.manifestCache  (LRUCache)
    → Map
      → { nodeId, refs }
        → refs: Array[ ~tens of millions ]   ← full JS objects per chunk ref
          → { inline, chunkId, checksumLastModified, location, offset, length }
```

In a snapshot of a partially loaded session, this `refs` array and its element objects accounted for ~74% of the heap. The whole manifest shard is materialized on first access to any chunk in the array, and every ref is its own heap object carrying, among other fields, a `location` string.

## Suggested directions

- **Lazy / windowed manifest resolution** — resolve only the refs for the requested chunk coordinates rather than materializing the whole shard.
- **Compact in-memory representation** — store refs columnar (typed arrays for offset/length, interned/templated location strings) instead of one JS object per ref. Even without lazy loading this would cut the ~70× blowup dramatically.
- Document that `maxManifestCacheSize` bounds *count*, not bytes — it can't protect against a single oversized manifest.
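The compact-representation idea could look roughly like the sketch below. `ColumnarRefs` is a hypothetical class, not part of icechunk-js; it covers only the `location`, `offset`, and `length` fields and assumes chunk lengths fit in 32 bits and offsets stay below 2^53.

```js
// Hypothetical columnar container for virtual chunk refs (not icechunk-js API).
class ColumnarRefs {
  constructor(capacity) {
    this.capacity = capacity;
    this.offsets = new Float64Array(capacity);    // byte offsets, exact below 2^53
    this.lengths = new Uint32Array(capacity);     // byte lengths, assumed < 4 GiB
    this.locationIds = new Uint32Array(capacity); // index into this.locations
    this.locations = [];                          // each distinct location stored once
    this.locationIndex = new Map();               // location -> id, for interning
    this.size = 0;
  }

  // Append one ref and return its index.
  push(location, offset, length) {
    if (this.size >= this.capacity) throw new RangeError("ColumnarRefs is full");
    let id = this.locationIndex.get(location);
    if (id === undefined) {
      id = this.locations.length;
      this.locations.push(location);
      this.locationIndex.set(location, id);
    }
    const i = this.size++;
    this.offsets[i] = offset;
    this.lengths[i] = length;
    this.locationIds[i] = id;
    return i;
  }

  // Materialize a single ref object only when it is requested.
  get(i) {
    if (i < 0 || i >= this.size) return undefined;
    return {
      location: this.locations[this.locationIds[i]],
      offset: this.offsets[i],
      length: this.lengths[i],
    };
  }
}

// Example with made-up locations: two refs share one interned string.
const refs = new ColumnarRefs(3);
refs.push("s3://example-bucket/a.tif", 0, 1024);
refs.push("s3://example-bucket/a.tif", 1024, 2048);
refs.push("s3://example-bucket/b.tif", 512, 4096);
console.log(refs.get(1), refs.locations.length);
```

In this layout each ref costs 16 bytes of typed-array storage (8 + 4 + 4) plus one shared string per distinct location, rather than one heap object per ref. How chunk coordinates map to an index is left out here and would depend on the manifest layout.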
