Summary
Reading a single chunk from an array causes icechunk-js to load and parse that array's entire manifest shard into JS objects, with very high per-ref overhead (~70× the compressed on-disk size). For stores whose fine-level manifests are large — e.g. a global ~1.19 m virtual GeoZarr backed by COGs — one tile read balloons the heap to multiple GB and OOM-crashes browser tabs (and Node).
maxManifestCacheSize does not help, because a single manifest already exceeds the memory budget.
Environment
- icechunk-js@0.5.0
- Spec version 2 (virtual chunk refs)
- Reproduced in Node 23 and in Chrome (deck.gl tile rendering)
Repro
Public store: a global canopy-height model (Meta DINOv3 CHM v2) repackaged as a 7-level multiscale GeoZarr over Icechunk, with virtual chunk refs into COGs (1x native ≈ 1.19 m … 64x):
https://data.source.coop/tge-labs/meta-chm-v2/zarr/chm.zarr.icechunk
import { HttpStorage, Repository, IcechunkStore } from "icechunk-js";
import * as zarr from "zarrita";

// Run with `node --expose-gc` so that global.gc is defined.
const url = "https://data.source.coop/tge-labs/meta-chm-v2/zarr/chm.zarr.icechunk";
const repo = await Repository.open({ storage: new HttpStorage(url) });
const session = await repo.checkoutBranch("main", { maxManifestCacheSize: 4 });
const ice = await IcechunkStore.open(session, {
  withRangeCoalescing: zarr.withRangeCoalescing,
  // (a fetchClient that rewrites the s3:// virtual locations to a CORS host is
  // needed in the browser/Node; see "secondary issue" below)
});
const group = await zarr.open.v3(ice, { kind: "group" });

// Read ONE 512x512 tile at each pyramid level and print heap after GC:
for (const scale of ["64x", "32x", "16x", "8x", "4x", "2x", "1x"]) {
  const a = await zarr.open.v3(group.resolve(`${scale}/chm`), { kind: "array" });
  await zarr.get(a, [zarr.slice(0, 512), zarr.slice(0, 512)]); // any populated region
  global.gc?.();
  console.log(scale, (process.memoryUsage().heapUsed / 1e6).toFixed(0) + "MB");
}
Measurements
One tile read per level, heapUsed after global.gc() (cumulative, maxManifestCacheSize: 4):
| Level | ~Resolution | heapUsed after 1 tile |
| --- | --- | --- |
| 64x | ~76 m | 219 MB |
| 32x | ~38 m | 1,050 MB |
| 16x | ~19 m | 1,250 MB |
| 8x | ~9.5 m | 2,020 MB |
| 4x | ~4.8 m | 4,170 MB |
| 2x | ~2.4 m | 7,418 MB |
| 1x | ~1.19 m | 11,313 MB |
The finest manifest shards are ≤ ~58 MB compressed on disk, yet a single one parses to ~4 GB of JS heap — roughly a 70× blowup.
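Because the table is cumulative, the cost of each level is easier to see as a difference between consecutive rows. The snippet below is only arithmetic on the numbers reported above. It assumes nothing was evicted from the cache between reads; if earlier manifests were evicted, the true per-level cost is higher than the delta shown. The 58 MB divisor is the upper bound on compressed shard size quoted above, and the first row also includes the baseline heap of the process.

```javascript
// Cumulative heapUsed (MB) after one tile read per level, copied from the table above.
const cumulativeMB = [
  ["64x", 219], ["32x", 1050], ["16x", 1250], ["8x", 2020],
  ["4x", 4170], ["2x", 7418], ["1x", 11313],
];

// Difference between consecutive rows = heap added by that level's read.
function perLevelDeltas(rows) {
  return rows.map(([level, mb], i) => ({
    level,
    addedMB: i === 0 ? mb : mb - rows[i - 1][1],
  }));
}

const deltas = perLevelDeltas(cumulativeMB);
console.table(deltas);

// Blowup of the finest level relative to the largest compressed shard (<= ~58 MB).
const blowup = deltas[deltas.length - 1].addedMB / 58;
console.log(`1x: ~${blowup.toFixed(0)}x the compressed size`);
```

The 1x read adds 3,895 MB on its own, which is where the "~4 GB" and "roughly 70×" figures come from (about 67× against a 58 MB shard).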
Root cause (from a heap snapshot)
The retained graph is:
_IcechunkStore.session
  → ReadSession.manifestCache (LRUCache)
    → Map
      → { nodeId, refs }
        → refs: Array[ ~tens of millions ] ← full JS objects per chunk ref
          → { inline, chunkId, checksumLastModified, location, offset, length }
In a snapshot of a partially-loaded session, this refs array and its element objects accounted for ~74% of the heap. The whole manifest shard is materialized on first access to any chunk in the array, and every ref is a heap object with a location string etc.
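To get a feel for the per-ref cost independently of icechunk-js, the following rough micro-benchmark allocates synthetic objects with the same field names as in the snapshot and divides the heap growth by the object count. This is a sketch under assumptions: the field values, the `s3://example-bucket/...` path, and the count are all made up, and the real decoder may produce different object shapes and strings. It is most meaningful when run with `node --expose-gc`; without that flag the number includes uncollected garbage.

```javascript
// Field names copied from the heap snapshot above; all values are synthetic.
function makeRef(i) {
  return {
    inline: null,
    chunkId: null,
    checksumLastModified: 0,
    // Hypothetical location; a fresh string per ref, as a parser would produce.
    location: "s3://example-bucket/tiles/" + (i % 1000) + ".tif",
    offset: i * 4096,
    length: 4096,
  };
}

function estimateBytesPerRef(count) {
  globalThis.gc?.(); // defined only under `node --expose-gc`
  const before = process.memoryUsage().heapUsed;
  const refs = new Array(count);
  for (let i = 0; i < count; i++) refs[i] = makeRef(i);
  globalThis.gc?.();
  const after = process.memoryUsage().heapUsed;
  // Return the array as well so the objects stay retained while measuring.
  return { bytesPerRef: (after - before) / count, refs };
}

const { bytesPerRef, refs: retained } = estimateBytesPerRef(200_000);
console.log(`~${bytesPerRef.toFixed(0)} bytes per synthetic ref object`);
```

Multiplying the printed figure by the ref count of a real manifest gives a ballpark for the heap that manifest would occupy in this representation.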
Suggested directions
- Lazy / windowed manifest resolution — resolve only the refs for the requested chunk coordinates rather than materializing the whole shard.
- Compact in-memory representation — store refs columnar (typed arrays for offset/length, interned/templated location strings) instead of one JS object per ref. Even without lazy loading this would cut the ~70× blowup dramatically.
- Document that maxManifestCacheSize bounds count, not bytes; it can't protect against a single oversized manifest.
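As an illustration of the compact-representation idea, here is a minimal sketch of a columnar ref container. `ColumnarRefs` and its methods are hypothetical names, not part of icechunk-js, and the sketch covers only virtual refs (`location`, `offset`, `length`). It assumes refs can be addressed by a dense integer index and that each chunk is shorter than 4 GiB; inline refs, chunk ids, and checksums would need extra columns.

```javascript
// Hypothetical columnar container for virtual chunk refs (not icechunk-js API).
class ColumnarRefs {
  constructor(capacity) {
    this.offsets = new Float64Array(capacity);    // byte offsets; exact up to 2^53
    this.lengths = new Uint32Array(capacity);     // assumes chunk length < 4 GiB
    this.locationIds = new Uint32Array(capacity); // index into this.locations
    this.locations = [];                          // each distinct location stored once
    this.locationIndex = new Map();               // location -> id, used while building
    this.size = 0;
  }

  // Append one ref and return its index.
  push(location, offset, length) {
    if (this.size >= this.offsets.length) throw new RangeError("capacity exceeded");
    if (length > 0xffffffff) throw new RangeError("length does not fit in Uint32");
    let id = this.locationIndex.get(location);
    if (id === undefined) {
      id = this.locations.length;
      this.locations.push(location);
      this.locationIndex.set(location, id);
    }
    const i = this.size++;
    this.offsets[i] = offset;
    this.lengths[i] = length;
    this.locationIds[i] = id;
    return i;
  }

  // Materialize a single ref object on demand instead of keeping one per chunk.
  get(i) {
    if (i < 0 || i >= this.size) return undefined;
    return {
      location: this.locations[this.locationIds[i]],
      offset: this.offsets[i],
      length: this.lengths[i],
    };
  }

  // Fixed cost of the numeric columns (excludes the interned strings).
  get columnBytes() {
    return this.offsets.byteLength + this.lengths.byteLength + this.locationIds.byteLength;
  }
}

const refs = new ColumnarRefs(3);
refs.push("s3://example-bucket/a.tif", 0, 4096);      // hypothetical locations
refs.push("s3://example-bucket/a.tif", 4096, 4096);
refs.push("s3://example-bucket/b.tif", 0, 8192);
console.log(refs.get(1), refs.locations.length, refs.columnBytes);
```

The numeric columns cost a fixed 16 bytes per ref in this layout, and many chunks of a virtual store typically point into the same COG, so interning the location strings should remove most of the per-ref string overhead. How much that saves in practice depends on how many distinct locations a manifest contains.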