Reading one chunk loads the entire manifest shard into JS (~70× blowup) — OOMs on large virtual stores #24

Description

@tylere

Summary

Reading a single chunk from an array causes icechunk-js to load and parse that array's entire manifest shard into JS objects, with very high per-ref overhead (~70× the compressed on-disk size). For stores whose fine-level manifests are large (e.g. a global virtual GeoZarr at ~1.19 m resolution, backed by COGs), one tile read balloons the heap to multiple GB and OOM-crashes browser tabs and Node processes.

maxManifestCacheSize does not help, because a single manifest already exceeds the memory budget.
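
To make the count-versus-bytes distinction concrete, here is a minimal sketch of an LRU bounded by estimated bytes instead of entry count. ByteBudgetLru and its methods are hypothetical names for illustration, not icechunk-js API, and the caller is assumed to supply a size estimate for each value. A byte budget like this would only stop an oversized manifest from being retained; it would not lower the peak reached while that manifest is being parsed.

```javascript
// Hypothetical sketch: LRU cache bounded by estimated bytes, not entry count.
// None of these names come from icechunk-js.
class ByteBudgetLru {
  constructor(maxBytes) {
    this.maxBytes = maxBytes;
    this.usedBytes = 0;
    this.entries = new Map(); // Map iterates in insertion order, oldest first
  }

  get(key) {
    const entry = this.entries.get(key);
    if (entry === undefined) return undefined;
    // Re-insert so this key becomes the most recently used.
    this.entries.delete(key);
    this.entries.set(key, entry);
    return entry.value;
  }

  // Returns false (and caches nothing) when the value alone exceeds the budget.
  set(key, value, sizeBytes) {
    const existing = this.entries.get(key);
    if (existing !== undefined) {
      this.usedBytes -= existing.sizeBytes;
      this.entries.delete(key);
    }
    if (sizeBytes > this.maxBytes) return false;
    // Evict least recently used entries until the new value fits.
    while (this.usedBytes + sizeBytes > this.maxBytes) {
      const [oldestKey, oldest] = this.entries.entries().next().value;
      this.entries.delete(oldestKey);
      this.usedBytes -= oldest.sizeBytes;
    }
    this.entries.set(key, { value, sizeBytes });
    this.usedBytes += sizeBytes;
    return true;
  }
}

// Usage: a 100-byte budget holds one 60-byte entry at a time.
const cache = new ByteBudgetLru(100);
cache.set("manifest-a", "A", 60);
cache.set("manifest-b", "B", 60); // evicts manifest-a
const oversizedAccepted = cache.set("manifest-huge", "H", 101); // rejected
```

With a count bound of 4, the same three inserts would all be retained regardless of size, which is the behavior described above.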

Environment

  • icechunk-js@0.5.0
  • Spec version 2 (virtual chunk refs)
  • Reproduced in Node 23 and in Chrome (deck.gl tile rendering)

Repro

Public store: a global canopy-height model (Meta DINOv3 CHM v2) repackaged as a 7-level multiscale GeoZarr over Icechunk, with virtual chunk refs into COGs (1x native ≈ 1.19 m … 64x):

https://data.source.coop/tge-labs/meta-chm-v2/zarr/chm.zarr.icechunk

import { HttpStorage, Repository, IcechunkStore } from "icechunk-js";
import * as zarr from "zarrita";

const url = "https://data.source.coop/tge-labs/meta-chm-v2/zarr/chm.zarr.icechunk";
const repo = await Repository.open({ storage: new HttpStorage(url) });
const session = await repo.checkoutBranch("main", { maxManifestCacheSize: 4 });
const ice = await IcechunkStore.open(session, {
  withRangeCoalescing: zarr.withRangeCoalescing,
  // (a fetchClient that rewrites the s3:// virtual locations to a CORS host is
  //  needed in the browser/Node — see "secondary issue" below)
});
const group = await zarr.open.v3(ice, { kind: "group" });

// Read ONE 512x512 tile at each pyramid level and print heap after GC:
for (const scale of ["64x","32x","16x","8x","4x","2x","1x"]) {
  const a = await zarr.open.v3(group.resolve(`${scale}/chm`), { kind: "array" });
  await zarr.get(a, [zarr.slice(0, 512), zarr.slice(0, 512)]); // any populated region
  global.gc?.(); // no-op unless Node is started with --expose-gc
  console.log(scale, (process.memoryUsage().heapUsed/1e6).toFixed(0)+"MB");
}

Measurements

One tile read per level, heapUsed after global.gc() (cumulative, maxManifestCacheSize: 4):

Level   ~Resolution   heapUsed after 1 tile
64x     ~76 m            219 MB
32x     ~38 m          1,050 MB
16x     ~19 m          1,250 MB
8x      ~9.5 m         2,020 MB
4x      ~4.8 m         4,170 MB
2x      ~2.4 m         7,418 MB
1x      ~1.19 m       11,313 MB

The finest manifest shards are ≤ ~58 MB compressed on disk, yet a single one parses to ~4 GB of JS heap — roughly a 70× blowup.
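
As a rough cross-check of that ratio, the per-read heap growth can be derived from the cumulative figures in the table. This is a sketch under two assumptions that the measurements do not confirm: that nothing was evicted between reads, and that the 1x read parsed exactly one shard of about 58 MB. The variable names are illustrative.

```javascript
// Cumulative heapUsed (MB) after one tile read per level, copied from the table.
const heapAfterMb = [
  ["64x", 219],
  ["32x", 1050],
  ["16x", 1250],
  ["8x", 2020],
  ["4x", 4170],
  ["2x", 7418],
  ["1x", 11313],
];

// Heap growth attributable to each read. The first entry also includes the
// baseline heap of the process, so it overstates the 64x manifest cost.
const growthMb = heapAfterMb.map(([level, mb], i) => ({
  level,
  deltaMb: i === 0 ? mb : mb - heapAfterMb[i - 1][1],
}));

// Assumption: the 1x read parsed a single shard of about 58 MB compressed.
const compressedShardMb = 58;
const finest = growthMb[growthMb.length - 1];
const blowup = finest.deltaMb / compressedShardMb;

console.table(growthMb);
console.log(`1x growth: ${finest.deltaMb} MB, about ${blowup.toFixed(0)}x compressed`);
```

Under those assumptions the 1x read accounts for just under 4 GB of growth, which lands in the same range as the ~70× estimate.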

Root cause (from a heap snapshot)

The retained graph is:

_IcechunkStore.session
  → ReadSession.manifestCache  (LRUCache)
    → Map
      → { nodeId, refs }
        → refs: Array[ ~tens of millions ]   ← full JS objects per chunk ref
          → { inline, chunkId, checksumLastModified, location, offset, length }

In a snapshot of a partially-loaded session, this refs array and its element objects accounted for ~74% of the heap. The whole manifest shard is materialized on first access to any chunk in the array, and every ref is a separate heap object carrying its own location string alongside the other fields.

Suggested directions

  • Lazy / windowed manifest resolution — resolve only the refs for the requested chunk coordinates rather than materializing the whole shard.
  • Compact in-memory representation — store refs columnar (typed arrays for offset/length, interned/templated location strings) instead of one JS object per ref. Even without lazy loading this would cut the ~70× blowup dramatically.
  • Document that maxManifestCacheSize bounds count, not bytes — it can't protect against a single oversized manifest.
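
To illustrate the compact-representation idea, here is a minimal sketch of a columnar ref store. ColumnarRefs, its methods, and the s3:// paths are hypothetical and not part of icechunk-js. It covers only the location, offset, and length fields of virtual refs, and assumes each chunk length fits in 32 bits. A ref object is built only when a specific index is requested.

```javascript
// Hypothetical columnar store for virtual chunk refs (illustrative names only).
class ColumnarRefs {
  constructor(capacity) {
    this.offsets = new Float64Array(capacity); // exact for integers up to 2^53
    this.lengths = new Uint32Array(capacity); // assumes chunk length < 4 GiB
    this.locationIds = new Uint32Array(capacity); // index into this.locations
    this.locations = []; // each distinct location string is stored once
    this.locationIndex = new Map(); // location string -> id
    this.size = 0;
  }

  // Appends one ref and returns its index.
  push(location, offset, length) {
    if (this.size === this.offsets.length) throw new RangeError("capacity exceeded");
    let id = this.locationIndex.get(location);
    if (id === undefined) {
      id = this.locations.length;
      this.locations.push(location);
      this.locationIndex.set(location, id);
    }
    const i = this.size++;
    this.offsets[i] = offset;
    this.lengths[i] = length;
    this.locationIds[i] = id;
    return i;
  }

  // Materializes a single ref object on demand.
  get(i) {
    if (!Number.isInteger(i) || i < 0 || i >= this.size) return undefined;
    return {
      location: this.locations[this.locationIds[i]],
      offset: this.offsets[i],
      length: this.lengths[i],
    };
  }

  // Bytes held by the typed arrays, excluding the interned strings.
  typedBytes() {
    return this.offsets.byteLength + this.lengths.byteLength + this.locationIds.byteLength;
  }
}

// Usage with made-up locations: three refs, two distinct files.
const refs = new ColumnarRefs(3);
refs.push("s3://example-bucket/a.tif", 0, 100);
refs.push("s3://example-bucket/a.tif", 100, 250);
refs.push("s3://example-bucket/b.tif", 5e9, 42);
console.log(refs.get(2), refs.locations.length, refs.typedBytes());
```

The typed arrays cost a fixed 16 bytes per ref (8 + 4 + 4), and location strings are shared across refs that point into the same file. How refs map from chunk coordinates to an index is left out here, since that depends on the manifest layout.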
