Skip to content

gc: block-persistence conservatism inflates peak RSS on tight allocation loops #179

@proggeramlug

Description

@proggeramlug

Summary

Perry's GC includes a "block-persistence" pass at crates/perry-runtime/src/gc.rs:1090 that marks ALL objects in an 8 MB arena block live whenever any single object in the block is root-reachable. This was added to protect against untracked arena refs sitting in caller-saved registers that the conservative stack scan can't capture (issues #43 / #44 pattern — dangling-pointer crashes when GC freed still-reachable arena objects).

The conservatism is catastrophic for tight allocation loops that co-locate fresh per-iteration data with pre-existing long-lived state. The JSON.parse case was traced in detail in #149; summary of the mechanism:

  1. Block 0 (and sometimes block 1) contains long-lived data: interned string keys, shape-cache keys_arrays, the blob being parsed, any caller-level root arrays.
  2. The same block also contains early-iteration allocations (the first new Foo() in the loop always lands adjacent to the setup data).
  3. Every subsequent iteration's allocations go into new blocks.
  4. GC marks block 0 live (because of the long-lived data). Block-persist marks EVERY object in block 0 live — including the dead early-iteration objects.
  5. Those dead objects have field values pointing into later blocks (fresh name strings, tag arrays, nested objects).
  6. The persist pass iterates to fixed point, following pointers across blocks — at iter 49 of bench_json_roundtrip, 2 live blocks / 37 truly-reachable objects cascaded to 30 live blocks / 3 M "live" objects over 11 rounds.
  7. gc() becomes a no-op. RSS grows linearly with iteration count.

Measured impact (bench_json_roundtrip at v0.5.190)

Perry Bun ratio
Speed 315 ms 248 ms 1.27×
Peak RSS 318 MB 83 MB 3.83×

The speed gap is tolerable (recently closed from 2.4× to 1.27×, see #149 closeout). The RSS gap is the architectural cost of block-persist + bump-arena vs. Bun's per-object GC.

Not just JSON

Any tight new ClassName() / new Array() / tight parser loop that runs after meaningful setup hits the same pattern. The setup's long-lived roots anchor block 0 live; the first loop iteration allocates adjacent to them; block-persist pulls in the dead iter-0 objects on every GC and cascades from there.

We've seen the contours of this before:

Fix directions

Ranked by scope.

A. Segregated long-lived arena region (medium)

Allocate intrinsically long-lived data — PARSE_KEY_CACHE strings, shape-cache keys_arrays, transition-cache arrays, the string intern table, stringify scratch — into a DEDICATED arena block (or a small fixed-size region) that block-persist can conservatively retain. Everything else goes into the general arena where per-iter blocks genuinely go dead together.

Small scope (no correctness tradeoff with #43 / #44), but needs careful routing: every long-lived allocation path has to opt in.

B. Weaken block-persist with a stricter pre-condition (high)

Block-persist exists because the conservative stack scan can miss handles in caller-saved registers during mid-parse GC triggers. If we can guarantee (by convention + scaffolding in js_json_parse, js_buffer_alloc, etc.) that all intermediate arena refs from a given path are tracked in an explicit root set (like PARSE_ROOTS) before any internal allocation, we can skip block-persist for those paths.

Would need a comprehensive audit of allocation call sites. Higher risk — re-opens the failure mode #43 / #44 closed.

C. Generational GC (largest)

Young generation = throwaway per-GC-cycle region; old generation = current arena model. Most parse output is trivially young-generation (dies before first GC). Matches what Bun/V8 do.

Weeks of work, changes allocator semantics throughout the runtime.

D. Do nothing (documented trade-off)

Ship Perry's current numbers as-is. Speed is already a win over Node. RSS gap is the cost of the arena model's simplicity. Document it, close the door.

What I'd do next

(A) looks like the right first step — "long-lived data gets its own quarantine block" is a conceptually clean, well-bounded change. Implementing that for just PARSE_KEY_CACHE + shape-cache arrays would prove the concept on the JSON workload and inform whether to extend.

References

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions