Summary
Writing to a dynamic-key property on a plain object (obj[name] = value) scales as O(N²) per row in the property count on Perry, versus O(N) on V8/JSC. For objects with more than a handful of keys this makes row-shaped data — database rows, HTTP request bodies, JSON arrays-of-objects, anything resembling a spreadsheet — measurably slower than on a JS host, regardless of JIT warm-up.
The root cause is that js_object_set_field_by_name does a linear scan over the object's keys_array on every property write to find-or-append the slot, and there's no per-call-site or per-shape inline cache short-circuiting the scan. V8's hidden-class + IC design makes the same pattern O(1) amortised after the first object of each shape.
Repro
No external deps. Compile with perry compile and run the binary directly; run the same file under Node on the same hardware for the baseline. Full source attached below; headline numbers from Node v25 + Perry 0.5.29 on an Apple M-series:
# 10000 rows × 20 columns, same name sequence every row
Node v25: 8.5 ms/iter
Perry 0.5.29: 43.3 ms/iter (5.1× slower)
The gap grows with column count:
          Node v25    Perry 0.5.29
cols=5     0.98 ms      3.00 ms
cols=10    1.29 ms     12.40 ms
cols=20    9.08 ms     40.80 ms
cols=40   13.64 ms    175.00 ms
cols=80   24.78 ms    574.80 ms
Per-cell time on Perry grows from 0.06 µs to 0.72 µs as cols go 5 → 80: a 12× increase for a 16× width increase, i.e. per-cell cost scales roughly linearly with column count, so whole-row cost is O(N²). On Node it stays essentially flat — classic hidden-class IC behavior.
Root cause (confirmed by reading the runtime)
js_object_set_field_by_name at crates/perry-runtime/src/object.rs:1921:
let key_count = crate::array::js_array_length(keys) as usize;
for i in 0..key_count { // ← linear scan
    let key_val = crate::array::js_array_get(keys, i as u32);
    if key_val.is_string() {
        let stored_key = key_val.as_string_ptr();
        if crate::string::js_string_equals(key, stored_key) != 0 {
            // found — update this slot
            …
        }
    }
}
// not found — append
For a 20-property row object the 20 writes do 0+1+2+…+19 = 190 key compares per row. Across 10 000 rows that's 1.9M compares per iteration. No IC, no shape match short-circuit, no pointer-identity fast index — every write pays the full O(N) scan.
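The arithmetic is easy to sanity-check with a small standalone simulation of the find-or-append loop (plain TypeScript, independent of the runtime; the function name is illustrative):

```typescript
// Simulates the find-or-append linear scan and counts key comparisons
// for one row of `ncols` fresh property writes.
function comparesPerRow(ncols: number): number {
  const keys: string[] = [];
  let compares = 0;
  for (let j = 0; j < ncols; j++) {
    const name = 'c' + j;
    let found = false;
    for (const stored of keys) { // linear scan, mirroring the runtime loop
      compares++;
      if (stored === name) { found = true; break; }
    }
    if (!found) keys.push(name); // not found: append, as the runtime does
  }
  return compares;
}

console.log(comparesPerRow(20)); // 190 = 0+1+...+19
```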
The shape_cache_insert / js_object_alloc_with_shape path DOES share keys_arrays across object-literal shapes known at compile time — so const x = { a: 1, b: 2 } is fast — but the dynamic obj[name] = value pattern produced by most real code (ORMs, result builders, config merges, parsers) falls off that path and hits the linear scan.
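To make the two paths concrete, here is the distinction in user code, assuming the shape-cache behavior described above (function names are illustrative):

```typescript
// Fast path: the object-literal shape is known at compile time, so the
// runtime can share one keys_array across every row of this shape.
function buildLiteral(i: number): Record<string, unknown> {
  return { id: i, name: 'n' + i, score: i * 2 };
}

// Slow path: dynamic keys route every write through
// js_object_set_field_by_name's linear scan, even though the resulting
// shape is identical on every row.
function buildDynamic(i: number, names: string[]): Record<string, unknown> {
  const o: Record<string, unknown> = {};
  o[names[0]] = i;
  o[names[1]] = 'n' + i;
  o[names[2]] = i * 2;
  return o;
}
```

Both produce identical objects with identical key order; only the second pays the scan.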
Repro (standalone, no deps)
// hidden-class-gap.ts
function nowMs(): number {
  // eslint-disable-next-line @typescript-eslint/no-explicit-any
  const g = globalThis as any;
  return g.performance && g.performance.now ? g.performance.now() : Date.now();
}

const NAMES: string[] = [];
for (let i = 0; i < 20; i++) NAMES.push('c' + i);

const NROWS = 10000;
const NCOLS = 20;

function build(): Record<string, unknown>[] {
  const out: Record<string, unknown>[] = new Array(NROWS);
  for (let i = 0; i < NROWS; i++) {
    const o: Record<string, unknown> = {};
    for (let j = 0; j < NCOLS; j++) {
      o[NAMES[j]] = i * j;
    }
    out[i] = o;
  }
  return out;
}

for (let w = 0; w < 3; w++) build(); // warmup

const ITERS = 10;
const t0 = nowMs();
for (let w = 0; w < ITERS; w++) build();
console.log('10000 x 20 build: ' + ((nowMs() - t0) / ITERS).toFixed(2) + ' ms/iter');
Run:
node --import tsx hidden-class-gap.ts
perry compile hidden-class-gap.ts -o /tmp/hcg && /tmp/hcg
The scaling demo (sweeping column count 5 → 80) lives in the @perry/postgres repo at /tmp/perry-perf-probe/hidden-class-scaling.ts if you want the O(N²) data.
Why it matters
Any API that produces shape-stable rows of non-trivial width hits this:
- @perry/postgres row-decode — bulk decode of 10k rows × 20 cols is ~800 ms on Perry vs ~35 ms on Node (~22× slower). Most of that gap is this scan.
- JSON.parse of an array of uniform objects with more than a handful of fields.
- HTTP/fastify body deserialization and request-handler reply.send({ ... }) paths with wide payloads.
- Anything using Object.assign({}, src, patch) where src is more than a few keys.
Users don't have a workaround either — the slow path is triggered by plain idiomatic code. Rewriting hot paths to use new Function-generated classes (postgres.js's trick) works on Node / Bun but isn't available on Perry-native.
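For reference, a sketch of that generated-class trick on a JS host, to show what Perry-native users are missing (the helper name is invented for illustration; the real postgres.js codegen differs in details):

```typescript
// Generate a constructor whose body assigns a fixed sequence of property
// names, so the engine sees one monomorphic hidden class per row shape.
// Requires runtime codegen (new Function), which an AOT-compiled target
// like Perry-native cannot offer.
function makeRowCtor(names: string[]): new (values: unknown[]) => Record<string, unknown> {
  const body = names
    .map((n, i) => `this[${JSON.stringify(n)}] = values[${i}];`)
    .join(' ');
  // eslint-disable-next-line no-new-func
  return new Function('values', body) as unknown as new (values: unknown[]) => Record<string, unknown>;
}

const Row = makeRowCtor(['id', 'name']);
const row = new Row([1, 'alice']);
console.log(row.id, row.name); // 1 alice
```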
Fix options, rough ordering by complexity
1. Pointer-identity fast path in the scan. When key is the exact same StringHeader* as a stored_key, skip js_string_equals's validation + byte-compare. For row-shape data this triggers every time — the name strings are interned by the StringPool, so consecutive rows see pointer-equal key arguments. Estimated: 5-10% on the microbench, one-function change.
2. Per-shape keys_array cache keyed on the sequence of name pointers at the call site. When set_field_by_name has seen the same name-pointer sequence before on some object, reuse the shape's slot map (name → index) and skip the scan entirely. Close cousin to V8's transition tree, doable as a thread-local HashMap<(obj_shape_id, key_ptr), slot_idx>. Estimated: recovers the O(N²) → O(N) scaling; closes most of the gap.
3. Proper hidden-class ICs, emitted per call site by codegen — the full-fat V8 approach. Biggest win, biggest surgery. Probably the right long-term home; (2) is the incremental step that buys most of it.
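A toy model of the transition-cache idea, written in TypeScript purely for illustration (the shapeId scheme and cache layout here are invented; the real implementation would live in object.rs against the existing SHAPE_CACHE):

```typescript
// Cache (shapeId, keyName) -> (slot, resulting shape), so the second and
// later rows of a shape never touch the linear scan.
interface Obj { shapeId: number; keys: string[]; vals: unknown[]; }

let nextShapeId = 1;
const cache = new Map<string, { slot: number; nextShape: number }>();
let scanCompares = 0; // instrumentation: linear-scan comparisons performed

function setField(o: Obj, key: string, val: unknown): void {
  const ck = o.shapeId + ':' + key;
  const hit = cache.get(ck);
  if (hit) { // cached transition: O(1), no scan
    o.keys[hit.slot] = key;
    o.vals[hit.slot] = val;
    o.shapeId = hit.nextShape;
    return;
  }
  for (let i = 0; i < o.keys.length; i++) { // slow path: linear scan
    scanCompares++;
    if (o.keys[i] === key) { // existing key: overwrite, same shape
      cache.set(ck, { slot: i, nextShape: o.shapeId });
      o.vals[i] = val;
      return;
    }
  }
  const slot = o.keys.length; // not found: append and record the transition
  cache.set(ck, { slot, nextShape: nextShapeId });
  o.shapeId = nextShapeId++;
  o.keys.push(key);
  o.vals.push(val);
}

function buildRow(names: string[], i: number): Obj {
  const o: Obj = { shapeId: 0, keys: [], vals: [] };
  for (let j = 0; j < names.length; j++) setField(o, names[j], i * j);
  return o;
}

const names = Array.from({ length: 20 }, (_, j) => 'c' + j);
buildRow(names, 0);          // first row pays the scans: 190 comparisons
const after1 = scanCompares;
buildRow(names, 1);          // second row: every write is a cache hit
console.log(scanCompares === after1); // true: zero additional comparisons
```

Only the first row of each shape pays the O(N²) cost; every later row is O(N) in total, which is the scaling recovery claimed for option (2).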
I'd be happy to take a swing at (1) and (2) if you'd like — they're self-contained to crates/perry-runtime/src/object.rs and the existing SHAPE_CACHE infrastructure.
Environment
- Perry 0.5.29 (latest — includes my three perf commits from earlier today: GC_FLAG_SHAPE_SHARED clone-gating, deferred descriptor-String alloc, bigint i64 fast path)
- Node v25.8.0 / bun 1.3.12 for baseline
- macOS 26 / Apple Silicon
Related
- @perry/postgres bench suite documents the user-visible ratio at https://github.com/PerryTS/postgres/blob/main/bench/results/summary.md.
- Prior fixes in the same area landed as GC_FLAG_SHAPE_SHARED (clone-gating) and the accessor-descriptor defer in v0.5.29; those brought the 10k × 20 case from 896 ms → 764 ms (-15%). The remainder is what's documented above.