perf: plain-object property writes scale O(N²) with column count — no hidden-class IC #37

@proggeramlug

Description

Summary

Writing to a dynamic-key property on a plain object (obj[name] = value) scales O(N²) per row with property count on Perry, versus O(N) on V8/JSC. For objects with more than a handful of keys this makes row-shaped data — database rows, HTTP request bodies, JSON arrays-of-objects, anything resembling a spreadsheet — measurably slower than on a JS host, regardless of JIT warm-up.

The root cause is that js_object_set_field_by_name does a linear scan over the object's keys_array on every property write to find-or-append the slot, and there's no per-call-site or per-shape inline cache short-circuiting the scan. V8's hidden-class + IC design makes the same pattern O(1) amortised after the first object of each shape.

Repro

No external deps. Compile with perry compile and run the binary directly; the same script under Node gives the baseline on the same hardware. Full source attached below; headline numbers from Node v25 + Perry 0.5.29 on an Apple M-series:

# 10000 rows × 20 columns, same name sequence every row
Node v25:        8.5 ms/iter
Perry 0.5.29:   43.3 ms/iter   (5.1× slower)

The gap grows with column count:

          Node v25    Perry 0.5.29
cols=5     0.98 ms         3.00 ms
cols=10    1.29 ms        12.40 ms
cols=20    9.08 ms        40.80 ms
cols=40   13.64 ms       175.00 ms
cols=80   24.78 ms       574.80 ms

Per-cell time on Perry grows from 0.06 µs to 0.72 µs as cols go 5 → 80: a 12× per-cell increase for a 16× width increase, meaning per-cell cost grows roughly linearly with column count, so total write cost per row is O(N²). On Node it stays essentially flat — classic hidden-class IC behavior.
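As a sanity check on the arithmetic, the per-cell figures are just ms/iter divided by rows × cols (values taken from the table above):

```rust
/// Per-cell write cost in µs, from a measured ms/iter figure.
fn per_cell_us(ms_per_iter: f64, rows: u64, cols: u64) -> f64 {
    // ms → µs is ×1000; one iteration writes rows × cols cells.
    ms_per_iter * 1000.0 / (rows * cols) as f64
}

fn main() {
    // cols=5:  3.00 ms/iter over 50 000 cells
    println!("{:.2} us/cell", per_cell_us(3.00, 10_000, 5)); // prints "0.06 us/cell"
    // cols=80: 574.80 ms/iter over 800 000 cells
    println!("{:.2} us/cell", per_cell_us(574.80, 10_000, 80)); // prints "0.72 us/cell"
}
```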

Root cause (confirmed by reading the runtime)

js_object_set_field_by_name at crates/perry-runtime/src/object.rs:1921:

let key_count = crate::array::js_array_length(keys) as usize;
for i in 0..key_count {                                    // ← linear scan
    let key_val = crate::array::js_array_get(keys, i as u32);
    if key_val.is_string() {
        let stored_key = key_val.as_string_ptr();
        if crate::string::js_string_equals(key, stored_key) != 0 {
            // found — update this slot
        }
    }
}
// not found — append

For a 20-property row object the 20 writes do 0+1+2+…+19 = 190 key compares per row. Across 10 000 rows that's 1.9M compares per iteration. No IC, no shape match short-circuit, no pointer-identity fast index — every write pays the full O(N) scan.
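The sum generalizes: building one object with N fresh properties pays 0 + 1 + … + (N−1) = N(N−1)/2 key compares in the worst case, which is where the O(N²)-in-width scaling comes from. A quick check against the numbers above:

```rust
/// Worst-case key compares to build one object with n fresh properties:
/// the i-th write scans the i keys already present, so the total is the
/// triangular number n(n-1)/2.
fn compares_per_object(n: u64) -> u64 {
    n * (n - 1) / 2
}

fn main() {
    let per_row = compares_per_object(20);
    println!("{} compares per 20-column row", per_row); // prints "190 compares per 20-column row"
    println!("{} compares per 10k-row iteration", per_row * 10_000); // prints "1900000 compares per 10k-row iteration"
}
```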

The shape_cache_insert / js_object_alloc_with_shape path DOES share keys_arrays across object-literal shapes known at compile time — so const x = { a: 1, b: 2 } is fast — but the dynamic obj[name] = value pattern produced by most real code (ORMs, result builders, config merges, parsers) falls off that path and hits the linear scan.

Repro (standalone, no deps)

// hidden-class-gap.ts
function nowMs(): number {
    // eslint-disable-next-line @typescript-eslint/no-explicit-any
    const g = globalThis as any;
    return g.performance && g.performance.now ? g.performance.now() : Date.now();
}

const NAMES: string[] = [];
for (let i = 0; i < 20; i++) NAMES.push('c' + i);
const NROWS = 10000;
const NCOLS = 20;

function build(): Record<string, unknown>[] {
    const out: Record<string, unknown>[] = new Array(NROWS);
    for (let i = 0; i < NROWS; i++) {
        const o: Record<string, unknown> = {};
        for (let j = 0; j < NCOLS; j++) {
            o[NAMES[j]] = i * j;
        }
        out[i] = o;
    }
    return out;
}

for (let w = 0; w < 3; w++) build();  // warmup
const ITERS = 10;
const t0 = nowMs();
for (let w = 0; w < ITERS; w++) build();
console.log('10000 x 20 build: ' + ((nowMs() - t0) / ITERS).toFixed(2) + ' ms/iter');

Run:

node --import tsx hidden-class-gap.ts
perry compile hidden-class-gap.ts -o /tmp/hcg && /tmp/hcg

The scaling demo (sweeping column count 5 → 80) is at @perry/postgres repo → /tmp/perry-perf-probe/hidden-class-scaling.ts if you want the O(N²) data.

Why it matters

Any API that produces shape-stable rows of non-trivial width hits this:

  • @perry/postgres row-decode — bulk decode of 10k rows × 20 cols is ~800 ms on Perry vs ~35 ms on Node (~22× slower). Most of that gap is this scan.
  • JSON.parse of an array of uniform objects with more than a handful of fields.
  • HTTP/fastify body deserialization and request handler reply.send({ ... }) paths with wide payloads.
  • Anything using Object.assign({}, src, patch) where src is more than a few keys.

Users don't have a workaround either — the slow path is triggered by plain idiomatic code. Rewriting hot paths to use new Function-generated classes (postgres.js's trick) works on Node / Bun but isn't available on Perry-native.

Fix options, rough ordering by complexity

  1. Pointer-identity fast path in the scan. When key is the exact same StringHeader* as a stored_key, skip js_string_equals's validation + byte-compare. For row-shape data this triggers every time — the name strings are interned by the StringPool, so consecutive rows see pointer-equal key arguments. Estimated: 5-10% on the microbench, one-function change.
  2. Per-shape keys_array cache keyed on the sequence of name pointers at the call site. When set_field_by_name has seen the same name-pointer sequence before on some object, reuse the shape's slot map (name → index) and skip the scan entirely. Close cousin to V8's transition-tree, doable as a thread-local HashMap<(obj_shape_id, key_ptr), slot_idx>. Estimated: recover the O(N²) → O(N) scaling; closes most of the gap.
  3. Proper hidden-class ICs, emitted per call site by codegen — the full-fat V8 approach. Biggest win, biggest surgery. Probably the right long-term home; the approach above is the incremental step that buys most of it.
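A standalone sketch of option (1), using placeholder types rather than the real StringHeader / StringPool internals (Pool and keys_equal below are illustrative, not runtime APIs): because key strings are interned, pointer identity is a sufficient first check, and the byte compare only runs on an identity miss.

```rust
use std::collections::HashSet;

// Hypothetical stand-in for the runtime's StringPool: interning guarantees
// one canonical &'static str per distinct key, so pointer equality is a
// valid fast-path check for equality.
struct Pool(HashSet<&'static str>);

impl Pool {
    fn intern(&mut self, s: &str) -> &'static str {
        if let Some(&hit) = self.0.get(s) {
            return hit; // already interned: hand back the canonical pointer
        }
        let leaked: &'static str = Box::leak(s.to_owned().into_boxed_str());
        self.0.insert(leaked);
        leaked
    }
}

// Option-1 sketch: check pointer identity before falling back to the full
// byte compare, skipping validation entirely on the (common) identity hit.
fn keys_equal(a: &str, b: &str) -> bool {
    std::ptr::eq(a, b) || a == b
}

fn main() {
    let mut pool = Pool(HashSet::new());
    let a = pool.intern("c0");
    let b = pool.intern("c0");
    assert!(std::ptr::eq(a, b));                 // interning: same pointer across rows
    assert!(keys_equal(a, b));                   // fast path: identity hit
    assert!(keys_equal(a, &String::from("c0"))); // fallback: byte compare
    assert!(!keys_equal(a, "c1"));
    println!("ok");
}
```

The same shape of check is what the scan in js_object_set_field_by_name could do before calling js_string_equals; for row-shaped data the identity branch should hit on every write after the first row.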

I'd be happy to take a swing at (1) and (2) if you'd like — they're self-contained to crates/perry-runtime/src/object.rs and the existing SHAPE_CACHE infrastructure.

Environment

  • Perry 0.5.29 (latest — includes my three perf commits from earlier today: GC_FLAG_SHAPE_SHARED clone-gating, deferred descriptor-String alloc, bigint i64 fast path)
  • Node v25.8.0 / bun 1.3.12 for baseline
  • macOS 26 / Apple Silicon

Related

  • @perry/postgres bench suite documents the user-visible ratio at https://github.com/PerryTS/postgres/blob/main/bench/results/summary.md.
  • Prior fixes in the same area landed as GC_FLAG_SHAPE_SHARED (clone-gating) and the accessor-descriptor defer in v0.5.29; those brought the 10k × 20 case from 896 ms → 764 ms (-15%). The remainder is what's documented above.
