perf: plain-object property writes scale O(N²) with column count — no hidden-class IC #37

@proggeramlug

Description

Summary

Writing to a dynamic-key property on a plain object (obj[name] = value) scales O(N²) per row with property count on Perry, versus O(N) on V8/JSC. For objects with more than a handful of keys this makes row-shaped data — database rows, HTTP request bodies, JSON arrays-of-objects, anything resembling a spreadsheet — measurably slower than on a JS host, regardless of JIT warm-up.

The root cause is that js_object_set_field_by_name does a linear scan over the object's keys_array on every property write to find-or-append the slot, and there's no per-call-site or per-shape inline cache short-circuiting the scan. V8's hidden-class + IC design makes the same pattern O(1) amortised after the first object of each shape.

Repro

No external deps. Compile with perry compile and run the binary directly; the same script under Node gives the baseline on the same hardware. Full source attached below; headline numbers from Node v25 + Perry 0.5.29 on an Apple M-series:

# 10000 rows × 20 columns, same name sequence every row
Node v25:        8.5 ms/iter
Perry 0.5.29:   43.3 ms/iter   (5.1× slower)

The gap grows with column count:

          Node v25    Perry 0.5.29
cols=5     0.98 ms         3.00 ms
cols=10    1.29 ms        12.40 ms
cols=20    9.08 ms        40.80 ms
cols=40   13.64 ms       175.00 ms
cols=80   24.78 ms       574.80 ms

Per-cell time on Perry grows from 0.06 µs to 0.72 µs as cols go 5 → 80: a 12× per-cell increase for a 16× width increase, meaning per-cell cost grows roughly linearly with column count, so total write cost per row is O(N²). On Node it stays essentially flat — classic hidden-class IC behavior.
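As a sanity check on the arithmetic, the per-cell figures are just ms/iter divided by rows × cols (values taken from the table above):

```rust
/// Per-cell write cost in µs, from a measured ms/iter figure.
fn per_cell_us(ms_per_iter: f64, rows: u64, cols: u64) -> f64 {
    // ms → µs is ×1000; one iteration writes rows × cols cells.
    ms_per_iter * 1000.0 / (rows * cols) as f64
}

fn main() {
    // cols=5:  3.00 ms/iter over 50 000 cells
    println!("{:.2} us/cell", per_cell_us(3.00, 10_000, 5)); // prints "0.06 us/cell"
    // cols=80: 574.80 ms/iter over 800 000 cells
    println!("{:.2} us/cell", per_cell_us(574.80, 10_000, 80)); // prints "0.72 us/cell"
}
```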

Root cause (confirmed by reading the runtime)

js_object_set_field_by_name at crates/perry-runtime/src/object.rs:1921:

let key_count = crate::array::js_array_length(keys) as usize;
for i in 0..key_count {                                    // ← linear scan
    let key_val = crate::array::js_array_get(keys, i as u32);
    if key_val.is_string() {
        let stored_key = key_val.as_string_ptr();
        if crate::string::js_string_equals(key, stored_key) != 0 {
            // found — update this slot
        }
    }
}
// not found — append

For a 20-property row object the 20 writes do 0+1+2+…+19 = 190 key compares per row. Across 10 000 rows that's 1.9M compares per iteration. No IC, no shape match short-circuit, no pointer-identity fast index — every write pays the full O(N) scan.
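The sum generalizes: building one object with N fresh properties pays 0 + 1 + … + (N−1) = N(N−1)/2 key compares in the worst case, which is where the O(N²)-in-width scaling comes from. A quick check against the numbers above:

```rust
/// Worst-case key compares to build one object with n fresh properties:
/// the i-th write scans the i keys already present, so the total is the
/// triangular number n(n-1)/2.
fn compares_per_object(n: u64) -> u64 {
    n * (n - 1) / 2
}

fn main() {
    let per_row = compares_per_object(20);
    println!("{} compares per 20-column row", per_row); // prints "190 compares per 20-column row"
    println!("{} compares per 10k-row iteration", per_row * 10_000); // prints "1900000 compares per 10k-row iteration"
}
```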

The shape_cache_insert / js_object_alloc_with_shape path DOES share keys_arrays across object-literal shapes known at compile time — so const x = { a: 1, b: 2 } is fast — but the dynamic obj[name] = value pattern produced by most real code (ORMs, result builders, config merges, parsers) falls off that path and hits the linear scan.

Repro (standalone, no deps)

// hidden-class-gap.ts
function nowMs(): number {
    // eslint-disable-next-line @typescript-eslint/no-explicit-any
    const g = globalThis as any;
    return g.performance && g.performance.now ? g.performance.now() : Date.now();
}

const NAMES: string[] = [];
for (let i = 0; i < 20; i++) NAMES.push('c' + i);
const NROWS = 10000;
const NCOLS = 20;

function build(): Record<string, unknown>[] {
    const out: Record<string, unknown>[] = new Array(NROWS);
    for (let i = 0; i < NROWS; i++) {
        const o: Record<string, unknown> = {};
        for (let j = 0; j < NCOLS; j++) {
            o[NAMES[j]] = i * j;
        }
        out[i] = o;
    }
    return out;
}

for (let w = 0; w < 3; w++) build();  // warmup
const ITERS = 10;
const t0 = nowMs();
for (let w = 0; w < ITERS; w++) build();
console.log('10000 x 20 build: ' + ((nowMs() - t0) / ITERS).toFixed(2) + ' ms/iter');

Run:

node --import tsx hidden-class-gap.ts
perry compile hidden-class-gap.ts -o /tmp/hcg && /tmp/hcg

The scaling demo (sweeping column count 5 → 80) is at @perry/postgres repo → /tmp/perry-perf-probe/hidden-class-scaling.ts if you want the O(N²) data.

Why it matters

Any API that produces shape-stable rows of non-trivial width hits this:

  • @perry/postgres row-decode — bulk decode of 10k rows × 20 cols is ~800 ms on Perry vs ~35 ms on Node (~22× slower). Most of that gap is this scan.
  • JSON.parse of an array of uniform objects with more than a handful of fields.
  • HTTP/fastify body deserialization and request handler reply.send({ ... }) paths with wide payloads.
  • Anything using Object.assign({}, src, patch) where src is more than a few keys.

Users don't have a workaround either — the slow path is triggered by plain idiomatic code. Rewriting hot paths to use new Function-generated classes (postgres.js's trick) works on Node / Bun but isn't available on Perry-native.

Fix options, rough ordering by complexity

  1. Pointer-identity fast path in the scan. When key is the exact same StringHeader* as a stored_key, skip js_string_equals's validation + byte-compare. For row-shape data this triggers every time — the name strings are interned by the StringPool, so consecutive rows see pointer-equal key arguments. Estimated: 5-10% on the microbench, one-function change.
  2. Per-shape keys_array cache keyed on the sequence of name pointers at the call site. When set_field_by_name has seen the same name-pointer sequence before on some object, reuse the shape's slot map (name → index) and skip the scan entirely. Close cousin to V8's transition-tree, doable as a thread-local HashMap<(obj_shape_id, key_ptr), slot_idx>. Estimated: recover the O(N²) → O(N) scaling; closes most of the gap.
  3. Proper hidden-class ICs, emitted per call site by codegen — the full-fat V8 approach. Biggest win, biggest surgery. Probably the right long-term home; the approach above is the incremental step that buys most of it.
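A standalone sketch of option (1), using placeholder types rather than the real StringHeader / StringPool internals (Pool and keys_equal below are illustrative, not runtime APIs): because key strings are interned, pointer identity is a sufficient first check, and the byte compare only runs on an identity miss.

```rust
use std::collections::HashSet;

// Hypothetical stand-in for the runtime's StringPool: interning guarantees
// one canonical &'static str per distinct key, so pointer equality is a
// valid fast-path check for equality.
struct Pool(HashSet<&'static str>);

impl Pool {
    fn intern(&mut self, s: &str) -> &'static str {
        if let Some(&hit) = self.0.get(s) {
            return hit; // already interned: hand back the canonical pointer
        }
        let leaked: &'static str = Box::leak(s.to_owned().into_boxed_str());
        self.0.insert(leaked);
        leaked
    }
}

// Option-1 sketch: check pointer identity before falling back to the full
// byte compare, skipping validation entirely on the (common) identity hit.
fn keys_equal(a: &str, b: &str) -> bool {
    std::ptr::eq(a, b) || a == b
}

fn main() {
    let mut pool = Pool(HashSet::new());
    let a = pool.intern("c0");
    let b = pool.intern("c0");
    assert!(std::ptr::eq(a, b));                 // interning: same pointer across rows
    assert!(keys_equal(a, b));                   // fast path: identity hit
    assert!(keys_equal(a, &String::from("c0"))); // fallback: byte compare
    assert!(!keys_equal(a, "c1"));
    println!("ok");
}
```

The same shape of check is what the scan in js_object_set_field_by_name could do before calling js_string_equals; for row-shaped data the identity branch should hit on every write after the first row.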

I'd be happy to take a swing at (1) and (2) if you'd like — they're self-contained to crates/perry-runtime/src/object.rs and the existing SHAPE_CACHE infrastructure.

Environment

  • Perry 0.5.29 (latest — includes my three perf commits from earlier today: GC_FLAG_SHAPE_SHARED clone-gating, deferred descriptor-String alloc, bigint i64 fast path)
  • Node v25.8.0 / bun 1.3.12 for baseline
  • macOS 26 / Apple Silicon

Related

  • @perry/postgres bench suite documents the user-visible ratio at https://github.com/PerryTS/postgres/blob/main/bench/results/summary.md.
  • Prior fixes in the same area landed as GC_FLAG_SHAPE_SHARED (clone-gating) and the accessor-descriptor defer in v0.5.29; those brought the 10k × 20 case from 896 ms → 764 ms (-15%). The remainder is what's documented above.
