Driver bench flakiness on 0.5.82 — SIGSEGV on iter 51+ or sample array corruption (n=256 instead of 50) #73

@proggeramlug

Summary

Running @perry/postgres's bench/bench-this.ts on Perry 0.5.82 has three possible outcomes, and which one occurs appears to be random from run to run:

  1. Clean completion — all 4 workloads run, numbers are great (1k×20: 10 ms vs 42 ms on 0.5.29; 10k×20: 95 ms vs 764 ms on 0.5.29). Happens roughly 1 in 4 runs.

  2. SIGSEGV after tiny workload — crashes on the first iteration of param-1row (exit 139). Happens roughly 2 in 4 runs.

  3. Sample-array corruption in large-10k-x-20 — workload completes but samples.length reads as 256 (should be 50), all samples are garbage:

    large-10k-x-20  n=256  min -7229134230240085510…µs  p50 0µs  p95 NaNms  mean NaNms
    

    Happens roughly 1 in 4 runs.

The #72 fix unblocked correctness (queries return rows), and the IC work is clearly paying off on the good runs, so this looks like a separate residual issue. It is probably GC-related, since both failure points are places where an array is filled in a loop and consumed later.

Repro

cd @perry/postgres
perry compile bench/bench-this.ts -o /tmp/bench-this
PGHOST=… /tmp/bench-this   # run 3-4 times; you'll see all three modes
# Good run (1 in 4):
@perry/postgres perry  tiny           n=50   p50 2.00ms  …
@perry/postgres perry  param-1row     n=50   p50 2.50ms  …
@perry/postgres perry  medium-1k-x-20 n=50   p50 10.0ms  …
@perry/postgres perry  large-10k-x-20 n=50   p50 95.0ms  …

# SIGSEGV run (2 in 4):
@perry/postgres perry  tiny           n=50   p50 2.00ms  …
[ exit 139 ]

# Corrupted-sample run (1 in 4):
@perry/postgres perry  tiny           n=50   …
@perry/postgres perry  param-1row     n=50   …
@perry/postgres perry  medium-1k-x-20 n=50   p50 10.0ms  …
@perry/postgres perry  large-10k-x-20 n=256  min -7229134230…µs  p50 0µs  p95 NaNms  mean NaNms

Likely cause

Both failure signatures (SIGSEGV and samples.length reading as 256) involve the per-workload samples array, which is typed number[] and allocated with new Array(iters):

const samples: number[] = new Array(iters);
for (let i = 0; i < iters; i++) {
    const t0 = now();
    const r = await conn.query(wl.sql, wl.params);
    const t1 = now();
    samples[i] = t1 - t0;
}
printRow(label, wl.name, computeStats(samples));

The pattern "allocate fixed-length array → fill in loop → pass to computeStats" has the same shape as the #72 row-accumulator pattern that 0.5.82 fixed. This looks like the same bug class still reaching another call site: probably the samples array gets scalar-replaced or freed mid-loop, and whatever replaces it in memory has a different length and garbage contents.
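
To catch the corruption at the point where it happens rather than in the printed stats, a check along these lines could be called right after the fill loop and again just before computeStats. This is only a sketch: checkSamples is a hypothetical helper, not something that exists in bench-this.ts, and it assumes that valid samples are finite, non-negative timing deltas.

```typescript
// Hypothetical diagnostic helper (not part of bench-this.ts).
// Returns a list of problems matching the two corruption signatures
// described above: wrong length and garbage element values.
function checkSamples(samples: number[], iters: number): string[] {
    const problems: string[] = [];
    if (samples.length !== iters) {
        problems.push(`length ${samples.length}, expected ${iters}`);
    }
    for (let i = 0; i < samples.length; i++) {
        const v = samples[i];
        // Assumption: a timing delta is always a finite, non-negative number.
        if (typeof v !== "number" || !Number.isFinite(v) || v < 0) {
            problems.push(`samples[${i}] = ${String(v)}`);
            if (problems.length >= 5) break; // keep the report short
        }
    }
    return problems;
}
```

If the check passes right after the loop but fails just before computeStats, the array was damaged between the two points; if it already fails inside or right after the loop, the damage happens while the array is being filled.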

SIGSEGV flavor: now() probes globalThis.performance (guard added in #70). After many iterations the probe path may be hitting something that's been freed.
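
One way to test this hypothesis, assuming the bench's now() re-probes globalThis.performance on every call (its source is not shown in this issue), is to resolve the clock once at startup and reuse it. The sketch below is illustrative only; Clock and resolveClock are hypothetical names. If the SIGSEGV disappears with a clock resolved once, that would point at the per-call probe path.

```typescript
// Hypothetical replacement for the bench's now(); the real one is not shown here.
type Clock = { now(): number };

// Probe globalThis.performance exactly once and fall back to Date.now().
function resolveClock(): Clock {
    const perf = (globalThis as unknown as { performance?: Clock }).performance;
    if (perf !== undefined && typeof perf.now === "function") {
        return { now: () => perf.now() };
    }
    return { now: () => Date.now() };
}

const clock = resolveClock();

// Same call shape as the bench's now(), but with no probe on the hot path.
function now(): number {
    return clock.now();
}
```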

Narrowing

The minimal repro that does crash has to include all of the following: ≥55 SELECT 1 queries in a loop, a samples array filled with now() deltas, and then another workload (param-1row) filling a fresh samples array. Isolated single-workload scripts don't crash (I ran 55 tiny + 55 param and 55 tiny + 10 param separately, and both were fine). The bench's WORKLOADS-array loop plus the stats call appears to be necessary.
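
To check whether the Postgres connection is needed at all, the same shape can be sketched without a database. Everything here is a stand-in: fakeQuery replaces conn.query, the WORKLOADS entries and the param-1row SQL are placeholders, and computeStats is a guess at the shape of the bench's stats function, not its actual code.

```typescript
// Database-free skeleton of the bench loop. All names and SQL are hypothetical stand-ins.
interface Workload { name: string; sql: string; params: unknown[] }
interface Stats { n: number; min: number; p50: number; p95: number; mean: number }

const WORKLOADS: Workload[] = [
    { name: "tiny", sql: "SELECT 1", params: [] },
    { name: "param-1row", sql: "SELECT $1::int", params: [1] }, // placeholder SQL
];

// Stub for conn.query: resolves asynchronously without touching a database.
function fakeQuery(sql: string, params: unknown[]): Promise<unknown[]> {
    return Promise.resolve().then(() => [{ sql, params }]);
}

function computeStats(samples: number[]): Stats {
    const sorted = [...samples].sort((a, b) => a - b);
    const n = sorted.length;
    // Nearest-rank style percentile, clamped to the last element.
    const at = (q: number): number => sorted[Math.min(n - 1, Math.floor(q * n))];
    const mean = sorted.reduce((s, v) => s + v, 0) / n;
    return { n, min: sorted[0], p50: at(0.5), p95: at(0.95), mean };
}

// Same shape as the bench: per-workload fixed-length array, filled in an
// awaiting loop, then handed to the stats call.
async function runAll(iters: number): Promise<Stats[]> {
    const results: Stats[] = [];
    for (const wl of WORKLOADS) {
        const samples: number[] = new Array(iters);
        for (let i = 0; i < iters; i++) {
            const t0 = Date.now();
            await fakeQuery(wl.sql, wl.params);
            const t1 = Date.now();
            samples[i] = t1 - t0;
        }
        results.push(computeStats(samples));
    }
    return results;
}
```

Running runAll(55) under Perry and checking that every result reports n equal to 55 would show whether the WORKLOADS loop and stats call are enough to trigger the problem, or whether the driver's query path also has to be involved.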

Why it matters

Blocks stable reporting of the new perf numbers. When it works, Perry 0.5.82 delivers the massive improvements from the IC work (#68, #70, #71, #72 all landing). When it doesn't, the @perry/postgres bench matrix can't include Perry columns reliably.

Environment

  • Perry 0.5.82
  • macOS 26 / Apple Silicon
  • Repro: perry compile bench/bench-this.ts in @perry/postgres repo
