
JSON.parse + iterate + field access silently drops matches at scale #44

@proggeramlug

Repro

// Build input text manually (avoids JSON.stringify, tracked separately)
let parts: string[] = ['['];
for (let i = 0; i < 50000; i++) {
  if (i > 0) parts.push(',');
  parts.push('{"id":' + i + ',"name":"Record_' + i +
    '","email":"r' + i + '@x.com","country":"US",' +
    '"tags":["a","b","c"],"active":true,' +
    '"addr":{"street":"' + i + ' Main","city":"Springfield","zip":' + (10000 + i) + '}}');
}
parts.push(']');
const text = parts.join('');
console.log('input bytes:', text.length);

const parsed = JSON.parse(text);
console.log('parsed length:', parsed.length);

let count = 0;
for (let i = 0; i < parsed.length; i++) {
  if (parsed[i].active === true) count++;
}
console.log('expected', parsed.length, 'active, got', count);

Expected

expected 50000 active, got 50000

Actual (Perry 0.5.30)

parsed length: 49936
expected 49936 active, got 56

Two distinct problems surface here:

  1. parsed.length is 49936 instead of 50000: parse itself silently drops ~0.1% of the records.
  2. Of the 49936 records that parse did return, only 56 read back .active === true: roughly 99.9% of the field reads return the wrong value.
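To tell the two failure modes apart on a given run, the counting loop can be replaced by a per-record check: ids that never appear point at parse-level drops, while records that are present but read back a wrong field point at the field-read problem. A minimal TypeScript sketch; `buildInput`, `diagnose`, and the `Rec`/`Diagnosis` types are hypothetical names introduced here, and the expected field values assume the exact record shape from the repro.

```typescript
// Subset of the repro's record shape whose expected values are known.
interface Rec { id: number; active: boolean; addr: { zip: number } }

interface Diagnosis {
  returned: number;     // parsed.length
  missingIds: number[]; // ids in [0, n) never seen: parse-level drops
  badIndices: number[]; // array indices whose fields read back wrong
}

// Same input as the repro, built by hand (no JSON.stringify).
function buildInput(n: number): string {
  const parts: string[] = ["["];
  for (let i = 0; i < n; i++) {
    if (i > 0) parts.push(",");
    parts.push('{"id":' + i + ',"name":"Record_' + i +
      '","email":"r' + i + '@x.com","country":"US",' +
      '"tags":["a","b","c"],"active":true,' +
      '"addr":{"street":"' + i + ' Main","city":"Springfield","zip":' + (10000 + i) + '}}');
  }
  parts.push("]");
  return parts.join("");
}

function diagnose(parsed: Rec[], n: number): Diagnosis {
  const seen = new Set<number>();
  const badIndices: number[] = [];
  for (let i = 0; i < parsed.length; i++) {
    const rec = parsed[i];
    const idOk = typeof rec?.id === "number" && rec.id >= 0 && rec.id < n;
    if (idOk) seen.add(rec.id);
    // Note: a record whose id itself reads wrong is also reported as a
    // missing id, so the two lists can overlap in that case.
    if (!idOk || rec.active !== true || rec.addr?.zip !== 10000 + rec.id) {
      badIndices.push(i);
    }
  }
  const missingIds: number[] = [];
  for (let id = 0; id < n; id++) if (!seen.has(id)) missingIds.push(id);
  return { returned: parsed.length, missingIds, badIndices };
}

const n = 50000;
const d = diagnose(JSON.parse(buildInput(n)), n);
console.log("returned", d.returned, "missing ids", d.missingIds.length,
  "bad records", d.badIndices.length);
```

On a conforming engine both lists are empty; on an affected build, the first few entries of each list show whether the losses cluster at particular indices.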

Both counts vary slightly from run to run, which points to a GC-sweep timing issue rather than a parser-state bug: a deterministic parser bug would drop the same records on every run.
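The variation can also be checked inside a single process by parsing the same text several times and comparing the outcomes. JSON.parse is deterministic, so a conforming engine produces exactly one distinct outcome; several distinct outcomes would support the timing hypothesis. A sketch; `buildRichInput` and `runTrials` are hypothetical helpers, and the record shape matches the repro.

```typescript
// Same rich record shape as the repro, built without JSON.stringify.
function buildRichInput(n: number): string {
  return "[" + Array.from({ length: n }, (_, i) =>
    `{"id":${i},"name":"Record_${i}","email":"r${i}@x.com","country":"US",` +
    `"tags":["a","b","c"],"active":true,` +
    `"addr":{"street":"${i} Main","city":"Springfield","zip":${10000 + i}}}`
  ).join(",") + "]";
}

// Parse the same text `trials` times; record "length/activeCount" per trial.
function runTrials(text: string, trials: number): string[] {
  const outcomes: string[] = [];
  for (let t = 0; t < trials; t++) {
    const parsed = JSON.parse(text) as Array<{ active?: unknown }>;
    let active = 0;
    for (let i = 0; i < parsed.length; i++) {
      if (parsed[i].active === true) active++;
    }
    outcomes.push(`${parsed.length}/${active}`);
  }
  return outcomes;
}

const outcomes = runTrials(buildRichInput(50000), 5);
const distinct = new Set(outcomes);
console.log(outcomes.join(" "), distinct.size === 1 ? "(stable)" : "(varies between trials)");
```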

Observations

  • Below ~2000 records of the rich shape, results are correct (every record is counted as active).
  • Simple {id, active} objects work correctly even at 500k records, so the problem scales with object complexity, not just record count. That suggests the sweep is freeing the parser's string/object allocations before the user's iteration reaches them.
  • No panic and no stderr output: the corrupted .active reads silently return false/undefined.
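The thresholds above could be mapped more systematically with a small sweep over record shape and record count. A sketch; `buildInput`, `sweep`, and the `Row` fields are hypothetical, and the size list is an arbitrary choice rather than the sizes the report tested.

```typescript
type Shape = "simple" | "rich";

// "simple" is {id, active}; "rich" is the repro's full record shape.
function buildInput(shape: Shape, n: number): string {
  const rec = (i: number): string =>
    shape === "simple"
      ? `{"id":${i},"active":true}`
      : `{"id":${i},"name":"Record_${i}","email":"r${i}@x.com","country":"US",` +
        `"tags":["a","b","c"],"active":true,` +
        `"addr":{"street":"${i} Main","city":"Springfield","zip":${10000 + i}}}`;
  return "[" + Array.from({ length: n }, (_, i) => rec(i)).join(",") + "]";
}

interface Row { shape: Shape; n: number; length: number; active: number; ok: boolean }

// One row per (shape, size): ok means no records dropped and no bad reads.
function sweep(shapes: Shape[], sizes: number[]): Row[] {
  const rows: Row[] = [];
  for (const shape of shapes) {
    for (const n of sizes) {
      const parsed = JSON.parse(buildInput(shape, n)) as Array<{ active?: unknown }>;
      let active = 0;
      for (let i = 0; i < parsed.length; i++) if (parsed[i].active === true) active++;
      rows.push({ shape, n, length: parsed.length, active, ok: parsed.length === n && active === n });
    }
  }
  return rows;
}

const rows = sweep(["simple", "rich"], [1000, 2000, 10000, 50000]);
console.table(rows);
```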

Workaround (partial)

Access fields through explicit temporary variables, or extract the needed fields into fresh objects early. At scales below ~1000 records, building fresh object literals from the parsed data (rather than mutating the parsed records) is reliable.
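A sketch of the partial workaround, assuming the caller only needs a few fields: read each field once through a temporary and copy it into a fresh object literal right after parsing. `extractEarly` and `SlimRecord` are hypothetical names; per the report, this is only known to be reliable below roughly 1000 records.

```typescript
// Slim copy holding only the fields the caller needs.
interface SlimRecord { id: number; active: boolean; zip: number }

// Copy fields out of the parsed records immediately, through explicit
// temporaries, into fresh object literals (the parsed records are not mutated).
function extractEarly(text: string): SlimRecord[] {
  const parsed = JSON.parse(text) as Array<{ id: number; active: boolean; addr: { zip: number } }>;
  const out: SlimRecord[] = [];
  for (let i = 0; i < parsed.length; i++) {
    const rec = parsed[i];               // explicit temporary
    const id = rec.id;
    const active = rec.active === true;
    const zip = rec.addr.zip;
    out.push({ id, active, zip });       // fresh object literal
  }
  return out;
}

// Usage: later code works only with the slim copies.
const sample = '[{"id":0,"active":true,"addr":{"zip":10000}},' +
  '{"id":1,"active":false,"addr":{"zip":10001}}]';
const slim = extractEarly(sample);
console.log(slim.filter((r) => r.active).length); // 1
```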

Likely the same underlying root cause as #42 (large Buffer param) and #43 (JSON.stringify panic): parser-allocated objects apparently aren't reachable from the GC's conservative stack scan when they are held only in a local array of pointers.
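A toy model of how the suspected failure could look, not Perry's actual collector: a mark phase that reaches the array itself but treats its element storage as opaque (never tracing the pointers inside it) frees every record even though the program can still index them. `ToyHeap`, `simulate`, and the "scanned"/"opaque" distinction are assumptions made purely for illustration. The model only shows the all-or-nothing case; the partial losses in the report would need a scan that misses those pointers only some of the time.

```typescript
type Addr = number;
// "scanned": the collector traces the pointers stored in slots.
// "opaque": the collector marks the object but ignores its slots.
type Kind = "scanned" | "opaque";
interface HeapObj { kind: Kind; slots: Addr[] }

class ToyHeap {
  private readonly objs = new Map<Addr, HeapObj>();
  private nextAddr: Addr = 1;

  alloc(kind: Kind, slots: Addr[] = []): Addr {
    const addr = this.nextAddr++;
    this.objs.set(addr, { kind, slots });
    return addr;
  }

  isLive(addr: Addr): boolean { return this.objs.has(addr); }

  // Mark from roots, then sweep everything unmarked; returns the freed count.
  collect(roots: Addr[]): number {
    const marked = new Set<Addr>();
    const work: Addr[] = [...roots];
    while (work.length > 0) {
      const addr = work.pop()!;
      const obj = this.objs.get(addr);
      if (obj === undefined || marked.has(addr)) continue;
      marked.add(addr);
      if (obj.kind === "scanned") work.push(...obj.slots);
    }
    let freed = 0;
    for (const addr of Array.from(this.objs.keys())) {
      if (!marked.has(addr)) { this.objs.delete(addr); freed++; }
    }
    return freed;
  }
}

// Root set = the array header only (what a stack scan would find).
// Returns how many records survive one collection.
function simulate(elementBufferKind: Kind, n: number): number {
  const heap = new ToyHeap();
  const records: Addr[] = [];
  for (let i = 0; i < n; i++) records.push(heap.alloc("scanned"));
  const buffer = heap.alloc(elementBufferKind, records); // element storage
  const array = heap.alloc("scanned", [buffer]);         // array header
  heap.collect([array]);
  return records.filter((r) => heap.isLive(r)).length;
}

console.log(simulate("scanned", 1000)); // 1000
console.log(simulate("opaque", 1000));  // 0
```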
