Skip to content

feat(lazy): record scalar patches to skip materialization on small edits #48

@membphis

Description

@membphis

Background

PR #44 (feat(lazy): structural patching for decode+modify+encode) implements lazy patching to avoid materializing the root container on every __newindex call. The performance idea is sound, but the PR as shipped bundles five interacting changes:

  1. New FFI symbol qjson_cursor_field_bytes (additive)
  2. Scalar replacement patches on existing keys (the actual perf win)
  3. Sidecar container cache (rawset_child_cache) — required only for perf(scan): memchr-based fast path for in-string content #5
  4. Deletion via t.x = nil (no longer materializes)
  5. New-field insertion via t.new_key = ... (no longer materializes)

Comparative bench (before vs PR #44 HEAD) showed:

  • Mid-to-large payloads: +5–95% on the qjson.decode + qjson.encode (unmodified) path
  • github-100k qjson.parse + access fields: −37% regression with significantly higher variance — caused by the three new rawget checks in LazyObject.__index on the read-only hot path
  • Small payloads: minor regressions in the same direction

The regression is real even when no patches are recorded, because __index now scans _patches, checks _deleted, and consults _child_cache before the FFI lookup, regardless of whether the document was ever mutated.

Beyond perf, #44 also enlarges the surface:

  • next(view) silently returns nil for user-visible keys (container children live in _child_cache, not on self)
  • #view returns the stale FFI-reported count after new-field patches
  • encode_proxy decision tree fans out to 5 paths with implicit preconditions
  • lua/qjson/table.lua: 556 → 821 lines (+47%)
  • docs/lazy-patch-spec.md itself notes that the implementation has diverged from the original design

Proposed scope (this issue)

Land only the smallest slice that captures most of the perf benefit with negligible risk:

  • Add FFI qjson_cursor_field_bytes (additive, panic-barrier preserved)
  • Introduce a single new field _patches on LazyObject
  • LazyObject.__newindex records a patch only when all of the following hold:
    • type(v) is "string" | "number" | "boolean" or v == qjson.null (scalar)
    • The key exists in the original buffer (verified via qjson_cursor_field_bytes)
    • Otherwise: fall through to the existing materialization path unchanged
  • LazyObject.__index checks _patches (linear scan; P is bounded by typical workloads ≤ 5) before the FFI cursor lookup
  • LazyObject.__pairs substitutes p.lua_value for patched keys, preserving original key order
  • encode_proxy adds one new branch: if _patches is non-empty and no dirty children, use a splice encoder over sorted byte spans

Out of scope (future, separate issues)

  • Deletion support (t.x = nil) — future PR-2
  • New-field insertion — future PR-3 (may not ship at all if usage data doesn't justify the complexity)
  • Switching container cache from rawset to sidecar — required only by PR-3
  • is_dirty / has_dirty_children / encode_lazy_object_walking_with_patches — required only by PR-2/PR-3

Behavior preservation guarantees

  • next(view) semantics unchanged — containers are still raw-set on self
  • pairs(view) / qjson.pairs(view) semantics unchanged for unpatched views; patched views see substituted values in original key order
  • #view semantics unchanged
  • Any unsupported mutation pattern (delete, new field, or nested-table value on a previously absent key) falls through to the existing materialize_object_contents path, preserving today's behavior verbatim

Expected benefit

Based on the b526d58 vs PR #44 HEAD comparison, the splice path delivers the bulk of the gain on the qjson.decode + qjson.encode (unmodified) benchmark:

Payload Before After (full #44) PR-1 target
100 KB 167k ops/s 177k ops/s ≈ 175k
500 KB 22k 41k ≈ 40k
1 MB 17k 19k ≈ 19k

PR-1 should reach within ~5% of PR #44's measured numbers on the encode path, while eliminating the −37% github-100k parse+access regression because __index adds only one rawget("_patches") short-circuit (nil → skip patch logic) on the unmutated read path.

Acceptance criteria

  • Bench: github-100k and small qjson.parse + access fields show no regression vs main (within ±3% median over 5 runs)
  • Bench: qjson.decode + qjson.encode (unmodified) and a new "decode + modify N scalars + encode" scenario both improve vs main on payloads ≥ 100 KB
  • Test: existing tests/lua/lazy_table_spec.lua and tests/lua/cjson_compat_spec.lua pass unmodified
  • Test: new tests/lua/lazy_patch_spec.lua covers only scalar-replacement cases (no delete/new-field cases in this PR)
  • Lua diff: ≤ 100 net new lines in lua/qjson/table.lua
  • No change to next(view) / #view behavior

Context

This issue is the result of decomposing PR #44 after a code review pointed out that the bundled changes carried disproportionate risk for the marginal perf gain on rare mutation patterns. The discussion and bench numbers backing this decomposition are in PR #44's review thread.

Metadata

Metadata

Assignees

No one assigned

    Labels

    enhancementNew feature or request

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions