You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
PR #44 (feat(lazy): structural patching for decode+modify+encode) implements lazy patching to avoid materializing the root container on every __newindex call. The performance idea is sound, but the PR as shipped bundles five interacting changes:
New FFI symbol qjson_cursor_field_bytes (additive)
Scalar replacement patches on existing keys (the actual perf win)
Mid-to-large payloads: +5–95% on the qjson.decode + qjson.encode (unmodified) path
github-100kqjson.parse + access fields: −37% regression with significantly higher variance — caused by the three new rawget checks in LazyObject.__index on the read-only hot path
Small payloads: minor regressions in the same direction
The regression is real even when no patches are recorded, because __index now scans _patches, checks _deleted, and consults _child_cache before the FFI lookup, regardless of whether the document was ever mutated.
Introduce a single new field _patches on LazyObject
LazyObject.__newindex records a patch only when all of the following hold:
type(v) is "string" | "number" | "boolean" or v == qjson.null (scalar)
The key exists in the original buffer (verified via qjson_cursor_field_bytes)
Otherwise: fall through to the existing materialization path unchanged
LazyObject.__index checks _patches (linear scan; P is bounded by typical workloads ≤ 5) before the FFI cursor lookup
LazyObject.__pairs substitutes p.lua_value for patched keys, preserving original key order
encode_proxy adds one new branch: if _patches is non-empty and no dirty children, use a splice encoder over sorted byte spans
Out of scope (future, separate issues)
Deletion support (t.x = nil) — future PR-2
New-field insertion — future PR-3 (may not ship at all if usage data doesn't justify the complexity)
Switching container cache from rawset to sidecar — required only by PR-3
is_dirty / has_dirty_children / encode_lazy_object_walking_with_patches — required only by PR-2/PR-3
Behavior preservation guarantees
next(view) semantics unchanged — containers are still raw-set on self
pairs(view) / qjson.pairs(view) semantics unchanged for unpatched views; patched views see substituted values in original key order
#view semantics unchanged
Any unsupported mutation pattern (delete, new field, or nested-table value on a previously absent key) falls through to the existing materialize_object_contents path, preserving today's behavior verbatim
Expected benefit
Based on the b526d58 vs PR #44 HEAD comparison, the splice path delivers the bulk of the gain on the qjson.decode + qjson.encode (unmodified) benchmark:
PR-1 should reach within ~5% of PR #44's measured numbers on the encode path, while eliminating the −37% github-100k parse+access regression because __index adds only one rawget("_patches") short-circuit (nil → skip patch logic) on the unmutated read path.
Acceptance criteria
Bench: github-100k and smallqjson.parse + access fields show no regression vs main (within ±3% median over 5 runs)
Bench: qjson.decode + qjson.encode (unmodified) and a new "decode + modify N scalars + encode" scenario both improve vs main on payloads ≥ 100 KB
Test: existing tests/lua/lazy_table_spec.lua and tests/lua/cjson_compat_spec.lua pass unmodified
Test: new tests/lua/lazy_patch_spec.lua covers only scalar-replacement cases (no delete/new-field cases in this PR)
Lua diff: ≤ 100 net new lines in lua/qjson/table.lua
No change to next(view) / #view behavior
Context
This issue is the result of decomposing PR #44 after a code review pointed out that the bundled changes carried disproportionate risk for the marginal perf gain on rare mutation patterns. The discussion and bench numbers backing this decomposition are in PR #44's review thread.
Background
PR #44 (
feat(lazy): structural patching for decode+modify+encode) implements lazy patching to avoid materializing the root container on every__newindexcall. The performance idea is sound, but the PR as shipped bundles five interacting changes:qjson_cursor_field_bytes(additive)rawset→_child_cache) — required only for perf(scan): memchr-based fast path for in-string content #5t.x = nil(no longer materializes)t.new_key = ...(no longer materializes)Comparative bench (before vs PR #44 HEAD) showed:
qjson.decode + qjson.encode (unmodified)pathgithub-100kqjson.parse + access fields: −37% regression with significantly higher variance — caused by the three newrawgetchecks inLazyObject.__indexon the read-only hot pathThe regression is real even when no patches are recorded, because
__indexnow scans_patches, checks_deleted, and consults_child_cachebefore the FFI lookup, regardless of whether the document was ever mutated.Beyond perf, #44 also enlarges the surface:
next(view)silently returns nil for user-visible keys (container children live in_child_cache, not onself)#viewreturns the stale FFI-reported count after new-field patchesencode_proxydecision tree fans out to 5 paths with implicit preconditionslua/qjson/table.lua: 556 → 821 lines (+47%)docs/lazy-patch-spec.mditself notes that the implementation has diverged from the original designProposed scope (this issue)
Land only the smallest slice that captures most of the perf benefit with negligible risk:
qjson_cursor_field_bytes(additive, panic-barrier preserved)_patchesonLazyObjectLazyObject.__newindexrecords a patch only when all of the following hold:type(v)is"string" | "number" | "boolean"orv == qjson.null(scalar)qjson_cursor_field_bytes)LazyObject.__indexchecks_patches(linear scan; P is bounded by typical workloads ≤ 5) before the FFI cursor lookupLazyObject.__pairssubstitutesp.lua_valuefor patched keys, preserving original key orderencode_proxyadds one new branch: if_patchesis non-empty and no dirty children, use a splice encoder over sorted byte spansOut of scope (future, separate issues)
t.x = nil) — future PR-2rawsetto sidecar — required only by PR-3is_dirty/has_dirty_children/encode_lazy_object_walking_with_patches— required only by PR-2/PR-3Behavior preservation guarantees
next(view)semantics unchanged — containers are still raw-set onselfpairs(view)/qjson.pairs(view)semantics unchanged for unpatched views; patched views see substituted values in original key order#viewsemantics unchangedmaterialize_object_contentspath, preserving today's behavior verbatimExpected benefit
Based on the
b526d58vs PR #44 HEAD comparison, the splice path delivers the bulk of the gain on theqjson.decode + qjson.encode (unmodified)benchmark:PR-1 should reach within ~5% of PR #44's measured numbers on the encode path, while eliminating the −37%
github-100k parse+accessregression because__indexadds only onerawget("_patches")short-circuit (nil → skip patch logic) on the unmutated read path.Acceptance criteria
github-100kandsmallqjson.parse + access fieldsshow no regression vsmain(within ±3% median over 5 runs)qjson.decode + qjson.encode (unmodified)and a new "decode + modify N scalars + encode" scenario both improve vsmainon payloads ≥ 100 KBtests/lua/lazy_table_spec.luaandtests/lua/cjson_compat_spec.luapass unmodifiedtests/lua/lazy_patch_spec.luacovers only scalar-replacement cases (no delete/new-field cases in this PR)lua/qjson/table.luanext(view)/#viewbehaviorContext
This issue is the result of decomposing PR #44 after a code review pointed out that the bundled changes carried disproportionate risk for the marginal perf gain on rare mutation patterns. The discussion and bench numbers backing this decomposition are in PR #44's review thread.