feat(lazy): structural patching for decode+modify+encode#44
Conversation
Add lazy patch support to avoid materializing the entire root container when modifying a few fields. Instead of triggering full materialization on __newindex, we now record patches and splice the original buffer during encode. Changes: - Add qjson_cursor_field_bytes FFI function to get field value byte spans - Modify LazyObject.__newindex to record patches instead of materializing - Modify LazyObject.__index to check patches first - Add encode_with_patches for splice-based encoding - Add lazy_patch_spec.lua tests Performance: decode+modify+encode on 100KB payload - Before: ~30 μs/op (full materialization + walking encode) - After: ~32 μs/op (WIP - splice path needs optimization) See docs/lazy-patch-spec.md for detailed design.
There was a problem hiding this comment.
Pull request overview
Adds a "lazy patch" path so that mutating a few fields on a qjson.decode() proxy no longer materializes the whole root container. Writes are recorded in _patches/_deleted side tables, and qjson.encode() splices the original buffer (with new helper FFI qjson_cursor_field_bytes) instead of walking a materialized table.
Changes:
- New FFI
qjson_cursor_field_bytes(Rust + C header + LuaJIT cdef) returning the byte span of a field's value in the original buffer. - Rewrites
LazyObject.__newindex/__index/__pairsto record/read patches and deletions; addsencode_with_patchesandencode_lazy_object_walking_with_patchesto encode lazy objects with patches. - Adds
tests/lua/lazy_patch_spec.luaanddocs/lazy-patch-spec.md.
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
| src/ffi.rs | Adds qjson_cursor_field_bytes returning a field value's byte range. |
| include/qjson.h | Declares the new FFI entry point. |
| lua/qjson.lua | Adds matching LuaJIT cdef for qjson_cursor_field_bytes. |
| lua/qjson/table.lua | Replaces materialize-on-write with patch recording; new patch-aware encode and iteration paths. |
| tests/lua/lazy_patch_spec.lua | Spec covering basic patch/read/delete/new-field/iteration/nesting/edge cases. |
| docs/lazy-patch-spec.md | Detailed design doc explaining problem, algorithm, edge cases, and benchmarks. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| if has_deleted or #new_fields > 0 then | ||
| -- Fall back to walking for complex cases | ||
| return encode_lazy_object_walking_with_patches(t, patches, deleted) | ||
| end | ||
|
|
||
| -- Sort replacements by start offset | ||
| table.sort(replacements, function(a, b) return a.start < b.start end) | ||
|
|
||
| -- Build output by splicing | ||
| local parts = {} | ||
| local pos = t._bs + 1 -- 1-based Lua index | ||
|
|
||
| for _, r in ipairs(replacements) do | ||
| -- Copy unchanged portion (convert 0-based to 1-based) | ||
| local r_start_1based = r.start + 1 | ||
| if r_start_1based > pos then | ||
| parts[#parts + 1] = buf:sub(pos, r_start_1based - 1) | ||
| end | ||
| -- Insert replacement | ||
| parts[#parts + 1] = r.value | ||
| pos = r.end_ + 1 -- end_ is exclusive, so +1 for 1-based | ||
| end | ||
|
|
||
| -- Copy remaining portion | ||
| if pos <= t._be then | ||
| parts[#parts + 1] = buf:sub(pos, t._be) | ||
| end | ||
|
|
||
| return table.concat(parts) | ||
| end | ||
|
|
||
| -- Walk a LazyObject with patches, handling deletions and new fields | ||
| local function encode_lazy_object_walking_with_patches(t, patches, deleted) |
There was a problem hiding this comment.
Fixed in 176558a — encode_lazy_object_walking_with_patches is now defined ahead of encode_with_patches. Verified: tests/lua/lazy_patch_spec.lua 26/26 pass; full Lua suite 112/112.
| -- Check if a lazy view has patches (but not necessarily dirty children) | ||
| local function has_patches(v) | ||
| local patches = rawget(v, "_patches") | ||
| return patches and #patches > 0 | ||
| end | ||
|
|
||
| -- Check if a lazy view has deletions | ||
| local function has_deletions(v) | ||
| local deleted = rawget(v, "_deleted") | ||
| if not deleted then return false end | ||
| for _ in pairs(deleted) do return true end | ||
| return false | ||
| end |
There was a problem hiding this comment.
Confirmed unused. grep -n "has_patches\|has_deletions" lua/qjson/table.lua only matches the definitions. Removing both.
| LazyObject.__newindex = function(t, k, v) | ||
| local contents = materialize_object_contents(t) | ||
| -- Snapshot user-key cache BEFORE nilling internals. | ||
| -- Use next() for raw iteration: pairs() invokes __pairs on lazy tables, | ||
| -- walking the full JSON via FFI instead of the Lua-side rawget cache. | ||
| local cache = {} | ||
| local ck, cv = next(t) | ||
| while ck ~= nil do | ||
| if not INTERNAL_KEYS[ck] then | ||
| cache[ck] = cv | ||
| end | ||
| ck, cv = next(t, ck) | ||
| -- Initialize patches table if needed | ||
| if not rawget(t, "_patches") then | ||
| rawset(t, "_patches", {}) | ||
| rawset(t, "_deleted", {}) | ||
| end | ||
| t._doc, t._cur_box, t._cur, t._bs, t._be = nil, nil, nil, nil, nil | ||
| setmetatable(t, nil) | ||
| for _, kv in ipairs(contents) do | ||
| rawset(t, kv[1], cache[kv[1]] or kv[2]) | ||
|
|
||
| local patches = rawget(t, "_patches") | ||
| local deleted = rawget(t, "_deleted") | ||
|
|
||
| if v == nil then | ||
| -- Mark key as deleted | ||
| deleted[k] = true | ||
| -- Remove from patches if previously patched | ||
| for i, p in ipairs(patches) do | ||
| if p.key == k then | ||
| table.remove(patches, i) | ||
| break | ||
| end | ||
| end | ||
| else | ||
| -- Encode the new value (we need both encoded and lua_value) | ||
| local encoded = encode(v) | ||
|
|
||
| -- Update or add patch | ||
| local found = false | ||
| for _, p in ipairs(patches) do | ||
| if p.key == k then | ||
| p.encoded_value = encoded | ||
| p.lua_value = v | ||
| found = true | ||
| break | ||
| end | ||
| end | ||
| if not found then | ||
| patches[#patches + 1] = { key = k, encoded_value = encoded, lua_value = v } | ||
| end | ||
|
|
||
| -- Remove from deleted if previously deleted | ||
| deleted[k] = nil | ||
| end | ||
| rawset(t, k, v) | ||
| end |
There was a problem hiding this comment.
Real bug. t[1] = "x" on a LazyObject records {key=1, ...}, then read_object_field returns nil for non-string keys (line 117) so the read is silently lost, and encode_with_patches errors on #p.key later. Fix: raise in LazyObject.__newindex for non-string keys — matches the existing encode_object policy at line 774.
| -- For each patch, find the field's byte range in the original buffer | ||
| for _, p in ipairs(patches) do | ||
| local rc = C.qjson_cursor_field_bytes(t._cur, p.key, #p.key, sz_a, sz_b) | ||
| if rc == QJSON_OK then | ||
| -- Existing field: replace value | ||
| replacements[#replacements + 1] = { | ||
| start = tonumber(sz_a[0]), | ||
| end_ = tonumber(sz_b[0]), | ||
| value = p.encoded_value, | ||
| } | ||
| end | ||
| -- If NOT_FOUND, it's a new field - handled separately below | ||
| end | ||
|
|
||
| -- Collect deleted field spans (we need to find the full field span including key) | ||
| -- For simplicity, we'll handle deletions by walking and skipping deleted keys | ||
| local has_deleted = false | ||
| for _ in pairs(deleted) do has_deleted = true; break end | ||
|
|
||
| -- If we have deletions or new fields, fall back to walking | ||
| -- (splicing deletions is complex due to comma handling) | ||
| local new_fields = {} | ||
| for _, p in ipairs(patches) do | ||
| local rc = C.qjson_cursor_field_bytes(t._cur, p.key, #p.key, sz_a, sz_b) | ||
| if rc == QJSON_NOT_FOUND then | ||
| new_fields[#new_fields + 1] = p | ||
| end | ||
| end | ||
|
|
||
| if has_deleted or #new_fields > 0 then | ||
| -- Fall back to walking for complex cases | ||
| return encode_lazy_object_walking_with_patches(t, patches, deleted) | ||
| end |
There was a problem hiding this comment.
Valid. Collapsing the two passes into one — classify each patch into replacements or new_fields in a single qjson_cursor_field_bytes call.
| -- Check patches first | ||
| local patches = rawget(self, "_patches") | ||
| if patches then | ||
| for _, p in ipairs(patches) do | ||
| if p.key == key then | ||
| return p.lua_value | ||
| end | ||
| end | ||
| end | ||
|
|
||
| -- Check deleted | ||
| local deleted = rawget(self, "_deleted") | ||
| if deleted and deleted[key] then | ||
| return nil | ||
| end |
There was a problem hiding this comment.
Declining for now. Typical patch counts in this workflow are 1–3 modified fields; a 3-entry linear scan is faster than maintaining a parallel hash table (the FFI cost from encode/qjson_cursor_field_bytes already dominates). The O(n·m) only matters when m grows large, which the lazy-patch design discourages (assigning many fields → user is better off materializing). Happy to revisit if a profile shows it becoming hot; adding to README "Roadmap / Deferred" as a noted optimization.
📝 WalkthroughWalkthroughAdds lazy patch tracking to qjson lazy proxies, a new FFI cursor API to obtain field byte spans, patch-aware encoding paths (splicing or walking fallback), a design/spec doc, and comprehensive Lua tests validating read/write/iterate/encode behavior. ChangesLazy Patching Implementation
🎯 3 (Moderate) | ⏱️ ~25 minutes 🚥 Pre-merge checks | ✅ 5 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 10
🧹 Nitpick comments (3)
src/ffi.rs (1)
840-867: 💤 Low valueConsider extracting the byte-range calculation into a shared helper.
The match block for computing
[start, end)from a cursor (lines 846-867) duplicates the logic inqjson_cursor_bytes(lines 738-759). Both check the lead byte, handle containers/strings by usingidx_end, and delegate scalars toscalar_byte_range.Extracting this into a helper like
cursor_byte_range(d: &Document, cur: Cursor) -> Result<(usize, usize), qjson_err>would reduce duplication and ensure consistency.♻️ Suggested helper extraction
/// Compute the byte range [start, end) for a cursor's value in the original buffer. unsafe fn cursor_byte_range(d: &Document<'_>, cur: Cursor) -> Result<(usize, usize), qjson_err> { let pos = d.indices[cur.idx_start as usize] as usize; let lead = match d.buf.get(pos) { Some(b) => *b, None => return Err(qjson_err::QJSON_PARSE_ERROR), }; match lead { b'{' | b'[' | b'"' => { let end = d.indices[cur.idx_end as usize] as usize; if end >= d.buf.len() { return Err(qjson_err::QJSON_PARSE_ERROR); } Ok((pos, end + 1)) } _ => scalar_byte_range(d, cur), } }Then both
qjson_cursor_bytesandqjson_cursor_field_bytescan call this helper.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/ffi.rs` around lines 840 - 867, The byte-range computation for a cursor is duplicated; extract it into a shared unsafe helper like cursor_byte_range(d: &Document<'_>, cur: Cursor) -> Result<(usize, usize), qjson_err> that performs the lead-byte check, handles containers/strings by using cur.idx_end and bounds-checking against d.buf.len(), and delegates scalars to scalar_byte_range; then replace the duplicated logic in qjson_cursor_bytes and qjson_cursor_field_bytes (or any other sites) to call cursor_byte_range and map its Ok/Err to the existing *value_start/*value_end and qjson_err return conventions.tests/lua/lazy_patch_spec.lua (1)
3-44: ⚡ Quick winMissing test case: deletion of non-existent field.
The test suite covers deleting existing fields and re-adding them, but doesn't test deleting a field that was never in the original JSON. This edge case should verify that deleting a non-existent field doesn't cause errors and doesn't affect encoding.
🧪 Proposed additional test
it("deletes a non-existent field without error", function() local t = qjson.decode('{"a":1}') t.b = nil -- b never existed assert.is_nil(t.b) assert.are.equal(1, t.a) local out = qjson.encode(t) local cjson = require("cjson") local parsed = cjson.decode(out) assert.is_nil(parsed.b) assert.are.equal(1, parsed.a) end)Also applies to: 72-97
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/lua/lazy_patch_spec.lua` around lines 3 - 44, Add a new test case named "deletes a non-existent field without error" to the Lazy Patch suite that decodes '{"a":1}' via qjson.decode, sets t.b = nil (where b was never present), asserts t.b is nil and t.a remains 1, then qjson.encode(t) and cjson.decode the output and assert parsed.b is nil and parsed.a equals 1; place this alongside the other "it" blocks so the behavior of qjson.decode/encode and the lazy-patch logic for deleting non-existent fields is validated.docs/lazy-patch-spec.md (1)
196-208: ⚡ Quick winThe pseudocode helper functions in the specification need clarification or mapping to actual implementation.
Lines 196 and 212 reference
find_field_value_span()andfind_field_span()which are not defined in the spec. While the lazy patch feature is implemented and functional (confirmed bytests/lua/lazy_patch_spec.lua), the actual implementation uses C FFI calls (C.qjson_cursor_field_bytes()) rather than these abstract function names.The specification should either:
- Define what these abstract helper functions conceptually do (for algorithmic clarity)
- Map them explicitly to the actual C FFI interface used in the implementation
- Clarify upfront that this is pseudocode describing the algorithm rather than directly executable pseudocode
This will help developers understand the algorithm without needing to cross-reference the Lua implementation.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@docs/lazy-patch-spec.md` around lines 196 - 208, The spec references undefined helpers find_field_value_span and find_field_span; update the document to either (a) add brief conceptual definitions of those helpers (what inputs they take and that they return byte offsets for a field's value/span), (b) explicitly map them to the real implementation (e.g., call sites using C.qjson_cursor_field_bytes() via FFI) and show the equivalent return values, or (c) add a clear note at the top that the listing is high-level pseudocode and that the real implementation uses C FFI (see tests/lua/lazy_patch_spec.lua); include the helper names find_field_value_span and find_field_span and the C symbol C.qjson_cursor_field_bytes in the explanation so readers can locate the implementation.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@docs/lazy-patch-spec.md`:
- Around line 332-341: Specify and add a concrete implementation for
LazyObject.__pairs that merges original fields with patch entries and excludes
deletions: read raw _patches and _deleted tables, build a patch_map mapping
patch.key -> lua_value and a seen set, iterate the original pairs (using the
preserved original__pairs iterator) yielding keys not marked deleted and not
overridden by patches (returning patch value when key exists in patch_map), then
iterate remaining patch_map keys to yield new/overridden fields, ensuring each
key is yielded only once; reference LazyObject.__pairs, rawget(t, "_patches"),
rawget(t, "_deleted"), and original__pairs to locate and implement this
behavior.
- Around line 254-256: The current splice logic uses result:find("}%s*$") to
locate the closing brace after concatenating parts, which is fragile; modify the
code handling variables result, parts and close_pos so that if
result:find("}%s*$") returns nil you immediately fallback by returning
encode_lazy_object_walking(t) (or otherwise abort the splice), ensuring you do
not proceed with unsafe splicing when the closing brace cannot be reliably
located; update any callers that rely on the splice to handle this early return
and reference t._be only when the brace is confirmed.
- Around line 244-264: The prefix handling can produce invalid JSON when all
original fields were deleted and new_fields is non-empty; in the block that
computes result/close_pos/prefix (using variables result, close_pos, prefix,
parts, new_fields and function find_field_value_span), strip any trailing commas
and whitespace from prefix before appending new_fields (e.g., normalize prefix
by removing trailing "," and spaces so you never produce "{,"), or alternatively
compute whether any fields remain after applying deletions and only insert a
comma when appropriate; update the logic that currently uses
prefix:match("[^{%s]%s*$") to perform this normalization and add unit tests for
the edge case "all fields deleted + new fields added."
- Around line 86-143: The patch structure is inconsistent: LazyObject.__newindex
currently stores only encoded_value but LazyObject.__index expects p.lua_value;
modify __newindex so that when setting a non-nil value it stores both
encoded_value = encode_value(v) and lua_value = v in the patch entry (and when
updating an existing patch update both fields), and ensure deletions still
remove patch entries and clear deleted[k]; also update the patch-record
documentation to list lua_value alongside encoded_value and key so
readers/consumers know both fields exist.
In `@lua/qjson/table.lua`:
- Around line 712-717: The loop calling C.qjson_cursor_field_bytes on t._cur
should handle unexpected return codes the same way other calls do: after getting
rc, keep the existing branch for QJSON_NOT_FOUND that appends the new part, but
add an elseif checking for rc not equal to QJSON_OK (and not QJSON_NOT_FOUND)
and propagate/report the error (e.g., return nil / raise error / log with rc and
context) so failures from C.qjson_cursor_field_bytes are surfaced; locate the
loop over patches and modify the rc handling around C.qjson_cursor_field_bytes,
referencing variables/function names: patches, p, C.qjson_cursor_field_bytes,
QJSON_NOT_FOUND, parts, encode_string, p.encoded_value, t._cur, sz_a, sz_b.
- Around line 613-623: The call to C.qjson_cursor_field_bytes in the loop
(inside the function handling patches) only checks for QJSON_OK and ignores any
other non-NOT_FOUND return codes; update the loop to explicitly handle
unexpected error codes (e.g., QJSON_PARSE_ERROR) returned by
qjson_cursor_field_bytes: after calling C.qjson_cursor_field_bytes(t._cur,
p.key, `#p.key`, sz_a, sz_b) check rc and if rc is QJSON_OK proceed to add the
replacement as now, if rc is QJSON_NOT_FOUND skip to the new-field handling as
before, but for any other rc propagate or surface the error (for example return
nil plus an error message, or log and abort) so the caller of the function can
react instead of silently continuing; reference the symbols
qjson_cursor_field_bytes, QJSON_OK, QJSON_NOT_FOUND (and QJSON_PARSE_ERROR) and
the replacements table when making the change.
- Around line 209-213: The call to C.qjson_cursor_field_bytes (inside the loop
using view._cur and p.key) only handles QJSON_NOT_FOUND and silently ignores
other error codes; update the code to check rc for non-success values (e.g.,
QJSON_TYPE_MISMATCH, QJSON_PARSE_ERROR) and handle them explicitly instead of
skipping the field: after calling qjson_cursor_field_bytes, if rc ==
QJSON_NOT_FOUND keep the existing behavior, else if rc != 0 surface the error
(return an error tuple or raise error with a descriptive message including rc
and p.key, or log it via the module logger) so callers can react to parse/type
errors rather than silently skipping fields. Ensure to reference
qjson_cursor_field_bytes, view._cur, p.key, and the QJSON_* constants when
implementing the check.
In `@tests/lua/lazy_patch_spec.lua`:
- Around line 229-275: Add a new test case inside the describe("Lazy Patch -
edge cases", ...) block named it("handles all fields deleted with new fields
added") that uses qjson.decode to load '{"a":1,"b":2}', sets t.a = nil and t.b =
nil, then assigns t.c = 3 and t.d = 4, asserts the nil and value expectations,
calls qjson.encode and parses with cjson.decode and asserts parsed.a and
parsed.b are nil and parsed.c and parsed.d equal 3 and 4; this verifies
qjson.encode handles the "all fields deleted then new fields added" edge case
without emitting invalid JSON.
- Around line 246-254: The test "handles unicode in values" currently only uses
ASCII; update the test inside the it block that calls qjson.decode/qjson.encode
to use a multi-byte UTF-8 string (e.g., include accented characters and CJK or
emoji) for t.a so the round-trip via qjson.encode and cjson.decode actually
validates Unicode handling; ensure the asserts (assert.are.equal) compare the
original Unicode string to both t.a after assignment and parsed.a after
re-decoding to confirm multi-byte sequences are preserved.
- Around line 240-244: The test "handles special characters in keys" decodes
'{"a.b":1}', patches t["a.b"]=10 and only checks the in-memory value; extend the
test to also call qjson.encode(t) and assert the encoded string matches the
expected JSON with the special key correctly quoted/escaped (e.g.,
'{"a.b":10}'), so the splice/encoding logic in qjson.encode is validated; update
the it block that uses qjson.decode, the local variable t, and add an assertion
against qjson.encode(t).
---
Nitpick comments:
In `@docs/lazy-patch-spec.md`:
- Around line 196-208: The spec references undefined helpers
find_field_value_span and find_field_span; update the document to either (a) add
brief conceptual definitions of those helpers (what inputs they take and that
they return byte offsets for a field's value/span), (b) explicitly map them to
the real implementation (e.g., call sites using C.qjson_cursor_field_bytes() via
FFI) and show the equivalent return values, or (c) add a clear note at the top
that the listing is high-level pseudocode and that the real implementation uses
C FFI (see tests/lua/lazy_patch_spec.lua); include the helper names
find_field_value_span and find_field_span and the C symbol
C.qjson_cursor_field_bytes in the explanation so readers can locate the
implementation.
In `@src/ffi.rs`:
- Around line 840-867: The byte-range computation for a cursor is duplicated;
extract it into a shared unsafe helper like cursor_byte_range(d: &Document<'_>,
cur: Cursor) -> Result<(usize, usize), qjson_err> that performs the lead-byte
check, handles containers/strings by using cur.idx_end and bounds-checking
against d.buf.len(), and delegates scalars to scalar_byte_range; then replace
the duplicated logic in qjson_cursor_bytes and qjson_cursor_field_bytes (or any
other sites) to call cursor_byte_range and map its Ok/Err to the existing
*value_start/*value_end and qjson_err return conventions.
In `@tests/lua/lazy_patch_spec.lua`:
- Around line 3-44: Add a new test case named "deletes a non-existent field
without error" to the Lazy Patch suite that decodes '{"a":1}' via qjson.decode,
sets t.b = nil (where b was never present), asserts t.b is nil and t.a remains
1, then qjson.encode(t) and cjson.decode the output and assert parsed.b is nil
and parsed.a equals 1; place this alongside the other "it" blocks so the
behavior of qjson.decode/encode and the lazy-patch logic for deleting
non-existent fields is validated.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 39a453a2-e4f7-49ae-9d3e-5ab1c9602e90
📒 Files selected for processing (6)
docs/lazy-patch-spec.mdinclude/qjson.hlua/qjson.lualua/qjson/table.luasrc/ffi.rstests/lua/lazy_patch_spec.lua
| -- Handle new fields (not in original) | ||
| local new_fields = {} | ||
| for _, p in ipairs(patches) do | ||
| if not find_field_value_span(t, p.key) then | ||
| new_fields[#new_fields + 1] = '"' .. p.key .. '":' .. p.encoded_value | ||
| end | ||
| end | ||
|
|
||
| if #new_fields > 0 then | ||
| -- Insert before closing brace | ||
| local result = table.concat(parts) | ||
| local close_pos = result:find("}%s*$") | ||
| if close_pos then | ||
| local prefix = result:sub(1, close_pos - 1) | ||
| -- Add comma if there are existing fields | ||
| if prefix:match("[^{%s]%s*$") then | ||
| prefix = prefix .. "," | ||
| end | ||
| return prefix .. table.concat(new_fields, ",") .. "}" | ||
| end | ||
| end |
There was a problem hiding this comment.
Risk of invalid JSON when all fields are deleted.
When all original fields are deleted and new fields are added, the logic at line 259 checks if prefix:match("[^{%s]%s*$") to decide whether to add a comma. However, if the prefix is just {, this pattern won't match, and no comma is added. But if replacements included deletions that left trailing commas, you could end up with malformed JSON like {,"new_field":1}.
Consider a more robust approach:
- Track whether any fields remain after applying deletions
- Normalize comma handling by removing any trailing comma in the prefix before appending new fields
- Add unit tests for this specific edge case (all fields deleted + new fields added)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@docs/lazy-patch-spec.md` around lines 244 - 264, The prefix handling can
produce invalid JSON when all original fields were deleted and new_fields is
non-empty; in the block that computes result/close_pos/prefix (using variables
result, close_pos, prefix, parts, new_fields and function
find_field_value_span), strip any trailing commas and whitespace from prefix
before appending new_fields (e.g., normalize prefix by removing trailing "," and
spaces so you never produce "{,"), or alternatively compute whether any fields
remain after applying deletions and only insert a comma when appropriate; update
the logic that currently uses prefix:match("[^{%s]%s*$") to perform this
normalization and add unit tests for the edge case "all fields deleted + new
fields added."
There was a problem hiding this comment.
The implementation does not use the spec's }%s*$ splice path for this case. When deletions or new fields are present, encode_with_patches falls through to encode_lazy_object_walking_with_patches (table.lua:641-643), which walks original keys (skipping deleted) and then appends new fields — it cannot produce {,. Splice path is only used for the "all-existing-key replacements, no deletions" case. Will add a clarifying note to the spec; adding a regression test for the all-deleted-plus-new edge case (R8) regardless.
There was a problem hiding this comment.
@membphis, thanks for the detailed clarification. That makes sense — since encode_proxy routes any case involving deletions or new fields to encode_lazy_object_walking_with_patches before reaching the splice path, the {, scenario can't arise in practice. My concern was against the spec's pseudo-code in isolation, which doesn't make the fallthrough guard explicit in that section.
Adding a clarifying note to the spec (e.g., a comment in the encode_with_patches block stating "only reached when patches are pure replacements of existing keys with no deletions/new fields") would make the invariant self-documenting and prevent future confusion. The R8 regression test is a good defensive addition regardless.
|
|
||
| ```lua | ||
| tab.model = "gpt4o" | ||
| for k, v in qjson.pairs(tab) do | ||
| print(k, v) | ||
| end | ||
| ``` | ||
|
|
||
| **Handling**: `__pairs` must merge original fields with patches and exclude deleted fields. | ||
|
|
There was a problem hiding this comment.
🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick win
Missing algorithm specification for __pairs iteration.
The edge case section mentions that "__pairs must merge original fields with patches and exclude deleted fields" (line 340), but the algorithm for implementing this is not provided. Given the complexity of correctly merging three sources (original fields, patches, deletions), this algorithm should be specified.
Add a section showing the __pairs implementation, for example:
LazyObject.__pairs = function(t)
local patches = rawget(t, "_patches") or {}
local deleted = rawget(t, "_deleted") or {}
local seen = {}
local patch_map = {}
-- Build patch map
for _, p in ipairs(patches) do
patch_map[p.key] = p.lua_value
seen[p.key] = true
end
-- Iterator state
local orig_iter, orig_state, orig_key = original__pairs(t)
local patch_idx = 1
local patch_keys = {}
for k in pairs(patch_map) do
patch_keys[`#patch_keys` + 1] = k
end
return function(state, k)
-- Iterate original fields first (skipping deleted & patched)
while true do
local next_k, next_v = orig_iter(orig_state, k)
if next_k == nil then break end
if not deleted[next_k] and not seen[next_k] then
seen[next_k] = true
return next_k, patch_map[next_k] or next_v
end
k = next_k
end
-- Then iterate patched/new fields
while patch_idx <= `#patch_keys` do
local pk = patch_keys[patch_idx]
patch_idx = patch_idx + 1
if not seen[pk] then
seen[pk] = true
return pk, patch_map[pk]
end
end
return nil
end, t, nil
end🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@docs/lazy-patch-spec.md` around lines 332 - 341, Specify and add a concrete
implementation for LazyObject.__pairs that merges original fields with patch
entries and excludes deletions: read raw _patches and _deleted tables, build a
patch_map mapping patch.key -> lua_value and a seen set, iterate the original
pairs (using the preserved original__pairs iterator) yielding keys not marked
deleted and not overridden by patches (returning patch value when key exists in
patch_map), then iterate remaining patch_map keys to yield new/overridden
fields, ensuring each key is yielded only once; reference LazyObject.__pairs,
rawget(t, "_patches"), rawget(t, "_deleted"), and original__pairs to locate and
implement this behavior.
There was a problem hiding this comment.
__pairs is implemented at lua/qjson/table.lua:169-225 (lazy_object_iter + LazyObject.__pairs): iterates original entries via qjson_cursor_object_entry_at, skips deleted, returns patched value when a patch covers the key, then iterates remaining patches and yields only those whose key is QJSON_NOT_FOUND in the original. Adding a pointer to this in the spec.
There was a problem hiding this comment.
@membphis, thanks for the clarification! The implementation at lua/qjson/table.lua:169-225 covers the algorithm well — using qjson_cursor_object_entry_at to iterate originals, skipping deleted keys, returning patched values inline, and then draining new-only patches at the end is exactly the right approach. Adding a cross-reference from the spec's edge-case section to that implementation will close the documentation gap nicely.
- Move encode_lazy_object_walking_with_patches before encode_with_patches (Lua local function defined later was unresolvable at compile time, causing nil-call errors on new-field/deletion paths). - Update tests in lazy_patch_spec.lua and lazy_table_spec.lua to match the spec's edge-case-1 design: writing to a LazyObject records a patch on the object instead of materializing it, so the metatable stays LazyObject after assignment.
|
Responses to the three nitpicks in the top-level review:
|
- LazyObject.__newindex: raise on non-string keys instead of silently recording an unreadable patch (Copilot C3). - Remove unused has_patches / has_deletions helpers (Copilot C2). - encode_with_patches: classify each patch in a single qjson_cursor_field_bytes pass (was doing two) and propagate unexpected return codes via check() (Copilot C4 + CodeRabbit R6). - encode_lazy_object_walking_with_patches: propagate unexpected qjson_cursor_field_bytes return codes (CodeRabbit R7). - lazy_object_iter: same error-propagation fix (CodeRabbit R5). - tests/lua/lazy_patch_spec.lua: extend special-key test to verify encoded output, swap the unicode test to multi-byte UTF-8, add "delete non-existent field", "all originals deleted + new fields added", and "non-string key raises" cases (CodeRabbit R8/R9/R10 + nitpick N2). - docs/lazy-patch-spec.md: add upfront note that the listing is design pseudocode (pointing helper names at qjson_cursor_field_bytes), and correct the patch-record example to include lua_value alongside encoded_value (CodeRabbit R1 + nitpick N3).
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (1)
tests/lua/lazy_patch_spec.lua (1)
120-128: ⚡ Quick winAdd a regression test for mutating a patched table after assignment.
Please add a case like
t.a = {x=1}; t.a.x = 2;and assertqjson.encode(t)containsx=2. Current coverage only checks immediate assignment snapshots.Also applies to: 251-260
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/lua/lazy_patch_spec.lua` around lines 120 - 128, Add a regression case to the "changes scalar to object" test in lazy_patch_spec.lua that verifies mutations to a patched table after assignment are preserved: after creating t via qjson.decode and assigning t.a = {x = 1}, mutate the nested field (t.a.x = 2) and then qjson.encode(t) and decode with cjson (or assert the encoded string) to assert the value is 2; update the existing it("changes scalar to object"...) block (and similar block around lines 251-260) to perform t.a = {x=1}; t.a.x = 2; and assert the encoded/decoded result shows x == 2 so the lazy patching preserves later mutations.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@docs/lazy-patch-spec.md`:
- Line 472: In docs/lazy-patch-spec.md update the two fenced code blocks that
currently have bare backticks (the blocks showing the
"decode/modify/encode/total" timings around the examples including "decode:
5.97 μs (20%)" and "decode: 5.97 μs (55%)") to include a language identifier
(e.g., change ``` to ```text or ```lua) so the MD040 lint rule is satisfied;
locate the fences that wrap the timing lines and add the chosen language token
to the opening fence.
In `@lua/qjson/table.lua`:
- Around line 351-366: The code stores a one-time snapshot in p.encoded_value
which becomes stale if p.lua_value (a table) is mutated later; instead stop
caching encoded bytes: remove p.encoded_value from patches and always call
encode(p.lua_value) at emit time (replace reads of p.encoded_value with
encode(p.lua_value)). Update the creation/update site (where patches[`#patches`+1]
and the loop that sets p.encoded_value and p.lua_value) to only store lua_value
= v, and modify all emit sites that reference p.encoded_value (including the
places noted around the other ranges) to re-encode via encode(p.lua_value) so
mutations are serialized correctly.
---
Nitpick comments:
In `@tests/lua/lazy_patch_spec.lua`:
- Around line 120-128: Add a regression case to the "changes scalar to object"
test in lazy_patch_spec.lua that verifies mutations to a patched table after
assignment are preserved: after creating t via qjson.decode and assigning t.a =
{x = 1}, mutate the nested field (t.a.x = 2) and then qjson.encode(t) and decode
with cjson (or assert the encoded string) to assert the value is 2; update the
existing it("changes scalar to object"...) block (and similar block around lines
251-260) to perform t.a = {x=1}; t.a.x = 2; and assert the encoded/decoded
result shows x == 2 so the lazy patching preserves later mutations.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: d05f2b5c-890c-485e-9098-2ca7cf5b41ac
📒 Files selected for processing (4)
docs/lazy-patch-spec.mdlua/qjson/table.luatests/lua/lazy_patch_spec.luatests/lua/lazy_table_spec.lua
|
|
||
| ### Current Implementation Profile | ||
|
|
||
| ``` |
There was a problem hiding this comment.
Add language identifiers to fenced code blocks (Line 472, Line 487).
These fences trigger MD040 and can fail markdown linting in CI. Add a language like text (or lua if preferred).
Proposed fix
-```
+```text
decode: 5.97 μs (20%)
modify: 0.37 μs (1%) -- triggers materialization
encode: 12.55 μs (41%) -- walks materialized root
total: 30.34 μs
@@
-```
+```text
decode: 5.97 μs (55%)
modify: 0.10 μs (1%) -- just records patch
encode: 4.80 μs (44%) -- splices original buffer
total: 10.87 μsAlso applies to: 487-487
🧰 Tools
🪛 markdownlint-cli2 (0.22.1)
[warning] 472-472: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@docs/lazy-patch-spec.md` at line 472, In docs/lazy-patch-spec.md update the
two fenced code blocks that currently have bare backticks (the blocks showing
the "decode/modify/encode/total" timings around the examples including "decode:
5.97 μs (20%)" and "decode: 5.97 μs (55%)") to include a language identifier
(e.g., change ``` to ```text or ```lua) so the MD040 lint rule is satisfied;
locate the fences that wrap the timing lines and add the chosen language token
to the opening fence.
| -- Encode the new value (we need both encoded and lua_value) | ||
| local encoded = encode(v) | ||
|
|
||
| -- Update or add patch | ||
| local found = false | ||
| for _, p in ipairs(patches) do | ||
| if p.key == k then | ||
| p.encoded_value = encoded | ||
| p.lua_value = v | ||
| found = true | ||
| break | ||
| end | ||
| end | ||
| if not found then | ||
| patches[#patches + 1] = { key = k, encoded_value = encoded, lua_value = v } | ||
| end |
There was a problem hiding this comment.
Patched table values can be serialized from stale snapshots.
Line 352 snapshots encoded_value once, but later mutations to p.lua_value (e.g. t.a = {x=1}; t.a.x = 2) are not re-encoded. The emit sites then use stale bytes and can drop updates.
💡 Minimal fix approach
+local function patch_encoded_value(p)
+ if type(p.lua_value) == "table" then
+ return encode(p.lua_value)
+ end
+ return p.encoded_value
+end
...
- parts[`#parts` + 1] = encode_string(k) .. ":" .. patch.encoded_value
+ parts[`#parts` + 1] = encode_string(k) .. ":" .. patch_encoded_value(patch)
...
- parts[`#parts` + 1] = encode_string(p.key) .. ":" .. p.encoded_value
+ parts[`#parts` + 1] = encode_string(p.key) .. ":" .. patch_encoded_value(p)
...
- value = p.encoded_value,
+ value = patch_encoded_value(p),Also applies to: 610-611, 629-629, 653-657
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@lua/qjson/table.lua` around lines 351 - 366, The code stores a one-time
snapshot in p.encoded_value which becomes stale if p.lua_value (a table) is
mutated later; instead stop caching encoded bytes: remove p.encoded_value from
patches and always call encode(p.lua_value) at emit time (replace reads of
p.encoded_value with encode(p.lua_value)). Update the creation/update site
(where patches[`#patches`+1] and the loop that sets p.encoded_value and
p.lua_value) to only store lua_value = v, and modify all emit sites that
reference p.encoded_value (including the places noted around the other ranges)
to re-encode via encode(p.lua_value) so mutations are serialized correctly.
Summary
Add lazy patch support to avoid materializing the entire root container when modifying a few fields on a
qjson.decode()result.Problem
Current behavior when modifying a field:
This defeats the lazy decode advantage for the common "decode → modify few fields → encode" workflow.
Solution
Record patches instead of materializing, then splice the original buffer during encode:
Changes
qjson_cursor_field_bytesto get field value byte spansLazyObject.__newindexto record patchesLazyObject.__indexto check patches firstencode_with_patchesfor splice-based encodinglazy_patch_spec.luadocs/lazy-patch-spec.mdwith detailed designStatus
Test Results
Summary by CodeRabbit
New Features
Documentation
Tests