Skip to content

feat(lazy): structural patching for decode+modify+encode#44

Open
membphis wants to merge 3 commits into
mainfrom
worktree-crystalline-gliding-volcano
Open

feat(lazy): structural patching for decode+modify+encode#44
membphis wants to merge 3 commits into
mainfrom
worktree-crystalline-gliding-volcano

Conversation

@membphis
Copy link
Copy Markdown
Collaborator

@membphis membphis commented May 18, 2026

Summary

Add lazy patch support to avoid materializing the entire root container when modifying a few fields on a qjson.decode() result.

Problem

Current behavior when modifying a field:

local tab = qjson.decode(json_str)  -- Returns LazyObject proxy
tab.model = "gpt4o"                 -- Triggers __newindex → materializes entire root
local out = qjson.encode(tab)       -- Walks materialized table

This defeats the lazy decode advantage for the common "decode → modify few fields → encode" workflow.

Solution

Record patches instead of materializing, then splice the original buffer during encode:

local tab = qjson.decode(json_str)  -- Returns LazyObject proxy
tab.model = "gpt4o"                 -- Records patch, no materialization
local out = qjson.encode(tab)       -- Splices original buffer with patched values

Changes

  • Rust FFI: Add qjson_cursor_field_bytes to get field value byte spans
  • Lua: Modify LazyObject.__newindex to record patches
  • Lua: Modify LazyObject.__index to check patches first
  • Lua: Add encode_with_patches for splice-based encoding
  • Tests: Add lazy_patch_spec.lua
  • Docs: Add docs/lazy-patch-spec.md with detailed design

Status

  • Basic functionality working (patch, read, encode)
  • Lazy subtrees preserved after root modification
  • Performance optimization (currently ~32μs, target ~10μs)
  • Delete field support
  • Array patch support

Test Results

Test 1 - Basic patch: PASS
Test 2 - Read patched value: PASS
Test 3 - Multiple patches: PASS
Test 4 - Preserve lazy subtrees: PASS
Test 5 - Performance: 32μs/op (target <15μs) - needs optimization

Summary by CodeRabbit

  • New Features

    • Lazy patching for efficient JSON field updates without fully materializing structures.
    • Ability to retrieve a field's byte-range from JSON cursors to enable targeted encoding.
  • Documentation

    • New design memo specifying lazy-patch semantics, encoding strategy, edge cases, and rollout plan.
  • Tests

    • Comprehensive test suite covering patching, iteration, nested behaviors, edge cases, and regressions.

Review Change Stack

Add lazy patch support to avoid materializing the entire root container
when modifying a few fields. Instead of triggering full materialization
on __newindex, we now record patches and splice the original buffer
during encode.

Changes:
- Add qjson_cursor_field_bytes FFI function to get field value byte spans
- Modify LazyObject.__newindex to record patches instead of materializing
- Modify LazyObject.__index to check patches first
- Add encode_with_patches for splice-based encoding
- Add lazy_patch_spec.lua tests

Performance: decode+modify+encode on 100KB payload
- Before: ~30 μs/op (full materialization + walking encode)
- After: ~32 μs/op (WIP - splice path needs optimization)

See docs/lazy-patch-spec.md for detailed design.
Copilot AI review requested due to automatic review settings May 18, 2026 10:49
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds a "lazy patch" path so that mutating a few fields on a qjson.decode() proxy no longer materializes the whole root container. Writes are recorded in _patches/_deleted side tables, and qjson.encode() splices the original buffer (with new helper FFI qjson_cursor_field_bytes) instead of walking a materialized table.

Changes:

  • New FFI qjson_cursor_field_bytes (Rust + C header + LuaJIT cdef) returning the byte span of a field's value in the original buffer.
  • Rewrites LazyObject.__newindex/__index/__pairs to record/read patches and deletions; adds encode_with_patches and encode_lazy_object_walking_with_patches to encode lazy objects with patches.
  • Adds tests/lua/lazy_patch_spec.lua and docs/lazy-patch-spec.md.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 5 comments.

Show a summary per file
File Description
src/ffi.rs Adds qjson_cursor_field_bytes returning a field value's byte range.
include/qjson.h Declares the new FFI entry point.
lua/qjson.lua Adds matching LuaJIT cdef for qjson_cursor_field_bytes.
lua/qjson/table.lua Replaces materialize-on-write with patch recording; new patch-aware encode and iteration paths.
tests/lua/lazy_patch_spec.lua Spec covering basic patch/read/delete/new-field/iteration/nesting/edge cases.
docs/lazy-patch-spec.md Detailed design doc explaining problem, algorithm, edge cases, and benchmarks.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread lua/qjson/table.lua Outdated
Comment on lines +641 to +673
if has_deleted or #new_fields > 0 then
-- Fall back to walking for complex cases
return encode_lazy_object_walking_with_patches(t, patches, deleted)
end

-- Sort replacements by start offset
table.sort(replacements, function(a, b) return a.start < b.start end)

-- Build output by splicing
local parts = {}
local pos = t._bs + 1 -- 1-based Lua index

for _, r in ipairs(replacements) do
-- Copy unchanged portion (convert 0-based to 1-based)
local r_start_1based = r.start + 1
if r_start_1based > pos then
parts[#parts + 1] = buf:sub(pos, r_start_1based - 1)
end
-- Insert replacement
parts[#parts + 1] = r.value
pos = r.end_ + 1 -- end_ is exclusive, so +1 for 1-based
end

-- Copy remaining portion
if pos <= t._be then
parts[#parts + 1] = buf:sub(pos, t._be)
end

return table.concat(parts)
end

-- Walk a LazyObject with patches, handling deletions and new fields
local function encode_lazy_object_walking_with_patches(t, patches, deleted)
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 176558aencode_lazy_object_walking_with_patches is now defined ahead of encode_with_patches. Verified: tests/lua/lazy_patch_spec.lua 26/26 pass; full Lua suite 112/112.

Comment thread lua/qjson/table.lua Outdated
Comment on lines +518 to +530
-- Check if a lazy view has patches (but not necessarily dirty children)
local function has_patches(v)
local patches = rawget(v, "_patches")
return patches and #patches > 0
end

-- Check if a lazy view has deletions
local function has_deletions(v)
local deleted = rawget(v, "_deleted")
if not deleted then return false end
for _ in pairs(deleted) do return true end
return false
end
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Confirmed unused. grep -n "has_patches\|has_deletions" lua/qjson/table.lua only matches the definitions. Removing both.

Comment thread lua/qjson/table.lua
Comment on lines 326 to 367
LazyObject.__newindex = function(t, k, v)
local contents = materialize_object_contents(t)
-- Snapshot user-key cache BEFORE nilling internals.
-- Use next() for raw iteration: pairs() invokes __pairs on lazy tables,
-- walking the full JSON via FFI instead of the Lua-side rawget cache.
local cache = {}
local ck, cv = next(t)
while ck ~= nil do
if not INTERNAL_KEYS[ck] then
cache[ck] = cv
end
ck, cv = next(t, ck)
-- Initialize patches table if needed
if not rawget(t, "_patches") then
rawset(t, "_patches", {})
rawset(t, "_deleted", {})
end
t._doc, t._cur_box, t._cur, t._bs, t._be = nil, nil, nil, nil, nil
setmetatable(t, nil)
for _, kv in ipairs(contents) do
rawset(t, kv[1], cache[kv[1]] or kv[2])

local patches = rawget(t, "_patches")
local deleted = rawget(t, "_deleted")

if v == nil then
-- Mark key as deleted
deleted[k] = true
-- Remove from patches if previously patched
for i, p in ipairs(patches) do
if p.key == k then
table.remove(patches, i)
break
end
end
else
-- Encode the new value (we need both encoded and lua_value)
local encoded = encode(v)

-- Update or add patch
local found = false
for _, p in ipairs(patches) do
if p.key == k then
p.encoded_value = encoded
p.lua_value = v
found = true
break
end
end
if not found then
patches[#patches + 1] = { key = k, encoded_value = encoded, lua_value = v }
end

-- Remove from deleted if previously deleted
deleted[k] = nil
end
rawset(t, k, v)
end
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Real bug. t[1] = "x" on a LazyObject records {key=1, ...}, then read_object_field returns nil for non-string keys (line 117) so the read is silently lost, and encode_with_patches errors on #p.key later. Fix: raise in LazyObject.__newindex for non-string keys — matches the existing encode_object policy at line 774.

Comment thread lua/qjson/table.lua Outdated
Comment on lines +612 to +644
-- For each patch, find the field's byte range in the original buffer
for _, p in ipairs(patches) do
local rc = C.qjson_cursor_field_bytes(t._cur, p.key, #p.key, sz_a, sz_b)
if rc == QJSON_OK then
-- Existing field: replace value
replacements[#replacements + 1] = {
start = tonumber(sz_a[0]),
end_ = tonumber(sz_b[0]),
value = p.encoded_value,
}
end
-- If NOT_FOUND, it's a new field - handled separately below
end

-- Collect deleted field spans (we need to find the full field span including key)
-- For simplicity, we'll handle deletions by walking and skipping deleted keys
local has_deleted = false
for _ in pairs(deleted) do has_deleted = true; break end

-- If we have deletions or new fields, fall back to walking
-- (splicing deletions is complex due to comma handling)
local new_fields = {}
for _, p in ipairs(patches) do
local rc = C.qjson_cursor_field_bytes(t._cur, p.key, #p.key, sz_a, sz_b)
if rc == QJSON_NOT_FOUND then
new_fields[#new_fields + 1] = p
end
end

if has_deleted or #new_fields > 0 then
-- Fall back to walking for complex cases
return encode_lazy_object_walking_with_patches(t, patches, deleted)
end
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Valid. Collapsing the two passes into one — classify each patch into replacements or new_fields in a single qjson_cursor_field_bytes call.

Comment thread lua/qjson/table.lua
Comment on lines +119 to +133
-- Check patches first
local patches = rawget(self, "_patches")
if patches then
for _, p in ipairs(patches) do
if p.key == key then
return p.lua_value
end
end
end

-- Check deleted
local deleted = rawget(self, "_deleted")
if deleted and deleted[key] then
return nil
end
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Declining for now. Typical patch counts in this workflow are 1–3 modified fields; a 3-entry linear scan is faster than maintaining a parallel hash table (the FFI cost from encode/qjson_cursor_field_bytes already dominates). The O(n·m) only matters when m grows large, which the lazy-patch design discourages (assigning many fields → user is better off materializing). Happy to revisit if a profile shows it becoming hot; adding to README "Roadmap / Deferred" as a noted optimization.

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 18, 2026

📝 Walkthrough

Walkthrough

Adds lazy patch tracking to qjson lazy proxies, a new FFI cursor API to obtain field byte spans, patch-aware encoding paths (splicing or walking fallback), a design/spec doc, and comprehensive Lua tests validating read/write/iterate/encode behavior.

Changes

Lazy Patching Implementation

Layer / File(s) Summary
FFI Cursor Extension for Byte-Range Lookup
include/qjson.h, lua/qjson.lua, src/ffi.rs
New qjson_cursor_field_bytes C API returns [value_start, value_end) for a given object-field key to support splice-based encoding.
Design spec: Lazy Patch
docs/lazy-patch-spec.md
Design memo describing patch data structures (_patches, _deleted), read/write/iteration semantics, encode_with_patches splice algorithm, edge cases, phased implementation plan, tests, and benchmarks.
Lua: Patch Tracking, Read/Write, Iterate, Encode
lua/qjson/table.lua
Adds per-view _patches/_deleted; __index checks patches/deletions before cursor lookups; __newindex records encoded patches instead of materializing; __pairs merges original fields with patches; is_dirty and encode_proxy route to slice, splice, or walking encoders as appropriate.
Comprehensive Lua Tests
tests/lua/lazy_patch_spec.lua, tests/lua/lazy_table_spec.lua
New tests cover single/multi-field patches, additions, deletions, type changes, nested modifications, iteration semantics with patches, edge cases, and metatable preservation; __newindex tests updated to expect patch-tracking.

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 5 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
E2e Test Quality Review ⚠️ Warning Critical bug: lines 612, 629 use stale patch.encoded_value. When patched Lua table is modified post-assignment, encode() outputs old value. Missing test coverage for this scenario. Add test: t.a={x:10}; t.a.x=20; encode(t) should output x=20. Fix: re-encode patch.lua_value if it's a mutable table, not just use cached patch.encoded_value.
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The pull request title clearly and specifically describes the main change: adding structural patching support that allows efficient decode→modify→encode workflows.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Security Check ✅ Passed No security vulnerabilities found. JSON library with no logging, database, auth, or crypto code. Error messages reveal only type info, not sensitive data.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch worktree-crystalline-gliding-volcano

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 10

🧹 Nitpick comments (3)
src/ffi.rs (1)

840-867: 💤 Low value

Consider extracting the byte-range calculation into a shared helper.

The match block for computing [start, end) from a cursor (lines 846-867) duplicates the logic in qjson_cursor_bytes (lines 738-759). Both check the lead byte, handle containers/strings by using idx_end, and delegate scalars to scalar_byte_range.

Extracting this into a helper like cursor_byte_range(d: &Document, cur: Cursor) -> Result<(usize, usize), qjson_err> would reduce duplication and ensure consistency.

♻️ Suggested helper extraction
/// Compute the byte range [start, end) for a cursor's value in the original buffer.
unsafe fn cursor_byte_range(d: &Document<'_>, cur: Cursor) -> Result<(usize, usize), qjson_err> {
    let pos = d.indices[cur.idx_start as usize] as usize;
    let lead = match d.buf.get(pos) {
        Some(b) => *b,
        None => return Err(qjson_err::QJSON_PARSE_ERROR),
    };
    match lead {
        b'{' | b'[' | b'"' => {
            let end = d.indices[cur.idx_end as usize] as usize;
            if end >= d.buf.len() {
                return Err(qjson_err::QJSON_PARSE_ERROR);
            }
            Ok((pos, end + 1))
        }
        _ => scalar_byte_range(d, cur),
    }
}

Then both qjson_cursor_bytes and qjson_cursor_field_bytes can call this helper.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/ffi.rs` around lines 840 - 867, The byte-range computation for a cursor
is duplicated; extract it into a shared unsafe helper like cursor_byte_range(d:
&Document<'_>, cur: Cursor) -> Result<(usize, usize), qjson_err> that performs
the lead-byte check, handles containers/strings by using cur.idx_end and
bounds-checking against d.buf.len(), and delegates scalars to scalar_byte_range;
then replace the duplicated logic in qjson_cursor_bytes and
qjson_cursor_field_bytes (or any other sites) to call cursor_byte_range and map
its Ok/Err to the existing *value_start/*value_end and qjson_err return
conventions.
tests/lua/lazy_patch_spec.lua (1)

3-44: ⚡ Quick win

Missing test case: deletion of non-existent field.

The test suite covers deleting existing fields and re-adding them, but doesn't test deleting a field that was never in the original JSON. This edge case should verify that deleting a non-existent field doesn't cause errors and doesn't affect encoding.

🧪 Proposed additional test
it("deletes a non-existent field without error", function()
    local t = qjson.decode('{"a":1}')
    t.b = nil  -- b never existed
    assert.is_nil(t.b)
    assert.are.equal(1, t.a)
    local out = qjson.encode(t)
    local cjson = require("cjson")
    local parsed = cjson.decode(out)
    assert.is_nil(parsed.b)
    assert.are.equal(1, parsed.a)
end)

Also applies to: 72-97

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/lua/lazy_patch_spec.lua` around lines 3 - 44, Add a new test case named
"deletes a non-existent field without error" to the Lazy Patch suite that
decodes '{"a":1}' via qjson.decode, sets t.b = nil (where b was never present),
asserts t.b is nil and t.a remains 1, then qjson.encode(t) and cjson.decode the
output and assert parsed.b is nil and parsed.a equals 1; place this alongside
the other "it" blocks so the behavior of qjson.decode/encode and the lazy-patch
logic for deleting non-existent fields is validated.
docs/lazy-patch-spec.md (1)

196-208: ⚡ Quick win

The pseudocode helper functions in the specification need clarification or mapping to actual implementation.

Lines 196 and 212 reference find_field_value_span() and find_field_span() which are not defined in the spec. While the lazy patch feature is implemented and functional (confirmed by tests/lua/lazy_patch_spec.lua), the actual implementation uses C FFI calls (C.qjson_cursor_field_bytes()) rather than these abstract function names.

The specification should either:

  1. Define what these abstract helper functions conceptually do (for algorithmic clarity)
  2. Map them explicitly to the actual C FFI interface used in the implementation
  3. Clarify upfront that this is pseudocode describing the algorithm rather than directly executable pseudocode

This will help developers understand the algorithm without needing to cross-reference the Lua implementation.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/lazy-patch-spec.md` around lines 196 - 208, The spec references
undefined helpers find_field_value_span and find_field_span; update the document
to either (a) add brief conceptual definitions of those helpers (what inputs
they take and that they return byte offsets for a field's value/span), (b)
explicitly map them to the real implementation (e.g., call sites using
C.qjson_cursor_field_bytes() via FFI) and show the equivalent return values, or
(c) add a clear note at the top that the listing is high-level pseudocode and
that the real implementation uses C FFI (see tests/lua/lazy_patch_spec.lua);
include the helper names find_field_value_span and find_field_span and the C
symbol C.qjson_cursor_field_bytes in the explanation so readers can locate the
implementation.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@docs/lazy-patch-spec.md`:
- Around line 332-341: Specify and add a concrete implementation for
LazyObject.__pairs that merges original fields with patch entries and excludes
deletions: read raw _patches and _deleted tables, build a patch_map mapping
patch.key -> lua_value and a seen set, iterate the original pairs (using the
preserved original__pairs iterator) yielding keys not marked deleted and not
overridden by patches (returning patch value when key exists in patch_map), then
iterate remaining patch_map keys to yield new/overridden fields, ensuring each
key is yielded only once; reference LazyObject.__pairs, rawget(t, "_patches"),
rawget(t, "_deleted"), and original__pairs to locate and implement this
behavior.
- Around line 254-256: The current splice logic uses result:find("}%s*$") to
locate the closing brace after concatenating parts, which is fragile; modify the
code handling variables result, parts and close_pos so that if
result:find("}%s*$") returns nil you immediately fallback by returning
encode_lazy_object_walking(t) (or otherwise abort the splice), ensuring you do
not proceed with unsafe splicing when the closing brace cannot be reliably
located; update any callers that rely on the splice to handle this early return
and reference t._be only when the brace is confirmed.
- Around line 244-264: The prefix handling can produce invalid JSON when all
original fields were deleted and new_fields is non-empty; in the block that
computes result/close_pos/prefix (using variables result, close_pos, prefix,
parts, new_fields and function find_field_value_span), strip any trailing commas
and whitespace from prefix before appending new_fields (e.g., normalize prefix
by removing trailing "," and spaces so you never produce "{,"), or alternatively
compute whether any fields remain after applying deletions and only insert a
comma when appropriate; update the logic that currently uses
prefix:match("[^{%s]%s*$") to perform this normalization and add unit tests for
the edge case "all fields deleted + new fields added."
- Around line 86-143: The patch structure is inconsistent: LazyObject.__newindex
currently stores only encoded_value but LazyObject.__index expects p.lua_value;
modify __newindex so that when setting a non-nil value it stores both
encoded_value = encode_value(v) and lua_value = v in the patch entry (and when
updating an existing patch update both fields), and ensure deletions still
remove patch entries and clear deleted[k]; also update the patch-record
documentation to list lua_value alongside encoded_value and key so
readers/consumers know both fields exist.

In `@lua/qjson/table.lua`:
- Around line 712-717: The loop calling C.qjson_cursor_field_bytes on t._cur
should handle unexpected return codes the same way other calls do: after getting
rc, keep the existing branch for QJSON_NOT_FOUND that appends the new part, but
add an elseif checking for rc not equal to QJSON_OK (and not QJSON_NOT_FOUND)
and propagate/report the error (e.g., return nil / raise error / log with rc and
context) so failures from C.qjson_cursor_field_bytes are surfaced; locate the
loop over patches and modify the rc handling around C.qjson_cursor_field_bytes,
referencing variables/function names: patches, p, C.qjson_cursor_field_bytes,
QJSON_NOT_FOUND, parts, encode_string, p.encoded_value, t._cur, sz_a, sz_b.
- Around line 613-623: The call to C.qjson_cursor_field_bytes in the loop
(inside the function handling patches) only checks for QJSON_OK and ignores any
other non-NOT_FOUND return codes; update the loop to explicitly handle
unexpected error codes (e.g., QJSON_PARSE_ERROR) returned by
qjson_cursor_field_bytes: after calling C.qjson_cursor_field_bytes(t._cur,
p.key, `#p.key`, sz_a, sz_b) check rc and if rc is QJSON_OK proceed to add the
replacement as now, if rc is QJSON_NOT_FOUND skip to the new-field handling as
before, but for any other rc propagate or surface the error (for example return
nil plus an error message, or log and abort) so the caller of the function can
react instead of silently continuing; reference the symbols
qjson_cursor_field_bytes, QJSON_OK, QJSON_NOT_FOUND (and QJSON_PARSE_ERROR) and
the replacements table when making the change.
- Around line 209-213: The call to C.qjson_cursor_field_bytes (inside the loop
using view._cur and p.key) only handles QJSON_NOT_FOUND and silently ignores
other error codes; update the code to check rc for non-success values (e.g.,
QJSON_TYPE_MISMATCH, QJSON_PARSE_ERROR) and handle them explicitly instead of
skipping the field: after calling qjson_cursor_field_bytes, if rc ==
QJSON_NOT_FOUND keep the existing behavior, else if rc != 0 surface the error
(return an error tuple or raise error with a descriptive message including rc
and p.key, or log it via the module logger) so callers can react to parse/type
errors rather than silently skipping fields. Ensure to reference
qjson_cursor_field_bytes, view._cur, p.key, and the QJSON_* constants when
implementing the check.

In `@tests/lua/lazy_patch_spec.lua`:
- Around line 229-275: Add a new test case inside the describe("Lazy Patch -
edge cases", ...) block named it("handles all fields deleted with new fields
added") that uses qjson.decode to load '{"a":1,"b":2}', sets t.a = nil and t.b =
nil, then assigns t.c = 3 and t.d = 4, asserts the nil and value expectations,
calls qjson.encode and parses with cjson.decode and asserts parsed.a and
parsed.b are nil and parsed.c and parsed.d equal 3 and 4; this verifies
qjson.encode handles the "all fields deleted then new fields added" edge case
without emitting invalid JSON.
- Around line 246-254: The test "handles unicode in values" currently only uses
ASCII; update the test inside the it block that calls qjson.decode/qjson.encode
to use a multi-byte UTF-8 string (e.g., include accented characters and CJK or
emoji) for t.a so the round-trip via qjson.encode and cjson.decode actually
validates Unicode handling; ensure the asserts (assert.are.equal) compare the
original Unicode string to both t.a after assignment and parsed.a after
re-decoding to confirm multi-byte sequences are preserved.
- Around line 240-244: The test "handles special characters in keys" decodes
'{"a.b":1}', patches t["a.b"]=10 and only checks the in-memory value; extend the
test to also call qjson.encode(t) and assert the encoded string matches the
expected JSON with the special key correctly quoted/escaped (e.g.,
'{"a.b":10}'), so the splice/encoding logic in qjson.encode is validated; update
the it block that uses qjson.decode, the local variable t, and add an assertion
against qjson.encode(t).

---

Nitpick comments:
In `@docs/lazy-patch-spec.md`:
- Around line 196-208: The spec references undefined helpers
find_field_value_span and find_field_span; update the document to either (a) add
brief conceptual definitions of those helpers (what inputs they take and that
they return byte offsets for a field's value/span), (b) explicitly map them to
the real implementation (e.g., call sites using C.qjson_cursor_field_bytes() via
FFI) and show the equivalent return values, or (c) add a clear note at the top
that the listing is high-level pseudocode and that the real implementation uses
C FFI (see tests/lua/lazy_patch_spec.lua); include the helper names
find_field_value_span and find_field_span and the C symbol
C.qjson_cursor_field_bytes in the explanation so readers can locate the
implementation.

In `@src/ffi.rs`:
- Around line 840-867: The byte-range computation for a cursor is duplicated;
extract it into a shared unsafe helper like cursor_byte_range(d: &Document<'_>,
cur: Cursor) -> Result<(usize, usize), qjson_err> that performs the lead-byte
check, handles containers/strings by using cur.idx_end and bounds-checking
against d.buf.len(), and delegates scalars to scalar_byte_range; then replace
the duplicated logic in qjson_cursor_bytes and qjson_cursor_field_bytes (or any
other sites) to call cursor_byte_range and map its Ok/Err to the existing
*value_start/*value_end and qjson_err return conventions.

In `@tests/lua/lazy_patch_spec.lua`:
- Around line 3-44: Add a new test case named "deletes a non-existent field
without error" to the Lazy Patch suite that decodes '{"a":1}' via qjson.decode,
sets t.b = nil (where b was never present), asserts t.b is nil and t.a remains
1, then qjson.encode(t) and cjson.decode the output and assert parsed.b is nil
and parsed.a equals 1; place this alongside the other "it" blocks so the
behavior of qjson.decode/encode and the lazy-patch logic for deleting
non-existent fields is validated.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 39a453a2-e4f7-49ae-9d3e-5ab1c9602e90

📥 Commits

Reviewing files that changed from the base of the PR and between 11a644a and 69e6f77.

📒 Files selected for processing (6)
  • docs/lazy-patch-spec.md
  • include/qjson.h
  • lua/qjson.lua
  • lua/qjson/table.lua
  • src/ffi.rs
  • tests/lua/lazy_patch_spec.lua

Comment thread docs/lazy-patch-spec.md
Comment thread docs/lazy-patch-spec.md
Comment on lines +244 to +264
-- Handle new fields (not in original)
local new_fields = {}
for _, p in ipairs(patches) do
if not find_field_value_span(t, p.key) then
new_fields[#new_fields + 1] = '"' .. p.key .. '":' .. p.encoded_value
end
end

if #new_fields > 0 then
-- Insert before closing brace
local result = table.concat(parts)
local close_pos = result:find("}%s*$")
if close_pos then
local prefix = result:sub(1, close_pos - 1)
-- Add comma if there are existing fields
if prefix:match("[^{%s]%s*$") then
prefix = prefix .. ","
end
return prefix .. table.concat(new_fields, ",") .. "}"
end
end
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot May 18, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚖️ Poor tradeoff

Risk of invalid JSON when all fields are deleted.

When all original fields are deleted and new fields are added, the logic at line 259 checks if prefix:match("[^{%s]%s*$") to decide whether to add a comma. However, if the prefix is just {, this pattern won't match, and no comma is added. But if replacements included deletions that left trailing commas, you could end up with malformed JSON like {,"new_field":1}.

Consider a more robust approach:

  1. Track whether any fields remain after applying deletions
  2. Normalize comma handling by removing any trailing comma in the prefix before appending new fields
  3. Add unit tests for this specific edge case (all fields deleted + new fields added)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/lazy-patch-spec.md` around lines 244 - 264, The prefix handling can
produce invalid JSON when all original fields were deleted and new_fields is
non-empty; in the block that computes result/close_pos/prefix (using variables
result, close_pos, prefix, parts, new_fields and function
find_field_value_span), strip any trailing commas and whitespace from prefix
before appending new_fields (e.g., normalize prefix by removing trailing "," and
spaces so you never produce "{,"), or alternatively compute whether any fields
remain after applying deletions and only insert a comma when appropriate; update
the logic that currently uses prefix:match("[^{%s]%s*$") to perform this
normalization and add unit tests for the edge case "all fields deleted + new
fields added."

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The implementation does not use the spec's }%s*$ splice path for this case. When deletions or new fields are present, encode_with_patches falls through to encode_lazy_object_walking_with_patches (table.lua:641-643), which walks original keys (skipping deleted) and then appends new fields — it cannot produce {,. Splice path is only used for the "all-existing-key replacements, no deletions" case. Will add a clarifying note to the spec; adding a regression test for the all-deleted-plus-new edge case (R8) regardless.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@membphis, thanks for the detailed clarification. That makes sense — since encode_proxy routes any case involving deletions or new fields to encode_lazy_object_walking_with_patches before reaching the splice path, the {, scenario can't arise in practice. My concern was against the spec's pseudo-code in isolation, which doesn't make the fallthrough guard explicit in that section.

Adding a clarifying note to the spec (e.g., a comment in the encode_with_patches block stating "only reached when patches are pure replacements of existing keys with no deletions/new fields") would make the invariant self-documenting and prevent future confusion. The R8 regression test is a good defensive addition regardless.

Comment thread docs/lazy-patch-spec.md
Comment thread docs/lazy-patch-spec.md
Comment on lines +332 to +341

```lua
tab.model = "gpt4o"
for k, v in qjson.pairs(tab) do
print(k, v)
end
```

**Handling**: `__pairs` must merge original fields with patches and exclude deleted fields.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot May 18, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick win

Missing algorithm specification for __pairs iteration.

The edge case section mentions that "__pairs must merge original fields with patches and exclude deleted fields" (line 340), but the algorithm for implementing this is not provided. Given the complexity of correctly merging three sources (original fields, patches, deletions), this algorithm should be specified.

Add a section showing the __pairs implementation, for example:

LazyObject.__pairs = function(t)
    local patches = rawget(t, "_patches") or {}
    local deleted = rawget(t, "_deleted") or {}
    local seen = {}
    local patch_map = {}
    
    -- Build patch map
    for _, p in ipairs(patches) do
        patch_map[p.key] = p.lua_value
        seen[p.key] = true
    end
    
    -- Iterator state
    local orig_iter, orig_state, orig_key = original__pairs(t)
    local patch_idx = 1
    local patch_keys = {}
    for k in pairs(patch_map) do
        patch_keys[`#patch_keys` + 1] = k
    end
    
    return function(state, k)
        -- Iterate original fields first (skipping deleted & patched)
        while true do
            local next_k, next_v = orig_iter(orig_state, k)
            if next_k == nil then break end
            if not deleted[next_k] and not seen[next_k] then
                seen[next_k] = true
                return next_k, patch_map[next_k] or next_v
            end
            k = next_k
        end
        
        -- Then iterate patched/new fields
        while patch_idx <= `#patch_keys` do
            local pk = patch_keys[patch_idx]
            patch_idx = patch_idx + 1
            if not seen[pk] then
                seen[pk] = true
                return pk, patch_map[pk]
            end
        end
        
        return nil
    end, t, nil
end
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/lazy-patch-spec.md` around lines 332 - 341, Specify and add a concrete
implementation for LazyObject.__pairs that merges original fields with patch
entries and excludes deletions: read raw _patches and _deleted tables, build a
patch_map mapping patch.key -> lua_value and a seen set, iterate the original
pairs (using the preserved original__pairs iterator) yielding keys not marked
deleted and not overridden by patches (returning patch value when key exists in
patch_map), then iterate remaining patch_map keys to yield new/overridden
fields, ensuring each key is yielded only once; reference LazyObject.__pairs,
rawget(t, "_patches"), rawget(t, "_deleted"), and original__pairs to locate and
implement this behavior.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

__pairs is implemented at lua/qjson/table.lua:169-225 (lazy_object_iter + LazyObject.__pairs): iterates original entries via qjson_cursor_object_entry_at, skips deleted, returns patched value when a patch covers the key, then iterates remaining patches and yields only those whose key is QJSON_NOT_FOUND in the original. Adding a pointer to this in the spec.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@membphis, thanks for the clarification! The implementation at lua/qjson/table.lua:169-225 covers the algorithm well — using qjson_cursor_object_entry_at to iterate originals, skipping deleted keys, returning patched values inline, and then draining new-only patches at the end is exactly the right approach. Adding a cross-reference from the spec's edge-case section to that implementation will close the documentation gap nicely.

Comment thread lua/qjson/table.lua
Comment thread lua/qjson/table.lua Outdated
Comment thread lua/qjson/table.lua Outdated
Comment thread tests/lua/lazy_patch_spec.lua
Comment thread tests/lua/lazy_patch_spec.lua
Comment thread tests/lua/lazy_patch_spec.lua
- Move encode_lazy_object_walking_with_patches before encode_with_patches
  (Lua local function defined later was unresolvable at compile time,
  causing nil-call errors on new-field/deletion paths).
- Update tests in lazy_patch_spec.lua and lazy_table_spec.lua to match
  the spec's edge-case-1 design: writing to a LazyObject records a patch
  on the object instead of materializing it, so the metatable stays
  LazyObject after assignment.
@membphis
Copy link
Copy Markdown
Collaborator Author

Responses to the three nitpicks in the top-level review:

  • src/ffi.rs 840-867 (extract cursor_byte_range helper) — declining for now. Two call sites with different return conventions (qjson_cursor_bytes writes via *mut usize outparams; qjson_cursor_field_bytes does the same but after first resolving a child cursor). The shared logic is ~10 lines and only one path. Adding the helper introduces an indirection without consolidating any divergent logic. Will revisit if a third call site appears.

  • tests/lua/lazy_patch_spec.lua (delete non-existent field test) — adding.

  • docs/lazy-patch-spec.md 196-208 (undefined find_field_value_span / find_field_span helpers) — these are pseudocode names in the design listing; the actual implementation calls C.qjson_cursor_field_bytes (rust FFI defined in src/ffi.rs:821). Adding an upfront note that the doc is high-level pseudocode and pointing at the FFI symbol.

- LazyObject.__newindex: raise on non-string keys instead of silently
  recording an unreadable patch (Copilot C3).
- Remove unused has_patches / has_deletions helpers (Copilot C2).
- encode_with_patches: classify each patch in a single qjson_cursor_field_bytes
  pass (was doing two) and propagate unexpected return codes via check()
  (Copilot C4 + CodeRabbit R6).
- encode_lazy_object_walking_with_patches: propagate unexpected
  qjson_cursor_field_bytes return codes (CodeRabbit R7).
- lazy_object_iter: same error-propagation fix (CodeRabbit R5).
- tests/lua/lazy_patch_spec.lua: extend special-key test to verify encoded
  output, swap the unicode test to multi-byte UTF-8, add "delete non-existent
  field", "all originals deleted + new fields added", and "non-string key
  raises" cases (CodeRabbit R8/R9/R10 + nitpick N2).
- docs/lazy-patch-spec.md: add upfront note that the listing is design
  pseudocode (pointing helper names at qjson_cursor_field_bytes), and
  correct the patch-record example to include lua_value alongside
  encoded_value (CodeRabbit R1 + nitpick N3).
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (1)
tests/lua/lazy_patch_spec.lua (1)

120-128: ⚡ Quick win

Add a regression test for mutating a patched table after assignment.

Please add a case like t.a = {x=1}; t.a.x = 2; and assert qjson.encode(t) contains x=2. Current coverage only checks immediate assignment snapshots.

Also applies to: 251-260

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/lua/lazy_patch_spec.lua` around lines 120 - 128, Add a regression case
to the "changes scalar to object" test in lazy_patch_spec.lua that verifies
mutations to a patched table after assignment are preserved: after creating t
via qjson.decode and assigning t.a = {x = 1}, mutate the nested field (t.a.x =
2) and then qjson.encode(t) and decode with cjson (or assert the encoded string)
to assert the value is 2; update the existing it("changes scalar to object"...)
block (and similar block around lines 251-260) to perform t.a = {x=1}; t.a.x =
2; and assert the encoded/decoded result shows x == 2 so the lazy patching
preserves later mutations.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@docs/lazy-patch-spec.md`:
- Line 472: In docs/lazy-patch-spec.md update the two fenced code blocks that
currently have bare backticks (the blocks showing the
"decode/modify/encode/total" timings around the examples including "decode:  
5.97 μs (20%)" and "decode:   5.97 μs (55%)") to include a language identifier
(e.g., change ``` to ```text or ```lua) so the MD040 lint rule is satisfied;
locate the fences that wrap the timing lines and add the chosen language token
to the opening fence.

In `@lua/qjson/table.lua`:
- Around line 351-366: The code stores a one-time snapshot in p.encoded_value
which becomes stale if p.lua_value (a table) is mutated later; instead stop
caching encoded bytes: remove p.encoded_value from patches and always call
encode(p.lua_value) at emit time (replace reads of p.encoded_value with
encode(p.lua_value)). Update the creation/update site (where patches[`#patches`+1]
and the loop that sets p.encoded_value and p.lua_value) to only store lua_value
= v, and modify all emit sites that reference p.encoded_value (including the
places noted around the other ranges) to re-encode via encode(p.lua_value) so
mutations are serialized correctly.

---

Nitpick comments:
In `@tests/lua/lazy_patch_spec.lua`:
- Around line 120-128: Add a regression case to the "changes scalar to object"
test in lazy_patch_spec.lua that verifies mutations to a patched table after
assignment are preserved: after creating t via qjson.decode and assigning t.a =
{x = 1}, mutate the nested field (t.a.x = 2) and then qjson.encode(t) and decode
with cjson (or assert the encoded string) to assert the value is 2; update the
existing it("changes scalar to object"...) block (and similar block around lines
251-260) to perform t.a = {x=1}; t.a.x = 2; and assert the encoded/decoded
result shows x == 2 so the lazy patching preserves later mutations.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: d05f2b5c-890c-485e-9098-2ca7cf5b41ac

📥 Commits

Reviewing files that changed from the base of the PR and between 69e6f77 and 208d8b9.

📒 Files selected for processing (4)
  • docs/lazy-patch-spec.md
  • lua/qjson/table.lua
  • tests/lua/lazy_patch_spec.lua
  • tests/lua/lazy_table_spec.lua

Comment thread docs/lazy-patch-spec.md

### Current Implementation Profile

```
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Add language identifiers to fenced code blocks (Line 472, Line 487).

These fences trigger MD040 and can fail markdown linting in CI. Add a language like text (or lua if preferred).

Proposed fix
-```
+```text
 decode:   5.97 μs (20%)
 modify:   0.37 μs (1%)   -- triggers materialization
 encode:  12.55 μs (41%)  -- walks materialized root
 total:   30.34 μs
@@
-```
+```text
 decode:   5.97 μs (55%)
 modify:   0.10 μs (1%)   -- just records patch
 encode:   4.80 μs (44%)  -- splices original buffer
 total:   10.87 μs

Also applies to: 487-487

🧰 Tools
🪛 markdownlint-cli2 (0.22.1)

[warning] 472-472: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/lazy-patch-spec.md` at line 472, In docs/lazy-patch-spec.md update the
two fenced code blocks that currently have bare backticks (the blocks showing
the "decode/modify/encode/total" timings around the examples including "decode: 
5.97 μs (20%)" and "decode:   5.97 μs (55%)") to include a language identifier
(e.g., change ``` to ```text or ```lua) so the MD040 lint rule is satisfied;
locate the fences that wrap the timing lines and add the chosen language token
to the opening fence.

Comment thread lua/qjson/table.lua
Comment on lines +351 to +366
-- Encode the new value (we need both encoded and lua_value)
local encoded = encode(v)

-- Update or add patch
local found = false
for _, p in ipairs(patches) do
if p.key == k then
p.encoded_value = encoded
p.lua_value = v
found = true
break
end
end
if not found then
patches[#patches + 1] = { key = k, encoded_value = encoded, lua_value = v }
end
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Patched table values can be serialized from stale snapshots.

Line 352 snapshots encoded_value once, but later mutations to p.lua_value (e.g. t.a = {x=1}; t.a.x = 2) are not re-encoded. The emit sites then use stale bytes and can drop updates.

💡 Minimal fix approach
+local function patch_encoded_value(p)
+    if type(p.lua_value) == "table" then
+        return encode(p.lua_value)
+    end
+    return p.encoded_value
+end
...
-                parts[`#parts` + 1] = encode_string(k) .. ":" .. patch.encoded_value
+                parts[`#parts` + 1] = encode_string(k) .. ":" .. patch_encoded_value(patch)
...
-            parts[`#parts` + 1] = encode_string(p.key) .. ":" .. p.encoded_value
+            parts[`#parts` + 1] = encode_string(p.key) .. ":" .. patch_encoded_value(p)
...
-                value = p.encoded_value,
+                value = patch_encoded_value(p),

Also applies to: 610-611, 629-629, 653-657

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@lua/qjson/table.lua` around lines 351 - 366, The code stores a one-time
snapshot in p.encoded_value which becomes stale if p.lua_value (a table) is
mutated later; instead stop caching encoded bytes: remove p.encoded_value from
patches and always call encode(p.lua_value) at emit time (replace reads of
p.encoded_value with encode(p.lua_value)). Update the creation/update site
(where patches[`#patches`+1] and the loop that sets p.encoded_value and
p.lua_value) to only store lua_value = v, and modify all emit sites that
reference p.encoded_value (including the places noted around the other ranges)
to re-encode via encode(p.lua_value) so mutations are serialized correctly.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants