Initial v1: Rust JSON decoder for LuaJIT FFI#1

Merged: membphis merged 30 commits into master from worktree-rust-quick-json-decode-v1 on May 15, 2026
@membphis
Collaborator

Summary

  • Rust cdylib JSON decoder (libquickdecode.so) exposed to LuaJIT via FFI. Optimized for "parse once, extract a few fields, discard" workloads — skips the full Lua-table construction that lua-cjson has to pay.
  • Two-phase architecture: Phase 1 runs a single SIMD structural scan (scalar fallback + AVX2 with PCLMUL, runtime-dispatched) and records byte offsets of structural chars; Phase 2 lazily resolves paths and decodes values, with a per-container sibling-skip cache for shared-prefix access.
  • Ships the C header (include/lua_quick_decode.h), a LuaJIT wrapper module (lua/quickdecode.lua, with a Doc + Cursor OO API), busted spec files, and a benchmark script that compares against lua-cjson.
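
To make the Phase 1 idea concrete, here is a minimal scalar sketch of a structural scan that records byte offsets of structural characters outside string literals. It is an illustration only: the function name `scan_structural`, the choice to record quote positions, and the `Err(buf.len())` convention for an unterminated string are assumptions, not the crate's actual code.

```rust
/// Hypothetical scalar Phase 1 sketch (not the crate's implementation).
/// Walks the buffer once and records the offsets of `{ } [ ] : ,` and of
/// unescaped quotes, skipping everything inside string literals.
/// Returns Err(buf.len()) if the input ends inside a string (assumed convention).
fn scan_structural(buf: &[u8]) -> Result<Vec<u32>, usize> {
    let mut indices = Vec::new();
    let mut in_string = false;
    let mut escaped = false;

    for (i, &b) in buf.iter().enumerate() {
        if in_string {
            if escaped {
                // The byte after a backslash never terminates the string.
                escaped = false;
            } else if b == b'\\' {
                escaped = true;
            } else if b == b'"' {
                in_string = false;
                indices.push(i as u32);
            }
            continue;
        }
        match b {
            b'"' => {
                in_string = true;
                indices.push(i as u32);
            }
            b'{' | b'}' | b'[' | b']' | b':' | b',' => indices.push(i as u32),
            _ => {}
        }
    }

    if in_string {
        Err(buf.len())
    } else {
        Ok(indices)
    }
}
```

A Phase 2 reader could then walk this offset list to skip whole siblings without re-reading the bytes between them, which is where the "extract a few fields, discard" saving would come from.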

Design spec: docs/superpowers/specs/2026-05-15-rust-quick-json-decode-design.md
Implementation plan: docs/superpowers/plans/2026-05-15-rust-quick-json-decode.md

Test plan

  • cargo test --release — 95 tests passing (65 unit + 29 integration + 1 proptest with 2000 cases)
  • cargo test --features test-panic --release — verifies the catch_unwind panic barrier
  • proptest cross-check: Scalar vs AVX2 produce bit-identical output across 2000 random inputs
  • AVX2 tail-bug repro (193-byte [{}{}...,]) confirmed fixed
  • Lua side: busted tests/lua --lpath='./lua/?.lua' --cpath='./target/release/lib?.so' (requires LuaJIT + busted; not run in CI here)
  • luajit benches/lua_bench.lua against lua-cjson for end-to-end timing/allocation comparison
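
The panic barrier mentioned in the test plan can be sketched as follows. This is a hedged illustration of the general `catch_unwind` pattern for C ABI entry points; the status codes, `ffi_guard`, and `demo_first_byte` are hypothetical names and do not come from the crate. A real export would also carry a no-mangle attribute, omitted here.

```rust
use std::panic::{catch_unwind, AssertUnwindSafe};

// Hypothetical status codes, chosen for this sketch only.
pub const DEMO_OK: i32 = 0;
pub const DEMO_ERR_NULL: i32 = -1;
pub const DEMO_ERR_PANIC: i32 = -2;

/// Runs `body` and converts any panic into an error code so that
/// unwinding never crosses the C ABI boundary into LuaJIT.
fn ffi_guard<F: FnOnce() -> i32>(body: F) -> i32 {
    catch_unwind(AssertUnwindSafe(body)).unwrap_or(DEMO_ERR_PANIC)
}

/// Hypothetical entry point: writes the first byte of the input to `out`.
/// Indexing an empty slice panics, which the guard turns into DEMO_ERR_PANIC.
pub extern "C" fn demo_first_byte(ptr: *const u8, len: usize, out: *mut u8) -> i32 {
    ffi_guard(|| {
        if ptr.is_null() || out.is_null() {
            return DEMO_ERR_NULL;
        }
        // SAFETY: the caller promises `ptr..ptr+len` is readable and `out` is writable.
        let buf = unsafe { std::slice::from_raw_parts(ptr, len) };
        let first = buf[0]; // panics when len == 0
        unsafe { *out = first };
        DEMO_OK
    })
}
```

Note that `catch_unwind` only helps when the crate is built with the default unwinding panic strategy; with `panic = "abort"` the process terminates instead.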

Roadmap (deferred items captured in README.md)

ARM64 NEON backend, SmallVec fast path, SIMD backslash search, lexical float parser, lossless 64-bit int cdata mode, skip-cache LRU, Phase 1 error position, AVX2 tail-bypass optimization.

membphis added 30 commits May 15, 2026 12:07
Captures the two-phase architecture (SIMD structural scan + lazy
field decode), C ABI shape for LuaJIT FFI, sibling-skip cache for
shared-prefix path access, and benchmark targets vs lua-cjson.

Also seeds README Roadmap with NEON backend deferred to a later
iteration.

18 tasks covering: project scaffold, ScalarScanner with shallow JSON
validation, Document + Phase 1 FFI, zero-alloc PathIter, Cursor with
lazy sibling-skip cache, string escape decode, number decode, typed
getters and cursor C ABI, panic safety, AVX2 scanner staged through
four sub-tasks (structural mask, escape, PCLMUL inside-string,
multi-chunk carry + dispatch + proptest cross-check), and LuaJIT
wrapper with busted tests and lua-cjson benchmark.

Two jobs:
- rust: cargo build --release, cargo test --release, and cargo test
  --features test-panic --release on ubuntu-latest with stable Rust.
- lua: depends on rust passing; installs LuaJIT 2.1 + LuaRocks via
  leafo/gh-actions-{lua,luarocks}, then busted and lua-cjson, and
  runs busted tests/lua against the built libquickdecode.so.

Triggers on push to master/main and on all pull requests.

leafo/gh-actions-lua@v10 was failing with 404 when downloading
luajit-2.1.0-beta3 (the LuaJIT project removed that tag). Install
LuaJIT, lua5.1 dev headers, and luarocks from Ubuntu's apt instead;
LuaJIT is ABI-compatible with Lua 5.1 so rocks built against 5.1
headers load fine under luajit. busted runs the tests via
'--lua=$(which luajit)'.
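
The "PCLMUL inside-string" and "multi-chunk carry" steps named in the implementation-plan commit above rely on a prefix XOR over the quote bitmask: a carry-less multiply by an all-ones operand computes it in one instruction. The sketch below shows a portable shift-and-XOR equivalent, assuming 64-byte chunks and a bitmask of unescaped quotes as input. The function names and the carry representation (0 or all ones) are assumptions for illustration, not the crate's code.

```rust
/// Prefix XOR: bit i of the result is the XOR of input bits 0..=i.
/// This is the portable equivalent of a carry-less multiply by all ones.
fn prefix_xor(mut x: u64) -> u64 {
    x ^= x << 1;
    x ^= x << 2;
    x ^= x << 4;
    x ^= x << 8;
    x ^= x << 16;
    x ^= x << 32;
    x
}

/// Hypothetical per-chunk step. `quotes` has one bit per unescaped quote in a
/// 64-byte chunk; `carry_in` is 0 if the chunk starts outside a string and
/// all ones if it starts inside one. Returns (inside-string mask, carry for
/// the next chunk). The opening quote's bit is set in the mask and the
/// closing quote's bit is clear.
fn inside_string_mask(quotes: u64, carry_in: u64) -> (u64, u64) {
    let in_string = prefix_xor(quotes) ^ carry_in;
    // Arithmetic shift broadcasts the top bit: all ones if the chunk ends inside a string.
    let carry_out = ((in_string as i64) >> 63) as u64;
    (in_string, carry_out)
}
```

Structural characters would then be kept only where the inside-string mask is clear, for example `structural_bits & !in_string`.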
@membphis membphis merged commit 63d3e00 into master May 15, 2026
2 checks passed
@membphis membphis deleted the worktree-rust-quick-json-decode-v1 branch May 15, 2026 14:45
membphis added a commit that referenced this pull request May 15, 2026
* chore: address PR #3 review hygiene items

Five Important findings + four small Minor follow-ups from the PR #3
local code review:

Important
- tests/scanner_crosscheck.rs — drop the stale "AVX2 does not validate
  brackets" comment and tighten the proptest to require full Result
  equality (Ok/Err verdict + error offset) and indices equality on every
  case, not just on Ok. After the tail-bypass fix scalar and AVX2 run the
  same scan_emit_resume + validate_brackets pipeline, so this is now
  enforceable. Still passes the 2000-case proptest.
- benches/lua_bench.lua — switch image-size RNG from math.random (which
  delegates to libc rand() and varies across machines) to a deterministic
  Park-Miller LCG. Same target_bytes now produces byte-identical output
  on any LuaJIT 2.1 host.
- benches/lua_bench.lua — tighten the loop so the actual payload size
  matches its label. Cap the per-iteration `upper` at `remaining` and
  allow the last image to shrink below the 50 KB floor when fewer bytes
  remain. Observed: every scenario now lands within ~0.1% of its label
  (100k -> 102351 bytes, 1m -> 1048527, 10m -> 10485711) vs up to +49%
  before.
- src/scan/avx2.rs — remove the dead `else if in_string != 0` branch
  inside the tail handler. `i < buf.len()` makes the `scalar_start <=
  buf.len()` check trivially true, and scan_emit_resume already returns
  Err(buf.len()) when start == buf.len() and in_string is set. Replace
  the unreachable branch with a comment that documents the invariant.
- src/scan/mod.rs — drop the inaccurate "the check is defensive" wording
  from validate_brackets. The function is correctness-coupled with the
  scanner that produced its index list; a forged quote would flip
  in_string and mask later mismatches.
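
For readers unfamiliar with the Park-Miller generator mentioned for the benchmark above, here is a minimal sketch. The benchmark itself is a Lua script; this is written in Rust only to match the rest of the crate. The multiplier 16807 is the classic "minimal standard" choice and is an assumption here, since the commit message does not say which constants the script uses; the type and method names are likewise hypothetical.

```rust
/// Park-Miller "minimal standard" LCG: state' = state * 16807 mod (2^31 - 1).
/// The constants are the classic ones and are assumed, not taken from the benchmark.
struct ParkMiller {
    state: u32,
}

impl ParkMiller {
    const A: u64 = 16_807;
    const M: u64 = 2_147_483_647; // 2^31 - 1, a Mersenne prime

    /// The state must stay in 1..M, so a seed that reduces to 0 is mapped to 1.
    fn new(seed: u32) -> Self {
        let s = (seed as u64 % Self::M) as u32;
        ParkMiller { state: if s == 0 { 1 } else { s } }
    }

    /// Advances the generator and returns a value in 1..M.
    fn next_u31(&mut self) -> u32 {
        // The 64-bit intermediate avoids overflow in the multiplication.
        self.state = ((self.state as u64 * Self::A) % Self::M) as u32;
        self.state
    }

    /// Value in lo..=hi (slightly biased by the modulo, which is fine for sizing payloads).
    fn next_in(&mut self, lo: u32, hi: u32) -> u32 {
        lo + self.next_u31() % (hi - lo + 1)
    }
}
```

Because the sequence depends only on the seed and integer arithmetic, two hosts that start from the same seed draw the same image sizes, which is the property the benchmark change is after.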

Minor
- Makefile — match the spec's "target — description" help format (em-dash
  separator) and tighten the awk FS pattern from `:.*## ` to `:[^#]*## `
  so descriptions containing `##` aren't truncated by the greedy `.*`.
- benches/lua_bench.lua — bump 2m / 5m / 10m iters from 10 to 20 so
  bigger-payload measurements ride out one-shot allocator / page-fault
  noise.
- src/scan/avx2.rs — rename `escaped_quotes_do_not_trip_fastpath` to
  `escaped_quotes_remain_correct_with_fastpath`. The test asserts parity
  with scalar, not that the branch was taken (we have no counter to
  observe that), so the name should reflect what's actually checked.

cargo test --release: 70+3+10+1+5+3+12+1+1 = 106 unit/integration tests
plus the 2000-case proptest, all pass. cargo test --release
--no-default-features (scalar-only build): 60+3+10+1+5+3+12+1+1 = 96
tests, all pass.

* docs: align review-followup comments with actual behavior

Address the Important #1 + three Minor findings from the local review
of PR #4 (the docs drifted from the code introduced in the same PR):

- benches/lua_bench.lua: rewrite the size-accuracy comment. The prior
  draft referenced a `remaining + slack` upper-bound expression that
  the final code does not use; replace with the two-branch behaviour
  actually implemented (normal `min(500K, remaining)`; final image
  falls through to `max(1024, remaining)`) and the real worst-case
  overshoot (~1 KB, not "~10 KB").
- Makefile: explain the `:[^#]*## ` FS choice and its tradeoff (targets
  with `#` in the prerequisite list won't render — none today).
- tests/scanner_crosscheck.rs: clarify that the indices assertion holds
  for both Ok and Err cases because scan_emit_resume always completes
  emission before any potential Err, and validate_brackets does not
  modify the index list.
- src/scan/avx2.rs: spell out the scalar_start invariant including the
  exact boundary case (scalar_start == buf.len() when i == buf.len()-1
  && in_string != 0 && bs_carry != 0) that scan_emit_resume's post-loop
  in_str check covers.

No code changes. All tests still pass.
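
As an illustration of the bracket validation discussed in these commits, the sketch below checks nesting using only the structural index list from Phase 1. It is a hypothetical reconstruction: the signature, the error offsets, and the way `in_string` is tracked are assumptions, and only the function name and the fact that it reads (but never modifies) the index list come from the commit messages.

```rust
/// Hypothetical sketch of bracket validation over a structural index list.
/// `indices` must come from a scan of `buf`: the function trusts that every
/// offset is in bounds and that quote offsets are real string delimiters.
/// Returns Err(offset) at the first mismatched closer, or Err(buf.len())
/// if containers are left open (assumed convention).
fn validate_brackets(buf: &[u8], indices: &[u32]) -> Result<(), usize> {
    let mut stack: Vec<u8> = Vec::new();
    let mut in_string = false;

    for &idx in indices {
        let pos = idx as usize;
        let b = buf[pos];
        if b == b'"' {
            // Each recorded quote toggles string state; a bogus quote entry
            // would therefore hide whatever follows it.
            in_string = !in_string;
            continue;
        }
        if in_string {
            continue;
        }
        match b {
            b'{' | b'[' => stack.push(b),
            b'}' | b']' => {
                let want = if b == b'}' { b'{' } else { b'[' };
                if stack.pop() != Some(want) {
                    return Err(pos);
                }
            }
            _ => {} // ':' and ',' do not affect nesting
        }
    }

    if stack.is_empty() {
        Ok(())
    } else {
        Err(buf.len())
    }
}
```

The index list is taken by shared reference, so a cross-check test can compare the scalar and AVX2 index lists after validation regardless of whether validation returned Ok or Err.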