perf: skip redundant validation in eager decode, inline depth check#51
Conversation
|
Warning Rate limit exceeded
You’ve run out of usage credits. Purchase more in the billing tab. ⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (4)
📝 WalkthroughWalkthroughThis PR refactors validation control in the qjson parser to make structural and value validation conditional based on eager mode, enforces depth limits in eager validation, updates FFI exports to propagate the validation flag, and refreshes benchmark documentation with new hardware results and simdjson comparisons removed. ChangesEager validation control and benchmarks
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes 🚥 Pre-merge checks | ✅ 5 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Pull request overview
This PR optimizes the eager decode path by avoiding redundant value validation during typed access and by inlining the max-depth check into eager validation, aiming to improve throughput on medium-to-large JSON payloads.
Changes:
- Add an
eager_validatedflag toDocumentand propagate it through FFI getters to skip re-validating strings/numbers during decode. - Inline
max_depthenforcement intovalidate_eager_valuesfor eager mode (lazy mode keeps the standalone depth validator). - Update benchmark tables/documentation to reflect new performance numbers and environment.
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| src/validate/mod.rs | Adds max_depth to eager validation and performs depth checking during container descent; updates validator tests accordingly. |
| src/ffi.rs | Passes d.eager_validated into string/number decode helpers at FFI access sites. |
| src/doc.rs | Introduces eager_validated on Document and adjusts eager vs. lazy validation flow. |
| src/decode/string.rs | Adds skip_validation to decode_string to optionally bypass validate_string_span. |
| src/decode/number.rs | Adds skip_validation to parse_i64/parse_f64 to optionally bypass validate_number. |
| README.md | Updates benchmark table and environment description (but still references simdjson in the intro text). |
| docs/benchmarks.md | Updates benchmark environment/results tables and narrative (but contains a now-inaccurate “n/a rows” note). |
Comments suppressed due to low confidence (1)
src/decode/number.rs:12
parse_i64now indexesbytes[0]without any guard whenskip_validationis true, soparse_i64(b"", true)will panic. Consider enforcing the invariant (e.g.,debug_assert!(!bytes.is_empty())/ documenting the precondition) or handling empty input explicitly even in the skip path to keep the function panic-free.
pub(crate) fn parse_i64(bytes: &[u8], skip_validation: bool) -> Result<i64, qjson_err> {
if !skip_validation {
crate::validate::validate_number(bytes)?;
}
// After ABNF validation, integer-only inputs have no `.`/`e`/`E`.
if bytes.iter().any(|&b| b == b'.' || b == b'e' || b == b'E') {
return Err(qjson_err::QJSON_TYPE_MISMATCH);
}
let (neg, rest) = match bytes[0] {
b'-' => (true, &bytes[1..]),
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| pub(crate) fn decode_string( | ||
| buf: &[u8], start: usize, end: usize, scratch: &mut Vec<u8>, | ||
| skip_validation: bool, | ||
| ) -> Result<(*const u8, usize), qjson_err> { | ||
| let slice = &buf[start..end]; | ||
| crate::validate::validate_string_span(slice)?; | ||
| if !skip_validation { | ||
| crate::validate::validate_string_span(slice)?; | ||
| } |
| `qjson` vs. `lua-cjson` and `lua-resty-simdjson` on multimodal | ||
| chat-completion payloads, "parse + access model, temperature, and all | ||
| messages[*].content paths" workload (median ops/s under OpenResty LuaJIT 2.1, | ||
| Intel Core i5-9400; 5 rounds, deterministic payload): | ||
|
|
||
| | Size | cjson | simdjson | `qjson.parse` | `qjson.decode + access content` | speedup vs. cjson | | ||
| |---:|---:|---:|---:|---:|---:| | ||
| | 2 KB | 106,646 | 137,427 | 135,296 | 97,574 | 1.3× / 0.9× | | ||
| | 100 KB | 6,045 | 46,577 | 137,931 | 134,590 | 22.8× / 22.3× | | ||
| | 1 MB | 594 | 4,408 | 16,447 | 16,340 | 27.7× / 27.5× | | ||
| | 10 MB | 59 | 356 | 1,035 | 1,028 | 17.5× / 17.4× | | ||
| AMD EPYC Rome (Zen 2, 4 vCPUs); 5 rounds, deterministic payload): | ||
|
|
||
| | Size | cjson | `qjson.parse` | `qjson.decode + access content` | speedup vs. cjson | | ||
| |---:|---:|---:|---:|---:| |
| (`lua-resty-simdjson` was not available on the benchmark host; rows are marked | ||
| "n/a" where it would appear.) |
| @@ -439,50 +443,50 @@ mod tests { | |||
| &b"[true,false,null]"[..], &b"\"hi\""[..], &b"42"[..], | |||
| &b"{\"a\":[1,{\"b\":2}]}"[..], | |||
| ] { | |||
| assert!(validate_eager_values(buf, &ix(buf)).is_ok(), | |||
| assert!(validate_eager_values(buf, &ix(buf), 1024).is_ok(), | |||
| "grammar should accept {:?}", buf); | |||
There was a problem hiding this comment.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/decode/number.rs (1)
3-24:⚠️ Potential issue | 🔴 Critical | ⚡ Quick win
skip_validationcurrently allows invalidi64parsing and panic paths.On Line 11,
bytes[0]can panic whenskip_validationis true and input is empty. Also, non-number scalars can be accepted as integers in this mode (e.g.,b"true"), which is a correctness break.Proposed fix
pub(crate) fn parse_i64(bytes: &[u8], skip_validation: bool) -> Result<i64, qjson_err> { if !skip_validation { crate::validate::validate_number(bytes)?; + } else { + if bytes.is_empty() { + return Err(qjson_err::QJSON_INVALID_NUMBER); + } + let rest = if bytes[0] == b'-' { &bytes[1..] } else { bytes }; + if rest.is_empty() || rest.iter().any(|b| !b.is_ascii_digit()) { + return Err(qjson_err::QJSON_INVALID_NUMBER); + } } // After ABNF validation, integer-only inputs have no `.`/`e`/`E`. if bytes.iter().any(|&b| b == b'.' || b == b'e' || b == b'E') { return Err(qjson_err::QJSON_TYPE_MISMATCH); }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/decode/number.rs` around lines 3 - 24, parse_i64 currently can panic on bytes[0] when skip_validation is true and accepts non-numeric scalars (e.g., "true"); fix by adding minimal input checks when skip_validation==true: first return Err on empty bytes, then determine sign safely (check leading '-' only if bytes.len()>0), ensure the remaining slice is non-empty and contains only ASCII digits and no '.'/'e'/'E' (reject any other chars), then proceed with the existing checked_mul/checked_add logic; update parse_i64 to validate these conditions (using the same byte-slice checks as the validated path) so bytes[0] is never accessed unsafely and non-numeric tokens are rejected.
🧹 Nitpick comments (2)
src/decode/number.rs (1)
45-79: ⚡ Quick winAdd tests for the new
skip_validation=truebranch.Current tests only cover
skip_validation=false, so the new fast path is unguarded. Please add explicit cases for non-numeric input and empty input withskip_validation=true.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/decode/number.rs` around lines 45 - 79, Add tests exercising the new skip_validation=true fast path: extend the test module to call parse_i64 and parse_f64 with the skip_validation flag set to true and assert the correct error behavior for non-numeric and empty input (e.g., parse_i64(b"", true) and parse_i64(b"abc", true) should return QJSON_INVALID_NUMBER or the same errors you expect; similarly add parse_f64(b"", true) and parse_f64(b"hello", true) checks). Use the existing test names as templates (e.g., i64_rejects_empty and f64_rejects_garbage) but create distinct names indicating skip_validation_true to clearly cover the new branch for parse_i64 and parse_f64.src/validate/mod.rs (1)
431-492: ⚡ Quick winAdd eager-depth boundary tests for the new
max_depthcontract.Line 146 introduces eager depth enforcement, but current updates only test with
1024. Please add one “at limit” and one “over limit” case to lock the depth-counting behavior.Proposed test additions
+ #[test] + fn grammar_depth_at_limit_ok() { + let buf = b"[[1]]"; // depth 2 + assert!(validate_eager_values(buf, &ix(buf), 2).is_ok()); + } + + #[test] + fn grammar_depth_over_limit_rejected() { + let buf = b"[[1]]"; // depth 2 + assert_eq!( + validate_eager_values(buf, &ix(buf), 1), + Err(qjson_err::QJSON_NESTING_TOO_DEEP), + ); + }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/validate/mod.rs` around lines 431 - 492, Add two tests that assert eager-depth enforcement around the max_depth boundary by constructing a nested array/object buffer at the allowed depth and one at allowed+1; call validate_eager_values(buf, &ix(buf), 1024) (use the same 1024 used elsewhere) and assert Ok for the "at limit" buffer and Err(qjson_err::QJSON_PARSE_ERROR) for the "over limit" buffer. Place them in the same test module alongside the other grammar tests and name them clearly (e.g., grammar_accepts_at_max_depth and grammar_rejects_over_max_depth) so reviewers can find uses of validate_eager_values and ix easily. Ensure the nested buffers are generated so they produce exactly 1024 nesting levels and 1025 nesting levels respectively.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@docs/benchmarks.md`:
- Around line 5-6: Update the benchmarks wording to match the actual tables:
either add explicit "n/a" rows/columns for simdjson in the tables or change the
sentence that currently reads (`lua-resty-simdjson` was not available on the
benchmark host; rows are marked "n/a" where it would appear.) to say “simdjson
results are omitted for this run.”; modify the exact text in docs/benchmarks.md
to use the chosen phrasing or add the missing "n/a" entries so the prose and
tables are consistent.
In `@README.md`:
- Around line 101-104: The README sentence listing "qjson vs. lua-cjson and
lua-resty-simdjson" overstates the comparators because the table only shows
`qjson` and `lua-cjson`; update the phrasing in the README (the sentence
referencing qjson, lua-cjson, lua-resty-simdjson) to reflect that
`lua-resty-simdjson` was omitted for this run—for example, mention that simdjson
is optional or not included in this particular table so the text matches the
displayed metrics.
---
Outside diff comments:
In `@src/decode/number.rs`:
- Around line 3-24: parse_i64 currently can panic on bytes[0] when
skip_validation is true and accepts non-numeric scalars (e.g., "true"); fix by
adding minimal input checks when skip_validation==true: first return Err on
empty bytes, then determine sign safely (check leading '-' only if
bytes.len()>0), ensure the remaining slice is non-empty and contains only ASCII
digits and no '.'/'e'/'E' (reject any other chars), then proceed with the
existing checked_mul/checked_add logic; update parse_i64 to validate these
conditions (using the same byte-slice checks as the validated path) so bytes[0]
is never accessed unsafely and non-numeric tokens are rejected.
---
Nitpick comments:
In `@src/decode/number.rs`:
- Around line 45-79: Add tests exercising the new skip_validation=true fast
path: extend the test module to call parse_i64 and parse_f64 with the
skip_validation flag set to true and assert the correct error behavior for
non-numeric and empty input (e.g., parse_i64(b"", true) and parse_i64(b"abc",
true) should return QJSON_INVALID_NUMBER or the same errors you expect;
similarly add parse_f64(b"", true) and parse_f64(b"hello", true) checks). Use
the existing test names as templates (e.g., i64_rejects_empty and
f64_rejects_garbage) but create distinct names indicating skip_validation_true
to clearly cover the new branch for parse_i64 and parse_f64.
In `@src/validate/mod.rs`:
- Around line 431-492: Add two tests that assert eager-depth enforcement around
the max_depth boundary by constructing a nested array/object buffer at the
allowed depth and one at allowed+1; call validate_eager_values(buf, &ix(buf),
1024) (use the same 1024 used elsewhere) and assert Ok for the "at limit" buffer
and Err(qjson_err::QJSON_PARSE_ERROR) for the "over limit" buffer. Place them in
the same test module alongside the other grammar tests and name them clearly
(e.g., grammar_accepts_at_max_depth and grammar_rejects_over_max_depth) so
reviewers can find uses of validate_eager_values and ix easily. Ensure the
nested buffers are generated so they produce exactly 1024 nesting levels and
1025 nesting levels respectively.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 4f275ab5-00c2-4598-ab4e-f7d19e3957b0
📒 Files selected for processing (7)
README.mddocs/benchmarks.mdsrc/decode/number.rssrc/decode/string.rssrc/doc.rssrc/ffi.rssrc/validate/mod.rs
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 7 out of 7 changed files in this pull request and generated 1 comment.
Comments suppressed due to low confidence (2)
src/decode/number.rs:18
parse_i64now skipsvalidate_numberwhenskip_validationis true, but it still unconditionally indexesbytes[0]and subtractsc - b'0'. Ifqjson_get_i64/qjson_cursor_get_i64is called on a non-number (e.g., string/object/array),scalar_bytescan yield an empty slice or non-digit bytes; withskip_validation=truethis can panic in debug builds (empty slice / u8 underflow) or compute garbage. Add a fast guard whenskip_validationis true (at minimum handlebytes.is_empty()and non-digit bytes) or ensure callers only passskip_validation=truefor known-valid number spans.
pub(crate) fn parse_i64(bytes: &[u8], skip_validation: bool) -> Result<i64, qjson_err> {
if !skip_validation {
crate::validate::validate_number(bytes)?;
}
// After ABNF validation, integer-only inputs have no `.`/`e`/`E`.
if bytes.iter().any(|&b| b == b'.' || b == b'e' || b == b'E') {
return Err(qjson_err::QJSON_TYPE_MISMATCH);
}
let (neg, rest) = match bytes[0] {
b'-' => (true, &bytes[1..]),
_ => (false, bytes),
};
// ABNF guarantees `rest` is non-empty and digit-only here.
let mut v: i64 = 0;
for &c in rest {
let d = (c - b'0') as i64;
src/decode/number.rs:38
- With
skip_validation=true,parse_f64can returnQJSON_DECODE_FAILEDfor non-number inputs (e.g.,"true"/null/empty slice) because it bypassesvalidate_numberand relies onstr::parse. This makes FFI error codes mode-dependent (eager may return DECODE_FAILED where lazy returned INVALID_NUMBER). Consider adding a cheap precheck when skipping validation (empty slice / leading byte not-/digit) and/or mappingparsefailures back toQJSON_INVALID_NUMBERto preserve existing semantics on type-mismatch calls.
pub(crate) fn parse_f64(bytes: &[u8], skip_validation: bool) -> Result<f64, qjson_err> {
if !skip_validation {
crate::validate::validate_number(bytes)?;
}
let s = std::str::from_utf8(bytes).map_err(|_| qjson_err::QJSON_DECODE_FAILED)?;
match s.parse::<f64>() {
Ok(v) if v.is_finite() => Ok(v),
Ok(_) => Err(qjson_err::QJSON_NUMBER_OUT_OF_RANGE),
Err(_) => Err(qjson_err::QJSON_DECODE_FAILED),
}
| // Transition parent to AfterValue ahead of the | ||
| // descent; the inner container's close pops back. | ||
| *cur = parent_after_value(*cur); | ||
| if stack.len() > max_depth as usize { | ||
| return Err(qjson_err::QJSON_NESTING_TOO_DEEP); | ||
| } | ||
| stack.push(if b == b'{' { | ||
| CtxKind::ObjAfterOpen |
Summary
Two changes that eliminate redundant work in the eager decode path, yielding 13-53% throughput improvement on medium-large payloads.
P0-1: Skip redundant validation in decode phase
Documentgets aneager_validated: boolflag set totrueafter eager parse.decode_string,parse_i64, andparse_f64now accept askip_validationparameter that bypassesvalidate_string_span/validate_numberwhen the document has already been eagerly validated — eliminating the largest source of redundant work (every string/number was validated twice: once during the eager validation pass and again during decode).P0-2: Inline max_depth check into validate_eager_values
The eager path no longer calls a standalone
validate_depthpass over indices. The depth check is now performed inline duringvalidate_eager_valueswhen pushing a new container context onto the state machine stack. Lazy mode keeps the separatevalidate_depthcall unchanged.Benchmark (AMD EPYC Rome, Zen 2)
qjson.parse(before→after)qjson.decode+content(before→after)Speedup vs. cjson at 200 KB: 34× for
qjson.parse, 4.7× vs. simdjson.Full size ladder (throughput, memory delta, speedup vs. both baselines) in updated
docs/benchmarks.md.Files changed
src/doc.rs— addeager_validatedflag, restructure validation flowsrc/decode/string.rs— addskip_validationparametersrc/decode/number.rs— addskip_validationparameter to both functionssrc/ffi.rs— passd.eager_validatedat all 7 call sitessrc/validate/mod.rs— inline depth check intovalidate_eager_valuesREADME.md,docs/benchmarks.md— updated benchmark tables