Motivation
For a JSON parser library, correctness and stability are fundamental requirements. Before wider adoption, we should systematically audit our validation coverage against:
- RFC 8259 (The JSON Data Interchange Format) - the authoritative specification
- JSONTestSuite (https://github.com/nst/JSONTestSuite) - the de facto industry standard for parser validation, containing 300+ edge cases
- Peer implementations - lua-cjson and lua-resty-simdjson both have mature test suites we can learn from
This audit will identify gaps in our validation logic and help prioritize fixes before they become breaking changes.
Current State
What We Validate Well
| Category | Status | Location |
|---|---|---|
| Bracket/brace pairing | ✅ Complete | scan/scalar.rs, scan/mod.rs::validate_brackets |
| String escape sequences | ✅ Complete | decode/string.rs - all 8 escapes + \uXXXX + surrogate pairs |
| Numeric parsing (i64/f64) | ✅ Complete | decode/number.rs - overflow detection, type mismatch |
| Path resolution | ✅ Complete | path.rs - keys, indices, nesting |
| Type detection | ✅ Complete | doc.rs::type_of |
| SIMD/scalar parity | ✅ Complete | scanner_crosscheck.rs - proptest with 2000 cases |
| FFI safety | ✅ Complete | ffi.rs - panic barrier, null pointer checks |
What We Don't Validate (Potential Gaps)
| Category | Current Behavior | RFC 8259 Requirement | Risk |
|---|---|---|---|
| Leading zeros in numbers | Accepted (007 -> 7) | MUST reject | Medium |
| Leading plus sign | Accepted (+1 -> 1) | MUST reject | Medium |
| Bare decimal point | Accepted (.5, 1.) | MUST reject | Medium |
| Max nesting depth | Unlimited | Implementation-defined | Medium (stack overflow) |
| Control chars in strings | Accepted (0x00-0x1F) | MUST be escaped | Low |
| Invalid UTF-8 sequences | Passed through | MUST be valid UTF-8 | Low |
| Trailing content after root | Ignored | Should be rejected | Low |
| UTF-8 BOM | Not handled | Implementation-defined | Low |
| Duplicate object keys | Last wins (implicit) | Implementation-defined | Low |
Test Coverage Comparison
lua-cjson tests:
- RFC 4627 example files
- Configurable nesting depth limits (lua-cjson's documented default is 1000)
- Invalid number detection (hex, leading zeros, Inf/NaN)
- Locale handling (comma decimal separators)
- Comment support (single/multi-line)
lua-resty-simdjson tests:
- Deep nesting (10 levels)
- Large payloads (2100+ elements)
- Reentrancy behavior
- Numeric precision (14-16 digits)
- Null compatibility (ngx.null, cjson.null)
JSONTestSuite categories:
- y_* - roughly 100 cases parsers MUST accept
- n_* - roughly 200 cases parsers MUST reject
- i_* - a few dozen implementation-defined edge cases
Spec
Phase 1: RFC 8259 Compliance Test Suite (Next Step)
Goal: Build a comprehensive test suite based on RFC 8259, referencing lua-cjson's test approach.
Reference: https://github.com/openresty/lua-cjson/tree/master/tests
Test Categories to Cover:
1.1 Valid JSON (MUST accept)
// Primitive values
"null"
"true"
"false"
"0"
"-0"
"123"
"-456"
"3.14"
"-2.718"
"1e10"
"1E10"
"1e+10"
"1e-10"
"1.5e2"
"\"\"" // empty string
"\"hello\""
"\"hello\\nworld\"" // escaped newline
"\"\\u0041\"" // unicode escape -> "A"
"\"\\uD83D\\uDE00\"" // surrogate pair -> emoji
"[]" // empty array
"[1,2,3]"
"[1, 2, 3]" // with whitespace
"{}" // empty object
"{\"a\":1}"
"{\"a\": 1, \"b\": 2}" // with whitespace
"[{\"a\":[1,{\"b\":2}]}]" // nested structures
1.2 Invalid JSON (MUST reject)
// Structural errors
"" // empty input
"{" // unclosed brace
"[" // unclosed bracket
"{]" // mismatched brackets
"[}" // mismatched brackets
"{\"a\":}" // missing value
"{\"a\"}" // missing colon and value
"[,]" // leading comma
"[1,]" // trailing comma
"{\"a\":1,}" // trailing comma in object
"[1 2]" // missing comma
// Invalid numbers
"+1" // leading plus
"01" // leading zero
"00" // leading zeros
".5" // no integer part
"1." // no fraction part
"1.e5" // no fraction digits
"0x1F" // hex notation
"NaN" // not a JSON value
"Infinity" // not a JSON value
"-Infinity" // not a JSON value
"1e" // incomplete exponent
"1e+" // incomplete exponent
// Invalid strings
"\"hello" // unclosed string
"'hello'" // single quotes
"\"\\x41\"" // invalid escape sequence
"\"\\u00G0\"" // invalid hex in unicode
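"\"\\uD800\"" // lone high surrogate (matches the RFC 8259 grammar; section 8.2 calls the behavior unpredictable, so rejecting is our policy choice)
"\"\\uDC00\"" // lone low surrogate (same caveat; JSONTestSuite files these under i_*)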
// Invalid literals
"TRUE" // wrong case
"False" // wrong case
"NULL" // wrong case
"nil" // not JSON
"undefined" // not JSON
// Trailing content
"{}[]" // multiple values
"1 2" // multiple values
"true false" // multiple values
1.3 Whitespace Handling
// Valid whitespace (space, tab, newline, carriage return)
" { } "
"\t{\t}\t"
"\n{\n}\n"
"\r{\r}\r"
"{ \"a\" : 1 }"
"[\n 1,\n 2\n]"
1.4 String Edge Cases
// All valid escape sequences
"\"\\\"\"" // \"
"\"\\\\\"" // \\
"\"\\/\"" // \/
"\"\\b\"" // \b (backspace)
"\"\\f\"" // \f (form feed)
"\"\\n\"" // \n (newline)
"\"\\r\"" // \r (carriage return)
"\"\\t\"" // \t (tab)
// Unicode edge cases
"\"\\u0000\"" // U+0000 (NUL)
"\"\\u007F\"" // U+007F (DEL)
"\"\\u0080\"" // U+0080 (first non-ASCII)
"\"\\uFFFF\"" // U+FFFF (BMP limit)
"\"\\uD834\\uDD1E\"" // U+1D11E (surrogate pair)
1.5 Number Edge Cases
// Valid numbers
"0"
"-0"
"1"
"-1"
"123456789"
"1.0"
"1.5"
"-1.5"
"1e1"
"1E1"
"1e+1"
"1e-1"
"1.5e10"
"-1.5e-10"
"1E100" // large exponent
"1e-100" // small exponent
"9223372036854775807" // i64::MAX
"-9223372036854775808" // i64::MIN
// Invalid numbers (to be rejected)
"+1"
"01"
"1."
".1"
"1e"
"1e+"
"1e-"
"Infinity"
"-Infinity"
"NaN"
Implementation:
- Create tests/rfc8259_compliance.rs
- Organize tests by category with clear documentation
- Each test should reference the relevant RFC section
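One possible shape for the test file is a table of cases driven through a single helper. The sketch below is only an illustration: `Case`, `CASES`, and `mismatches` are hypothetical names, the case list is a small excerpt of the categories above, and the parser is passed in as a closure because the crate's real entry point is not shown in this issue.

```rust
/// One compliance case. `rfc_section` cites the RFC 8259 section that decides it.
/// (Hypothetical type, for illustration only.)
struct Case {
    input: &'static str,
    must_accept: bool,
    rfc_section: &'static str,
}

/// A small excerpt of the categories listed above, not the full suite.
const CASES: &[Case] = &[
    Case { input: "null", must_accept: true, rfc_section: "3" },
    Case { input: "-0", must_accept: true, rfc_section: "6" },
    Case { input: "\"\\u0041\"", must_accept: true, rfc_section: "7" },
    Case { input: "+1", must_accept: false, rfc_section: "6" },
    Case { input: "[1,]", must_accept: false, rfc_section: "5" },
    Case { input: "TRUE", must_accept: false, rfc_section: "3" },
];

/// Runs every case through `parse_ok` (true = parser accepted the input) and
/// returns (rfc_section, input) for each case where the verdict is wrong.
fn mismatches<F>(parse_ok: F, cases: &[Case]) -> Vec<(&'static str, &'static str)>
where
    F: Fn(&[u8]) -> bool,
{
    cases
        .iter()
        .filter(|c| parse_ok(c.input.as_bytes()) != c.must_accept)
        .map(|c| (c.rfc_section, c.input))
        .collect()
}
```

In the real test file the closure would wrap the library's parse call, and a single `assert!(mismatches(...).is_empty())` would report every failing input together with its RFC section.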
Acceptance Criteria:
Phase 2: Number Format Validation
Goal: Reject non-RFC-compliant number formats during the parser's Phase 2 (decode) pass.
Spec:
number = [ "-" ] int [ frac ] [ exp ]
int = "0" / ( digit1-9 *digit )
frac = "." 1*digit
exp = ("e" / "E") [ "+" / "-" ] 1*digit
Reject:
- Leading zeros: 007, 00.5
- Leading plus: +1, +0
- Bare decimal: .5, 1., -.5
- Hex notation: 0x1F
- Special values: NaN, Infinity, -Infinity
Implementation: Add validate_number_format() in decode/number.rs, called before parse_i64/parse_f64.
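The grammar above is small enough to check with a hand-written scanner. The following is a minimal sketch, assuming the function receives exactly the bytes of one number token and returns a plain `bool`. The real `validate_number_format()` may well return a `Result` with an error code instead, and `skip_digits` is a hypothetical helper.

```rust
/// Advances past a run of ASCII digits starting at `i`; returns the new index.
fn skip_digits(s: &[u8], mut i: usize) -> usize {
    while i < s.len() && s[i].is_ascii_digit() {
        i += 1;
    }
    i
}

/// Returns true only if `s` matches the RFC 8259 `number` production in full.
fn validate_number_format(s: &[u8]) -> bool {
    let mut i = 0;

    // [ "-" ]  (a leading "+" is never consumed, so it fails below)
    if s.first() == Some(&b'-') {
        i = 1;
    }

    // int = "0" / ( digit1-9 *digit )
    match s.get(i).copied() {
        Some(b'0') => i += 1, // a lone zero; any following digit is left over
        Some(b'1'..=b'9') => i = skip_digits(s, i),
        _ => return false, // covers "", "-", ".5", "NaN", "Infinity"
    }

    // frac = "." 1*digit
    if s.get(i).copied() == Some(b'.') {
        let end = skip_digits(s, i + 1);
        if end == i + 1 {
            return false; // "1." or "1.e5"
        }
        i = end;
    }

    // exp = ("e" / "E") [ "+" / "-" ] 1*digit
    if matches!(s.get(i).copied(), Some(b'e' | b'E')) {
        i += 1;
        if matches!(s.get(i).copied(), Some(b'+' | b'-')) {
            i += 1;
        }
        let end = skip_digits(s, i);
        if end == i {
            return false; // "1e", "1e+", "1e-"
        }
        i = end;
    }

    // Anything left over ("007", "0x1F") means the token is not a number.
    i == s.len()
}
```

Because the check only inspects bytes and never allocates, running it before parse_i64/parse_f64 should add little cost to the decode path.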
Phase 3: Nesting Depth Limit
Goal: Prevent stack overflow on maliciously deep input.
Spec:
- Default max depth: 128 (the same value as serde_json's default recursion limit; simdjson's own default is 1024)
- Configurable via Document::parse_with_options(buf, Options { max_depth: 512 })
- Error: QJD_NESTING_TOO_DEEP (new error code)
Implementation: Track depth in validate_brackets() or add a separate pass.
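As a rough illustration of the separate-pass option, the sketch below counts open containers while skipping string contents, so brackets inside strings do not affect the depth. `check_depth` and `NestingTooDeep` are hypothetical names; the real change would surface QJD_NESTING_TOO_DEEP and would more likely live inside validate_brackets().

```rust
/// Hypothetical error value standing in for the proposed QJD_NESTING_TOO_DEEP code.
#[derive(Debug, PartialEq)]
struct NestingTooDeep {
    /// Byte offset of the opener that exceeded the limit.
    offset: usize,
}

/// Scans `buf` once and returns the deepest nesting level seen, or an error
/// at the first `[` or `{` that would exceed `max_depth`.
fn check_depth(buf: &[u8], max_depth: usize) -> Result<usize, NestingTooDeep> {
    let mut depth = 0usize;
    let mut deepest = 0usize;
    let mut in_string = false;
    let mut escaped = false;

    for (offset, &b) in buf.iter().enumerate() {
        if in_string {
            // Inside a string: only track escapes and the closing quote.
            if escaped {
                escaped = false;
            } else if b == b'\\' {
                escaped = true;
            } else if b == b'"' {
                in_string = false;
            }
            continue;
        }
        match b {
            b'"' => in_string = true,
            b'[' | b'{' => {
                depth += 1;
                if depth > max_depth {
                    return Err(NestingTooDeep { offset });
                }
                deepest = deepest.max(depth);
            }
            // Pairing is validated elsewhere; saturate so stray closers cannot underflow.
            b']' | b'}' => depth = depth.saturating_sub(1),
            _ => {}
        }
    }
    Ok(deepest)
}
```

The loop uses a counter rather than recursion, so the check itself cannot overflow the stack regardless of input depth.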
Phase 4: String Content Validation
Goal: Reject unescaped control characters per RFC 8259.
Spec:
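- Reject bytes 0x00-0x1F inside strings unless escaped
- Accept byte 0x7F (DEL) unescaped: the RFC 8259 `unescaped` production covers %x20-21, %x23-5B, and %x5D-10FFFF, so DEL is legal and rejecting it would fail valid input (JSONTestSuite has a y_* case for it)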
Implementation: Add check in scan/scalar.rs during string scanning.
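A minimal sketch of the check, assuming it is handed the raw bytes between the opening and closing quotes. In raw JSON text an escape such as \n is a backslash followed by the letter n, both of which are 0x20 or above, so any byte below 0x20 found here is by definition unescaped. In line with the RFC 8259 `unescaped` production, the sketch lets 0x7F through. `find_unescaped_control` is a hypothetical name.

```rust
/// Returns the offset of the first unescaped control character (0x00-0x1F)
/// in the raw string body, or None if the body is clean.
fn find_unescaped_control(body: &[u8]) -> Option<usize> {
    // 0x7F (DEL) is deliberately not flagged: RFC 8259 allows it unescaped.
    body.iter().position(|&b| b < 0x20)
}
```

Returning the offset rather than a bare `bool` lets the caller report where in the input the offending byte sits.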
Phase 5: UTF-8 Validation
Goal: Reject invalid UTF-8 sequences in string values.
Spec:
- Validate UTF-8 during Phase 2 string decode
- Reject overlong encodings, surrogate halves, out-of-range codepoints
Implementation: Use std::str::from_utf8() or a dedicated validator in decode/string.rs.
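The standard-library route already covers the three rejection classes listed above. Here is a small sketch, with `validate_utf8` as a hypothetical wrapper name, that maps the error to the offset of the first invalid byte so it can be reported to the caller:

```rust
/// Validates `bytes` as UTF-8. On failure, returns the number of leading
/// bytes that were valid, i.e. the offset of the first invalid sequence.
fn validate_utf8(bytes: &[u8]) -> Result<&str, usize> {
    std::str::from_utf8(bytes).map_err(|e| e.valid_up_to())
}
```

A dedicated SIMD validator may be faster on large inputs, but `std::str::from_utf8()` is a reasonable baseline to cross-check it against.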
Phase 6: Trailing Content Detection
Goal: Reject input with non-whitespace after the root value.
Spec:
- After Phase 1 scan, verify that only whitespace follows the end of the root value (this must also hold for scalar roots, e.g. 1 2 and true false)
- Error: QJD_TRAILING_CONTENT (new error code)
Implementation: Check in Document::parse() after scan() returns.
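The check itself is a short scan over the tail of the buffer. This sketch assumes the scanner can report `root_end`, the offset just past the last byte of the root value. That parameter and the function name `trailing_content_offset` are hypothetical; the issue does not say what scan() currently returns.

```rust
/// Returns the offset of the first non-whitespace byte after the root value,
/// or None if only JSON whitespace (space, tab, LF, CR) follows it.
fn trailing_content_offset(buf: &[u8], root_end: usize) -> Option<usize> {
    // An out-of-range `root_end` is treated as "nothing follows".
    let tail = buf.get(root_end..).unwrap_or(&[]);
    tail.iter()
        .position(|&b| !matches!(b, b' ' | b'\t' | b'\n' | b'\r'))
        .map(|i| root_end + i)
}
```

Document::parse() could map a `Some(offset)` result to QJD_TRAILING_CONTENT and include the offset in the error for diagnostics.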
Phase 7: JSONTestSuite Integration
Goal: Automated regression testing against the industry standard.
Spec:
- Add tests/json_test_suite.rs
- Download/vendor JSONTestSuite test files
- Run all y_* files: assert parse succeeds
- Run all n_* files: assert parse fails
- Run i_* files: document our behavior (no assertion)
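The per-file rule can be kept separate from file I/O so it is easy to test on its own. A minimal sketch, with `Expectation`, `expectation_for`, and `check` as hypothetical names: the runner would read each vendored file, call the parser, and pass the outcome to `check`.

```rust
/// What JSONTestSuite's file-name prefix says about a test file.
#[derive(Debug, PartialEq, Clone, Copy)]
enum Expectation {
    MustAccept,            // y_*
    MustReject,            // n_*
    ImplementationDefined, // i_*
}

/// Classifies a test file by its two-character prefix; None for other files.
fn expectation_for(file_name: &str) -> Option<Expectation> {
    match file_name.get(..2)? {
        "y_" => Some(Expectation::MustAccept),
        "n_" => Some(Expectation::MustReject),
        "i_" => Some(Expectation::ImplementationDefined),
        _ => None,
    }
}

/// Compares the parser's outcome with the expectation. i_* files and
/// unrelated files never fail; their outcome is only recorded by the caller.
fn check(file_name: &str, parsed_ok: bool) -> Result<(), String> {
    match (expectation_for(file_name), parsed_ok) {
        (Some(Expectation::MustAccept), false) => {
            Err(format!("{}: rejected valid input", file_name))
        }
        (Some(Expectation::MustReject), true) => {
            Err(format!("{}: accepted invalid input", file_name))
        }
        _ => Ok(()),
    }
}
```

Collecting every `Err` before failing the test, rather than panicking on the first one, gives a full picture of the gaps in a single run.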
Non-Goals (Explicitly Out of Scope)
- BOM handling - Callers should strip BOM before passing to us
- Comment support - Not part of RFC 8259
- Trailing commas - Not part of RFC 8259
- Duplicate key policy - Current "last wins" behavior is acceptable
- Streaming/incremental parsing - Different API surface
Success Criteria
- All y_* tests from JSONTestSuite pass
- All n_* tests from JSONTestSuite fail with appropriate error
- Behavior documented for i_* edge cases