Context
Each qd.parse(payload) currently creates a fresh Document with a fresh indices: Vec<u32> reserved at buf.len() / 6. For a 10 MB input that's ~1.7 M u32 slots (~6.7 MB) reserved per call, which goes through the allocator's mmap path: ~10–50 μs overhead per parse, plus dealloc on drop.
simdjson's documentation explicitly warns against creating parser instances per call:
Do not create a parser instance for each run to avoid frequent memory allocations. Reuse the same parser instance and call the :decode method repeatedly.
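The difference between the two allocation patterns can be illustrated with a plain Vec<u32>. This is a minimal sketch, not qd's actual code: `fill_indices`, `parse_fresh`, and `parse_reused` are hypothetical names, and the "parse" only records the offsets of structural characters as a stand-in for the real index pass.

```rust
/// Hypothetical stand-in for the structural-index pass: record the offset
/// of every JSON structural character in `buf`.
fn fill_indices(buf: &[u8], indices: &mut Vec<u32>) {
    for (i, &b) in buf.iter().enumerate() {
        if matches!(b, b'{' | b'}' | b'[' | b']' | b':' | b',') {
            indices.push(i as u32);
        }
    }
}

/// Per-call pattern: a new Vec is reserved on every parse and freed on drop.
fn parse_fresh(buf: &[u8]) -> Vec<u32> {
    let mut indices = Vec::with_capacity(buf.len() / 6);
    fill_indices(buf, &mut indices);
    indices
}

/// Reuse pattern: clear() resets the length but keeps the allocation.
fn parse_reused(buf: &[u8], indices: &mut Vec<u32>) {
    indices.clear();
    fill_indices(buf, indices);
}

fn main() {
    let payload = br#"{"a":[1,2,3],"b":{"c":true}}"#;
    let fresh = parse_fresh(payload);

    let mut reused = Vec::with_capacity(payload.len() / 6);
    parse_reused(payload, &mut reused);
    let cap_after_first = reused.capacity();
    parse_reused(payload, &mut reused);

    // Same result either way; the reused buffer did not grow or shrink
    // between the first and second parse of the same payload.
    assert_eq!(fresh, reused);
    assert_eq!(reused.capacity(), cap_after_first);
    println!("{} indices, capacity {}", reused.len(), reused.capacity());
}
```

Once the reused buffer has grown to fit the largest payload seen so far, later parses of payloads of that size or smaller need no further allocation for it.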
Proposal
Add a new Lua API:
local decoder = qd.new_decoder() -- holds reusable internal buffers
local doc = decoder:parse(payload)
-- ... access fields on doc ...
-- next decoder:parse() truncates and re-fills the same buffers
Implementation:
- Decoder owns indices: Vec<u32> + scratch buffer + skip cache
- parse(buf) truncates (not deallocates) and re-emits
- Document becomes a view into Decoder's state
- Lifetime: Document borrows Decoder mutably; only one live Document per Decoder at a time
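The ownership shape described above could look roughly like the following on the Rust side. This is a sketch under assumptions: the field layout, `structural_count`, and `byte_at` are hypothetical, the skip cache is omitted, and the index pass is a placeholder that only records structural characters.

```rust
/// Hypothetical Decoder: owns the buffers that survive across parses.
/// (The skip cache from the proposal is left out of this sketch.)
pub struct Decoder {
    indices: Vec<u32>,
    scratch: Vec<u8>,
}

/// Hypothetical Document: a view into the Decoder's state plus the input.
pub struct Document<'a> {
    buf: &'a [u8],
    indices: &'a [u32],
}

impl Decoder {
    pub fn new() -> Self {
        Decoder { indices: Vec::new(), scratch: Vec::new() }
    }

    /// Truncates the buffers (capacity is kept) and re-emits the indices.
    /// Because this takes `&mut self` and the Document carries the same
    /// lifetime, the Decoder stays exclusively borrowed while the Document
    /// is in use.
    pub fn parse<'a>(&'a mut self, buf: &'a [u8]) -> Document<'a> {
        self.indices.clear();
        self.scratch.clear();
        for (i, &b) in buf.iter().enumerate() {
            if matches!(b, b'{' | b'}' | b'[' | b']' | b':' | b',') {
                self.indices.push(i as u32);
            }
        }
        Document { buf, indices: &self.indices }
    }
}

impl<'a> Document<'a> {
    /// Number of structural characters found in the input.
    pub fn structural_count(&self) -> usize {
        self.indices.len()
    }

    /// The n-th structural byte, if any.
    pub fn byte_at(&self, n: usize) -> Option<u8> {
        self.indices.get(n).map(|&i| self.buf[i as usize])
    }
}

fn main() {
    let mut decoder = Decoder::new();

    let first = decoder.parse(br#"{"a":1}"#);
    assert_eq!(first.structural_count(), 3);
    assert_eq!(first.byte_at(1), Some(b':'));

    // `first` is not used past this point, so the borrow has ended and the
    // same buffers can be refilled. Using `first` after the next line would
    // be rejected by the borrow checker.
    let second = decoder.parse(b"[1,2,3]");
    assert_eq!(second.structural_count(), 4);
}
```

The compile-time guarantee only holds inside Rust. Lua has no equivalent check, so the binding would presumably need some runtime way to invalidate a stale Document when decoder:parse() is called again.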
API design decisions to resolve before implementation
- Replace qd.parse() or add as a parallel API? (favor parallel; keep the existing call for compatibility)
- Can a Decoder be reused for different payloads concurrently? (no: a single document at a time, for that document's complete lifetime; this needs to be documented)
- Cleanup: explicit decoder:destroy() like simdjson, or rely on Lua GC?
Estimated impact
| Payload size | Est. speedup |
|---|---|
| small (2 KB) | ~5–10% |
| 100 KB – 1 MB | ~5–15% |
| 10 MB | ~1–3% (alloc is a small fraction of 2.9 ms) |
Validation plan
- make bench adapted to use the new API, plus a 3-run median comparison
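A 3-run median comparison can be sketched as below. This is not the project's make bench harness: `median`, `median_of_runs`, the iteration counts, and the synthetic workload are all assumptions chosen for illustration, and no timing result is implied.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Median of a set of timings (upper-middle element for even counts).
/// Panics if `samples` is empty.
fn median(mut samples: Vec<Duration>) -> Duration {
    assert!(!samples.is_empty(), "need at least one sample");
    samples.sort();
    samples[samples.len() / 2]
}

/// Times `runs` runs of `iters` calls to `f` and returns the median run.
fn median_of_runs<F: FnMut()>(runs: usize, iters: usize, mut f: F) -> Duration {
    let samples: Vec<Duration> = (0..runs)
        .map(|_| {
            let start = Instant::now();
            for _ in 0..iters {
                f();
            }
            start.elapsed()
        })
        .collect();
    median(samples)
}

fn main() {
    // Synthetic 100 KB payload; a real comparison would use the bench inputs.
    let payload = vec![b','; 100 * 1024];

    // Existing pattern: reserve a fresh index buffer on every call.
    let fresh = median_of_runs(3, 200, || {
        let mut indices: Vec<u32> = Vec::with_capacity(payload.len() / 6);
        indices.extend(0..payload.len() as u32);
        black_box(&indices);
    });

    // Proposed pattern: clear and refill one long-lived buffer.
    let mut indices: Vec<u32> = Vec::new();
    let reused = median_of_runs(3, 200, || {
        indices.clear();
        indices.extend(0..payload.len() as u32);
        black_box(&indices);
    });

    println!("fresh: {:?}, reused: {:?}", fresh, reused);
}
```

Taking the median rather than the mean keeps a single slow run (for example, one disturbed by scheduling noise) from skewing the comparison.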