perf: decoder/document instance pooling to amortize per-parse allocations #6

@membphis


Context

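Each qd.parse(payload) call currently creates a fresh Document with a fresh indices: Vec<u32> reserved at buf.len() / 6. For a 10 MB input that is ~1.7 M entries reserved per call (about 6.7 MB at 4 bytes per u32, assuming the reservation counts elements rather than bytes). An allocation of that size goes through the allocator's mmap path, which adds ~10–50 μs of overhead per parse, plus the deallocation on drop.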

simdjson's documentation explicitly warns against creating parser instances per call:

Do not create a parser instance for each run to avoid frequent memory allocations. Reuse the same parser instance and call the :decode method repeatedly.

Proposal

Add a new Lua API:

local decoder = qd.new_decoder()  -- holds reusable internal buffers
local doc = decoder:parse(payload)
-- ... access fields on doc ...
-- next decoder:parse() truncates and re-fills the same buffers

Implementation:

  • Decoder owns indices: Vec<u32> + scratch buffer + skip cache
  • parse(buf) truncates the buffers (without deallocating them) and refills them for the new payload
  • Document becomes a view into Decoder's state
  • Lifetime: Document borrows Decoder mutably; only one live Document per Decoder at a time
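
The ownership model in the list above can be sketched in Rust roughly as follows. This is a minimal illustration, not the project's actual code: the Decoder and Document layouts, the method names, and the toy structural-character scan are all assumptions made for the example, and the skip cache is omitted. It shows the two properties the proposal relies on: Vec::clear() keeps the capacity, and a Document that borrows the Decoder mutably prevents a second parse while it is alive.

```rust
/// Hypothetical decoder that owns the reusable buffers named in the proposal.
pub struct Decoder {
    indices: Vec<u32>, // offsets of structural characters, reused across parses
    scratch: Vec<u8>,  // placeholder for the scratch buffer (unused by this toy scan)
}

/// A view into the decoder's state. It holds the mutable borrow of the
/// decoder, so only one Document can be live per Decoder at a time.
pub struct Document<'d> {
    indices: &'d [u32],
}

impl Decoder {
    pub fn new() -> Self {
        Decoder { indices: Vec::new(), scratch: Vec::new() }
    }

    /// Truncates the buffers (capacity is retained) and refills them.
    pub fn parse<'d>(&'d mut self, buf: &[u8]) -> Document<'d> {
        self.indices.clear();
        self.scratch.clear();
        // Same sizing heuristic as described in the issue: buf.len() / 6 entries.
        // This is a no-op once the capacity is already large enough.
        self.indices.reserve(buf.len() / 6);
        for (i, &b) in buf.iter().enumerate() {
            // Toy stand-in for the real structural scan.
            if matches!(b, b'{' | b'}' | b'[' | b']' | b':' | b',' | b'"') {
                self.indices.push(i as u32);
            }
        }
        Document { indices: &self.indices }
    }

    pub fn indices_capacity(&self) -> usize {
        self.indices.capacity()
    }
}

impl Document<'_> {
    pub fn structural_count(&self) -> usize {
        self.indices.len()
    }
}

fn main() {
    let mut decoder = Decoder::new();
    let first = decoder.parse(br#"{"a":[1,2,3]}"#).structural_count();
    let cap = decoder.indices_capacity();
    // The second parse reuses the buffer allocated by the first one.
    let second = decoder.parse(br#"{"b":1}"#).structural_count();
    assert_eq!(decoder.indices_capacity(), cap);
    println!("{first} {second}");
}
```

Holding two Documents from the same Decoder at once is rejected at compile time in this sketch, which is the "one live Document per Decoder" rule expressed through the borrow checker. The Lua binding would need its own runtime check, since Lua has no equivalent of that borrow.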

API design decisions to resolve before implementation

  • Replace qd.parse() or add as parallel API (favor parallel; keep existing for compat)
  • Can the same Decoder be reused for different payloads concurrently? (no: a Decoder holds a single Document at a time; document this restriction, including the Document's full lifetime)
  • Drop semantics — when does the decoder release its memory? (explicit decoder:destroy() like simdjson? or rely on Lua GC?)
  • Reset on parse error — partial state could leak
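
Two of the open questions above, reset on parse error and explicit release, could look something like the following in Rust. This is only a sketch under assumptions: ParseError, release(), and the toy rule that a NUL byte is invalid are hypothetical names and behavior chosen for the example, and release() stands in for whatever decoder:destroy() would map to.

```rust
/// Hypothetical error type carrying the offset of the first invalid byte.
#[derive(Debug, PartialEq)]
pub struct ParseError {
    pub offset: usize,
}

pub struct Decoder {
    indices: Vec<u32>,
}

impl Decoder {
    pub fn new() -> Self {
        Decoder { indices: Vec::new() }
    }

    /// Toy parse: records brace offsets and treats a NUL byte as invalid input.
    pub fn parse(&mut self, buf: &[u8]) -> Result<&[u32], ParseError> {
        self.indices.clear();
        for (i, &b) in buf.iter().enumerate() {
            if b == 0 {
                // Reset on error so no partially filled state stays observable.
                self.indices.clear();
                return Err(ParseError { offset: i });
            }
            if b == b'{' || b == b'}' {
                self.indices.push(i as u32);
            }
        }
        Ok(&self.indices)
    }

    pub fn live_indices(&self) -> usize {
        self.indices.len()
    }

    pub fn capacity(&self) -> usize {
        self.indices.capacity()
    }

    /// Explicit release: swaps in an empty Vec so the old allocation is freed
    /// now instead of whenever the decoder itself is dropped or collected.
    pub fn release(&mut self) {
        self.indices = Vec::new();
    }
}

fn main() {
    let mut decoder = Decoder::new();
    // The NUL byte at offset 3 aborts the parse and clears partial state.
    assert_eq!(decoder.parse(b"{}{\0}"), Err(ParseError { offset: 3 }));
    assert_eq!(decoder.live_indices(), 0);
    // The decoder stays usable after an error.
    assert_eq!(decoder.parse(b"{}").map(|s| s.len()), Ok(2));
    decoder.release();
    assert_eq!(decoder.capacity(), 0);
}
```

Clearing on the error path keeps the capacity, so a failed parse does not cost the next call an allocation. An explicit release and a GC finalizer are not mutually exclusive; the sketch only shows the explicit half.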

Estimated impact

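| Payload size   | Estimated speedup                               |
|----------------|-------------------------------------------------|
| small (2 KB)   | ~5–10%                                          |
| 100 KB – 1 MB  | ~5–15%                                          |
| 10 MB          | ~1–3% (alloc is a small fraction of 2.9 ms)     |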

Validation plan

  • Unit tests: alloc count (custom allocator or perf counter) before/after
  • make bench adapted to use new API + 3-run median comparison
  • Confirm that validation semantics are unchanged
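
For the alloc-count check in the first bullet, one possible shape is a counting global allocator wrapped around the system allocator. The sketch below is an assumption about how such a test might be written, not the project's test harness: CountingAlloc, allocs_during, parse_fresh, and parse_reused are hypothetical names, and the comma scan is a stand-in for the real parser. The counter is thread-local so that allocations made by other threads do not disturb the measurement.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::cell::Cell;
use std::hint::black_box;

thread_local! {
    // Per-thread allocation counter; const-initialized so reading it never allocates.
    static ALLOCS: Cell<usize> = const { Cell::new(0) };
}

struct CountingAlloc;

unsafe impl GlobalAlloc for CountingAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        let _ = ALLOCS.try_with(|c| c.set(c.get() + 1));
        unsafe { System.alloc(layout) }
    }
    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        unsafe { System.dealloc(ptr, layout) }
    }
    // realloc is not overridden: the default implementation calls alloc,
    // so buffer growth is counted as well.
}

#[global_allocator]
static GLOBAL: CountingAlloc = CountingAlloc;

/// Number of allocations made on the current thread while `f` runs.
fn allocs_during(f: impl FnOnce()) -> usize {
    let before = ALLOCS.with(|c| c.get());
    f();
    ALLOCS.with(|c| c.get()) - before
}

/// Toy scan shared by both variants: records the offset of every comma.
fn fill(indices: &mut Vec<u32>, buf: &[u8]) {
    for (i, &b) in buf.iter().enumerate() {
        if b == b',' {
            indices.push(i as u32);
        }
    }
}

/// Stand-in for the current behavior: a fresh Vec on every call.
fn parse_fresh(buf: &[u8]) -> usize {
    let mut indices: Vec<u32> = Vec::with_capacity(buf.len() / 6 + 1);
    fill(&mut indices, buf);
    black_box(&indices); // keep the buffer from being optimized away
    indices.len()
}

/// Stand-in for the pooled behavior: the caller's Vec is truncated and refilled.
fn parse_reused(indices: &mut Vec<u32>, buf: &[u8]) -> usize {
    indices.clear();
    fill(indices, buf);
    indices.len()
}

fn main() {
    let payload = b"1,2,3,4,5,6,7,8".repeat(100);
    let fresh = allocs_during(|| {
        for _ in 0..10 {
            black_box(parse_fresh(&payload));
        }
    });
    let mut indices = Vec::new();
    parse_reused(&mut indices, &payload); // warm-up sizes the buffer once
    let reused = allocs_during(|| {
        for _ in 0..10 {
            black_box(parse_reused(&mut indices, &payload));
        }
    });
    assert!(fresh >= 10);
    assert_eq!(reused, 0);
    println!("fresh: {fresh}, reused: {reused}");
}
```

Asserting on a count rather than on timing keeps the unit test deterministic; the timing comparison is left to the make bench step.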


Labels: enhancement
