
[Feature]: lossless 64-bit integer access — return cdata int64_t/uint64_t to LuaJIT #94

@membphis


Background

LuaJIT's number type is a double, so not every integer whose magnitude exceeds
2⁵³ is exactly representable; such a value returned as a Lua number is rounded
to the nearest double. JSON payloads with 64-bit IDs (Snowflake IDs, DB row
IDs, unsigned bigint) therefore silently lose precision.

Current state of the code (this is the key reframing vs the original report):

  • Rust already parses integers directly. parse_i64 in
    src/decode/number.rs accumulates digits with checked_mul/add and never
    touches f64.
  • The C/FFI boundary is already lossless. qjson_get_i64 writes a full
    int64_t into the caller's box.
  • The precision loss is one line of Lua. Doc:get_i64 / Cursor:get_i64
    call tonumber(i64_box[0]) (lua/qjson.lua:93, :143), collapsing the
    cdata int64_t down to a double.
  • There is no unsigned path at all — no parse_u64, no qjson_get_u64.
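To illustrate why the tonumber call is the only lossy step, the Rust sketch below reproduces the same int64 → double conversion with an `as f64` cast, which rounds to nearest (assumed here to match LuaJIT's conversion under the default rounding mode). `through_double` and `survives_double` are hypothetical helpers, not qjson code. Note that 2⁵³ + 2 survives while 2⁵³ + 1 does not: above 2⁵³ only some integers are lost.

```rust
/// Mimics `tonumber(i64_box[0])`: the exact i64 is converted to a double
/// (rounded to nearest) and then read back as an integer.
fn through_double(v: i64) -> i64 {
    v as f64 as i64
}

/// True if `v` survives an i64 -> f64 -> integer round trip unchanged.
/// The comparison is done in i128 so values that round up to 2^63 are not
/// hidden by the saturating f64 -> i64 cast.
fn survives_double(v: i64) -> bool {
    (v as f64) as i128 == i128::from(v)
}
```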

So the work is mostly Lua-side, plus a greenfield (but mechanical) unsigned
mirror in Rust.

Supersedes #22, which predates the qjd → qjson rename and assumed work that
is already done (see Notes).

Goal

get_i64 returns a lossless cdata int64_t; a new get_u64 returns a lossless
cdata uint64_t; get_f64 is unchanged and remains the Lua-number path.
A >2⁵³ integer round-trips through decode and encode without precision loss.
Naming follows simdjson's typed-accessor convention (method name == exact
returned type).

Non-goals

  • Arbitrary-precision / bignum beyond 64 bits.
  • Changing get_f64 semantics (it already returns a lossless Lua double).

Acceptance Criteria

  • Doc:get_i64 / Cursor:get_i64 return a cdata int64_t (no tonumber), preserving full precision for |v| > 2⁵³
  • New Doc:get_u64 / Cursor:get_u64 return a cdata uint64_t, lossless up to u64::MAX
  • get_f64 unchanged — still returns a Lua number
  • A JSON int > 2⁵³ (e.g. 9007199254740993) round-trips losslessly via get_i64; a u64 > i64::MAX (e.g. 18446744073709551615) round-trips via get_u64
  • get_i64 on a value overflowing i64 → OUT_OF_RANGE; get_u64 on a negative value or a value > u64::MAX → OUT_OF_RANGE
  • get_i64 / get_u64 on a float/bool/string/null → TYPE_MISMATCH
  • qjson.encode accepts cdata int64_t / uint64_t and emits them as decimal JSON integers (no LL/ULL suffix, no precision loss)
  • include/qjson.h, src/error.rs numbering, and the lua/qjson/lib.lua cdef stay in sync (new qjson_get_u64 / qjson_cursor_get_u64)
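The two fixture values in these criteria sit on either side of the relevant boundaries: 9007199254740993 is 2⁵³ + 1 (exact as an i64, not as a double) and 18446744073709551615 is u64::MAX (beyond i64::MAX, so only get_u64 can return it). A quick check using only the standard library's `str::parse` and `to_string`; `roundtrips` is a hypothetical test helper, not part of qjson:

```rust
use std::str::FromStr;

/// True if `s` parses as `T` and formats back to the identical decimal text.
fn roundtrips<T: FromStr + ToString>(s: &str) -> bool {
    s.parse::<T>().map(|v| v.to_string() == s).unwrap_or(false)
}
```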

Task Checklist

1. Rust: unsigned parse + FFI

  • 1.1 Add parse_u64 in src/decode/number.rs (reject leading -; u64 accumulation with checked ops → OUT_OF_RANGE on overflow; reject ./e/E → TYPE_MISMATCH)
  • 1.2 Rust unit tests: 0, u64::MAX, u64::MAX + 1 overflow, negative → error, float/exponent → TYPE_MISMATCH
  • 1.3 Add qjson_get_u64 in src/ffi.rs (mirror qjson_get_i64, keep panic barrier)
  • 1.4 Add qjson_cursor_get_u64 in src/ffi.rs (mirror qjson_cursor_get_i64)
  • 1.5 Declare both in include/qjson.h
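A minimal sketch of steps 1.1 to 1.3, assuming parse_u64 receives the raw bytes of an already-validated JSON number token and that errors map onto the OUT_OF_RANGE / TYPE_MISMATCH codes named above. The `ParseError` enum, the integer return codes, and the `qjson_get_u64_sketch` signature are hypothetical: the real qjson_get_i64 takes the document/path arguments qjson already uses, which this issue does not spell out.

```rust
use std::panic;

/// Hypothetical error enum; the real codes and their numbering live in
/// src/error.rs and include/qjson.h.
#[derive(Debug, PartialEq, Eq)]
pub enum ParseError {
    TypeMismatch,
    OutOfRange,
}

/// Sketch of parse_u64. `tok` is assumed to be the raw bytes of a JSON number
/// token that the tokenizer has already validated.
pub fn parse_u64(tok: &[u8]) -> Result<u64, ParseError> {
    // A fraction or exponent makes the token a float, not an integer.
    if tok.iter().any(|&b| matches!(b, b'.' | b'e' | b'E')) {
        return Err(ParseError::TypeMismatch);
    }
    // A leading '-' means a negative integer, which u64 cannot hold.
    if tok.first() == Some(&b'-') {
        return Err(ParseError::OutOfRange);
    }
    if tok.is_empty() {
        return Err(ParseError::TypeMismatch);
    }
    let mut acc: u64 = 0;
    for &b in tok {
        if !b.is_ascii_digit() {
            return Err(ParseError::TypeMismatch);
        }
        // checked_mul / checked_add yield None on overflow instead of wrapping.
        acc = acc
            .checked_mul(10)
            .and_then(|v| v.checked_add(u64::from(b - b'0')))
            .ok_or(ParseError::OutOfRange)?;
    }
    Ok(acc)
}

/// Hypothetical FFI shape showing the panic barrier: 0 = ok, 1 = type
/// mismatch, 2 = out of range, -1 = null pointer, -2 = caught panic.
///
/// # Safety
/// `ptr` must point to `len` readable bytes and `out` must be writable.
pub unsafe extern "C" fn qjson_get_u64_sketch(ptr: *const u8, len: usize, out: *mut u64) -> i32 {
    // `move` copies the raw pointers into the closure so it is UnwindSafe.
    let result = panic::catch_unwind(move || {
        if ptr.is_null() || out.is_null() {
            return -1;
        }
        let tok = unsafe { std::slice::from_raw_parts(ptr, len) };
        match parse_u64(tok) {
            Ok(v) => {
                unsafe { *out = v };
                0
            }
            Err(ParseError::TypeMismatch) => 1,
            Err(ParseError::OutOfRange) => 2,
        }
    });
    // Never let a Rust panic unwind across the C boundary into LuaJIT.
    result.unwrap_or(-2)
}
```

Checking for ./e/E before the sign means -1.5 reports TYPE_MISMATCH rather than OUT_OF_RANGE, keeping float inputs and negative integers on separate error paths as the acceptance criteria describe.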

2. Lua wrapper: lossless return

  • 2.1 Add uint64_t[1] box; add the two new cdefs in lua/qjson/lib.lua
  • 2.2 Doc:get_i64 returns i64_box[0] cdata directly (drop tonumber)
  • 2.3 Cursor:get_i64 same
  • 2.4 Add Doc:get_u64 / Cursor:get_u64 returning uint64_t cdata

3. Encode round-trip

  • 3.1 Add a cdata branch in the encode dispatcher (lua/qjson/table.lua:702): detect int64_t/uint64_t via ffi.istype, emit decimal (strip LL/ULL)
  • 3.2 Test: encode a table holding an int64/uint64 cdata value → correct decimal JSON

4. Lua tests + docs

  • 4.1 busted: a known >2⁵³ int round-trips losslessly via get_i64
  • 4.2 busted: a u64 > i64::MAX round-trips via get_u64
  • 4.3 busted: get_i64 overflow / get_u64 negative → error; type-mismatch cases
  • 4.4 Update docs (cjson migration guide / README): get_i64 now returns cdata; get_f64 is the Lua-number escape hatch

Notes / Decisions

  • Supersedes #22. That issue predates the qjd → qjson rename and assumed
    (a) integers were parsed via f64 and (b) a new C function was needed. Both
    are already false: parse_i64 parses i64 directly with checked arithmetic,
    and qjson_get_i64 already returns a lossless int64_t across FFI. The only
    precision loss is the Lua tonumber() call.
  • Naming follows simdjson's typed accessors (get_int64/get_uint64/
    get_double): the method name equals the exact returned type. get_f64
    already returns a Lua number (== double) and serves as the convenient path,
    so no separate get_number twin is needed. lua-cjson, by contrast, has no
    typed getter and silently truncates on LuaJIT — the baseline we're fixing.
  • get_i64 returning cdata is a breaking change vs today (returns a Lua
    number). Acceptable pre-1.0; migration = get_f64(path) or
    tonumber(doc:get_i64(path)).
  • Overflow keeps OUT_OF_RANGE (not PARSE_ERROR, as the old #22 suggested):
    it is more precise and is already the current i64 behavior.
  • Encode fast path already helps: encode_proxy slices the original buffer
    bytes for untouched subtrees, so an unmodified big int already round-trips
    losslessly. The new cdata branch only matters when a caller explicitly sets a
    field to a cdata int.

Metadata

Assignees

No one assigned

    Labels

    enhancement: New feature or request

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
