v0.0.10

@MagicalTux MagicalTux released this 31 May 10:22
· 33 commits to master since this release

Other

  • round-191 (parent-dispatch r191) against ISO/IEC FDIS 18181-1:2021 — Annex E / §H.5.2 Weighted-Predictor oracle test driven by clean-room behavioural trace at noise-64x64-lossless sample 194
  • round-190 (parent-dispatch r190) against ISO/IEC FDIS 18181-1:2021 — typed per-pass NonZeros(x, y) grid container above the round-183 per-channel primitive
  • round-183 (parent-dispatch r183) against ISO/IEC FDIS 18181-1:2021 — typed per-channel NonZeros(x, y) grid container layered above round-177 single-channel primitive
  • round-177 (parent-dispatch r177) against ISO/IEC FDIS 18181-1:2021 — typed NonZeros(x, y) grid bookkeeping + per-varblock decode driver
  • round-164 (parent-dispatch r164) against ISO/IEC FDIS 18181-1:2021 — TransformType-driven entry points for the §C.8.3 per-block HF coefficient decode loop
  • round-159 (parent-dispatch r159) against ISO/IEC FDIS 18181-1:2021 — §C.8.3 per-block HF coefficient decode loop scaffolding (Listings C.13 + C.14)
  • round-150 (parent-dispatch r150) against ISO/IEC FDIS 18181-1:2021 — Annex I.2.3.8 Listing I.13 Inverse AFV transform wired into idct dispatch
  • round-147 (parent-dispatch r147) against ISO/IEC FDIS 18181-1:2021 — Annex I.2.2 AFV basis + AFV_IDCT pure-math primitive (Listings I.5 + I.6)
  • round-144 (parent-dispatch r144) against ISO/IEC FDIS 18181-1:2021 — Annex J.3 edge-preserving-filter pure-math primitive
  • round-141 (parent-dispatch r141) against ISO/IEC FDIS 18181-1:2021 — Annex J.2 Gabor-like-transform pure-math primitive
  • round-138 (parent-dispatch r138) against ISO/IEC FDIS 18181-1:2021 — Annex G Chroma-from-Luma pure-math primitive (Listing G.1)
  • round-133 (parent-dispatch r133) against ISO/IEC FDIS 18181-1:2021 — §C.7.1 DecodePermutation() for used_orders != 0
  • Round 129: per-varblock LF→LLF composition glue (§I.2.5 plumbing)
  • Round 126: WP deep-trace plumbing + sample-194 hand-derivation
  • round-121 (parent-dispatch r121) against ISO/IEC FDIS 18181-1:2021 — §I.2.5 LLF-from-LF pure-math step (Listings I.15 + I.16)
  • round-95 (parent-dispatch r95) against ISO/IEC FDIS 18181-1:2021 — §F.3 HF dequantisation pure-math step
  • round-90 (parent-dispatch r90) against ISO/IEC 18181-1:2021 FDIS — HfPass + PassGroup HF structural parsers
  • round-89 (parent-dispatch r89) against ISO/IEC 18181-1:2024 — GetDCTQuantWeights + Table I.6 default dequantization-matrix materialisation
  • rewrite lf_dequant comment to remove libjxl numeric-defaults citation
  • round-77 fixup — inline animation-3frame fixture under crate-local tests/fixtures/
  • round-77 (parent-dispatch r17) against ISO/IEC 18181-1:2024 — animation-3frame SPECDIFF audit harness
  • round-32 (parent-dispatch r17) against ISO/IEC 18181-1:2024 — noise-64x64-lossless pixel-divergence bisected to WP at first predictor=6 sample with WW/NN both in-image; fix deferred pending libjxl-WP behavioural trace
  • round-31 (parent-dispatch r16) against ISO/IEC 18181-1:2024 — §F.3 zero-pad uniformly applied to single-TOC-entry LfGlobal fast path
  • round-30 (parent-dispatch r15) against ISO/IEC 18181-1:2024 — bit-depth-16 RGB pixel-correct + 16-bit LE plane-pack convention
  • round-29 (parent-dispatch r14) against ISO/IEC 18181-1:2024 — alpha-64x64 RGBA pixel-correct + ISOBMFF FF 0A strip
  • round-28 (parent-dispatch r13) against ISO/IEC 18181-1:2024 — non-DCT IDCT helpers (Annex I.9.3..I.9.7)
  • round-27 (parent-dispatch r12) against ISO/IEC 18181-1:2024 — IDCT dispatch (Annex I.2.1 + I.2.2 Listing I.4)
  • round-26 (parent-dispatch r11) against ISO/IEC 18181-1:2024 — Annex L colour transforms (XYB inverse + YCbCr inverse)
  • round-25 (Auditor mode) against ISO/IEC 18181-1:2024 — d1 LfCoefficients per-sample rich-state range dump 22..=79
  • round-24 (Auditor mode) against ISO/IEC 18181-1:2024 — d1 per-cluster D[] byte trace + per-call alias-mapping invariant audit
  • round-23 (Auditor mode) against ISO/IEC 18181-1:2024 — d1 leaf-pick property dump at Y' sample 22 + WP y=0 boundary audit
  • round-22 (Auditor mode) against ISO/IEC 18181-1:2024 — d1 lf_quant sample dump + WP rounding bias toggle
  • round-21 (Auditor mode) against ISO/IEC 18181-1:2024 — d1 per-cluster distribution + alias-table self-map audit
  • round-20 followup — refresh round-19 trace eprintln with corrected DC_GROUP budget
  • round-20 (Auditor pivot) against ISO/IEC 18181-1:2024 — DC_GROUP boundary recount + ANS-final-state oracle
  • round-19 (Auditor mode) — d1 cluster + ANS state evolution audit
  • round-18 (Auditor mode) against ISO/IEC 18181-1:2024 — per-token bit accounting trace + drift narrowed

Added

  • Round 191 (2021-FDIS) — Annex E / §H.5.2 Weighted-Predictor
    oracle test driven by clean-room behavioural trace at sample 194 of
    noise-64x64-lossless.
    New tests/r191_wp_trace_oracle.rs (5
    tests) and new pub fn modular_fdis::wp_predict_pub test wrapper
    around the production wp_predict. The oracle consumes the
    docs/image/jpegxl/fixtures/noise-64x64-lossless/wp-trace-sample-194.md
    trace (provenance recorded alongside as wp-trace-provenance.md),
    which records the FDIS-conformant per-listing intermediates an
    instrumented reference decoder produces at the
    (channel 0, x=2, y=3) divergence point bisected in rounds 31..126:

    • r191_wp_predict_matches_trace_at_sample_194 — drives the
      production wp_predict with the trace's WpState/Neighbours
      inputs; asserts the four sub-predictions [1248, 747, 420, 559],
      the final pre-round prediction 709, and max_error = 737 all
      reproduce exactly. Result: PASS — proves Annex E.2 Listings
      E.1 (sub-predictions), E.2 (err_sum_i + error2weight), E.3
      (weighted sum + same-sign clamp), and E.4 (max_error) are
      spec-correct in wp_predict, isolating the still-unfixed
      sample-194 wp_pred8 = 717 vs trace 709 off-by-8 divergence to
      upstream state evolution (set_true_err / set_sub_err
      calls fired across samples 0..193) rather than the predictor
      arithmetic itself.
    • r191_trace_err_sum_self_consistency — pure-arithmetic sanity
      check on the trace's sub_err_{i,N/NE/NW} table summing to the
      reported err_sum_i ([438, 330, 416, 240]).
    • r191_trace_weights_match_error2weight — hand-derives the
      trace's weight_i = [495694, 599189, 474830, 825112] from
      FDIS-literal error2weight(err_sum_i, wp_w_i); documents a
      1-unit inner-Idiv-vs-multiplication-first discrepancy with the
      production reading that does NOT affect sample 194's shifted
      weights (both readings give [3, 4, 3, 6] after the Listing E.3
      >> sh step).
    • r191_trace_prediction_matches_listing_e3 — independent
      hand-derivation of prediction = 709 from Listing E.3 inputs,
      including verification that the same-sign clamp predicate fires
      but is a no-op (pre-clamp 709 ∈ [min(W,N,NE)=584, max(W,N,NE)=
      1232]).
    • r191_pin_state_evolution_gap — pins the production-vs-trace
      delta as a roadmap for the next round's bisect: Δ te_w = +21,
      Δ te_nw = -21 (symmetric pair → likely a single upstream
      defect), Δ wp_pred8 = +8 in 8x scale = +1 in un-shifted pixel
      space (matches r126_first_divergence_scan dec=35 vs exp=34).

    Spec citations and provenance attestation are embedded in the test
    module docstring, which references the in-repo FDIS §E.1-E.4 line
    numbers and the trace doc's stated prediction − true_value
    sign convention. The trace doc is the newly staged
    docs/image/jpegxl/fixtures/noise-64x64-lossless/wp-trace-*.md
    pair landed alongside this round (tasks #820 + #1077). Issues #6,
    #64, #799.
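
    The weighted-sum arithmetic that the r191 hand-derivations check can be
    sketched as below. This is a minimal illustration, not the crate's
    wp_predict: the function name weighted_prediction is hypothetical, and the
    shape of the step (shift the weights by floor(log2(sum)) + 1 - 5, add a
    (sum >> 1) - 1 rounding bias, multiply by (1 << 24) / sum, shift right by
    24) is an assumed reading of Listing E.3. It does reproduce the figures
    quoted above: shifted weights [3, 4, 3, 6] and prediction 709.

```rust
/// Sketch of the weighted-sum step (assumed reading of Listing E.3).
/// `subpred` holds the four sub-predictions, `weight` the four raw weights.
/// Returns the shifted weights and the pre-clamp prediction.
pub fn weighted_prediction(subpred: [i64; 4], weight: [u32; 4]) -> ([u32; 4], i64) {
    let sum: u64 = weight.iter().map(|&w| u64::from(w)).sum();
    assert!(sum > 0, "at least one weight must be non-zero");

    // floor(log2(sum)) + 1, i.e. the bit length of the weight sum.
    let log_weight = 64 - sum.leading_zeros();
    // Shift so the normalised weights keep about five significant bits.
    let sh = log_weight.saturating_sub(5);

    let mut shifted = [0u32; 4];
    for i in 0..4 {
        shifted[i] = weight[i] >> sh;
    }
    let sum_shifted: i64 = shifted.iter().map(|&w| i64::from(w)).sum();

    // Rounding bias, then the weighted sum of the sub-predictions.
    let mut s = (sum_shifted >> 1) - 1;
    for i in 0..4 {
        s += subpred[i] * i64::from(shifted[i]);
    }

    // Fixed-point division by the weight sum via a 24-bit reciprocal.
    let prediction = (s * ((1i64 << 24) / sum_shifted)) >> 24;
    (shifted, prediction)
}
```

    Feeding it the trace's sub-predictions [1248, 747, 420, 559] and weights
    [495694, 599189, 474830, 825112] gives a shift of 17 and a sum of 11353
    before the final division by 16. Clamping 709 to [584, 1232] leaves it
    unchanged, which matches the no-op clamp noted above.
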
  • Round 190 (2021-FDIS) — typed per-pass NonZeros(x, y) grid
    container (FDIS §C.8.3 + Listing C.13 per-pass keying).
    New
    per_pass_non_zeros module that owns one
    per_channel_non_zeros::PerChannelNonZerosGrids per pass index
    p ∈ [0, num_passes), layered above the round-183 per-channel
    container. A VarDCT frame is decoded in num_passes ordered passes
    (declared in FrameHeader.passes.num_passes); each pass scans every
    PassGroup once and §C.8.3 specifies that within a pass each
    channel of each varblock maintains its own NonZeros(x, y) state.
    Between passes the per-channel bookkeeping is reset because the
    per-pass histogram is selected by hfp from the per-pass HfPass
    array — a different pass uses a different histogram and the
    prediction recurrence is keyed against the current pass's own
    coefficient counts. The new module captures the per-pass routing
    layer above round 183's per-channel routing layer:

    • PerPassNonZerosGrids::new(pass_dims: &[&[(u32, u32)]]) -> Result<Self>
      — per-pass per-channel (width, height) slice, validated
      entry-by-entry via PerChannelNonZerosGrids::new (zero / oversize
      dims rejected per channel; empty pass-list rejected).
    • PerPassNonZerosGrids::new_uniform(num_passes, num_channels, width, height) -> Result<Self> — convenience builder for the
      uniform-per-pass case.
    • PerPassNonZerosGrids::{num_passes, pass, pass_mut, predicted, get, set, update_after_block, update_after_block_for_transform}:
      per-pass routing accessors; out-of-range p errors cleanly.
    • PerPassNonZerosGrids::decode_block_at_for_pass_channel(p, c, x, y, t, block_ctx, nb_block_ctx, read_non_zeros, decode_symbol) -> Result<(DecodedHfBlock, u32)> — typed per-pass per-channel
      driver that wraps the round-183
      PerChannelNonZerosGrids::decode_block_at_for_channel with pass
      routing. Caller pre-computes block_ctx via
      pass_group_hf::block_context with the matching c; the
      container is a pure storage + routing primitive and does not
      re-derive pass_group_hf::block_context nor materialise the
      per-pass histogram.
    • Per-pass per-channel shapes are independent — ragged per-pass
      channel counts are tolerated.
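
    The pass-then-channel routing described above can be pictured with a small
    stand-in. Everything in this sketch is hypothetical (the Grid and
    PerPassGrids types, their method names, and the String error type); it
    only illustrates one grid per (pass, channel), ragged channel counts, and
    clean errors on out-of-range indices, not the crate's actual types.

```rust
/// Hypothetical stand-in for one channel's NonZeros(x, y) storage.
pub struct Grid {
    width: u32,
    height: u32,
    cells: Vec<u32>,
}

impl Grid {
    pub fn new(width: u32, height: u32) -> Result<Self, String> {
        if width == 0 || height == 0 {
            return Err("zero dimension".to_string());
        }
        let len = (width as usize) * (height as usize);
        Ok(Self { width, height, cells: vec![0; len] })
    }

    fn index(&self, x: u32, y: u32) -> Result<usize, String> {
        if x >= self.width || y >= self.height {
            return Err(format!("({x}, {y}) outside {}x{}", self.width, self.height));
        }
        Ok((y as usize) * (self.width as usize) + (x as usize))
    }
}

/// passes[p][c] is the grid for pass p and channel c.
pub struct PerPassGrids {
    passes: Vec<Vec<Grid>>,
}

impl PerPassGrids {
    /// One (width, height) slice per pass; channel counts may differ per pass.
    pub fn new(pass_dims: &[&[(u32, u32)]]) -> Result<Self, String> {
        if pass_dims.is_empty() {
            return Err("empty pass list".to_string());
        }
        let mut passes = Vec::with_capacity(pass_dims.len());
        for dims in pass_dims {
            if dims.is_empty() {
                return Err("pass with zero channels".to_string());
            }
            let grids = dims
                .iter()
                .map(|&(w, h)| Grid::new(w, h))
                .collect::<Result<Vec<_>, _>>()?;
            passes.push(grids);
        }
        Ok(Self { passes })
    }

    pub fn get(&self, p: usize, c: usize, x: u32, y: u32) -> Result<u32, String> {
        let grid = self
            .passes
            .get(p)
            .ok_or_else(|| format!("pass {p} out of range"))?
            .get(c)
            .ok_or_else(|| format!("channel {c} out of range"))?;
        Ok(grid.cells[grid.index(x, y)?])
    }

    pub fn set(&mut self, p: usize, c: usize, x: u32, y: u32, v: u32) -> Result<(), String> {
        let grid = self
            .passes
            .get_mut(p)
            .ok_or_else(|| format!("pass {p} out of range"))?
            .get_mut(c)
            .ok_or_else(|| format!("channel {c} out of range"))?;
        let i = grid.index(x, y)?;
        grid.cells[i] = v;
        Ok(())
    }
}
```

    A write to pass 0 is invisible from pass 1 because each pass owns its own
    vector of grids, which is the isolation property the round-190 tests pin.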

    41 new tests (28 unit in per_pass_non_zeros::tests + 13 integration
    in tests/round190_per_pass_non_zeros.rs) pin: empty-pass-list /
    zero-channel-pass / zero-dim rejection; two-pass chroma-subsampled
    construction; new_uniform convenience; out-of-range pass index
    errors on every accessor (8 paths); PredictedNonZeros(0, 0) = 32
    on every (pass, channel); per-pass write isolation; per-pass
    predicted propagation reads back each pass's own history (not
    another pass's); per-pass update_after_block_for_transform
    dispatch (raw non_zeros = 17 → {17, 5, 2} at DCT8×8 / DCT16×16 /
    DCT32×32 on three independent passes); per-pass
    decode_block_at_for_pass_channel routing; two-pass three-channel
    raster walk at (0, 0) / (1, 0) with distinct [4, 8, 12] /
    [3, 6, 9] per-pass per-channel raw_non_zeros sequences preserves
    cross-pass isolation; ragged per-pass channel counts (one-channel
    DC-only preview followed by three-channel main); u32::MAX
    no-panic saturating-add chain through the per-pass route. Lib
    tests 608 → 636 (+28).

  • Round 183 (2021-FDIS) — typed per-channel NonZeros(x, y) grid
    container (FDIS §C.8.3 + Listing C.13 channel-keying).
    New
    per_channel_non_zeros module that owns one
    non_zeros_grid::NonZerosGrid per channel, layered above the
    round-177 single-channel primitive. Listing C.13's
    BlockContext() factors c into (c < 2 ? c ^ 1 : 2) × 13 + s,
    so the NonZeros(x, y) bookkeeping is keyed per-channel because
    chroma subsampling + TransformType heterogeneity means each
    channel's varblock-grid shape can differ:

    • PerChannelNonZerosGrids::new(dims: &[(u32, u32)]) -> Result<Self>
      — per-channel (width, height) slice, validated entry-by-entry
      via NonZerosGrid::new (zero / > 65535 dims rejected; empty
      slice rejected).
    • PerChannelNonZerosGrids::new_uniform(num_channels, width, height) -> Result<Self> — convenience builder for the
      unsubsampled 4:4:4-style container.
    • PerChannelNonZerosGrids::{num_channels, grid, grid_mut, predicted, get, set, update_after_block, update_after_block_for_transform} — per-channel routing
      accessors; out-of-range c errors cleanly.
    • PerChannelNonZerosGrids::decode_block_at_for_channel(c, x, y, t, block_ctx, nb_block_ctx, read_non_zeros, decode_symbol) -> Result<(DecodedHfBlock, u32)> — typed per-channel driver
      that wraps the round-177 non_zeros_grid::decode_block_at
      with channel routing. Caller pre-computes block_ctx via
      pass_group_hf::block_context with the matching c; the
      container is a pure storage + routing primitive.
    • DEFAULT_NUM_CHANNELS = 3 — the YCbCr / XYB canonical channel
      count.
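
    The channel factor quoted from Listing C.13 is easy to check in isolation.
    A minimal sketch, assuming s is a small per-transform index below 13 (the
    text above does not define it) and using a hypothetical function name:

```rust
/// Channel-dependent base of the block context, as quoted above:
/// (c < 2 ? c ^ 1 : 2) * 13 + s.
/// `s` is assumed to be an index in 0..13; that range is an assumption.
pub fn block_context_base(c: u32, s: u32) -> u32 {
    // Channels 0 and 1 swap slots; every later channel maps to slot 2.
    let channel_slot = if c < 2 { c ^ 1 } else { 2 };
    channel_slot * 13 + s
}
```

    Channel 1 therefore occupies contexts 0..13, channel 0 occupies 13..26,
    and channel 2 occupies 26..39.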

    36 new tests (24 unit in per_channel_non_zeros::tests + 12
    integration in tests/round183_per_channel_non_zeros.rs) pin:
    empty-channel-list rejection; zero-dim / oversize-dim rejection
    on any channel; three-channel chroma-subsampled construction at
    [(16, 16), (8, 8), (8, 8)]; new_uniform convenience;
    out-of-range channel index errors on every accessor (8 paths);
    PredictedNonZeros(0, 0) = 32 on every channel; per-channel
    write isolation; per-channel predicted horizontal chain on a
    seeded channel-1 grid; update_after_block_for_transform
    dispatch (raw non_zeros = 17 → {17, 5, 2} at DCT8×8 /
    DCT16×16 / DCT32×32 on three independent channels);
    decode_block_at_for_channel routes the round-177 typed driver
    per channel; post-update cell feeds the next-position predicted
    value back per-channel; OOB (x, y) past the per-channel grid
    errors cleanly; a two-step three-channel raster walk at
    (0, 0) / (1, 0) with distinct [4, 12, 20] /
    [6, 18, 30] per-channel raw_non_zeros sequences preserves
    cross-channel isolation.

    Lib tests 584 → 608 (+24). Pure-control-flow primitive in the
    same shape as round-89 dct_quant_weights, round-95
    hf_dequant, round-121 llf_from_lf, round-138
    chroma_from_luma, round-141 gaborish, round-144 epf,
    round-147 afv_idct, round-159 / 164 pass_group_hf, and
    round-177 non_zeros_grid — no bit reads, no spec
    re-derivation. A future round wiring §C.7.2 entropy histograms
    (#799 DOCS-GAP) + the per-LfGroup varblock-shape grid +
    per-channel BlockContext() history can drop these helpers in
    as the per-channel step without re-deriving any Listing C.13 /
    C.14 formulae.

  • Round 177 (2021-FDIS) — typed NonZeros(x, y) grid bookkeeping +
    per-varblock decode driver (FDIS §C.8.3 + Listing C.13 prelude +
    Listing C.14 post-prose).
    New non_zeros_grid module bridging
    round 159 pass_group_hf::predicted_non_zeros (the four-branch
    PredictedNonZeros(x, y) recurrence) with round 164
    pass_group_hf::read_non_zeros_and_decode_block_for_transform
    (the TransformType-driven per-block coefficient loop):

    • NonZerosGrid::new(width, height) -> Result<Self> — rectangular
      varblock-grid storage of NonZeros(x, y) cells. Defensive
      rejection of zero dims + dims > 65535.
    • NonZerosGrid::{get, set, width, height, cells} — accessors.
    • NonZerosGrid::predicted(x, y) -> Result<u32> — delegates to
      pass_group_hf::predicted_non_zeros against
      |xx, yy| self.get(xx, yy).unwrap_or(0).
    • NonZerosGrid::update_after_block(x, y, non_zeros, num_blocks) -> Result<u32> — FDIS post-Listing-C.14 prose formula
      (non_zeros + num_blocks - 1) Idiv num_blocks (ceiling-divide
      identity, saturating_add at u32::MAX).
    • NonZerosGrid::update_after_block_for_transform(x, y, non_zeros, t): same update, taking num_blocks from pass_group_hf::transform_block_params.
    • non_zeros_grid::decode_block_at(grid, x, y, t, block_ctx, nb_block_ctx, read_non_zeros, decode_symbol) -> Result<(DecodedHfBlock, u32)> — typed per-varblock driver: computes
      predicted, invokes
      read_non_zeros_and_decode_block_for_transform, then calls
      update_after_block_for_transform before returning the
      (DecodedHfBlock, raw_non_zeros) pair.
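
    The bookkeeping in this list reduces to a few lines of integer arithmetic.
    The sketch below is a simplified stand-in, not the crate's module: the
    String error type and the free function non_zeros_context are
    hypothetical, and the clamp at 64 plus the predicted < 8 branch are
    assumptions about the part of NonZerosContext that these notes do not
    quote.

```rust
pub struct NonZerosGrid {
    width: u32,
    height: u32,
    cells: Vec<u32>,
}

impl NonZerosGrid {
    pub fn new(width: u32, height: u32) -> Result<Self, String> {
        if width == 0 || height == 0 || width > 65_535 || height > 65_535 {
            return Err(format!("bad grid dims {width}x{height}"));
        }
        let len = (width as usize) * (height as usize);
        Ok(Self { width, height, cells: vec![0; len] })
    }

    fn index(&self, x: u32, y: u32) -> Option<usize> {
        (x < self.width && y < self.height)
            .then(|| (y as usize) * (self.width as usize) + (x as usize))
    }

    pub fn get(&self, x: u32, y: u32) -> Option<u32> {
        self.index(x, y).map(|i| self.cells[i])
    }

    /// Four-branch PredictedNonZeros(x, y) recurrence.
    pub fn predicted(&self, x: u32, y: u32) -> Result<u32, String> {
        if self.index(x, y).is_none() {
            return Err(format!("({x}, {y}) out of range"));
        }
        let at = |xx: u32, yy: u32| u64::from(self.get(xx, yy).unwrap_or(0));
        Ok(match (x, y) {
            (0, 0) => 32,
            (_, 0) => at(x - 1, 0) as u32,
            (0, _) => at(0, y - 1) as u32,
            // Widened to u64 so the average cannot overflow.
            _ => ((at(x, y - 1) + at(x - 1, y) + 1) >> 1) as u32,
        })
    }

    /// Stores ceil(non_zeros / num_blocks) and returns the stored value.
    pub fn update_after_block(
        &mut self,
        x: u32,
        y: u32,
        non_zeros: u32,
        num_blocks: u32,
    ) -> Result<u32, String> {
        if num_blocks == 0 {
            return Err("num_blocks must be non-zero".to_string());
        }
        let i = self.index(x, y).ok_or_else(|| format!("({x}, {y}) out of range"))?;
        let v = non_zeros.saturating_add(num_blocks - 1) / num_blocks;
        self.cells[i] = v;
        Ok(v)
    }
}

/// Context used for the non_zeros read (partly assumed, see lead-in).
pub fn non_zeros_context(predicted: u32, block_ctx: u32, nb_block_ctx: u32) -> u32 {
    let p = predicted.min(64);
    if p < 8 {
        block_ctx + nb_block_ctx * p
    } else {
        block_ctx + nb_block_ctx * (4 + p / 2)
    }
}
```

    With these definitions a raw non_zeros of 17 is stored as 17, 5, or 2 for
    num_blocks of 1, 4, or 16, and the origin prediction of 32 yields context
    7 + 3 × (4 + 16) = 67 at (block_ctx, nb_block_ctx) = (7, 3).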

    35 new tests (23 unit in non_zeros_grid::tests + 12 integration
    in tests/round177_non_zeros_grid.rs) pin: defensive rejection
    of zero / oversize (> 65535) dims and out-of-range (x, y);
    zero-init cells; PredictedNonZeros(0, 0) = 32 across a sweep
    of grid shapes; the y == 0 and x == 0 border-recurrence branches
    via horizontal / vertical raster chains; the interior
    (above + left + 1) >> 1 average (odd-sum rounding); the
    predicted_non_zeros helper agreement byte-for-byte on a seeded
    3×3 grid; the post-Listing-C.14 ceiling-divide formula at
    num_blocks ∈ {1, 4, 16} (DCT8×8 / DCT16×16 / DCT32×32 — the
    TransformType dispatch reduces a raw non_zeros = 17 to
    {17, 5, 2} at the three shapes); the typed driver's
    predicted = 32 at the origin routes through the predicted >= 8 NonZerosContext branch (ctx = block_ctx + nb_block_ctx × (4 + 32 Idiv 2) = 67 at (block_ctx, nb_block_ctx) = (7, 3));
    decode_block_at reads back (0, 0)'s post-update cell when
    invoked at (1, 0); OOB positions error cleanly; per-channel
    independence (two grids of the same shape evolve
    independently); row-major cells() layout pinned at [0, 10, 20, 30] after writing (1,0)=10, (0,1)=20, (1,1)=30 on a
    2×2 grid; and pathological u32::MAX does not panic.

    Lib tests 561 → 584 (+23). Pure-control-flow primitive in the
    same shape as round-89 dct_quant_weights, round-95
    hf_dequant, round-121 llf_from_lf, round-138
    chroma_from_luma, round-141 gaborish, round-144 epf,
    round-147 afv_idct, and round-159 / 164 pass_group_hf — no
    bit reads, no spec re-derivation. A future round wiring §C.7.2
    entropy histograms (#799 DOCS-GAP) + the per-LfGroup
    varblock-shape grid + per-channel BlockContext() history can
    drop these helpers in as the per-varblock-position step without
    re-deriving any Listing C.13 / C.14 formulae.

  • Round 164 (2021-FDIS) — TransformType-driven entry points for
    the §C.8.3 per-block HF coefficient decode loop (DCT16×16 /
    DCT16×8 / DCT32×32 dimensions pinned end-to-end).
    New public API
    in pass_group_hf:

    • transform_block_params(t: TransformType) -> (num_blocks, size)
      — §I.2.4 opening paragraph + Listing C.14: num_blocks = (bwidth / 8) × (bheight / 8), size = bwidth × bheight.
    • decode_block_coefficients_for_transform(t, initial_non_zeros, block_ctx, nb_block_ctx, decode_symbol) — typed wrapper that
      derives (num_blocks, size, natural_order) from t (via
      [coeff_order::order_id_for_transform] +
      [coeff_order::natural_coeff_order]) and reduces to the
      round-159 decode_block_coefficients.
    • read_non_zeros_and_decode_block_for_transform(t, predicted, block_ctx, nb_block_ctx, read_non_zeros, decode_symbol):
      analogous typed wrapper around
      read_non_zeros_and_decode_block.

    20 new tests (8 unit in pass_group_hf::tests + 12 integration
    in tests/round164_dct16x16_block_coefficient_loop.rs) pin the
    (num_blocks, size) derivation for every Table C.16 transform
    (every entry satisfies num_blocks * 64 == size); the DCT16×16
    prev threshold at non_zeros == 17 (= size/16 + 1); the typed
    entry point at DCT8×8 reduces to the raw entry point; the typed
    entry point at DCT16×16 walks (num_blocks=4, size=256) for
    all-zero / single-non-zero / three-consecutive / full-density
    (252 reads) cases with coefficients landing at
    natural_coeff_order(Id2)[4..]; the typed and raw entry points
    agree byte-for-byte on a mixed [2, 0, 4, 0, 0, 6] sequence;
    read_non_zeros_and_decode_block_for_transform threads the
    NonZerosContext value through the first closure; the rectangular
    DCT16×8 / DCT8×16 collapse to the same per-block outcome (they
    share OrderId::Id4); defensive rejection of initial_non_zeros > size - num_blocks (= 252 max for DCT16×16); and one DCT32×32
    smoke-test at (num_blocks=16, size=1024). Lib tests 553 → 561
    (+8). Pure-typed wrapper layer: no new bit reads, no spec
    re-derivation — the round-159 module note ("the primitive itself
    is shape-agnostic and ready for the larger variable-block sizes
    once their parameterisation lands") is now exercised from the
    caller-facing API.

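
    The (num_blocks, size) derivation is simple enough to show directly. A
    minimal sketch with a hypothetical five-variant Transform enum standing in
    for the crate's TransformType; the real type covers every Table C.16
    entry and is not reproduced here.

```rust
/// Hypothetical subset of transform shapes, for illustration only.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Transform {
    Dct8x8,
    Dct16x8,
    Dct8x16,
    Dct16x16,
    Dct32x32,
}

impl Transform {
    /// (bwidth, bheight) in samples.
    fn dims(self) -> (u32, u32) {
        match self {
            Transform::Dct8x8 => (8, 8),
            Transform::Dct16x8 => (16, 8),
            Transform::Dct8x16 => (8, 16),
            Transform::Dct16x16 => (16, 16),
            Transform::Dct32x32 => (32, 32),
        }
    }
}

/// num_blocks = (bwidth / 8) * (bheight / 8), size = bwidth * bheight.
pub fn transform_block_params(t: Transform) -> (u32, u32) {
    let (w, h) = t.dims();
    ((w / 8) * (h / 8), w * h)
}

/// Largest initial_non_zeros the decode loop accepts: size - num_blocks.
pub fn max_initial_non_zeros(t: Transform) -> u32 {
    let (num_blocks, size) = transform_block_params(t);
    size - num_blocks
}
```

    For DCT16×16 this gives (4, 256), a maximum of 252 non-zeros, and a prev
    threshold of size / 16 + 1 = 17, matching the figures pinned by the tests.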
  • Round 159 (2021-FDIS) — §C.8.3 per-block HF coefficient decode
    loop scaffolding (Listing C.13 + Listing C.14).
    New public API in
    pass_group_hf:

    • prev_for_context(k, num_blocks, size, non_zeros, prev_nonzero)
      — Listing C.14 verbatim (k == num_blocks ? (non_zeros > size / 16 ? 1 : 0) : (prev_nonzero(k - 1) ? 1 : 0)).
    • DecodedHfBlock { coeffs, remaining_non_zeros, coeffs_read }:
      return bundle for the per-block primitive.
    • decode_block_coefficients(natural_order, num_blocks, size, initial_non_zeros, block_ctx, nb_block_ctx, decode_symbol):
      Listing C.14's per-block raster-order loop with the §C.8.3
      "stop when non_zeros reaches 0" early-exit, UnpackSigned
      application, and natural_order[k] placement. The
      decode_symbol: FnMut(ctx) -> Result<u32> closure abstracts
      over the (still un-landed) §C.7.2 entropy histograms — a real
      consumer wraps EntropyStream + HybridUintState + the
      per-group histogram_offset; tests can hand-roll a symbol
      sequence.
    • read_non_zeros_and_decode_block(.., predicted, .., read_non_zeros, decode_symbol) — convenience wrapper that issues the
      D[NonZerosContext(predicted) + offset] read via the first
      closure and drives decode_block_coefficients with the result.
      Returns (DecodedHfBlock, non_zeros) so the caller can update
      its NonZeros-grid bookkeeping per `NonZeros(x, y) = (non_zeros
      + num_blocks - 1) Idiv num_blocks`.

    Bounded scope: DCT8×8 alone — num_blocks = 1, size = 64,
    OrderId::Id0 natural-coefficient order (the simplest case that
    exercises the full state machine). The primitive itself is
    shape-agnostic; the larger variable-block sizes (DCT16×16,
    DCT32×32, AFV0..3, …) need their num_blocks / size parameters
    threaded through the varblock driver above this primitive.
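
    The control flow of the per-block loop can be sketched for the DCT8×8
    case as follows. This is a deliberately simplified illustration: the
    names decode_block and Decoded are hypothetical, errors are plain
    Strings, and the closure receives the coefficient index k instead of the
    real entropy context, so the prev / context derivation is left out
    entirely.

```rust
/// UnpackSigned: even u maps to u / 2, odd u maps to -(u + 1) / 2.
pub fn unpack_signed(u: u32) -> i32 {
    if u & 1 == 0 {
        (u >> 1) as i32
    } else {
        (-i64::from(u >> 1) - 1) as i32
    }
}

pub struct Decoded {
    pub coeffs: Vec<i32>,
    pub remaining_non_zeros: u32,
    pub coeffs_read: u32,
}

/// Walks k = num_blocks..size, stopping early once non_zeros reaches 0.
pub fn decode_block<F>(
    natural_order: &[usize],
    num_blocks: usize,
    size: usize,
    initial_non_zeros: u32,
    mut decode_symbol: F,
) -> Result<Decoded, String>
where
    F: FnMut(usize) -> Result<u32, String>,
{
    if num_blocks == 0 || num_blocks > size {
        return Err("invalid num_blocks".to_string());
    }
    if natural_order.len() != size || natural_order.iter().any(|&p| p >= size) {
        return Err("malformed natural order".to_string());
    }
    if initial_non_zeros as usize > size - num_blocks {
        return Err("initial_non_zeros too large".to_string());
    }

    let mut coeffs = vec![0i32; size];
    let mut non_zeros = initial_non_zeros;
    let mut coeffs_read = 0u32;
    for k in num_blocks..size {
        if non_zeros == 0 {
            break; // nothing left to decode in this block
        }
        let value = unpack_signed(decode_symbol(k)?);
        coeffs_read += 1;
        coeffs[natural_order[k]] = value;
        if value != 0 {
            non_zeros -= 1;
        }
    }
    Ok(Decoded { coeffs, remaining_non_zeros: non_zeros, coeffs_read })
}
```

    With an identity natural order, one pending non-zero and the symbol 1,
    the loop performs a single read and writes -1 at position 1. A block
    with 63 pending non-zeros reads 63 symbols and never touches slot 0.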

    11 new unit tests (pass_group_hf::tests::*) + 11 integration
    tests (round159_block_coefficient_loop) cover: all-zero block
    (no symbol reads); single non-zero at the first HF slot (one
    read, UnpackSigned(1) = -1 at natural_order[1]); three
    consecutive non-zeros (loop stops after three reads); full
    density (63 reads, LLF cell untouched); the size/16 threshold
    for prev (crossover at non_zeros == 5 for DCT8×8); the
    "previous coefficient is zero / non-zero" flag tracking through
    the loop's history; defensive rejection of malformed
    natural-order vectors, zero num_blocks, and over-large
    initial_non_zeros; closure-threaded end-to-end smoke through
    read_non_zeros_and_decode_block. Lib tests 538 → 553 (+15).

    Pure-math / pure-control-flow primitive in the same shape as
    round-89 dct_quant_weights, round-95 hf_dequant, round-121
    llf_from_lf, round-138 chroma_from_luma, round-141
    gaborish, round-144 epf, and round-147 afv_idct — a future
    round wiring §C.7.2 histograms into the per-pass entropy stream
    can drop this primitive in as the per-block loop body without
    re-deriving any C.13 / C.14 formulae. The §C.7.2 entropy
    histogram decode (#799 DOCS-GAP), the per-channel (Y / X / B)
    non_zeros read in the varblock driver above this primitive,
    the per-pass NonZeros-grid update, and the per-varblock
    BlockContext() derivation remain follow-up work for subsequent
    rounds.

  • Round 150 (2021-FDIS) — Annex I.2.3.8 / Listing I.13 Inverse AFV
    transform composition (idct::idct_afv).
    Composes the round-147
    crate::afv::afv_idct pure-math primitive (Listings I.5 + I.6)
    with two idct_2d calls (one at 4×4, one at 4×8) per the
    three-sub-block decomposition of Listing I.13 — yielding the
    full 8×8 sample buffer for TransformType::Afv0..Afv3. With
    this wiring the idct::idct_for_transform dispatcher routes
    Afv0..Afv3 to idct_afv instead of returning
    Err(Unsupported); all 10 non-DCT branches of Table I.4 are now
    pure-math-complete (Hornuss / DCT2×2 / DCT4×4 / DCT8×4 / DCT4×8
    + AFV0..AFV3). Each AFV variant's sub-block placement is
    controlled by flip_x = n & 1 / flip_y = n >> 1 (§I.2.3.8);
    the AFV sub-block additionally mirrors its read coordinates
    (flip_x == 1 ? 3 - ix : ix and the iy dual) per the inner
    loop of Listing I.13.

    Seven new property-style tests cover:
    rejection of non-AFV transforms / wrong lengths; all-zero
    input → all-zero output for all four variants; DC-only input
    → constant c(0,0) output (the three DC patches (c00+c01+c10) × 4, c00-c01+c10, c00-c01 collapse to 4·1, 1, 1
    respectively, with each sub-block's IDCT mapping a DC-only
    cell to a constant sub-block since AFVBasis row 0 = [0.25; 16] and IDCT_2D DC-only is constant); dense-AC input →
    every cell written; AFV0↔AFV1 x-axis flip swaps the AFV
    sub-block column reads; AFV0↔AFV2 y-axis flip swaps the 4×8
    sub-block y-band placement; linearity. Test-count delta:
    +7 (531 → 538).
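
    The flip derivation and the mirrored read coordinate are small enough to
    state as code. A sketch with hypothetical helper names; it covers only
    the index arithmetic quoted above, not the sub-block placement itself.

```rust
/// (flip_x, flip_y) for AFVn, n in 0..=3: flip_x = n & 1, flip_y = n >> 1.
pub fn afv_flips(n: u8) -> Result<(u8, u8), String> {
    if n > 3 {
        return Err(format!("AFV{n} does not exist"));
    }
    Ok((n & 1, n >> 1))
}

/// Read coordinate inside the 4x4 AFV sub-block: flip == 1 ? 3 - i : i.
pub fn mirrored(flip: u8, i: usize) -> usize {
    assert!(i < 4, "coordinate must lie inside the 4x4 sub-block");
    if flip == 1 {
        3 - i
    } else {
        i
    }
}
```

    AFV0 and AFV1 differ only in flip_x, and AFV0 and AFV2 only in flip_y,
    which is why the tests above compare exactly those pairs.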

    FDIS typo documented in module docs. Listing I.13's final
    source line reads samples_4×4(ix, iy) but the inner loop
    iterates ix ∈ [0..8) and samples_4×4 only has columns
    0..3, while the immediately preceding line computes
    samples_4×8 = IDCT_2D(coeffs_4×8). Implementation reads from
    samples_4×8 per context; the typo is now annotated alongside
    the existing four Annex D / D.3 typos in the project
    FDIS-typo memory.

  • Round 147 (2021-FDIS) — Annex I.2.2 AFV basis + AFV_IDCT
    pure-math primitive (Listings I.5 + I.6, p. 76).
    New
    src/afv.rs module transcribes the orthonormal AFVBasis[16][16]
    table from Listing I.5 verbatim and the Listing I.6 cell-sum
    samples[i] = sum_j coefficients[j] × AFVBasis[j][i]. Public
    API:

    • AFV_CELL_LEN: usize = 16 — the §I.2.2 4×4-as-flat-16 cell.
    • AFV_BASIS: [[f32; 16]; 16] — verbatim Listing I.5.
    • afv_idct(coefficients: &[f32]) -> Result<[f32; 16]>:
      Listing I.6.

    The 256-float transcription is independently verified at the
    table level: row-0 = [0.25; 16] (Listing I.5 line 1); row-4 =
    two non-zero entries at columns 1 and 4, both at ±1/sqrt(2),
    zero elsewhere (Listing I.5 line 5); per-row L2 unit-norm
    (orthonormality diagonal); pairwise zero inner product
    (orthonormality off-diagonal); afv_idct is linear; one-hot
    coefficient input recovers AFVBasis[j] row-for-row;
    ||samples||_2 == ||coefficients||_2 (L2 conservation, an
    orthonormal-basis property). A single transcription typo in any
    of the 256 entries would fail at least one orthonormality sum.
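
    The table-level checks described above work for any orthonormal 16×16
    basis, so they can be illustrated without the Listing I.5 values. The
    sketch below uses a scaled Walsh-Hadamard matrix as a stand-in: it is NOT
    the AFV basis, it merely shares the row-0 = [0.25; 16] property and
    orthonormality. All names are hypothetical.

```rust
pub const CELL_LEN: usize = 16;

/// Stand-in orthonormal basis (scaled Walsh-Hadamard), not the AFV table.
pub fn stand_in_basis() -> [[f32; CELL_LEN]; CELL_LEN] {
    let mut basis = [[0.0f32; CELL_LEN]; CELL_LEN];
    for j in 0..CELL_LEN {
        for i in 0..CELL_LEN {
            // Sign is the parity of the shared set bits of j and i.
            let sign = if (j & i).count_ones() % 2 == 0 { 1.0 } else { -1.0 };
            basis[j][i] = 0.25 * sign;
        }
    }
    basis
}

/// samples[i] = sum_j coefficients[j] * basis[j][i].
pub fn basis_idct(
    coefficients: &[f32],
    basis: &[[f32; CELL_LEN]; CELL_LEN],
) -> Result<[f32; CELL_LEN], String> {
    if coefficients.len() != CELL_LEN {
        return Err(format!("expected {CELL_LEN} coefficients"));
    }
    let mut samples = [0.0f32; CELL_LEN];
    for i in 0..CELL_LEN {
        for j in 0..CELL_LEN {
            samples[i] += coefficients[j] * basis[j][i];
        }
    }
    Ok(samples)
}

/// Largest deviation of any row inner product from the identity matrix.
pub fn max_orthonormality_error(basis: &[[f32; CELL_LEN]; CELL_LEN]) -> f32 {
    let mut worst = 0.0f32;
    for a in 0..CELL_LEN {
        for b in 0..CELL_LEN {
            let dot: f32 = (0..CELL_LEN).map(|i| basis[a][i] * basis[b][i]).sum();
            let want = if a == b { 1.0 } else { 0.0 };
            worst = worst.max((dot - want).abs());
        }
    }
    worst
}
```

    A one-hot coefficient vector returns the corresponding basis row, and a
    corrupted entry in the table shows up as a non-zero orthonormality error.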

    10 new unit tests + 9 integration tests
    (round147_afv_idct); lib tests 521 → 531. Pure-math primitive
    in the same shape as round-89 dct_quant_weights, round-95
    hf_dequant, round-121 llf_from_lf, round-138
    chroma_from_luma, round-141 gaborish, and round-144 epf:
    a future round wiring §I.2.3.8 Inverse AFV transform (Listing
    I.13) into idct_for_transform can drop this helper in without
    re-deriving any I.5 / I.6 cells. The Listing I.13 composition
    (the coeffs_afv corner-load, the two IDCT_2D 4×4 / 4×8
    sub-blocks, the flip_x / flip_y AFVn flip) remains
    follow-up work because it depends on idct_2d for non-square
    blocks plus the AFVn dispatch wiring; the §I.2.2 arithmetic
    core landed in this round unblocks that follow-up.

  • Round 144 (2021-FDIS) — Annex J.3 "Edge-preserving filter"
    pure-math primitive (pages 85–87).
    New src/epf.rs module
    transcribes the four §J.3 listings as a self-contained pure-math
    primitive: given a triple of three-channel f32 planes (the output
    of round-141 Gaborish on the §I.2.5 + Annex G chain), per-call
    scalar parameters (sigma, step_multiplier, zeroflush,
    position_multiplier_border, channel_scale), and a
    [frame_header::RestorationFilter] (Table C.9) for
    epf_quant_mul / epf_sharp_lut[..] / epf_sigma_for_modular,
    this module returns the per-pass output planes Listing J.4
    prescribes. Public API:

    • distance_step_0_and_1(x, y, b, w, h, x, y, cx, cy, scale):
      Listing J.1 DistanceStep0and1 (the five-pixel cross-shape
      three-channel scaled L1 distance for passes 0 and 1).
    • distance_step_2(...) — Listing J.1 DistanceStep2 (the
      single-sample three-channel scaled L1 distance for pass 2,
      under the literal (ix, iy) == (0, 0) reading of the free-
      variable bug — see DOCS-GAP).
    • weight(distance, inv_sigma, position_multiplier, zeroflush)
      — Listing J.2 Weight() decreasing-function-of-distance
      kernel with the v <= zeroflush cutoff.
    • inv_sigma_for_pass(step_multiplier, sigma) — Listing J.2's
      pre-computed step_multiplier × 4 × (sqrt(0.5) - 1) / sigma
      factor (rejects non-finite or non-positive sigma).
    • vardct_sigma_from_listing_j3(quantization_width, sharpness, &rf) — Listing J.3's per-varblock sigma derivation with the
      max(1e-4, ..) clamp; the modular-mode branch uses
      rf.epf_sigma_for_modular directly.
    • is_border_position(x, y) — Listing J.2's "either coordinate
      of the reference sample is 0 or 7 IMod 8" predicate driving
      the per-pixel epf_border_sad_mul selection.
    • apply_step_5tap(Pass::Pass1 | Pass::Pass2, ..) — Listing
      J.4's 5-tap cross-shape kernel pass (passes 1 and 2); the
      distance metric is selected by the Pass discriminant.
    • apply_step_13tap(..) — Listing J.4's 13-tap diamond kernel
      pass 0 (always using DistanceStep0and1).
    • Pass — enum picking Pass0 / Pass1 / Pass2 for the dispatch.

    §6.5 Mirror1D boundary handling is reused verbatim from
    round-141 gaborish::mirror1d. 36 new unit tests + 12 new
    integration tests (round144_epf) pin self-distance-is-zero on
    constant planes for both metrics, per-channel-scale linearity,
    offset symmetry for DistanceStep0and1, DistanceStep2 hand-
    derived spatially-varying-plane case
    (x:1×40 + y:2×5 + b:0×3.5 = 50), Weight() zero-distance
    returns 1.0 / zeroflush cutoff / position-multiplier scaling,
    Listing J.3 sigma at default rf sharpness 0 → 1e-4 clamp and
    sharpness 7 → full quant, the is_border_position 8×8 grid
    layout, constant-plane invariance across all three passes, and
    the zero-channel-scale collapse to the uniform mean on a centre
    impulse. Lib tests 485 → 521. Pure-math primitive in the same
    shape as round-89 dct_quant_weights, round-95 hf_dequant,
    round-121 llf_from_lf, round-138 chroma_from_luma, and
    round-141 gaborish — a future round wiring §J.3 into the
    per-frame restoration-filter pipeline can drop these helpers in
    without re-deriving any of the J.1/J.2/J.3/J.4 listings. The
    per-frame loop (calling each pass for each varblock under the
    right epf_iters / per-block sigma / position-multiplier
    conditions with output of pass i feeding pass i+1), the
    sigma < 0.3 skip-the-block path, and the epf_iters > 0 skip
    remain caller responsibilities (deferred to follow-up rounds).
    DOCS-GAP observed in FDIS Listing J.1 DistanceStep2 (free
    ix/iy variables — adopted (ix, iy) == (0, 0)) and Listing
    J.2 step_multiplier array (missing comma between
    epf_pass0_sigma_scale and 1); both surfaced in the
    module-level rustdoc with the adopted reading and rationale, and
    the public API sidesteps the indexing ambiguity by accepting
    step_multiplier: f32 directly so the wiring round can pick the
    resolution without an API churn.
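
    Two of the scalar helpers listed above are fully determined by the
    formulas quoted in this entry and can be sketched directly. The
    signatures here are simplified assumptions (String errors, u32
    coordinates) and may not match the crate's own.

```rust
/// True when either coordinate is 0 or 7 modulo 8, i.e. the sample sits on
/// the edge of its 8x8 block.
pub fn is_border_position(x: u32, y: u32) -> bool {
    let on_edge = |v: u32| matches!(v % 8, 0 | 7);
    on_edge(x) || on_edge(y)
}

/// step_multiplier * 4 * (sqrt(0.5) - 1) / sigma, rejecting unusable sigma.
pub fn inv_sigma_for_pass(step_multiplier: f32, sigma: f32) -> Result<f32, String> {
    if !sigma.is_finite() || sigma <= 0.0 {
        return Err(format!("sigma must be finite and positive, got {sigma}"));
    }
    Ok(step_multiplier * 4.0 * (0.5f32.sqrt() - 1.0) / sigma)
}
```

    Because sqrt(0.5) - 1 is negative, the returned factor is negative for
    any positive step_multiplier, which is consistent with a weight that
    decreases as distance grows.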

  • Round 141 (2021-FDIS / 2024-spec) — Annex J.2 "Gabor-like
    transform" pure-math primitive (page 85).
    New src/gaborish.rs
    module transcribes FDIS §J.2 verbatim: given a per-channel plane
    of f32 samples (the output of §I.2.5 LLF/HF reconstruction + the
    round-138 Annex G chroma-from-luma chain) and the per-channel
    gab_C_weight1 / gab_C_weight2 weights carried by
    [frame_header::RestorationFilter] (Table C.9), the module applies
    the spec's symmetric 3×3 convolution (centre = 1, edges = w1, corners = w2), rescaled uniformly so the nine kernel entries
    sum to 1, with §6.5 Mirror1D boundary handling on
    out-of-image references. Public API: mirror1d(coord, size)
    (Listing 6.1 iterative form), sample_mirror(plane, w, h, x, y)
    (direct §6.5 fetch), gab_kernel(w1, w2) -> [f32; 9]
    (materialised normalized kernel in row-major order), apply_channel
    (out-of-place per-channel convolution with an interior fast path
    + edge-mirror fallback), apply_channel_in_place (single-buffer
    scratch convenience), and apply_xyb_planes_in_place(x, y, b, w, h, &rf)
    (the three-channel XYB-pipeline convenience using
    rf.gab_x_weight* / gab_y_weight* / gab_b_weight*). 23 new
    unit tests + 10 new integration tests (round141_gaborish) pin
    Mirror1D's identity / first-reflection / single-row collapse
    cases, the default-weight kernel sum-to-one and centre-tap
    (≈ 0.586) values, the four-edge / four-corner kernel symmetry,
    identity-kernel pass-through, constant-plane invariance, the
    per-channel impulse response on a 3×3 plane, linearity of the
    convolution operator, single-row mirror-collapse, and the
    per-channel dispatch through apply_xyb_planes_in_place. Lib
    tests 462 → 485. This is a pure-math primitive in the same shape
    as round-89 dct_quant_weights, round-95 hf_dequant, round-121
    llf_from_lf, and round-138 chroma_from_luma: it lands the
    bit-exact arithmetic so a future round wiring §J.2 into the
    per-frame restoration-filter pipeline can drop it in without
    re-deriving the kernel or the mirror semantics. Does NOT
    implement §J.3 (edge-preserving filter) and does NOT honour the
    rf.gab skip; both are the caller's responsibility.
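The kernel normalisation and mirror rule described in this entry can be sketched as follows. This is a minimal illustration reconstructed from the prose, not the crate's code: the loop form of Mirror1D is an assumed reading of Listing 6.1, the signatures are simplified (weights passed directly instead of via RestorationFilter), and there is no interior fast path.

```rust
/// Mirror an out-of-range coordinate back into `0..size` (iterative form).
/// Assumed reading of the Mirror1D rule; `size` must be at least 1.
fn mirror1d(mut coord: i64, size: i64) -> i64 {
    assert!(size >= 1);
    loop {
        if coord < 0 {
            coord = -coord - 1;
        } else if coord >= size {
            coord = 2 * size - 1 - coord;
        } else {
            return coord;
        }
    }
}

/// Build the 3x3 kernel in row-major order: centre 1, edges `w1`,
/// corners `w2`, then rescale so the nine taps sum to 1.
fn gab_kernel(w1: f32, w2: f32) -> [f32; 9] {
    let norm = 1.0 / (1.0 + 4.0 * w1 + 4.0 * w2);
    let (c, e, k) = (norm, w1 * norm, w2 * norm);
    [k, e, k, e, c, e, k, e, k]
}

/// Fetch a sample with mirrored boundary handling.
fn sample_mirror(plane: &[f32], w: usize, h: usize, x: i64, y: i64) -> f32 {
    let mx = mirror1d(x, w as i64) as usize;
    let my = mirror1d(y, h as i64) as usize;
    plane[my * w + mx]
}

/// Out-of-place convolution of one channel (every tap goes through the mirror).
fn apply_channel(plane: &[f32], w: usize, h: usize, w1: f32, w2: f32) -> Vec<f32> {
    let kernel = gab_kernel(w1, w2);
    let mut out = vec![0.0f32; w * h];
    for y in 0..h as i64 {
        for x in 0..w as i64 {
            let mut acc = 0.0f32;
            for dy in -1..=1i64 {
                for dx in -1..=1i64 {
                    let tap = kernel[((dy + 1) * 3 + (dx + 1)) as usize];
                    acc += tap * sample_mirror(plane, w, h, x + dx, y + dy);
                }
            }
            out[(y as usize) * w + x as usize] = acc;
        }
    }
    out
}
```

With the assumed default weights w1 = 0.115169525 and w2 = 0.061248592, the centre tap of gab_kernel comes out near the 0.586 value the tests pin, and a constant plane passes through apply_channel unchanged because the taps sum to 1.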
  • Round 138 (2021-FDIS / 2024-spec) — Annex G "Chroma from luma"
    pure-math primitive (Listing G.1).
    New src/chroma_from_luma.rs
    module transcribes FDIS Annex G (page 73) verbatim: given the
    per-frame [lf_global::LfChannelCorrelation] bundle (§C.4.4) and,
    for HF coefficients, the per-64×64-tile factor samples from
    [lf_group::HfMetadata]'s x_from_y / b_from_y channels
    (§C.5.4), the module computes the CfL multipliers (kX, kB) and
    applies the Listing G.1 reconstruction X = dX + kX × Y,
    B = dB + kB × Y, Y = dY per sample. Public API:
    kx_kb_raw(base_x, base_b, colour_factor, x_factor, b_factor)
    (Listing G.1 lines 1-2), kx_kb_lf(cfl) (LF derivation
    x_factor = x_factor_lf - 127, b_factor = b_factor_lf - 127),
    kx_kb_hf(cfl, x_factor_hf, b_factor_hf) (HF derivation from the
    64×64-tile factor sample), apply_sample / apply_lf_sample /
    apply_hf_sample for the per-sample reconstruction, and the
    plane-level apply_lf_plane_inplace(dx, dy, db, cfl) (constant
    per-frame (kX, kB)) + apply_hf_plane_inplace(dx, dy, db, w, h, x_from_y, b_from_y, cfl)
    (per-tile lookup at tile_x = x / 64, tile_y = y / 64, with a
    per-tile (kX, kB) cache). 20 new unit tests + 11
    new integration tests (round138_chroma_from_luma) pin the
    default-bundle multipliers (kX = 1/84, kB = 1 + 1/84), the
    Y-identity line, the round-trip against the encoder-side
    decorrelation dX = X - kX × Y, multi-tile HF plane lookup
    (128×64 → 2 tiles wide, 65×65 → 4 tiles via div_ceil), and the
    defensive colour_factor == 0 rejection on both LF and HF paths.
    Lib tests 442 → 462. This is a pure-math primitive in the same
    shape as round-89 dct_quant_weights, round-95 hf_dequant, and
    round-121 llf_from_lf: it lands the bit-exact arithmetic so a
    future round wiring §F.3 + Annex G into the per-LfGroup VarDCT
    pipeline can drop it in without re-deriving any G.1 formulae.
    Does not handle subsampled chroma (Annex G excludes that case
    outright) and does not drive the per-LfGroup loop (deferred).
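The multiplier derivation and per-sample reconstruction described in this entry can be sketched as below. The struct and its field names are hypothetical stand-ins for the LfChannelCorrelation bundle, and the formula k = base + factor / colour_factor is an assumed reading of Listing G.1 that is consistent with the default multipliers quoted above.

```rust
/// Assumed shape of the per-frame correlation bundle (field names hypothetical).
#[derive(Clone, Copy)]
struct Cfl {
    colour_factor: u32,
    base_correlation_x: f32,
    base_correlation_b: f32,
    x_factor_lf: u32,
    b_factor_lf: u32,
}

/// Multipliers from raw factors; rejects a zero colour factor.
fn kx_kb_raw(
    base_x: f32,
    base_b: f32,
    colour_factor: u32,
    x_factor: i32,
    b_factor: i32,
) -> Result<(f32, f32), &'static str> {
    if colour_factor == 0 {
        return Err("colour_factor must be non-zero");
    }
    let cf = colour_factor as f32;
    Ok((base_x + x_factor as f32 / cf, base_b + b_factor as f32 / cf))
}

/// LF derivation: the stored factors are offset by 127, as described above.
fn kx_kb_lf(cfl: &Cfl) -> Result<(f32, f32), &'static str> {
    kx_kb_raw(
        cfl.base_correlation_x,
        cfl.base_correlation_b,
        cfl.colour_factor,
        cfl.x_factor_lf as i32 - 127,
        cfl.b_factor_lf as i32 - 127,
    )
}

/// Decoder-side reconstruction: X = dX + kX*Y, B = dB + kB*Y, Y = dY.
fn apply_sample(dx: f32, dy: f32, db: f32, kx: f32, kb: f32) -> (f32, f32, f32) {
    (dx + kx * dy, dy, db + kb * dy)
}
```

Assuming a default bundle of colour_factor = 84, base correlations (0, 1), and both LF factors at 128, kx_kb_lf yields (1/84, 1 + 1/84). Feeding apply_sample the encoder-side residuals dX = X - kX × Y and dB = B - kB × Y recovers X and B up to f32 rounding.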

  • Round 133 (2021-FDIS / 2024-spec) — §C.7.1 DecodePermutation()
    for used_orders != 0.
    HfPass::read now handles the
    non-natural coefficient-order path of Listing C.12: the shared
    "8 clustered distributions D" are read once into a
    modular_fdis::EntropyStream (num_dist = 8) with its ANS state
    initialised, then each set used_orders bit runs the §C.3.2
    Lehmer-code permutation against that same stream. New public
    coeff_order::decode_permutation_from_stream(br, entropy, hybrid, size, skip) factors the §C.3.2 procedure generically (the same
    algorithm the TOC permuted_toc path uses); §C.7.1 supplies
    size = coefficient_count(order) and skip = size / 64, yielding
    order[i] = natural_coeff_order[nat_ord_perm[i]]. HfPass::read
    no longer returns Error::Unsupported for used_orders != 0.
    Adds get_context + lehmer_to_permutation unit coverage and
    rewrites the two former hf_pass Unsupported tests to assert the
    stream-read path is now taken.
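The Lehmer-code step and the final composition order[i] = natural_coeff_order[nat_ord_perm[i]] can be sketched as below. This is the textbook Lehmer decoding, not a transcription of §C.3.2: the entropy-coded reads and the skip prefix are left out, and the signatures are hypothetical.

```rust
/// Decode a Lehmer code into a permutation of `0..lehmer.len()`.
/// Each code value indexes into the list of not-yet-used elements.
fn lehmer_to_permutation(lehmer: &[u32]) -> Result<Vec<u32>, String> {
    let n = lehmer.len();
    let mut remaining: Vec<u32> = (0..n as u32).collect();
    let mut perm = Vec::with_capacity(n);
    for (i, &code) in lehmer.iter().enumerate() {
        let idx = code as usize;
        if idx >= remaining.len() {
            return Err(format!(
                "lehmer[{}] = {} exceeds {} remaining elements",
                i,
                code,
                remaining.len()
            ));
        }
        // Vec::remove shifts the tail, so this sketch is O(n^2).
        perm.push(remaining.remove(idx));
    }
    Ok(perm)
}

/// Compose with a natural order: order[i] = natural[perm[i]].
fn apply_permutation(natural: &[u32], perm: &[u32]) -> Vec<u32> {
    perm.iter().map(|&p| natural[p as usize]).collect()
}
```

For example, the code [1, 1, 0] picks element 1, then element 2 (index 1 of what remains), then element 0, giving the permutation [1, 2, 0].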

  • Round 129 (2021-FDIS / 2024-spec) — per-varblock LF→LLF
    composition glue (§I.2.5 plumbing).
    Three new public functions
    in vardct that compose the round-121
    [llf_from_lf::llf_from_lf] pure-math step with a single
    channel's dequantised LF samples for a single varblock placement:

    • vardct::extract_lf_subblock(lf_samples, lf_width, lf_height, bx, by, t) — extracts the cy × cx LF sub-block at varblock
      origin (bx, by) in row-major order, per FDIS §I.2.5 prose
      "the corresponding X/8 × Y/8 samples from the dequantized LF
      image". Returns Err(InvalidData) on dim-mismatch, origin
      overflow, or varblock extending past the LF grid (defensive
      bounds-checking before the indexing).
    • vardct::compose_lf_to_llf_block(lf_samples, lf_width, lf_height, bx, by, t): extract_lf_subblock + llf_from_lf
      in one call, returning the cy × cx LLF coefficient block of
      the top-left of an HF varblock.
    • vardct::compose_lf_to_llf_block_3ch(&LfDequantOutput, bx, by, t) — convenience wrapper that invokes the per-channel helper
      once for each of the three colour channels (X, Y, B) when no
      channel is subsampled (the common case where §F.2 adaptive LF
      smoothing is applied); rejects mismatched per-channel dims with a
      clear InvalidData message pointing the caller at the
      per-channel compose_lf_to_llf_block for the subsampled case.
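The bounds-checked extraction step from the list above can be sketched as follows. This is an illustration only: it takes the sub-block dimensions cx and cy directly, whereas the real helper derives them from the transform type, and it reports errors as plain strings instead of the crate's InvalidData.

```rust
/// Extract the `cy x cx` LF sub-block at origin `(bx, by)` in row-major order.
/// `cx`/`cy` stand in for the dimensions the real helper derives from the
/// transform type (an assumption of this sketch).
fn extract_lf_subblock(
    lf_samples: &[f32],
    lf_width: usize,
    lf_height: usize,
    bx: usize,
    by: usize,
    cx: usize,
    cy: usize,
) -> Result<Vec<f32>, &'static str> {
    // Dimension mismatch between the slice and the declared grid.
    if lf_width.checked_mul(lf_height) != Some(lf_samples.len()) {
        return Err("LF sample count does not match lf_width * lf_height");
    }
    // Origin overflow.
    let x_end = bx.checked_add(cx).ok_or("varblock x origin overflows")?;
    let y_end = by.checked_add(cy).ok_or("varblock y origin overflows")?;
    // Varblock extending past the LF grid.
    if x_end > lf_width || y_end > lf_height {
        return Err("varblock extends past the LF grid");
    }
    let mut out = Vec::with_capacity(cx * cy);
    for y in by..y_end {
        let row = y * lf_width;
        out.extend_from_slice(&lf_samples[row + bx..row + x_end]);
    }
    Ok(out)
}
```

All three rejection cases run before any slice indexing, so the copy loop cannot panic on a malformed placement.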

    24 new tests (15 unit in src/vardct.rs + 9 integration in
    tests/round129_compose_lf_to_llf.rs). Covers DCT8×8 / DCT16×16
    / DCT32×32 squares, all six DCT16×8-class rectangles (DCT16×8,
    DCT8×16, DCT32×8, DCT8×32, DCT32×16, DCT16×32), the nine non-DCT
    pass-through transforms (Hornuss / DCT2×2 / DCT4×4 / DCT4×8 /
    DCT8×4 / AFV0..AFV3), every kind of out-of-bounds varblock
    placement (x-only, y-only, both, and DCT32×32 at the only
    fitting origin), LfDequantOutput subsampling rejection, and
    byte-exact agreement with the hand-derivable dc * ScaleF(cy, bheight, 0) * ScaleF(cx, bwidth, 0) formula for every
    rectangular transform on a constant input.

    This is the geometry glue between rounds 12/13 (per-LfGroup
    LF dequant + smoothing) and rounds 91+/95 (HF coefficient ANS
    decode + HF dequantisation). A future round wiring the §F.x
    pipeline into decode_codestream can drop these helpers in as
    the per-varblock loop body without re-deriving any LF→LLF
    geometry or §I.2.5 prose mechanics. Total lib tests: 422 → 437
    (+15); total integration test files: 41 → 42 (+1).

    Round 129 also intentionally does not chase the
    noise-64x64-lossless sample-194 wp_pred8 = 717 vs spec
    divergence: the trace doc retired 2026-05-06 still has no
    replacement in docs/image/jpegxl/ per the project_jpegxl_pixel_blocked memory note (DOCS-GAP unchanged across r126 and
    r129). The deep-trace plumbing from r126 remains the stable
    baseline for the future Specifier round.

  • Round 126 (2021-FDIS) — Self-correcting WP deep-trace plumbing
    + sample-194 hand-derivation against Listings E.1/E.2/E.3.
    New WP_DEEP_TRACE + WP_DEEP_TRACE_ARMED thread-locals in
    modular_fdis capture the 20-entry intermediate snapshot
    (subpred[0..4], err_sum[0..4], post-shift weight_shifted[0..4],
    sum_weights_pre, log_weight, sh, sum_weights_post, nn8,
    ww8, pred_pre_clamp, clamped_flag) for the trace-target
    sample. The existing LEAF_PICK_TRACE_WP only exposes
    (te_w, te_n, te_nw, te_ne, w8, n8, nw8, ne8, wp_pred8, max_error);
    round 126 fills in the missing nn8/ww8 + Listing
    E.1/E.2/E.3 internals so a by-hand FDIS re-derivation against
    pinned ground-truth is possible.

    New test tests/r126_wp_intermediates_at_194.rs (~150 lines,
    2 tests + a docstring with the full hand-derivation). Pins:
    wp_pred8 = 717 at the noise-64x64-lossless sample 194
    (y=3, x=2, channel 0); the 20-entry deep trace; the 3-plane
    first-divergence scan vs expected.png. The hand-derivation
    in the module docstring proves that NEITHER the subpred[3]
    sign knob NOR the s_init - 1 knob (the two FDIS-vs-current
    deviations round 32 swept independently) can produce a
    prediction in [709..716] from the captured neighbour state.
    The fix must come from somewhere else — most likely a
    state-evolution bug in sub_err or a WpHeader parameter
    mismatch. Round 126 also tried the FDIS-literal sub_err
    formula (abs(((p_i + 3) >> 3) - true_value) per FDIS line
    6832 vs the legacy (abs(p_i - tv*8) + 3) >> 3); the noise
    fixture's wp_pred8 at sample 194 was unchanged, but the
    synth_320 drift-bisect fixture regressed (first drift moved
    from y=24,x=14 to y=11,x=104), so the change is reverted in
    this round and parked for the docs-collaborator behavioural
    trace promised in project_jpegxl_pixel_blocked.
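The two sub_err candidates compared in this round differ only in where the rounding happens. A minimal side-by-side sketch, with hypothetical function names and i32 assumed for all operands:

```rust
/// FDIS-literal form: round the x8 prediction first, then take the error.
fn sub_err_fdis_literal(p_i: i32, true_value: i32) -> i32 {
    (((p_i + 3) >> 3) - true_value).abs()
}

/// Legacy form: take the error at x8 precision, then round.
fn sub_err_legacy(p_i: i32, true_value: i32) -> i32 {
    ((p_i - true_value * 8).abs() + 3) >> 3
}
```

The two agree on many inputs but not all: for p_i = 716 and a true value of 90 the literal form gives 1 while the legacy form gives 0, which is the kind of small state difference that can accumulate into a moved first-drift position.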

    Net deliverable: deeper diagnostic plumbing + a stable pinned
    baseline for the next round to compare hypotheses against.
    Seven small lossless fixtures + synth_320 baselines untouched;
    the noise fixture's plane[0] first-mismatch boundary remains
    at linear index 194 (dec=35 vs exp=34).

  • Round 121 (2021-FDIS / 2024-spec) — §I.2.5 LLF-from-LF
    pure-math step (Listings I.15 + I.16).
    New src/llf_from_lf.rs
    (~500 LOC + 28 unit tests + 16 integration tests in
    tests/round121_llf_from_lf.rs) lands the bridge from §F.2's
    dequantised+smoothed LF samples into the top-left LLF coefficient
    block of each HF varblock — the step the trailing prose of
    §F.2 hands off to §I.2.7 (renumbered §I.2.5 in the 2021 FDIS).

    Public API: scale_i8(n, u), scale_d8(n, u), scale_i(n, u),
    scale_d(n, u), scale_c(n_big, n_small, x),
    scale_f(n_big, n_small, x) (FDIS Listing I.15 closed-form
    helpers); dct_1d(input) -> Result<Vec<f32>> (FDIS §I.2.1
    forward 1-D DCT, sizes 1..=32); dct_2d(samples, rows, cols) -> Result<Vec<f32>> (§I.2.2 Listing I.3 forward 2-D DCT, algorithmic
    inverse of [idct::idct_2d]); llf_dims(t) -> (u32, u32)
    (LF-block dims per TransformType); llf_from_lf(input, t) -> Result<Vec<f32>> (Listing I.16 verbatim, including the non-DCT
    pass-through cases for Hornuss / DCT2×2 / DCT4×4 / DCT4×8 /
    DCT8×4 / AFV0..3).

    44 new tests pin: (a) the Listing I.15 closed forms — I8(8, 0)
    = sqrt(0.5)/2, D8 = 1/(N·I8), the N=8 branch of I/D, C(N, N, x)
    = 1, C reciprocal-on-swap, ScaleF(1, 8, 0) = 1.0 (DCT8×8 corner
    identity), (b) the §I.2.1 1-D forward DCT formula via the
    unit-impulse closed form and the constant-signal DC-only result,
    (c) byte-exact LLF blocks for DCT8×8 (single-cell identity),
    DCT16×16 with both constant-block and impulse-block inputs
    (out[y·2+x] = 0.25 · SF(2,16,y) · SF(2,16,x)),
    DCT16×8 / DCT8×16 rectangular paths, DCT32×32 dimension
    contract, and the non-DCT pass-through across all nine
    single-8×8-block transforms.

    dct_2d → idct::idct_2d round-trip verified at 4×4 to f32
    epsilon, confirming the forward DCT is the precise algorithmic
    inverse of the round-12 IDCT.

  • Round 95 (2021-FDIS / 2024-spec) — §F.3 HF dequantisation
    pure-math step.
    New src/hf_dequant.rs (~310 LOC + 13 unit
    tests) implements the FDIS p. 72 Annex F.3 HF coefficient
    dequantisation formula verbatim: Listing F.2 bias-adjust
    (*= quant_bias[c] for |q| <= 1, -= quant_bias_numerator / quant otherwise), per-block HfMul multiplier, per-channel
    0.8^(x_qm_scale - 2) / 0.8^(b_qm_scale - 2) factor (Y
    channel exempt), and the §C.6.2 per-(channel, transform_type, coeff_index) dequant-matrix entry from the
    round-89 dct_quant_weights::DequantMatrixSet.

    Public API: bias_adjust(quant: i32, channel: usize, oim: &OpsinInverseMatrix) -> f32, QmScaleFactors::for_frame(&FrameHeader),
    QmScaleFactors::for_channel(channel) -> f32,
    dequant_hf_coefficient(quant, channel, hf_mul, dequant_matrix_entry, oim, qm) -> f32,
    dequant_hf_pre_matrix(...) (partial product helper).

    10 new integration tests
    (tests/round35_hf_dequant.rs) pin Listing F.2 branch
    boundaries (zero, ±1, |q|>1 subtractive bias sign-preservation),
    the FDIS default quant_bias_numerator = 0.145 fixed-point
    quant=2 → 1.9275, the 0.8^(u(3) - 2) exponent sweep, and
    the cross-module composition against
    materialise_default_dequant_set() for X / Y channels at the
    DCT8×8 corner cell. Y channel verified to skip the qm-scale
    factor; X channel under default x_qm_scale = 3 verified to
    pick up a 0.8 factor.
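The bias-adjust branches and the qm-scale factor pinned by these tests can be sketched as below. The signatures are simplified and hypothetical (the real bias_adjust takes the channel and an OpsinInverseMatrix), and the channel order X = 0, Y = 1, B = 2 is an assumption.

```rust
/// Bias adjustment: magnitudes up to 1 are scaled by the channel bias,
/// larger ones are pulled toward zero by `numerator / quant`.
fn bias_adjust(quant: i32, quant_bias_c: f32, quant_bias_numerator: f32) -> f32 {
    let q = quant as f32;
    if quant.abs() <= 1 {
        q * quant_bias_c
    } else {
        // Dividing by the signed value keeps the sign of the result.
        q - quant_bias_numerator / q
    }
}

/// Per-channel scale 0.8^(qm_scale - 2) for X and B; Y is exempt.
fn qm_scale_factor(channel: usize, x_qm_scale: u32, b_qm_scale: u32) -> f32 {
    match channel {
        0 => 0.8f32.powi(x_qm_scale as i32 - 2),
        2 => 0.8f32.powi(b_qm_scale as i32 - 2),
        _ => 1.0,
    }
}
```

With the default numerator 0.145, a quantised value of 2 becomes 2 - 0.145 / 2 = 1.9275 and -2 becomes -1.9275; an x_qm_scale of 3 gives the X channel a factor of 0.8.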

    Made FrameHeader::default_with pub(crate) (was private) so
    the new hf_dequant unit tests can construct a default
    FrameHeader without going through bit-stream parsing.

    Round 95 lands the bit-exact F.3 arithmetic so the future
    round that wires the per-block ANS coefficient decode (the
    round-90 followup blocked on the shared 8-cluster ANS stream
    + §C.7.2 histograms) can drop the integer ANS reader on top
    without re-deriving any formulae. CfL (Annex G) and IDCT
    (Annex I.2) still chain afterwards.
  • Round 90 (2021-FDIS / 2024-spec) — HfPass + PassGroup HF
    structural parsers.
    Three new modules surface the §C.7.1 /
    §C.7.2 HfPass bundle and the §C.8.3 PassGroup HF entry-points,
    preparing the HF coefficient decode pipeline for the per-block
    ANS-stream wiring scheduled for round 91+.

    New src/coeff_order.rs (~430 LOC + 12 tests): §I.2.4 natural
    coefficient ordering for every OrderId 0..=12 (Table I.1).
    Builds the LLF prefix sorted by y × bwidth + x followed by
    the HF tail sorted by (key1, key2) per Listing I.14. Public
    API: OrderId, varblock_size_for_order, natural_coeff_order,
    coefficient_count, order_id_for_transform,
    COEFFICIENTS_PER_ORDER.

    New src/hf_pass.rs (~290 LOC + 7 tests): §C.7.1 Listing C.12
    parser. Reads used_orders = U32(Val(0x5F), Val(0x13), Val(0), Bits(13)). The used_orders == 0 fast path materialises all 13
    natural orders directly per the listing's else branch.
    used_orders != 0 returns Error::Unsupported — the permutation
    reads need the shared 8-cluster ANS stream that §C.7.2 histograms
    also feed; wiring that shared stream is round-91 work. Exposes
    num_histogram_distributions = 495 × num_hf_presets × nb_block_ctx so the next round knows the §C.7.2 read count
    up-front. Also exposes read_hf_pass_sequence for the per-pass
    loop.

    New src/pass_group_hf.rs (~460 LOC + 18 tests): §C.8.3 first
    line + Listing C.13. Reads hfp = u(ceil(log2(num_hf_presets))),
    validates hfp < num_hf_presets, computes
    histogram_offset = 495 × nb_block_ctx × hfp. Verbatim
    transcriptions of block_context, non_zeros_context,
    coefficient_context, predicted_non_zeros, plus the two
    64-element CoeffFreqContext / CoeffNumNonzeroContext ladder
    tables as pub const arrays. The actual per-block ANS
    coefficient decode loop defers to a later round (it requires the
    shared per-pass ANS stream from §C.7.2).
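The hfp field width, range check, and histogram offset described above reduce to a few lines of integer arithmetic. A minimal sketch with hypothetical function names and string errors in place of the crate's error type:

```rust
/// Number of bits in the `hfp` field: ceil(log2(num_hf_presets)).
fn hfp_bits(num_hf_presets: u32) -> u32 {
    if num_hf_presets <= 1 {
        0
    } else {
        32 - (num_hf_presets - 1).leading_zeros()
    }
}

/// Validate `hfp` and compute the start of its histogram range:
/// 495 * nb_block_ctx * hfp.
fn histogram_offset(
    hfp: u32,
    num_hf_presets: u32,
    nb_block_ctx: u32,
) -> Result<u32, &'static str> {
    if hfp >= num_hf_presets {
        return Err("hfp out of range");
    }
    495u32
        .checked_mul(nb_block_ctx)
        .and_then(|v| v.checked_mul(hfp))
        .ok_or("histogram offset overflows u32")
}
```

The range check matters whenever num_hf_presets is not a power of two: with three presets the field is two bits wide, so the value 3 is readable but invalid.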

    New integration suite tests/round34_hf_pass_pass_group_hf.rs
    (12 tests) exercises the typed surface end-to-end at the
    structural level — HfPass used_orders == 0 parse + all 13
    natural orders, §C.8.3 hfp range checks, BlockContext default-
    map paths, NonZerosContext continuity at the
    predicted == 8 boundary, CoefficientContext with the listed
    ladder constants, PredictedNonZeros four-arm dispatch table.

    Test delta: +49 tests (332 → 381 lib tests; new integration
    suite contributes 12 more). No fixture-level pixel decode
    changes; the seven small lossless fixtures continue to decode
    pixel-correct, and the two committed VarDCT fixtures still hit
    their existing round-13 deferral gate (next round's HF dequant
    + per-block decode flips that gate).

    Spec gap: none new. Listing C.12 / Listing C.13 / Listing I.14
    / Table I.1 are unambiguous on the round-90 contract scope.

    Followups (round 91+): (a) shared per-pass 8-cluster ANS stream
    init, (b) used_orders != 0 DecodePermutation reads, (c)
    §C.7.2 histogram read (495 × num_hf_presets × nb_block_ctx
    clustered distributions), (d) per-block coefficient decode loop
    per the C.8.3 prose right after Listing C.13, (e) §F.3 HF
    dequantisation gluing the round-89 dequant matrices to the
    newly decoded coefficients.

  • Round 89 (2024-spec) — GetDCTQuantWeights + Table I.6 default
    dequantization-matrix materialisation
    (parent-dispatch r89). New
    src/dct_quant_weights.rs (~1k LOC + tests) transcribes the
    ISO/IEC 18181-1:2024 §I.2.4 / §I.2.5 + Table I.4 + Table I.6
    listing block from page 58-60 of the published core PDF:

    • mult(v) — spec Mult piecewise function
      (1+v if v > 0 else 1/(1-v)).
    • interpolate(pos, max, bands) — spec Interpolate with the
      2024 corrected A * pow(B/A, frac_index) form. Includes
      defensive clamping when pos == max (would otherwise index
      past bands.size() - 1).
    • compute_dct_weights(params, x_dim, y_dim) — spec
      GetDCTQuantWeights per the post-typo-fix 2024 listing
      (bands loop closes BEFORE the weights matrix double-loop,
      correcting the FDIS 2021 PDF's nested-loop bug).
    • materialise_weights_for_dct_select(bundle, channel, X, Y):
      per-mode (DCT, DCT4, DCT2, Hornuss, DCT4x8, AFV)
      weights-matrix dispatch per §I.2.4 page 58 prose +
      Listing C.11 for AFV.
    • materialise_dequant_for_channel(bundle, channel, X, Y):
      element-wise reciprocal of the weights matrix per
      §I.2.4 last paragraph. Validates the
      "no non-positive or infinity" spec invariant.
    • materialise_default_dequant_set() — the full 17-slot ×
      3-channel default set per Table I.6 (page 60),
      transcribed verbatim including the SeqA / SeqB /
      SeqC abbreviated sequences from the spec footnote and
      the dct4x4_params constant for slots 3 (DCT4×4) and 10
      (AFV).
    • weights_matrix_dims_for_slot(slot) — Table I.4 page 57
      dimensions lookup (0..=16).
    • slot_for_transform(t): TransformType (Table C.16
      0..=26) → Table I.4 slot (0..=16) mapping; multiple
      transforms share a slot (e.g. DCT16×8 and DCT8×16 both
      map to slot 6).
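The first two helpers in the list above are small enough to sketch in full. The band-position scaling inside interpolate is an assumed reading of the spec listing, and the clamp shown is one way to realise the defensive pos == max handling; neither is copied from the crate.

```rust
/// Piecewise multiplier: 1 + v for positive v, 1 / (1 - v) otherwise.
fn mult(v: f32) -> f32 {
    if v > 0.0 {
        1.0 + v
    } else {
        1.0 / (1.0 - v)
    }
}

/// Geometric interpolation between adjacent bands at `pos` in `[0, max]`.
/// Assumes `bands` is non-empty with positive entries and `max > 0`.
fn interpolate(pos: f32, max: f32, bands: &[f32]) -> f32 {
    if bands.len() == 1 {
        return bands[0];
    }
    let scaled_pos = pos * (bands.len() - 1) as f32 / max;
    // Clamp so that pos == max uses the last pair with frac_index == 1
    // instead of indexing one past the final band.
    let scaled_index = (scaled_pos as usize).min(bands.len() - 2);
    let frac_index = scaled_pos - scaled_index as f32;
    let (a, b) = (bands[scaled_index], bands[scaled_index + 1]);
    a * (b / a).powf(frac_index)
}
```

With bands [2, 8], the midpoint interpolates geometrically to 4, and pos == max returns the last band, 8, without reading past the end of the slice.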

    Test count: 26 new tests (15 unit tests in
    src/dct_quant_weights.rs + 11 integration tests in
    tests/round33_dct_quant_weights.rs). Every cell of every
    channel of every default slot is verified positive-finite per
    the §I.2.4 invariant. Spot-checks include:

    • DCT8×8 slot 0 channel 0 (0,0) cell = 1 / 3150.0 (reciprocal of
      Table I.6 row-0 head).
    • Hornuss slot 1 (0,0) cell = 1.0 (spec sets weights(0,0) = 1).
    • AFV slot 10 8×8 fully populated (Listing C.11 covers all 64
      cells across the freqs interpolation + weights4x8 + weights4x4
      fills).

    Spec-listing typo notes (recorded in module doc-comment):

    • FDIS 2021 PDF Listing C.10 has the for (y, x) { ... }
      weights double-loop INSIDE the for (i = 1; i < len; i++)
      bands loop — would compute the matrix len - 1 times. The
      2024 published edition (docs/image/jpegxl/ISO_IEC_18181-1-JPEG-XL-Core-2024.pdf page 58) corrects this.
      Module follows the 2024 form.
    • 2024 Interpolate drops len (uses bands.size()) and
      writes pow(B / A, frac_index) instead of FDIS 2021's
      A * (B / A)^frac_index. Mathematically identical.

    SPECGAP recorded: DCT2 cell (0, 0) is not assigned by the spec
    listing block (page 58). Implementation fills it with
    params(c, 0) (same value used for i == 0 neighbours) so the
    dequant reciprocal is finite. The 6-rectangle assignments cover
    62 of 64 cells, plus (1, 1); (0, 0) is the only unmentioned
    position. Recommend a spec clarification.

    Unblocks: downstream HF coefficient dequantisation per §F.3 on
    the HfGlobal u(1) == 1 default-encoding fast path. The
    non-default branch's RAW encoding mode still requires a
    modular sub-bitstream decode (deferred to round 90+ alongside
    the §F.3 wiring).

    Spec citations: ISO/IEC 18181-1:2024 page 58 (Listing for
    Interpolate / Mult / GetDCTQuantWeights), page 59
    (Listing C.11 AFV weights + per-mode prose), page 60
    (Table I.6 default matrix parameters), page 57 (Table I.4
    weights-matrix dimensions). Cross-referenced against ISO/IEC
    FDIS 18181-1:2021 PDF (extractable) Listing C.10 / Table C.18
    / Table C.20 (the 2021 equivalents).

    Fixture count remains 7 pixel-correct lossless small fixtures
    (no change — round 89 is upstream of the pixel-decode flow;
    HfGlobal default-encoding parsing remains unchanged in
    behaviour).

  • Round 77 (2024-spec) — animation-3frame SPECDIFF audit + docs
    citation.
    Two new audit-grade integration tests
    (tests/r77_animation_3frame_specdiff.rs) characterise the
    docs/image/jpegxl/fixtures/animation-3frame/input.jxl fixture
    (cjxl 0.12.0, 78 B, 3 RGB Regular Modular frames of 32×32 with
    have_animation = 1). The probe-level path is correct
    (probe_fdis recovers SizeHeader + ImageMetadata with
    have_animation = true + AnimationHeader populated); the
    decode-level path remains blocked on a real spec-edition split
    between ISO/IEC 18181-1:2021 FDIS Table C.9 (which our
    RestorationFilter::read follows; no leading all_default
    field) and the published 2024 Table J.1 (which prepends an
    all_default Bool() to the bundle plus a u(32) "(ignored)"
    field after epf_channel_scale). Bit-trace bisect (recorded in
    the test file's module docs):

    • The two-bit RF SPECDIFF lifts our FrameHeader bit count from
      39 to 40 for the animation fixture, which lets permuted_toc
      + pu0 correctly land the TOC entry U32 at byte 11 of the
      codestream; that read yields entry value = 16, matching the
      libjxl trace's total_bytes = 16.
    • The seven currently-pixel-correct lossless fixtures were
      encoded by cjxl 0.11.1 against the 2021 FDIS layout and do
      NOT include the leading all_default bit; landing the
      2024-Table-J.1 fix straightforwardly breaks
      alpha-64x64.jxl. The audit recommendation (recorded in the
      test docs) is to re-encode the seven fixtures with cjxl
      0.12.0+ before applying the 2024-spec fix uniformly. This is
      a docs-collaborator follow-up — there is no codestream-level
      edition tag, so a single-pass parser cannot dispatch between
      the two RF layouts without a heuristic.
    • Spec citations: ISO/IEC 18181-1:2024 Table J.1
      (docs/image/jpegxl/ISO_IEC_18181-1-JPEG-XL-Core-2024.pdf
      page 70) and ISO/IEC FDIS 18181-1:2021 Table C.9
      (pdftotext-extractable lines 4088-4101). Trace fixture at
      docs/image/jpegxl/fixtures/animation-3frame/trace.txt.

    Fixture count remains 7 pixel-correct lossless small fixtures
    (no change). Test count grows by 2 (audit harness).

Changed

  • Round 32 (2024-spec) — noise-64x64-lossless pixel-divergence
    bisected to the Self-correcting weighted predictor at the first
    y >= 2, x >= 2 sample whose predictor == 6; root cause
    localised, fix deferred pending a libjxl-trace doc that this
    workspace does not yet ship.
    The fixture count therefore stays
    at 7 pixel-correct lossless fixtures (status quo). No source-file
    semantic changes this round; the diagnostic harness used to
    bisect was removed before commit and the regression set remains
    green.

    Round 31 left the noise fixture as a "decodes without EOF, but
    pixels diverge from expected.png starting at plane[0] sample
    194" follow-up. Round 32 reproduced that divergence and pinned
    it down further:

    • The first divergence is at plane[0] (y=3, x=2) — the FIRST
      sample whose predictor is 6 (Self-correcting) and which has
      the full set of WP neighbours N, W, NW, NE, NN, WW populated
      (i.e. y >= 2 && x >= 2). The prior predictor == 6 samples
      in rows y = 0 and y = 1 all decoded pixel-correct because
      their WP path takes the NN does not exist → NN = N
      fall-back. Two predictor == 6 samples on row y = 2 also
      decoded correctly because WW = W was used (the bug requires
      WW ≠ W, i.e. x >= 2).
    • At sample 194 the WP machinery produces wp_pred8 = 717
      (Listing E.3 weighted sum). The spec rounding
      (wp_pred8 + 3) >> 3 then yields p = 90, giving
      v = diff + p = -55 + 90 = 35, but expected.png says 34. So
      wp_pred8 is 1 too high modulo the rounding (any value in
      [709..716] would give p = 89 and thence v = 34). The MA-tree
      leaf, the decoded token, the diff -55, and wp_max_error all
      match what the neighbour state legitimately implies; the
      discrepancy is purely in the WP weighted sum.

    • Bisected against WP_ROUND_BIAS ∈ {0..=7}, s_init ∈ {(sum_weights >> 1) - 1, (sum_weights >> 1), sum_weights, 0},
      the subpred[3] sign (FDIS N + … vs. round-3 code N - …),
      and the clamp condition (<= 0 vs >= 0). Every alternative
      either re-introduces an EARLIER divergence (samples 68, 79,
      142) on the noise fixture, OR breaks one of the seven
      currently-pixel-correct lossless fixtures. So the bug is NOT
      in any of the dimensions our spec text exposes a knob for.
    • Suspected residual root cause: a subtle interaction between
      the FDIS error2weight formula's outer >> shift step (only
      in the 2024 published edition and the round-3 code; absent
      from FDIS 2021 literal text), the four sub-predictor weights,
      and the final s × ((1 << 24) Idiv sum_weights) >> 24
      division. Most likely the libjxl reference uses an s_init
      formula that depends on the shifted vs unshifted
      sum_weights in a way the FDIS spec text does not disclose.
      Resolving this needs either (a) a behavioural trace of the
      libjxl WP path on the noise fixture at sample 194 captured by
      the docs collaborator, or (b) the docs collaborator's
      promised docs/image/jpegxl/libjxl-trace-reverse-engineering.md
      section on §H.5.2 Sub-predictions (referenced in the
      project_jpegxl_pixel_blocked memory note, but the file does
      not yet exist in docs/image/jpegxl/).
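The arithmetic behind the sample-194 finding is easy to check in isolation. The sketch below uses hypothetical helper names and only models the final rounding and residual add, not the weighted predictor itself.

```rust
/// Round a x8 fixed-point weighted prediction to an integer prediction.
fn round_wp_pred8(wp_pred8: i32) -> i32 {
    (wp_pred8 + 3) >> 3
}

/// Reconstructed sample: decoded residual plus rounded prediction.
fn reconstruct(diff: i32, wp_pred8: i32) -> i32 {
    diff + round_wp_pred8(wp_pred8)
}

/// All wp_pred8 values in `range` that reproduce `expected` for `diff`.
fn candidates(diff: i32, expected: i32, range: std::ops::RangeInclusive<i32>) -> Vec<i32> {
    range.filter(|&w| reconstruct(diff, w) == expected).collect()
}
```

With diff = -55, a wp_pred8 of 717 reconstructs to 35, while scanning 700..=730 for the expected value 34 returns exactly 709 through 716, so the observed value sits one step above the acceptable window.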

    Round-32 scope therefore closes with the bisect finding above
    recorded and the regression set green. No .gitignore / Cargo
    changes; no API surface deltas. The §F.3 zero-pad fix from
    round 31 stays in place and noise-64x64-lossless continues to
    decode-complete (just with non-byte-exact pixels).

    Spec citations: FDIS Annex E.1 (Sub-predictions, Listing E.1),
    E.2 (Prediction weights, Listing E.2), E.3 (Prediction, Listing
    E.3), and Table H.3 row predictor == 6 ((prediction + 3) >> 3).

Added

  • Round 31 (2024-spec) — §F.3 zero-pad uniformly applied to the
    single-TOC-entry LfGlobal fast path; noise-64x64-lossless now
    decodes without EOF
    (parent-dispatch "r16" option A). One
    narrow src/lib.rs::decode_codestream delta:

    • Pre-round-31, when num_groups == 1 && passes == 1 && toc.entries.len() == 1, the decoder routed LfGlobal::read
      through the non-padding main BitReader (pad_eof_with_zeros == false). The other LfGlobal path already used
      BitReader::new_section (which implements FDIS §F.3's
      section-bit-budget + zero-pad rule). For six of the seven small
      lossless fixtures the entire LfGlobal section had enough
      trailing slack that the read never touched the padding region;
      noise-64x64-lossless (cjxl -d 0 -e 7, 64×64 high-entropy RGB
      Modular, MA tree nodes=167 leaves=84) does NOT — its
      per-pixel ANS / hybrid-uint refill loop on the final samples
      reaches a few bits past the byte budget that the spec says must
      read as zero. Pre-round-31 the non-padding reader errored
      instead → InvalidData("unexpected end of JXL bitstream").

    • The fix collapses both LfGlobal-read branches into one path
      that always uses BitReader::new_section against the
      toc-declared section byte range. This makes the single-section
      fast path bit-for-bit equivalent to the multi-section path on
      its real-data prefix, and applies §F.3 zero-pad uniformly.
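The zero-pad rule can be illustrated with a small section-scoped reader. This is a hypothetical stand-in for BitReader::new_section, not its actual implementation, and the LSB-first bit order within each byte is an assumption of the sketch.

```rust
/// Bit reader confined to one section; bits past the section end read as 0.
/// LSB-first bit order within each byte is an assumption of this sketch.
struct SectionReader<'a> {
    section: &'a [u8],
    bit_pos: usize,
}

impl<'a> SectionReader<'a> {
    /// Scope the reader to `len` bytes starting at `start`; `None` if the
    /// declared range does not fit inside the codestream.
    fn new(codestream: &'a [u8], start: usize, len: usize) -> Option<Self> {
        let end = start.checked_add(len)?;
        Some(Self { section: codestream.get(start..end)?, bit_pos: 0 })
    }

    fn read_bit(&mut self) -> u32 {
        let bit = match self.section.get(self.bit_pos / 8) {
            Some(byte) => u32::from((byte >> (self.bit_pos % 8)) & 1),
            None => 0, // past the declared byte budget: zero-pad
        };
        self.bit_pos += 1;
        bit
    }

    fn read_bits(&mut self, n: u32) -> u32 {
        assert!(n <= 32);
        (0..n).fold(0, |acc, i| acc | (self.read_bit() << i))
    }
}
```

Because the reader only ever sees the section's own slice, bytes that follow the section in the codestream can never leak into a late refill; reads past the budget return zeros instead of an end-of-stream error.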

    Spec citation: FDIS §F.3 first paragraph — "When decoding a
    section, no more bits are read from the codestream than 8 times
    the byte size indicated in the TOC; if fewer bits are read, then
    the remaining bits of the section all have the value zero."

    Test added: tests/r31_noise_lossless.rs with two cases —
    noise_64x64_lossless_decodes_without_eof_error (locks the
    shape of the post-fix VideoFrame: 3 RGB planes, stride=64,
    data.len()=4096 each) and pre_round31_seven_lossless_fixtures_still_decode (regression sentinel: the seven pre-round-31
    fixtures all decode successfully under the unified path).
    Committed fixture pair under tests/fixtures/:
    noise_64x64_lossless.jxl (13 505 B) +
    noise_64x64_lossless_expected.png (12 505 B, 8-bit RGB PNG).

    Known limitation NOT fixed this round: while
    noise-64x64-lossless now decode-completes (vs hard-EOF), the
    produced pixels are not yet byte-identical to expected.png.
    The first divergence is plane[0] (R) at (x, y) = (2, 3), i.e. samples
    0..193 of plane 0 match, and from sample 194 on ~98 % of samples
    diverge. The divergence point is deterministic and well within
    the section's real-byte budget, so the §F.3 fix is independent
    of the residual pixel-divergence. Suspected root cause: a
    latent state-evolution bug in either the MA-tree leaf decode
    with num_contexts > 16 (the leaf-stream EntropyStream's
    cluster_map is 84 → 3 clusters here, vs ≤ 6 → ≤ 4 in every
    other lossless fixture), the Self-correcting WP state on
    high-entropy neighbour history, or the hybrid-uint extra-bits
    path for large n_extra values. Deferred to round 32 — needs
    the round-24-style per-cluster trace replayed against the
    cleanroom Python reference at ~30 distinct bit positions across
    the 108 kbit symbol stream.

    Docs gap noted: docs/image/jpegxl-cleanroom/reference-impl/
    (referenced in the round-31 brief as the place to bisect
    against) does not yet exist; the round-30 deferral note pointed
    at it as a future bisect target. The §F.3 fix landed without
    needing it — pure spec-text bisect against FDIS §F.3 was
    sufficient. The reference-impl directory would still be useful
    for the residual pixel-divergence bisect; ask the docs
    collaborator to populate it for round 32.

  • Round 30 (2024-spec) — bit-depth-16 RGB pixel-correct decode +
    16-bit LE plane-pack convention
    (parent-dispatch "r15" option A).
    Lifts the fixture count from 6 to 7 by adding bit-depth-16
    (docs/image/jpegxl/fixtures/bit-depth-16/input.jxl, 421 B,
    64×64 RGB lossless Modular at bits_per_sample = 16) and
    documents the wider-than-8-bit pack convention forced on us by
    oxideav-core 0.1.x's bit-depth-less VideoPlane.

    Two narrow src/lib.rs::decode_codestream deltas:

    1. Bit-depth gate widened. The pre-round-30 hard reject
      metadata.bit_depth.bits_per_sample != 8 now accepts
      bps ∈ 1..=16. The XYB and YCbCr branches (FDIS Annex L.2.2 /
      L.3) still hard-require bps == 8 because their dequantisation
      lattice is calibrated against the 8-bit output range — a
      specific Error::Unsupported("jxl decoder (round 30): XYB high-bit-depth (bps={...}) deferred") now precedes the
      transform call. Float (float_sample == true) and bps > 16
      remain unsupported.

    2. Pass-through plane pack dispatches on bps. The previous
      loop unconditionally clamped each i32 sample to [0, 255]
      and pushed one byte per sample with stride == width. The
      new loop:

      • bps ≤ 8 — unchanged: 1 byte/sample, stride == width,
        sample clamped to [0, 2^bps - 1].
      • 9 ≤ bps ≤ 16 — 2 bytes/sample little-endian,
        stride == width × 2, sample clamped to [0, 2^bps - 1],
        packed via u16::to_le_bytes.

      The LE-pack choice is documented in
      crates/oxideav-jpegxl/README.md under "Plane byte layout"
      (new section) so that downstream consumers (cli-convert /
      etc.) know how to reinterpret a wide plane as &[u16]. PNG
      (RFC 2083 §2.1) stores 16-bit samples big-endian; we deliberately
      pick LE so a bytemuck::cast_slice::<u8, u16> on a
      little-endian host is a zero-cost view (vs forcing a per-sample
      swap).
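
    The pack rule in item 2 can be sketched roughly as below. This is a
    minimal illustration under the stated convention, not the crate's
    actual code: pack_plane and wide_plane_to_u16 are hypothetical names,
    and the plane is modelled as a flat &[i32] slice.

```rust
/// Pack one decoded plane into bytes, choosing the sample width from `bps`.
/// Returns (bytes, stride_in_bytes); `width` is the plane width in samples.
fn pack_plane(samples: &[i32], width: usize, bps: u32) -> (Vec<u8>, usize) {
    assert!((1..=16).contains(&bps), "only 1..=16 bits per sample handled here");
    let max = (1i32 << bps) - 1; // clamp ceiling: 2^bps - 1
    if bps <= 8 {
        // Narrow path: one byte per sample, stride == width.
        let bytes: Vec<u8> = samples.iter().map(|&s| s.clamp(0, max) as u8).collect();
        (bytes, width)
    } else {
        // Wide path: two little-endian bytes per sample, stride == width * 2.
        let mut bytes = Vec::with_capacity(samples.len() * 2);
        for &s in samples {
            bytes.extend_from_slice(&(s.clamp(0, max) as u16).to_le_bytes());
        }
        (bytes, width * 2)
    }
}

/// Portable read-back of a wide plane: decode LE byte pairs into u16 samples.
fn wide_plane_to_u16(bytes: &[u8]) -> Vec<u16> {
    bytes
        .chunks_exact(2)
        .map(|p| u16::from_le_bytes([p[0], p[1]]))
        .collect()
}
```

    The copying read-back works on any host and for any buffer alignment;
    the cast_slice view mentioned above avoids the copy but needs a
    suitably aligned buffer on a little-endian host.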

    Test count: tests/round30_bit_depth_16.rs adds 3 tests
    (bit_depth_16_rgb_pixel_correct_vs_expected_png — full 64×64×3
    16-bit byte-for-byte match against the committed
    bit_depth_16_expected.png;
    bit_depth_16_le_pack_convention_self_consistent — invariant
    check on stride/length/round-trip;
    pre_round30_8bit_fixtures_still_byte_packed — regression
    sentinel for the four pre-existing 8-bit byte-packed fixtures).
    Committed fixture pair under tests/fixtures/:
    bit_depth_16.jxl (421 B) + bit_depth_16_expected.png
    (375 B, 16-bit RGB PNG).

    Cross-checked against djxl v0.11.1 as a black-box oracle (PPM
    output → byteswap BE→LE → byte-equal to our planes). Crate now
    decodes 7 small lossless Modular fixtures pixel-correct vs
    expected.png (was 6): pixel-1x1, gray-64x64,
    gradient-64x64-lossless, palette-32x32, grey_8x8_lossless,
    alpha-64x64, bit-depth-16.
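
    The BE→LE byteswap step of that cross-check might look like the
    sketch below. It assumes the oracle's 16-bit samples arrive as a flat
    big-endian byte buffer with any header already removed; be16_to_le16
    is a hypothetical helper name.

```rust
/// Swap each 16-bit big-endian sample to little-endian byte order.
/// Returns None if the input does not contain a whole number of samples.
fn be16_to_le16(be: &[u8]) -> Option<Vec<u8>> {
    if be.len() % 2 != 0 {
        return None;
    }
    Some(
        be.chunks_exact(2)
            .flat_map(|p| u16::from_be_bytes([p[0], p[1]]).to_le_bytes())
            .collect(),
    )
}
```

    The swapped buffer can then be compared byte-for-byte against the
    LE-packed planes produced by the decoder.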

    Spec citations: FDIS Annex A.6 + Table A.22
    (bit_depth.bits_per_sample bundle), Annex G.1.3 (Modular
    channel-order rule — colour channels share the global
    bits_per_sample, no per-channel bit-depth split for kModular
    RGB), PNG RFC 2083 §2.1 (PNG ships 16-bit big-endian, so our
    reference-PNG read uses u16::from_be_bytes).

    Docs gaps identified probing adjacent fixtures during round 30:
    noise-64x64-lossless (13.5 KB, nodes=167 leaves=84 per
    trace.txt) still fails inside LfGlobal::read with "unexpected
    end of JXL bitstream" — large MA-tree decode path likely
    mis-computes a hybrid-uint extra-bits count for a high-context
    leaf; deferred to round 31. vardct-256x256-d1 / d3 and
    noise-feature-256x256 fixtures all hit independent VarDCT
    pipeline gaps and are unrelated to round 30.

  • Round 29 (2024-spec) — alpha-64x64 RGBA pixel-correct decode +
    ISOBMFF signature-strip fix
    (parent-dispatch "r14" option A).
    Two narrow lib-level fixes in src/lib.rs::decode_one_frame /
    decode_codestream unblock the docs cleanroom alpha-64x64
    4-channel Modular lossless fixture (docs/image/jpegxl/fixtures/alpha-64x64/input.jxl, 86 B) for pixel-exact decode against the
    committed expected.png (8-bit RGBA, 64×64):

    1. ISOBMFF FF 0A strip. The jxlc/jxlp box payload IS a JXL
      codestream and therefore begins with the 2-byte FF 0A
      codestream signature (FDIS Annex B.1). The RawCodestream branch
      already stripped those 2 bytes before handing off to
      decode_codestream; the ISOBMFF branch did NOT. The result was
      a 16-bit misalignment at the SizeHeader::read parse that
      cascaded into apparently-unrelated downstream failures
      (bit-depth-16 tripped "JXL permutation: LZ77-enabled TOC sub-stream not supported" because the TOC permuted flag bit
      parsed as 1 instead of 0). Now the ISOBMFF branch validates the
      FF 0A prefix and strips it symmetric with the raw path. A new
      unit test wraps gradient-64x64-lossless in a minimal ISOBMFF
      (signature + ftyp + jxlc) and asserts plane-by-plane equality
      vs. the raw decode (tests/round29_alpha_rgba_pixel.rs::isobmff_wraps_raw_codestream_decodes_identically).

    2. Extra-channel mapping. The post-Modular channel-count check
      n_chans != expected_chans rejected RGBA Modular frames
      because the Modular decoder lays out colour and extra channels
      in a flat array of length expected_chans + num_extra_channels
      (FDIS Annex G.1.3 colour-then-extras channel-order rule). The
      check now also accepts the with-extras length and emits a
      trailing VideoFrame plane per extra channel. For
      alpha-64x64 this maps directly to 4 RGBA planes; for
      hypothetical multi-extra fixtures (depth, spot colour, …) the
      same path extends N-ways. The XYB-encoded / YCbCr branches are
      unchanged — those still require exactly 3 colour channels and
      fall through if extras are present (round-30+ work).
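
    Both fixes are small enough to sketch. The following is an
    illustration of the two checks as described, not the code in
    src/lib.rs: the function names and the String error type are
    assumptions made for this example.

```rust
/// The 2-byte JXL codestream signature that starts a jxlc/jxlp payload.
const CODESTREAM_SIGNATURE: [u8; 2] = [0xFF, 0x0A];

/// Validate and strip the FF 0A prefix, mirroring what the raw path does.
fn strip_codestream_signature(payload: &[u8]) -> Result<&[u8], String> {
    payload
        .strip_prefix(&CODESTREAM_SIGNATURE[..])
        .ok_or_else(|| "box payload does not start with FF 0A".to_string())
}

/// Accept either the colour-only channel count or colour plus extras.
fn channel_count_ok(n_chans: usize, expected_chans: usize, num_extra_channels: usize) -> bool {
    n_chans == expected_chans || n_chans == expected_chans + num_extra_channels
}
```

    For alpha-64x64 the second check passes with n_chans = 4,
    expected_chans = 3 and num_extra_channels = 1.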

    Test count: tests/round29_alpha_rgba_pixel.rs adds 3 tests
    (alpha_64x64_rgba_pixel_correct_vs_expected_png — full 64×64×4
    byte-for-byte match; five_pre_round29_fixtures_still_pass
    regression sentinel for pixel-1x1 / gray-64x64 / gradient-64x64 /
    palette-32x32 / grey_8x8_lossless; isobmff_wraps_raw_codestream_decodes_identically — synthetic ISOBMFF wrap of
    gradient-64x64). Committed fixture pair under tests/fixtures/:
    alpha_64x64.jxl (86 B) + alpha_64x64_expected.png (283 B).

    Crate now decodes 6 small lossless Modular fixtures pixel-correct
    vs expected.png (was 5): pixel-1x1, gray-64x64,
    gradient-64x64-lossless, palette-32x32, grey_8x8_lossless,
    alpha-64x64.

    Spec citations: FDIS Annex B.1 (codestream signature),
    Annex G.1.3 (channel order), Annex A.6 + A.9 + Table A.22
    (ImageMetadata + ExtraChannelInfo).

    Docs gaps identified probing adjacent fixtures: bit-depth-16
    (421 B) reaches the 8-bit-only post-Modular check (decoder needs
    a 16-bit output-pack path before VideoFrame mapping — deferred);
    noise-64x64-lossless (13.5 KB) fails inside LfGlobal with
    "unexpected end of JXL bitstream" suggesting the high-entropy
    random-RGB MA tree exercises a code path not yet covered
    (deferred).

  • Round 28 (2024-spec) — non-DCT IDCT helpers (parent-dispatch
    "r13" item 3). Extends src/idct.rs with six new public helpers
    (the aux_idct_2x2 building block plus five per-transform IDCTs)
    that complete the IDCT surface for the non-DCT TransformType
    variants:

    • aux_idct_2x2(block, S) — Annex I.9.3 Hadamard-style butterfly on
      the top-left S × S cells of an 8×8 buffer (S ∈ {1, 2, 4, 8}).
    • idct_dct2x2(coefficients) — Annex I.9.3 closing recipe (chained
      aux_idct_2x2 calls at S=2, 4, 8).
    • idct_dct4x4(coefficients) — Annex I.9.4: per-2×2-quadrant 4×4
      IDCT_2D over interleaved coefficient cells with a DC patch from
      aux_idct_2x2(coefficients, 2).
    • idct_hornuss(coefficients) — Annex I.9.5: per-quadrant
      block-LF + residual-sum centre cell + neighbour-fill + corner
      corrective.
    • idct_dct8x4(coefficients) — Annex I.9.6: column-major Hadamard
      pair into two 4×8 (rows × cols) IDCT_2D halves tiled into rows
      [0..4) and [4..8) of the 8×8 output.
    • idct_dct4x8(coefficients) — Annex I.9.7: dual of dct8x4,
      row-major Hadamard pair into two 4×8 halves tiled by row.

    idct_for_transform(t, coefficients) now dispatches Hornuss,
    Dct2x2, Dct4x4, Dct8x4, Dct4x8 to the dedicated helpers in
    addition to the 18 plain-DCT variants from r12. Afv0..Afv3 continue
    to return Err(Unsupported) pending an independently verified
    256-entry AFVBasis table (deferred to a later round to avoid a
    high-risk OCR transcription).

    New helper non_dct_pixel_dims(t) returns Some((8, 8)) for the
    nine non-DCT TransformType variants and None for plain-DCT — the
    output of all five new helpers is always an 8×8 row-major buffer
    (length 64), matching the closing entries of Listings I.9.3..I.9.8.

    Test count: lib idct::tests 36 → 57 (+21 new — 8 covering
    aux_idct_2x2 validation/butterfly/preserve/DC, 6 covering DC-only +
    per-quadrant correctness for the five helpers, 5 covering length
    validation, 2 covering non_dct_pixel_dims); integration tests
    +5 in new tests/round13_non_dct_idct.rs plus 1 updated
    assertion in tests/round12_idct_dispatch.rs (renamed
    idct_for_transform_non_dct_transforms_return_unsupported →
    idct_for_transform_afv_only_unsupported_after_round_13,
    reflecting that only the AFV variants remain unsupported).

    Spec-gap notes inline in the module documentation enumerate the OCR
    transcription work deferred for AFVBasis.

  • Round 27 (2024-spec) — IDCT dispatch (parent-dispatch "r12" item
    5). New src/idct.rs (~470 LOC including tests) wires the
    spec-conformant 1-D inverse DCT (FDIS Annex I.2.1) for power-of-two
    sizes s ∈ {1, 2, 4, 8, 16, 32, 64, 128, 256} and the 2-D inverse
    DCT (Annex I.2.2 Listing I.4) handling rectangular R × C blocks.

    Three public entry points: idct_1d(input) for the bare 1-D form,
    idct_2d(coefficients, output_rows, output_cols) for the 2-D form
    taking coefficients in the spec's (short × long) row-major natural-
    ordering layout (Annex I.2.4) and returning samples in (R × C)
    row-major, and idct_for_transform(t, coefficients) which dispatches
    on a dct_select::TransformType to the appropriate 2-D IDCT for the
    18 plain-DCT transform types in Table C.16 (DCT8x8, DCT16x16,
    DCT32x32, DCT16x8, DCT8x16, DCT32x8, DCT8x32, DCT32x16, DCT16x32,
    DCT64x64, DCT64x32, DCT32x64, DCT128x128, DCT128x64, DCT64x128,
    DCT256x256, DCT256x128, DCT128x256). The 9 non-DCT transforms
    (Hornuss, DCT2x2, DCT4x4, DCT4x8, DCT8x4, AFV0..AFV3) — Listings
    I.7..I.13 — return Err(Unsupported) and are deferred to round 13+.

    Companion helper dct_pixel_dims(t) returns the (rows, cols)
    output shape for plain-DCT TransformType variants and None for the
    non-DCT transforms.

    31 lib unit tests in idct::tests (1-D length validation, DC-only
    consistency for all 9 supported sizes, 1-D round-trip via private
    forward DCT oracle for sizes 8/16/32/64, 1-D AC[1] hand-computed
    spec-formula reference, 2-D length / shape validation, 2-D DC-only
    consistency for 12 DCT block sizes, 2-D round-trip via 2-D forward
    oracle for 8x8/16x8/8x16/16x16/32x32, dispatch validation for
    DCT8x8/16x16/32x32/8x16/16x8 + every non-DCT TransformType returning
    Unsupported, dct_pixel_dims completeness for both branches); 5
    integration tests in tests/round12_idct_dispatch.rs (1-D DC-only
    for all sizes, 2-D DC-only for every plain-DCT block size,
    Unsupported sentinel for every non-DCT transform, 2-D round-trip for
    asymmetric 8x16 and 16x8 via inline forward oracle, five-fixture
    Modular regression sentinel). Total test count 345 → 381 (+36 net).
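
    The "round-trip via forward oracle" and "DC-only consistency" test
    patterns can be illustrated with a textbook orthonormal DCT-II /
    DCT-III pair. This sketch deliberately does NOT use the spec's
    normalisation (the notes below call a scaled-orthonormal IDCT
    non-conformant); it only shows the shape of the oracle test, and all
    function names are hypothetical.

```rust
use std::f64::consts::PI;

/// Per-coefficient scale of the textbook orthonormal DCT pair.
fn scale(k: usize, n: f64) -> f64 {
    if k == 0 { (1.0 / n).sqrt() } else { (2.0 / n).sqrt() }
}

/// Textbook orthonormal DCT-II (forward oracle). Not the spec's scaling.
fn forward_dct(x: &[f64]) -> Vec<f64> {
    let n = x.len() as f64;
    (0..x.len())
        .map(|k| {
            let sum = x
                .iter()
                .enumerate()
                .map(|(i, &v)| v * (PI * (i as f64 + 0.5) * k as f64 / n).cos())
                .sum::<f64>();
            scale(k, n) * sum
        })
        .collect()
}

/// Textbook orthonormal DCT-III, the exact inverse of forward_dct.
fn inverse_dct(c: &[f64]) -> Vec<f64> {
    let n = c.len() as f64;
    (0..c.len())
        .map(|i| {
            c.iter()
                .enumerate()
                .map(|(k, &v)| scale(k, n) * v * (PI * (i as f64 + 0.5) * k as f64 / n).cos())
                .sum::<f64>()
        })
        .collect()
}

/// Largest absolute error after forward followed by inverse.
fn max_round_trip_error(x: &[f64]) -> f64 {
    let back = inverse_dct(&forward_dct(x));
    x.iter().zip(&back).map(|(a, b)| (a - b).abs()).fold(0.0, f64::max)
}
```

    A round-trip test asserts that max_round_trip_error stays below a
    small tolerance; a DC-only test feeds [c, 0, 0, ...] to the inverse
    and asserts that every output sample has the same value.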

    No new fixture coverage — the IDCT lands as a callable primitive that
    round 13's PassGroup HF coefficient decode + F.3 dequantisation will
    feed. The legacy vardct::idct1d_8 and vardct::idct2d_8x8 (round 8
    scaffold, scaled-orthonormal IDCT) are kept untouched for backward
    compatibility but are NOT spec-conformant; new HF-decode wiring will
    call through idct::idct_for_transform exclusively.

  • Round 26 (2024-spec) — Annex L colour transforms (parent-dispatch
    "r11"). New src/xyb.rs (~210 LOC) transcribes FDIS §L.2.2 inverse
    XYB → linear RGB and §L.3 inverse YCbCr → RGB verbatim from the
    ISO/IEC 18181-1:2024 spec text. Three public entry points:
    inverse_xyb_to_rgb(x, y, b, oim, tone_mapping),
    inverse_ycbcr_to_rgb(cb, y, cr), and the convenience composite
    modular_xyb_to_linear_rgb(y_prime, x_prime, b_prime, lf_dequant, oim, tone_mapping) which folds in the §L.2.2 preamble step
    (X = X' * m_x_lf_unscaled, Y = Y' * m_y_lf_unscaled,
    B = (B' + Y') * m_b_lf_unscaled). Helper linear_rgb_to_u8
    clamps + rounds the linear [0, 1] output to 8 bits.
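
    The preamble step quoted above is plain per-sample arithmetic. A
    minimal sketch follows; the LfDequant struct and the xyb_preamble
    name are assumptions made for illustration, and only the three
    formulas come from the notes.

```rust
/// Hypothetical holder for the three LF multipliers named in the notes.
struct LfDequant {
    m_x_lf_unscaled: f32,
    m_y_lf_unscaled: f32,
    m_b_lf_unscaled: f32,
}

/// Apply the preamble: scale X' and Y', and add Y' to B' before scaling.
/// Argument order (Y', X', B') follows the composite entry point above.
fn xyb_preamble(y_prime: f32, x_prime: f32, b_prime: f32, lf: &LfDequant) -> (f32, f32, f32) {
    let x = x_prime * lf.m_x_lf_unscaled;
    let y = y_prime * lf.m_y_lf_unscaled;
    let b = (b_prime + y_prime) * lf.m_b_lf_unscaled;
    (x, y, b)
}
```

    Note that B depends on both B' and Y', so the three channels cannot
    be scaled independently.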

    Wired into decode_codestream modular output stage: when
    metadata.xyb_encoded == true AND colour_encoding.colour_space == Rgb (3 colour channels), the per-channel pass-through is replaced
    with build_rgb_planes_from_xyb which walks every pixel through
    the inverse transform. Symmetric build_rgb_planes_from_ycbcr
    branch handles frame_header.do_ycbcr == true. The original
    pass-through path is preserved for the common case
    (xyb_encoded=false AND do_ycbcr=false) so all five small lossless
    fixtures continue to pixel-correct decode.

    9 unit tests in xyb::tests (DC zero-input, spec-listing
    hand-computed reproduction, intensity_target linear scaling,
    modular preamble multiplier check, YCbCr neutral / red-dominant,
    linear→u8 clamping, X-sign-flip symmetry); 6 integration tests
    in tests/round11_xyb_inverse.rs (forward → inverse round-trip
    for neutral grey AND saturated red using a hand-computed Cramer's-
    rule matrix inversion of oim.inv_mat, YCbCr neutral, u8
    quantisation reference values, end-to-end zero-input modular wrapper,
    and five-fixture pass-through regression sentinel). Total test count
    345 → 362 (+17 net: 9 lib + 6 integration + 2 from earlier round-21
    recount).

    No fixture decoded that didn't decode before — round 11 lays the
    colour-transform foundation, but no modular-XYB or modular-YCbCr
    fixture is currently committed (cjxl encodes photo-content XYB
    inputs as VarDCT by default; the rare modular-XYB path needs a
    hand-built minimal trace, deferred to round 12+ or a docs-
    collaborator commission). The two committed VarDCT fixtures
    (vardct_256x256_d1.jxl, vardct_256x256_d3.jxl) still terminate
    at the round-13 "round 14+: HF subband decode + IDCT not yet wired"
    Unsupported.

    SPECGAP documented in xyb::linear_rgb_to_u8 doc comment: §L.2.2
    outputs linear-domain RGB (NOTE in spec) but the spec doesn't
    prescribe a gamma encoding step before display — strict conformance
    defers gamma application to a downstream colour-management consumer.
    The crate emits linear bytes (clamp + scale by 255 + round);
    callers needing sRGB-encoded bytes should apply the sRGB transfer
    function themselves.
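
    For callers on the consuming side, the two steps might be sketched as
    below. linear_to_u8 follows the clamp + scale + round recipe
    described above; srgb_encode is the standard sRGB transfer function
    and is an addition for illustration, not something the crate
    provides. Both names are hypothetical.

```rust
/// Clamp a linear sample to [0, 1], scale by 255 and round to a byte.
fn linear_to_u8(v: f32) -> u8 {
    (v.clamp(0.0, 1.0) * 255.0).round() as u8
}

/// Standard sRGB encoding of a linear sample in [0, 1] (caller-side step).
fn srgb_encode(linear: f32) -> f32 {
    let v = linear.clamp(0.0, 1.0);
    if v <= 0.003_130_8 {
        12.92 * v
    } else {
        1.055 * v.powf(1.0 / 2.4) - 0.055
    }
}
```

    A consumer wanting display-ready bytes would call
    linear_to_u8(srgb_encode(v)) on the linear float samples rather than
    re-encoding bytes that have already been quantised.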

    Wall respected: spec PDF (Annex L pages 82-84 read directly), no
    external library source consulted, no libjxl-trace-reverse-engineering.md (retired). OpsinInverseMatrix defaults already
    transcribed in metadata_fdis::OpsinInverseMatrix::default()
    (round-2) from FDIS Table L.1 independently; the new module
    consumes those constants without re-reading the table. Test count
    362, fmt + clippy clean against 1.95 toolchain.

  • Round 24 (2024-spec, Auditor mode) — pursued round-23 candidates
    (1) per-cluster ANS distribution byte-trace for clusters 0+1 and
    (2) per-call alias-mapping invariant audit. Result: both paths
    falsified. Cluster 0 (19 nonzero entries) and cluster 1 (23
    nonzero entries) both sum to 4096; the alias table built from each
    D[] routes probability mass to symbols identically to the declared
    D[] (per-symbol routed-mass divergence = 0 for both clusters);
    across the FULL 3072-call ANS trace the spec C.3.2
    (symbol, offset) = AliasMapping(state & 0xFFF) invariant holds
    bit-for-bit when checked against either cluster 0 or cluster 1's
    alias table (0 hard violations; 288 ambiguous calls where both
    clusters yield the same (symbol, offset, prob)). Per-call state
    arithmetic state = prob * (state >> 12) + offset also reproduces
    the trace exactly. Cluster usage breakdown: c0=1755 calls,
    c1=1317 calls, unknown=0 (no cross-talk into HFMetadata clusters
    2/3/4). The d1 ANS final-state delta of 0x21914271 - 0x00130000 ≈ 562M is therefore NOT caused by a per-cluster D[]
    shape mismatch, alias-table self-map / Vose-pump bug,
    alias-mapping lookup bug, per-call state-arithmetic bug, or
    cluster-routing leakage. Round 25 candidates: (1) D[]-vs-cjxl
    reference comparison (a single mismatched count would be the
    smoking gun), (2) leaf-pick + cluster-routing audit at samples
    beyond sample 22 up to sample 79 (where r23's first ctx-flip was
    observed), (3) HFMetadata stream-boundary cross-talk audit. New
    diagnostic tests/round24_d1_disttrace.rs (Auditor mode, never
    asserts) with two tests:
    d1_per_cluster_distribution_byte_trace_round_24 (path 1) and
    d1_per_call_alias_mapping_invariant_round_24 (path 2). Full
    audit notes in crates/oxideav-jpegxl/round24-d1-disttrace.md.
    Test count 343 → 345 (+2).
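
    The two per-call relations checked in this audit are simple enough
    to write down. The sketch below covers only the arithmetic quoted
    above (the 12-bit alias index and the state update); the alias-table
    lookup itself and the bitstream refill are omitted, and the function
    names are illustrative.

```rust
/// Distributions sum to 1 << 12 = 4096, so the low 12 bits index the table.
const ANS_TAB_BITS: u32 = 12;
const ANS_TAB_MASK: u32 = (1 << ANS_TAB_BITS) - 1; // 0xFFF

/// Index fed to AliasMapping for the current state.
fn alias_index(state: u32) -> u32 {
    state & ANS_TAB_MASK
}

/// Per-call state update: state = prob * (state >> 12) + offset.
/// Precondition (assumed here): prob <= 4096 and offset < prob, which keeps
/// the result within u32.
fn next_state(state: u32, prob: u32, offset: u32) -> u32 {
    prob * (state >> ANS_TAB_BITS) + offset
}
```

    An auditor can replay a recorded trace by checking, for every call,
    that the logged (symbol, offset) pair matches the lookup at
    alias_index(state) and that the logged successor state equals
    next_state(state, prob, offset) before any refill.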

  • Round 22 (2024-spec, Auditor mode) — pursued round-21 candidates
    (a) lf_quant first-256-sample dump per channel and (c) WP (p+3)>>3
    rounding bias toggle on the d1 LfCoefficients sub-bitstream. Result:
    WP-rounding-bias bug class falsified. Added a runtime atomic
    WP_ROUND_BIAS (default 3, spec-conformant per ISO/IEC 18181-1:2024
    Table H.3 + FDIS-2021 Listing C.16) so the auditor can sweep biases
    without recompile. Sweeps recorded post-decode ANS final state for
    bias ∈ {0, 3, 4, 7}: 0 → 0x0042cd42 (|Δ|=3 132 738), 3 → 0x21914271
    (|Δ|=561 922 673, spec), 4 → 0x00fd721e (|Δ|=15 364 638), 7 →
    0x001214ac (|Δ|=60 244). All four miss the §D.3.3 sentinel
    0x00130000; the +7 bias being closest proves the variation is
    ANS-chain noise from leaf-flip cascades, not a true rounding bug.
    Per-channel lf_quant dump (Y'/X'/B', 1024 samples each, 32×32) shows
    smooth low-frequency shape with sane stats (Y' mean=468 min=326
    max=644; X' mean=14 min=−125 max=135; B' mean=41 min=−49 max=123),
    consistent with a real-image fixture and proving the per-sample
    decode loop is producing plausible data — not garbage. WP+3 vs +4
    diverges first at Y' sample 22 (row 0, col 22), localising the actual
    bug to a specific MA-tree leaf-flip at that sample. New diagnostic
    tests/round22_d1_sample_dump.rs (Auditor mode, never asserts) dumps
    both the lf_quant table and the bias-sweep final states; full audit
    notes in crates/oxideav-jpegxl/round22-d1-sampledump.md. Test count
    337 → 338 (+1).
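
    The |Δ| figures in the sweep are the absolute distance of each final
    state from the 0x00130000 sentinel. The sketch below recomputes them
    from the four recorded states; the constant and function names are
    illustrative and are not taken from the diagnostic test.

```rust
/// Expected ANS state after the final symbol of a stream.
const ANS_FINAL_SENTINEL: u32 = 0x0013_0000;

/// (bias, recorded final state) pairs from the sweep.
const SWEEP: [(u32, u32); 4] = [
    (0, 0x0042_cd42),
    (3, 0x2191_4271),
    (4, 0x00fd_721e),
    (7, 0x0012_14ac),
];

/// Absolute distance of a final state from the sentinel.
fn sentinel_delta(final_state: u32) -> u32 {
    final_state.abs_diff(ANS_FINAL_SENTINEL)
}

/// Bias whose final state lands closest to the sentinel.
fn closest_bias(sweep: &[(u32, u32)]) -> Option<u32> {
    sweep
        .iter()
        .min_by_key(|&&(_, state)| sentinel_delta(state))
        .map(|&(bias, _)| bias)
}
```

    None of the four deltas is zero, which is the point of the sweep: no
    tested bias reaches the sentinel.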

  • Round 21 (2024-spec, Auditor mode) — pursued round-20 candidates
    (1) per-cluster distribution decode bisect and (2) alias-table
    self-map branch audit on the d1 LfCoefficients sub-bitstream.
    Result: both paths falsified. The 5 per-cluster ANS distributions
    (clusters 0..4) all sum to 4096 with sane shapes (cluster sizes
    19/23/5/2/2 nonzero entries out of 64); cluster 1's full 64-entry
    alias table reconciles with the round-19 bit-faithful trace at calls
    #0 and #1. Critically, none of the five clusters has any D[i] == bucket_size entry, so the alias-table self-map branch (round-3
    fix territory) is not triggered for d1. Documented one strict-spec
    divergence in AliasTable::build (else vs spec's else if (cutoffs[i] < bucket_size)) that has zero observational effect on
    d1 — hand-tracing the equal-bucket path confirms output-equivalent
    behaviour. New diagnostic tests/round21_d1_dist_alias_dump.rs
    (Auditor mode, never asserts) captures per-cluster (cfg, D, alias)
    triples + cluster-1 full alias dump as evidence; full bisect notes
    in crates/oxideav-jpegxl/round21-d1-distbisect.md. Test count
    336 → 337 (+1).
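
    The per-cluster sanity checks described above amount to three small
    predicates over a distribution D[]. This is a sketch with
    hypothetical names; bucket_size is passed in as a parameter rather
    than derived, since the notes do not state how it is computed.

```rust
/// True if the distribution's counts add up to `total` (4096 in the notes).
fn sums_to(d: &[u32], total: u32) -> bool {
    d.iter().map(|&v| u64::from(v)).sum::<u64>() == u64::from(total)
}

/// Number of symbols with a nonzero count.
fn nonzero_entries(d: &[u32]) -> usize {
    d.iter().filter(|&&v| v != 0).count()
}

/// True if any entry equals bucket_size, i.e. the case the notes associate
/// with the alias-table self-map branch.
fn hits_self_map_branch(d: &[u32], bucket_size: u32) -> bool {
    d.iter().any(|&v| v == bucket_size)
}
```

    Running the third predicate over all five clusters is what shows
    whether the self-map branch can be reached for a given fixture.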

  • Round 20 (2024-spec, Auditor mode) — re-interpreted cjxl
    JXL_TRACE output's bits_consumed field as section-local (not
    cumulative file position), invalidating the round-17/18/19 claim of a
    267-bit overshoot in LfCoefficients. Empirical proof: in the same
    trace, AC_GLOBAL_END bits_consumed=307 while DC_GLOBAL_END=1026,
    so 307 < 1026 precludes a cumulative reading. With the corrected
    interpretation DC_GROUP is 12754 bits (not 11728), LfCoefficients
    fits well within the budget, and HfMetadata's slot is 759 bits.
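
    The argument against a cumulative reading is a monotonicity check: a
    running file position can never decrease between two markers logged
    in stream order. A minimal sketch, assuming DC_GLOBAL_END is logged
    before AC_GLOBAL_END in the trace (the function name is
    hypothetical):

```rust
/// True if the readings, taken in log order, could be a cumulative position,
/// i.e. they never decrease.
fn can_be_cumulative(readings_in_log_order: &[u64]) -> bool {
    readings_in_log_order.windows(2).all(|w| w[0] <= w[1])
}
```

    Feeding it [1026, 307] (DC_GLOBAL_END, then AC_GLOBAL_END) returns
    false, which is the empirical proof cited above.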

    Identified a stronger oracle for the actual divergence: per FDIS
    D.3.3, the ANS state must equal 0x00130000 after the final symbol
    in any stream. Wired LATEST_ANS_STATE / LATEST_ANS_CALL_COUNT
    thread-locals (in src/ans/symbol.rs) so a test can read the
    post-decode state without holding the per-stream MaTreeFdis clone.
    On d1's LfCoefficients the final state is 0x21914271 after 3072
    decode_symbol calls — proving a structural decode divergence (wrong
    per-cluster distribution, wrong alias mapping, wrong sample count, or
    wrong read in the per-sample loop). The state never reaches the
    sentinel within 3072 calls, so it's not a sample-count off-by-one.
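
    The thread-local oracle pattern might be sketched as follows. The
    two static names are taken from the notes, but their types, the
    recording hook and the helper functions are assumptions made for
    this example rather than the contents of src/ans/symbol.rs.

```rust
use std::cell::Cell;

thread_local! {
    /// Most recent ANS state seen on this thread (type assumed).
    static LATEST_ANS_STATE: Cell<u32> = Cell::new(0);
    /// Number of decode calls recorded on this thread (type assumed).
    static LATEST_ANS_CALL_COUNT: Cell<u64> = Cell::new(0);
}

/// Expected state after the final symbol of a stream.
const ANS_FINAL_SENTINEL: u32 = 0x0013_0000;

/// Hypothetical hook the decoder would call after every symbol.
fn record_ans_state(state: u32) {
    LATEST_ANS_STATE.with(|s| s.set(state));
    LATEST_ANS_CALL_COUNT.with(|c| c.set(c.get() + 1));
}

/// What a test reads back once the stream has been decoded.
fn ended_on_sentinel() -> bool {
    LATEST_ANS_STATE.with(|s| s.get()) == ANS_FINAL_SENTINEL
}

/// Number of calls recorded so far on this thread.
fn ans_call_count() -> u64 {
    LATEST_ANS_CALL_COUNT.with(|c| c.get())
}
```

    Because the values are thread-local, a test can inspect them after
    decode without holding a reference to the decoder, provided the
    decode ran on the same thread.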

    Lifted the previous 30-call cap on STATE_TRACE_BUF so end-of-stream
    bisects over multi-thousand-sample LF channels are tractable. Five
    new tests in tests/round20_d1_*.rs. See
    crates/oxideav-jpegxl/round20-d1-hfmeta.md for the full audit and
    the round-21 candidate ranking.

  • Round 19 (2024-spec, Auditor mode) — extended the per-token
    trace ring with (ctx, cluster, ans_refill_bits) and added a
    STATE_TRACE_BUF recording the first 30 ANS state transitions for
    spot-checking against raw codestream bits. New
    AnsDecoder::decode_symbol_with_refill reports refill-bit cost. New
    tests/round19_d1_cluster.rs drives d1 LfCoefficients under the
    extended trace and emits per-cluster / per-ctx histograms plus a
    diagnostic eprintln on the leaf-stream EntropyStream::read prelude
    bit count. Findings: prelude is bit-exact (602 bits matching cjxl's
    num_contexts=16 num_histograms=5 log_alpha_size=6), cluster_map is
    bit-exact (16 → 5 distinct clusters), state transitions are
    bit-faithful to raw codestream. The 267-bit overshoot remains
    unexplained; deferred to round 20 with cjxl --debug per-call
    bit-position trace as the proposed next-step. See
    crates/oxideav-jpegxl/round19-d1-cluster.md for the full audit.