Release v0.0.10 · OxideAV/oxideav-jpegxl

Other

round-191 (parent-dispatch r191) against ISO/IEC FDIS 18181-1:2021 — Annex E / §H.5.2 Weighted-Predictor oracle test driven by clean-room behavioural trace at noise-64x64-lossless sample 194
round-190 (parent-dispatch r190) against ISO/IEC FDIS 18181-1:2021 — typed per-pass NonZeros(x, y) grid container above the round-183 per-channel primitive
round-183 (parent-dispatch r183) against ISO/IEC FDIS 18181-1:2021 — typed per-channel NonZeros(x, y) grid container layered above round-177 single-channel primitive
round-177 (parent-dispatch r177) against ISO/IEC FDIS 18181-1:2021 — typed NonZeros(x, y) grid bookkeeping + per-varblock decode driver
round-164 (parent-dispatch r164) against ISO/IEC FDIS 18181-1:2021 — TransformType-driven entry points for the §C.8.3 per-block HF coefficient decode loop
round-159 (parent-dispatch r159) against ISO/IEC FDIS 18181-1:2021 — §C.8.3 per-block HF coefficient decode loop scaffolding (Listings C.13 + C.14)
round-150 (parent-dispatch r150) against ISO/IEC FDIS 18181-1:2021 — Annex I.2.3.8 Listing I.13 Inverse AFV transform wired into idct dispatch
round-147 (parent-dispatch r147) against ISO/IEC FDIS 18181-1:2021 — Annex I.2.2 AFV basis + AFV_IDCT pure-math primitive (Listings I.5 + I.6)
round-144 (parent-dispatch r144) against ISO/IEC FDIS 18181-1:2021 — Annex J.3 edge-preserving-filter pure-math primitive
round-141 (parent-dispatch r141) against ISO/IEC FDIS 18181-1:2021 — Annex J.2 Gabor-like-transform pure-math primitive
round-138 (parent-dispatch r138) against ISO/IEC FDIS 18181-1:2021 — Annex G Chroma-from-Luma pure-math primitive (Listing G.1)
round-133 (parent-dispatch r133) against ISO/IEC FDIS 18181-1:2021 — §C.7.1 DecodePermutation() for used_orders != 0
Round 129: per-varblock LF→LLF composition glue (§I.2.5 plumbing)
Round 126: WP deep-trace plumbing + sample-194 hand-derivation
round-121 (parent-dispatch r121) against ISO/IEC FDIS 18181-1:2021 — §I.2.5 LLF-from-LF pure-math step (Listings I.15 + I.16)
round-95 (parent-dispatch r95) against ISO/IEC FDIS 18181-1:2021 — §F.3 HF dequantisation pure-math step
round-90 (parent-dispatch r90) against ISO/IEC 18181-1:2021 FDIS — HfPass + PassGroup HF structural parsers
round-89 (parent-dispatch r89) against ISO/IEC 18181-1:2024 — GetDCTQuantWeights + Table I.6 default dequantization-matrix materialisation
rewrite lf_dequant comment to remove libjxl numeric-defaults citation
round-77 fixup — inline animation-3frame fixture under crate-local tests/fixtures/
round-77 (parent-dispatch r17) against ISO/IEC 18181-1:2024 — animation-3frame SPECDIFF audit harness
round-32 (parent-dispatch r17) against ISO/IEC 18181-1:2024 — noise-64x64-lossless pixel-divergence bisected to WP at first predictor=6 sample with WW/NN both in-image; fix deferred pending libjxl-WP behavioural trace
round-31 (parent-dispatch r16) against ISO/IEC 18181-1:2024 — §F.3 zero-pad uniformly applied to single-TOC-entry LfGlobal fast path
round-30 (parent-dispatch r15) against ISO/IEC 18181-1:2024 — bit-depth-16 RGB pixel-correct + 16-bit LE plane-pack convention
round-29 (parent-dispatch r14) against ISO/IEC 18181-1:2024 — alpha-64x64 RGBA pixel-correct + ISOBMFF FF 0A strip
round-28 (parent-dispatch r13) against ISO/IEC 18181-1:2024 — non-DCT IDCT helpers (Annex I.9.3..I.9.7)
round-27 (parent-dispatch r12) against ISO/IEC 18181-1:2024 — IDCT dispatch (Annex I.2.1 + I.2.2 Listing I.4)
round-26 (parent-dispatch r11) against ISO/IEC 18181-1:2024 — Annex L colour transforms (XYB inverse + YCbCr inverse)
round-25 (Auditor mode) against ISO/IEC 18181-1:2024 — d1 LfCoefficients per-sample rich-state range dump 22..=79
round-24 (Auditor mode) against ISO/IEC 18181-1:2024 — d1 per-cluster D[] byte trace + per-call alias-mapping invariant audit
round-23 (Auditor mode) against ISO/IEC 18181-1:2024 — d1 leaf-pick property dump at Y' sample 22 + WP y=0 boundary audit
round-22 (Auditor mode) against ISO/IEC 18181-1:2024 — d1 lf_quant sample dump + WP rounding bias toggle
round-21 (Auditor mode) against ISO/IEC 18181-1:2024 — d1 per-cluster distribution + alias-table self-map audit
round-20 followup — refresh round-19 trace eprintln with corrected DC_GROUP budget
round-20 (Auditor pivot) against ISO/IEC 18181-1:2024 — DC_GROUP boundary recount + ANS-final-state oracle
round-19 (Auditor mode) — d1 cluster + ANS state evolution audit
round-18 (Auditor mode) against ISO/IEC 18181-1:2024 — per-token bit accounting trace + drift narrowed

Added

Round 191 (2021-FDIS) — Annex E / §H.5.2 Weighted-Predictor
oracle test driven by clean-room behavioural trace at sample 194 of
noise-64x64-lossless. New tests/r191_wp_trace_oracle.rs (5
tests) and new pub fn modular_fdis::wp_predict_pub test wrapper
around the production wp_predict. The oracle consumes the
docs/image/jpegxl/fixtures/noise-64x64-lossless/wp-trace-sample-194.md
trace (provenance recorded alongside as wp-trace-provenance.md),
which records the FDIS-conformant per-listing intermediates an
instrumented reference decoder produces at the
(channel 0, x=2, y=3) divergence point bisected in rounds 31..126:
- r191_wp_predict_matches_trace_at_sample_194 — drives the
  production wp_predict with the trace's WpState/Neighbours
  inputs; asserts the four sub-predictions [1248, 747, 420, 559],
  the final pre-round prediction 709, and max_error = 737 all
  reproduce exactly. Result: PASS — proves Annex E.2 Listings
  E.1 (sub-predictions), E.2 (err_sum_i + error2weight), E.3
  (weighted sum + same-sign clamp), and E.4 (max_error) are
  spec-correct in wp_predict, isolating the still-unfixed
  sample-194 wp_pred8 = 717 vs trace 709 off-by-8 divergence to
  upstream state evolution (set_true_err / set_sub_err
  calls fired across samples 0..193) rather than the predictor
  arithmetic itself.
- r191_trace_err_sum_self_consistency — pure-arithmetic sanity
  check on the trace's sub_err_{i,N/NE/NW} table summing to the
  reported err_sum_i ([438, 330, 416, 240]).
- r191_trace_weights_match_error2weight — hand-derives the
  trace's weight_i = [495694, 599189, 474830, 825112] from
  FDIS-literal error2weight(err_sum_i, wp_w_i); documents a
  1-unit inner-Idiv-vs-multiplication-first discrepancy with the
  production reading that does NOT affect sample 194's shifted
  weights (both readings give [3, 4, 3, 6] after the Listing E.3
  >> sh step).
- r191_trace_prediction_matches_listing_e3 — independent
  hand-derivation of prediction = 709 from Listing E.3 inputs,
  including verification that the same-sign clamp predicate fires
  but is a no-op (pre-clamp 709 ∈ [min(W,N,NE) = 584,
  max(W,N,NE) = 1232]).
- r191_pin_state_evolution_gap — pins the production-vs-trace
  delta as a roadmap for the next round's bisect: Δ te_w = +21,
  Δ te_nw = -21 (symmetric pair → likely a single upstream
  defect), Δ wp_pred8 = +8 in 8x scale = +1 in un-shifted pixel
  space (matches r126_first_divergence_scan dec=35 vs exp=34).
  Spec citations and provenance attestation embedded in the test
  module docstring; references the in-repo FDIS §E.1-E.4 line
  numbers and the trace doc's stated prediction − true_value
  sign convention. Trace doc is the newly-staged
  docs/image/jpegxl/fixtures/noise-64x64-lossless/wp-trace-*.md
  pair landed alongside this round (tasks #820 + #1077). Issues #6,
  #64, #799.
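The arithmetic pinned by the round-191 oracle can be replayed by hand. The sketch below is one reading of the error2weight and Listing E.3 steps, written from the numbers quoted above rather than copied from the crate. The function names and the max-weight values 13, 12, 12, 12 are assumptions (they are the values that reproduce the quoted weights), and the same-sign clamp is left out.

```rust
/// floor(log2(x)); the caller must pass x > 0.
fn floor_log2(x: u64) -> u32 {
    63 - x.leading_zeros()
}

/// Error sum to weight, with the inner integer division done first.
/// `max_weight` stands in for the wp_w_i value (assumed, see lead-in).
fn error2weight(err_sum: u64, max_weight: u64) -> u64 {
    let shift = floor_log2(err_sum + 1).saturating_sub(5);
    4 + ((max_weight * ((1u64 << 24) / ((err_sum >> shift) + 1))) >> shift)
}

/// Weighted sum of the four sub-predictions (8x scale), before any clamp.
/// Returns the shifted weights and the prediction.
/// Assumes every weight came from `error2weight`, so the sums are non-zero.
fn weighted_prediction(sub_pred: [i64; 4], weight: [u64; 4]) -> ([u64; 4], i64) {
    let sum: u64 = weight.iter().sum();
    let sh = (floor_log2(sum) + 1).saturating_sub(5);
    let shifted = weight.map(|w| w >> sh);
    let sum_shifted = shifted.iter().sum::<u64>() as i64;
    // Rounding bias, then the weighted accumulation.
    let mut s = (sum_shifted >> 1) - 1;
    for i in 0..4 {
        s += sub_pred[i] * (shifted[i] as i64);
    }
    (shifted, (s * ((1i64 << 24) / sum_shifted)) >> 24)
}
```

Feeding the trace's err_sum_i and sub-predictions through these two helpers gives the weights, shifted weights, and prediction quoted in the bullets above, which is a quick way to sanity-check the trace document independently of the production wp_predict.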
Round 190 (2021-FDIS) — typed per-pass NonZeros(x, y) grid
container (FDIS §C.8.3 + Listing C.13 per-pass keying). New
per_pass_non_zeros module that owns one
per_channel_non_zeros::PerChannelNonZerosGrids per pass index
p ∈ [0, num_passes), layered above the round-183 per-channel
container. A VarDCT frame is decoded in num_passes ordered passes
(declared in FrameHeader.passes.num_passes); each pass scans every
PassGroup once and §C.8.3 specifies that within a pass each
channel of each varblock maintains its own NonZeros(x, y) state.
Between passes the per-channel bookkeeping is reset because the
per-pass histogram is selected by hfp from the per-pass HfPass
array — a different pass uses a different histogram and the
prediction recurrence is keyed against the current pass's own
coefficient counts. The new module captures the per-pass routing
layer above round 183's per-channel routing layer:
- PerPassNonZerosGrids::new(pass_dims: &[&[(u32, u32)]]) -> Result<Self>
  — per-pass per-channel (width, height) slice, validated
  entry-by-entry via PerChannelNonZerosGrids::new (zero / oversize
  dims rejected per channel; empty pass-list rejected).
- PerPassNonZerosGrids::new_uniform(num_passes, num_channels, width, height) -> Result<Self> — convenience builder for the
  uniform-per-pass case.
- PerPassNonZerosGrids::{num_passes, pass, pass_mut, predicted, get, set, update_after_block, update_after_block_for_transform} —
  per-pass routing accessors; out-of-range p errors cleanly.
- PerPassNonZerosGrids::decode_block_at_for_pass_channel(p, c, x, y, t, block_ctx, nb_block_ctx, read_non_zeros, decode_symbol) -> Result<(DecodedHfBlock, u32)> — typed per-pass per-channel
  driver that wraps the round-183
  PerChannelNonZerosGrids::decode_block_at_for_channel with pass
  routing. Caller pre-computes block_ctx via
  pass_group_hf::block_context with the matching c; the
  container is a pure storage + routing primitive and does not
  re-derive pass_group_hf::block_context nor materialise the
  per-pass histogram.
- Per-pass per-channel shapes are independent — ragged per-pass
  channel counts are tolerated.
41 new tests (28 unit in per_pass_non_zeros::tests + 13 integration
in tests/round190_per_pass_non_zeros.rs) pin: empty-pass-list /
zero-channel-pass / zero-dim rejection; two-pass chroma-subsampled
construction; new_uniform convenience; out-of-range pass index
errors on every accessor (8 paths); PredictedNonZeros(0, 0) = 32
on every (pass, channel); per-pass write isolation; per-pass
predicted propagation reads back each pass's own history (not
another pass's); per-pass update_after_block_for_transform
dispatch (raw non_zeros = 17 → {17, 5, 2} at DCT8×8 / DCT16×16 /
DCT32×32 on three independent passes); per-pass
decode_block_at_for_pass_channel routing; two-pass three-channel
raster walk at (0, 0) / (1, 0) with distinct [4, 8, 12] /
[3, 6, 9] per-pass per-channel raw_non_zeros sequences preserves
cross-pass isolation; ragged per-pass channel counts (one-channel
DC-only preview followed by three-channel main); u32::MAX
no-panic saturating-add chain through the per-pass route. Lib
tests 608 → 636 (+28).
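The per-pass routing idea can be illustrated with a much smaller container. This is a hypothetical sketch, not the crate's PerPassNonZerosGrids: the type names, the String error type, and the get/set pair are all invented for illustration. It only shows the two properties described above, ragged per-pass channel lists and clean errors on out-of-range indices.

```rust
/// One rectangular grid of counters (stand-in for a per-channel grid).
struct Grid {
    width: u32,
    height: u32,
    cells: Vec<u32>,
}

impl Grid {
    fn new(width: u32, height: u32) -> Result<Self, String> {
        if width == 0 || height == 0 {
            return Err("zero dimension".to_string());
        }
        Ok(Self { width, height, cells: vec![0; width as usize * height as usize] })
    }

    fn index(&self, x: u32, y: u32) -> Result<usize, String> {
        if x >= self.width || y >= self.height {
            return Err("position out of range".to_string());
        }
        Ok(y as usize * self.width as usize + x as usize)
    }
}

/// One list of channel grids per pass; passes may differ in channel count.
struct PerPassGrids {
    passes: Vec<Vec<Grid>>,
}

impl PerPassGrids {
    fn new(pass_dims: &[&[(u32, u32)]]) -> Result<Self, String> {
        if pass_dims.is_empty() {
            return Err("empty pass list".to_string());
        }
        let mut passes = Vec::with_capacity(pass_dims.len());
        for dims in pass_dims {
            if dims.is_empty() {
                return Err("pass with zero channels".to_string());
            }
            let grids: Result<Vec<Grid>, String> =
                dims.iter().map(|&(w, h)| Grid::new(w, h)).collect();
            passes.push(grids?);
        }
        Ok(Self { passes })
    }

    fn grid(&self, p: usize, c: usize) -> Result<&Grid, String> {
        self.passes
            .get(p)
            .ok_or("pass out of range")?
            .get(c)
            .ok_or_else(|| "channel out of range".to_string())
    }

    fn get(&self, p: usize, c: usize, x: u32, y: u32) -> Result<u32, String> {
        let g = self.grid(p, c)?;
        Ok(g.cells[g.index(x, y)?])
    }

    fn set(&mut self, p: usize, c: usize, x: u32, y: u32, v: u32) -> Result<(), String> {
        let i = self.grid(p, c)?.index(x, y)?;
        self.passes[p][c].cells[i] = v;
        Ok(())
    }
}
```

A write into pass 0 never shows up when reading pass 1, which is the cross-pass isolation the round-190 tests pin.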
Round 183 (2021-FDIS) — typed per-channel NonZeros(x, y) grid
container (FDIS §C.8.3 + Listing C.13 channel-keying). New
per_channel_non_zeros module that owns one
non_zeros_grid::NonZerosGrid per channel, layered above the
round-177 single-channel primitive. Listing C.13's
BlockContext() factors c into (c < 2 ? c ^ 1 : 2) × 13 + s,
so the NonZeros(x, y) bookkeeping is keyed per-channel because
chroma subsampling + TransformType heterogeneity means each
channel's varblock-grid shape can differ:
- PerChannelNonZerosGrids::new(dims: &[(u32, u32)]) -> Result<Self>
  — per-channel (width, height) slice, validated entry-by-entry
  via NonZerosGrid::new (zero / > 65535 dims rejected; empty
  slice rejected).
- PerChannelNonZerosGrids::new_uniform(num_channels, width, height) -> Result<Self> — convenience builder for the
  unsubsampled 4:4:4-style container.
- PerChannelNonZerosGrids::{num_channels, grid, grid_mut, predicted, get, set, update_after_block, update_after_block_for_transform} — per-channel routing
  accessors; out-of-range c errors cleanly.
- PerChannelNonZerosGrids::decode_block_at_for_channel(c, x, y, t, block_ctx, nb_block_ctx, read_non_zeros, decode_symbol) -> Result<(DecodedHfBlock, u32)> — typed per-channel driver
  that wraps the round-177 non_zeros_grid::decode_block_at
  with channel routing. Caller pre-computes block_ctx via
  pass_group_hf::block_context with the matching c; the
  container is a pure storage + routing primitive.
- DEFAULT_NUM_CHANNELS = 3 — the YCbCr / XYB canonical channel
  count.
36 new tests (24 unit in per_channel_non_zeros::tests + 12
integration in tests/round183_per_channel_non_zeros.rs) pin:
empty-channel-list rejection; zero-dim / oversize-dim rejection
on any channel; three-channel chroma-subsampled construction at
[(16, 16), (8, 8), (8, 8)]; new_uniform convenience;
out-of-range channel index errors on every accessor (8 paths);
PredictedNonZeros(0, 0) = 32 on every channel; per-channel
write isolation; per-channel predicted horizontal chain on a
seeded channel-1 grid; update_after_block_for_transform
dispatch (raw non_zeros = 17 → {17, 5, 2} at DCT8×8 /
DCT16×16 / DCT32×32 on three independent channels);
decode_block_at_for_channel routes the round-177 typed driver
per channel; post-update cell feeds the next-position predicted
value back per-channel; OOB (x, y) past the per-channel grid
errors cleanly; a two-step three-channel raster walk at
(0, 0) / (1, 0) with distinct [4, 12, 20] /
[6, 18, 30] per-channel raw_non_zeros sequences preserves
cross-channel isolation.

Lib tests 584 → 608 (+24). Pure-control-flow primitive in the
same shape as round-89 dct_quant_weights, round-95
hf_dequant, round-121 llf_from_lf, round-138
chroma_from_luma, round-141 gaborish, round-144 epf,
round-147 afv_idct, round-159 / 164 pass_group_hf, and
round-177 non_zeros_grid — no bit reads, no spec
re-derivation. A future round wiring §C.7.2 entropy histograms
(#799 DOCS-GAP) + the per-LfGroup varblock-shape grid +
per-channel BlockContext() history can drop these helpers in
as the per-channel step without re-deriving any Listing C.13 /
C.14 formulae.
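The channel term quoted above from Listing C.13 is small enough to write out. This sketch covers only that one expression. The function name is hypothetical, `s` is simply the remaining term of the quoted formula, and the rest of BlockContext() is not modelled.

```rust
/// Channel-dependent part of the block context as quoted in the notes:
/// (c < 2 ? c ^ 1 : 2) * 13 + s.
/// Channels 0 and 1 swap slots; any other channel lands in slot 2.
fn block_context_channel_term(c: u32, s: u32) -> u32 {
    let slot = if c < 2 { c ^ 1 } else { 2 };
    slot * 13 + s
}
```

Because the three channels map to three disjoint ranges of 13 values, context state for one channel cannot collide with another, which is why the NonZeros bookkeeping is kept per channel as well.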
Round 177 (2021-FDIS) — typed NonZeros(x, y) grid bookkeeping +
per-varblock decode driver (FDIS §C.8.3 + Listing C.13 prelude +
Listing C.14 post-prose). New non_zeros_grid module bridging
round 159 pass_group_hf::predicted_non_zeros (the four-branch
PredictedNonZeros(x, y) recurrence) with round 164
pass_group_hf::read_non_zeros_and_decode_block_for_transform
(the TransformType-driven per-block coefficient loop):
- NonZerosGrid::new(width, height) -> Result<Self> — rectangular
  varblock-grid storage of NonZeros(x, y) cells. Defensive
  rejection of zero dims + dims > 65535.
- NonZerosGrid::{get, set, width, height, cells} — accessors.
- NonZerosGrid::predicted(x, y) -> Result<u32> — delegates to
  pass_group_hf::predicted_non_zeros against
  |xx, yy| self.get(xx, yy).unwrap_or(0).
- NonZerosGrid::update_after_block(x, y, non_zeros, num_blocks) -> Result<u32> — FDIS post-Listing-C.14 prose formula
  (non_zeros + num_blocks - 1) Idiv num_blocks (ceiling-divide
  identity, saturating_add at u32::MAX).
- NonZerosGrid::update_after_block_for_transform(x, y, non_zeros, t) — num_blocks from pass_group_hf::transform_block_params.
- non_zeros_grid::decode_block_at(grid, x, y, t, block_ctx, nb_block_ctx, read_non_zeros, decode_symbol) -> Result< (DecodedHfBlock, u32)> — typed per-varblock driver: computes
  predicted, invokes
  read_non_zeros_and_decode_block_for_transform, then calls
  update_after_block_for_transform before returning the
  (DecodedHfBlock, raw_non_zeros) pair.
35 new tests (23 unit in non_zeros_grid::tests + 12 integration
in tests/round177_non_zeros_grid.rs) pin: defensive rejection
of zero / oversize (> 65535) dims and out-of-range (x, y);
zero-init cells; PredictedNonZeros(0, 0) = 32 across a sweep
of grid shapes; the y == 0 and x == 0 border-recurrence branches
via horizontal / vertical raster chains; the interior
(above + left + 1) >> 1 average (odd-sum rounding); the
predicted_non_zeros helper agreement byte-for-byte on a seeded
3×3 grid; the post-Listing-C.14 ceiling-divide formula at
num_blocks ∈ {1, 4, 16} (DCT8×8 / DCT16×16 / DCT32×32 — the
TransformType dispatch reduces a raw non_zeros = 17 to
{17, 5, 2} at the three shapes); the typed driver's
predicted = 32 at the origin routes through the predicted >= 8 NonZerosContext branch (ctx = block_ctx + nb_block_ctx × (4 + 32 Idiv 2) = 67 at (block_ctx, nb_block_ctx) = (7, 3));
decode_block_at reads back (0, 0)'s post-update cell when
invoked at (1, 0); OOB positions error cleanly; per-channel
independence (two grids of the same shape evolve
independently); row-major cells() layout pinned at [0, 10, 20, 30] after writing (1,0)=10, (0,1)=20, (1,1)=30 on a
2×2 grid; and pathological u32::MAX does not panic.

Lib tests 561 → 584 (+23). Pure-control-flow primitive in the
same shape as round-89 dct_quant_weights, round-95
hf_dequant, round-121 llf_from_lf, round-138
chroma_from_luma, round-141 gaborish, round-144 epf,
round-147 afv_idct, and round-159 / 164 pass_group_hf — no
bit reads, no spec re-derivation. A future round wiring §C.7.2
entropy histograms (#799 DOCS-GAP) + the per-LfGroup
varblock-shape grid + per-channel BlockContext() history can
drop these helpers in as the per-varblock-position step without
re-deriving any Listing C.13 / C.14 formulae.
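The grid bookkeeping described for round 177 can be sketched in a few lines. This is an illustrative stand-in, not the crate's NonZerosGrid: it returns Option instead of the crate's Result, and the `predicted < 8` branch and the clamp to 64 in `non_zeros_context` are an assumed reading of the spec, since the notes only spell out the `predicted >= 8` branch.

```rust
struct NonZerosGrid {
    width: u32,
    height: u32,
    cells: Vec<u32>,
}

impl NonZerosGrid {
    fn new(width: u32, height: u32) -> Option<Self> {
        if width == 0 || height == 0 || width > 65535 || height > 65535 {
            return None;
        }
        Some(Self { width, height, cells: vec![0; width as usize * height as usize] })
    }

    fn get(&self, x: u32, y: u32) -> Option<u32> {
        if x >= self.width || y >= self.height {
            return None;
        }
        Some(self.cells[y as usize * self.width as usize + x as usize])
    }

    /// Four-branch prediction: origin, left column, top row, interior average.
    fn predicted(&self, x: u32, y: u32) -> Option<u32> {
        if x >= self.width || y >= self.height {
            return None;
        }
        Some(match (x, y) {
            (0, 0) => 32,
            (0, _) => self.get(0, y - 1)?,
            (_, 0) => self.get(x - 1, 0)?,
            _ => {
                let above = self.get(x, y - 1)? as u64;
                let left = self.get(x - 1, y)? as u64;
                // Widened to u64 so extreme cell values cannot overflow.
                ((above + left + 1) >> 1) as u32
            }
        })
    }

    /// Stores ceil(non_zeros / num_blocks) at (x, y) and returns it.
    fn update_after_block(&mut self, x: u32, y: u32, non_zeros: u32, num_blocks: u32) -> Option<u32> {
        if num_blocks == 0 || x >= self.width || y >= self.height {
            return None;
        }
        let v = non_zeros.saturating_add(num_blocks - 1) / num_blocks;
        self.cells[y as usize * self.width as usize + x as usize] = v;
        Some(v)
    }
}

/// Context for the non_zeros read; the small-value branch is assumed.
fn non_zeros_context(predicted: u32, block_ctx: u32, nb_block_ctx: u32) -> u32 {
    let p = predicted.min(64);
    if p < 8 {
        block_ctx + nb_block_ctx * p
    } else {
        block_ctx + nb_block_ctx * (4 + p / 2)
    }
}
```

With a raw count of 17, the stored cell is 17, 5, or 2 for num_blocks of 1, 4, or 16, and the origin prediction of 32 at (block_ctx, nb_block_ctx) = (7, 3) gives context 67, matching the figures above.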
Round 164 (2021-FDIS) — TransformType-driven entry points for
the §C.8.3 per-block HF coefficient decode loop (DCT16×16 /
DCT16×8 / DCT32×32 dimensions pinned end-to-end). New public API
in pass_group_hf:
- transform_block_params(t: TransformType) -> (num_blocks, size)
  — §I.2.4 opening paragraph + Listing C.14: num_blocks = (bwidth / 8) × (bheight / 8), size = bwidth × bheight.
- decode_block_coefficients_for_transform(t, initial_non_zeros, block_ctx, nb_block_ctx, decode_symbol) — typed wrapper that
  derives (num_blocks, size, natural_order) from t (via
  [coeff_order::order_id_for_transform] +
  [coeff_order::natural_coeff_order]) and reduces to the
  round-159 decode_block_coefficients.
- read_non_zeros_and_decode_block_for_transform(t, predicted, block_ctx, nb_block_ctx, read_non_zeros, decode_symbol) —
  analogous typed wrapper around
  read_non_zeros_and_decode_block.
20 new tests (8 unit in pass_group_hf::tests + 12 integration
in tests/round164_dct16x16_block_coefficient_loop.rs) pin the
(num_blocks, size) derivation for every Table C.16 transform
(every entry satisfies num_blocks * 64 == size); the DCT16×16
prev threshold at non_zeros == 17 (= size/16 + 1); the typed
entry point at DCT8×8 reduces to the raw entry point; the typed
entry point at DCT16×16 walks (num_blocks=4, size=256) for
all-zero / single-non-zero / three-consecutive / full-density
(252 reads) cases with coefficients landing at
natural_coeff_order(Id2)[4..]; the typed and raw entry points
agree byte-for-byte on a mixed [2, 0, 4, 0, 0, 6] sequence;
read_non_zeros_and_decode_block_for_transform threads the
NonZerosContext value through the first closure; the rectangular
DCT16×8 / DCT8×16 collapse to the same per-block outcome (they
share OrderId::Id4); defensive rejection of initial_non_zeros > size - num_blocks (= 252 max for DCT16×16); and one DCT32×32
smoke-test at (num_blocks=16, size=1024). Lib tests 553 → 561
(+8). Pure-typed wrapper layer: no new bit reads, no spec
re-derivation. The round-159 module note ("the primitive itself
is shape-agnostic and ready for the larger variable-block sizes
once their parameterisation lands") is now exercised from the
caller-facing API.
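The (num_blocks, size) derivation is plain arithmetic on the block dimensions. The sketch below uses a hypothetical four-variant `Shape` enum in place of the crate's TransformType and covers only the shapes named in this entry.

```rust
/// Hypothetical stand-in for a handful of TransformType variants.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Shape {
    Dct8x8,
    Dct16x8,
    Dct16x16,
    Dct32x32,
}

/// (bwidth, bheight) in samples for each shape.
fn dims(t: Shape) -> (u32, u32) {
    match t {
        Shape::Dct8x8 => (8, 8),
        Shape::Dct16x8 => (16, 8),
        Shape::Dct16x16 => (16, 16),
        Shape::Dct32x32 => (32, 32),
    }
}

/// num_blocks = (bwidth / 8) * (bheight / 8), size = bwidth * bheight.
fn transform_block_params(t: Shape) -> (u32, u32) {
    let (w, h) = dims(t);
    ((w / 8) * (h / 8), w * h)
}

/// Smallest non_zeros for which the first-coefficient prev flag is 1,
/// i.e. the first value satisfying non_zeros > size / 16.
fn prev_threshold(size: u32) -> u32 {
    size / 16 + 1
}
```

For DCT16×16 this yields (4, 256), a prev threshold of 17, and a maximum initial_non_zeros of size - num_blocks = 252, the same figures the round-164 tests pin.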
Round 159 (2021-FDIS) — §C.8.3 per-block HF coefficient decode
loop scaffolding (Listing C.13 + Listing C.14). New public API in
pass_group_hf:
- prev_for_context(k, num_blocks, size, non_zeros, prev_nonzero)
  — Listing C.14 verbatim (k == num_blocks ? (non_zeros > size / 16 ? 1 : 0) : (prev_nonzero(k - 1) ? 1 : 0)).
- DecodedHfBlock { coeffs, remaining_non_zeros, coeffs_read } —
  return bundle for the per-block primitive.
- decode_block_coefficients(natural_order, num_blocks, size, initial_non_zeros, block_ctx, nb_block_ctx, decode_symbol) —
  Listing C.14's per-block raster-order loop with the §C.8.3
  "stop when non_zeros reaches 0" early-exit, UnpackSigned
  application, and natural_order[k] placement. The
  decode_symbol: FnMut(ctx) -> Result<u32> closure abstracts
  over the (still un-landed) §C.7.2 entropy histograms — a real
  consumer wraps EntropyStream + HybridUintState + the
  per-group histogram_offset; tests can hand-roll a symbol
  sequence.
- read_non_zeros_and_decode_block(.., predicted, .., read_non_zeros, decode_symbol) — convenience wrapper that issues the
  D[NonZerosContext(predicted) + offset] read via the first
  closure and drives decode_block_coefficients with the result.
  Returns (DecodedHfBlock, non_zeros) so the caller can update
  its NonZeros-grid bookkeeping per
  `NonZeros(x, y) = (non_zeros + num_blocks - 1) Idiv num_blocks`.
Bounded scope: DCT8×8 alone — num_blocks = 1, size = 64,
OrderId::Id0 natural-coefficient order (the simplest case that
exercises the full state machine). The primitive itself is
shape-agnostic; the larger variable-block sizes (DCT16×16,
DCT32×32, AFV0..3, …) need their num_blocks / size parameters
threaded through the varblock driver above this primitive.

11 new unit tests (pass_group_hf::tests::*) + 11 integration
tests (round159_block_coefficient_loop) cover: all-zero block
(no symbol reads); single non-zero at the first HF slot (one
read, UnpackSigned(1) = -1 at natural_order[1]); three
consecutive non-zeros (loop stops after three reads); full
density (63 reads, LLF cell untouched); the size/16 threshold
for prev (crossover at non_zeros == 5 for DCT8×8); the
"previous coefficient is zero / non-zero" flag tracking through
the loop's history; defensive rejection of malformed
natural-order vectors, zero num_blocks, and over-large
initial_non_zeros; closure-threaded end-to-end smoke through
read_non_zeros_and_decode_block. Lib tests 538 → 553 (+15).

Pure-math / pure-control-flow primitive in the same shape as
round-89 dct_quant_weights, round-95 hf_dequant, round-121
llf_from_lf, round-138 chroma_from_luma, round-141
gaborish, round-144 epf, and round-147 afv_idct — a future
round wiring §C.7.2 histograms into the per-pass entropy stream
can drop this primitive in as the per-block loop body without
re-deriving any C.13 / C.14 formulae. The §C.7.2 entropy
histogram decode (#799 DOCS-GAP), the per-channel (Y / X / B)
non_zeros read in the varblock driver above this primitive,
the per-pass NonZeros-grid update, and the per-varblock
BlockContext() derivation remain follow-up work for subsequent
rounds.
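The control flow of the per-block loop can be sketched without any entropy decoding. Everything below is illustrative: the function names and the closure signature `(k, prev)` are hypothetical, the real coefficient-context computation is not modelled, and the tests would use an identity ordering as a stand-in for the real natural coefficient order.

```rust
/// Even codes map to non-negative values, odd codes to negative ones.
fn unpack_signed(u: u32) -> i32 {
    let half = (u >> 1) as i64;
    if u & 1 == 0 { half as i32 } else { (-(half + 1)) as i32 }
}

/// Walks k = num_blocks..size, stopping once all non-zeros are consumed.
/// Returns the coefficient buffer and the number of symbols read.
fn decode_block<F>(
    natural_order: &[usize],
    num_blocks: usize,
    size: usize,
    initial_non_zeros: usize,
    mut decode_symbol: F,
) -> Result<(Vec<i32>, usize), String>
where
    F: FnMut(usize, u32) -> Result<u32, String>,
{
    if num_blocks == 0 || num_blocks > size || natural_order.len() != size {
        return Err("malformed block parameters".to_string());
    }
    if natural_order.iter().any(|&i| i >= size) {
        return Err("natural order index out of range".to_string());
    }
    if initial_non_zeros > size - num_blocks {
        return Err("initial_non_zeros too large".to_string());
    }
    let mut coeffs = vec![0i32; size];
    let mut non_zeros = initial_non_zeros;
    let mut prev_nonzero = false;
    let mut reads = 0;
    for k in num_blocks..size {
        if non_zeros == 0 {
            break; // early exit: nothing left to place
        }
        // First slot uses the density threshold, later slots the history flag.
        let prev = if k == num_blocks {
            (non_zeros > size / 16) as u32
        } else {
            prev_nonzero as u32
        };
        let u = decode_symbol(k, prev)?;
        reads += 1;
        coeffs[natural_order[k]] = unpack_signed(u);
        prev_nonzero = u != 0;
        if u != 0 {
            non_zeros -= 1;
        }
    }
    Ok((coeffs, reads))
}
```

Passing a closure that replays a fixed symbol list is enough to exercise the early exit, the prev flag, and the placement through the ordering table, which mirrors how the round-159 tests hand-roll their symbol sequences.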
Round 150 (2021-FDIS) — Annex I.2.3.8 / Listing I.13 Inverse AFV
transform composition (idct::idct_afv). Composes the round-147
crate::afv::afv_idct pure-math primitive (Listings I.5 + I.6)
with two idct_2d calls (one at 4×4, one at 4×8) per the
three-sub-block decomposition of Listing I.13 — yielding the
full 8×8 sample buffer for TransformType::Afv0..Afv3. With
this wiring the idct::idct_for_transform dispatcher routes
Afv0..Afv3 to idct_afv instead of returning
Err(Unsupported); all 10 non-DCT branches of Table I.4 are now
pure-math-complete (Hornuss / DCT2×2 / DCT4×4 / DCT8×4 / DCT4×8 +
AFV0..AFV3). Each AFV variant's sub-block placement is
controlled by flip_x = n & 1 / flip_y = n >> 1 (§I.2.3.8);
the AFV sub-block additionally mirrors its read coordinates
(flip_x == 1 ? 3 - ix : ix and the iy dual) per the inner
loop of Listing I.13. Seven new property-style tests cover:
rejection of non-AFV transforms / wrong lengths; all-zero
input → all-zero output for all four variants; DC-only input
→ constant c(0,0) output (the three DC patches (c00+c01+c10) × 4, c00-c01+c10, c00-c01 collapse to 4·1, 1, 1
respectively, with each sub-block's IDCT mapping a DC-only
cell to a constant sub-block since AFVBasis row 0 = [0.25; 16] and IDCT_2D DC-only is constant); dense-AC input →
every cell written; AFV0↔AFV1 x-axis flip swaps the AFV
sub-block column reads; AFV0↔AFV2 y-axis flip swaps the 4×8
sub-block y-band placement; linearity. Test-count delta: +7
(531 → 538).
FDIS typo documented in module docs. Listing I.13's final
source line reads samples_4×4(ix, iy) but the inner loop
iterates ix ∈ [0..8) and samples_4×4 only has columns
0..3, while the immediately preceding line computes
samples_4×8 = IDCT_2D(coeffs_4×8). Implementation reads from
samples_4×8 per context; the typo is now annotated alongside
the existing four Annex D / D.3 typos in the project
FDIS-typo memory.
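The flip selection for the four AFV variants is a two-bit decode. The helper names below are hypothetical; only the two expressions quoted in the round-150 entry are taken from the notes.

```rust
/// (flip_x, flip_y) for AFVn, n in 0..4.
fn afv_flips(n: u32) -> (u32, u32) {
    (n & 1, n >> 1)
}

/// Read coordinate inside the 4x4 AFV sub-block, mirrored when flipped.
fn mirrored_coord(flip: u32, i: u32) -> u32 {
    assert!(i < 4, "coordinate must lie inside the 4x4 sub-block");
    if flip == 1 { 3 - i } else { i }
}
```

AFV0 and AFV1 differ only in flip_x and AFV0 and AFV2 only in flip_y, which is what the x-axis and y-axis flip tests above rely on.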
Round 147 (2021-FDIS) — Annex I.2.2 AFV basis + AFV_IDCT
pure-math primitive (Listings I.5 + I.6, p. 76). New
src/afv.rs module transcribes the orthonormal AFVBasis[16][16]
table from Listing I.5 verbatim and the Listing I.6 cell-sum
samples[i] = sum_j coefficients[j] × AFVBasis[j][i]. Public
API:
- AFV_CELL_LEN: usize = 16 — the §I.2.2 4×4-as-flat-16 cell.
- AFV_BASIS: [[f32; 16]; 16] — verbatim Listing I.5.
- afv_idct(coefficients: &[f32]) -> Result<[f32; 16]> —
  Listing I.6.
The 256-float transcription is independently verified at the
table level: row-0 = [0.25; 16] (Listing I.5 line 1); row-4 =
two non-zero entries at columns 1 and 4, both at ±1/sqrt(2),
zero elsewhere (Listing I.5 line 5); per-row L2 unit-norm
(orthonormality diagonal); pairwise zero inner product
(orthonormality off-diagonal); afv_idct is linear; one-hot
coefficient input recovers AFVBasis[j] row-for-row;
||samples||_2 == ||coefficients||_2 (L2 conservation, an
orthonormal-basis property). A single transcription typo in any
of the 256 entries would fail at least one orthonormality sum.

10 new unit tests + 9 integration tests
(round147_afv_idct); lib tests 521 → 531. Pure-math primitive
in the same shape as round-89 dct_quant_weights, round-95
hf_dequant, round-121 llf_from_lf, round-138
chroma_from_luma, round-141 gaborish, and round-144 epf —
a future round wiring §I.2.3.8 Inverse AFV transform (Listing
I.13) into idct_for_transform can drop this helper in without
re-deriving any I.5 / I.6 cells. The Listing I.13 composition
(the coeffs_afv corner-load, the two IDCT_2D 4×4 / 4×8
sub-blocks, the flip_x / flip_y AFVn flip) remains
follow-up work because it depends on idct_2d for non-square
blocks plus the AFVn dispatch wiring; the §I.2.2 arithmetic
core landed in this round unblocks that follow-up.
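The Listing I.6 cell-sum and the orthonormality checks can be demonstrated with any orthonormal 16×16 table. The sketch below deliberately does not reproduce AFVBasis: it uses a scaled Walsh-Hadamard matrix as a stand-in (it shares only the property that row 0 is [0.25; 16]), and the function names are hypothetical.

```rust
const CELL_LEN: usize = 16;

/// Stand-in orthonormal basis: 16-point Walsh-Hadamard scaled by 1/4.
/// This is NOT the AFVBasis table from Listing I.5.
fn stand_in_basis() -> [[f32; CELL_LEN]; CELL_LEN] {
    let mut basis = [[0.0f32; CELL_LEN]; CELL_LEN];
    for j in 0..CELL_LEN {
        for i in 0..CELL_LEN {
            let sign = if (j & i).count_ones() % 2 == 0 { 1.0 } else { -1.0 };
            basis[j][i] = 0.25 * sign;
        }
    }
    basis
}

/// samples[i] = sum over j of coefficients[j] * basis[j][i].
fn cell_idct(
    coefficients: &[f32],
    basis: &[[f32; CELL_LEN]; CELL_LEN],
) -> Result<[f32; CELL_LEN], String> {
    if coefficients.len() != CELL_LEN {
        return Err("expected exactly 16 coefficients".to_string());
    }
    let mut samples = [0.0f32; CELL_LEN];
    for i in 0..CELL_LEN {
        for j in 0..CELL_LEN {
            samples[i] += coefficients[j] * basis[j][i];
        }
    }
    Ok(samples)
}
```

One-hot inputs recover individual basis rows and the L2 norm is preserved, so the same property tests listed above apply unchanged once the stand-in is swapped for the real table.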
Round 144 (2021-FDIS) — Annex J.3 "Edge-preserving filter"
pure-math primitive (pages 85–87). New src/epf.rs module
transcribes the four §J.3 listings as a self-contained pure-math
primitive: given a triple of three-channel f32 planes (the output
of round-141 Gaborish on the §I.2.5 + Annex G chain), per-call
scalar parameters (sigma, step_multiplier, zeroflush,
position_multiplier_border, channel_scale), and a
[frame_header::RestorationFilter] (Table C.9) for
epf_quant_mul / epf_sharp_lut[..] / epf_sigma_for_modular,
this module returns the per-pass output planes Listing J.4
prescribes. Public API:
- distance_step_0_and_1(x, y, b, w, h, x, y, cx, cy, scale) —
  Listing J.1 DistanceStep0and1 (the five-pixel cross-shape
  three-channel scaled L1 distance for passes 0 and 1).
- distance_step_2(...) — Listing J.1 DistanceStep2 (the
  single-sample three-channel scaled L1 distance for pass 2,
  under the literal (ix, iy) == (0, 0) reading of the free-
  variable bug — see DOCS-GAP).
- weight(distance, inv_sigma, position_multiplier, zeroflush)
  — Listing J.2 Weight() decreasing-function-of-distance
  kernel with the v <= zeroflush cutoff.
- inv_sigma_for_pass(step_multiplier, sigma) — Listing J.2's
  pre-computed step_multiplier × 4 × (sqrt(0.5) - 1) / sigma
  factor (rejects non-finite or non-positive sigma).
- vardct_sigma_from_listing_j3(quantization_width, sharpness, &rf) — Listing J.3's per-varblock sigma derivation with the
  max(1e-4, ..) clamp; the modular-mode branch uses
  rf.epf_sigma_for_modular directly.
- is_border_position(x, y) — Listing J.2's "either coordinate
  of the reference sample is 0 or 7 IMod 8" predicate driving
  the per-pixel epf_border_sad_mul selection.
- apply_step_5tap(Pass::Pass1 | Pass::Pass2, ..) — Listing
  J.4's 5-tap cross-shape kernel pass (passes 1 and 2); the
  distance metric is selected by the Pass discriminant.
- apply_step_13tap(..) — Listing J.4's 13-tap diamond kernel
  pass 0 (always using DistanceStep0and1).
- Pass — enum picking Pass0 / Pass1 / Pass2 for the dispatch.
§6.5 Mirror1D boundary handling is reused verbatim from
round-141 gaborish::mirror1d. 36 new unit tests + 12 new
integration tests (round144_epf) pin self-distance-is-zero on
constant planes for both metrics, per-channel-scale linearity,
offset symmetry for DistanceStep0and1, DistanceStep2 hand-
derived spatially-varying-plane case
(x:1×40 + y:2×5 + b:0×3.5 = 50), Weight() zero-distance
returns 1.0 / zeroflush cutoff / position-multiplier scaling,
Listing J.3 sigma at default rf sharpness 0 → 1e-4 clamp and
sharpness 7 → full quant, the is_border_position 8×8 grid
layout, constant-plane invariance across all three passes, and
the zero-channel-scale collapse to the uniform mean on a centre
impulse. Lib tests 485 → 521. Pure-math primitive in the same
shape as round-89 dct_quant_weights, round-95 hf_dequant,
round-121 llf_from_lf, round-138 chroma_from_luma, and
round-141 gaborish — a future round wiring §J.3 into the
per-frame restoration-filter pipeline can drop these helpers in
without re-deriving any of the J.1/J.2/J.3/J.4 listings. The
per-frame loop (calling each pass for each varblock under the
right epf_iters / per-block sigma / position-multiplier
conditions with output of pass i feeding pass i+1), the
sigma < 0.3 skip-the-block path, and the epf_iters > 0 gate
remain caller responsibilities (deferred to follow-up rounds).
DOCS-GAP observed in FDIS Listing J.1 DistanceStep2 (free
ix/iy variables — adopted (ix, iy) == (0, 0)) and Listing
J.2 step_multiplier array (missing comma between
epf_pass0_sigma_scale and 1); both surfaced in the
module-level rustdoc with the adopted reading and rationale, and
the public API sidesteps the indexing ambiguity by accepting
step_multiplier: f32 directly so the wiring round can pick the
resolution without an API churn.
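Two of the scalar helpers are fully determined by the descriptions in the round-144 entry and can be sketched directly. The signatures below are guesses at a reasonable shape (Option instead of the crate's error type), and Weight() itself is left out because its exact form is not spelled out in these notes.

```rust
/// step_multiplier * 4 * (sqrt(0.5) - 1) / sigma.
/// Returns None for a non-finite or non-positive sigma.
fn inv_sigma_for_pass(step_multiplier: f32, sigma: f32) -> Option<f32> {
    if !sigma.is_finite() || sigma <= 0.0 {
        return None;
    }
    Some(step_multiplier * 4.0 * (0.5f32.sqrt() - 1.0) / sigma)
}

/// True when either coordinate is 0 or 7 modulo 8,
/// i.e. the sample sits on the rim of its 8x8 block.
fn is_border_position(x: u32, y: u32) -> bool {
    let on_rim = |v: u32| v % 8 == 0 || v % 8 == 7;
    on_rim(x) || on_rim(y)
}
```

The factor is negative for positive inputs, consistent with a kernel that decreases as distance grows, and the border predicate marks 28 of the 64 positions in each 8×8 block.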
Round 141 (2021-FDIS / 2024-spec) — Annex J.2 "Gabor-like
transform" pure-math primitive (page 85). New src/gaborish.rs
module transcribes FDIS §J.2 verbatim: given a per-channel plane
of f32 samples (the output of §I.2.5 LLF/HF reconstruction + the
round-138 Annex G chroma-from-luma chain) and the per-channel
gab_C_weight1 / gab_C_weight2 weights carried by
[frame_header::RestorationFilter] (Table C.9), the module applies
the spec's symmetric 3×3 convolution (centre = 1, edges = w1, corners = w2), rescaled uniformly so the nine kernel entries
sum to 1, with §6.5 Mirror1D boundary handling on
out-of-image references. Public API: mirror1d(coord, size)
(Listing 6.1 iterative form), sample_mirror(plane, w, h, x, y)
(direct §6.5 fetch), gab_kernel(w1, w2) -> [f32; 9]
(materialised normalized kernel in row-major order), apply_channel
(out-of-place per-channel convolution with an interior fast path +
edge-mirror fallback), apply_channel_in_place (single-buffer
scratch convenience), and apply_xyb_planes_in_place(x, y, b, w, h, &rf) (the three-channel XYB-pipeline convenience using
rf.gab_x_weight* / gab_y_weight* / gab_b_weight*). 23 new
unit tests + 10 new integration tests (round141_gaborish) pin
Mirror1D's identity / first-reflection / single-row collapse
cases, the default-weight kernel sum-to-one and centre-tap
(≈ 0.586) values, the four-edge / four-corner kernel symmetry,
identity-kernel pass-through, constant-plane invariance, the
per-channel impulse response on a 3×3 plane, linearity of the
convolution operator, single-row mirror-collapse, and the
per-channel dispatch through apply_xyb_planes_in_place. Lib
tests 462 → 485. This is a pure-math primitive in the same shape
as round-89 dct_quant_weights, round-95 hf_dequant, round-121
llf_from_lf, and round-138 chroma_from_luma: it lands the
bit-exact arithmetic so a future round wiring §J.2 into the
per-frame restoration-filter pipeline can drop it in without
re-deriving the kernel or the mirror semantics. Does NOT
implement §J.3 (edge-preserving filter) and does NOT honour the
rf.gab skip; both are the caller's responsibility.
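The kernel construction and the mirror rule are compact enough to sketch. The code below is an illustration under assumptions: the mirror rule is one reading of the iterative Mirror1D form, the signatures are simplified, and the default weights used in the usage note (0.115169525 and 0.061248592) are assumed values chosen because they reproduce the ≈ 0.586 centre tap quoted above.

```rust
/// Reflects an out-of-range coordinate back into [0, size).
fn mirror1d(mut coord: i64, size: i64) -> i64 {
    assert!(size > 0, "size must be positive");
    loop {
        if coord < 0 {
            coord = -coord - 1;
        } else if coord >= size {
            coord = 2 * size - 1 - coord;
        } else {
            return coord;
        }
    }
}

/// 3x3 kernel in row-major order: centre 1, edges w1, corners w2,
/// divided by the total so the nine entries sum to 1.
/// Assumes the total 1 + 4*w1 + 4*w2 is non-zero.
fn gab_kernel(w1: f32, w2: f32) -> [f32; 9] {
    let total = 1.0 + 4.0 * w1 + 4.0 * w2;
    [w2, w1, w2, w1, 1.0, w1, w2, w1, w2].map(|v| v / total)
}
```

With the assumed default weights the centre tap comes out near 0.586 and the entries sum to 1, and with both weights set to zero the kernel degenerates to the identity, which matches the pass-through case in the test list.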
Round 138 (2021-FDIS / 2024-spec) — Annex G "Chroma from luma"
pure-math primitive (Listing G.1). New src/chroma_from_luma.rs
module transcribes FDIS Annex G (page 73) verbatim: given the
per-frame [lf_global::LfChannelCorrelation] bundle (§C.4.4) and,
for HF coefficients, the per-64×64-tile factor samples from
[lf_group::HfMetadata]'s x_from_y / b_from_y channels
(§C.5.4), the module computes the CfL multipliers (kX, kB) and
applies the Listing G.1 reconstruction X = dX + kX × Y,
B = dB + kB × Y, Y = dY per sample. Public API:
kx_kb_raw(base_x, base_b, colour_factor, x_factor, b_factor)
(Listing G.1 lines 1-2), kx_kb_lf(cfl) (LF derivation
x_factor = x_factor_lf - 127, b_factor = b_factor_lf - 127),
kx_kb_hf(cfl, x_factor_hf, b_factor_hf) (HF derivation from the
64×64-tile factor sample), apply_sample / apply_lf_sample /
apply_hf_sample for the per-sample reconstruction, and the
plane-level apply_lf_plane_inplace(dx, dy, db, cfl) (constant
per-frame (kX, kB)) + apply_hf_plane_inplace(dx, dy, db, w, h, x_from_y, b_from_y, cfl)
(per-tile lookup at tile_x = x / 64, tile_y = y / 64, with a per-tile (kX, kB) cache). 20 new unit tests + 11
new integration tests (round138_chroma_from_luma) pin the
default-bundle multipliers (kX = 1/84, kB = 1 + 1/84), the
Y-identity line, the round-trip against the encoder-side
decorrelation dX = X - kX × Y, multi-tile HF plane lookup
(128×64 → 2 tiles wide, 65×65 → 4 tiles via div_ceil), and the
defensive colour_factor == 0 rejection on both LF and HF paths.
Lib tests 442 → 462. This is a pure-math primitive in the same
shape as round-89 dct_quant_weights, round-95 hf_dequant, and
round-121 llf_from_lf: it lands the bit-exact arithmetic so a
future round wiring §F.3 + Annex G into the per-LfGroup VarDCT
pipeline can drop it in without re-deriving any G.1 formulae.
Does not handle subsampled chroma (Annex G excludes that case
outright) and does not drive the per-LfGroup loop (deferred).
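The multiplier derivation and per-sample reconstruction from round 138 can be illustrated with a short sketch. The signatures below are simplified stand-ins (plain scalars instead of the LfChannelCorrelation bundle), and the default values quoted afterwards are assumptions chosen because they reproduce the kX = 1/84, kB = 1 + 1/84 multipliers mentioned above.

```rust
/// CfL multipliers from the base correlations and signed factors.
/// Returns None for colour_factor == 0 (the defensive rejection path).
fn kx_kb_raw(
    base_x: f32,
    base_b: f32,
    colour_factor: u32,
    x_factor: i32,
    b_factor: i32,
) -> Option<(f32, f32)> {
    if colour_factor == 0 {
        return None;
    }
    let cf = colour_factor as f32;
    Some((base_x + x_factor as f32 / cf, base_b + b_factor as f32 / cf))
}

/// Per-sample reconstruction: X = dX + kX * Y, B = dB + kB * Y, Y = dY.
fn apply_sample(dx: f32, dy: f32, db: f32, kx: f32, kb: f32) -> (f32, f32, f32) {
    (dx + kx * dy, dy, db + kb * dy)
}
```

With an assumed default bundle of base_x = 0, base_b = 1, colour_factor = 84 and x_factor_lf = b_factor_lf = 128, the LF derivation passes x_factor = b_factor = 128 - 127 = 1, giving kX = 1/84 and kB = 1 + 1/84.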
Round 133 (2021-FDIS / 2024-spec) — §C.7.1 DecodePermutation()
for used_orders != 0. HfPass::read now handles the
non-natural coefficient-order path of Listing C.12: the shared
"8 clustered distributions D" are read once into a
modular_fdis::EntropyStream (num_dist = 8) with its ANS state
initialised, then each set used_orders bit runs the §C.3.2
Lehmer-code permutation against that same stream. New public
coeff_order::decode_permutation_from_stream(br, entropy, hybrid, size, skip) factors the §C.3.2 procedure generically (the same
algorithm the TOC permuted_toc path uses); §C.7.1 supplies
size = coefficient_count(order) and skip = size / 64, yielding
order[i] = natural_coeff_order[nat_ord_perm[i]]. HfPass::read
no longer returns Error::Unsupported for used_orders != 0.
Adds get_context + lehmer_to_permutation unit coverage and
rewrites the two former hf_pass Unsupported tests to assert the
stream-read path is now taken.
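The Lehmer-code step and the final order[i] = natural_coeff_order[nat_ord_perm[i]] mapping can be sketched in isolation. This is a simplified illustration: it assumes the Lehmer digits have already been decoded from the entropy stream, and it omits the skip prefix; the function names mirror the ones mentioned above but the bodies are not the crate's code.

```rust
/// Turn a Lehmer code into a permutation: digit i selects (and removes)
/// the digit-th element of the values not used so far.
/// Returns None if a digit is out of range for the remaining pool.
fn lehmer_to_permutation(lehmer: &[usize]) -> Option<Vec<usize>> {
    let mut remaining: Vec<usize> = (0..lehmer.len()).collect();
    let mut perm = Vec::with_capacity(lehmer.len());
    for &digit in lehmer {
        if digit >= remaining.len() {
            return None;
        }
        perm.push(remaining.remove(digit));
    }
    Some(perm)
}

/// Apply the permutation to a natural order: order[i] = natural[perm[i]].
fn apply_order(natural: &[u32], perm: &[usize]) -> Vec<u32> {
    perm.iter().map(|&p| natural[p]).collect()
}
```

An all-zero Lehmer code yields the identity permutation, which is why used_orders == 0 can skip this path entirely and use the natural orders directly.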
Round 129 (2021-FDIS / 2024-spec) — per-varblock LF→LLF
composition glue (§I.2.5 plumbing). Three new public functions
in vardct that compose the round-121
[llf_from_lf::llf_from_lf] pure-math step with a single
channel's dequantised LF samples for a single varblock placement:
- vardct::extract_lf_subblock(lf_samples, lf_width, lf_height, bx, by, t) — extracts the cy × cx LF sub-block at varblock
  origin (bx, by) in row-major order, per FDIS §I.2.5 prose
  "the corresponding X/8 × Y/8 samples from the dequantized LF
  image". Returns Err(InvalidData) on dim-mismatch, origin
  overflow, or varblock extending past the LF grid (defensive
  bounds-checking before the indexing).
- vardct::compose_lf_to_llf_block(lf_samples, lf_width, lf_height, bx, by, t) — extract_lf_subblock + llf_from_lf
  in one call, returning the cy × cx LLF coefficient block of
  the top-left of an HF varblock.
- vardct::compose_lf_to_llf_block_3ch(&LfDequantOutput, bx, by, t) — convenience wrapper that invokes the per-channel helper
  once for each of the three colour channels (X, Y, B) when no
  channel is subsampled (the common case, where §F.2 adaptive LF
  smoothing has been applied); rejects mismatched per-channel dims with a
  clear InvalidData message pointing the caller at the
  per-channel compose_lf_to_llf_block for the subsampled case.
24 new tests (15 unit in src/vardct.rs + 9 integration in
tests/round129_compose_lf_to_llf.rs). Covers DCT8×8 / DCT16×16
/ DCT32×32 squares, all six DCT16×8-class rectangles (DCT16×8,
DCT8×16, DCT32×8, DCT8×32, DCT32×16, DCT16×32), the nine non-DCT
pass-through transforms (Hornuss / DCT2×2 / DCT4×4 / DCT4×8 /
DCT8×4 / AFV0..AFV3), every kind of out-of-bounds varblock
placement (x-only, y-only, both, and DCT32×32 at the only
fitting origin), LfDequantOutput subsampling rejection, and
byte-exact agreement with the hand-derivable dc * ScaleF(cy, bheight, 0) * ScaleF(cx, bwidth, 0) formula for every
rectangular transform on a constant input.
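The bounds-checked sub-block extraction can be sketched as below. This is an assumption-laden simplification: it takes the block size (cx, cy) directly instead of a TransformType, treats bx / by as LF-sample coordinates, and uses a string error where the crate returns Err(InvalidData).

```rust
/// Copy the cy x cx sub-block whose top-left corner is (bx, by) out of a
/// row-major LF plane, checking every bound before indexing.
fn extract_lf_subblock(
    lf: &[f32],
    lf_w: usize,
    lf_h: usize,
    bx: usize,
    by: usize,
    cx: usize,
    cy: usize,
) -> Result<Vec<f32>, &'static str> {
    let total = lf_w.checked_mul(lf_h).ok_or("LF dims overflow")?;
    if lf.len() != total {
        return Err("LF sample count does not match dims");
    }
    let x_end = bx.checked_add(cx).ok_or("origin overflow")?;
    let y_end = by.checked_add(cy).ok_or("origin overflow")?;
    if x_end > lf_w || y_end > lf_h {
        return Err("varblock extends past the LF grid");
    }
    let mut out = Vec::with_capacity(cx * cy);
    for row in by..y_end {
        out.extend_from_slice(&lf[row * lf_w + bx..row * lf_w + x_end]);
    }
    Ok(out)
}
```

Doing all the checks up front keeps the copy loop free of per-sample branches and makes each rejection reason distinguishable to the caller.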

This is the geometry glue between rounds 12/13 (per-LfGroup
LF dequant + smoothing) and rounds 91+/95 (HF coefficient ANS
decode + HF dequantisation). A future round wiring the §F.x
pipeline into decode_codestream can drop these helpers in as
the per-varblock loop body without re-deriving any LF→LLF
geometry or §I.2.5 prose mechanics. Total lib tests: 422 → 437
(+15); total integration test files: 41 → 42 (+1).

Round 129 also intentionally does not chase the
noise-64x64-lossless sample-194 wp_pred8 = 717 vs spec
divergence: the trace doc retired 2026-05-06 still has no
replacement in docs/image/jpegxl/ per the project_jpegxl_pixel_blocked memory note (DOCS-GAP unchanged across r126 and
r129). The deep-trace plumbing from r126 remains the stable
baseline for the future Specifier round.
Round 126 (2021-FDIS) — Self-correcting WP deep-trace plumbing +
sample-194 hand-derivation against Listings E.1/E.2/E.3. New
WP_DEEP_TRACE + WP_DEEP_TRACE_ARMED thread-locals in
modular_fdis capture the 20-entry intermediate snapshot
(subpred[0..4], err_sum[0..4], post-shift weight_shifted[0..4],
sum_weights_pre, log_weight, sh, sum_weights_post, nn8,
ww8, pred_pre_clamp, clamped_flag) for the trace-target
sample. The existing LEAF_PICK_TRACE_WP only exposes
(te_w, te_n, te_nw, te_ne, w8, n8, nw8, ne8, wp_pred8, max_error);
round 126 fills in the missing nn8/ww8 + Listing
E.1/E.2/E.3 internals so a by-hand FDIS re-derivation against
pinned ground-truth is possible.
New test tests/r126_wp_intermediates_at_194.rs (~150 lines,
2 tests + a docstring with the full hand-derivation). Pins:
wp_pred8 = 717 at the noise-64x64-lossless sample 194
(y=3, x=2, channel 0); the 20-entry deep trace; the 3-plane
first-divergence scan vs expected.png. The hand-derivation
in the module docstring proves that NEITHER the subpred[3]
sign knob NOR the s_init - 1 knob (the two FDIS-vs-current
deviations round 32 swept independently) can produce a
prediction in [709..716] from the captured neighbour state.
The fix must come from somewhere else — most likely a
state-evolution bug in sub_err or a WpHeader parameter
mismatch. Round 126 also tried the FDIS-literal sub_err
formula (abs(((p_i + 3) >> 3) - true_value) per FDIS line
6832 vs the legacy (abs(p_i - tv*8) + 3) >> 3); the noise
fixture's wp_pred8 at sample 194 was unchanged, but the
synth_320 drift-bisect fixture regressed (first drift moved
from y=24,x=14 to y=11,x=104), so the change is reverted in
this round and parked for the docs-collaborator behavioural
trace promised in project_jpegxl_pixel_blocked.
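The two sub_err variants compared in round 126 differ only in where the rounding shift is applied. The sketch below writes both out side by side; the function names are hypothetical and the sample inputs in the note are illustrative, not values captured from either fixture.

```rust
/// FDIS-literal form: round the x8-scaled sub-prediction first, then take
/// the absolute difference against the true value.
fn sub_err_fdis_literal(p_i: i32, true_value: i32) -> i32 {
    (((p_i + 3) >> 3) - true_value).abs()
}

/// Legacy form: take the absolute difference in the x8 domain, then round.
fn sub_err_legacy(p_i: i32, true_value: i32) -> i32 {
    ((p_i - true_value * 8).abs() + 3) >> 3
}
```

The two agree for many inputs but not all: for p_i = 717 and true_value = 89 both give 1, while for p_i = 708 and true_value = 89 the FDIS-literal form gives 1 and the legacy form gives 0. Such off-by-one differences feed the err_sum state, which is consistent with the change moving the first drift point on one fixture while leaving another sample unchanged.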

Net deliverable: deeper diagnostic plumbing + a stable pinned
baseline for the next round to compare hypotheses against.
Seven small lossless fixtures + synth_320 baselines untouched;
the noise fixture's plane[0] first-mismatch boundary remains
at linear index 194 (dec=35 vs exp=34).
Round 121 (2021-FDIS / 2024-spec) — §I.2.5 LLF-from-LF
pure-math step (Listings I.15 + I.16). New src/llf_from_lf.rs
(~500 LOC + 28 unit tests + 16 integration tests in
tests/round121_llf_from_lf.rs) lands the bridge from §F.2's
dequantised+smoothed LF samples into the top-left LLF coefficient
block of each HF varblock — the step the trailing prose of
§F.2 hands off to §I.2.7 (renumbered §I.2.5 in the 2021 FDIS).

Public API: scale_i8(n, u), scale_d8(n, u), scale_i(n, u),
scale_d(n, u), scale_c(n_big, n_small, x),
scale_f(n_big, n_small, x) (FDIS Listing I.15 closed-form
helpers); dct_1d(input) -> Result<Vec<f32>> (FDIS §I.2.1
forward 1-D DCT, sizes 1..=32); dct_2d(samples, rows, cols) -> Result<Vec<f32>> (§I.2.2 Listing I.3 forward 2-D DCT, algorithmic
inverse of [idct::idct_2d]); llf_dims(t) -> (u32, u32)
(LF-block dims per TransformType); llf_from_lf(input, t) -> Result<Vec<f32>> (Listing I.16 verbatim, including the non-DCT
pass-through cases for Hornuss / DCT2×2 / DCT4×4 / DCT4×8 /
DCT8×4 / AFV0..3).

44 new tests pin: (a) the Listing I.15 closed forms — I8(8, 0)
= sqrt(0.5)/2, D8 = 1/(N·I8), the N=8 branch of I/D, C(N, N, x)
= 1, C reciprocal-on-swap, ScaleF(1, 8, 0) = 1.0 (DCT8×8 corner
identity), (b) the §I.2.1 1-D forward DCT formula via the
unit-impulse closed form and the constant-signal DC-only result,
(c) byte-exact LLF blocks for DCT8×8 (single-cell identity),
DCT16×16 with both constant-block and impulse-block inputs
(out[y·2+x] = 0.25 · SF(2,16,y) · SF(2,16,x)),
DCT16×8 / DCT8×16 rectangular paths, DCT32×32 dimension
contract, and the non-DCT pass-through across all nine
single-8×8-block transforms.

dct_2d ↔ idct::idct_2d round-trip verified at 4×4 to f32
epsilon, confirming the forward DCT is the precise algorithmic
inverse of the round-12 IDCT.
Round 95 (2021-FDIS / 2024-spec) — §F.3 HF dequantisation
pure-math step. New src/hf_dequant.rs (~310 LOC + 13 unit
tests) implements the FDIS p. 72 Annex F.3 HF coefficient
dequantisation formula verbatim: Listing F.2 bias-adjust
(*= quant_bias[c] for |q| <= 1, -= quant_bias_numerator / quant otherwise), per-block HfMul multiplier, per-channel
0.8^(x_qm_scale - 2) / 0.8^(b_qm_scale - 2) factor (Y
channel exempt), and the §C.6.2 per-(channel, transform_type, coeff_index) dequant-matrix entry from the
round-89 dct_quant_weights::DequantMatrixSet.

Public API: bias_adjust(quant: i32, channel: usize, oim: &OpsinInverseMatrix) -> f32, QmScaleFactors::for_frame(&FrameHeader),
QmScaleFactors::for_channel(channel) -> f32,
dequant_hf_coefficient(quant, channel, hf_mul, dequant_matrix_entry, oim, qm) -> f32,
dequant_hf_pre_matrix(...) (partial product helper).

10 new integration tests
(tests/round35_hf_dequant.rs) pin Listing F.2 branch
boundaries (zero, ±1, |q|>1 subtractive bias sign-preservation),
the FDIS default quant_bias_numerator = 0.145 fixed-point
quant=2 → 1.9275, the 0.8^(u(3) - 2) exponent sweep, and
the cross-module composition against
materialise_default_dequant_set() for X / Y channels at the
DCT8×8 corner cell. Y channel verified to skip the qm-scale
factor; X channel under default x_qm_scale = 3 verified to
pick up a 0.8 factor.
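The bias-adjust branches and the qm-scale exponent described above can be checked with a small sketch. The signatures are simplified assumptions: the per-channel quant_bias and the numerator are passed as plain floats rather than through OpsinInverseMatrix, and the 0.9 bias used in the note is a placeholder, not a spec default.

```rust
/// Bias adjust: multiply by the per-channel bias for |q| <= 1, otherwise
/// subtract quant_bias_numerator / q (which preserves the sign of q).
fn bias_adjust(quant: i32, quant_bias_c: f32, quant_bias_numerator: f32) -> f32 {
    let q = quant as f32;
    if quant.abs() <= 1 {
        q * quant_bias_c
    } else {
        q - quant_bias_numerator / q
    }
}

/// Per-channel scale factor 0.8^(qm_scale - 2); the Y channel skips it.
fn qm_scale_factor(qm_scale: u32) -> f32 {
    0.8f32.powi(qm_scale as i32 - 2)
}
```

For quant = 2 and a numerator of 0.145 this gives 2 - 0.145 / 2 = 1.9275, and for quant = -2 it gives -1.9275, matching the sign-preservation case the integration tests pin. qm_scale_factor(3) is 0.8 and qm_scale_factor(2) is exactly 1.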

Made FrameHeader::default_with pub(crate) (was private) so
the new hf_dequant unit tests can construct a default
FrameHeader without going through bit-stream parsing.

Round 95 lands the bit-exact F.3 arithmetic so the future
round that wires the per-block ANS coefficient decode (the
round-90 followup blocked on the shared 8-cluster ANS stream +
§C.7.2 histograms) can drop the integer ANS reader on top
without re-deriving any formulae. CfL (Annex G) and IDCT
(Annex I.2) still chain afterwards.
Round 90 (2021-FDIS / 2024-spec) — HfPass + PassGroup HF
structural parsers. Three new modules surface the §C.7.1 /
§C.7.2 HfPass bundle and the §C.8.3 PassGroup HF entry-points,
preparing the HF coefficient decode pipeline for the per-block
ANS-stream wiring scheduled for round 91+.

New src/coeff_order.rs (~430 LOC + 12 tests): §I.2.4 natural
coefficient ordering for every OrderId 0..=12 (Table I.1).
Builds the LLF prefix sorted by y × bwidth + x followed by
the HF tail sorted by (key1, key2) per Listing I.14. Public
API: OrderId, varblock_size_for_order, natural_coeff_order,
coefficient_count, order_id_for_transform,
COEFFICIENTS_PER_ORDER.

New src/hf_pass.rs (~290 LOC + 7 tests): §C.7.1 Listing C.12
parser. Reads used_orders = U32(Val(0x5F), Val(0x13), Val(0), Bits(13)). The used_orders == 0 fast path materialises all 13
natural orders directly per the listing's else branch.
used_orders != 0 returns Error::Unsupported — the permutation
reads need the shared 8-cluster ANS stream that §C.7.2 histograms
also feed; wiring that shared stream is round-91 work. Exposes
num_histogram_distributions = 495 × num_hf_presets × nb_block_ctx so the next round knows the §C.7.2 read count
up-front. Also exposes read_hf_pass_sequence for the per-pass
loop.

New src/pass_group_hf.rs (~460 LOC + 18 tests): §C.8.3 first
line + Listing C.13. Reads hfp = u(ceil(log2(num_hf_presets))),
validates hfp < num_hf_presets, computes
histogram_offset = 495 × nb_block_ctx × hfp. Verbatim
transcriptions of block_context, non_zeros_context,
coefficient_context, predicted_non_zeros, plus the two
64-element CoeffFreqContext / CoeffNumNonzeroContext ladder
tables as pub const arrays. The actual per-block ANS
coefficient decode loop defers to a later round (it requires the
shared per-pass ANS stream from §C.7.2).
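The hfp bit width and histogram offset arithmetic from the pass_group_hf description is small enough to sketch directly. The helper names are hypothetical, and the nb_block_ctx value in the note is only an illustrative input.

```rust
/// Number of bits used to read hfp: ceil(log2(num_hf_presets)).
fn hfp_bits(num_hf_presets: u32) -> u32 {
    if num_hf_presets <= 1 {
        0
    } else {
        32 - (num_hf_presets - 1).leading_zeros()
    }
}

/// histogram_offset = 495 * nb_block_ctx * hfp, rejecting hfp values that
/// are not below num_hf_presets and guarding against u32 overflow.
fn histogram_offset(nb_block_ctx: u32, hfp: u32, num_hf_presets: u32) -> Option<u32> {
    if hfp >= num_hf_presets {
        return None;
    }
    495u32.checked_mul(nb_block_ctx)?.checked_mul(hfp)
}
```

The range check matters because the bit width rounds up: with 3 presets hfp is read from 2 bits, so the value 3 is representable but invalid. For example, nb_block_ctx = 15 and hfp = 2 of 3 presets gives an offset of 14850.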

New integration suite tests/round34_hf_pass_pass_group_hf.rs
(12 tests) exercises the typed surface end-to-end at the
structural level — HfPass used_orders == 0 parse + all 13
natural orders, §C.8.3 hfp range checks, BlockContext default-
map paths, NonZerosContext continuity at the
predicted == 8 boundary, CoefficientContext with the listed
ladder constants, PredictedNonZeros four-arm dispatch table.

Test delta: +49 tests (332 → 381 lib tests; new integration
suite contributes 12 more). No fixture-level pixel decode
changes; the seven small lossless fixtures continue to decode
pixel-correct, and the two committed VarDCT fixtures still hit
their existing round-13 deferral gate (next round's HF dequant +
per-block decode flips that gate).
Spec gap: none new. Listing C.12 / Listing C.13 / Listing I.14
/ Table I.1 are unambiguous on the round-90 contract scope.

Followups (round 91+): (a) shared per-pass 8-cluster ANS stream
init, (b) used_orders != 0 DecodePermutation reads, (c)
§C.7.2 histogram read (495 × num_hf_presets × nb_block_ctx
clustered distributions), (d) per-block coefficient decode loop
per the C.8.3 prose right after Listing C.13, (e) §F.3 HF
dequantisation gluing the round-89 dequant matrices to the
newly decoded coefficients.
Round 89 (2024-spec) — GetDCTQuantWeights + Table I.6 default
dequantization-matrix materialisation (parent-dispatch r89). New
src/dct_quant_weights.rs (~1k LOC + tests) transcribes the
ISO/IEC 18181-1:2024 §I.2.4 / §I.2.5 + Table I.4 + Table I.6
listing block from page 58-60 of the published core PDF:
- mult(v) — spec Mult piecewise function
  (1+v if v > 0 else 1/(1-v)).
- interpolate(pos, max, bands) — spec Interpolate with the
  2024 corrected A * pow(B/A, frac_index) form. Includes
  defensive clamping when pos == max (would otherwise index
  past bands.size() - 1).
- compute_dct_weights(params, x_dim, y_dim) — spec
  GetDCTQuantWeights per the post-typo-fix 2024 listing
  (bands loop closes BEFORE the weights matrix double-loop,
  correcting the FDIS 2021 PDF's nested-loop bug).
- materialise_weights_for_dct_select(bundle, channel, X, Y) —
  per-mode (DCT, DCT4, DCT2, Hornuss, DCT4x8, AFV)
  weights-matrix dispatch per §I.2.4 page 58 prose +
  Listing C.11 for AFV.
- materialise_dequant_for_channel(bundle, channel, X, Y) —
  element-wise reciprocal of the weights matrix per
  §I.2.4 last paragraph. Validates the
  "no non-positive or infinity" spec invariant.
- materialise_default_dequant_set() — the full 17-slot ×
  3-channel default set per Table I.6 (page 60),
  transcribed verbatim including the SeqA / SeqB /
  SeqC abbreviated sequences from the spec footnote and
  the dct4x4_params constant for slots 3 (DCT4×4) and 10
  (AFV).
- weights_matrix_dims_for_slot(slot) — Table I.4 page 57
  dimensions lookup (0..=16).
- slot_for_transform(t) — TransformType (Table C.16
  0..=26) → Table I.4 slot (0..=16) mapping; multiple
  transforms share a slot (e.g. DCT16×8 and DCT8×16 both
  map to slot 6).
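The two scalar helpers at the top of that list can be sketched as follows. mult follows the piecewise form quoted above; interpolate is a reading of the A * pow(B/A, frac_index) form in which the scaled-position step (pos * (len - 1) / max) is an assumption, so treat it as illustrative rather than a transcription of the listing.

```rust
/// Mult: 1 + v for positive v, otherwise 1 / (1 - v).
fn mult(v: f32) -> f32 {
    if v > 0.0 {
        1.0 + v
    } else {
        1.0 / (1.0 - v)
    }
}

/// Geometric interpolation between adjacent bands.
/// The index is clamped so pos == max does not read past the last band.
fn interpolate(pos: f32, max: f32, bands: &[f32]) -> f32 {
    assert!(!bands.is_empty() && max > 0.0);
    if bands.len() == 1 {
        return bands[0];
    }
    let scaled_pos = pos * (bands.len() - 1) as f32 / max;
    let index = (scaled_pos.floor() as usize).min(bands.len() - 2);
    let frac = scaled_pos - index as f32;
    let (a, b) = (bands[index], bands[index + 1]);
    a * (b / a).powf(frac)
}
```

With the clamp, pos == max lands on index len - 2 with frac = 1, so the result is exactly the last band value instead of an out-of-range read.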
Test count: 26 new tests (15 unit tests in
src/dct_quant_weights.rs + 11 integration tests in
tests/round33_dct_quant_weights.rs). Every cell of every
channel of every default slot is verified positive-finite per
the §I.2.4 invariant. Spot-checks include:
- DCT8×8 slot 0 channel 0 (0,0) cell = 1 / 3150.0 (reciprocal of
  Table I.6 row-0 head).
- Hornuss slot 1 (0,0) cell = 1.0 (spec sets weights(0,0) = 1).
- AFV slot 10 8×8 fully populated (Listing C.11 covers all 64
  cells across the freqs interpolation + weights4x8 + weights4x4
  fills).
Spec-listing typo notes (recorded in module doc-comment):
- FDIS 2021 PDF Listing C.10 has the for (y, x) { ... }
  weights double-loop INSIDE the for (i = 1; i < len; i++)
  bands loop — would compute the matrix len - 1 times. The
  2024 published edition (docs/image/jpegxl/ISO_IEC_18181-1-JPEG-XL-Core-2024.pdf page 58) corrects this.
  Module follows the 2024 form.
- 2024 Interpolate drops len (uses bands.size()) and
  writes A * pow(B / A, frac_index) instead of FDIS 2021's
  A * (B / A)^frac_index. Mathematically identical.
SPECGAP recorded: DCT2 cell (0, 0) is not assigned by the spec
listing block (page 58). Implementation fills it with
params(c, 0) (same value used for i == 0 neighbours) so the
dequant reciprocal is finite. The 6-rectangle assignments cover
62 of 64 cells, plus (1, 1); (0, 0) is the only unmentioned
position. Recommend a spec clarification.

Unblocks: downstream HF coefficient dequantisation per §F.3 on
the HfGlobal u(1) == 1 default-encoding fast path. The
non-default branch's RAW encoding mode still requires a
modular sub-bitstream decode (deferred to round 90+ alongside
the §F.3 wiring).

Spec citations: ISO/IEC 18181-1:2024 page 58 (Listing for
Interpolate / Mult / GetDCTQuantWeights), page 59
(Listing C.11 AFV weights + per-mode prose), page 60
(Table I.6 default matrix parameters), page 57 (Table I.4
weights-matrix dimensions). Cross-referenced against ISO/IEC
FDIS 18181-1:2021 PDF (extractable) Listing C.10 / Table C.18
/ Table C.20 (the 2021 equivalents).

Fixture count remains 7 pixel-correct lossless small fixtures
(no change — round 89 is upstream of the pixel-decode flow;
HfGlobal default-encoding parsing remains unchanged in
behaviour).
Round 77 (2024-spec) — animation-3frame SPECDIFF audit + docs
citation. Two new audit-grade integration tests
(tests/r77_animation_3frame_specdiff.rs) characterise the
docs/image/jpegxl/fixtures/animation-3frame/input.jxl fixture
(cjxl 0.12.0, 78 B, 3 RGB Regular Modular frames of 32×32 with
have_animation = 1). The probe-level path is correct
(probe_fdis recovers SizeHeader + ImageMetadata with
have_animation = true + AnimationHeader populated); the
decode-level path remains blocked on a real spec-edition split
between ISO/IEC 18181-1:2021 FDIS Table C.9 (which our
RestorationFilter::read follows; no leading all_default
field) and the published 2024 Table J.1 (which prepends an
all_default Bool() to the bundle plus a u(32) "(ignored)"
field after epf_channel_scale). Bit-trace bisect (recorded in
the test file's module docs):
- The two-bit RF SPECDIFF lifts our FrameHeader bit count from
  39 to 40 for the animation fixture, which lets permuted_toc +
  pu0 correctly land the TOC entry U32 at byte 11 of the
  codestream; that read yields entry value = 16, matching the
  libjxl trace's total_bytes = 16.
- The seven currently-pixel-correct lossless fixtures were
  encoded by cjxl 0.11.1 against the 2021 FDIS layout and do
  NOT include the leading all_default bit; landing the
  2024-Table-J.1 fix straightforwardly breaks
  alpha-64x64.jxl. The audit recommendation (recorded in the
  test docs) is to re-encode the seven fixtures with cjxl
  0.12.0+ before applying the 2024-spec fix uniformly. This is
  a docs-collaborator follow-up — there is no codestream-level
  edition tag, so a single-pass parser cannot dispatch between
  the two RF layouts without a heuristic.
- Spec citations: ISO/IEC 18181-1:2024 Table J.1
  (docs/image/jpegxl/ISO_IEC_18181-1-JPEG-XL-Core-2024.pdf
  page 70) and ISO/IEC FDIS 18181-1:2021 Table C.9
  (pdftotext-extractable lines 4088-4101). Trace fixture at
  docs/image/jpegxl/fixtures/animation-3frame/trace.txt.
Fixture count remains 7 pixel-correct lossless small fixtures
(no change). Test count grows by 2 (audit harness).

Changed

Round 32 (2024-spec) — noise-64x64-lossless pixel-divergence
bisected to the Self-correcting weighted predictor at the first
y >= 2, x >= 2 sample whose predictor == 6; root cause
localised, fix deferred pending a libjxl-trace doc that this
workspace does not yet ship. The fixture count therefore stays
at 7 pixel-correct lossless fixtures (status quo). No source-file
semantic changes this round; the diagnostic harness used to
bisect was removed before commit and the regression set remains
green.

Round 31 left the noise fixture as a "decodes without EOF, but
pixels diverge from expected.png starting at plane[0] sample
194" follow-up. Round 32 reproduced that divergence and pinned
it down further:
- The first divergence is at plane[0] (y=3, x=2) — the FIRST
  sample whose predictor is 6 (Self-correcting) and which has
  the full set of WP neighbours N, W, NW, NE, NN, WW populated
  (i.e. y >= 2 && x >= 2). The prior predictor == 6 samples
  in rows y = 0 and y = 1 all decoded pixel-correct because
  their WP path takes the NN does not exist → NN = N
  fall-back. Two predictor == 6 samples on row y = 2 also
  decoded correctly because WW = W was used (the bug requires
  WW ≠ W, i.e. x >= 2).
- At sample 194 the WP machinery produces wp_pred8 = 717
  (Listing E.3 weighted sum). The spec rounding (wp_pred8 + 3) >> 3
  then yields p = 90, giving v = diff + p = -55 + 90 = 35, but
  expected.png says 34. So wp_pred8 is 1 too high modulo the
  rounding (any value in [709..716] would give p = 89 and thence
  v = 34). The MA-tree leaf, the decoded token, the diff -55, and
  wp_max_error all match what the
  neighbour state legitimately implies — the discrepancy is
  purely in the WP weighted sum.
- Bisected against WP_ROUND_BIAS ∈ {0..=7}, s_init ∈ {(sum_weights >> 1) - 1, (sum_weights >> 1), sum_weights, 0},
  the subpred[3] sign (FDIS N + … vs. round-3 code N - …),
  and the clamp condition (<= 0 vs >= 0). Every alternative
  either re-introduces an EARLIER divergence (samples 68, 79,
  142) on the noise fixture, OR breaks one of the seven
  currently-pixel-correct lossless fixtures. So the bug is NOT
  in any of the dimensions our spec text exposes a knob for.
- Suspected residual root cause: a subtle interaction between
  the FDIS error2weight formula's outer >> shift step (only
  in the 2024 published edition and the round-3 code; absent
  from FDIS 2021 literal text), the four sub-predictor weights,
  and the final s × ((1 << 24) Idiv sum_weights) >> 24
  division. Most likely the libjxl reference uses an s_init
  formula that depends on the shifted vs unshifted
  sum_weights in a way the FDIS spec text does not disclose.
  Resolving this needs either (a) a behavioural trace of the
  libjxl WP path on the noise fixture at sample 194 captured by
  the docs collaborator, or (b) the docs collaborator's
  promised docs/image/jpegxl/libjxl-trace-reverse-engineering.md
  section on §H.5.2 Sub-predictions (referenced in the
  project_jpegxl_pixel_blocked memory note, but the file does
  not yet exist in docs/image/jpegxl/).
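The rounding arithmetic at sample 194 can be verified with a few lines. The helper names are hypothetical; the numbers (717, -55, 35, 34) are the ones recorded in the bisect above.

```rust
/// Table H.3 rounding for predictor 6: (prediction + 3) >> 3.
fn wp_round(wp_pred8: i32) -> i32 {
    (wp_pred8 + 3) >> 3
}

/// Inclusive range of wp_pred8 values that round to `p`.
fn wp_pred8_range_for(p: i32) -> (i32, i32) {
    (8 * p - 3, 8 * p + 4)
}
```

wp_round(717) is 90, so v = -55 + 90 = 35, while the expected v = 34 needs p = 89, whose preimage is 709..=716. The captured value 717 sits exactly one step above that range, which is why the divergence shows up as a single-unit pixel error.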
Round-32 scope therefore closes with the bisect finding above
recorded and the regression set green. No .gitignore / Cargo
changes; no API surface deltas. The §F.3 zero-pad fix from
round 31 stays in place and noise-64x64-lossless continues to
decode-complete (just with non-byte-exact pixels).

Spec citations: FDIS Annex E.1 (Sub-predictions, Listing E.1),
E.2 (Prediction weights, Listing E.2), E.3 (Prediction, Listing
E.3), and Table H.3 row predictor == 6 ((prediction + 3) >> 3).

Added

Round 31 (2024-spec) — §F.3 zero-pad uniformly applied to the
single-TOC-entry LfGlobal fast path; noise-64x64-lossless now
decodes without EOF (parent-dispatch "r16" option A). One
narrow src/lib.rs::decode_codestream delta:
- Pre-round-31, when num_groups == 1 && passes == 1 && toc.entries.len() == 1, the decoder routed LfGlobal::read
  through the non-padding main BitReader (pad_eof_with_zeros == false). The other LfGlobal path already used
  BitReader::new_section (which implements FDIS §F.3's
  section-bit-budget + zero-pad rule). For six of the seven small
  lossless fixtures the entire LfGlobal section had enough
  trailing slack that the read never touched the padding region;
  noise-64x64-lossless (cjxl -d 0 -e 7, 64×64 high-entropy RGB
  Modular, MA tree nodes=167 leaves=84) does NOT — its
  per-pixel ANS / hybrid-uint refill loop on the final samples
  reaches a few bits past the byte budget that the spec says must
  read as zero. Pre-round-31 the non-padding reader errored
  instead → InvalidData("unexpected end of JXL bitstream").
- The fix collapses both LfGlobal-read branches into one path
  that always uses BitReader::new_section against the
  toc-declared section byte range. This makes the single-section
  fast path bit-for-bit equivalent to the multi-section path on
  its real-data prefix, and applies §F.3 zero-pad uniformly.
Spec citation: FDIS §F.3 first paragraph — "When decoding a
section, no more bits are read from the codestream than 8 times
the byte size indicated in the TOC; if fewer bits are read, then
the remaining bits of the section all have the value zero."
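The difference between the padding and non-padding readers can be illustrated with a toy section reader. This is a hypothetical stand-in for BitReader::new_section, not its implementation; it assumes least-significant-bit-first order within each byte and reuses the pad_eof_with_zeros flag name from the description above.

```rust
/// Toy bit reader over one TOC-declared section.
struct SectionBits<'a> {
    data: &'a [u8],
    pos: usize, // bit position from the start of the section
    pad_eof_with_zeros: bool,
}

impl<'a> SectionBits<'a> {
    fn new(data: &'a [u8], pad_eof_with_zeros: bool) -> Self {
        SectionBits { data, pos: 0, pad_eof_with_zeros }
    }

    /// Read n bits (n <= 32), least significant bit first. Past the end of
    /// the section, bits read as zero when padding is on, else it is an error.
    fn read_bits(&mut self, n: u32) -> Result<u32, &'static str> {
        assert!(n <= 32);
        let mut value = 0u32;
        for i in 0..n {
            let byte = self.pos / 8;
            let bit = if byte < self.data.len() {
                (self.data[byte] >> (self.pos % 8)) & 1
            } else if self.pad_eof_with_zeros {
                0
            } else {
                return Err("unexpected end of JXL bitstream");
            };
            value |= (bit as u32) << i;
            self.pos += 1;
        }
        Ok(value)
    }
}
```

On the real-data prefix both modes return identical bits, so switching the single-section path to the padded reader only changes behaviour for reads that run past the declared byte budget.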

Test added: tests/r31_noise_lossless.rs with two cases —
noise_64x64_lossless_decodes_without_eof_error (locks the
shape of the post-fix VideoFrame: 3 RGB planes, stride=64,
data.len()=4096 each) and pre_round31_seven_lossless_fixtures_still_decode (regression sentinel: the seven pre-round-31
fixtures all decode successfully under the unified path).
Committed fixture pair under tests/fixtures/:
noise_64x64_lossless.jxl (13 505 B) +
noise_64x64_lossless_expected.png (12 505 B, 8-bit RGB PNG).

Known limitation NOT fixed this round: while
noise-64x64-lossless now decode-completes (vs hard-EOF), the
produced pixels are not yet byte-identical to expected.png.
The first divergence is plane[0] (R) at (x, y) = (2, 3), i.e. samples
0..193 of plane 0 match, and from sample 194 on ~98 % of samples
diverge. The divergence point is deterministic and well within
the section's real-byte budget, so the §F.3 fix is independent
of the residual pixel-divergence. Suspected root cause: a
latent state-evolution bug in either the MA-tree leaf decode
with num_contexts > 16 (the leaf-stream EntropyStream's
cluster_map is 84 → 3 clusters here, vs ≤ 6 → ≤ 4 in every
other lossless fixture), the Self-correcting WP state on
high-entropy neighbour history, or the hybrid-uint extra-bits
path for large n_extra values. Deferred to round 32 — needs
the round-24-style per-cluster trace replayed against the
cleanroom Python reference at ~30 distinct bit positions across
the 108 kbit symbol stream.

Docs gap noted: docs/image/jpegxl-cleanroom/reference-impl/
(referenced in the round-31 brief as the place to bisect
against) does not yet exist; the round-30 deferral note pointed
at it as a future bisect target. The §F.3 fix landed without
needing it — pure spec-text bisect against FDIS §F.3 was
sufficient. The reference-impl directory would still be useful
for the residual pixel-divergence bisect; ask the docs
collaborator to populate it for round 32.
Round 30 (2024-spec) — bit-depth-16 RGB pixel-correct decode +
16-bit LE plane-pack convention (parent-dispatch "r15" option A).
Lifts the fixture count from 6 to 7 by adding bit-depth-16
(docs/image/jpegxl/fixtures/bit-depth-16/input.jxl, 421 B,
64×64 RGB lossless Modular at bits_per_sample = 16) and
documents the wider-than-8-bit pack convention forced on us by
oxideav-core 0.1.x's bit-depth-less VideoPlane.

Two narrow src/lib.rs::decode_codestream deltas:
1. Bit-depth gate widened. The pre-round-30 hard reject
  metadata.bit_depth.bits_per_sample != 8 now accepts
  bps ∈ 1..=16. The XYB and YCbCr branches (FDIS Annex L.2.2 /
  L.3) still hard-require bps == 8 because their dequantisation
  lattice is calibrated against the 8-bit output range — a
  specific Error::Unsupported("jxl decoder (round 30): XYB high-bit-depth (bps={...}) deferred") now precedes the
  transform call. Float (float_sample == true) and bps > 16
  remain unsupported.
2. Pass-through plane pack dispatches on bps. The previous
  loop unconditionally clamped each i32 sample to [0, 255]
  and pushed one byte per sample with stride == width. The
  new loop:
  - bps ≤ 8 — unchanged: 1 byte/sample, stride == width,
    sample clamped to [0, 2^bps - 1].
  - 9 ≤ bps ≤ 16 — 2 bytes/sample little-endian,
    stride == width × 2, sample clamped to [0, 2^bps - 1],
    packed via u16::to_le_bytes.
  The LE-pack choice is documented in
  crates/oxideav-jpegxl/README.md under "Plane byte layout"
  (new section) so that downstream consumers (cli-convert /
  etc.) know how to reinterpret a wide plane as &[u16]. PNG's
  RFC 2083 §2.1 ships big-endian 16-bit samples; we deliberately
  pick LE so a bytemuck::cast_slice::<u8, u16> on a
  little-endian host is a zero-cost view (vs forcing a per-sample
  swap).
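The bps dispatch in the plane pack can be sketched as a single function. The name pack_plane and the (bytes, stride) return shape are assumptions for illustration; the clamping range and the little-endian layout follow the description above.

```rust
/// Pack i32 samples into plane bytes.
/// bps <= 8: one byte per sample, stride == width.
/// 9 <= bps <= 16: two little-endian bytes per sample, stride == width * 2.
/// Returns None for unsupported depths (0 or more than 16 bits).
fn pack_plane(samples: &[i32], width: usize, bps: u32) -> Option<(Vec<u8>, usize)> {
    if bps == 0 || bps > 16 {
        return None;
    }
    let max = (1i32 << bps) - 1;
    if bps <= 8 {
        let data = samples.iter().map(|&s| s.clamp(0, max) as u8).collect();
        Some((data, width))
    } else {
        let mut data = Vec::with_capacity(samples.len() * 2);
        for &s in samples {
            data.extend_from_slice(&(s.clamp(0, max) as u16).to_le_bytes());
        }
        Some((data, width * 2))
    }
}
```

A consumer on any host can recover the wide samples portably with u16::from_le_bytes over 2-byte chunks; the zero-cost cast mentioned above is an optimisation available on little-endian hosts only.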
Test count: tests/round30_bit_depth_16.rs adds 3 tests
(bit_depth_16_rgb_pixel_correct_vs_expected_png — full 64×64×3
16-bit byte-for-byte match against the committed
bit_depth_16_expected.png;
bit_depth_16_le_pack_convention_self_consistent — invariant
check on stride/length/round-trip;
pre_round30_8bit_fixtures_still_byte_packed — regression
sentinel for the four pre-existing 8-bit byte-packed fixtures).
Committed fixture pair under tests/fixtures/:
bit_depth_16.jxl (421 B) + bit_depth_16_expected.png
(375 B, 16-bit RGB PNG).

Cross-checked against djxl v0.11.1 as a black-box oracle (PPM
output → byteswap BE→LE → byte-equal to our planes). Crate now
decodes 7 small lossless Modular fixtures pixel-correct vs
expected.png (was 6): pixel-1x1, gray-64x64,
gradient-64x64-lossless, palette-32x32, grey_8x8_lossless,
alpha-64x64, bit-depth-16.

Spec citations: FDIS Annex A.6 + Table A.22
(bit_depth.bits_per_sample bundle), Annex G.1.3 (Modular
channel-order rule — colour channels share the global
bits_per_sample, no per-channel bit-depth split for kModular
RGB), PNG RFC 2083 §2.1 (PNG ships 16-bit big-endian, so our
reference-PNG read uses u16::from_be_bytes).

Docs gaps identified probing adjacent fixtures during round 30:
noise-64x64-lossless (13.5 KB, nodes=167 leaves=84 per
trace.txt) still fails inside LfGlobal::read with "unexpected
end of JXL bitstream" — large MA-tree decode path likely
mis-computes a hybrid-uint extra-bits count for a high-context
leaf; deferred to round 31. vardct-256x256-d1 / d3 and
noise-feature-256x256 fixtures all hit independent VarDCT
pipeline gaps and are unrelated to round 30.
Round 29 (2024-spec) — alpha-64x64 RGBA pixel-correct decode +
ISOBMFF signature-strip fix (parent-dispatch "r14" option A).
Two narrow lib-level fixes in src/lib.rs::decode_one_frame /
decode_codestream unblock the docs cleanroom alpha-64x64
4-channel Modular lossless fixture (docs/image/jpegxl/fixtures/alpha-64x64/input.jxl, 86 B) for pixel-exact decode against the
committed expected.png (8-bit RGBA, 64×64):
1. ISOBMFF FF 0A strip. The jxlc/jxlp box payload IS a JXL
  codestream and therefore begins with the 2-byte FF 0A
  codestream signature (FDIS Annex B.1). The RawCodestream branch
  already stripped those 2 bytes before handing off to
  decode_codestream; the ISOBMFF branch did NOT. The result was
  a 16-bit misalignment at the SizeHeader::read parse that
  cascaded into apparently-unrelated downstream failures
  (bit-depth-16 tripped "JXL permutation: LZ77-enabled TOC sub-stream not supported" because the TOC permuted flag bit
  parsed as 1 instead of 0). Now the ISOBMFF branch validates the
  FF 0A prefix and strips it symmetric with the raw path. A new
  unit test wraps gradient-64x64-lossless in a minimal ISOBMFF
  (signature + ftyp + jxlc) and asserts plane-by-plane equality
  vs. the raw decode (tests/round29_alpha_rgba_pixel.rs::isobmff_wraps_raw_codestream_decodes_identically).
2. Extra-channel mapping. The post-Modular channel-count check
  n_chans != expected_chans rejected RGBA Modular frames
  because the Modular decoder lays out colour and extra channels
  in a flat array of length expected_chans + num_extra_channels
  (FDIS Annex G.1.3 colour-then-extras channel-order rule). The
  check now also accepts the with-extras length and emits a
  trailing VideoFrame plane per extra channel. For
  alpha-64x64 this maps directly to 4 RGBA planes; for
  hypothetical multi-extra fixtures (depth, spot colour, …) the
  same path extends N-ways. The XYB-encoded / YCbCr branches are
  unchanged — those still require exactly 3 colour channels and
  fall through if extras are present (round-30+ work).
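
The two fixes above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in src/lib.rs: the function names, the `String` error type, and the constant name are all hypothetical. Only the FF 0A signature and the two accepted channel counts come from the notes above.

```rust
/// Two-byte JPEG XL codestream signature (FF 0A).
const JXL_CODESTREAM_SIGNATURE: [u8; 2] = [0xFF, 0x0A];

/// Validate that a jxlc/jxlp box payload starts with FF 0A and return the
/// bytes after it. Hypothetical helper; the error type is a placeholder.
pub fn strip_codestream_signature(payload: &[u8]) -> Result<&[u8], String> {
    match payload.strip_prefix(&JXL_CODESTREAM_SIGNATURE[..]) {
        Some(rest) => Ok(rest),
        None => Err("box payload does not start with FF 0A".to_string()),
    }
}

/// Accept either the colour-only channel count or the colour-plus-extras
/// count produced by the flat Modular channel array. Hypothetical helper.
pub fn accepts_channel_count(
    n_chans: usize,
    expected_chans: usize,
    num_extra_channels: usize,
) -> bool {
    n_chans == expected_chans || n_chans == expected_chans + num_extra_channels
}
```

Validating the prefix before stripping matters: skipping two bytes unconditionally would hide a malformed box payload behind the same kind of misaligned parse that item 1 describes.
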
Test count: tests/round29_alpha_rgba_pixel.rs adds 3 tests
(alpha_64x64_rgba_pixel_correct_vs_expected_png — full 64×64×4
byte-for-byte match; five_pre_round29_fixtures_still_pass —
regression sentinel for pixel-1x1 / gray-64x64 / gradient-64x64 /
palette-32x32 / grey_8x8_lossless; isobmff_wraps_raw_codestream_decodes_identically — synthetic ISOBMFF wrap of
gradient-64x64). Committed fixture pair under tests/fixtures/:
alpha_64x64.jxl (86 B) + alpha_64x64_expected.png (283 B).

Crate now decodes 6 small lossless Modular fixtures pixel-correct
vs expected.png (was 5): pixel-1x1, gray-64x64,
gradient-64x64-lossless, palette-32x32, grey_8x8_lossless,
alpha-64x64.

Spec citations: FDIS Annex B.1 (codestream signature),
Annex G.1.3 (channel order), Annex A.6 + A.9 + Table A.22
(ImageMetadata + ExtraChannelInfo).

Docs gaps identified probing adjacent fixtures: bit-depth-16
(421 B) reaches the 8-bit-only post-Modular check (decoder needs
a 16-bit output-pack path before VideoFrame mapping — deferred);
noise-64x64-lossless (13.5 KB) fails inside LfGlobal with
"unexpected end of JXL bitstream" suggesting the high-entropy
random-RGB MA tree exercises a code path not yet covered
(deferred).
Round 28 (2024-spec) — non-DCT IDCT helpers (parent-dispatch
"r13" item 3). Extends src/idct.rs with the aux_idct_2x2 butterfly
primitive plus five new public helpers that complete the IDCT
surface for the non-DCT TransformType variants:
- aux_idct_2x2(block, S) — Annex I.9.3 Hadamard-style butterfly on
  the top-left S × S cells of an 8×8 buffer (S ∈ {1, 2, 4, 8}).
- idct_dct2x2(coefficients) — Annex I.9.3 closing recipe (chained
  aux_idct_2x2 calls at S=2, 4, 8).
- idct_dct4x4(coefficients) — Annex I.9.4: per-2×2-quadrant 4×4
  IDCT_2D over interleaved coefficient cells with a DC patch from
  aux_idct_2x2(coefficients, 2).
- idct_hornuss(coefficients) — Annex I.9.5: per-quadrant
  block-LF + residual-sum centre cell + neighbour-fill + corner
  corrective.
- idct_dct8x4(coefficients) — Annex I.9.6: column-major Hadamard
  pair into two 4×8 (rows × cols) IDCT_2D halves tiled into rows
  [0..4) and [4..8) of the 8×8 output.
- idct_dct4x8(coefficients) — Annex I.9.7: dual of dct8x4,
  row-major Hadamard pair into two 4×8 halves tiled by row.
idct_for_transform(t, coefficients) now dispatches Hornuss,
Dct2x2, Dct4x4, Dct8x4, Dct4x8 to the dedicated helpers in
addition to the 18 plain-DCT variants from r12. Afv0..Afv3 continue
to return Err(Unsupported) pending an independently verified
256-entry AFVBasis table (deferred to a later round to avoid a
high-risk OCR transcription).
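
For readers unfamiliar with the term, the "Hadamard-style butterfly" in the aux_idct_2x2 entry is a sum/difference combination of four values. The sketch below shows only that arithmetic on a single 2×2 cell. It is not a transcription of the Annex I listing: the function name is hypothetical, and the way aux_idct_2x2 gathers its four inputs from the S × S region and scatters the outputs back is deliberately left out.

```rust
/// Sum/difference butterfly over one 2x2 cell given as
/// [c00, c01, c10, c11]. Illustrative only: input gathering, output
/// placement and any scaling used by the spec listing are not modelled.
pub fn hadamard_2x2(c: [f32; 4]) -> [f32; 4] {
    let [c00, c01, c10, c11] = c;
    [
        c00 + c01 + c10 + c11,
        c00 + c01 - c10 - c11,
        c00 - c01 + c10 - c11,
        c00 - c01 - c10 + c11,
    ]
}
```

A DC-only input (only `c00` non-zero) spreads the same value to all four outputs, which is the property the DC-only tests listed below rely on. Applying the butterfly twice returns the input scaled by 4.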

New helper non_dct_pixel_dims(t) returns Some((8, 8)) for the
nine non-DCT TransformType variants and None for plain-DCT — the
output of all five new helpers is always an 8×8 row-major buffer
(length 64), matching the closing entries of Listings I.9.3..I.9.8.

Test count: lib idct::tests 36 → 57 (+21 new: 8 covering
aux_idct_2x2 validation/butterfly/preserve/DC, 6 covering DC-only +
per-quadrant correctness for the five helpers, 5 covering length
validation, 2 covering non_dct_pixel_dims); integration tests
+5 in new tests/round13_non_dct_idct.rs plus 1 updated
assertion in tests/round12_idct_dispatch.rs (renamed
idct_for_transform_non_dct_transforms_return_unsupported →
idct_for_transform_afv_only_unsupported_after_round_13,
reflecting that only the AFV variants remain unsupported).
Spec-gap notes inline in the module documentation enumerate the OCR
transcription work deferred for AFVBasis.
Round 27 (2024-spec) — IDCT dispatch (parent-dispatch "r12" item
5). New src/idct.rs (~470 LOC including tests) wires the
spec-conformant 1-D inverse DCT (FDIS Annex I.2.1) for power-of-two
sizes s ∈ {1, 2, 4, 8, 16, 32, 64, 128, 256} and the 2-D inverse
DCT (Annex I.2.2 Listing I.4) handling rectangular R × C blocks.

Three public entry points: idct_1d(input) for the bare 1-D form,
idct_2d(coefficients, output_rows, output_cols) for the 2-D form
taking coefficients in the spec's (short × long) row-major natural-
ordering layout (Annex I.2.4) and returning samples in (R × C)
row-major, and idct_for_transform(t, coefficients) which dispatches
on a dct_select::TransformType to the appropriate 2-D IDCT for the
18 plain-DCT transform types in Table C.16 (DCT8x8, DCT16x16,
DCT32x32, DCT16x8, DCT8x16, DCT32x8, DCT8x32, DCT32x16, DCT16x32,
DCT64x64, DCT64x32, DCT32x64, DCT128x128, DCT128x64, DCT64x128,
DCT256x256, DCT256x128, DCT128x256). The 9 non-DCT transforms
(Hornuss, DCT2x2, DCT4x4, DCT4x8, DCT8x4, AFV0..AFV3) — Listings
I.7..I.13 — return Err(Unsupported) and are deferred to round 13+.

Companion helper dct_pixel_dims(t) returns the (rows, cols)
output shape for plain-DCT TransformType variants and None for the
non-DCT transforms.
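
The shape of that helper can be sketched as below. This is a cut-down illustration, not the crate's `dct_select::TransformType`: the enum here carries only five hypothetical variants, and it assumes a name such as DCT16x8 reads as rows × cols, which the notes above do not state. `expected_coefficient_count` is an invented companion showing how the `Option` might feed a length check.

```rust
/// Cut-down stand-in for the crate's TransformType (illustrative subset).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum TransformType {
    Dct8x8,
    Dct16x8,
    Dct8x16,
    Hornuss,
    Afv0,
}

/// (rows, cols) for plain-DCT variants, None for non-DCT ones.
/// ASSUMPTION: the variant name encodes rows x cols in that order.
pub fn dct_pixel_dims(t: TransformType) -> Option<(usize, usize)> {
    match t {
        TransformType::Dct8x8 => Some((8, 8)),
        TransformType::Dct16x8 => Some((16, 8)),
        TransformType::Dct8x16 => Some((8, 16)),
        TransformType::Hornuss | TransformType::Afv0 => None,
    }
}

/// Hypothetical use: how many coefficients a caller must supply.
pub fn expected_coefficient_count(t: TransformType) -> Option<usize> {
    dct_pixel_dims(t).map(|(rows, cols)| rows * cols)
}
```

Returning `Option` rather than a sentinel shape keeps the plain-DCT and non-DCT branches visibly separate at every call site.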

31 lib unit tests in idct::tests (1-D length validation, DC-only
consistency for all 9 supported sizes, 1-D round-trip via private
forward DCT oracle for sizes 8/16/32/64, 1-D AC[1] hand-computed
spec-formula reference, 2-D length / shape validation, 2-D DC-only
consistency for 12 DCT block sizes, 2-D round-trip via 2-D forward
oracle for 8x8/16x8/8x16/16x16/32x32, dispatch validation for
DCT8x8/16x16/32x32/8x16/16x8 + every non-DCT TransformType returning
Unsupported, dct_pixel_dims completeness for both branches); 5
integration tests in tests/round12_idct_dispatch.rs (1-D DC-only
for all sizes, 2-D DC-only for every plain-DCT block size,
Unsupported sentinel for every non-DCT transform, 2-D round-trip for
asymmetric 8x16 and 16x8 via inline forward oracle, five-fixture
Modular regression sentinel). Total test count 345 → 381 (+36 net).

No new fixture coverage — the IDCT lands as a callable primitive that
round 13's PassGroup HF coefficient decode + F.3 dequantisation will
feed. The legacy vardct::idct1d_8 and vardct::idct2d_8x8 (round 8
scaffold, scaled-orthonormal IDCT) are kept untouched for backward
compatibility but are NOT spec-conformant; new HF-decode wiring will
call through idct::idct_for_transform exclusively.
Round 26 (2024-spec) — Annex L colour transforms (parent-dispatch
"r11"). New src/xyb.rs (~210 LOC) transcribes FDIS §L.2.2 inverse
XYB → linear RGB and §L.3 inverse YCbCr → RGB verbatim from the
ISO/IEC 18181-1:2024 spec text. Three public entry points:
inverse_xyb_to_rgb(x, y, b, oim, tone_mapping),
inverse_ycbcr_to_rgb(cb, y, cr), and the convenience composite
modular_xyb_to_linear_rgb(y_prime, x_prime, b_prime, lf_dequant, oim, tone_mapping) which folds in the §L.2.2 preamble step
(X = X' * m_x_lf_unscaled, Y = Y' * m_y_lf_unscaled,
B = (B' + Y') * m_b_lf_unscaled). Helper linear_rgb_to_u8
clamps + rounds the linear [0, 1] output to 8 bits.
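
A minimal sketch of the preamble and the 8-bit conversion described above. The function names and the flat parameter list are hypothetical (the crate passes an lf_dequant bundle instead). Only the three multiplications and the clamp, scale, round sequence are taken from these notes.

```rust
/// Preamble from the paragraph above:
/// X = X' * m_x, Y = Y' * m_y, B = (B' + Y') * m_b.
/// Parameter names are placeholders for the m_*_lf_unscaled multipliers.
pub fn xyb_preamble(
    y_prime: f32,
    x_prime: f32,
    b_prime: f32,
    m_x: f32,
    m_y: f32,
    m_b: f32,
) -> (f32, f32, f32) {
    let x = x_prime * m_x;
    let y = y_prime * m_y;
    // B is coded relative to Y, so Y' is added back before scaling.
    let b = (b_prime + y_prime) * m_b;
    (x, y, b)
}

/// Clamp a linear sample to [0, 1], scale by 255 and round to 8 bits.
pub fn linear_to_u8(v: f32) -> u8 {
    (v.clamp(0.0, 1.0) * 255.0).round() as u8
}
```

Out-of-range inputs saturate: `linear_to_u8(-0.5)` yields 0 and `linear_to_u8(1.5)` yields 255.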

Wired into decode_codestream modular output stage: when
metadata.xyb_encoded == true AND colour_encoding.colour_space == Rgb (3 colour channels), the per-channel pass-through is replaced
with build_rgb_planes_from_xyb which walks every pixel through
the inverse transform. Symmetric build_rgb_planes_from_ycbcr
branch handles frame_header.do_ycbcr == true. The original
pass-through path is preserved for the common case
(xyb_encoded=false AND do_ycbcr=false) so all five small lossless
fixtures continue to pixel-correct decode.

9 unit tests in xyb::tests (DC zero-input, spec-listing
hand-computed reproduction, intensity_target linear scaling,
modular preamble multiplier check, YCbCr neutral / red-dominant,
linear→u8 clamping, X-sign-flip symmetry); 6 integration tests
in tests/round11_xyb_inverse.rs (forward → inverse round-trip
for neutral grey AND saturated red using a hand-computed Cramer's-
rule matrix inversion of oim.inv_mat, YCbCr neutral, u8
quantisation reference values, end-to-end zero-input modular wrapper,
and five-fixture pass-through regression sentinel). Total test count
345 → 362 (+17 net: 9 lib + 6 integration + 2 from earlier round-21
recount).

No fixture decoded that didn't decode before — round 11 lays the
colour-transform foundation, but no modular-XYB or modular-YCbCr
fixture is currently committed (cjxl encodes photo-content XYB
inputs as VarDCT by default; the rare modular-XYB path needs a
hand-built minimal trace, deferred to round 12+ or a docs-
collaborator commission). The two committed VarDCT fixtures
(vardct_256x256_d1.jxl, vardct_256x256_d3.jxl) still terminate
at the round-13 "round 14+: HF subband decode + IDCT not yet wired"
Unsupported.

SPECGAP documented in xyb::linear_rgb_to_u8 doc comment: §L.2.2
outputs linear-domain RGB (NOTE in spec) but the spec doesn't
prescribe a gamma encoding step before display — strict conformance
defers gamma application to a downstream colour-management consumer.
The crate emits linear bytes (clamp + scale by 255 + round); callers
needing sRGB-encoded bytes should apply the sRGB transfer function
themselves.
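
For a caller that does want sRGB-encoded bytes, the step might look like the sketch below. It is not part of the crate: the function names are hypothetical, and it uses the standard sRGB piecewise transfer curve on linear values in [0, 1] taken before the 8-bit quantisation.

```rust
/// Standard sRGB encoding curve applied to a linear value in [0, 1].
/// Hypothetical caller-side helper, not a crate API.
pub fn srgb_encode(linear: f32) -> f32 {
    let v = linear.clamp(0.0, 1.0);
    if v <= 0.003_130_8 {
        // Linear segment near black.
        12.92 * v
    } else {
        // Power-law segment for the rest of the range.
        1.055 * v.powf(1.0 / 2.4) - 0.055
    }
}

/// Encode, then quantise to 8 bits with the same scale-and-round rule.
pub fn linear_to_srgb_u8(linear: f32) -> u8 {
    (srgb_encode(linear) * 255.0).round() as u8
}
```

Applying the curve to already-quantised linear bytes loses precision in the shadows, so a consumer would normally run it on the floating-point output instead.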

Wall respected: spec PDF (Annex L pages 82-84 read directly), no
external library source consulted, no libjxl-trace-reverse-engineering.md (retired). OpsinInverseMatrix defaults already
transcribed in metadata_fdis::OpsinInverseMatrix::default()
(round-2) from FDIS Table L.1 independently; the new module
consumes those constants without re-reading the table. Test count
362, fmt + clippy clean against 1.95 toolchain.
Round 24 (2024-spec, Auditor mode) — pursued round-23 candidates
(1) per-cluster ANS distribution byte-trace for clusters 0+1 and
(2) per-call alias-mapping invariant audit. Result: both paths
falsified. Cluster 0 (19 nonzero entries) and cluster 1 (23
nonzero entries) both sum to 4096; the alias table built from each
D[] routes probability mass to symbols identically to the declared
D[] (per-symbol routed-mass divergence = 0 for both clusters);
across the FULL 3072-call ANS trace the spec C.3.2
(symbol, offset) = AliasMapping(state & 0xFFF) invariant holds
bit-for-bit when checked against either cluster 0 or cluster 1's
alias table (0 hard violations; 288 ambiguous calls where both
clusters yield the same (symbol, offset, prob)). Per-call state
arithmetic state = prob * (state >> 12) + offset also reproduces
the trace exactly. Cluster usage breakdown: c0=1755 calls,
c1=1317 calls, unknown=0 (no cross-talk into HFMetadata clusters
2/3/4). The d1 ANS final-state delta of 0x21914271 - 0x00130000 ≈ 562M is therefore NOT caused by a per-cluster D[]
shape mismatch, alias-table self-map / Vose-pump bug,
alias-mapping lookup bug, per-call state-arithmetic bug, or
cluster-routing leakage. Round 25 candidates: (1) D[]-vs-cjxl
reference comparison (a single mismatched count would be the
smoking gun), (2) leaf-pick + cluster-routing audit at samples
beyond sample 22 up to sample 79 (where r23's first ctx-flip was
observed), (3) HFMetadata stream-boundary cross-talk audit. New
diagnostic tests/round24_d1_disttrace.rs (Auditor mode, never
asserts) with two tests:
d1_per_cluster_distribution_byte_trace_round_24 (path 1) and
d1_per_call_alias_mapping_invariant_round_24 (path 2). Full
audit notes in crates/oxideav-jpegxl/round24-d1-disttrace.md.
Test count 343 → 345 (+2).
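
The three checks audited in this round reduce to small pieces of arithmetic, sketched below. Function and constant names are hypothetical, and the alias-table lookup that produces `(symbol, offset, prob)` is not modelled. The bit-refill step between calls is omitted as well.

```rust
/// Total probability mass of one ANS distribution (12-bit precision).
pub const ANS_TAB_SIZE: u32 = 1 << 12;

/// A distribution D[] is well-formed when its counts sum to 4096.
pub fn distribution_is_normalised(d: &[u32]) -> bool {
    d.iter().sum::<u32>() == ANS_TAB_SIZE
}

/// Index fed to the alias mapping: the low 12 bits of the state.
pub fn alias_index(state: u32) -> u32 {
    state & 0xFFF
}

/// Per-call state update quoted above:
/// state = prob * (state >> 12) + offset.
/// Assumes prob <= 4096 and offset < 4096, so the result fits in u32.
pub fn next_state(state: u32, prob: u32, offset: u32) -> u32 {
    prob * (state >> 12) + offset
}
```

An auditor-style test can replay a recorded trace through `alias_index` and `next_state` and count mismatches, which is the kind of check that reported 0 hard violations here.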
Round 22 (2024-spec, Auditor mode) — pursued round-21 candidates
(a) lf_quant first-256-sample dump per channel and (c) WP (p+3)>>3
rounding bias toggle on the d1 LfCoefficients sub-bitstream. Result:
WP-rounding-bias bug class falsified. Added a runtime atomic
WP_ROUND_BIAS (default 3, spec-conformant per ISO/IEC 18181-1:2024
Table H.3 + FDIS-2021 Listing C.16) so the auditor can sweep biases
without recompile. Sweeps recorded post-decode ANS final state for
bias ∈ {0, 3, 4, 7}: 0 → 0x0042cd42 (|Δ|=3 132 738), 3 → 0x21914271
(|Δ|=561 922 673, spec), 4 → 0x00fd721e (|Δ|=15 364 638), 7 →
0x001214ac (|Δ|=60 244). All four miss the §D.3.3 sentinel
0x00130000; the +7 bias being closest proves the variation is
ANS-chain noise from leaf-flip cascades, not a true rounding bug.
Per-channel lf_quant dump (Y'/X'/B', 1024 samples each, 32×32) shows
smooth low-frequency shape with sane stats (Y' mean=468 min=326
max=644; X' mean=14 min=−125 max=135; B' mean=41 min=−49 max=123),
consistent with a real-image fixture and proving the per-sample
decode loop is producing plausible data — not garbage. WP+3 vs +4
diverges first at Y' sample 22 (row 0, col 22), localising the actual
bug to a specific MA-tree leaf-flip at that sample. New diagnostic
tests/round22_d1_sample_dump.rs (Auditor mode, never asserts) dumps
both the lf_quant table and the bias-sweep final states; full audit
notes in crates/oxideav-jpegxl/round22-d1-sampledump.md. Test count
337 → 338 (+1).
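
A runtime-switchable bias of this kind can be sketched with an atomic. The static name WP_ROUND_BIAS and its default of 3 come from the entry above. The `i32` type, the `wp_round` and `sentinel_distance` helpers, and the relaxed memory ordering are assumptions made for the illustration.

```rust
use std::sync::atomic::{AtomicI32, Ordering};

/// Rounding bias for the (p + bias) >> 3 step; 3 is the spec default.
pub static WP_ROUND_BIAS: AtomicI32 = AtomicI32::new(3);

/// Round a prediction using the currently configured bias.
/// Hypothetical helper showing where the atomic would be read.
pub fn wp_round(p: i32) -> i32 {
    (p + WP_ROUND_BIAS.load(Ordering::Relaxed)) >> 3
}

/// Absolute distance of a final ANS state from the 0x00130000 sentinel,
/// i.e. the |Δ| figure reported in the sweep.
pub fn sentinel_distance(final_state: u32) -> u32 {
    final_state.abs_diff(0x0013_0000)
}
```

Because the bias is process-global, a sweep should restore the default after each run so that unrelated tests in the same binary are not affected.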
Round 21 (2024-spec, Auditor mode) — pursued round-20 candidates
(1) per-cluster distribution decode bisect and (2) alias-table
self-map branch audit on the d1 LfCoefficients sub-bitstream.
Result: both paths falsified. The 5 per-cluster ANS distributions
(clusters 0..4) all sum to 4096 with sane shapes (cluster sizes
19/23/5/2/2 nonzero entries out of 64); cluster 1's full 64-entry
alias table reconciles with the round-19 bit-faithful trace at calls
#0 and #1. Critically, none of the five clusters has any D[i] == bucket_size entry, so the alias-table self-map branch (round-3
fix territory) is not triggered for d1. Documented one strict-spec
divergence in AliasTable::build (else vs spec's else if (cutoffs[i] < bucket_size)) that has zero observational effect on
d1 — hand-tracing the equal-bucket path confirms output-equivalent
behaviour. New diagnostic tests/round21_d1_dist_alias_dump.rs
(Auditor mode, never asserts) captures per-cluster (cfg, D, alias)
triples + cluster-1 full alias dump as evidence; full bisect notes
in crates/oxideav-jpegxl/round21-d1-distbisect.md. Test count
336 → 337 (+1).
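
The "no D[i] == bucket_size entry" observation can be checked mechanically, as in the sketch below. It assumes bucket_size is 4096 divided by the table size (so 64 for the 64-entry tables with log_alpha_size=6 mentioned in these notes). Both function names are hypothetical.

```rust
/// ASSUMPTION: bucket_size = 4096 / 2^log_alpha_size, i.e. the total
/// 12-bit probability mass divided evenly across the table entries.
pub fn bucket_size(log_alpha_size: u32) -> u32 {
    (1u32 << 12) >> log_alpha_size
}

/// True when some count equals bucket_size exactly, which is the case
/// that would exercise the alias-table self-map branch.
pub fn has_full_bucket_entry(d: &[u32], log_alpha_size: u32) -> bool {
    let b = bucket_size(log_alpha_size);
    d.iter().any(|&count| count == b)
}
```

Running such a predicate over all five cluster distributions is enough to confirm that a branch is unreachable for a given fixture before spending time auditing it.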
Round 20 (2024-spec, Auditor mode) — re-interpreted cjxl
JXL_TRACE output's bits_consumed field as section-local (not
cumulative file position), invalidating the round-17/18/19 claim of a
267-bit overshoot in LfCoefficients. Empirical proof: in the same
trace, AC_GLOBAL_END bits_consumed=307 while DC_GLOBAL_END=1026;
AC_GLOBAL is decoded after DC_GLOBAL, so a cumulative counter could
not drop from 1026 to 307. With the corrected
interpretation DC_GROUP is 12754 bits (not 11728), LfCoefficients
fits well within the budget, and HfMetadata's slot is 759 bits.

Identified a stronger oracle for the actual divergence: per FDIS
D.3.3, the ANS state must equal 0x00130000 after the final symbol
in any stream. Wired LATEST_ANS_STATE / LATEST_ANS_CALL_COUNT
thread-locals (in src/ans/symbol.rs) so a test can read the
post-decode state without holding the per-stream MaTreeFdis clone.
On d1's LfCoefficients the final state is 0x21914271 after 3072
decode_symbol calls — proving a structural decode divergence (wrong
per-cluster distribution, wrong alias mapping, wrong sample count, or
wrong read in the per-sample loop). The state never reaches the
sentinel within 3072 calls, so it's not a sample-count off-by-one.
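
A thread-local probe of this kind might be sketched as follows. The names LATEST_ANS_STATE and LATEST_ANS_CALL_COUNT and the 0x00130000 sentinel come from the text above. The `Cell` types, the helper functions, and where the decoder would call `record_ans_state` are assumptions, not the code in src/ans/symbol.rs.

```rust
use std::cell::Cell;

/// Expected ANS state after the final symbol of a stream.
pub const ANS_FINAL_STATE_SENTINEL: u32 = 0x0013_0000;

thread_local! {
    // Last state seen by the decoder on this thread, and how many
    // symbol decodes have been recorded so far.
    static LATEST_ANS_STATE: Cell<u32> = Cell::new(0);
    static LATEST_ANS_CALL_COUNT: Cell<u64> = Cell::new(0);
}

/// Hypothetical hook: called once per decoded symbol.
pub fn record_ans_state(state: u32) {
    LATEST_ANS_STATE.with(|s| s.set(state));
    LATEST_ANS_CALL_COUNT.with(|c| c.set(c.get() + 1));
}

/// Read (state, call_count) without needing access to the decoder.
pub fn latest_ans_snapshot() -> (u32, u64) {
    (
        LATEST_ANS_STATE.with(|s| s.get()),
        LATEST_ANS_CALL_COUNT.with(|c| c.get()),
    )
}

/// True when the most recently recorded state equals the sentinel.
pub fn stream_ended_cleanly() -> bool {
    latest_ans_snapshot().0 == ANS_FINAL_STATE_SENTINEL
}
```

Thread-locals keep the probe isolated per test thread, so parallel test runs do not overwrite each other's final state.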

Lifted the previous 30-call cap on STATE_TRACE_BUF so end-of-stream
bisects over multi-thousand-sample LF channels are tractable. Five
new tests in tests/round20_d1_*.rs. See
crates/oxideav-jpegxl/round20-d1-hfmeta.md for the full audit and
the round-21 candidate ranking.
Round 19 (2024-spec, Auditor mode) — extended the per-token
trace ring with (ctx, cluster, ans_refill_bits) and added a
STATE_TRACE_BUF recording the first 30 ANS state transitions for
spot-checking against raw codestream bits. New
AnsDecoder::decode_symbol_with_refill reports refill-bit cost. New
tests/round19_d1_cluster.rs drives d1 LfCoefficients under the
extended trace and emits per-cluster / per-ctx histograms plus a
diagnostic eprintln on the leaf-stream EntropyStream::read prelude
bit count. Findings: prelude is bit-exact (602 bits matching cjxl's
num_contexts=16 num_histograms=5 log_alpha_size=6), cluster_map is
bit-exact (16 → 5 distinct clusters), state transitions are
bit-faithful to raw codestream. The 267-bit overshoot remains
unexplained; deferred to round 20 with cjxl --debug per-call
bit-position trace as the proposed next-step. See
crates/oxideav-jpegxl/round19-d1-cluster.md for the full audit.
