v0.0.11

Latest

@MagicalTux MagicalTux released this 15 Jun 05:08
· 9 commits to master since this release
8136f5b

Other

  • round-309 (parent-dispatch r309) against ISO/IEC FDIS 18181-1:2021 — per-LfGroup VarDCT three-channel residual-plane assembly + Annex G chroma-from-luma
  • round-306 (parent-dispatch r306) against ISO/IEC FDIS 18181-1:2021 — per-LfGroup VarDCT residual-plane assembly (§C.5.4 placement + §C.8.3 + Table I.4/§I.2.3 pixel-dims)
  • cover non-DCT transforms (Hornuss/DCT2x2/DCT4x4/DCT4x8/DCT8x4/AFV) — round 300
  • round-293 (parent-dispatch r293) against ISO/IEC FDIS 18181-1:2021 — extend the per-block VarDCT decode walk to every plain separable-DCT transform (rectangular + DCT64..256 family), lifting the round-286 square-only orientation deferral
  • Round 286 — per-block VarDCT decode walk to spatial samples (square DCTs)
  • round-281 (parent-dispatch r281) against ISO/IEC FDIS 18181-1:2021 — two §C.8.3 decode-walk prose-conformance fixes: per-varblock channel decode order is Y, X, then B (rounds 221..264 advanced the entropy stream X-first; Listing C.13's (c < 2 ? c ^ 1 : 2) mapping independently corroborates Y-first) + NonZeros(x, y) writeback covers every block of the varblock footprint per the 'for each block in the current varblock' prose (rounds 177..264 wrote only the top-left cell, corrupting PredictedNonZeros reads against continuation cells of multi-cell transforms)
  • round-278 (parent-dispatch r278) against ISO/IEC FDIS 18181-1:2021 — FIX the rounds-31..272 noise-64x64-lossless WP pixel divergence: Listing E.2 error2weight Idiv-first operand order + true_errNW column-0 N-fallback, both pinned by the staged wp-trace-sample-194.md; noise-64x64-lossless now byte-exact on all three planes and synth_320 pixel-exact (102400/102400)
  • name + pin the WP sub_err reading choice (Annex E.1)
  • round-264 (parent-dispatch r264) against ISO/IEC FDIS 18181-1:2021 — HfHistogramDecodeContext::decode_lf_group_three_channels_for_pass bundled per-LfGroup raster-walk three-channel decode driver for one pass
  • round-260 (parent-dispatch r260) against ISO/IEC FDIS 18181-1:2021 — HfHistogramDecodeContext::decode_three_channel_varblock_for_pass bundled three-channel per-varblock walk composing the round-255 single-channel decode method three times against the round-214 BlockContextResolver per-channel block_ctx derivation
  • round-255 (parent-dispatch r255) against ISO/IEC FDIS 18181-1:2021 — HfHistogramDecodeContext::decode_block_for_pass_transform bundled per-varblock decode method composing the round-90 Listing C.14 state machine with the round-252 per-pass histogram routing
  • round-252 (parent-dispatch r252) against ISO/IEC FDIS 18181-1:2021 — multi_pass_hf_histogram_decoder::HfHistogramDecodeContext typed bridge wiring the §C.7.2 entropy stream to the §C.8.3 per-pass histogram_offset routing
  • drop release-plz.toml — use release-plz defaults across the workspace
  • round-247 (parent-dispatch r247) against ISO/IEC FDIS 18181-1:2021 — §C.7.2 HfCoefficientHistograms wrapper performing the actual EntropyStream::read of the 495 × num_hf_presets × nb_block_ctx clustered-distributions block
  • round-238 (parent-dispatch r238) against ISO/IEC FDIS 18181-1:2021 — hf_coeff_histogram_size typed sizing primitive for §C.7.2 + §C.8.3 routing offset
  • round-232 (parent-dispatch r232) against ISO/IEC FDIS 18181-1:2021 — per-LfGroup multi-pass HF-header + per-pass histogram_offset routing driver (§C.8.3 first paragraph)
  • round-228 (parent-dispatch r228) against ISO/IEC FDIS 18181-1:2021 — per-LfGroup multi-pass three-channel varblock decode driver (§C.8.3 + Table C.6 Passes outer pass loop)
  • round-221 (parent-dispatch r221) against ISO/IEC FDIS 18181-1:2021 — three-channel per-LfGroup varblock decode driver (§C.8.3 outer-varblock × inner-X/Y/B sweep)
  • round-214 (parent-dispatch r214) against ISO/IEC FDIS 18181-1:2021 — per-LfGroup BlockContext() resolver (§C.8.3 Listing C.13 + §I.2.2 HfBlockContext bundle)
  • round-208 (parent-dispatch r208) against ISO/IEC FDIS 18181-1:2021 — per-LfGroup varblock-walk driver (§C.5.4 + §C.8.3)
  • round-202 (parent-dispatch r202) against ISO/IEC FDIS 18181-1:2021 — full-row WP state-evolution chain validation across noise-64x64-lossless samples 192..=200
  • Hat-2 scrub: replace 'libjxl' decorative-attribution lines with neutral terms
  • r195 fix — add serial_test for r195 WP trace tests

Added

  • Round 309 — per-LfGroup VarDCT three-channel spatial-reconstruction
    layer
    (src/residual_plane.rs), lifting the round-306 single-channel
    assemble_channel_plane to the X / Y / B level and applying Annex G
    chroma-from-luma. New public API: ChannelResidualPlanes (the three XYB
    residual planes of one LfGroup, all on the shared padded block grid,
    channel order 0 = X / 1 = Y / 2 = B per Listing C.13, with x() / y()
    / b() / dims() accessors);
    assemble_three_channel_planes(grid, residual_at) (walks the shared
    dct_select::DctSelectGrid once per
    channel via assemble_channel_plane, invoking the caller's
    residual_at(channel, &vb) decode closure — in VarDCT mode all three
    channels share one DctSelect grid per §C.5.4, and Annex G CfL "is skipped
    if any channel is subsampled," so the three planes are geometrically
    identical); apply_chroma_from_luma(planes, x_from_y, b_from_y, cfl)
    (applies Annex G Listing G.1 in place via the round-138
    chroma_from_luma::apply_hf_plane_inplace: X = dX + kX·Y,
    B = dB + kB·Y, with (kX, kB) looked up per the 64×64 tile containing
    the sample; after the call X-plane holds final X, B-plane final B, Y
    unchanged); and the one-call driver
    reconstruct_three_channel_planes(grid, x_from_y, b_from_y, cfl,
    residual_at) (= assemble + CfL). 9 unit +
    5 integration (round309_three_channel_residual_plane, composing the
    real F.3-dequant + I.2.3-IDCT walk across all three channels then the
    real Annex G CfL end-to-end) tests. Lib tests 788 → 797 (+9).
    Pure-control-flow composition primitive — no bit reads, no spec
    re-derivation, no histogram materialisation. Gaborish (Annex J.2) + EPF
    (Annex J.3) run on the returned planes and remain caller-side concerns.
    Source of truth: ISO/IEC FDIS 18181-1:2021 §C.5.4 (DctSelect placement) +
    Annex G (chroma-from-luma, Listing G.1) + §F.3 / §I.2.3 (dequant + IDCT).
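
The Annex G step described above (X = dX + kX·Y, B = dB + kB·Y, with the multipliers looked up per 64×64 tile) can be sketched roughly as follows. This is a minimal illustration, not the crate's apply_chroma_from_luma: the CflFactors type, the apply_cfl_sketch name, and the assumption that the multipliers are already resolved to f32 values per tile are all hypothetical.

```rust
/// Hypothetical container: one already-resolved (kX, kB) pair per 64x64 tile,
/// stored row-major with `tiles_per_row` tiles per row.
pub struct CflFactors {
    pub tiles_per_row: usize,
    pub k_x: Vec<f32>,
    pub k_b: Vec<f32>,
}

/// Applies X = dX + kX*Y and B = dB + kB*Y in place on row-major planes.
/// On entry `x` and `b` hold residuals; on return they hold final samples.
/// `y` is only read. All shapes are validated before anything is written.
pub fn apply_cfl_sketch(
    x: &mut [f32],
    y: &[f32],
    b: &mut [f32],
    width: usize,
    cfl: &CflFactors,
) -> Result<(), String> {
    if width == 0 || x.len() != y.len() || b.len() != y.len() || y.len() % width != 0 {
        return Err("plane shapes disagree".to_string());
    }
    let height = y.len() / width;
    let tiles_x = (width + 63) / 64;
    let tiles_y = (height + 63) / 64;
    let needed = cfl.tiles_per_row * tiles_y;
    if cfl.tiles_per_row < tiles_x || cfl.k_x.len() < needed || cfl.k_b.len() < needed {
        return Err("CfL factor table does not cover the plane".to_string());
    }
    for (i, &luma) in y.iter().enumerate() {
        let (px, py) = (i % width, i / width);
        // Index of the 64x64 tile containing this sample.
        let tile = (py / 64) * cfl.tiles_per_row + px / 64;
        x[i] += cfl.k_x[tile] * luma;
        b[i] += cfl.k_b[tile] * luma;
    }
    Ok(())
}
```

Because the three planes share one padded block grid, a single width and a single tile index serve all three channels in this sketch.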

  • Round 306 — per-LfGroup VarDCT residual-plane assembly
    (src/residual_plane.rs), the spatial-placement layer directly above
    the round-286/293/300 block_dequant per-block decode walk. Walks a
    dct_select::DctSelectGrid via varblock_walk::VarblockWalk and
    writes each varblock's R × C row-major residual block (the
    block_dequant::decode_block_to_residual output) into a single-channel
    spatial plane at the varblock's pixel origin (bx · 8, by · 8). New
    public API: ResidualPlane (row-major f32 plane sized to the padded
    block grid width_blocks·8 × height_blocks·8, for_grid / get);
    block_pixel_dims(t) (the (R, C) pixel shape from
    idct::dct_pixel_dims / non_dct_pixel_dims, covering every
    TransformType); place_block(plane, vb, block) (verbatim copy with
    length-mismatch + footprint-spill rejection); and
    assemble_channel_plane(grid, residual_at) (raster-order grid walk
    invoking the caller's per-varblock decode closure once per top-left
    cell, continuation cells skipped, residual-Empty cell rejected). The
    plane is the padded block grid (no per-edge clamping; caller crops to
    lf_w × lf_h). The geometry invariant C == block_dims().0 · 8 /
    R == block_dims().1 · 8 is pinned for every transform. The IDCT
    output already carries the LLF/DC contribution (§I.2.5) so no separate
    DC add at placement; chroma-from-luma / Gaborish / EPF run on the
    assembled plane and remain caller-side concerns. 14 unit + 5
    integration (round306_residual_plane, composing the real F.3-dequant +
    I.2.3-IDCT walk end-to-end) tests. Lib tests 774 → 788 (+14).
    Pure-control-flow geometry primitive: no bit reads, no spec
    re-derivation, no histogram materialisation. Source of truth:
    ISO/IEC FDIS 18181-1:2021 §C.5.4 (DctSelect placement) + §C.8.3 +
    Table I.4 / §I.2.3 (pixel-dims).
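
The placement rule in the Round 306 entry (copy an R × C row-major block to pixel origin (bx · 8, by · 8), rejecting length mismatches and footprint spills) might look roughly like this. PlaneSketch and place_block_sketch are hypothetical stand-ins, not the crate's ResidualPlane / place_block signatures.

```rust
/// Hypothetical row-major f32 plane sized to the padded block grid.
pub struct PlaneSketch {
    pub width: usize,
    pub height: usize,
    pub data: Vec<f32>,
}

impl PlaneSketch {
    /// Allocates width_blocks*8 x height_blocks*8 samples, zero-initialised.
    pub fn for_grid(width_blocks: usize, height_blocks: usize) -> Self {
        let (width, height) = (width_blocks * 8, height_blocks * 8);
        Self { width, height, data: vec![0.0; width * height] }
    }
}

/// Copies a rows x cols row-major block verbatim to pixel origin (bx*8, by*8).
/// Rejects a block whose length is not rows*cols and a footprint that would
/// spill outside the plane; nothing is written in either error case.
pub fn place_block_sketch(
    plane: &mut PlaneSketch,
    bx: usize,
    by: usize,
    rows: usize,
    cols: usize,
    block: &[f32],
) -> Result<(), String> {
    if block.len() != rows * cols {
        return Err("block length does not match rows * cols".to_string());
    }
    let (x0, y0) = (bx * 8, by * 8);
    if x0 + cols > plane.width || y0 + rows > plane.height {
        return Err("block footprint spills outside the plane".to_string());
    }
    for r in 0..rows {
        let dst = (y0 + r) * plane.width + x0;
        plane.data[dst..dst + cols].copy_from_slice(&block[r * cols..(r + 1) * cols]);
    }
    Ok(())
}
```

No clamping happens at the plane edges here, which mirrors the entry's note that cropping to lf_w × lf_h is left to the caller.
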
  • Round 300 — extend the per-block VarDCT decode walk
    (src/block_dequant.rs) to the non-DCT transforms: Hornuss,
    DCT2×2, DCT4×4, DCT4×8, DCT8×4, and AFV0..AFV3 — i.e. exactly the set
    for which idct::non_dct_pixel_dims returns Some (all 8 × 8).
    This lifts the round-293 deferral. The deferral worried that the
    AFV / DCT2×2 sub-block coefficient extraction "does not reduce to a
    flat identity over an 8 × 8 grid", but per ISO/IEC FDIS 18181-1:2021
    the §I.2.3 sub-block re-mapping happens inside the inverse-transform
    dispatch (idct_afv, idct_dct2x2, …), which the spec applies
    after the Annex F.3 dequant. The §F.3 dequant stage is uniform: it
    multiplies each stored coefficient by a multiplier keyed on "the
    channel, the transform type and the coefficient index inside the
    varblock". For every non-DCT transform the varblock is the 8 × 8
    OrderId-1 grid (coeff_order::varblock_size_for_order(8, 8)),
    the dequant matrix is the 8 × 8 slot matrix
    (weights_matrix_dims_for_slot(8, 8) for slots 1 / 2 / 3 / 9 /
    10), and the decoded block is already in raster index space
    (coeffs[natural_order[k]], natural_order[k] = y·bwidth + x), so
    the per-cell dequant is the identity raster map — exactly as for the
    square / rectangular DCT family, with no orientation subtlety
    (bwidth == bheight == 8). covered_grid_dims now returns Some for
    every TransformType; require_covered's Unsupported path now only
    guards a hypothetical future variant lacking a pixel-dims mapping.
    +3 lib tests (non-DCT all-zero residual census; non-DCT single-coeff
    per-sample-formula identity; AFV0..AFV3 shared-slot/grid dequant
    equality; chained == manual dequant-then-IDCT for the non-DCT path).
    Lib tests 771 → 774.
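
As an illustration of the "identity raster map" argument above: once each decoded coefficient is stored at coeffs[natural_order[k]], the dequant stage only needs a per-index multiply. The sketch below is a simplification under stated assumptions. The function name is hypothetical, and the multiplier slice stands in for whatever per-channel, per-transform weight the real §F.3 stage resolves; any further scaling terms of the spec are deliberately left out.

```rust
/// Scatters scan-order coefficients into raster index space and applies a
/// per-index multiplier. `natural_order[k]` is the raster index (y*bwidth + x)
/// of the k-th scanned coefficient; `multipliers` is indexed in raster space.
pub fn scatter_and_dequant_sketch(
    scanned: &[i32],
    natural_order: &[usize],
    multipliers: &[f32],
) -> Result<Vec<f32>, String> {
    let size = multipliers.len();
    if scanned.len() != size || natural_order.len() != size {
        return Err("coefficient, order and multiplier lengths disagree".to_string());
    }
    let mut out = vec![0.0f32; size];
    for (k, &q) in scanned.iter().enumerate() {
        let idx = natural_order[k];
        if idx >= size {
            return Err("natural_order entry out of range".to_string());
        }
        // Same raster index on both sides: no transpose, no sub-block remap.
        out[idx] = q as f32 * multipliers[idx];
    }
    Ok(out)
}
```

Any sub-block re-mapping for AFV or DCT2×2 would then happen later, inside the inverse-transform dispatch, as the entry describes.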

  • Round 293 — extend the per-block VarDCT decode walk
    (src/block_dequant.rs) from the three square plain-DCT transforms
    to every plain separable-DCT transform: the rectangular
    DCT16×8 / DCT8×16 / DCT32×8 / DCT8×32 / DCT32×16 / DCT16×32 family and
    the larger DCT64×64 … DCT256×256 family. The round-286 orientation
    deferral is lifted by pinning, against ISO/IEC FDIS 18181-1:2021
    §I.2.4 + Table I.4 + Annex I.2.3.2, that the decoded coefficient grid
    (varblock_size_for_order(bwidth, bheight), bwidth >= bheight)
    and the dequant matrix (weights_matrix_dims_for_slot
    (cols, rows) = (bwidth, bheight)) share one "wide"
    bwidth × bheight row-major layout, which is exactly the
    (short × long) "spec coefficient layout" idct_for_transform
    already consumes. A rectangular transform and its transpose
    (e.g. DCT16×8 / DCT8×16) share one coefficient grid and one dequant
    matrix; they differ only in the pixel orientation (R, C) the IDCT
    emits, so the per-cell dequant is the identity and no transpose is
    needed in this stage. New public API covered_grid_dims(t) -> Option<(bwidth, bheight)> (the full plain-DCT covered set, keyed off
    dct_pixel_dims); covered_square_dim retained for the square
    subset; dequant_block_for_transform / decode_block_to_residual
    now accept the whole plain-DCT set. The non-DCT transforms
    (Hornuss / DCT2×2 / DCT4×4 / DCT4×8 / DCT8×4 / AFV0..AFV3) stay
    Error::Unsupported — their dequant matrix is canonicalised to 8×8
    while their IDCT path is the §I.2.3 dispatch, so the sub-block
    coefficient extraction does not reduce to a flat 8×8 identity.
    +4 unit tests (transpose-pair grid/matrix sharing, full plain-DCT
    covered-set census, rectangular all-zero + pure-DC residuals);
    lib tests 767 → 771.

  • Round 286 — first per-block VarDCT decode-walk stage that reaches
    spatial samples (src/block_dequant.rs). Chains the §C.8.3 decoded
    quantised-coefficient block through Annex F.3 HF dequantisation and
    the Annex I.2.3.2 inverse DCT for the square plain-DCT transforms
    (DCT8×8 / DCT16×16 / DCT32×32), where the coefficient grid, the
    dequantisation matrix, and the inverse-DCT input all share one
    unambiguous dim × dim row-major layout. New public API:
    dequant_block_for_transform (Annex F.3 across the whole raster,
    per-cell dequant-matrix entry via slot_for_transform),
    decode_block_to_residual (dequant → idct_for_transform), and
    covered_square_dim. Rectangular / non-DCT transforms return
    Error::Unsupported, deferred to a follow-up round so their
    coefficient-grid-vs-pixel-block orientation can be pinned
    independently. 11 unit tests; lib tests 756 → 767.

Fixed

  • Round 281 — two §C.8.3 decode-walk prose-conformance fixes against
    ISO/IEC FDIS 18181-1:2021, both affecting the (not-yet-wired)
    VarDCT HF coefficient path. (1) Per-varblock channel decode
    order is Y, X, then B: the §C.8.3 prose reads "for each
    varblock it reads channels Y, X, then B"; rounds 221..264 advanced
    the entropy stream X-first. Fixed in
    block_context_resolver::decode_varblocks_three_channels_with_resolver
    (round 221; also feeds the round-228 multi-pass and round-232
    HF-header drivers) and
    HfHistogramDecodeContext::decode_three_channel_varblock_for_pass
    (round 260; also feeds the round-264 per-LfGroup driver). Output
    arrays stay indexed 0 = X / 1 = Y / 2 = B per Listing C.13's
    "c is the current channel (with 0=X, 1=Y, 2=B)" — only the
    stream-advance order changed. The Listing C.13 BlockContext()
    channel mapping (c < 2 ? c ^ 1 : 2) (Y → 0, X → 1, B → 2)
    independently corroborates Y-first decode order. (2)
    NonZeros(x, y) writeback covers every block of the varblock
    — the prose reads "The decoder then computes the NonZeros(x, y)
    field for each block in the current varblock"; rounds 177..264
    wrote only the top-left cell, so a neighbouring varblock's
    PredictedNonZeros(x, y) reading a continuation cell of a
    multi-cell transform (e.g. the second row/column of a DCT16×16)
    saw the zero-init sentinel instead of the varblock's
    ceiling-divided value. NonZerosGrid::update_after_block_for_transform
    now fills the full TransformType::block_dims() footprint
    (rejecting footprints that spill outside the grid); the
    per-channel / per-pass wrappers and every typed driver above them
    inherit the fix. Ordering + footprint tests rewritten to the
    prose readings across round177 / round183 / round190 /
    round221 / round228 suites plus the in-module unit tests; new
    rectangular-footprint (DCT16×8 1×2-cell vs DCT8×16 2×1-cell) and
    footprint-spill rejection pins. Tests 1156 → 1159.
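
Both Round 281 fixes can be sketched in a few lines. The code below is illustrative only: the function names, the flat u32 grid, and the parameter list are assumptions, not the signature of NonZerosGrid::update_after_block_for_transform.

```rust
/// The channel mapping quoted from Listing C.13: (c < 2 ? c ^ 1 : 2),
/// i.e. X (0) -> 1, Y (1) -> 0, B (2) -> 2.
pub fn block_context_channel(c: u32) -> u32 {
    if c < 2 { c ^ 1 } else { 2 }
}

/// Stream-advance order after the fix: Y, X, then B. Output arrays stay
/// indexed 0 = X / 1 = Y / 2 = B; only the order of reads changes.
pub const DECODE_ORDER: [u32; 3] = [1, 0, 2];

/// Writes the ceiling-divided NonZeros value into every cell of a
/// cells_w x cells_h varblock footprint whose top-left cell is (bx, by).
/// `grid` is row-major with `grid_w` cells per row.
pub fn fill_non_zeros_footprint(
    grid: &mut [u32],
    grid_w: usize,
    bx: usize,
    by: usize,
    cells_w: usize,
    cells_h: usize,
    raw_non_zeros: u32,
) -> Result<(), String> {
    let num_blocks = cells_w * cells_h;
    if num_blocks == 0 || grid_w == 0 || grid.len() % grid_w != 0 {
        return Err("degenerate footprint or grid shape".to_string());
    }
    let grid_h = grid.len() / grid_w;
    if bx + cells_w > grid_w || by + cells_h > grid_h {
        return Err("footprint spills outside the grid".to_string());
    }
    // (raw + num_blocks - 1) Idiv num_blocks, computed in u64 to avoid overflow.
    let n = num_blocks as u64;
    let value = ((u64::from(raw_non_zeros) + n - 1) / n) as u32;
    for y in by..by + cells_h {
        for x in bx..bx + cells_w {
            grid[y * grid_w + x] = value;
        }
    }
    Ok(())
}
```

Mapping DECODE_ORDER through block_context_channel yields 0, 1, 2, which is the corroboration the entry mentions: the channel read first is the one mapped to context index 0.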

  • Round 278 — the long-standing noise-64x64-lossless Weighted-
    Predictor pixel divergence (rounds 31..272) is FIXED; the fixture
    decodes byte-exact on all three planes and the round-10 synth_320
    drift is gone (102400/102400 pixels correct). Two FDIS Annex E
    readings in modular_fdis::wp_predict, both pinned by the staged
    behavioural trace
    (docs/image/jpegxl/fixtures/noise-64x64-lossless/wp-trace-sample-194.md):
    (1) Listing E.2 error2weight performs the inner
    (1 << 24) Idiv ((err_sum >> shift) + 1) division FIRST and
    multiplies the truncated quotient by maxweight (the FDIS-2021
    parenthesisation) — the trace's 52 full-precision
    (err_sum, weight) cells (samples 188..200) all match this
    reading while the previous multiply-first form mismatches 18 of
    them; (2) the true_errNW read falls back to true_errN when NW
    does not exist (x = 0), matching the H.5.2 NW/NE→N edge rule the
    err_sum accumulator reads already applied — the previous zero
    fallback corrupted every column-0 prediction and produced the
    sample-129 Δ = -21 state-evolution divergence. Root-caused via a
    from-scratch Annex E state-evolution sweep over the fixture's
    known-correct decoded values across every contested reading knob:
    exactly one combination reproduces all 13 traced samples plus the
    three known row-2 stored true_err cells (737 / -456 / -165), and
    it differs from production only in these two readings. The
    production 8x-domain sub_err reading (round 272) is confirmed —
    the literal reading now breaks the fixture at plane[0] sample 68.
    New error2weight_pub oracle + tests/r278_error2weight_trace.rs
    (3 tests) pin the 52 trace cells and the operand order; 12
    historical divergence-pin tests across 6 files
    (r32/round10/r126/r195/r202/r272) promoted to
    spec/pixel-exact assertions. Tests 1153 → 1156.
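
The operand-order difference behind fix (1) is easy to see in isolation. The sketch below covers only the inner term quoted above; the derivation of shift and the remainder of Listing E.2 are omitted, and both function names are hypothetical.

```rust
/// Division-first reading: truncate (1 << 24) Idiv ((err_sum >> shift) + 1),
/// then multiply the truncated quotient by maxweight.
pub fn weight_term_div_first(err_sum: u64, shift: u32, maxweight: u64) -> u64 {
    maxweight * ((1u64 << 24) / ((err_sum >> shift) + 1))
}

/// Multiply-first reading (the previous form): scale by maxweight before
/// dividing, so less precision is lost to truncation.
pub fn weight_term_mul_first(err_sum: u64, shift: u32, maxweight: u64) -> u64 {
    (maxweight * (1u64 << 24)) / ((err_sum >> shift) + 1)
}
```

With err_sum = 2, shift = 0 and maxweight = 13 the divisor is 3, and the two forms give 72701265 and 72701269 respectively. They agree whenever the divisor divides 1 << 24 exactly, which is why only some of the traced cells can tell the readings apart.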

Added

  • Round 272 — extracted the Weighted-Predictor post-decode
    sub_err_i computation (FDIS Annex E.1 / §H.5.2) into the named
    modular_fdis::sub_err_for (8x-domain magnitude-then-round reading,
    used on the decode path) plus a modular_fdis::sub_err_fdis_literal
    reference oracle for the literal FDIS-2021 listing reading
    abs(((prediction_i + 3) >> 3) - true_value). New
    tests/r272_sub_err_reading.rs (4 tests) pins the reading choice as
    a regression guard: the two readings coincide for every non-negative
    sub-prediction (so both reproduce the noise-64x64-lossless
    sample-194 trace value sub_err = [122, 59, 18, 36]) but diverge for
    negative sub-predictions; and the production decode path must keep
    synth_320's round-10 drift anchor at PG[0][0] (y=24, x=14) — the
    literal reading moves it EARLIER to (y=11, x=104) (decodes the
    fixture less far), confirming the 8x-domain reading is the
    bisect-validated one. Round 272 also ruled the sub_err reading OUT
    as the cause of the residual noise-64x64-lossless sample-129
    Δ = -21 WP state-evolution divergence (switching readings leaves
    that fixture's divergence profile unchanged).

  • Round 264 —
    multi_pass_hf_histogram_decoder::HfHistogramDecodeContext::decode_lf_group_three_channels_for_pass
    bundled per-LfGroup raster-walk three-channel decode driver for one
    pass against ISO/IEC FDIS 18181-1:2021 §C.8.3 — one
    (br, p, grid, resolver, qdc_at, predicted_at) call walks the
    DctSelectGrid in raster order via VarblockWalk, invokes the
    caller's per-varblock qdc_at + predicted_at closures once per
    varblock to read the shared qdc[3] triple and the per-channel
    predicted[3] triple, then composes the round-260
    decode_three_channel_varblock_for_pass bundled three-channel walk
    to yield one ThreeChannelVarblock per top-left cell. Returns the
    in-raster-order Vec<ThreeChannelVarblock> per the round-221 / 228
    / 260 type alias. The driver owns both the raster walk and the
    §C.7.2 entropy-stream routing through the round-252 typed decode
    context — no read_non_zeros / decode_symbol closures cross the
    boundary, only the storage-only qdc_at + predicted_at lookups
    do. Per-varblock ordering: qdc_at fires before predicted_at;
    per-LfGroup ordering: row-major (DctSelectGrid raster). Defensive
    shape: propagates VarblockWalk::next errors (residual Empty
    cell), closure errors (qdc_at aborts before predicted_at;
    predicted_at error aborts before the inner method runs), and any
    inner decode_three_channel_varblock_for_pass error verbatim. On
    closure error the per-varblock cursor halts without advancing the
    BitReader past the failing call. Empty grid (width × height == 0) yields an empty output vector. 11 unit + 10 integration
    (round264_lf_group_three_channels_for_pass) tests pin: 1×1 DCT8×8
    short-circuit; 2×2 / 3×3 uniform raster ordering ((0,0), (1,0),
    (0,1), (1,1) — row-major); per-varblock qdc → predicted → decode
    ordering; per-pass offset routing matches round-260 cluster_map
    indexing for both p = 0 and p = 1; mixed-transform grid
    (DCT16×16 single varblock covering 2×2 cells) emits one
    varblock with coeffs.len() == 256 per channel; out-of-range pass
    index rejected; residual Empty cell rejected (VarblockWalk error
    propagated); closure errors (qdc_at / predicted_at) propagated
    without advancing the BitReader past the failing call; round-trip
    with PerPassHfHeaders::read driven off a real bitstream
    preserves per-pass histogram offsets across both passes; empty
    grid yields empty vector. Lib tests 742 → 753 (+11).
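
The raster-walk contract described above (one callback per top-left cell, continuation cells skipped, a residual Empty cell rejected, errors propagated verbatim) can be sketched generically. CellSketch and walk_varblocks_sketch are hypothetical names for illustration and do not reproduce the crate's DctSelectGrid or VarblockWalk types.

```rust
/// Hypothetical cell states of a block grid after transform selection.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum CellSketch {
    /// Top-left cell of a varblock covering cells_w x cells_h cells.
    TopLeft { cells_w: usize, cells_h: usize },
    /// Covered by a varblock whose top-left cell lies elsewhere.
    Continuation,
    /// Never assigned: an error if encountered during the walk.
    Empty,
}

/// Walks the grid in row-major order and calls `visit(bx, by, cells_w, cells_h)`
/// once per top-left cell, collecting results in raster order.
pub fn walk_varblocks_sketch<T, F>(
    cells: &[CellSketch],
    width: usize,
    height: usize,
    mut visit: F,
) -> Result<Vec<T>, String>
where
    F: FnMut(usize, usize, usize, usize) -> Result<T, String>,
{
    if cells.len() != width * height {
        return Err("cell count does not match width * height".to_string());
    }
    let mut out = Vec::new();
    for by in 0..height {
        for bx in 0..width {
            match cells[by * width + bx] {
                // The `?` stops the walk at the first failing callback.
                CellSketch::TopLeft { cells_w, cells_h } => {
                    out.push(visit(bx, by, cells_w, cells_h)?)
                }
                CellSketch::Continuation => {}
                CellSketch::Empty => {
                    return Err(format!("residual Empty cell at ({}, {})", bx, by));
                }
            }
        }
    }
    Ok(out)
}
```

A 2×2 grid of single-cell varblocks visits (0,0), (1,0), (0,1), (1,1) in that order, while a single 2×2-cell varblock produces exactly one callback.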

  • Round 260 —
    multi_pass_hf_histogram_decoder::HfHistogramDecodeContext::decode_three_channel_varblock_for_pass
    bundled three-channel per-varblock walk against ISO/IEC FDIS
    18181-1:2021 §C.8.3 — one
    (br, p, vb, resolver, qdc, predicted[3]) call composes the
    round-255 single-channel decode_block_for_pass_transform three
    times (channel order X = 0 → Y = 1 → B = 2 per the §C.8.3 listing
    sequence) against the round-214
    BlockContextResolver::resolve(c, vb, qdc) per-channel Listing
    C.13 block_ctx derivation, returning the per-channel
    ([DecodedHfBlock; 3], [u32; 3]) pair (decoded coefficient bundle
    plus the un-divided raw_non_zeros triple the caller threads into
    the per-channel NonZeros-grid bookkeeping). The nb_block_ctx
    invariant is read off resolver.nb_block_ctx() so the caller does
    not have to pass it separately; the qdc[3] triple is shared
    across the three channels per round-221's per-varblock invariant
    (one read, three lookups). Channel ordering is fixed at X → Y → B
    — the §C.7.2 entropy stream advances in that order; an error on Y
    aborts before B reads, so the B-channel ANS state is not
    advanced (matching round-221's error-path invariant). Defensive
    shape: propagates any BlockContextResolver::resolve error
    (channel > 2, s out-of-range, threshold-table inconsistency)
    and any decode_block_for_pass_transform error (out-of-range
    pass index, u32-overflow ctx + offset, downstream
    EntropyStream error, or non_zeros > size - num_blocks cap)
    verbatim. 8 unit + 11 integration
    (round260_three_channel_varblock_for_pass) tests pin: DCT8×8 /
    DCT16×16 / DCT16×8 / DCT8×16 / DCT4×4 per-channel short-circuit
    to raw == [0, 0, 0] → coeffs_read == 0 → all-zero coeffs vector of the right length; per-pass offset routing matches round-252
    cluster_map indexing for both p = 0 and p = 1 against a
    2-preset bundle; out-of-range pass index rejected; u32 overflow
    on ctx + offset rejected; BitReader cursor unchanged on a
    short-circuited three-channel block; round-trip with
    PerPassHfHeaders::read driven off a real bitstream preserves
    the per-pass histogram offsets across both passes; per-channel
    block_ctx values resolved by the BlockContextResolver are < nb_block_ctx (= 15) for the default-table bundle. Lib tests 734
    → 742 (+8).

  • Round 255 —
    multi_pass_hf_histogram_decoder::HfHistogramDecodeContext::decode_block_for_pass_transform
    bundled per-varblock decode method closing the round-252 deferred
    next-step "per-block raster walk remain caller-side concerns above
    this primitive" against ISO/IEC FDIS 18181-1:2021 §C.8.3 + Listing
    C.13 + Listing C.14. One (p, t, predicted, block_ctx, nb_block_ctx) call now wires the round-90 Listing C.14 state
    machine (prev_nonzero[] tracking, non_zeros == 0 early-stop,
    non_zeros > size - num_blocks defensive cap) against the
    round-252 per-pass histogram routing for one varblock, returning
    the round-90 DecodedHfBlock coefficient bundle plus the un-
    divided raw_non_zeros for downstream (raw + num_blocks - 1) Idiv num_blocks NonZeros-grid bookkeeping. The internal walk is a
    single sequential &mut self loop because the two underlying
    entry points (non_zeros_at, coefficient_at) each need &mut self and therefore can't be wrapped into the round-90
    read_non_zeros_and_decode_block_for_transform closure pair —
    this method is the typed bridge. Defensive shape: rejects p >= num_passes, ctx + offset > u32::MAX, and num_blocks == 0 /
    mismatched natural-order length, all without panicking. 7 unit +
    10 integration (round255_decode_block_for_pass_transform) tests
    pin: DCT8×8 / DCT16×16 / DCT16×8 / DCT8×16 / DCT4×4 short-circuit
    to raw_non_zeros == 0 → coeffs_read == 0 → all-zero coeffs vector of the right length; per-pass offset routing matches round-252
    cluster_map indexing; out-of-range pass index rejected; u32
    overflow on ctx + offset rejected; BitReader cursor unchanged on
    a short-circuited block; round-trip with PerPassHfHeaders::read
    driven off a real bitstream preserves the per-pass histogram
    offsets. Lib tests 727 → 734 (+7).

  • Round 252 —
    multi_pass_hf_histogram_decoder::HfHistogramDecodeContext typed
    bridge that wires the round-247 HfCoefficientHistograms §C.7.2
    entropy stream to the round-232 PerPassHfHeaders per-pass
    (hfp, histogram_offset) array, closing the round-247 deferred
    next-step (the §C.8.3 per-block decode walk through the freshly-
    read histograms). Public surface against ISO/IEC FDIS 18181-1:2021:
    HfHistogramDecodeContext::new(histograms, headers) validates
    per-pass hfp < histograms.num_hf_presets() (defensive cross-
    container invariant) + headers.num_passes() ≥ 1, then caches the
    per-pass histogram_offset array for a single-array-index per-
    symbol path. Three decode entry-points expose the §C.8.3 prose
    shape: (1) decode_symbol_for_pass(br, p, ctx) performs the raw
    D[ctx + histogram_offset(p)] routing through
    EntropyStream::decode_symbol; (2) non_zeros_at(br, p, predicted, block_ctx, nb_block_ctx) composes
    pass_group_hf::non_zeros_context + the per-pass offset routing,
    matching the spec's D[NonZerosContext(predicted) + offset] line
    exactly; (3) coefficient_at(br, p, k, non_zeros, num_blocks, size, prev, block_ctx, nb_block_ctx) composes
    pass_group_hf::coefficient_context + the per-pass offset
    routing, matching the spec's D[CoefficientContext(...) + offset] line, and propagates the num_blocks == 0 rejection
    without touching the BitReader. The (ctx + offset) sum is
    computed in u64 with a defensive u32 overflow check so the
    spec-permitted parameter maxima (nb_block_ctx ≤ 256 ×
    hfp < num_hf_presets ≤ 2^28) cannot silently truncate. Accessor
    surface: num_passes(), histogram_offset(p),
    per_pass_offsets() slice. Adds 10 unit tests + 9 integration
    tests (tests/round252_multi_pass_hf_histogram_decoder.rs)
    pinning: zero-pass rejection (no decode without passes); per-pass
    hfp ≥ num_hf_presets cross-container rejection; per-pass offset
    caching matches PerPassHfHeaders::histogram_offset independent
    read; single-symbol prefix decode for (p, ctx) matrix consumes
    zero bits and returns 0; out-of-range pass index rejection;
    u32-overflow synthetic histogram_offset rejection;
    non_zeros_at composes cleanly with non_zeros_context (cross-
    checked against the standalone helper); coefficient_at composes
    cleanly with coefficient_context (cross-checked against the
    standalone helper); num_blocks == 0 rejection propagation does
    not advance the BitReader; round-trip with
    PerPassHfHeaders::read against a real bitstream (round-232
    derivation) preserves the per-pass offsets. Lib test count
    717 → 727 (+10). Pure-control-flow wiring primitive — no spec
    re-derivation, no ANS state initialisation, no per-block raster
    walk. The per-channel BlockContext() history threading, per-
    channel coefficient-order lookup against hf_pass::HfPass, and
    the per-block raster walk remain caller-side concerns above this
    primitive.
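
The defensive (ctx + offset) computation mentioned above can be sketched as below: sum in u64, then refuse anything that does not fit in u32. The function names and the plain slice of cached offsets are assumptions for illustration, not the HfHistogramDecodeContext API.

```rust
/// Computes ctx + histogram_offset in u64 and rejects a sum that would not
/// fit in u32, rather than truncating silently.
pub fn routed_context(ctx: u32, histogram_offset: u64) -> Result<u32, String> {
    let sum = u64::from(ctx)
        .checked_add(histogram_offset)
        .ok_or_else(|| "ctx + offset overflows u64".to_string())?;
    if sum > u64::from(u32::MAX) {
        return Err("ctx + offset exceeds u32::MAX".to_string());
    }
    Ok(sum as u32)
}

/// Looks up the cached per-pass offset (one array index per symbol) and
/// routes the context through it; an out-of-range pass index is an error.
pub fn context_for_pass(per_pass_offsets: &[u64], p: usize, ctx: u32) -> Result<u32, String> {
    let offset = *per_pass_offsets
        .get(p)
        .ok_or_else(|| "pass index out of range".to_string())?;
    routed_context(ctx, offset)
}
```

For example, with nb_block_ctx = 15 the pass using preset 1 has offset 495 × 15 = 7425, so context 7 is routed to distribution index 7432.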

  • Round 247 — hf_coefficient_histograms::HfCoefficientHistograms
    typed wrapper closing the round-238 deferred next-step. Performs
    the actual ISO/IEC FDIS 18181-1:2021 §C.7.2 codestream read of the
    495 × num_hf_presets × nb_block_ctx clustered-distributions block
    by routing HfCoefficientHistogramSize::num_distributions() into
    modular_fdis::EntropyStream::read as num_dist. Two entry-points:
    read(br, size) for a caller-built sizing descriptor, and
    read_after_hf_pass_sequence(br, num_hf_presets, nb_block_ctx)
    for the §C.7.1 → §C.7.2 transition convenience (constructs the
    sizing descriptor inline so a caller that has just walked
    hf_pass::read_hf_pass_sequence can drive the §C.7.2 step against
    the same BitReader without a separate constructor call). ANS
    state initialisation is deferred to read_ans_state_init per the
    round-3 2024-spec correction (the u(32) initialiser is read
    between the prelude and the first symbol decode); forwarded
    straight through to EntropyStream::read_ans_state_init. Defensive
    usize-cap guard on num_distributions() rejects 32-bit overflow
    before the EntropyStream::read call. Sizing accessors
    (num_distributions, offset_for_hfp, num_hf_presets,
    nb_block_ctx) forward through the underlying
    HfCoefficientHistogramSize. entropy_mut() exposes the
    underlying stream for the downstream §C.8.3 per-block decode loop.
    Adds 7 unit tests + 6 integration tests
    (tests/round247_hf_coefficient_histograms.rs). Lib test count
    710 → 717 (+7). Pure wiring primitive — the per-block decode walk
    through the freshly-read histograms (Listing C.13 contexts already
    landed by rounds 90 / 214 / 221 / 228 / 232) remains the next
    deferred step.

  • Round 238 — hf_coeff_histogram_size::HfCoefficientHistogramSize
    typed sizing primitive for the §C.7.2 HF coefficient histogram
    block. Encapsulates the spec line "Let nb_block_ctx be equal to
    max(block_ctx_map)+1. The decoder reads a histogram with
    495 × num_hf_presets × nb_block_ctx clustered distributions D
    from the codestream as specified in D.3." behind a single typed
    constructor pair (new(num_hf_presets, nb_block_ctx) and
    from_block_ctx_map(map, num_hf_presets)), plus accessors
    per_preset() (495 × nb_block_ctx), num_distributions()
    (495 × num_hf_presets × nb_block_ctx — the §C.7.2 total),
    and offset_for_hfp(hfp) (495 × nb_block_ctx × hfp — the
    §C.8.3 per-pass routing offset, with hfp < num_hf_presets
    range check). Spec constant published as
    PER_PRESET_PER_BLOCK_CTX = 495. Defensive zero-input guards
    reject num_hf_presets == 0, nb_block_ctx == 0, and empty
    block_ctx_map. The duplicated 495u64 * num_hf_presets * nb_block_ctx and 495u64 * nb_block_ctx * hfp arithmetic in
    hf_pass::HfPass::read and pass_group_hf::PassGroupHfHeader::read
    is now routed through the primitive so the spec constant has one
    home and the per-pass offset shares its nb_block_ctx factor
    with the §C.7.2 read-size derivation. Sizing-only — the actual
    §C.7.2 EntropyStream::read(br, num_distributions) call against
    the clustered-distributions block remains the deferred next step.
    Adds 5 unit tests + 6 integration tests
    (tests/round238_hf_coeff_histogram_size.rs). Lib test count
    705 → 710 (+5). Pure refactor; no wire-format change. (§C.7.2
    entropy-stream read itself remains a deferred next step.)
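
The sizing arithmetic in the Round 238 entry is small enough to sketch in full. The constant name comes from the entry; the HistogramSizeSketch type, its field layout, and the u8 element type of the block_ctx_map slice are assumptions made for this illustration.

```rust
/// Spec constant named in the entry above.
pub const PER_PRESET_PER_BLOCK_CTX: u64 = 495;

/// Hypothetical sizing descriptor for the clustered-distributions block.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct HistogramSizeSketch {
    num_hf_presets: u64,
    per_preset: u64,
    total: u64,
}

impl HistogramSizeSketch {
    /// Rejects zero inputs and any total that would overflow u64.
    pub fn new(num_hf_presets: u32, nb_block_ctx: u32) -> Result<Self, String> {
        if num_hf_presets == 0 || nb_block_ctx == 0 {
            return Err("num_hf_presets and nb_block_ctx must be non-zero".to_string());
        }
        let per_preset = PER_PRESET_PER_BLOCK_CTX * u64::from(nb_block_ctx);
        let total = per_preset
            .checked_mul(u64::from(num_hf_presets))
            .ok_or_else(|| "distribution count overflows u64".to_string())?;
        Ok(Self { num_hf_presets: u64::from(num_hf_presets), per_preset, total })
    }

    /// nb_block_ctx = max(block_ctx_map) + 1; an empty map is rejected.
    pub fn from_block_ctx_map(map: &[u8], num_hf_presets: u32) -> Result<Self, String> {
        let max = map.iter().copied().max().ok_or_else(|| "empty block_ctx_map".to_string())?;
        Self::new(num_hf_presets, u32::from(max) + 1)
    }

    /// 495 x nb_block_ctx.
    pub fn per_preset(&self) -> u64 {
        self.per_preset
    }

    /// 495 x num_hf_presets x nb_block_ctx, the total number of distributions.
    pub fn num_distributions(&self) -> u64 {
        self.total
    }

    /// 495 x nb_block_ctx x hfp, with the hfp < num_hf_presets range check.
    pub fn offset_for_hfp(&self, hfp: u32) -> Result<u64, String> {
        if u64::from(hfp) >= self.num_hf_presets {
            return Err("hfp out of range".to_string());
        }
        // Cannot overflow: the result is strictly below the validated total.
        Ok(self.per_preset * u64::from(hfp))
    }
}
```

Keeping per_preset as the shared factor is what lets the per-pass offset and the total read size stay consistent, which is the point the entry makes about giving the constant one home.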

  • Round 232 — multi_pass_hf_header::PerPassHfHeaders +
    decode_multi_pass_with_hf_headers per-LfGroup multi-pass driver
    with per-pass hfp reads + per-pass histogram_offset routing
    (FDIS §C.8.3 first paragraph). New multi_pass_hf_header module
    wraps the round-228
    [multi_pass_decode::decode_multi_pass_three_channels_with_resolver]
    driver with the §C.8.3 first-paragraph per-pass header read
    hfp = u(ceil(log2(num_hf_presets))) and the derived
    histogram_offset = 495 × nb_block_ctx × hfp the spec writes as
    the offset term in D[NonZerosContext(...) + offset] and
    D[CoefficientContext(...) + offset]. PerPassHfHeaders::read(br, num_passes, num_hf_presets, nb_block_ctx) consumes the
    per-pass header sequence by invoking the round-90
    [pass_group_hf::PassGroupHfHeader::read] once per pass;
    from_headers builds the container from a pre-built Vec.
    Accessors expose per-pass hfp + histogram_offset + a
    PassHfDigest snapshot. The new driver
    decode_multi_pass_with_hf_headers mirrors the round-228 signature
    with two augmented closure shapes
    read_non_zeros(p, channel, predicted, histogram_offset) /
    decode_symbol(p, channel, coeff_ctx, histogram_offset) — the
    per-pass histogram_offset is pre-resolved once per pass before the
    inner per-varblock walk so the closure body sees a constant offset
    across each pass's per-channel calls. Pass count is taken from
    headers.num_passes() and verified against nz.num_passes()
    (mismatch returns Error::InvalidData). The companion
    read_and_decode_multi_pass_with_hf_headers reads the per-pass
    header sequence inline from a BitReader and invokes the driver
    in one call — the entry-point a future round wiring the §C.7.2
    entropy histogram bundle (#799 DOCS-GAP) into a per-pass
    EntropyStream will use. 16 unit + 12 integration
    (round232_multi_pass_hf_header) tests pin: per-pass header read
    with num_hf_presets ∈ {1, 2, 4, 8} (single-preset zero-bit fast
    path, two-preset one-bit-per-pass, four-preset two-bits-per-pass,
    eight-preset three-bits-per-pass with 15 bits across 5 passes);
    digest round-trip through bits LSB-first; hfp = 0 always yielding
    histogram_offset = 0 regardless of nb_block_ctx;
    histogram_offset scaling with nb_block_ctx (495 × 100 =
    49500); get / histogram_offset / hfp out-of-range errors;
    zero-passes degenerate case yielding an empty container;
    PassGroupHfHeader::read num_hf_presets == 0 rejection
    propagating through PerPassHfHeaders::read; the driver routing
    the per-pass offset uniformly across all three channels (X / Y / B)
    within a pass; both read_non_zeros and decode_symbol closures
    receiving the matching per-pass offset (378 = 2 × 3 × 63
    decode_symbol calls covering the full DCT8×8
    k ∈ [num_blocks, size) sweep); per-pass error propagation
    (pass-1 closure failure
    aborts the outer driver); num_passes mismatch
    (headers.num_passes() != nz.num_passes()) rejected pre-walk;
    pass-distinct qdc_at closure invocation preserving the round-228
    per-pass qdc[3] propagation; mixed transform
    DCT16×8 + 2 DCT8×8 layout consistency across passes with
    distinct per-pass
    offsets; inline read_and_decode_multi_pass_with_hf_headers
    end-to-end (header bits consumed exactly, decode walk runs, output
    shape matches); inline-read error path (empty BitReader yields a
    proper Error::InvalidData from read_bit); per-pass-header
    offsets-threaded-through-both-closures invariant verifying
    decode_symbol calls observe the same per-pass offset as
    read_non_zeros across the 2-pass × 3-channel sweep. Lib tests
    689 → 705 (+16). Pure-control-flow primitive in the same shape as
    round-89 [dct_quant_weights], round-95 [hf_dequant], round-121
    [llf_from_lf], round-138 [chroma_from_luma], round-141
    [gaborish], round-144 [epf], round-147 [afv::afv_idct],
    round-159 / 164 [pass_group_hf], round-177 [non_zeros_grid],
    round-183 [per_channel_non_zeros], round-190
    [per_pass_non_zeros], round-208 [varblock_walk], round-214
    [block_context_resolver], round-221's three-channel driver, and
    round-228's multi-pass driver — no bit reads beyond the per-pass
    hfp u-read defined by the spec line, no spec re-derivation, no
    histogram materialisation, no ANS state setup. A future round
    wiring §C.7.2 histograms + per-pass [hf_pass::HfPass] selection
    (the select_pass(passes) method on PassGroupHfHeader already
    performs the per-pass coefficient-order lookup) can drop this
    driver in as the per-LfGroup multi-pass HF-header + histogram-
    routing control-flow layer.
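
The per-pass header read described in this entry (hfp = u(ceil(log2(num_hf_presets))) followed by histogram_offset = 495 × nb_block_ctx × hfp) can be sketched as follows. Everything named here is hypothetical: Bits is a toy LSB-first reader standing in for the crate's BitReader, and read_pass_offsets is not the crate's PerPassHfHeaders::read. The sketch assumes u(n) reads n bits least-significant bit first.

```rust
/// Hypothetical LSB-first bit reader over a byte slice.
pub struct Bits<'a> {
    data: &'a [u8],
    pos: usize, // position in bits
}

impl<'a> Bits<'a> {
    pub fn new(data: &'a [u8]) -> Self {
        Self { data, pos: 0 }
    }

    pub fn bits_consumed(&self) -> usize {
        self.pos
    }

    /// u(n): read n bits, least-significant bit first. n == 0 reads nothing.
    pub fn read_u(&mut self, n: u32) -> Result<u32, &'static str> {
        let mut v = 0u32;
        for i in 0..n {
            let byte = *self.data.get(self.pos / 8).ok_or("out of data")?;
            let bit = (byte >> (self.pos % 8)) & 1;
            v |= u32::from(bit) << i;
            self.pos += 1;
        }
        Ok(v)
    }
}

/// ceil(log2(n)) for n >= 1; yields 0 for n == 1.
pub fn ceil_log2(n: u32) -> u32 {
    32 - (n - 1).leading_zeros()
}

/// Reads one hfp per pass and derives its histogram offset.
pub fn read_pass_offsets(
    br: &mut Bits<'_>,
    num_passes: usize,
    num_hf_presets: u32,
    nb_block_ctx: u32,
) -> Result<Vec<(u32, u64)>, &'static str> {
    if num_hf_presets == 0 {
        return Err("num_hf_presets == 0");
    }
    let width = ceil_log2(num_hf_presets);
    let mut out = Vec::new();
    for _ in 0..num_passes {
        let hfp = br.read_u(width)?;
        if hfp >= num_hf_presets {
            return Err("hfp out of range");
        }
        // Resolved once per pass, so later per-channel calls see a constant.
        let offset = 495u64
            .checked_mul(u64::from(nb_block_ctx))
            .and_then(|v| v.checked_mul(u64::from(hfp)))
            .ok_or("offset overflow")?;
        out.push((hfp, offset));
    }
    Ok(out)
}
```

With num_hf_presets = 1 the field width is zero, so no bits are consumed and every pass gets offset 0; with eight presets and five passes the reader consumes 15 bits, matching the cases the entry lists.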

  • Round 228 — multi_pass_decode::decode_multi_pass_three_channels_with_resolver
    per-LfGroup multi-pass three-channel varblock decode driver (FDIS
    §C.8.3 + Table C.6 Passes). New multi_pass_decode module lifts
    the round-221 single-pass three-channel driver into an outer
    per-pass loop that iterates p ∈ [0, num_passes), gathering per-
    pass [block_context_resolver::ThreeChannelVarblock] vectors in
    pass order — out[p][i] is the i-th varblock (raster order)
    decoded in pass p. The driver reads num_passes off
    nz.num_passes() (the
    [per_pass_non_zeros::PerPassNonZerosGrids] container is the
    authoritative pass-count source), walks the
    [dct_select::DctSelectGrid] once per pass, invokes the caller's
    qdc_at(p, &vb) closure once per varblock per pass (so the
    closure may read from a per-pass quantised-LF buffer if the
    upstream signal evolves between passes), and threads each
    (p, c) call through
    [per_pass_non_zeros::PerPassNonZerosGrids::decode_block_at_for_pass_channel].
    The per-pass per-channel NonZeros(x, y) bookkeeping is already
    isolated by p (round-190 invariant), so the caller does not have
    to clear state between passes. The
    read_non_zeros(p, channel, predicted) /
    decode_symbol(p, channel, coeff_ctx) closures take
    the pass index as their first argument so the caller can route
    each call to the matching per-pass per-channel histogram without
    rebinding closures for each pass. The new
    MultiPassThreeChannelOutput type alias names the per-LfGroup
    output shape; the new count_decoded_blocks(grid, num_passes)
    helper returns num_passes × count_varblocks(grid) for callers
    that need to size a downstream coefficient buffer ahead of time
    (defensive u64 overflow check on the multiplication). 14 unit +
    12 integration (round228_multi_pass_decode) tests pin: single-
    pass single-DCT8×8 parity with the round-221 inner driver; 4×4
    DCT8×8 grid (16 varblocks) preserving raster order in a single
    pass; two-pass 2×2 raster-order per-pass walk; per-pass qdc
    closure invocation count (3 passes × 4 varblocks = 12 calls, not
    36); three-pass per-channel routing isolation with pass-distinct
    raw_non_zeros values landing on per-pass writeback cells without
    cross-pass leakage; pass error aborts remaining passes (the
    outer Vec is discarded on error); pass-0 inner error aborts
    before pass-1 starts (pass-1 closure never called); per-pass
    predicted invariant (PredictedNonZeros(0, 0) = 32 across every
    pass + channel); per-pass qdc[3] value propagation through the
    outer loop; mixed-transform (DCT16×8 + 2 DCT8×8) layout
    consistency across passes; pass-1 channel routing read from
    pass-1 histogram; count_decoded_blocks helper covers
    num_passes ∈ {0, 1, 2, 5, u32::MAX}; DCT16×16 single-block
    single-pass pass-through; integration coverage of pass-index
    threading through both read_non_zeros and decode_symbol
    closures; inner-driver mid-varblock error (pass 1, X-channel
    decode_symbol failure) propagating through the outer loop.
    Lib tests 675 → 689 (+14). Pure-control-flow primitive in the
    round-89 / 95 / 121 / 138 / 141 / 144 / 147 / 159 / 164 / 177 /
    183 / 190 / 208 / 214 / 221 family; no bit reads, no spec re-
    derivation, no histogram materialisation. The follow-up §C.7.2
    histogram array (#799 DOCS-GAP) + per-pass hfp selection +
    per-channel BlockContext() history threading still apply
    unchanged — round 228 is purely the outer-loop control-flow
    layer above the round-221 inner three-channel driver.
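
The outer-loop shape of this entry can be sketched as below. It is a simplified stand-in, not the crate's driver: decode_multi_pass and its single decode_varblock closure are hypothetical and collapse the qdc_at / read_non_zeros / decode_symbol closures into one call, and this count_decoded_blocks takes a varblock count rather than a grid.

```rust
/// Runs `decode_varblock(p, i)` for every pass p and varblock index i
/// (raster order), returning out[p][i]. The first error aborts the walk
/// and the partially built output is dropped.
pub fn decode_multi_pass<T, E, F>(
    num_passes: usize,
    num_varblocks: usize,
    mut decode_varblock: F,
) -> Result<Vec<Vec<T>>, E>
where
    F: FnMut(usize, usize) -> Result<T, E>,
{
    let mut out = Vec::new();
    for p in 0..num_passes {
        let mut pass = Vec::new();
        for i in 0..num_varblocks {
            // The pass index travels with every call, so the caller can pick
            // the matching per-pass state without rebinding closures.
            pass.push(decode_varblock(p, i)?);
        }
        out.push(pass);
    }
    Ok(out)
}

/// num_passes * num_varblocks, or None if the product overflows u64.
pub fn count_decoded_blocks(num_varblocks: u64, num_passes: u32) -> Option<u64> {
    num_varblocks.checked_mul(u64::from(num_passes))
}
```

Because the error is propagated with `?` from inside the inner loop, a failure in pass 0 means the closure is never invoked for pass 1, which is the abort behaviour the tests in this entry pin.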

  • Round 221 — block_context_resolver::decode_varblocks_three_channels_with_resolver
    three-channel per-LfGroup varblock decode driver (FDIS §C.8.3
    prose ordering: outer varblock raster, inner X / Y / B channel
    sweep). Walks the dct_select::DctSelectGrid once; computes the
    shared qdc[3] triple once per varblock; invokes
    BlockContextResolver::resolve three times against that shared
    qdc (channel order 0 = X → 1 = Y → 2 = B); routes each (p, c)
    call through
    per_pass_non_zeros::PerPassNonZerosGrids::decode_block_at_for_pass_channel.
    Return is Vec<ThreeChannelVarblock> = per-varblock
    (Varblock, [DecodedHfBlock; 3], [u32; 3]) triples in raster
    order; per-channel ANS closures are
    read_non_zeros(channel, predicted) and
    decode_symbol(channel, coeff_ctx) so the caller routes
    per-channel histograms inside one closure pair. The new
    ThreeChannelVarblock type alias names the per-varblock output
    triple. 11 unit + 12 integration
    (round221_three_channel_resolver) tests pin: single-DCT8×8 with
    3 per-channel decodes per varblock; 4×4 DCT8×8 grid (16 varblocks)
    preserving raster order; single DCT16×16 (1 varblock); qdc
    closure invoked exactly once per varblock (= 4 calls for 4
    varblocks, NOT 12); strict X / Y / B channel order at each
    read_non_zeros / decode_symbol call site; per-channel
    non_zeros writeback at (0, c, 0, 0) with distinct per-channel
    raw counts (10 / 20 / 30); per-pass routing (pass = 1 isolated
    from pass = 0); qdc error aborts before any per-channel reads;
    X-channel error aborts before Y + B reads; mixed-transform
    DCT16×8 + 2 DCT8×8 placement preserved; custom HfBlockContext
    (qf_threshold = 5) round-trip; DCT16×16 num_blocks = 4
    per-channel non_zeros = 4 → 4 decode_symbol calls +
    (4 + 3) / 4 = 1 stored.

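
The call pattern of this entry (one qdc lookup per varblock, then three per-channel decodes in a fixed order) can be sketched as below. The names are hypothetical and the closures are simplified: decode_channel stands in for the read_non_zeros / decode_symbol pair. The stored_non_zeros helper reads the entry's "(4 + 3) / 4 = 1" arithmetic as (non_zeros + num_blocks - 1) / num_blocks, which is an interpretation rather than something the entry states outright.

```rust
/// Channel order as stated in this entry: 0 = X, 1 = Y, 2 = B.
pub const CHANNEL_ORDER: [usize; 3] = [0, 1, 2];

/// Walks varblocks in index order; computes qdc once per varblock and
/// shares it across the three per-channel calls.
pub fn decode_three_channels<Q, D, T, E>(
    num_varblocks: usize,
    mut qdc_at: Q,
    mut decode_channel: D,
) -> Result<Vec<[T; 3]>, E>
where
    Q: FnMut(usize) -> Result<[i32; 3], E>,
    D: FnMut(usize, usize, &[i32; 3]) -> Result<T, E>,
{
    let mut out = Vec::new();
    for vb in 0..num_varblocks {
        // A qdc error aborts before any per-channel read for this varblock.
        let qdc = qdc_at(vb)?;
        let first = decode_channel(vb, CHANNEL_ORDER[0], &qdc)?;
        let second = decode_channel(vb, CHANNEL_ORDER[1], &qdc)?;
        let third = decode_channel(vb, CHANNEL_ORDER[2], &qdc)?;
        out.push([first, second, third]);
    }
    Ok(out)
}

/// Ceiling division of the raw count by the block count.
/// Precondition: num_blocks >= 1.
pub fn stored_non_zeros(non_zeros: u32, num_blocks: u32) -> u32 {
    (non_zeros + num_blocks - 1) / num_blocks
}
```

For four varblocks the qdc closure runs 4 times and the channel closure 12 times, matching the "4 calls, NOT 12" check in the entry.
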
  • Round 214 — block_context_resolver module (per-LfGroup
    BlockContext() resolver, FDIS §C.8.3 Listing C.13 + §I.2.2
    HfBlockContext bundle). Exposes the borrow-based
    BlockContextResolver::new(&HfBlockContext) wrapper with a
    per-varblock resolve(channel, &Varblock, qdc) -> Result<u32>
    lookup (applies order_id_for_transform for s, threads
    hf_mul as qf, forwards qdc[3] + the LfGlobal
    qf_thresholds / lf_thresholds / block_ctx_map to the
    round-159 pass_group_hf::block_context formula) plus
    decode_varblocks_with_resolver(grid, nz, p, c, &resolver,
    qdc_at, read_non_zeros, decode_symbol) driver that pairs the
    round-208 VarblockWalk raster-order iterator with the
    round-190 PerPassNonZerosGrids::decode_block_at_for_pass_channel
    per-block primitive. The resolver eliminates the four-argument
    (qf_thresholds, lf_thresholds, block_ctx_map, nb_block_ctx)
    boilerplate at every per-varblock callsite. 14 unit + 12
    integration (round214_block_context_resolver) tests pin:
    borrow accessor + nb_block_ctx default-15 pass-through;
    default-branch (c=0, s=0) / (c=1, s=0) / (c=2, s=0)
    DCT8×8 → block_ctx_map[{13, 0, 26}] = {7, 0, 7}; DCT16×16 /
    DCT32×32 / DCT16×8 / DCT8×16 / Hornuss order-id mapping;
    default-branch invariance to qdc and hf_mul (empty
    thresholds collapse those knobs); custom-branch
    qf_threshold perturbation; driver pass-through on
    single-DCT8×8 / raster-order 2×2 DCT8×8 / single-DCT16×16
    grids; qdc_at closure called once per varblock in walk
    order; closure-error propagation. Lib tests 650 → 664 (+14).
    Pure-control-flow primitive in the round-89 / 95 / 121 / 138 /
    141 / 144 / 147 / 159 / 164 / 177 / 183 / 190 / 208 family; no
    bit reads, no spec re-derivation, no histogram materialisation.
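
The default-branch lookup pinned by the tests in this entry can be sketched as below. This covers only the case with empty thresholds, where qdc and hf_mul drop out; the full Listing C.13 formula is not reproduced. The function names are hypothetical, and the stride of 13 order ids per channel band is inferred from the indices {13, 0, 26} quoted in the entry.

```rust
/// Order ids per channel band, inferred from the entry's indices {13, 0, 26}.
pub const ORDERS_PER_CHANNEL: usize = 13;

/// Index into block_ctx_map for channel c and order id s when both
/// threshold lists are empty. Channels 0 and 1 swap bands (c ^ 1);
/// channel 2 keeps band 2.
pub fn default_branch_index(c: usize, s: usize) -> Option<usize> {
    if c > 2 || s >= ORDERS_PER_CHANNEL {
        return None;
    }
    let band = if c < 2 { c ^ 1 } else { 2 };
    Some(band * ORDERS_PER_CHANNEL + s)
}

/// Looks the index up in a caller-supplied map; None if out of range.
pub fn resolve_default(block_ctx_map: &[u8], c: usize, s: usize) -> Option<u8> {
    block_ctx_map.get(default_branch_index(c, s)?).copied()
}
```

With s = 0 this gives indices 13, 0, and 26 for c = 0, 1, and 2, which the entry reports as mapping to contexts 7, 0, and 7 under the default block_ctx_map.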

  • Round 208 — varblock_walk module (per-LfGroup varblock-walk
    driver, FDIS §C.5.4 + §C.8.3). Exposes the Varblock descriptor
    ({x, y, transform, hf_mul}), the borrow-based VarblockWalk
    raster-order iterator over a dct_select::DctSelectGrid (skips
    Continuation cells; residual Empty cell errors cleanly), the
    count_varblocks cell-scan helper, and the typed per-pass
    per-channel driver decode_varblocks_for_pass_channel that
    walks the grid + invokes the caller's block_ctx_for_varblock
    closure (Listing C.13 BlockContext() lookup) + threads each
    varblock through
    per_pass_non_zeros::PerPassNonZerosGrids::decode_block_at_for_pass_channel.
    Returns an in-raster-order
    Vec<(Varblock, DecodedHfBlock, raw_non_zeros)> of per-varblock
    triples. 14 unit + 12 integration
    (round208_varblock_walk) tests pin single-DCT8×8 / raster-order
    4×4 / DCT16×16-covers-2×2 / mixed-transform placement order /
    count-vs-walk parity / residual-Empty error / all-Continuation
    tolerance / hf_mul top-left read / typed driver per-pass
    per-channel routing isolation / closure-error propagation /
    DCT16×16 typed-driver pass-through / multi-varblock distinct
    hf_mul. Lib tests 636 → 650 (+14). Pure-control-flow primitive
    in the round-89 / 95 / 121 / 138 / 141 / 144 / 147 / 159 / 164 /
    177 / 183 / 190 family; no bit reads, no spec re-derivation, no
    histogram materialisation.
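
The raster-order walk described in this entry can be sketched as below. The types are hypothetical simplifications: Cell stands in for the cells of dct_select::DctSelectGrid, and the Varblock here carries a width and height in 8×8 cells instead of the crate's transform field.

```rust
/// Hypothetical grid cell: a varblock's top-left cell, a cell covered by
/// an earlier varblock, or a cell that was never assigned.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Cell {
    Start { w: usize, h: usize, hf_mul: u32 },
    Continuation,
    Empty,
}

/// Simplified varblock descriptor (w and h replace the transform id).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Varblock {
    pub x: usize,
    pub y: usize,
    pub w: usize,
    pub h: usize,
    pub hf_mul: u32,
}

/// Counts varblocks by scanning for Start cells only.
pub fn count_varblocks(cells: &[Cell]) -> usize {
    cells.iter().filter(|c| matches!(c, Cell::Start { .. })).count()
}

/// Walks a row-major grid: yields Start cells in raster order, skips
/// Continuation cells, and fails on any residual Empty cell.
pub fn walk_varblocks(cells: &[Cell], width: usize) -> Result<Vec<Varblock>, String> {
    if width == 0 || cells.len() % width != 0 {
        return Err("grid is not rectangular".to_string());
    }
    let mut out = Vec::new();
    for (i, cell) in cells.iter().enumerate() {
        let (x, y) = (i % width, i / width);
        match *cell {
            // hf_mul is read from the top-left cell of the varblock.
            Cell::Start { w, h, hf_mul } => out.push(Varblock { x, y, w, h, hf_mul }),
            Cell::Continuation => {}
            Cell::Empty => return Err(format!("empty cell at ({}, {})", x, y)),
        }
    }
    Ok(out)
}
```

A DCT16×16 block covering a 2×2 grid therefore yields a single varblock at (0, 0), and a grid with no Empty cells always satisfies count_varblocks(cells) == walk_varblocks(cells, width)?.len().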

  • Round 202 — tests/r202_wp_row3_chain.rs (7 tests) widens the
    round-191 / round-195 weighted-predictor diagnostic from a
    one-sample pin into a full-row chain across noise-64x64-lossless
    samples 192..=200, validating the production WP state against the
    trace doc's surrounding-sample context table
    (wp-trace-sample-194.md lines 130-168). New finding: the WP
    divergence is already large at sample 192 (Δ pred8 = -50,
    Δ stored = -50), before the round-191-pinned Δ pred8 = +8 at
    sample 194. Tests pin in-row + cross-row read chains, sample 192's
    left-border zeroing, sample 194's cross-row reads, and the
    production decoded value v(194) = 35.