Conversation
Affected types: Rgbf32Frame, Rgbf16Frame, Gbrpf32Frame, Gbrapf32Frame, Gbrpf16Frame, Gbrapf16Frame. Each doc-comment now explicitly states the LE-encoded byte contract, mirroring the established Grayf32Frame doc pattern: the &[f32] / &[f16] plane is the LE-encoded byte layout reinterpreted as f32 / f16, matching FFmpeg's canonical *LE pixel-format suffixes (AV_PIX_FMT_GBRPF32LE etc). Adds the bytemuck::cast_slice + linesize-division-by-element-size instruction so callers wiring up FFmpeg buffers don't have to guess. Codex 3rd-pass review of PR #85 caught that PR #83 + PR #84 introduced sinker routings that assumed the planes are already host-native f32 / f16 — a wrong reading of the *LE contract that would corrupt every output path on a BE host. Pinning the doc-level contract here so future readers don't repeat the mistake; sinker revert lands in the next commit. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Mirrors PR #85's Grayf32 sinker revert (52f8191) for the other six float frame types affected by PR #83 (dcf40a3) and PR #84 (8627280): Rgbf32, Rgbf16, Gbrpf32, Gbrapf32, Gbrpf16, Gbrapf16. The earlier PRs introduced sinker-layer `HOST_NATIVE_BE` consts that made the row-kernel `BE` parameter a no-op byte-swap on whichever host the build targeted. That assumed the Frame's plane bytes are already host-native f32/f16 — which contradicts the LE-encoded byte contract clarified in the previous commit. On a BE host the routing skipped the required `u32::from_le` / `u16::from_le` swap and would have corrupted every output path. Reverted call sites (all `::<HOST_NATIVE_BE>` -> `::<false>`): - src/sinker/mixed/packed_rgb_float.rs: 6 sites (Rgbf32) - src/sinker/mixed/packed_rgb_f16.rs: 7 sites (Rgbf16) - src/sinker/mixed/planar_gbr_float.rs: 22 sites (Gbrpf32 + Gbrapf32) - src/sinker/mixed/planar_gbr_f16.rs: 22 sites (Gbrpf16 + Gbrapf16) Removes the four `const HOST_NATIVE_BE: bool = cfg!(target_endian = "big")` definitions and their explanatory doc-comments. Rewrites the `widen_f16_to_f32` doc in planar_gbr_f16.rs to accurately describe the LE-encoded contract (helper is correct on LE host; cross-endian widen should use the bit-normalising `widen_f16_be_to_host_f32` pattern). Untouched: `BE == HOST_NATIVE_BE` `copy_from_slice` fast paths in scalar row kernels (`rgbf32_to_rgb_f32_row`, `gbrpf32_to_rgb_f32_row`, etc.) — generic in `BE`, correct under either contract. Per-arch SIMD kernels and `widen_f16_be_to_host_f32` left alone — different layer. Verification (aarch64-apple-darwin): cargo test --target aarch64-apple-darwin --lib # 2246 ok Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…oded regressions Removes the host-native-contract tests that PR #83 + PR #84 added — they were typed against an incorrect host-native-f32/f16 reading of the Frame contract and would mask the BE-host corruption introduced by the now- reverted `::<HOST_NATIVE_BE>` sinker routings: - rgbf32_kernel_host_native_be_matches_false_on_le_host (deleted) - rgbf32_sinker_host_native_contract_lossless_passthrough (deleted) - rgbf16_kernel_host_native_be_matches_false_on_le_host (deleted) - rgbf16_sinker_host_native_contract_lossless_passthrough (deleted) - gbrpf32_kernel_host_native_be_matches_false_on_le_host (deleted) - gbrpf32_sinker_host_native_contract_lossless_passthrough (deleted) - gbrpf16_kernel_host_native_be_matches_false_on_le_host (deleted) - gbrpf16_sinker_host_native_contract_lossless_passthrough (deleted) - gbrapf32_sinker_host_native_contract_lossless_passthrough_with_alpha (deleted) - gbrapf16_sinker_host_native_contract_lossless_passthrough_with_alpha (deleted) Adds LE-encoded byte contract regressions following PR #85's `52f8191` pattern (`grayf32_sinker_le_encoded_frame_decodes_correctly`): - rgbf32_sinker_le_encoded_frame_decodes_correctly (new) - rgbf16_sinker_le_encoded_frame_decodes_correctly (new) - gbrpf32_sinker_le_encoded_frame_decodes_correctly (new) - gbrapf32_sinker_le_encoded_frame_decodes_correctly (new) - gbrpf16_sinker_le_encoded_frame_decodes_correctly (new) - gbrapf16_sinker_le_encoded_frame_decodes_correctly (new) Each test constructs an explicitly LE-encoded plane via `f32::from_bits(intended.to_bits().to_le())` (or the f16 analogue), builds the Frame, runs the sinker's lossless pass-through, and asserts the output equals the host-native intended values. Vacuous on LE hosts (where `to_le` is identity) but on a BE host any regression that drops the `::<false>` kernel routing would fail fast. Also picks up two trivial pre-existing trailing-blank-line cleanups in `src/row/arch/{neon,x86_sse41}/endian.rs` produced by `cargo fmt` while running it across the modified files. Verification (aarch64-apple-darwin, x86_64-apple-darwin, wasm32): cargo test --target aarch64-apple-darwin --lib # 2242 ok cargo build --target x86_64-apple-darwin --tests # ok RUSTFLAGS="-C target-feature=+simd128" \ cargo build --target wasm32-unknown-unknown --tests # ok cargo build --no-default-features # ok cargo fmt --check # ok cargo clippy --all-targets --all-features -- -D warnings # ok Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…rmalize-first canonical helper Closes the BE-host correctness gap flagged in the original body of #92: the sinker-local `widen_f16_to_f32` in `src/sinker/mixed/planar_gbr_f16.rs` called `half::f16::to_f32` directly on f16 plane bits, treating the bytes as host-native. That assumption only holds on a little-endian host with an LE-encoded source — on a big-endian host the LE-encoded f16 bits would be byte-swapped from the intended values, and `to_f32` would decode them as arbitrarily wrong magnitudes / signs / NaNs. Replaces every call site with the canonical bit-normalize-first helper `crate::row::scalar::planar_gbr_f16::widen_f16_be_to_host_f32::<false>` (BE = false because Frame plane bytes are LE-encoded per the contract documented in Phase 1 of this PR). The shared helper does `f32 = f16::from_bits(u16::from_le(plane_bits)).to_f32()` — correct on both LE hosts (where `from_le` is identity) and BE hosts (where it byte- swaps the LE-encoded plane bits back to host-native f16 bits before the f16 → f32 conversion). The downstream `gbrpf32_to_*` row kernels are still invoked with `BE = false` since the widened scratch is now host-native f32 (its `u32::from_le` is then a no-op on LE host and the right byte-swap on BE host — same contract as the dispatch f16-widen fallback and the per- backend SIMD scalar tails already use). Updated 7 call sites: - 3 in the `Gbrpf16` `process` widen loop (G, B, R planes). - 4 in the `Gbrapf16` `process` widen loop (G, B, R, A planes). - 1 in the `widen_and_scatter_f16_alpha_to_u8` u8 RGBA fast-path helper (counted as the seventh distinct call site; the other six are the two widen-loop blocks). Removed the local `widen_f16_to_f32` helper (and its now-stale LE-host-only-caveat doc-comment) entirely — no remaining callers. Adds two LE-encoded byte-contract regression tests covering the widening path (the existing PR #92 tests covered only the lossless f16 pass-through path which doesn't go through the widen helper): - gbrpf16_sinker_widen_path_le_encoded_frame_decodes_correctly (Gbrpf16 → with_rgb_f32; exercises the 3-plane widen loop) - gbrapf16_sinker_widen_path_le_encoded_frame_decodes_correctly (Gbrapf16 → with_rgba_f32; exercises the 4-plane widen loop incl. α) Each test constructs LE-encoded `&[half::f16]` planes via `half::f16::from_bits(intended.to_bits().to_le())`, builds a Frame, runs the widening sinker path, and asserts the f32 output equals the host- native expected values. Vacuous on LE hosts (where `to_le` is identity) but on a BE host any regression that drops the bit-normalize-first step in `widen_f16_be_to_host_f32::<false>` would fail fast. Verification (aarch64-apple-darwin, x86_64-apple-darwin, wasm32): cargo test --target aarch64-apple-darwin --lib # 2269 ok cargo build --target x86_64-apple-darwin --tests # 0 warnings RUSTFLAGS="-C target-feature=+simd128" \ cargo build --target wasm32-unknown-unknown --tests # ok cargo build --no-default-features # ok cargo fmt --check # ok cargo clippy --all-targets --all-features -- -D warnings # ok Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2cbc60c to
78f1c2f
Compare
…are + frame docs cite LE names Addresses three follow-up findings from Codex's review of PR #92. Finding 1 — Post-widen f32 routing on Gbrpf16 / Gbrapf16. The Phase 2 follow-up replaced the local `widen_f16_to_f32` helper with the canonical `widen_f16_be_to_host_f32::<false>`, but the immediate downstream `gbrpf32_to_*::<false>` calls still told the f32 row kernel to apply `from_le` AGAIN — on a BE host this would byte-swap the already-host-native f32 scratch, corrupting output. Introduce a module-scope `HOST_NATIVE_BE = cfg!(target_endian = "big")` constant in `src/sinker/mixed/planar_gbr_f16.rs` and route ALL post-widen `gbrpf32_to_*` / `gbrapf32_to_*` calls through `::<HOST_NATIVE_BE>` (14 sites total across both Gbrpf16 and Gbrapf16 sinker bodies). The kernel's `from_le` / `from_be` then becomes a no-op on every host since post-widen scratch already matches the host byte order. Distinct from the Phase 2 revert: direct Frame-to-row-kernel calls (`gbrpf16_to_*::<false>` for u8 / f16 outputs) still use `<false>` per the LE-encoded Frame contract — those receive raw LE-encoded `&[half::f16]` plane bytes, so the kernel must apply `from_le`. Post-widen scratch is host-native f32, so it must use `<HOST_NATIVE_BE>` to keep the swap a no-op everywhere. The module-level comment block makes this distinction explicit. Finding 2 — Ya16 RGB+RGBA combined-path alpha not endian-aware. When `MixedSinker<Ya16>` attached BOTH `with_rgb` and `with_rgba`, the sinker ran `ya16_to_rgb_*::<false>` (correct), expanded the RGB output to RGBA prefix, and called `copy_alpha_ya_u16` / `copy_alpha_ya_u16_to_u8` to overlay the alpha plane from the source — but those helpers read raw `packed[n*2+1]` directly without `u16::from_le`. On a BE host processing the LE-encoded `Ya16Frame`, the host-native u16 read would interpret LE-encoded bytes as host-native (BE), yielding the wrong alpha byte. Audit found four alpha-patch helpers with the same pattern across three formats — Ya16 (`copy_alpha_ya_u16` / `_to_u8`), AYUV64 (`copy_alpha_packed_u16x4_at_0` / `_to_u8_at_0`), and Rgba64 / Bgra64 (`copy_alpha_packed_u16x4_at_3` / `_to_u8_at_3`). All six were made endian-aware via a `<const BE: bool>` parameter applying `u16::from_le(packed[..])` (or `from_be` per BE) before scaling / storing, mirroring the pre-existing `copy_alpha_plane_u16` / `copy_alpha_plane_u16_to_u8` pattern from PR #81. Updated all 8 sinker call sites to pass `<false>` (Frame contract is LE). The two AYUV64 helpers also have SIMD dispatchers — those gained a `<const BE: bool>` generic and the `safe_for_simd = !BE && cfg!(target_endian = "little")` gate that already exists for `copy_alpha_plane_u16_to_u8`, falling back to scalar in the BE-host or BE-data quadrants. The five SIMD scalar tails were pinned to `::<false>` matching the precedent from PR #81 / PR #82. Finding 3 — Public Frame docs cite wrong FFmpeg names. `Rgbf16Frame` docs cited `AV_PIX_FMT_RGBF16` (target-endian alias — `RGBF16LE` on LE, `RGBF16BE` on BE). `Rgbf32Frame` docs cited `AV_PIX_FMT_RGBF32` which doesn't exist as an unsuffixed alias; FFmpeg has only the alpha-bearing `RGBAF32LE` / `RGBAF32BE` pair. A BE caller following the unsuffixed contract could pass native-BE bytes into an LE-decoder. Updated `src/frame/packed_rgb_f16.rs` to cite `AV_PIX_FMT_RGBF16LE` explicitly, with a parenthetical noting that the unsuffixed name is a target-endian alias to avoid future confusion. Updated `src/frame/packed_rgb_float.rs` to drop the citation of the non-existent `AV_PIX_FMT_RGBF32` constant entirely and instead document the contract as "LE-encoded f32 bytes — `colconv` pins to the canonical FFmpeg `*LE` byte-order convention regardless of host endianness, exactly as the existing `AV_PIX_FMT_RGBAF32LE` flavour". Tests added: - `gbrpf16_sinker_widen_path_u16_and_u8_le_encoded_frame_decodes_correctly` exercises the Gbrpf16 widen → f32 → u16 chain (with `with_luma_u16` alongside) under the LE-encoded Frame contract — vacuous on LE, catches the double-swap on BE. - `ya16_combined_rgb_and_rgba_alpha_matches_standalone_le_encoded` and the u16 sibling assert that Strategy A+ (combined `with_rgb` + `with_rgba`) produces α bytes byte-identical to the standalone `with_rgba` path on every host. Builds the `&[u16]` plane via `u16::from_ne_bytes(x.to_le_bytes())` so the in-memory bytes spell the LE-encoded contract on both LE and BE hosts. - 8 new BE-parity tests in `src/row/scalar/alpha_extract.rs` for the newly endian-aware AYUV64 / Rgba64 / Ya16 helpers (4 helpers × u8 / u16 variants), all using the `swap_bytes`-fixture pattern that exists for `copy_alpha_plane_u16` so the BE arm runs on every host without needing a real BE runner. Verified: - `cargo test --target aarch64-apple-darwin --lib` — 2278 passed - `cargo build --target x86_64-apple-darwin --tests` — clean - `RUSTFLAGS="-C target-feature=+simd128" cargo build --target wasm32-unknown-unknown --tests` — clean - `cargo build --no-default-features` — clean - `cargo fmt --check` — clean - `cargo clippy --all-targets --all-features -- -D warnings` — clean - `cargo check --target s390x-unknown-linux-gnu --lib` — clean (BE-host smoke) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…direct-Frame vs post-widen paths Codex 3rd-pass review of #92 found that the f32 alpha-patch helpers `copy_alpha_plane_f32_to_u8`, `copy_alpha_plane_f32_to_u16`, and `copy_alpha_plane_f32` read each f32 α sample as host-native, which silently corrupted the α slot on BE hosts (e.g., s390x) processing the LE-encoded `Gbrapf32Frame` α plane. The byte-swapped f32 bits clamp to near-zero or near-one, so BE hosts emitted α = 0 or 255 regardless of the intended value. Same bug class as the u16 alpha-patch helpers fixed in cf26058, but for the f32 path; the standalone `with_rgba_f32` test didn't cover this because Strategy A+ uses a different code path (`expand_rgb_to_rgba_row` + alpha-patch). Helpers updated (4): * `copy_alpha_plane_f32_to_u8` — added `<const BE: bool>`, normalises each f32's bits via `f32::from_bits(u32::from_le/be(bits))` before clamp / scale / round-half-up. * `copy_alpha_plane_f32_to_u16` — same pattern. * `copy_alpha_plane_f32` — same pattern (lossless pass-through). * `copy_alpha_plane_f16` (planar_gbr_f16.rs) — added `<const BE: bool>` for consistency. Currently only test-called; future-proofs against a Strategy A+ wiring that consumes it. Sinker call sites updated, dual routing pattern (4 sites total): * `src/sinker/mixed/planar_gbr_float.rs` — Gbrapf32 Strategy A+ u8 RGBA path: `<false>` (direct-Frame, LE-encoded per Phase-1 contract). * `src/sinker/mixed/planar_gbr_f16.rs` — `widen_and_scatter_f16_alpha_to_u8` helper: `<HOST_NATIVE_BE>` (post-widen, scratch is host-native f32 after `widen_f16_be_to_host_f32::<false>`). Audit summary: only one direct-Frame call site (Gbrapf32 u8 RGBA) and one post-widen call site (Gbrapf16 u8 RGBA) currently invoke the f32 alpha-patch helpers. The Gbrapf32 RGBA u16 / RGBA f32 / RGBA f16 paths all use direct kernels (`gbrapf32_to_rgba_u16_row` / `gbrapf32_to_rgba_f32_row` / `gbrapf32_to_rgba_f16_row`) and were already endian-aware via their `<false>` parameter; same for the Gbrapf16 widened RGBA u16 / f32 paths. Tests added (7): * `copy_alpha_plane_f32_to_u8_be_parity_with_swapped_buffer` * `copy_alpha_plane_f32_to_u16_be_parity_with_swapped_buffer` * `copy_alpha_plane_f32_be_parity_with_swapped_buffer` * `copy_alpha_plane_f16_be_parity_with_swapped_buffer` * `gbrapf32_strategy_a_plus_le_encoded_frame_alpha_decodes_correctly` (w = 15, exercises scalar tail; compares Strategy A+ combo to standalone `with_rgba`). * `gbrapf32_strategy_a_plus_le_encoded_u16_alpha_decodes_correctly` (w = 17, defense-in-depth for the u16 RGBA combo). * `gbrapf16_strategy_a_plus_post_widen_alpha_decodes_correctly` (w = 15, verifies HOST_NATIVE_BE routing on the post-widen helper). The 3 prior `copy_alpha_plane_f32*_clamps_and_scales` / `_lossless_passthrough` tests are now `#[cfg(target_endian = "little")]` (host-native f32 fixtures); BE-host scalar correctness is locked down by the dedicated `*_be_parity_with_swapped_buffer` tests. Verified: * `cargo test --target aarch64-apple-darwin --lib` (2285 pass, +7). * `cargo build --target x86_64-apple-darwin --tests` (0 warnings). * `RUSTFLAGS="-C target-feature=+simd128" cargo build --target wasm32-unknown-unknown --tests`. * `cargo build --no-default-features`. * `cargo fmt --check`. * `cargo clippy --all-targets --all-features -- -D warnings`. * `cargo check --target s390x-unknown-linux-gnu --lib` (BE-host smoke). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ercises them Codex 4th-pass review of #92 found the BE-sensitive sinker regressions skip on miri targets — but BE CI runs via miri (powerpc64 / s390x), so the tests effectively did not validate the BE path. The `cfg_attr(miri, ignore = ...)` gate was added defensively, but it skipped exactly the host where BE corruption would surface. Audit summary: every BE-sensitive regression added in this PR calls the high-level `MixedSinker` builder + `gbrpf32_to` / `gbrapf32_to` / `gbrpf16_to` / `gbrapf16_to` / `rgbf32_to` / `rgbf16_to`. The dispatch layer honours the sinker's `simd` flag — passing `with_simd(false)` bypasses every SIMD branch and calls the scalar kernels directly. The post-widen helpers (`widen_f16_be_to_host_f32`, `expand_rgb_to_rgba_row`, the `copy_alpha_plane_f32*` family) are pure scalar. None of these tests construct or invoke arch-specific SIMD intrinsics directly. Tests un-gated (12): * `src/sinker/mixed/tests/packed_rgb_float.rs` - `rgbf32_sinker_le_encoded_frame_decodes_correctly` * `src/sinker/mixed/tests/packed_rgb_f16.rs` - `rgbf16_sinker_le_encoded_frame_decodes_correctly` * `src/sinker/mixed/tests/planar_gbr_float.rs` - `gbrpf32_sinker_le_encoded_frame_decodes_correctly` - `gbrapf32_sinker_le_encoded_frame_decodes_correctly` - `gbrpf16_sinker_le_encoded_frame_decodes_correctly` - `gbrapf16_sinker_le_encoded_frame_decodes_correctly` - `gbrpf16_sinker_widen_path_le_encoded_frame_decodes_correctly` - `gbrapf16_sinker_widen_path_le_encoded_frame_decodes_correctly` - `gbrpf16_sinker_widen_path_u16_and_u8_le_encoded_frame_decodes_correctly` - `gbrapf32_strategy_a_plus_le_encoded_frame_alpha_decodes_correctly` - `gbrapf32_strategy_a_plus_le_encoded_u16_alpha_decodes_correctly` - `gbrapf16_strategy_a_plus_post_widen_alpha_decodes_correctly` Each test now starts the builder chain with `.with_simd(false)`, forcing scalar dispatch on every chained `with_rgb*` / `with_rgba*` / `with_luma*` sink. SIMD-using tests in `planar_gbr_high_bit.rs` (8 pre-existing miri ignores, untouched in this PR) retain the gate; those test SIMD-dispatched kernels with no scalar bypass and genuinely cannot run under miri. Local miri verification (where the f16 setup path's portable software fallback applies — i.e. all f32 tests, plus those f16 tests whose fixtures do not call `half::f16::from_f32` on aarch64+fp16): * `cargo miri test --lib gbrpf32_sinker_le_encoded` — 1 passed * `cargo miri test --lib gbrapf32_sinker_le_encoded` — 1 passed * `cargo miri test --lib gbrapf32_strategy_a_plus_le_encoded_frame` — 1 passed * `cargo miri test --lib gbrapf32_strategy_a_plus_le_encoded_u16` — 1 passed * `cargo miri test --lib rgbf32_sinker_le_encoded` — 1 passed The f16 regression tests build their fixtures via `half::f16::from_f32`, which on aarch64-apple-darwin (miri's local triple) compiles to an inline asm path keyed on `target_feature = "fp16"` — unsupported by miri locally. On the BE CI hosts (s390x / powerpc64), `half-2.7.1` falls through to its portable software fallback (no asm, no SIMD), so all f16 tests will execute correctly under `cargo miri test` on those targets — which is the entire point of removing the `miri,ignore` gate. Verified: * `cargo test --target aarch64-apple-darwin --lib` (2285 pass, 0 fail). * `cargo build --target x86_64-apple-darwin --tests` (0 warnings). * `RUSTFLAGS="-C target-feature=+simd128" cargo build --target wasm32-unknown-unknown --tests`. * `cargo build --no-default-features`. * `cargo fmt --check`. * `cargo clippy --all-targets --all-features -- -D warnings`. * `cargo check --target s390x-unknown-linux-gnu --lib` (BE-host smoke). * `cargo miri test --lib gbrpf32_sinker_le_encoded` (and the four sibling f32 tests). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Codex 5th-pass review of PR #92 caught: the prior doc said FFmpeg "does not define AV_PIX_FMT_RGBF32" — incorrect. FFmpeg actually defines AV_PIX_FMT_RGBF32BE, AV_PIX_FMT_RGBF32LE, and an unsuffixed alias AV_PIX_FMT_RGBF32 that is target-endian (resolves to LE on LE hosts and BE on BE hosts). Corrected: Rgbf32Frame maps to AV_PIX_FMT_RGBF32LE explicitly, with a warning that BE-host callers holding target-endian RGBF32 bytes must convert to LE before constructing the frame (otherwise the LE-decode contract would re-interpret BE bytes as LE → byte-swapped float data). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
This PR clarifies and enforces the “LE-encoded bytes reinterpreted as f32/f16” contract for float RGB/GBR frame types (matching FFmpeg *LE pixel formats), and reverts sinker routing choices that would corrupt outputs on big-endian hosts. It also makes several Strategy A+/alpha-extract helpers explicitly endian-aware and adds regressions/parity tests to lock down the corrected behavior.
Changes:
- Document the LE-encoded byte contract for
Rgbf32/Rgbf16and planarGbr*float frame families, including FFmpeg buffer/linesize guidance. - Re-route affected sinkers to dispatch row kernels with
BE = falsefor direct-frame reads (LE decoding), and keep post-widen paths host-native where appropriate. - Make scalar/dispatch alpha-extract helpers endian-aware (const-generic
BE) with SIMD fallbacks guarded, and add LE-contract regressions + BE-parity tests.
Reviewed changes
Copilot reviewed 21 out of 21 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| src/frame/packed_rgb_float.rs | Documents Rgbf32Frame as LE-encoded bytes and clarifies FFmpeg mapping/stride guidance. |
| src/frame/packed_rgb_f16.rs | Documents Rgbf16Frame LE-encoded contract and FFmpeg *LE/target-endian alias behavior. |
| src/frame/planar_gbr_float.rs | Adds explicit LE-encoded endian contract sections for planar GBR float frames. |
| src/sinker/mixed/packed_rgb_float.rs | Reverts Rgbf32 sinker dispatch routing to ::<false> (LE decode). |
| src/sinker/mixed/packed_rgb_f16.rs | Reverts Rgbf16 sinker dispatch routing to ::<false> (LE decode). |
| src/sinker/mixed/planar_gbr_float.rs | Reverts planar GBR f32 sinker routing to ::<false> and wires endian-aware alpha patching. |
| src/sinker/mixed/planar_gbr_f16.rs | Uses bit-normalizing widen helper; keeps post-widen routing host-native; updates alpha scatter routing. |
| src/sinker/mixed/packed_rgb_16bit.rs | Calls endian-aware scalar alpha extract helpers with BE = false for LE-encoded frame contract. |
| src/sinker/mixed/ayuv64.rs | Routes AYUV64 alpha extract through dispatcher with BE = false. |
| src/sinker/mixed/gray.rs | Makes Ya16 alpha patch helpers endian-aware and adds combined-vs-standalone parity tests. |
| src/sinker/mixed/tests/packed_rgb_float.rs | Replaces prior host-native routing tests with LE-encoded contract regression test (scalar/miri-safe). |
| src/sinker/mixed/tests/packed_rgb_f16.rs | Same as above for Rgbf16 (scalar/miri-safe). |
| src/sinker/mixed/tests/planar_gbr_float.rs | Replaces prior host-native routing tests with LE-encoded contract regressions and Strategy A+ alpha regressions (scalar/miri-safe). |
| src/row/scalar/alpha_extract.rs | Makes multiple alpha extract helpers const-generic over BE, adds BE-parity tests, and updates docs. |
| src/row/scalar/planar_gbr_f16.rs | Makes copy_alpha_plane_f16 endian-aware (const BE) and adds BE-parity test. |
| src/row/dispatch/alpha_extract.rs | Adds const BE to AYUV64 dispatchers and forces scalar when SIMD isn’t endian-safe. |
| src/row/arch/neon/alpha_extract.rs | Updates scalar tail calls to new const-generic scalar helper signatures. |
| src/row/arch/wasm_simd128/alpha_extract.rs | Same: scalar tail calls updated for const-generic scalar helpers. |
| src/row/arch/x86_avx2/alpha_extract.rs | Same: scalar tail calls updated for const-generic scalar helpers. |
| src/row/arch/x86_avx512/alpha_extract.rs | Same: scalar tail calls updated for const-generic scalar helpers. |
| src/row/arch/x86_sse41/alpha_extract.rs | Same: scalar tail calls updated for const-generic scalar helpers. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Comment on lines
+2459
to
+2465
| { | ||
| let mut sink = MixedSinker::<crate::yuv::Ya16>::new(w as usize, h as usize) | ||
| .with_rgb(&mut rgb_combined) | ||
| .unwrap() | ||
| .with_rgba(&mut rgba_combined) | ||
| .unwrap(); | ||
| ya16_to(&frame, FR, M, &mut sink).unwrap(); |
Comment on lines
+2511
to
+2517
| { | ||
| let mut sink = MixedSinker::<crate::yuv::Ya16>::new(w as usize, h as usize) | ||
| .with_rgb_u16(&mut rgb_combined) | ||
| .unwrap() | ||
| .with_rgba_u16(&mut rgba_combined) | ||
| .unwrap(); | ||
| ya16_to(&frame, FR, M, &mut sink).unwrap(); |
… miri PR #92's CI miri jobs (aarch64-apple-darwin) failed with `unsupported operation: can't call foreign function llvm.aarch64.neon.ld2.v8i16.p0`. Stack pointed at the `ya16_combined_rgb_and_rgba_alpha_matches_standalone_le_encoded` test calling `MixedSinker::<Ya16>::process` → `dispatch::ya16::ya16_to_rgb_row` → `arch::neon::gray::ya16_to_rgb_row::<false>` → `vld2q_u16` (NEON intrinsic Miri does not implement). Both `ya16_combined_*` tests added in `cf26058` (this PR) use width=8 — the exact threshold where the NEON kernel kicks in (`while x + 8 <= width`) — so on aarch64 hosts the SIMD body executes. Other Ya16 sinker tests in this module use width 1 / 2 which never reaches the `vld2q_u16` block. These two tests are BE-contract regressions: they verify combined (`with_rgb` + `with_rgba`) and standalone (`with_rgba`) paths agree on the LE-encoded `Ya16Frame`, locking down the endian-aware `copy_alpha_ya_u16_to_u8::<false>` and `copy_alpha_ya_u16::<false>` helpers. They must execute on BE-host miri (s390x / powerpc64) — gating them with `cfg_attr(miri, ignore)` would skip exactly the host where BE corruption surfaces. So they get the same Option A treatment 75cf865 applied to the planar / packed-rgb BE-contract regressions: prepend `.with_simd(false)` to every `MixedSinker::new(..)` builder chain so the sinker dispatch routes to the scalar kernel — no SIMD intrinsic, miri clean, BE validation preserved. Audit summary (other tests in `sinker::mixed::gray::tests`): * All `gray*` / `gray*_walker_smoke_test` / `gray*_limited_range_*` / `grayf32_with_*` integration tests use width <= 4 — far below the NEON SIMD threshold (8 lanes) — so they never invoke `vld2q_u16`. * `gray*_walker_smoke_test` and similar use the gray-family kernels (`gray9` / `gray12` / `gray14`) which dispatch through different row kernels with different SIMD-eligibility thresholds; none hit `vld2q_u16` at width=4. * `grayf32_sinker_le_encoded_frame_decodes_correctly` retains its existing `cfg_attr(miri, ignore = "...")` (added in `cf26058`) — left untouched in this commit; if a follow-up wants to align it with the 75cf865 un-gating pattern it can be done independently. * `ya8_*` tests use the Ya8 kernels (no `vld2q_u16` — Ya8 deinterleave uses `vld2_u8` which Miri *does* support) and existing width=128/130 smoke is already `cfg_attr(miri, ignore)`. * `ya16_with_*` tests (rgba_u16 / rgba_u8 / rgb_and_rgba / hsv) all use width 1 / 2 — below the SIMD threshold — so they execute the scalar tail unconditionally on every host. * `ya16_width_128_and_130_smoke` is already `cfg_attr(miri, ignore)` (Option B) and stays so — it explicitly validates the SIMD-dispatched smoke path. Tests fixed (2 — both via Option A): * `ya16_combined_rgb_and_rgba_alpha_matches_standalone_le_encoded` * `ya16_combined_rgb_u16_and_rgba_u16_alpha_matches_standalone_le_encoded` Each gains `.with_simd(false)` on both the combined sinker (with_rgb + with_rgba builder chain) and the standalone sinker (with_rgba only), matching the pattern in `tests/packed_rgb_float.rs::rgbf32_sinker_le_encoded_frame_decodes_correctly`. Verified: * `cargo test --target aarch64-apple-darwin --lib` (2285 pass, 0 fail). * `cargo build --target x86_64-apple-darwin --tests` (0 warnings). * `cargo build --no-default-features`. * `cargo fmt --check`. * `cargo clippy --all-targets --all-features -- -D warnings`. * `cargo miri test --lib sinker::mixed::gray::tests::ya16_combined` (2 pass — confirms the fix on aarch64-apple-darwin miri, the exact CI host that previously failed). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copilot review on PR #92 caught: the comment at planar_gbr_f16.rs:359 said the downstream gbrpf32_to_* kernel was invoked with `BE = false`, but cf26058 changed the routing to `BE = HOST_NATIVE_BE` (correct for host-native f32 scratch produced by the bit-normalising widen helper). Updated comment to reflect actual routing + reinforced the distinction between post-widen scratch (host-native → HOST_NATIVE_BE) and direct- Frame paths (LE-encoded → ::<false>) so future readers don't conflate the two contracts. Copilot findings 2 + 3 (Ya16 tests at gray.rs:2470,2528 missing .with_simd(false)) were already addressed in fbb49a1 — Copilot reviewed pre-fix state. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
uqio
added a commit
that referenced
this pull request
May 8, 2026
PR #92's 75cf865 un-gated the LE-encoded byte-contract regression tests for `Rgbf16` / `Gbrpf16` / `Gbrapf16` (and the f16 widen-path / Strategy A+ alpha variants) on the assumption that miri would see a software fallback for `half::f16::from_f32`. That assumption is wrong: `half-2.7.1` enables `target_feature = "fp16"` on aarch64 (and F16C on x86 / x86_64), and the compiled path is inline `asm!` (`fcvt`). Miri rejects inline asm with `unsupported operation: inline assembly is not supported`, breaking CI on aarch64-apple-darwin and aarch64-unknown-linux-gnu (and any other miri target where the f16 hardware feature is detected — including x86_64 with F16C). Fix: re-add `#[cfg_attr(miri, ignore = "half::f16::from_f32 uses inline asm (fcvt) unsupported by Miri")]` to every test that builds an f16 fixture via `half::f16::from_f32`. Trade-off: BE-host miri (s390x / powerpc64) now skips these particular f16 contract tests as well, but the corresponding f32 LE-encoded regressions (`gbrpf32_sinker_le_encoded_frame_decodes_correctly`, `gbrapf32_sinker_le_encoded_frame_decodes_correctly`, `gbrapf32_strategy_a_plus_le_encoded_frame_alpha_decodes_correctly`, `gbrapf32_strategy_a_plus_le_encoded_u16_alpha_decodes_correctly`) remain un-gated and exercise the same byte-swap routing logic on BE miri hosts. Tests re-gated: - src/sinker/mixed/tests/packed_rgb_f16.rs (1): - rgbf16_sinker_le_encoded_frame_decodes_correctly - src/sinker/mixed/tests/planar_gbr_float.rs (6): - gbrpf16_sinker_le_encoded_frame_decodes_correctly - gbrapf16_sinker_le_encoded_frame_decodes_correctly - gbrpf16_sinker_widen_path_le_encoded_frame_decodes_correctly - gbrapf16_sinker_widen_path_le_encoded_frame_decodes_correctly - gbrpf16_sinker_widen_path_u16_and_u8_le_encoded_frame_decodes_correctly - gbrapf16_strategy_a_plus_post_widen_alpha_decodes_correctly Bits-only constructions (`half::f16::from_bits(_)`) are left ungated: they don't involve inline asm and remain miri-safe. Verified: - cargo build --target aarch64-apple-darwin --tests (clean) - cargo build --target x86_64-apple-darwin --tests (clean) - RUSTFLAGS="-C target-feature=+simd128" cargo build \ --target wasm32-unknown-unknown --tests (clean) - cargo build --no-default-features (clean) - cargo fmt --check (clean) - cargo clippy --all-targets --all-features -- -D warnings (clean) - cargo test --target aarch64-apple-darwin --lib \ sinker::mixed::tests::packed_rgb_f16 (9 passed) - cargo test --target aarch64-apple-darwin --lib \ sinker::mixed::tests::planar_gbr_float (26 passed) - cargo miri test --lib sinker::mixed::tests::packed_rgb_f16 \ (1 ignored as expected, no inline-asm error) - cargo miri test --lib sinker::mixed::tests::planar_gbr_float \ (6 f16 tests ignored, 4 f32 LE-encoded regressions still passing) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
uqio
added a commit
that referenced
this pull request
May 8, 2026
…compat wrappers Codex 2nd-pass review of PR #87 surfaced two [high] findings: 1. Frame docs contradicted the LE-encoded plane contract. `src/frame/packed_rgb_16bit.rs` told big-endian callers to pre-normalize each `u16` via `u16::from_le` before constructing Rgb48/Bgr48/Rgba64/ Bgra64 frames. After PR #92 (`5b42065` / `3b1d716`) the documented contract is the opposite: the plane is the **LE-encoded byte layout** reinterpreted as `&[u16]` (matching FFmpeg's `*LE` suffix), and the downstream row kernel applies `u16::from_le` itself (no-op on LE host, byte-swap on BE host). A BE caller following the old docs would pre-swap → kernel swaps again → double swap → corrupted output on every BE host. Doc-comments on `Rgb48Frame` / `Bgr48Frame` / `Rgba64Frame` / `Bgra64Frame` now state the LE-encoded byte contract explicitly, mirroring the wording PR #92 used for `Rgbf32Frame` / `Gbrpf32Frame` / `Gbrapf32Frame` etc. The module-level header is updated to match. Adds an Rgb48 sinker BE-contract regression test (`rgb48_sinker_le_encoded_frame_decodes_correctly`) following the `rgbf32_sinker_le_encoded_frame_decodes_correctly` pattern from PR #92: builds the plane from LE-encoded `u16` patterns (`intended.to_le()`), forces `with_simd(false)` so it runs purely scalar, and asserts the `with_rgb_u16` identity-passthrough output bit-equals `intended`. On a BE host with a regressed pre-swap this would byte-swap every sample. The test runs under miri, which is exactly where BE CI surfaces. 2. Public row APIs broke backwards compatibility. PR #87 added `<const BE: bool>` to 28 public functions in `src/row/dispatch/packed_rgb_16bit.rs` (Rgb48/Bgr48/Rgba64/Bgra64 × {rgb, rgba, rgb_u16, rgba_u16, luma, luma_u16, hsv}) and 6 public functions in `src/row/dispatch/rgb_ops.rs` (X2Rgb10/X2Bgr10 × {rgb, rgba, rgb_u16}). All 34 functions are re-exported from `crate::row::*` (16-bit ones are explicit, x2 ones via `rgb_ops::*` glob). Existing downstream LE-only callers cannot compile against the new const-generic signature without adding `::<false>` turbofish to every call site — a hard breaking change. Each function is renamed to `foo_endian<const BE: bool>` and a thin non-generic LE-only wrapper `foo` is added that forwards to `foo_endian::<false>`. Existing pre-Tier 8 call sites compile unchanged; sinker code is updated to call the explicit `_endian::<BE>` form so endian intent is visible at every internal callsite. Audit / fan-out: - 28 packed-16-bit dispatchers renamed + 28 LE wrappers added - 6 X2Rgb10/X2Bgr10 dispatchers renamed + 6 LE wrappers added - `src/row/mod.rs` re-exports both `foo` and `foo_endian` for the 16 pub `→{rgb,rgba,rgb_u16,rgba_u16}` entries and pub(crate) for the 12 `→{luma,luma_u16,hsv}` entries - 12 internal turbofish call sites updated to `_endian::<…>` across `src/sinker/mixed/packed_rgb_16bit.rs`, `src/sinker/mixed/packed_rgb_10bit.rs`, and the in-file dispatcher unit tests in `src/row/dispatch/packed_rgb_16bit.rs` The wrappers carry `#[allow(clippy::too_many_arguments)]` only when the underlying `_endian` form already does (luma/luma_u16/hsv variants); this matches the pre-existing pattern on the BE-aware definitions and is not a new suppression. Verified: cargo test --target aarch64-apple-darwin --lib cargo build --target x86_64-apple-darwin --tests # 0 warnings RUSTFLAGS="-C target-feature=+simd128" cargo build --target wasm32-unknown-unknown --tests cargo build --no-default-features cargo fmt --check cargo clippy --all-targets --all-features -- -D warnings cargo check --target s390x-unknown-linux-gnu --lib # BE-host smoke Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Codex 3rd-pass review of PR #85 caught a contract conflict that affects
six other merged PRs. PR #85's
52f8191reverted the broken Grayf32sinker routing; this PR completes the audit for the remaining six float
frame types affected by PR #83 (
dcf40a3) and PR #84 (8627280):Rgbf32, Rgbf16, Gbrpf32, Gbrapf32, Gbrpf16, Gbrapf16.
Root cause
The
Grayf32Frame/Gbrpf32Frame/Gbrpf16Frame/Gbrapf32Frame/Gbrapf16Framedoc-comments already named the FFmpeg*LEpixelformats (
AV_PIX_FMT_GBRPF32LE, etc.) — the byte layout is LE-encodedbytes reinterpreted as
f32/half::f16, NOT host-native floats.The kernel
BE = falsetemplate parameter appliesu32::from_le/u16::from_le, which is:from_leno-op → correctfrom_leswaps → restores host-native f32 → correctThe
HOST_NATIVE_BEconsts introduced in PR #83 + PR #84 made thekernel a no-op byte-swap on either host, which corrupts every output
path on a BE host.
Phase breakdown
Phase 1 — Frame docs (commit
5b42065).Add explicit
# Endian contract — LE-encoded bytessections toRgbf32Frame,Rgbf16Frame,Gbrpf32Frame,Gbrapf32Frame,Gbrpf16Frame,Gbrapf16Frame, mirroring the existingGrayf32Framepattern. Add
bytemuck::cast_slice+ linesize-division-by-element-sizeguidance for FFmpeg-buffer callers.
Phase 2 — Sinker revert (commit
84bbbb7).Revert
::<HOST_NATIVE_BE>→::<false>across:src/sinker/mixed/packed_rgb_float.rs(Rgbf32)src/sinker/mixed/packed_rgb_f16.rs(Rgbf16)src/sinker/mixed/planar_gbr_float.rs(Gbrpf32 + Gbrapf32)src/sinker/mixed/planar_gbr_f16.rs(Gbrpf16 + Gbrapf16)Removes the four
const HOST_NATIVE_BEdefinitions and their explanatorydoc-comments. Rewrites the
widen_f16_to_f32doc-comment to accuratelydescribe the LE-encoded byte contract.
Untouched per the audit:
BE == HOST_NATIVE_BEcopy_from_slicefast paths —generic in
BE, correct under either contract.row/arch/*).widen_f16_be_to_host_f32scalar helper — itsBE→ host-native pattern is correct because it produces decoded f32from LE-encoded f16 bits before the downstream f32 kernel sees them.
Phase 3 — Test cleanup + LE-encoded contract regressions (commit
2cbc60c).Removed 10 wrong-contract tests that were typed against the host-native
reading of the Frame contract:
rgbf32_kernel_host_native_be_matches_false_on_le_hostrgbf32_sinker_host_native_contract_lossless_passthroughrgbf16_kernel_host_native_be_matches_false_on_le_hostrgbf16_sinker_host_native_contract_lossless_passthroughgbrpf32_kernel_host_native_be_matches_false_on_le_hostgbrpf32_sinker_host_native_contract_lossless_passthroughgbrpf16_kernel_host_native_be_matches_false_on_le_hostgbrpf16_sinker_host_native_contract_lossless_passthroughgbrapf32_sinker_host_native_contract_lossless_passthrough_with_alphagbrapf16_sinker_host_native_contract_lossless_passthrough_with_alphaAdded 6 LE-encoded contract regressions following PR #85's
52f8191pattern (
grayf32_sinker_le_encoded_frame_decodes_correctly):rgbf32_sinker_le_encoded_frame_decodes_correctlyrgbf16_sinker_le_encoded_frame_decodes_correctlygbrpf32_sinker_le_encoded_frame_decodes_correctlygbrapf32_sinker_le_encoded_frame_decodes_correctlygbrpf16_sinker_le_encoded_frame_decodes_correctlygbrapf16_sinker_le_encoded_frame_decodes_correctlyEach test builds a plane via
f32::from_bits(intended.to_bits().to_le())(or the f16 analogue), runs the sinker's lossless pass-through, and
asserts the output equals the host-native intended values. Vacuous on LE
host (every CI runner today) but fails fast on BE for any regression
that drops the
::<false>routing.Net test delta: −10 / +6 = −4 (final
cargo test --target aarch64-apple-darwin --libcount: 2242, down from 2246).Test plan
cargo test --target aarch64-apple-darwin --lib→ 2242 okcargo build --target x86_64-apple-darwin --tests→ ok (0 warnings)RUSTFLAGS="-C target-feature=+simd128" cargo build --target wasm32-unknown-unknown --tests→ okcargo build --no-default-features→ okcargo fmt --check→ okcargo clippy --all-targets --all-features -- -D warnings→ okNotes
source
Frame's per-formatBEflag for encoded-byte order(different layer from the f32/f16 host-native vs LE-encoded
contract).
widen_f16_to_f32helper insrc/sinker/mixed/planar_gbr_f16.rscallshalf::f16::to_f32directly on the LE-encoded plane. On LE host this is correct (LE bytes
ARE host-native); on BE host this would mis-interpret the bits — a
separate bug from the routing reverted here. The accurate fix is to
switch this helper to the bit-normalising
widen_f16_be_to_host_f32pattern used by the per-arch SIMD backends; left for a follow-up. The
doc-comment is updated to make the LE-host caveat explicit.
🤖 Generated with Claude Code