feat(be-tier5): BE support for V410 / XV36 / AYUV64 row kernels#86
Open
feat(be-tier5): BE support for V410 / XV36 / AYUV64 row kernels#86
Conversation
Add `<const BE: bool>` to all row kernels (scalar + 5 SIMD backends) for the three Tier-5 packed YUV 4:4:4 formats. Dispatch functions gain a `be_input: bool` parameter that branches to the correct monomorphization at runtime; sinker callers forward `false` (LE default) until the sinker layer grows its own BE flag. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Codex flagged a high-severity correctness bug in the tier 5 scalar BE
paths. The pattern `if BE { x.swap_bytes() } else { x }` is unconditional
— it always byte-swaps on BE input regardless of the host endianness.
On a BE host (e.g. s390x), the input is already in host order, so the
swap inverts it, producing garbled samples. The matching SIMD
`load_endian_*` helpers are target-endian aware (cfg-gated reverses),
so the scalar/SIMD paths disagree on BE hosts.
Fix: replace `x.swap_bytes()` with `T::from_be(x)` / `T::from_le(x)`,
which are target-endian aware (no-op when source byte order already
matches the host). All six tier 5 scalar call sites are updated:
- src/row/scalar/v410.rs (4 sites): u32 packed-pixel and luma kernels
- src/row/scalar/xv36.rs (1 site): `load_xv36_u16` helper
- src/row/scalar/ayuv64.rs (1 site): `load_ayuv64_u16` helper
Test helpers in `mod tests` blocks (`p_le.map(|v| v.swap_bytes())`,
`le_word.swap_bytes()`) are intentionally left unchanged — they
synthesize BE-encoded fixtures from LE inputs on the LE host where
CI runs, so an unconditional swap is the correct behaviour there.
Verified:
- cargo test --target aarch64-apple-darwin --lib (2159 passed, 0 failed)
- cargo build --target x86_64-apple-darwin --tests (0 warnings)
- RUSTFLAGS='-C target-feature=+simd128' cargo build --target wasm32-unknown-unknown --tests
- cargo build --no-default-features
- cargo fmt --check
- cargo clippy --all-targets --all-features -- -D warnings
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Phase 2 — Tier 5 BE rollout. Stacked on #81. Adds
<const BE: bool>to all V410, XV36, AYUV64 row kernels across all 6 backends.Note on dispatcher pattern (variant from other tiers):
Dispatcher functions take a runtime
be_input: boolparameter and branch internally to<true>vs<false>const-generic kernels. This avoids forcing the (downstream) Sinker / Frame layer to also be<const BE: bool>-generic, while preserving full monomorphization at the kernel level. Kernels themselves still use<const BE: bool>for dead-code elimination.The runtime branch is one predictable conditional per row, picked once before the per-pixel inner loop.
Implementation:
swap_bytes()for u16/u32 reads (V410 / XV36 use 32-bit packed words; AYUV64 uses 16-bit channels)<const BE: bool>and useload_endian_u32xN::<BE>/load_endian_u16xN::<BE>from feat(be-infra): endian-aware SIMD loaders across 5 backends #81 infrav410_be_and_le_dispatchers_agree,xv36_be_and_le_dispatchers_agree,ayuv64_be_and_le_dispatchers_agreeTest results: 2165 tests pass.
Stacking
Base:
feat/be-infra(#81). Will rebase ontomainonce #81 merges.Test plan
cargo test --features stdcargo build --target x86_64-apple-darwin --tests🤖 Generated with Claude Code