feat(be-tier11): BE support for Gray9-16 / Ya16 / Grayf32 row kernels #85
Open
Add `<const BE: bool>` to all scalar, NEON, SSE4.1, AVX2, AVX512, and wasm-simd128 kernels for the Gray9/10/12/14/16, Ya16, and Grayf32 formats. Dispatchers and sinker callers thread BE through; the sinker hardcodes `false` pending future Frame-level BE plumbing. BE parity tests are added to every SIMD backend (luma path) plus scalar.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
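As a rough illustration of how a `const BE: bool` parameter can be threaded from the sinker through a dispatcher into a row kernel, here is a minimal sketch. All function names (`gray16_to_luma8_row`, `dispatch_gray16_to_luma8`, `sink_gray16_row`) and the "take the high byte" conversion are assumptions made for this example, not the crate's actual API.

```rust
/// Hypothetical scalar kernel: Gray16 samples to 8-bit luma (high byte only).
/// `BE` states whether `src` holds big-endian wire-format samples.
fn gray16_to_luma8_row<const BE: bool>(src: &[u16], dst: &mut [u8]) {
    for (d, &raw) in dst.iter_mut().zip(src) {
        // Convert the wire-order sample to a host-native value.
        let v = if BE { u16::from_be(raw) } else { u16::from_le(raw) };
        *d = (v >> 8) as u8;
    }
}

/// Hypothetical dispatcher: forwards `BE` unchanged to the chosen backend.
fn dispatch_gray16_to_luma8<const BE: bool>(src: &[u16], dst: &mut [u8]) {
    // A real dispatcher would pick NEON / SSE4.1 / AVX2 / ... here;
    // this sketch only has the scalar backend.
    gray16_to_luma8_row::<BE>(src, dst);
}

/// Hypothetical sinker entry point: BE is pinned to `false` until the
/// frame type can carry endianness information.
fn sink_gray16_row(src: &[u16], dst: &mut [u8]) {
    dispatch_gray16_to_luma8::<false>(src, dst);
}
```

Because `BE` is a const generic, each instantiation is compiled separately and the `if BE` branch is resolved at compile time, so the LE path should carry no runtime cost for the extra parameter.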
The scalar BE branches in Gray9-16, Ya16, and Grayf32 row kernels used
unconditional `swap_bytes()` regardless of host endianness. On a BE host
(s390x), the LE branch passed input through as-is and the BE branch
swapped — both inverted from what the SIMD `load_endian_*::<BE>` helpers
do, so every Tier 11 BE row was corrupted on BE hosts.
Fix: replace `if BE { x.swap_bytes() } else { x }` with target-endian
aware conversions:
- u16 (gray9-16, ya16): `if BE { u16::from_be(x) } else { u16::from_le(x) }`
- f32 (grayf32): `if BE { f32::from_bits(u32::from_be(x.to_bits())) }
else { f32::from_bits(u32::from_le(x.to_bits())) }`
`u16::from_le` and `u32::from_le` compile to no-ops on LE hosts, so the
LE-host fast path keeps the same machine code. On BE hosts both halves
of the branch now correctly produce a host-native value, matching the
SIMD helpers.
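The difference between the two forms can be sketched in isolation. The helper names below (`decode_u16`, `decode_f32`, `decode_u16_old`) are hypothetical; only the `from_be` / `from_le` / `swap_bytes` expressions come from the commit message.

```rust
/// Decode one u16 sample whose in-memory bytes are in wire order.
/// `from_be` / `from_le` are no-ops when the wire order matches the host.
#[inline(always)]
fn decode_u16<const BE: bool>(raw: u16) -> u16 {
    if BE { u16::from_be(raw) } else { u16::from_le(raw) }
}

/// Same idea for f32: convert the bit pattern, then reinterpret as a float.
#[inline(always)]
fn decode_f32<const BE: bool>(raw: f32) -> f32 {
    let bits = raw.to_bits();
    f32::from_bits(if BE { u32::from_be(bits) } else { u32::from_le(bits) })
}

/// The pre-fix form, kept only for comparison. It assumes a little-endian
/// host: on a big-endian host both branches give the wrong result.
#[inline(always)]
fn decode_u16_old<const BE: bool>(raw: u16) -> u16 {
    if BE { raw.swap_bytes() } else { raw }
}
```

On a little-endian host `decode_u16` and `decode_u16_old` agree for both values of `BE`; they diverge only on a big-endian host, which is why the bug would not show up in LE-only CI.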
Test helpers that synthesize BE buffers from LE input
(`v.swap_bytes()` / `v.to_bits().swap_bytes()` fixtures) are intentionally
left untouched — they encode the wire format and remain a one-direction
LE-host-side simulation.
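For context, a fixture-based parity check of the kind described might look like the sketch below. The helper names and the stand-in kernel `widen_row` are assumptions for illustration; the actual test helpers in the PR are not shown in this text.

```rust
/// Hypothetical fixture helper: byte-swap each sample to simulate a
/// big-endian buffer from little-endian input.
fn be_fixture_u16(le: &[u16]) -> Vec<u16> {
    le.iter().map(|v| v.swap_bytes()).collect()
}

/// Hypothetical f32 counterpart: swap the bytes of the bit pattern.
fn be_fixture_f32(le: &[f32]) -> Vec<f32> {
    le.iter()
        .map(|v| f32::from_bits(v.to_bits().swap_bytes()))
        .collect()
}

/// Minimal stand-in for a production kernel (an assumption, not PR code):
/// decode each sample and widen it to u32.
fn widen_row<const BE: bool>(src: &[u16], dst: &mut [u32]) {
    for (d, &raw) in dst.iter_mut().zip(src) {
        let v = if BE { u16::from_be(raw) } else { u16::from_le(raw) };
        *d = u32::from(v);
    }
}

/// Parity check: the BE kernel on the swapped fixture should match the
/// LE kernel on the original input.
fn be_parity_holds(le: &[u16]) -> bool {
    let mut le_out = vec![0u32; le.len()];
    let mut be_out = vec![0u32; le.len()];
    widen_row::<false>(le, &mut le_out);
    widen_row::<true>(&be_fixture_u16(le), &mut be_out);
    le_out == be_out
}
```

The fixture deliberately uses an unconditional `swap_bytes()`: it describes how the bytes sit on the wire, not how the host interprets them, so it does not need the host-aware conversion that the production kernels use.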
Call sites fixed:
- u16 (gray.rs + ya16.rs): 23 production sites
- f32 (grayf32.rs): 9 production sites
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
Phase 2: Tier 11 BE rollout. Stacked on #81. Adds `<const BE: bool>` to all Gray9-16, Ya16, and Grayf32 row kernels across all 6 backends + dispatcher.

Implementation:
- `swap_bytes()` per element (Gray9/10/12/14/16, Ya16); `f32::from_bits(raw.to_bits().swap_bytes())` for Grayf32
- `load_endian_u16x8::<BE>`; f32 loads via `vreinterpretq_f32_u32(load_endian_u32x4::<BE>(...))`. Ya16 NEON: `if BE { return scalar; }` guard, since `vld2q_u16` has no endian-aware variant available without invasive deinterleave restructuring; BE Ya16 falls through to scalar (rare path)
- `_mm*_castsi*_ps(load_endian_u32xN::<BE>(...))` for f32; `load_endian_u16xN::<BE>` for u16
- `load_endian_u16x8::<BE>` / `load_endian_u32x4::<BE>`

Test results: 2176 tests pass. `cargo build` (all-features, no-default-features, x86_64-apple-darwin), clippy, fmt all clean.

Stacking
Base: `feat/be-infra` (#81). Will rebase onto `main` once #81 merges.

Test plan
- `cargo test --target aarch64-apple-darwin`
- `cargo build --target x86_64-apple-darwin --tests`
- `cargo build --no-default-features` (alloc-only)

🤖 Generated with Claude Code
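The Ya16 NEON guard mentioned in the implementation notes (`if BE { return scalar; }`) can be pictured with the portable sketch below. It uses no real NEON intrinsics: `ya16_luma_row_fast` is a hypothetical stand-in for the vector kernel, and `ya16_luma_row_scalar` is an assumed scalar fallback that extracts luma from interleaved (Y, A) pairs.

```rust
/// Hypothetical scalar Ya16 kernel: copy the decoded Y sample out of each
/// interleaved (Y, A) pair.
fn ya16_luma_row_scalar<const BE: bool>(src: &[u16], dst: &mut [u16]) {
    for (d, pair) in dst.iter_mut().zip(src.chunks_exact(2)) {
        let raw = pair[0];
        *d = if BE { u16::from_be(raw) } else { u16::from_le(raw) };
    }
}

/// Stand-in for the vector kernel. A real NEON version would use a
/// deinterleaving load for the LE path; this sketch keeps a plain loop.
fn ya16_luma_row_fast<const BE: bool>(src: &[u16], dst: &mut [u16]) {
    if BE {
        // No endian-aware deinterleaving load is assumed to be available,
        // so the whole BE row is deferred to the scalar kernel.
        return ya16_luma_row_scalar::<BE>(src, dst);
    }
    // Placeholder for the vectorized LE body.
    for (d, pair) in dst.iter_mut().zip(src.chunks_exact(2)) {
        *d = u16::from_le(pair[0]);
    }
}
```

Since `BE` is a compile-time constant, the guard costs nothing in the LE instantiation; the trade-off is that BE Ya16 rows run at scalar speed, which the description accepts as a rare path.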