Ship 8b-2b: Yuva420p family u8 RGBA SIMD across all 5 backends by al8n · Pull Request #36 · Findit-AI/colconv

al8n · 2026-04-27T23:06:39Z

Summary

Adds u8 RGBA SIMD across NEON / SSE4.1 / AVX2 / AVX-512 / wasm simd128 for the YUVA 4:2:0 family — 8-bit Yuva420p plus high-bit Yuva420p9 / Yuva420p10 / Yuva420p16
Wires the 4 u8 RGBA dispatchers in `src/row/mod.rs` that landed as scalar-only stubs in PR #35 (Ship 8b-2a: Yuva420p family scalar prep), replacing the `let _ = use_simd` lines with the standard `cfg_select!` per-arch route block
u16 RGBA SIMD for this family is deferred to Ship 8b-2c

Changes

5 SIMD backends: each gains a third const generic `ALPHA_SRC: bool`, added to the existing `<BITS, ALPHA>` (or `<ALPHA>` for 8-bit / 16-bit) templates across 3 kernel families:
- 8-bit: `yuv_420_to_rgb_or_rgba_row<ALPHA, ALPHA_SRC>`
- high-bit BITS-generic: `yuv_420p_n_to_rgb_or_rgba_row<BITS, ALPHA, ALPHA_SRC>`
- 16-bit: `yuv_420p16_to_rgb_or_rgba_row<ALPHA, ALPHA_SRC>`
When `ALPHA_SRC = true`, the kernel reads the source alpha plane, masks it with `bits_mask::<BITS>()` (high-bit only), and depth-converts it for u8 output (a `>> (BITS - 8)` variable shift for the BITS-generic kernels, a literal `>> 8` for 16-bit). 8-bit Yuva420p alpha is already u8, so it is loaded directly via the wide load intrinsic. Existing no-alpha / opaque-alpha wrappers stay backward-compatible by passing `ALPHA_SRC = false, None`.
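As a rough scalar model of the alpha depth conversion described above (the real kernels do this per SIMD lane), the mask-then-shift step might look like the sketch below. The helper name `alpha_to_u8` is hypothetical, and the mask is computed inline rather than through the crate's `bits_mask`; for simplicity a single generic function covers both the BITS-generic and the 16-bit case.

```rust
/// Hypothetical scalar model of the source-alpha path for u8 output.
/// `BITS` is the source bit depth (9..=16); samples are stored in `u16`.
fn alpha_to_u8<const BITS: u32>(a: u16) -> u8 {
    // Keep only the low `BITS` bits (all ones when BITS = 16, so a no-op there).
    let mask = ((1u32 << BITS) - 1) as u16;
    // Drop the low (BITS - 8) bits so the value fits in a byte.
    ((a & mask) >> (BITS - 8)) as u8
}
```

For example, `alpha_to_u8::<10>(1023)` yields 255 and `alpha_to_u8::<16>(0xFFFF)` yields 255, while stray bits above the valid range are discarded before the shift.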
4 u8 RGBA dispatchers wired in `src/row/mod.rs` (`yuva420p_to_rgba_row`, `yuva420p9_to_rgba_row`, `yuva420p10_to_rgba_row`, `yuva420p16_to_rgba_row`) — replace the prior `let _ = use_simd` stubs with the standard `cfg_select!` per-arch route block, mirroring the existing Yuva444p10 dispatchers' patterns. `use_simd = false` still forces scalar.
Per-backend RGBA equivalence tests: 31 new `#[test]` functions across the 5 backend test modules (7 NEON, 6 each on SSE4.1 / AVX2 / AVX-512 / wasm simd128). Each new x86 test returns early when `is_x86_feature_detected!` reports the required feature as missing, so the suite stays clean under sanitizer / Miri / non-feature-flagged CI runners. Pseudo-random alpha flushes out lane-order corruption that a solid-alpha buffer would mask.
Compile-time `const { assert!(!ALPHA_SRC || ALPHA) }` retained on every shared template: source alpha requires RGBA output (a 3 bpp RGB store has nowhere to put alpha).
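To illustrate how the compile-time guard interacts with the two const generics, here is a minimal scalar sketch. `store_pixel` is a hypothetical stand-in and not one of the crate's kernels; it assumes a toolchain with inline `const` blocks (Rust 1.79 or later).

```rust
/// Hypothetical stand-in for a shared kernel template.
/// Appends 3 bytes when ALPHA = false and 4 bytes when ALPHA = true.
fn store_pixel<const ALPHA: bool, const ALPHA_SRC: bool>(
    rgb: [u8; 3],
    src_alpha: Option<u8>,
    out: &mut Vec<u8>,
) {
    // Fails the build when instantiated with ALPHA_SRC = true and ALPHA = false.
    const { assert!(!ALPHA_SRC || ALPHA) };

    out.extend_from_slice(&rgb);
    if ALPHA {
        let a = if ALPHA_SRC {
            // Source alpha must be supplied whenever ALPHA_SRC is set.
            src_alpha.expect("ALPHA_SRC = true requires a source alpha value")
        } else {
            0xFF // opaque alpha
        };
        out.push(a);
    }
}
```

The three valid instantiations are `<false, false>` (RGB), `<true, false>` (RGBA with opaque alpha) and `<true, true>` (RGBA with source alpha); `store_pixel::<false, true>` would be rejected when the function is monomorphized.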

Test plan

`cargo check --lib --tests` (aarch64) — clean
`cargo test --lib` (aarch64) — 624 passed (+7 NEON)
`RUSTFLAGS=-Dwarnings cargo clippy --lib --tests` (aarch64) — clean
`cargo check --target x86_64-unknown-freebsd --lib --tests` — clean
`RUSTFLAGS=-Dwarnings cargo clippy --target x86_64-unknown-freebsd --lib --tests` — clean
`cargo check --target wasm32-unknown-unknown --lib --tests` — clean

Follow-up

Ship 8b-2c: u16 RGBA SIMD for the same Yuva420p family (extends `yuv_420p_n_to_rgb_or_rgba_u16_row` and `yuv_420p16_to_rgb_or_rgba_u16_row` with the third `ALPHA_SRC` const generic). The `*_to_rgba_u16_row` dispatchers in `src/row/mod.rs` remain scalar-only until then.

🤖 Generated with Claude Code


`16 - BITS` is already `u32` (BITS is `const BITS: u32`), so the trailing `as u32` was a no-op. Clippy's `unnecessary_cast` (`u32` → `u32`) flagged all 4 occurrences in `wasm_simd128.rs` (lines 1904 / 2082 / 4075 / 4253) as errors under `RUSTFLAGS=-Dwarnings`. These predate this branch, but were exposed once `clippy --target wasm32-unknown-unknown --lib --tests` ran clean otherwise. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
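The shape of the expression that clippy flagged can be sketched as follows. The function name `shift_for` is hypothetical and only illustrates the typing; it is not taken from `wasm_simd128.rs`.

```rust
/// Hypothetical helper showing why the cast was redundant.
fn shift_for<const BITS: u32>() -> u32 {
    // `BITS` is a `u32` const generic, so `16 - BITS` is already `u32`.
    // Writing `(16 - BITS) as u32` here would trigger clippy's
    // `unnecessary_cast` lint, which becomes an error under -Dwarnings.
    16 - BITS
}
```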

Copilot

Pull request overview

Adds SIMD-accelerated u8 RGBA conversion for the YUVA 4:2:0 family (Yuva420p / Yuva420p9 / Yuva420p10 / Yuva420p16) across all supported SIMD backends, wiring the previously scalar-only dispatchers and adding backend equivalence tests.

Changes:

Wire yuva420p*_to_rgba_row u8 dispatchers in src/row/mod.rs to per-arch SIMD wrappers via cfg_select! (with scalar fallback when use_simd = false or SIMD isn’t available).
Extend each SIMD backend’s shared 4:2:0 kernels with a third const generic ALPHA_SRC and add *_with_alpha_src_row wrappers that read/depth-convert the source alpha plane.
Add per-backend SIMD-vs-scalar equivalence tests for YUVA 4:2:0 u8 RGBA paths (including varying alpha seeds).
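Why varying alpha matters in these equivalence tests can be shown with a small scalar sketch. All names here are hypothetical: `pack_alpha_lane_swapped` deliberately mimics a wrong shuffle, and `varying_alpha` is a cheap deterministic pattern standing in for whatever seeded generator the real tests use.

```rust
/// Reference: interleave per-pixel alpha into RGBA (RGB left at zero for brevity).
fn pack_alpha_reference(alpha: &[u8]) -> Vec<u8> {
    alpha.iter().flat_map(|&a| [0, 0, 0, a]).collect()
}

/// Deliberately buggy variant: swaps each adjacent pair of alpha lanes,
/// mimicking a lane-order mistake in a SIMD kernel.
fn pack_alpha_lane_swapped(alpha: &[u8]) -> Vec<u8> {
    let mut swapped = alpha.to_vec();
    for pair in swapped.chunks_exact_mut(2) {
        pair.swap(0, 1);
    }
    pack_alpha_reference(&swapped)
}

/// Deterministic varying pattern: neighbouring values always differ by 37 (mod 256).
fn varying_alpha(len: usize, seed: u32) -> Vec<u8> {
    (0..len)
        .map(|i| (i as u32).wrapping_mul(37).wrapping_add(seed) as u8)
        .collect()
}
```

With a solid buffer such as `vec![0xFF; 16]` both functions return identical output, so the lane swap goes unnoticed; with `varying_alpha(16, 7)` the outputs differ and a comparison against the reference fails.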

Reviewed changes

Copilot reviewed 11 out of 11 changed files in this pull request and generated no comments.


| File | Description |
| --- | --- |
| src/row/mod.rs | Replaces scalar-only stubs for YUVA 4:2:0 u8 RGBA dispatchers with per-arch SIMD routing + scalar fallback; keeps u16 RGBA dispatchers scalar. |
| src/row/arch/neon.rs | Adds `ALPHA_SRC` support and YUVA420 u8 RGBA-with-alpha-source wrappers to NEON kernels. |
| src/row/arch/neon/tests.rs | Adds NEON SIMD-vs-scalar equivalence tests for YUVA 4:2:0 u8 RGBA with source alpha. |
| src/row/arch/x86_sse41.rs | Adds `ALPHA_SRC` support and YUVA420 u8 RGBA-with-alpha-source wrappers to SSE4.1 kernels. |
| src/row/arch/x86_sse41/tests.rs | Adds SSE4.1 SIMD-vs-scalar equivalence tests for YUVA 4:2:0 u8 RGBA with source alpha. |
| src/row/arch/x86_avx2.rs | Adds `ALPHA_SRC` support and YUVA420 u8 RGBA-with-alpha-source wrappers to AVX2 kernels. |
| src/row/arch/x86_avx2/tests.rs | Adds AVX2 SIMD-vs-scalar equivalence tests for YUVA 4:2:0 u8 RGBA with source alpha. |
| src/row/arch/x86_avx512.rs | Adds `ALPHA_SRC` support and YUVA420 u8 RGBA-with-alpha-source wrappers to AVX-512BW kernels. |
| src/row/arch/x86_avx512/tests.rs | Adds AVX-512 SIMD-vs-scalar equivalence tests for YUVA 4:2:0 u8 RGBA with source alpha. |
| src/row/arch/wasm_simd128.rs | Adds `ALPHA_SRC` support and YUVA420 u8 RGBA-with-alpha-source wrappers to wasm simd128 kernels. |
| src/row/arch/wasm_simd128/tests.rs | Adds wasm simd128 SIMD-vs-scalar equivalence tests for YUVA 4:2:0 u8 RGBA with source alpha. |


Copilot

Pull request overview

Copilot reviewed 11 out of 11 changed files in this pull request and generated no new comments.


Copilot AI review requested due to automatic review settings April 27, 2026 23:06

Copilot started reviewing on behalf of al8n April 27, 2026 23:07 View session

uqio and others added 2 commits April 28, 2026 11:07

update

2d90e19

Copilot AI reviewed Apr 27, 2026


al8n requested a review from Copilot April 27, 2026 23:18

al8n merged commit e0392c7 into main Apr 27, 2026
45 checks passed

al8n deleted the feat/ship8b-2b-yuva420p-family-u8-simd branch April 27, 2026 23:18

Copilot started reviewing on behalf of al8n April 27, 2026 23:18 View session

Copilot AI reviewed Apr 27, 2026


al8n mentioned this pull request Apr 27, 2026

Ship 8b-2c: Yuva420p family u16 RGBA SIMD across all 5 backends #37

Merged

7 tasks

al8n merged 3 commits into main from feat/ship8b-2b-yuva420p-family-u8-simd

3 participants
