feat(sinker): Ship 8 — Yuv444p RGBA via const-ALPHA template#19
Conversation
There was a problem hiding this comment.
Pull request overview
Note
Copilot was unable to run its full agentic suite in this review.
This PR adds native RGBA output support for YUV 4:4:4 planar (Yuv444p) across the MixedSinker path and row-kernel layer, including scalar and SIMD backends, plus tests to validate correctness and SIMD equivalence.
Changes:
- Add
yuv_444_to_rgba_rowdispatcher and scalar/SIMD implementations (NEON, SSE4.1, AVX2, AVX-512, wasm simd128). - Extend
MixedSinker<Yuv444p>withwith_rgba/set_rgbaand wire RGBA writing inPixelSink. - Add focused tests for Yuv444p RGBA output and SIMD-vs-scalar equivalence.
Reviewed changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| src/sinker/mixed.rs | Adds RGBA buffer attachment for Yuv444p and emits RGBA during sink conversion + tests. |
| src/row/mod.rs | Exposes public yuv_444_to_rgba_row dispatcher with SIMD selection. |
| src/row/scalar.rs | Adds scalar RGBA kernel and refactors RGB/RGBA into a shared const-generic implementation. |
| src/row/arch/x86_sse41.rs | Adds SSE4.1 RGBA kernel and shared RGB/RGBA core + equivalence tests. |
| src/row/arch/x86_avx2.rs | Adds AVX2 RGBA kernel and shared RGB/RGBA core + equivalence tests. |
| src/row/arch/x86_avx512.rs | Adds AVX-512 RGBA kernel and shared RGB/RGBA core + equivalence tests. |
| src/row/arch/wasm_simd128.rs | Adds wasm simd128 RGBA kernel and shared RGB/RGBA core + equivalence tests. |
| src/row/arch/neon.rs | Adds NEON RGBA kernel and shared RGB/RGBA core + equivalence tests. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| debug_assert!(y.len() >= width, "y row too short"); | ||
| debug_assert!(u.len() >= width, "u row too short"); | ||
| debug_assert!(v.len() >= width, "v row too short"); | ||
| debug_assert!(rgb_out.len() >= width * 3, "rgb_out row too short"); | ||
| let bpp: usize = if ALPHA { 4 } else { 3 }; | ||
| debug_assert!(out.len() >= width * bpp, "out row too short for {bpp}bpp"); |
There was a problem hiding this comment.
The debug_assert! message uses {bpp} formatting but does not pass bpp as an argument, which will fail to compile due to an unmatched named format parameter. Pass bpp to the macro (e.g., via positional or named formatting), or escape the braces if you intended them to be literal.
| // Native RGBA: independent kernel run, separate from RGB. Default | ||
| // alpha = 0xFF since Yuv444p has no alpha plane. | ||
| if let Some(buf) = rgba.as_deref_mut() { | ||
| let rgba_plane_end = | ||
| one_plane_end | ||
| .checked_mul(4) | ||
| .ok_or(MixedSinkerError::GeometryOverflow { | ||
| width: w, | ||
| height: h, | ||
| channels: 4, | ||
| })?; | ||
| let rgba_plane_start = one_plane_start * 4; | ||
| yuv_444_to_rgba_row( | ||
| row.y(), | ||
| row.u(), | ||
| row.v(), | ||
| &mut buf[rgba_plane_start..rgba_plane_end], | ||
| w, | ||
| row.matrix(), | ||
| row.full_range(), | ||
| use_simd, | ||
| ); | ||
| } |
There was a problem hiding this comment.
When callers request both RGB and RGBA outputs, this design runs two independent YUV→RGB conversions (one for RGBA here and another later for RGB/HSV), duplicating the core per-pixel math. Consider a combined path when both rgb and rgba are set (single conversion loop that stores into both buffers), or preferentially compute one format and derive the other with minimal overhead, to avoid doubling the hottest work.
Codecov Report❌ Patch coverage is
📢 Thoughts on this report? Let us know! |
) Tranche 4b of Ship 8 sink-side RGBA. Adds `Nv24` / `Nv42` (semi-planar 4:4:4) RGBA output via the dual-const-generic `<SWAP_UV, ALPHA>` template established by PR #17 (NV12 / NV21), and **retro-applies a Strategy A combined RGB→RGBA fan-out to all 8 wired families** so callers attaching both `with_rgb` and `with_rgba` no longer pay the per-pixel YUV→RGB math twice — addresses the Copilot review finding from PR #19 (`src/sinker/mixed.rs:1648`).
) Tranche 4c of Ship 8 sink-side RGBA. Wiring-only PR — adds `Yuv440p` (4:4:0 planar, 8-bit) RGBA output by reusing the `yuv_444_to_rgba_row` dispatcher that shipped in PR #19. **No new kernel code** anywhere in the crate; per-row math is identical to 4:4:4 (full-width chroma) — only the walker reads chroma row `r / 2`.
Tranche 4a of Ship 8 sink-side RGBA. Refactors the Yuv444p planar 4:4:4 kernel family across all 6 backends (scalar + NEON + SSE4.1 + AVX2 + AVX-512 + wasm simd128) using the const-generic-ALPHA template established by PR #16 (Yuv420p) and extended in PR #17 (NV12/NV21).
Tranche 4 was split into three sub-PRs because each format family in it has a different shape:
Yuv444p(planar): own kernel familyyuv_444_to_rgb_row, full refactor neededNv24/Nv42(semi-planar): shared<SWAP_UV, ALPHA>template, mirrors PR feat(sinker): Ship 8 — NV12 / NV21 RGBA via const-ALPHA template #17Yuv440p(planar 4:4:0): wiring-only, reuses 4a'syuv_444_to_rgba_rowScope
Sink-side only, default opaque alpha (
0xFF). Per the tracker indocs/color-conversion-functions.md§ Ship 8:Yuv420pNv12,Nv21Yuv422p,Nv16Yuv444pNv24,Nv42<SWAP_UV, ALPHA>template (mirrors PR #17 NV12/NV21)Yuv440pyuv_444_to_rgba_row)Yuv420p9/10/12/14/16,P010/P012/P016Yuv422p9/10/12/14/16,Yuv440p10/12,P210/P212/P216Yuv444p9/10/12/14/16,P410/P412/P416Usage:
What's in this PR
Public API
MixedSinker<Yuv444p>::with_rgba(&mut [u8])/set_rgba— format-specific impl block.row::yuv_444_to_rgba_row(...)— public dispatcher paralleling the RGB variant.Kernel work
row/scalar.rsyuv_444_to_rgba_row+ sharedyuv_444_to_rgb_or_rgba_row<const ALPHA: bool>templatearch/neon.rsvst4q_u8whenALPHA = true,vst3q_u8otherwisearch/x86_sse41.rswrite_rgba_16from PR #16arch/x86_avx2.rswrite_rgba_32from PR #16arch/x86_avx512.rswrite_rgba_64from PR #16arch/wasm_simd128.rswrite_rgba_16from PR #16The 4:4:4 kernel is structurally simpler than 4:2:0 — one UV pair per Y pixel, no chroma upsampling — so the const-generic-ALPHA refactor is mechanical: only the per-block store branches on
ALPHA. Each kernel has 3 wrappers now (yuv_444_to_rgb_row,yuv_444_to_rgba_row,yuv_444_to_rgb_or_rgba_row) thinning to the same monomorphized template.MixedSinker integration
RGBA runs as an independent kernel call (not compose) — same pattern as Yuv420p (PR #16) and NV12/NV21 (PR #17). Default alpha =
0xFFsince Yuv444p has no alpha plane.Doc updates
docs/color-conversion-functions.md§ Ship 8 — split tranche 4 into 4a (this PR — Yuv444p), 4b (Nv24 / Nv42), 4c (Yuv440p wiring).MixedSinker::<Yuv420p>::with_rgbamoved forward fromYuv444ptoNv24(next not-yet-wired format).Tests
+6 lib tests, total 459 (was 453):
Per-backend tests bypass the dispatcher (call each backend's
unsafe yuv_444_to_rgba_rowdirectly under runtime feature detection) so on AVX-512-capable CI runners all three x86 paths run; the existing CI matrix (avx512SDE + AVX2-max + SSE4.1-max + scalar tarpaulin tiers) covers every backend.Local results (aarch64 macOS): 459 lib tests + 1 doctest pass; wasm32 + x86_64 cross-targets compile clean.
What's deferred
Nv24+Nv42semi-planar 4:4:4 — next PR. Same dual-const-generic shape as PR feat(sinker): Ship 8 — NV12 / NV21 RGBA via const-ALPHA template #17 (NV12/NV21).Yuv440p— wiring-only PR after 4a, reuses this PR'syuv_444_to_rgba_row(4:4:0 = 4:4:4 with half-height chroma).with_rgba_u16ships in tranches 5–7.Test plan
test,test-sde-avx512,cross,coverage,clippy,build,miri-*jobs.colconv_disable_*rustflags.cargo doc --lib --no-depsclean (no new doc warnings vs. main).🤖 Generated with Claude Code