feat(sinker): Ship 8 — NV12 / NV21 RGBA via const-ALPHA template by uqio · Pull Request #17 · Findit-AI/colconv

uqio · 2026-04-26T03:54:46Z

Tranche 2 of Ship 8 sink-side RGBA. Mass-applies the const-generic-ALPHA template proven by PR #16 to the 4:2:0 semi-planar family — NV12 (UV-ordered, FFmpeg's nv12, the most common HW-decoder output across CUDA / NVDEC / VideoToolbox / VAAPI / Android MediaCodec / QSV) and NV21 (VU-ordered, Android camera default).

Same shape as PR 1: kernel const-generic refactor + format-specific with_rgba impl block + MixedSinker wiring + per-backend equivalence tests + format-level tests. No new SIMD code outside the kernel — the write_rgba_* helpers shipped with PR 1 are reused.

Scope

Sink-side only, default opaque alpha (0xFF). Source-side YUVA is Ship 8b (separate follow-up). Per the tracker in docs/color-conversion-functions.md § Ship 8, this is tranche 2 of 7+:

#	Tranche	Formats	Status
1	4:2:0 planar	`Yuv420p`	✅ shipped (PR #16)
2	4:2:0 semi-planar	`Nv12`, `Nv21`	⏳ this PR
3	4:2:2 planar + semi-planar	`Yuv422p`, `Nv16`	next
4	4:4:4 + 4:4:0	`Yuv444p`, `Nv24`, `Nv42`, `Yuv440p`
5	High-bit-depth 4:2:0	`Yuv420p9/10/12/14/16`, `P010/P012/P016`
6	High-bit-depth 4:2:2	`Yuv422p9/10/12/14/16`, `Yuv440p10/12`, `P210/P212/P216`
7	High-bit-depth 4:4:4	`Yuv444p9/10/12/14/16`, `P410/P412/P416`

Usage:

use colconv::{
    frame::{Nv12Frame, Nv21Frame},
    sinker::MixedSinker,
    yuv::{Nv12, Nv21, nv12_to, nv21_to},
    ColorMatrix,
};

let frame = Nv12Frame::new(&y_plane, &uv_plane, w, h, w, w);
let mut rgba = vec![0u8; (w * h * 4) as usize];
let mut sinker = MixedSinker::<Nv12>::new(w as usize, h as usize)
    .with_rgba(&mut rgba)?;

nv12_to(&frame, /*full_range=*/ true, ColorMatrix::Bt709, &mut sinker)?;
// rgba[4*i..4*i+3] = R,G,B; rgba[4*i+3] = 0xFF.

MixedSinker::<Nv21> works identically with Nv21Frame + nv21_to.

What's in this PR

Public API

MixedSinker<Nv12>::with_rgba(&mut [u8]) / set_rgba — format-specific impl block, mirrors the Yuv420p pattern.
MixedSinker<Nv21>::with_rgba(&mut [u8]) / set_rgba — same.
row::nv12_to_rgba_row(...) and row::nv21_to_rgba_row(...) — public dispatchers paralleling the RGB variants.

Kernel work

The shared NV12/NV21 kernel was already const-generic on <const SWAP_UV: bool> (NV12 = false, NV21 = true). This PR adds a second const generic <const ALPHA: bool>:

File	Refactored function
`row/scalar.rs`	`nv12_or_nv21_to_rgb_or_rgba_row_impl<SWAP_UV, ALPHA>`
`row/arch/neon.rs`	Same shape; uses `vst4q_u8` when `ALPHA` (native AArch64 4-channel store)
`row/arch/x86_sse41.rs`	Same shape; reuses `write_rgba_16` from PR 1
`row/arch/x86_avx2.rs`	Same shape; reuses `write_rgba_32` from PR 1
`row/arch/x86_avx512.rs`	Same shape; reuses `write_rgba_64` from PR 1
`row/arch/wasm_simd128.rs`	Same shape; reuses wasm `write_rgba_16` from PR 1

Each kernel has 4 wrappers now (nv12_to_rgb_row, nv12_to_rgba_row, nv21_to_rgb_row, nv21_to_rgba_row) thinning to the same monomorphized template. The compiler DCE's the unused branches at every call site.

MixedSinker integration

RGBA runs as an independent kernel call (not compose) — same as Yuv420p in PR 1. Default alpha = 0xFF since neither NV12 nor NV21 carries an alpha plane.

Doc updates

docs/color-conversion-functions.md § Ship 8:

Marked PR 1 (Yuv420p RGBA) as ✅ shipped (PR feat(sinker): Ship 8 — Yuv420p RGBA output via const-ALPHA template #16).
Documented the const-generic-ALPHA strategy choice (option 3 over compose / fully forked native — same speed as native, ~1/5th the LOC).
Added the tranche tracker (this PR is tranche 2 of 7+).
Promoted Ship 8b source-side YUVA to a separate follow-up.
Added the format-specific impl scoping rationale (the load-bearing safety property pinned by the compile_fail doctest).

Tests

+14 lib tests, total 443 (was 429):

Layer	Tests added
Format-level NV12	5: gray-to-gray + opaque alpha, RGB-byte invariant, buffer-too-short, random-YUV SIMD parity, cross-format invariant against Yuv420p RGBA
Format-level NV21	4: gray-to-gray + opaque alpha, RGB-byte invariant, random-YUV SIMD parity, cross-format invariant against NV12 RGBA (with byte-swapped chroma — catches SWAP_UV bugs specific to the new RGBA path)
NEON per-backend	5: 16-pixel all-matrices, varied widths, NV12-vs-Yuv420p RGBA cross-format, NV21 16-pixel all-matrices, NV21 widths
SSE4.1 per-backend (CI)	5: same shape
AVX2 per-backend (CI)	5: 32-pixel main + tail widths + cross-format
AVX-512 per-backend (CI)	5: 64-pixel main + tail widths + cross-format
wasm simd128 per-backend (CI)	4: 16-pixel main + tail widths

Per-backend tests bypass the dispatcher — call each backend's unsafe nv12_to_rgba_row / nv21_to_rgba_row directly under runtime feature detection. So on AVX-512-capable CI runners, all three x86 paths run; the existing CI matrix (avx512 SDE + AVX2-max + SSE4.1-max + scalar tarpaulin tiers) covers every backend.

Compile-fail doctest update — PR 1's doctest used MixedSinker::<Nv12>::with_rgba as the negative example. Since NV12 now has RGBA wired, the negative example moves to MixedSinker::<Nv16> (tranche 3, not yet wired). Each future tranche moves it forward similarly.

Local results (aarch64 macOS): 443 lib tests + 1 doctest pass; wasm32 + x86_64 cross-targets compile clean.

What's deferred

Tranche 3+ — Yuv422p / NV16 next, then Yuv444p / NV24 / NV42 / Yuv440p, then high-bit-depth families. Same template, same test shape per tranche.
with_rgba_u16 ships in tranches 5–7 (where u16 output exists).
YUVA source frames (Ship 8b) — when the source carries an alpha plane, the kernel copies it through instead of writing the opaque default. Independent of the sink-side mass-apply.
Bench — nv12_to_rgba vs. nv12_to_rgb. Deferred until enough tranches have shipped to make a comparative table meaningful.

Test plan

CI green on test, test-sde-avx512, cross, coverage, clippy, build jobs.
Per-tier coverage matrix exercises SSE4.1 / AVX2 / scalar paths via existing colconv_disable_* rustflags.
Verify NV12 → RGBA pipeline end-to-end with a real frame (gray + non-gray patches).
cargo doc --lib --no-deps clean (no new doc warnings vs. main).

🤖 Generated with Claude Code

codecov · 2026-04-26T04:15:36Z

Codecov Report

❌ Patch coverage is 82.53165% with 69 lines in your changes missing coverage. Please review.

Files with missing lines	Patch %	Lines
src/row/mod.rs	68.00%	16 Missing ⚠️
src/row/arch/neon.rs	79.41%	14 Missing ⚠️
src/row/arch/x86_avx2.rs	81.81%	12 Missing ⚠️
src/row/arch/x86_avx512.rs	81.81%	12 Missing ⚠️
src/row/arch/x86_sse41.rs	81.81%	12 Missing ⚠️
src/sinker/mixed.rs	94.44%	3 Missing ⚠️

📢 Thoughts on this report? Let us know!

Copilot

Pull request overview

Adds sink-side RGBA output support for the 8‑bit 4:2:0 semi‑planar formats NV12 and NV21, reusing the const-generic ALPHA template approach introduced previously and wiring it through MixedSinker plus the row-kernel dispatch layer and backend kernels.

Changes:

Add MixedSinker::<Nv12>::with_rgba / set_rgba and MixedSinker::<Nv21>::with_rgba / set_rgba, and run native NV12/NV21→RGBA kernels during process.
Add public dispatchers row::nv12_to_rgba_row and row::nv21_to_rgba_row.
Refactor NV12/NV21 scalar + SIMD kernels to a shared <const SWAP_UV: bool, const ALPHA: bool> implementation, plus add/extend equivalence tests.

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 4 comments.

Show a summary per file

File	Description
src/sinker/mixed.rs	Adds NV12/NV21 `with_rgba`/`set_rgba` APIs, wires RGBA kernel calls into `process`, updates compile-fail doctest, adds format-level tests.
src/row/scalar.rs	Adds NV12/NV21 RGBA scalar wrappers and refactors shared scalar kernel to `<SWAP_UV, ALPHA>`.
src/row/mod.rs	Introduces public NV12/NV21→RGBA row dispatchers with SIMD selection and scalar fallback.
src/row/arch/neon.rs	Refactors NEON NV12/NV21 kernel to `<SWAP_UV, ALPHA>` and adds RGBA wrappers + tests.
src/row/arch/x86_sse41.rs	Refactors SSE4.1 NV12/NV21 kernel to `<SWAP_UV, ALPHA>` and adds RGBA wrappers + tests.
src/row/arch/x86_avx2.rs	Refactors AVX2 NV12/NV21 kernel to `<SWAP_UV, ALPHA>` and adds RGBA wrappers + tests.
src/row/arch/x86_avx512.rs	Refactors AVX-512 NV12/NV21 kernel to `<SWAP_UV, ALPHA>` and adds RGBA wrappers + tests.
src/row/arch/wasm_simd128.rs	Refactors wasm simd128 NV12/NV21 kernel to `<SWAP_UV, ALPHA>` and adds RGBA wrappers + tests.

Comments suppressed due to low confidence (1)

src/row/arch/x86_avx2.rs:1533

These unsafe NV12/NV21 wrapper functions (and the RGBA wrappers below) lack a /// # Safety section, while other unsafe wrappers in this file document their contracts. Please add a Safety section (or forward to the shared impl’s Safety contract) stating the AVX2 feature requirement plus width/slice length invariants to prevent accidental UB by direct callers.

/// AVX2 NV12 → packed RGB. Thin wrapper over
/// [`nv12_or_nv21_to_rgb_or_rgba_row_impl`] with
/// `SWAP_UV = false, ALPHA = false`.
#[inline]
#[target_feature(enable = "avx2")]
pub(crate) unsafe fn nv12_to_rgb_row(
  y: &[u8],
  uv_half: &[u8],

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-04-26T04:23:56Z

-/// Same as [`nv12_or_nv21_to_rgb_row_impl`].
+/// WASM simd128 NV12 → packed RGB. Thin wrapper over
+/// [`nv12_or_nv21_to_rgb_or_rgba_row_impl`] with
+/// `SWAP_UV = false, ALPHA = false`.


The NV12/NV21 unsafe wrapper functions here (and the new RGBA wrappers) are missing a /// # Safety section, while other unsafe wrappers in this file include one. Please add a Safety section (or forward to the shared impl’s Safety contract) stating that simd128 must be enabled and that width/slice length invariants must hold, to avoid accidental UB by direct callers.

Suggested change

/// `SWAP_UV = false, ALPHA = false`.

/// `SWAP_UV = false, ALPHA = false`.

///

/// # Safety

///

/// Same safety contract as

/// [`nv12_or_nv21_to_rgb_or_rgba_row_impl`]: `simd128` must be

/// enabled at compile time, and the caller must uphold that `width`

/// and all slice lengths satisfy the implementation's invariants so

/// that its unchecked SIMD loads/stores remain in-bounds.

Copilot · 2026-04-26T04:23:56Z

-/// Same as [`nv12_or_nv21_to_rgb_row_impl`].
+/// NEON NV12 → packed RGB (UV-ordered chroma). Thin wrapper over
+/// [`nv12_or_nv21_to_rgb_or_rgba_row_impl`] with
+/// `SWAP_UV = false, ALPHA = false`.


The unsafe NV12/NV21 wrapper fns (nv12_to_rgb_row / nv21_to_rgb_row and the new RGBA variants) don't have a /// # Safety section, but similar wrappers in this module (e.g. nv24_to_rgb_row) do. Please document the Safety contract (or explicitly forward to nv12_or_nv21_to_rgb_or_rgba_row_impl’s Safety) so direct callers know they must ensure NEON availability and uphold width/slice length invariants.

Suggested change

/// `SWAP_UV = false, ALPHA = false`.

/// `SWAP_UV = false, ALPHA = false`.

///

/// # Safety

///

/// The caller must uphold the same safety contract as

/// [`nv12_or_nv21_to_rgb_or_rgba_row_impl`]:

///

/// 1. **NEON must be available on the current CPU.** If this wrapper

/// is called directly, the caller is responsible for ensuring that

/// executing NEON instructions is valid.

/// 2. `width & 1 == 0` (4:2:0/NV12 requires even width).

/// 3. `y.len() >= width`.

/// 4. `uv_half.len() >= width`.

/// 5. `rgb_out.len() >= 3 * width`.

Copilot · 2026-04-26T04:23:57Z

-/// Same as [`nv12_or_nv21_to_rgb_row_impl`].
+/// SSE4.1 NV12 → packed RGB. Thin wrapper over
+/// [`nv12_or_nv21_to_rgb_or_rgba_row_impl`] with
+/// `SWAP_UV = false, ALPHA = false`.


The new unsafe NV12/NV21 wrapper fns here (and the RGBA variants below) don't include a /// # Safety section, unlike other unsafe row wrappers in this module (e.g. yuv_420_to_* and nv24_to_rgb_row). Please add a Safety section (or explicitly forward to the shared impl’s Safety contract) so callers know they must ensure SSE4.1 is available and slices meet the length/width invariants.

Suggested change

/// `SWAP_UV = false, ALPHA = false`.

/// `SWAP_UV = false, ALPHA = false`.

///

/// # Safety

/// Callers must ensure SSE4.1 is available on the current CPU before

/// invoking this function.

///

/// In addition, the slice/width preconditions are the same as for

/// [`nv12_or_nv21_to_rgb_or_rgba_row_impl`]: `y` must contain at least

/// `width` bytes, `uv_half` must contain enough interleaved chroma data

/// for the row width, and `rgb_out` must contain at least `width * 3`

/// bytes.

Copilot · 2026-04-26T04:23:57Z

-/// Same as [`nv12_or_nv21_to_rgb_row_impl`].
+/// AVX‑512 NV12 → packed RGB. Thin wrapper over
+/// [`nv12_or_nv21_to_rgb_or_rgba_row_impl`] with
+/// `SWAP_UV = false, ALPHA = false`.


The unsafe NV12/NV21 wrapper fns added/updated here (including the new RGBA wrappers) don't document a /// # Safety contract, unlike many other unsafe kernels in this module. Please add a Safety section (or explicitly refer to nv12_or_nv21_to_rgb_or_rgba_row_impl’s Safety) covering the AVX-512F/BW CPU feature requirement and the width/slice length invariants.

Suggested change

/// `SWAP_UV = false, ALPHA = false`.

/// `SWAP_UV = false, ALPHA = false`.

///

/// # Safety

/// The caller must ensure that the current CPU supports `avx512f` and

/// `avx512bw`.

///

/// In addition, all slice-length and `width` invariants required by

/// [`nv12_or_nv21_to_rgb_or_rgba_row_impl`] must hold for `y`, `uv_half`,

/// and `rgb_out`.

Copilot

Pull request overview

Copilot reviewed 12 out of 12 changed files in this pull request and generated no new comments.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

#18) Tranche 3 of Ship 8 sink-side RGBA. **Wiring-only PR** — no new SIMD code. The 4:2:2 formats reuse the 4:2:0 kernels from tranches 1 + 2: `Yuv422p` calls `yuv_420_to_rgba_row` (already shipped in PR #16), and `Nv16` calls `nv12_to_rgba_row` (already shipped in PR #17). 4:2:2's per-row contract is identical to 4:2:0's — half-width chroma, one pair per Y pair — so the same kernel handles both with no changes.

Tranche 4a of Ship 8 sink-side RGBA. Refactors the Yuv444p planar 4:4:4 kernel family across all 6 backends (scalar + NEON + SSE4.1 + AVX2 + AVX-512 + wasm simd128) using the const-generic-ALPHA template established by PR #16 (Yuv420p) and extended in PR #17 (NV12/NV21).

) Tranche 4b of Ship 8 sink-side RGBA. Adds `Nv24` / `Nv42` (semi-planar 4:4:4) RGBA output via the dual-const-generic `<SWAP_UV, ALPHA>` template established by PR #17 (NV12 / NV21), and **retro-applies a Strategy A combined RGB→RGBA fan-out to all 8 wired families** so callers attaching both `with_rgb` and `with_rgba` no longer pay the per-pixel YUV→RGB math twice — addresses the Copilot review finding from PR #19 (`src/sinker/mixed.rs:1648`).

update

44a23b9

al8n changed the title ~~feat~~ feat(sinker): Ship 8 — Yuv420p (PR 1 follow-up) NV12 / NV21 RGBA via const-ALPHA template Apr 26, 2026

al8n changed the title ~~feat(sinker): Ship 8 — Yuv420p (PR 1 follow-up) NV12 / NV21 RGBA via const-ALPHA template~~ feat(sinker): Ship 8 — NV12 / NV21 RGBA via const-ALPHA template Apr 26, 2026

al8n requested a review from Copilot April 26, 2026 04:14

Copilot started reviewing on behalf of al8n April 26, 2026 04:18 View session

Copilot AI reviewed Apr 26, 2026

View reviewed changes

update

d8c0f64

al8n requested a review from Copilot April 26, 2026 04:40

Copilot started reviewing on behalf of al8n April 26, 2026 04:41 View session

Copilot AI reviewed Apr 26, 2026

View reviewed changes

uqio merged commit bd51e47 into main Apr 26, 2026
7 of 70 checks passed

uqio deleted the feat/ship8-rgba-nv12-nv21 branch April 26, 2026 04:46

al8n mentioned this pull request Apr 26, 2026

feat(sinker): Ship 8 — Yuv422p / NV16 RGBA (kernel reuse, no new SIMD) #18

Merged

3 tasks

al8n mentioned this pull request Apr 26, 2026

feat(sinker): Ship 8 — Yuv444p RGBA via const-ALPHA template #19

Merged

4 tasks

al8n mentioned this pull request Apr 26, 2026

feat(sinker): Ship 8 — Nv24/Nv42 RGBA + Strategy A RGB→RGBA fan-out #20

Merged

4 tasks

This was referenced Apr 26, 2026

feat(sinker): Ship 8 — Yuv440p RGBA wiring (reuses Yuv444p kernels) #22

Merged

feat(row): Ship 8 — high-bit 4:2:0 RGBA scalar (SIMD lands in 5a/5b) #24

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(sinker): Ship 8 — NV12 / NV21 RGBA via const-ALPHA template#17

feat(sinker): Ship 8 — NV12 / NV21 RGBA via const-ALPHA template#17
uqio merged 2 commits intomainfrom
feat/ship8-rgba-nv12-nv21

uqio commented Apr 26, 2026 •

edited by al8n

Loading

Uh oh!

codecov Bot commented Apr 26, 2026 •

edited

Loading

Uh oh!

Copilot AI left a comment

Uh oh!

Copilot AI Apr 26, 2026

Uh oh!

Copilot AI Apr 26, 2026

Uh oh!

Copilot AI Apr 26, 2026

Uh oh!

Copilot AI Apr 26, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

-/// `SWAP_UV = false, ALPHA = false`.
+/// `SWAP_UV = false, ALPHA = false`.
+///
+/// # Safety
+///
+/// Same safety contract as
+/// [`nv12_or_nv21_to_rgb_or_rgba_row_impl`]: `simd128` must be
+/// enabled at compile time, and the caller must uphold that `width`
+/// and all slice lengths satisfy the implementation's invariants so
+/// that its unchecked SIMD loads/stores remain in-bounds.

-/// `SWAP_UV = false, ALPHA = false`.
+/// `SWAP_UV = false, ALPHA = false`.
+///
+/// # Safety
+///
+/// The caller must uphold the same safety contract as
+/// [`nv12_or_nv21_to_rgb_or_rgba_row_impl`]:
+///
+/// 1. **NEON must be available on the current CPU.** If this wrapper
+///    is called directly, the caller is responsible for ensuring that
+///    executing NEON instructions is valid.
+/// 2. `width & 1 == 0` (4:2:0/NV12 requires even width).
+/// 3. `y.len() >= width`.
+/// 4. `uv_half.len() >= width`.
+/// 5. `rgb_out.len() >= 3 * width`.

-/// `SWAP_UV = false, ALPHA = false`.
+/// `SWAP_UV = false, ALPHA = false`.
+///
+/// # Safety
+/// Callers must ensure SSE4.1 is available on the current CPU before
+/// invoking this function.
+///
+/// In addition, the slice/width preconditions are the same as for
+/// [`nv12_or_nv21_to_rgb_or_rgba_row_impl`]: `y` must contain at least
+/// `width` bytes, `uv_half` must contain enough interleaved chroma data
+/// for the row width, and `rgb_out` must contain at least `width * 3`
+/// bytes.

-/// `SWAP_UV = false, ALPHA = false`.
+/// `SWAP_UV = false, ALPHA = false`.
+///
+/// # Safety
+/// The caller must ensure that the current CPU supports `avx512f` and
+/// `avx512bw`.
+///
+/// In addition, all slice-length and `width` invariants required by
+/// [`nv12_or_nv21_to_rgb_or_rgba_row_impl`] must hold for `y`, `uv_half`,
+/// and `rgb_out`.

Conversation

uqio commented Apr 26, 2026 • edited by al8n Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Scope

What's in this PR

Public API

Kernel work

MixedSinker integration

Doc updates

Tests

What's deferred

Test plan

Uh oh!

codecov Bot commented Apr 26, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Copilot AI Apr 26, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 26, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 26, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 26, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

uqio commented Apr 26, 2026 •

edited by al8n

Loading

codecov Bot commented Apr 26, 2026 •

edited

Loading