Skip to content

feat(sinker): Ship 8 — NV12 / NV21 RGBA via const-ALPHA template#17

Merged
uqio merged 2 commits intomainfrom
feat/ship8-rgba-nv12-nv21
Apr 26, 2026
Merged

feat(sinker): Ship 8 — NV12 / NV21 RGBA via const-ALPHA template#17
uqio merged 2 commits intomainfrom
feat/ship8-rgba-nv12-nv21

Conversation

@uqio
Copy link
Copy Markdown
Collaborator

@uqio uqio commented Apr 26, 2026

Tranche 2 of Ship 8 sink-side RGBA. Mass-applies the const-generic-ALPHA template proven by PR #16 to the 4:2:0 semi-planar family — NV12 (UV-ordered, FFmpeg's nv12, the most common HW-decoder output across CUDA / NVDEC / VideoToolbox / VAAPI / Android MediaCodec / QSV) and NV21 (VU-ordered, Android camera default).

Same shape as PR 1: kernel const-generic refactor + format-specific with_rgba impl block + MixedSinker wiring + per-backend equivalence tests + format-level tests. No new SIMD code outside the kernel — the write_rgba_* helpers shipped with PR 1 are reused.

Scope

Sink-side only, default opaque alpha (0xFF). Source-side YUVA is Ship 8b (separate follow-up). Per the tracker in docs/color-conversion-functions.md § Ship 8, this is tranche 2 of 7+:

# Tranche Formats Status
1 4:2:0 planar Yuv420p ✅ shipped (PR #16)
2 4:2:0 semi-planar Nv12, Nv21 this PR
3 4:2:2 planar + semi-planar Yuv422p, Nv16 next
4 4:4:4 + 4:4:0 Yuv444p, Nv24, Nv42, Yuv440p
5 High-bit-depth 4:2:0 Yuv420p9/10/12/14/16, P010/P012/P016
6 High-bit-depth 4:2:2 Yuv422p9/10/12/14/16, Yuv440p10/12, P210/P212/P216
7 High-bit-depth 4:4:4 Yuv444p9/10/12/14/16, P410/P412/P416

Usage:

use colconv::{
    frame::{Nv12Frame, Nv21Frame},
    sinker::MixedSinker,
    yuv::{Nv12, Nv21, nv12_to, nv21_to},
    ColorMatrix,
};

let frame = Nv12Frame::new(&y_plane, &uv_plane, w, h, w, w);
let mut rgba = vec![0u8; (w * h * 4) as usize];
let mut sinker = MixedSinker::<Nv12>::new(w as usize, h as usize)
    .with_rgba(&mut rgba)?;

nv12_to(&frame, /*full_range=*/ true, ColorMatrix::Bt709, &mut sinker)?;
// rgba[4*i..4*i+3] = R,G,B; rgba[4*i+3] = 0xFF.

MixedSinker::<Nv21> works identically with Nv21Frame + nv21_to.

What's in this PR

Public API

  • MixedSinker<Nv12>::with_rgba(&mut [u8]) / set_rgba — format-specific impl block, mirrors the Yuv420p pattern.
  • MixedSinker<Nv21>::with_rgba(&mut [u8]) / set_rgba — same.
  • row::nv12_to_rgba_row(...) and row::nv21_to_rgba_row(...) — public dispatchers paralleling the RGB variants.

Kernel work

The shared NV12/NV21 kernel was already const-generic on <const SWAP_UV: bool> (NV12 = false, NV21 = true). This PR adds a second const generic <const ALPHA: bool>:

File Refactored function
row/scalar.rs nv12_or_nv21_to_rgb_or_rgba_row_impl<SWAP_UV, ALPHA>
row/arch/neon.rs Same shape; uses vst4q_u8 when ALPHA (native AArch64 4-channel store)
row/arch/x86_sse41.rs Same shape; reuses write_rgba_16 from PR 1
row/arch/x86_avx2.rs Same shape; reuses write_rgba_32 from PR 1
row/arch/x86_avx512.rs Same shape; reuses write_rgba_64 from PR 1
row/arch/wasm_simd128.rs Same shape; reuses wasm write_rgba_16 from PR 1

Each kernel has 4 wrappers now (nv12_to_rgb_row, nv12_to_rgba_row, nv21_to_rgb_row, nv21_to_rgba_row) thinning to the same monomorphized template. The compiler DCE's the unused branches at every call site.

MixedSinker integration

RGBA runs as an independent kernel call (not compose) — same as Yuv420p in PR 1. Default alpha = 0xFF since neither NV12 nor NV21 carries an alpha plane.

Doc updates

docs/color-conversion-functions.md § Ship 8:

  • Marked PR 1 (Yuv420p RGBA) as ✅ shipped (PR feat(sinker): Ship 8 — Yuv420p RGBA output via const-ALPHA template #16).
  • Documented the const-generic-ALPHA strategy choice (option 3 over compose / fully forked native — same speed as native, ~1/5th the LOC).
  • Added the tranche tracker (this PR is tranche 2 of 7+).
  • Promoted Ship 8b source-side YUVA to a separate follow-up.
  • Added the format-specific impl scoping rationale (the load-bearing safety property pinned by the compile_fail doctest).

Tests

+14 lib tests, total 443 (was 429):

Layer Tests added
Format-level NV12 5: gray-to-gray + opaque alpha, RGB-byte invariant, buffer-too-short, random-YUV SIMD parity, cross-format invariant against Yuv420p RGBA
Format-level NV21 4: gray-to-gray + opaque alpha, RGB-byte invariant, random-YUV SIMD parity, cross-format invariant against NV12 RGBA (with byte-swapped chroma — catches SWAP_UV bugs specific to the new RGBA path)
NEON per-backend 5: 16-pixel all-matrices, varied widths, NV12-vs-Yuv420p RGBA cross-format, NV21 16-pixel all-matrices, NV21 widths
SSE4.1 per-backend (CI) 5: same shape
AVX2 per-backend (CI) 5: 32-pixel main + tail widths + cross-format
AVX-512 per-backend (CI) 5: 64-pixel main + tail widths + cross-format
wasm simd128 per-backend (CI) 4: 16-pixel main + tail widths

Per-backend tests bypass the dispatcher — call each backend's unsafe nv12_to_rgba_row / nv21_to_rgba_row directly under runtime feature detection. So on AVX-512-capable CI runners, all three x86 paths run; the existing CI matrix (avx512 SDE + AVX2-max + SSE4.1-max + scalar tarpaulin tiers) covers every backend.

Compile-fail doctest update — PR 1's doctest used MixedSinker::<Nv12>::with_rgba as the negative example. Since NV12 now has RGBA wired, the negative example moves to MixedSinker::<Nv16> (tranche 3, not yet wired). Each future tranche moves it forward similarly.

Local results (aarch64 macOS): 443 lib tests + 1 doctest pass; wasm32 + x86_64 cross-targets compile clean.

What's deferred

  • Tranche 3+ — Yuv422p / NV16 next, then Yuv444p / NV24 / NV42 / Yuv440p, then high-bit-depth families. Same template, same test shape per tranche.
  • with_rgba_u16 ships in tranches 5–7 (where u16 output exists).
  • YUVA source frames (Ship 8b) — when the source carries an alpha plane, the kernel copies it through instead of writing the opaque default. Independent of the sink-side mass-apply.
  • Benchnv12_to_rgba vs. nv12_to_rgb. Deferred until enough tranches have shipped to make a comparative table meaningful.

Test plan

  • CI green on test, test-sde-avx512, cross, coverage, clippy, build jobs.
  • Per-tier coverage matrix exercises SSE4.1 / AVX2 / scalar paths via existing colconv_disable_* rustflags.
  • Verify NV12 → RGBA pipeline end-to-end with a real frame (gray + non-gray patches).
  • cargo doc --lib --no-deps clean (no new doc warnings vs. main).

🤖 Generated with Claude Code

@al8n al8n changed the title feat feat(sinker): Ship 8 — Yuv420p (PR 1 follow-up) NV12 / NV21 RGBA via const-ALPHA template Apr 26, 2026
@al8n al8n changed the title feat(sinker): Ship 8 — Yuv420p (PR 1 follow-up) NV12 / NV21 RGBA via const-ALPHA template feat(sinker): Ship 8 — NV12 / NV21 RGBA via const-ALPHA template Apr 26, 2026
@al8n al8n requested a review from Copilot April 26, 2026 04:14
@codecov
Copy link
Copy Markdown

codecov Bot commented Apr 26, 2026

Codecov Report

❌ Patch coverage is 82.53165% with 69 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
src/row/mod.rs 68.00% 16 Missing ⚠️
src/row/arch/neon.rs 79.41% 14 Missing ⚠️
src/row/arch/x86_avx2.rs 81.81% 12 Missing ⚠️
src/row/arch/x86_avx512.rs 81.81% 12 Missing ⚠️
src/row/arch/x86_sse41.rs 81.81% 12 Missing ⚠️
src/sinker/mixed.rs 94.44% 3 Missing ⚠️

📢 Thoughts on this report? Let us know!

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds sink-side RGBA output support for the 8‑bit 4:2:0 semi‑planar formats NV12 and NV21, reusing the const-generic ALPHA template approach introduced previously and wiring it through MixedSinker plus the row-kernel dispatch layer and backend kernels.

Changes:

  • Add MixedSinker::<Nv12>::with_rgba / set_rgba and MixedSinker::<Nv21>::with_rgba / set_rgba, and run native NV12/NV21→RGBA kernels during process.
  • Add public dispatchers row::nv12_to_rgba_row and row::nv21_to_rgba_row.
  • Refactor NV12/NV21 scalar + SIMD kernels to a shared <const SWAP_UV: bool, const ALPHA: bool> implementation, plus add/extend equivalence tests.

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
src/sinker/mixed.rs Adds NV12/NV21 with_rgba/set_rgba APIs, wires RGBA kernel calls into process, updates compile-fail doctest, adds format-level tests.
src/row/scalar.rs Adds NV12/NV21 RGBA scalar wrappers and refactors shared scalar kernel to <SWAP_UV, ALPHA>.
src/row/mod.rs Introduces public NV12/NV21→RGBA row dispatchers with SIMD selection and scalar fallback.
src/row/arch/neon.rs Refactors NEON NV12/NV21 kernel to <SWAP_UV, ALPHA> and adds RGBA wrappers + tests.
src/row/arch/x86_sse41.rs Refactors SSE4.1 NV12/NV21 kernel to <SWAP_UV, ALPHA> and adds RGBA wrappers + tests.
src/row/arch/x86_avx2.rs Refactors AVX2 NV12/NV21 kernel to <SWAP_UV, ALPHA> and adds RGBA wrappers + tests.
src/row/arch/x86_avx512.rs Refactors AVX-512 NV12/NV21 kernel to <SWAP_UV, ALPHA> and adds RGBA wrappers + tests.
src/row/arch/wasm_simd128.rs Refactors wasm simd128 NV12/NV21 kernel to <SWAP_UV, ALPHA> and adds RGBA wrappers + tests.
Comments suppressed due to low confidence (1)

src/row/arch/x86_avx2.rs:1533

  • These unsafe NV12/NV21 wrapper functions (and the RGBA wrappers below) lack a /// # Safety section, while other unsafe wrappers in this file document their contracts. Please add a Safety section (or forward to the shared impl’s Safety contract) stating the AVX2 feature requirement plus width/slice length invariants to prevent accidental UB by direct callers.
/// AVX2 NV12 → packed RGB. Thin wrapper over
/// [`nv12_or_nv21_to_rgb_or_rgba_row_impl`] with
/// `SWAP_UV = false, ALPHA = false`.
#[inline]
#[target_feature(enable = "avx2")]
pub(crate) unsafe fn nv12_to_rgb_row(
  y: &[u8],
  uv_half: &[u8],

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

/// Same as [`nv12_or_nv21_to_rgb_row_impl`].
/// WASM simd128 NV12 → packed RGB. Thin wrapper over
/// [`nv12_or_nv21_to_rgb_or_rgba_row_impl`] with
/// `SWAP_UV = false, ALPHA = false`.
Copy link

Copilot AI Apr 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The NV12/NV21 unsafe wrapper functions here (and the new RGBA wrappers) are missing a /// # Safety section, while other unsafe wrappers in this file include one. Please add a Safety section (or forward to the shared impl’s Safety contract) stating that simd128 must be enabled and that width/slice length invariants must hold, to avoid accidental UB by direct callers.

Suggested change
/// `SWAP_UV = false, ALPHA = false`.
/// `SWAP_UV = false, ALPHA = false`.
///
/// # Safety
///
/// Same safety contract as
/// [`nv12_or_nv21_to_rgb_or_rgba_row_impl`]: `simd128` must be
/// enabled at compile time, and the caller must uphold that `width`
/// and all slice lengths satisfy the implementation's invariants so
/// that its unchecked SIMD loads/stores remain in-bounds.

Copilot uses AI. Check for mistakes.
Comment thread src/row/arch/neon.rs
/// Same as [`nv12_or_nv21_to_rgb_row_impl`].
/// NEON NV12 → packed RGB (UV-ordered chroma). Thin wrapper over
/// [`nv12_or_nv21_to_rgb_or_rgba_row_impl`] with
/// `SWAP_UV = false, ALPHA = false`.
Copy link

Copilot AI Apr 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The unsafe NV12/NV21 wrapper fns (nv12_to_rgb_row / nv21_to_rgb_row and the new RGBA variants) don't have a /// # Safety section, but similar wrappers in this module (e.g. nv24_to_rgb_row) do. Please document the Safety contract (or explicitly forward to nv12_or_nv21_to_rgb_or_rgba_row_impl’s Safety) so direct callers know they must ensure NEON availability and uphold width/slice length invariants.

Suggested change
/// `SWAP_UV = false, ALPHA = false`.
/// `SWAP_UV = false, ALPHA = false`.
///
/// # Safety
///
/// The caller must uphold the same safety contract as
/// [`nv12_or_nv21_to_rgb_or_rgba_row_impl`]:
///
/// 1. **NEON must be available on the current CPU.** If this wrapper
/// is called directly, the caller is responsible for ensuring that
/// executing NEON instructions is valid.
/// 2. `width & 1 == 0` (4:2:0/NV12 requires even width).
/// 3. `y.len() >= width`.
/// 4. `uv_half.len() >= width`.
/// 5. `rgb_out.len() >= 3 * width`.

Copilot uses AI. Check for mistakes.
Comment thread src/row/arch/x86_sse41.rs
/// Same as [`nv12_or_nv21_to_rgb_row_impl`].
/// SSE4.1 NV12 → packed RGB. Thin wrapper over
/// [`nv12_or_nv21_to_rgb_or_rgba_row_impl`] with
/// `SWAP_UV = false, ALPHA = false`.
Copy link

Copilot AI Apr 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new unsafe NV12/NV21 wrapper fns here (and the RGBA variants below) don't include a /// # Safety section, unlike other unsafe row wrappers in this module (e.g. yuv_420_to_* and nv24_to_rgb_row). Please add a Safety section (or explicitly forward to the shared impl’s Safety contract) so callers know they must ensure SSE4.1 is available and slices meet the length/width invariants.

Suggested change
/// `SWAP_UV = false, ALPHA = false`.
/// `SWAP_UV = false, ALPHA = false`.
///
/// # Safety
/// Callers must ensure SSE4.1 is available on the current CPU before
/// invoking this function.
///
/// In addition, the slice/width preconditions are the same as for
/// [`nv12_or_nv21_to_rgb_or_rgba_row_impl`]: `y` must contain at least
/// `width` bytes, `uv_half` must contain enough interleaved chroma data
/// for the row width, and `rgb_out` must contain at least `width * 3`
/// bytes.

Copilot uses AI. Check for mistakes.
/// Same as [`nv12_or_nv21_to_rgb_row_impl`].
/// AVX‑512 NV12 → packed RGB. Thin wrapper over
/// [`nv12_or_nv21_to_rgb_or_rgba_row_impl`] with
/// `SWAP_UV = false, ALPHA = false`.
Copy link

Copilot AI Apr 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The unsafe NV12/NV21 wrapper fns added/updated here (including the new RGBA wrappers) don't document a /// # Safety contract, unlike many other unsafe kernels in this module. Please add a Safety section (or explicitly refer to nv12_or_nv21_to_rgb_or_rgba_row_impl’s Safety) covering the AVX-512F/BW CPU feature requirement and the width/slice length invariants.

Suggested change
/// `SWAP_UV = false, ALPHA = false`.
/// `SWAP_UV = false, ALPHA = false`.
///
/// # Safety
/// The caller must ensure that the current CPU supports `avx512f` and
/// `avx512bw`.
///
/// In addition, all slice-length and `width` invariants required by
/// [`nv12_or_nv21_to_rgb_or_rgba_row_impl`] must hold for `y`, `uv_half`,
/// and `rgb_out`.

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 12 out of 12 changed files in this pull request and generated no new comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@uqio uqio merged commit bd51e47 into main Apr 26, 2026
7 of 70 checks passed
@uqio uqio deleted the feat/ship8-rgba-nv12-nv21 branch April 26, 2026 04:46
uqio added a commit that referenced this pull request Apr 26, 2026
#18)

Tranche 3 of Ship 8 sink-side RGBA. **Wiring-only PR** — no new SIMD code. The 4:2:2 formats reuse the 4:2:0 kernels from tranches 1 + 2: `Yuv422p` calls `yuv_420_to_rgba_row` (already shipped in PR #16), and `Nv16` calls `nv12_to_rgba_row` (already shipped in PR #17). 4:2:2's per-row contract is identical to 4:2:0's — half-width chroma, one pair per Y pair — so the same kernel handles both with no changes.
uqio pushed a commit that referenced this pull request Apr 26, 2026
Tranche 4a of Ship 8 sink-side RGBA. Refactors the Yuv444p planar 4:4:4 kernel family across all 6 backends (scalar + NEON + SSE4.1 + AVX2 + AVX-512 + wasm simd128) using the const-generic-ALPHA template established by PR #16 (Yuv420p) and extended in PR #17 (NV12/NV21).
uqio added a commit that referenced this pull request Apr 26, 2026
)

Tranche 4b of Ship 8 sink-side RGBA. Adds `Nv24` / `Nv42` (semi-planar 4:4:4) RGBA output via the dual-const-generic `<SWAP_UV, ALPHA>` template established by PR #17 (NV12 / NV21), and **retro-applies a Strategy A combined RGB→RGBA fan-out to all 8 wired families** so callers attaching both `with_rgb` and `with_rgba` no longer pay the per-pixel YUV→RGB math twice — addresses the Copilot review finding from PR #19 (`src/sinker/mixed.rs:1648`).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants