feat(row): Ship 8 — high-bit 4:2:0 RGBA scalar (SIMD lands in 5a/5b) by uqio · Pull Request #24 · Findit-AI/colconv

uqio · 2026-04-26T10:27:10Z

Ship 8 Tranche 5 — scalar foundation. Adds RGBA output (both 8-bit and native-depth u16) for all 8 high-bit-depth 4:2:0 source formats: Yuv420p9 / 10 / 12 / 14 / 16 and P010 / P012 / P016. Scalar paths are fully wired and shippable today — the SIMD per-arch routes land in the follow-up Tranche 5a (u8 RGBA) and 5b (u16 RGBA) PRs.

This is split out of Tranche 5 (which would have been ~6–8k LOC end-to-end) so the foundational scalar work, the public API surface, and the Strategy A kernel-design pattern can land independently of the per-backend SIMD work.

Scope

#	Tranche	Formats	Status
1	4:2:0 planar	`Yuv420p`	✅ shipped (PR #16)
2	4:2:0 semi-planar	`Nv12`, `Nv21`	✅ shipped (PR #17)
3	4:2:2 planar + semi-planar	`Yuv422p`, `Nv16`	✅ shipped (PR #18)
4a	4:4:4 planar	`Yuv444p`	✅ shipped (PR #19)
4b	4:4:4 semi-planar	`Nv24`, `Nv42` + Strategy A retrofit	✅ shipped (PR #20)
4c	4:4:0 planar	`Yuv440p`	✅ shipped (PR #22)
5-prep	High-bit 4:2:0 RGBA — scalar	`Yuv420p9/10/12/14/16` + `P010/P012/P016` (u8 + u16 RGBA)	⏳ this PR
5a	High-bit 4:2:0 — u8 RGBA SIMD	same formats	next
5b	High-bit 4:2:0 — u16 RGBA SIMD + sinker integration	same formats	after 5a
6	High-bit-depth 4:2:2	`Yuv422p9/10/12/14/16`, `Yuv440p10/12`, `P210/P212/P216`
7	High-bit-depth 4:4:4	`Yuv444p9/10/12/14/16`, `P410/P412/P416`

Usage:

```rust
use colconv::{row, ColorMatrix};

// 10-bit YUV 4:2:0 → 8-bit packed RGBA (Strategy A scalar; SIMD branch
// fills in via Tranche 5a).
row::yuv420p10_to_rgba_row(
y_row, u_half, v_half,
&mut rgba_out,
width,
ColorMatrix::Bt2020Ncl,
/full_range=/ false,
/use_simd=/ true, // accepted today; routed only when 5a lands
);

// P010 (HEVC HDR HW decode) → native-depth u16 packed RGBA (alpha = 0x3FF).
row::p010_to_rgba_u16_row(
y_row, uv_half,
&mut rgba_u16_out,
width,
ColorMatrix::Bt2020Ncl, false, true,
);
```

What's in this PR

Public API — 16 new dispatcher functions in `src/row/mod.rs`

Each format gets both a u8 RGBA dispatcher and a native-depth u16 RGBA dispatcher, paralleling the existing RGB ones:

Format	Bit depth	u8 RGBA	u16 RGBA
Yuv420p9	9	`yuv420p9_to_rgba_row`	`yuv420p9_to_rgba_u16_row`
Yuv420p10	10	`yuv420p10_to_rgba_row`	`yuv420p10_to_rgba_u16_row`
P010	10	`p010_to_rgba_row`	`p010_to_rgba_u16_row`
Yuv420p12	12	`yuv420p12_to_rgba_row`	`yuv420p12_to_rgba_u16_row`
Yuv420p14	14	`yuv420p14_to_rgba_row`	`yuv420p14_to_rgba_u16_row`
P012	12	`p012_to_rgba_row`	`p012_to_rgba_u16_row`
Yuv420p16	16	`yuv420p16_to_rgba_row`	`yuv420p16_to_rgba_u16_row`
P016	16	`p016_to_rgba_row`	`p016_to_rgba_u16_row`

The use_simd: bool parameter is held on every signature so the follow-up SIMD PRs (5a, 5b) can fill in per-arch branches without breaking callers. Today it's a no-op (let _ = use_simd; with a comment) — every dispatcher always runs the scalar reference. This is functionally correct (scalar matches the eventual SIMD output bit-for-bit) but slower than the eventual SIMD path.

Plus rgba_row_elems(width) helper — parallel to the existing rgba_row_bytes / rgb_row_elems, sizes &mut [u16] RGBA buffers (width × 4 u16 elements).

Kernel work — 8 new const-ALPHA scalar templates + 13 RGBA wrappers + u16 expand helper

Mirrors the <const ALPHA: bool> template pattern established by PRs #16–22 for the 8-bit RGBA paths. Existing *_to_rgb_*_row<...> functions are now thin ::<false> wrappers — zero behavior change.

BITS	Planar (`Yuv420p` family)	Semi-planar (`P010` family)
9	`yuv_420p_n_to_rgb_or_rgba_row<9, ALPHA>` (+ u16 sibling)	—
10	`<10, ALPHA>`	`p_n_to_rgb_or_rgba_row<10, ALPHA>` (+ u16)
12	`<12, ALPHA>`	`<12, ALPHA>`
14	`<14, ALPHA>`	—
16	`yuv_420p16_to_rgb_or_rgba_row<ALPHA>` (+ u16; non-generic, i64 chroma for u16 path)	`p16_to_rgb_or_rgba_row<ALPHA>` (+ u16, i64 chroma)

Compile-time BITS guards: the Pn shared kernels now use const { assert!(BITS == 10 \|\| BITS == 12) } instead of debug_assert! — this is a hard fix for a pre-existing release-only corruption trap that the new RGBA paths would have widened. If a future dispatcher accidentally instantiated p_n_to_rgb_or_rgba_*_row::<16> it would silently route the i32-chroma path at 16-bit input and overflow before clamp; the compile-time assertion now fails monomorphization for any BITS outside {10, 12}, eliminating that bug class. Routing for P016 stays unambiguous — its dedicated p16_to_rgb_or_rgba_*_row<ALPHA> kernel uses i64 chroma multiply, and the comments on the Pn wrappers explicitly call out "P016 has its own kernel family — never routed here."

`expand_rgb_u16_to_rgba_u16_row` — u16 analogue of the existing expand_rgb_to_rgba_row helper. Strategy A on the u16 path (run RGB once, fan-out to RGBA via memory-bound copy + alpha pad). Alpha element is (1 << BITS) - 1, resolved at compile time per format. Marked #[allow(dead_code)] for now — the consumer (MixedSinker with_rgba_u16 Strategy A) lands in 5b alongside the rest of the high-bit sinker integration.

What's deferred

Tranche 5a (next PR): Add u8 RGBA SIMD per-arch routes across all 5 backends (NEON, SSE4.1, AVX2, AVX-512, wasm simd128) for all 8 formats. Fills in the let _ = use_simd; stubs in the new u8 RGBA dispatchers. Removes the use_simd no-op annotation. ~3k LOC.
Tranche 5b (after 5a): Add u16 RGBA SIMD per-arch routes + new write_rgba_u16_* SIMD store helpers (parallel to existing write_rgb_u16_* in src/row/arch/x86_common.rs and per-backend equivalents) + 8 MixedSinker<F>::with_rgba_u16 blocks across the high-bit sinker module + Strategy A wiring on the u16 path (consumes expand_rgb_u16_to_rgba_u16_row). ~3–4k LOC.
Tests: format-level + per-backend equivalence tests land alongside the SIMD impls in 5a / 5b — the scalar paths added here are exercised by the existing high-bit RGB tests' kernel reuse (they go through the same const-ALPHA template, so the ALPHA = false half of every test already covers the new shared kernel body).
Compile_fail doctest advance: stays pointing at Yuv420p10 until 5b (which is when MixedSinker with_rgba_u16 for Yuv420p10 lands).

Resolved Codex review findings during this branch

feat(0.1.0): row-primitive kernels with SIMD dispatch + Sink API (yuv420p, rgb→hsv, bgr↔rgb) #1 — pre-existing debug_assert!(BITS == 10 \|\| BITS == 12) on the Pn kernels: a release-only corruption trap if a future dispatcher misroutes P016 through the Pn family. Upgraded to const { assert!(...) } (compile-time monomorphization failure) on both the u8 and u16 Pn shared kernels. Comments on the Pn wrappers corrected to call out P016's separate kernel family explicitly.
feat(NV12): NV12(semi-planar 4:2:0) + fallible PixelSink contract #2 — initial scalar-only branch had no public dispatchers, so the new RGBA paths were unreachable. Added all 16 dispatchers in this PR; the new RGBA scalar wrappers now have callers and the #[allow(dead_code)] annotations were removed (kept only on expand_rgb_u16_to_rgba_u16_row whose consumer is in 5b).

Verification

cargo test --lib — 479 passed; 0 failed (unchanged — pure foundation; tests land with SIMD in 5a/5b)
cargo test --doc — 1 passed
RUSTFLAGS=-Dwarnings cargo clippy --lib --tests — clean (matches CI)
RUSTFLAGS=-Dwarnings cargo clippy --lib --no-default-features — clean
RUSTFLAGS=-Dwarnings cargo clippy --lib --no-default-features --features alloc — clean
cargo check --lib --target wasm32-unknown-unknown — clean
cargo check --lib --target x86_64-unknown-linux-gnu — clean

Test plan

CI green on test, test-sde-avx512, cross, coverage, clippy, build, miri-* jobs.
Spot-check that callers of the existing high-bit RGB dispatchers (yuv420p10_to_rgb_row etc.) still produce identical output — the _or_rgba::<false> refactor is a pure restructuring with no behavior change.
Verify the new RGBA dispatchers route through the scalar reference at the expected speed (this PR's intentional regression vs the future SIMD-enabled path; tracked for 5a/5b).

🤖 Generated with Claude Code

codecov · 2026-04-26T10:32:00Z

Codecov Report

❌ Patch coverage is 34.38596% with 187 lines in your changes missing coverage. Please review.

Files with missing lines	Patch %	Lines
src/row/mod.rs	0.00%	142 Missing ⚠️
src/row/scalar.rs	68.53%	45 Missing ⚠️

📢 Thoughts on this report? Let us know!

Copilot

Pull request overview

Adds high-bit-depth 4:2:0 RGBA support (prep work for “Ship 8” tranche) by introducing scalar RGBA kernels and wiring new public row dispatchers (currently scalar-only).

Changes:

Add scalar RGB→RGBA expansion for u16 rows and new RGBA variants for high-bit-depth planar/semi-planar YUV 4:2:0 conversions (u8 RGBA + native-depth u16 RGBA).
Refactor several existing scalar RGB kernels into shared RGB/RGBA kernels via a const ALPHA: bool monomorphization.
Add public RGBA row dispatchers for Yuv420p{9,10,12,14,16} and P01{0,2,6} plus a new rgba_row_elems() sizing helper.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 6 comments.

File	Description
`src/row/scalar.rs`	Introduces/extends scalar kernels to support RGBA outputs (u8 and u16) and adds a u16 RGB→RGBA fan-out helper; refactors into shared RGB/RGBA kernels.
`src/row/mod.rs`	Adds public RGBA dispatchers for high-bit 4:2:0 formats and a `u16` RGBA row sizing helper.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-04-26T10:36:32Z

+  }
+}
+
 // ---- High-bit-depth YUV 4:2:0 → RGB (BITS ∈ {10, 12, 14}) -------------


The section header comment says BITS ∈ {10, 12, 14}, but this module now supports BITS = 9 as well (via the compile-time guard and new 9-bit RGBA/RGB dispatchers). Please update the header to include 9-bit so documentation matches the actual supported set.

Suggested change

// ---- High-bit-depth YUV 4:2:0 → RGB (BITS ∈ {10, 12, 14}) -------------

// ---- High-bit-depth YUV 4:2:0 → RGB (BITS ∈ {9, 10, 12, 14}) ----------

Copilot · 2026-04-26T10:36:33Z

  matrix: ColorMatrix,
  full_range: bool,
 ) {
+  yuv_420p16_to_rgb_or_rgba_row::<false>(y, u_half, v_half, rgb_out, width, matrix, full_range);


This call is written as a long single line and will be reformatted by cargo fmt (CI runs cargo fmt -- --check). Please run rustfmt or format this call to match the rest of the file’s wrapping style.

Suggested change

yuv_420p16_to_rgb_or_rgba_row::<false>(y, u_half, v_half, rgb_out, width, matrix, full_range);

yuv_420p16_to_rgb_or_rgba_row::<false>(

y,

u_half,

v_half,

rgb_out,

width,

matrix,

full_range,

);

Copilot · 2026-04-26T10:36:33Z

+    y, u_half, v_half, rgba_out, width, matrix, full_range,
+  );
+}
+


New RGBA conversion entry points are introduced here, but there are no accompanying scalar tests validating (a) alpha is set to the expected opaque value (0xFF for u8 output / (1<<BITS)-1 for u16 output) and (b) the RGB bytes match the existing *_to_rgb_row outputs for the same inputs. Since src/row/scalar/tests.rs already covers the RGB variants, please add analogous tests for these RGBA wrappers/shared kernels.

Suggested change

#[cfg(test)]

mod tests {

use super::*;

fn assert_rgba_matches_rgb_and_has_opaque_alpha(rgb: &[u8], rgba: &[u8], width: usize) {

for x in 0..width {

assert_eq!(&rgba[4 * x..4 * x + 3], &rgb[3 * x..3 * x + 3]);

assert_eq!(rgba[4 * x + 3], 0xFF);

}

}

#[test]

fn yuv_420_to_rgba_row_matches_rgb_and_sets_opaque_alpha() {

let width = 4;

let y = [0u8, 64, 128, 255];

let u_half = [128u8, 128];

let v_half = [128u8, 128];

let mut rgb = [0u8; 12];

let mut rgba = [0u8; 16];

yuv_420_to_rgb_row(&y, &u_half, &v_half, &mut rgb, width, ColorMatrix::Bt601, true);

yuv_420_to_rgba_row(

&y,

&u_half,

&v_half,

&mut rgba,

width,

ColorMatrix::Bt601,

true,

);

assert_rgba_matches_rgb_and_has_opaque_alpha(&rgb, &rgba, width);

}

#[test]

fn yuv_420p10_to_rgba_row_matches_rgb_and_sets_opaque_alpha() {

let width = 4;

let y = [0u16, 256, 768, 1023];

let u_half = [512u16, 512];

let v_half = [512u16, 512];

let mut rgb = [0u8; 12];

let mut rgba = [0u8; 16];

yuv_420p_n_to_rgb_row::<10>(

&y,

&u_half,

&v_half,

&mut rgb,

width,

ColorMatrix::Bt601,

true,

);

yuv_420p_n_to_rgba_row::<10>(

&y,

&u_half,

&v_half,

&mut rgba,

width,

ColorMatrix::Bt601,

true,

);

assert_rgba_matches_rgb_and_has_opaque_alpha(&rgb, &rgba, width);

}

}

Copilot · 2026-04-26T10:36:33Z

+/// `use_simd = false` forces scalar. SIMD per-arch routes land in the
+/// follow-up Ship 8 Tranche 5a PR — for now this dispatcher always
+/// runs the scalar reference regardless of `use_simd`.


The docs currently say "use_simd = false forces scalar" but the implementation ignores use_simd entirely (let _ = use_simd; and always scalar). To avoid misleading callers, please document this explicitly in the rustdoc in the same style used elsewhere (e.g. "use_simd is currently a no-op"), until the SIMD branches land.

Copilot · 2026-04-26T10:36:33Z

+  assert!(rgba_out.len() >= rgba_min, "rgba_out row too short");
+
+  let _ = use_simd; // SIMD per-arch routes land in Ship 8 Tranche 5a.
+  scalar::yuv_420p_n_to_rgba_row::<9>(y, u_half, v_half, rgba_out, width, matrix, full_range);


This call (and several nearby ones) is long enough that cargo fmt will re-wrap it; CI enforces cargo fmt -- --check. Please run rustfmt before merging so formatting matches the project standard.

Suggested change

scalar::yuv_420p_n_to_rgba_row::<9>(y, u_half, v_half, rgba_out, width, matrix, full_range);

scalar::yuv_420p_n_to_rgba_row::<9>(

y,

u_half,

v_half,

rgba_out,

width,

matrix,

full_range,

);

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

## Summary Adds u8 RGBA SIMD across all 5 backends for high-bit 4:2:0 YUV (`yuv420p9/10/12/14/16`, `p010/p012/p016`) and wires them into the 8 high-bit u8 RGBA dispatchers in \`src/row/mod.rs\`. Builds on the scalar prep + dispatcher signatures landed in PR #24. The companion u16 RGBA SIMD work is deferred to Tranche 5b. ## Changes - **5 SIMD backends** — NEON / SSE4.1 / AVX2 / AVX-512 / wasm simd128 — each gain a const-generic \`*_to_rgb_or_rgba_row<BITS, ALPHA>\` template across 4 kernel families: - planar BITS-generic: \`yuv_420p_n_to_rgb_or_rgba_row<BITS={9,10,12,14}, ALPHA>\` - semi-planar BITS-generic: \`p_n_to_rgb_or_rgba_row<BITS={10,12}, ALPHA>\` (P016 has its own family) - 16-bit planar: \`yuv_420p16_to_rgb_or_rgba_row<ALPHA>\` - 16-bit semi-planar: \`p16_to_rgb_or_rgba_row<ALPHA>\` Existing RGB and new RGBA wrappers are thin shims over the shared template. Only the store (\`vst3q_u8\` vs \`vst4q_u8\`, \`write_rgb_*\` vs \`write_rgba_*\`) and the scalar tail dispatch branch on \`ALPHA\`; per-pixel math is unchanged. - **8 high-bit u8 RGBA dispatchers** wired in \`src/row/mod.rs\` (\`yuv420p9/10/12/14/16_to_rgba_row\`, \`p010/p012/p016_to_rgba_row\`) — replace the prior \`let _ = use_simd\` stubs with the standard \`cfg_select!\` per-arch route block, mirroring the existing RGB dispatchers. \`use_simd = false\` still forces scalar. - **Per-backend RGBA equivalence tests** — ~30 new \`#[test]\` functions across the 5 backend test modules. Each new x86 test gates on \`is_x86_feature_detected!\` so the suite stays clean under sanitizer/Miri/non-feature-flagged CI runners. - Compile-time \`const { assert!(BITS == ...) }\` retained on every shared template (was already a Codex-flagged hardening from prior tranches).

uqio added 4 commits April 26, 2026 21:51

update

6f6e213

update

80fbcda

update

809a0f9

update

5fc4ab5

al8n requested a review from Copilot April 26, 2026 10:27

al8n changed the title ~~feat: ship8 rgba high bit 420~~ feat(row): Ship 8 — high-bit 4:2:0 RGBA scalar (SIMD lands in 5a/5b) Apr 26, 2026

Copilot started reviewing on behalf of al8n April 26, 2026 10:31 View session

Copilot AI reviewed Apr 26, 2026

View reviewed changes

Update src/row/scalar.rs

ae2a471

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

uqio merged commit 32b03b5 into main Apr 26, 2026
43 checks passed

uqio deleted the feat/ship8-rgba-high-bit-420 branch April 26, 2026 10:40

al8n mentioned this pull request Apr 26, 2026

Ship 8 Tranche 5a: high-bit 4:2:0 RGBA u8 SIMD #25

Merged

7 tasks

This was referenced Apr 26, 2026

Ship 8 Tranche 5b: high-bit 4:2:0 RGBA u16 SIMD + sinker integration #26

Merged

feat(row): Ship 8 — high-bit 4:4:4 RGBA scalar (SIMD lands in 6b/6c) #29

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(row): Ship 8 — high-bit 4:2:0 RGBA scalar (SIMD lands in 5a/5b)#24

feat(row): Ship 8 — high-bit 4:2:0 RGBA scalar (SIMD lands in 5a/5b)#24
uqio merged 5 commits intomainfrom
feat/ship8-rgba-high-bit-420

uqio commented Apr 26, 2026 •

edited by al8n

Loading

Uh oh!

codecov Bot commented Apr 26, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

Copilot AI Apr 26, 2026

Uh oh!

Uh oh!

Copilot AI Apr 26, 2026

Uh oh!

Copilot AI Apr 26, 2026

Uh oh!

Copilot AI Apr 26, 2026

Uh oh!

Copilot AI Apr 26, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

	// ---- High-bit-depth YUV 4:2:0 → RGB (BITS ∈ {10, 12, 14}) -------------
	// ---- High-bit-depth YUV 4:2:0 → RGB (BITS ∈ {9, 10, 12, 14}) ----------

+#[cfg(test)]
+mod tests {
+  use super::*;
+  fn assert_rgba_matches_rgb_and_has_opaque_alpha(rgb: &[u8], rgba: &[u8], width: usize) {
+    for x in 0..width {
+      assert_eq!(&rgba[4 * x..4 * x + 3], &rgb[3 * x..3 * x + 3]);
+      assert_eq!(rgba[4 * x + 3], 0xFF);
+    }
+  }
+  #[test]
+  fn yuv_420_to_rgba_row_matches_rgb_and_sets_opaque_alpha() {
+    let width = 4;
+    let y = [0u8, 64, 128, 255];
+    let u_half = [128u8, 128];
+    let v_half = [128u8, 128];
+    let mut rgb = [0u8; 12];
+    let mut rgba = [0u8; 16];
+    yuv_420_to_rgb_row(&y, &u_half, &v_half, &mut rgb, width, ColorMatrix::Bt601, true);
+    yuv_420_to_rgba_row(
+      &y,
+      &u_half,
+      &v_half,
+      &mut rgba,
+      width,
+      ColorMatrix::Bt601,
+      true,
+    );
+    assert_rgba_matches_rgb_and_has_opaque_alpha(&rgb, &rgba, width);
+  }
+  #[test]
+  fn yuv_420p10_to_rgba_row_matches_rgb_and_sets_opaque_alpha() {
+    let width = 4;
+    let y = [0u16, 256, 768, 1023];
+    let u_half = [512u16, 512];
+    let v_half = [512u16, 512];
+    let mut rgb = [0u8; 12];
+    let mut rgba = [0u8; 16];
+    yuv_420p_n_to_rgb_row::<10>(
+      &y,
+      &u_half,
+      &v_half,
+      &mut rgb,
+      width,
+      ColorMatrix::Bt601,
+      true,
+    );
+    yuv_420p_n_to_rgba_row::<10>(
+      &y,
+      &u_half,
+      &v_half,
+      &mut rgba,
+      width,
+      ColorMatrix::Bt601,
+      true,
+    );
+    assert_rgba_matches_rgb_and_has_opaque_alpha(&rgb, &rgba, width);
+  }
+}

Conversation

uqio commented Apr 26, 2026 • edited by al8n Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Scope

What's in this PR

Public API — 16 new dispatcher functions in src/row/mod.rs

Kernel work — 8 new const-ALPHA scalar templates + 13 RGBA wrappers + u16 expand helper

What's deferred

Resolved Codex review findings during this branch

Verification

Test plan

Uh oh!

codecov Bot commented Apr 26, 2026

Codecov Report

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Copilot AI Apr 26, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Copilot AI Apr 26, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 26, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 26, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 26, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

uqio commented Apr 26, 2026 •

edited by al8n

Loading

Public API — 16 new dispatcher functions in `src/row/mod.rs`