feat(row): Ship 8 — high-bit 4:4:4 RGBA scalar (SIMD lands in 6b/6c) by uqio · Pull Request #29 · Findit-AI/colconv

uqio · 2026-04-27T00:09:29Z

Summary

First of three sub-PRs that add high-bit-depth 4:4:4 RGBA support — exactly mirroring the 3-PR structure that landed high-bit 4:2:0 RGBA in PRs #24 → #25 → #26.

This PR (analogous to #24): scalar reference RGBA kernels + 16 public dispatchers. The dispatchers are SIMD-shaped (accept use_simd: bool) but always route to scalar — the use_simd flag is held in the signature so the follow-up SIMD PRs (6b: u8 SIMD, 6c: u16 SIMD + sinker) wire per-arch routes without breaking call-site signatures.

Why ship the scalar layer first

Three reasons match the 4:2:0 precedent:

Call-site stability: callers can wire `with_rgba` / `with_rgba_u16` once; subsequent SIMD PRs become non-API-breaking changes.
Reviewability: the scalar layer is the numerical reference — easier to validate in isolation than alongside 5 backends × 8 kernels of SIMD math.
Bisect surface: if a future SIMD PR introduces a regression, the scalar reference is a stable byte-identical baseline to compare against.

The dispatchers' `use_simd` parameter is deliberately ignored in this PR — see the "Out of scope" + "Codex review verdict" sections below.

Changes

Scalar (`src/row/scalar.rs`)

8 new const-ALPHA template wrappers + 8 thin RGB shims over those templates. Mirrors the established 4:2:0 pattern (e.g. yuv_420p_n_to_rgb_or_rgba_row<BITS, ALPHA>):

Family	Shared template	RGB wrapper	RGBA wrapper
Yuv444p_n (BITS ∈ {9,10,12,14})	`yuv_444p_n_to_rgb_or_rgba_row<BITS, ALPHA>`	`yuv_444p_n_to_rgb_row<BITS>`	`yuv_444p_n_to_rgba_row<BITS>`
Yuv444p_n u16	`yuv_444p_n_to_rgb_or_rgba_u16_row<BITS, ALPHA>`	`yuv_444p_n_to_rgb_u16_row<BITS>`	`yuv_444p_n_to_rgba_u16_row<BITS>`
Yuv444p16 (16-bit dedicated)	`yuv_444p16_to_rgb_or_rgba_row<ALPHA>`	`yuv_444p16_to_rgb_row`	`yuv_444p16_to_rgba_row`
Yuv444p16 u16	`yuv_444p16_to_rgb_or_rgba_u16_row<ALPHA>`	`yuv_444p16_to_rgb_u16_row`	`yuv_444p16_to_rgba_u16_row`
P_n_444 (BITS ∈ {10,12})	`p_n_444_to_rgb_or_rgba_row<BITS, ALPHA>`	`p_n_444_to_rgb_row<BITS>`	`p_n_444_to_rgba_row<BITS>`
P_n_444 u16	`p_n_444_to_rgb_or_rgba_u16_row<BITS, ALPHA>`	`p_n_444_to_rgb_u16_row<BITS>`	`p_n_444_to_rgba_u16_row<BITS>`
P_n_444_16 (P416)	`p_n_444_16_to_rgb_or_rgba_row<ALPHA>`	`p_n_444_16_to_rgb_row`	`p_n_444_16_to_rgba_row`
P_n_444_16 u16	`p_n_444_16_to_rgb_or_rgba_u16_row<ALPHA>`	`p_n_444_16_to_rgb_u16_row`	`p_n_444_16_to_rgba_u16_row`

Alpha contracts:

u8 RGBA: 0xFF (always opaque, no alpha plane in source)
u16 RGBA, BITS-generic: (1 << BITS) - 1
u16 RGBA, 16-bit dedicated: 0xFFFF
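
The three alpha contracts above can be written as two small helpers. This is a minimal sketch rather than code from the PR: the names `opaque_alpha_u8` and `opaque_alpha_u16` are hypothetical, and the runtime `bits` argument stands in for the `BITS` const generic used by the real kernels.

```rust
/// Opaque alpha for u8 RGBA output: always 0xFF, whatever the source depth.
/// (Hypothetical helper name, used only for illustration.)
pub const fn opaque_alpha_u8() -> u8 {
    0xFF
}

/// Opaque alpha for native-depth u16 RGBA output with `bits` significant bits.
/// The shift is done in u32 so that bits = 16 yields 0xFFFF instead of
/// overflowing a 16-bit intermediate.
pub const fn opaque_alpha_u16(bits: u32) -> u16 {
    assert!(bits >= 1 && bits <= 16);
    ((1u32 << bits) - 1) as u16
}
```

Under these assumptions a 10-bit source gives alpha 1023 and a 16-bit source gives 0xFFFF, which are the values the scalar tests pin.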

Each shared template preserves the original RGB function's const { assert!(BITS == ...) } BITS guard.
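
A rough sketch of the shared-template pattern, assuming a toolchain with inline `const` blocks (Rust 1.79 or newer). The `luma_n_*` names are hypothetical and the body only copies down-shifted luma into the colour channels; it is not the PR's colour math. It only illustrates how one `<BITS, ALPHA>` template can serve both an RGB and an RGBA wrapper while keeping the BITS guard.

```rust
/// Hypothetical stand-in for a shared RGB/RGBA template. It ignores chroma
/// and writes luma to R, G and B, so it is NOT a real YUV conversion.
fn luma_n_to_rgb_or_rgba_row<const BITS: u32, const ALPHA: bool>(y: &[u16], out: &mut [u8]) {
    // Compile-time guard: monomorphization fails for unsupported BITS.
    const { assert!(BITS == 9 || BITS == 10 || BITS == 12 || BITS == 14) };

    // 3 bytes per pixel for RGB, 4 for RGBA; resolved per instantiation.
    let bpp = if ALPHA { 4 } else { 3 };
    assert!(out.len() >= y.len() * bpp, "output row too short");

    for (px, &luma) in out.chunks_exact_mut(bpp).zip(y) {
        // Scale BITS-wide luma down to 8 bits, saturating out-of-range input.
        let v = (luma >> (BITS - 8)).min(0xFF) as u8;
        px[0] = v;
        px[1] = v;
        px[2] = v;
        if ALPHA {
            px[3] = 0xFF; // u8 RGBA is always opaque
        }
    }
}

/// Thin RGB shim over the shared template.
pub fn luma_n_to_rgb_row<const BITS: u32>(y: &[u16], out: &mut [u8]) {
    luma_n_to_rgb_or_rgba_row::<BITS, false>(y, out);
}

/// Thin RGBA shim over the shared template.
pub fn luma_n_to_rgba_row<const BITS: u32>(y: &[u16], out: &mut [u8]) {
    luma_n_to_rgb_or_rgba_row::<BITS, true>(y, out);
}
```

Because `ALPHA` is a const generic, the `if ALPHA` branches are decided per instantiation, so the RGB wrapper should not pay for the alpha store.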

Dispatchers (`src/row/mod.rs`)

16 new public dispatchers under a new // ---- High-bit 4:4:4 RGBA dispatchers (Ship 8 Tranche 6 prep) ---- section header:

u8 RGBA (8): yuv444p9_to_rgba_row, yuv444p10_to_rgba_row, yuv444p12_to_rgba_row, yuv444p14_to_rgba_row, yuv444p16_to_rgba_row, p410_to_rgba_row, p412_to_rgba_row, p416_to_rgba_row.
u16 RGBA (8): same names + _u16_row suffix.

Each dispatcher validates slice lengths (mirroring the existing RGB dispatchers' bounds checks), then drops use_simd and calls the scalar reference:

let _ = use_simd; // SIMD per-arch routes land in Ship 8 Tranche 6b (u8) / 6c (u16) PR.
scalar::<fn>(...);

This is identical to how PR #24 staged the 4:2:0 dispatchers before #25/#26 wired SIMD.
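
For illustration, a dispatcher of this shape might look like the sketch below. The function name `demo444p10_to_rgba_row`, its parameter list, and the placeholder scalar kernel are all assumptions; the PR text does not show the real signatures. Only the three-step structure (validate lengths, drop `use_simd`, call scalar) is taken from the description.

```rust
mod scalar {
    /// Placeholder scalar kernel (hypothetical): 10-bit gray to u8 RGBA.
    pub fn demo444p10_to_rgba_row(y: &[u16], rgba_out: &mut [u8], width: usize) {
        for x in 0..width {
            let g = (y[x] >> 2).min(0xFF) as u8; // 10-bit -> 8-bit
            rgba_out[x * 4..x * 4 + 4].copy_from_slice(&[g, g, g, 0xFF]);
        }
    }
}

/// SIMD-shaped dispatcher that currently always routes to scalar.
pub fn demo444p10_to_rgba_row(
    y: &[u16],
    u: &[u16],
    v: &[u16],
    rgba_out: &mut [u8],
    width: usize,
    use_simd: bool,
) {
    // Bounds checks up front, so the kernel can assume valid rows.
    assert!(y.len() >= width, "y row too short");
    assert!(u.len() >= width, "u row too short");
    assert!(v.len() >= width, "v row too short");
    assert!(rgba_out.len() >= width * 4, "rgba_out row too short");

    // The flag is accepted but unused until per-arch routes are wired.
    let _ = use_simd;
    scalar::demo444p10_to_rgba_row(y, rgba_out, width);
}
```

With this structure, `use_simd = true` and `use_simd = false` produce identical bytes, so callers can already pass whichever value they intend to use once SIMD routes exist.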

Tests (`src/row/scalar/tests.rs`)

+6 scalar reference tests covering representative kernel families:

yuv_444p_n_to_rgba_row::<10> gray-to-gray (validates u8 alpha = 0xFF)
yuv_444p_n_to_rgba_u16_row::<10> gray-to-gray (validates alpha = 1023)
yuv_444p16_to_rgba_row gray-to-gray (validates alpha = 0xFF)
yuv_444p16_to_rgba_u16_row gray-to-gray (validates alpha = 0xFFFF)
p_n_444_to_rgba_row::<10> gray-to-gray
p_n_444_16_to_rgba_u16_row gray-to-gray (validates alpha = 0xFFFF)

These exercise the scalar code path and pin the alpha contract per family. SIMD equivalence tests (per backend × per kernel × per BITS) land alongside the SIMD wiring in 6b/6c.
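
A gray-to-gray check relies on neutral chroma contributing nothing to any channel, so R = G = B = Y and only the alpha value is left to pin. The sketch below shows the idea on a toy kernel. The name `toy_yuv444_to_rgba_u16_row` is hypothetical, and the full-range BT.601-style Q14 coefficients are an assumption made for this example; the PR does not state which matrices or ranges its tests use.

```rust
/// Toy native-depth 4:4:4 kernel (hypothetical), full-range, Q14 fixed point.
pub fn toy_yuv444_to_rgba_u16_row<const BITS: u32>(
    y: &[u16],
    u: &[u16],
    v: &[u16],
    rgba_out: &mut [u16],
) {
    let width = y.len();
    assert!(u.len() >= width && v.len() >= width && rgba_out.len() >= width * 4);

    let max = (1i32 << BITS) - 1; // also the opaque alpha value
    let bias = 1i32 << (BITS - 1); // neutral chroma level

    for x in 0..width {
        let luma = y[x] as i32;
        let cb = u[x] as i32 - bias;
        let cr = v[x] as i32 - bias;

        // Assumed coefficients: 1.402, 0.344136, 0.714136, 1.772 scaled by 2^14.
        let r = luma + ((22970 * cr + 8192) >> 14);
        let g = luma - ((5638 * cb + 11700 * cr + 8192) >> 14);
        let b = luma + ((29032 * cb + 8192) >> 14);

        let px = &mut rgba_out[x * 4..x * 4 + 4];
        px[0] = r.clamp(0, max) as u16;
        px[1] = g.clamp(0, max) as u16;
        px[2] = b.clamp(0, max) as u16;
        px[3] = max as u16; // (1 << BITS) - 1
    }
}
```

Feeding 10-bit rows with U = V = 512 should return each luma value unchanged in R, G and B with alpha 1023, which is the property the listed tests describe.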

Out of scope (deferred to follow-up sub-PRs)

Per-arch SIMD kernels (NEON / SSE4.1 / AVX2 / AVX-512 / wasm simd128) — Tranche 6b for u8 RGBA, 6c for u16 RGBA. The use_simd: bool parameter is held in every dispatcher's signature now so 6b/6c can wire cfg_select! blocks without changing the public API.
Sinker integration (MixedSinker<Yuv444p9..16>, <P410/P412/P416>, <Yuv440p10/12>) — Tranche 6c, alongside u16 SIMD.
Per-format RGBA equivalence tests (5 backends × 6 widths × 6 matrices × 2 ranges) — land with their respective SIMD backends in 6b/6c.

Codex adversarial review verdict

Verdict: needs-attention, with one finding flagging that use_simd is silently ignored in the new dispatchers. This is intentional and matches the established precedent set by PR #24 (titled "SIMD lands in 5a/5b"); see PR #24 description for the same rationale. The flag will become meaningful when 6b (u8 SIMD) and 6c (u16 SIMD) wire per-arch routes — as it did for the 4:2:0 family in PRs #25 and #26.

Codex did not flag any correctness or design issues with the scalar refactor itself.

Test plan

cargo test --lib: 513 pass (was 507 before the new scalar tests; +6 in this PR)
cargo check --tests --lib clean across host (aarch64-darwin), x86_64-unknown-freebsd, wasm32-unknown-unknown
RUSTFLAGS="-Dwarnings" cargo clippy --lib --tests clean (zero dead-code warnings — every new template is consumed by its RGB+RGBA wrapper pair, and every new dispatcher consumes its scalar)
cargo doc --lib --no-deps clean (56 doc warnings, all pre-existing baseline)

Follow-ups

Tranche 6b: u8 RGBA SIMD across all 5 backends + per-backend equivalence tests.
Tranche 6c: u16 RGBA SIMD across all 5 backends + sinker integration (Yuv444p9/10/12/14/16, P410/P412/P416, Yuv440p10/12) + sinker tests.

🤖 Generated with Claude Code

Copilot

Pull request overview

Adds the scalar-reference layer and public row-level API surface for high-bit-depth 4:4:4 → RGBA conversions (u8 and native-depth u16), mirroring the earlier high-bit 4:2:0 staged rollout so SIMD wiring can land later without signature churn.

Changes:

Add scalar shared-template kernels for high-bit 4:4:4 RGBA across planar (Yuv444p{9,10,12,14,16}) and semi-planar (P410/P412/P416) families, with constant-opaque alpha.
Add 16 new public row dispatchers in row:: for u8 RGBA and u16 RGBA outputs (currently scalar-routed; use_simd intentionally ignored for now).
Add scalar reference tests that pin the alpha contract for representative kernel families.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.

File	Description
src/row/scalar.rs	Introduces shared `_to_rgb_or_rgba_` scalar templates and RGBA wrappers for high-bit 4:4:4 planar + semi-planar kernels.
src/row/mod.rs	Adds 16 new public high-bit 4:4:4 RGBA dispatchers (u8 + u16) that validate slice lengths and route to scalar.
src/row/scalar/tests.rs	Adds 6 scalar tests validating gray mapping sanity and the required opaque-alpha values per family.

Copilot · 2026-04-27T00:16:29Z

  // Compile-time guard — fails monomorphization for any BITS outside
-  // {10, 12, 14}. The 16-bit path lives in `yuv_444p16_to_rgb_row`
+  // {9, 10, 12, 14}. The 16-bit path lives in `yuv_444p16_to_rgb_row`
  // (i32 u8-output kernel family). Without this guard a caller
-  // invoking ::<16> would reach the NEON clamp where
+  // invoking ::<16, _> would reach the NEON clamp where
  // `(1 << BITS) - 1 as i16` silently wraps to -1.


The compile-time guard comment here references a “NEON clamp”, but this is the scalar kernel implementation. Consider rewording this comment to describe the generic 16-bit footgun (e.g., i16/i32 clamping/overflow behavior) without mentioning a specific SIMD backend, or move the backend-specific rationale to the SIMD dispatcher/docs.
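
The generic 16-bit footgun can be shown without reference to any SIMD backend. In the minimal sketch below (helper names are hypothetical), the clamp maximum is computed in `i32` and then narrowed: for 16 bits the value 65535 does not fit in `i16`, and Rust's truncating `as` cast turns it into -1.

```rust
/// Clamp maximum narrowed to i16. Correct for bits <= 15 only.
pub const fn clamp_max_i16(bits: u32) -> i16 {
    // `as` truncates to the low 16 bits: 0xFFFF reinterpreted as i16 is -1.
    ((1i32 << bits) - 1) as i16
}

/// The same maximum kept in i32, which is safe for bits = 16.
pub const fn clamp_max_i32(bits: u32) -> i32 {
    (1i32 << bits) - 1
}
```

A clamp against -1 would force every pixel to a negative bound, which is why a dedicated 16-bit kernel family with wider intermediates is the safer route.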

The 4:4:4 high-bit YUV planar SIMD docs claimed `BITS ∈ {10, 12, 14}` across all 5 backends, but the const-assert in every implementation accepts `BITS == 9 || 10 || 12 || 14` and the `yuv444p9_to_rgba_row` public dispatcher (added in PR #29) instantiates the kernel with `<9>`. The doc string was stale from before BITS=9 was added in Ship 6b.

Updates both the const-generic bound (`{10, 12, 14}` → `{9, 10, 12, 14}`) and the prose bit-list (`10/12/14-bit` → `9/10/12/14-bit`) on every 4:4:4 planar SIMD doc, covering the u8 RGB, u8 RGBA (added in this PR), and u16 RGB siblings across NEON, SSE4.1, AVX2, AVX-512, and wasm simd128. 23 lines updated total.

Addresses Copilot review comments on PR #30. Also retroactively fixes the matching drift on the u16 RGB and pre-existing u8 RGB docs that Copilot didn't explicitly flag but had identical wording.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

update

d88b249

al8n changed the title ~~update~~ feat(row): Ship 8 — high-bit 4:4:4 RGBA scalar (SIMD lands in 6b/6c) Apr 27, 2026

al8n requested a review from Copilot April 27, 2026 00:12

Copilot started reviewing on behalf of al8n April 27, 2026 00:12

Copilot AI reviewed Apr 27, 2026

al8n merged commit 85277c2 into main Apr 27, 2026
47 checks passed

al8n deleted the feat/ship8-rgba-high-bit-444-scalar branch April 27, 2026 00:17

al8n pushed a commit that referenced this pull request Apr 27, 2026

feat(row): Ship 8 — high-bit 4:4:4 RGBA scalar (#29)

0742f73

al8n mentioned this pull request Apr 27, 2026

Ship 8 Tranche 7b: high-bit 4:4:4 RGBA u8 SIMD #30

Merged

4 tasks

al8n mentioned this pull request Apr 27, 2026

Ship 8 Tranche 7c: high-bit 4:4:4 RGBA u16 SIMD + sinker integration #31

Open

5 tasks
