feat(0.1.0): row-primitive kernels with SIMD dispatch + Sink API (yuv420p, rgb→hsv, bgr↔rgb)#1
Pull request overview
This PR introduces a YUV420p (4:2:0 planar) conversion pipeline built around a Sink-based API, with scalar row primitives and multiple SIMD backends (NEON + x86 tiers) selected via runtime/compile-time dispatch.
Changes:
- Adds a YUV420p row-walker kernel (`yuv420p_to`) and associated row/sink types.
- Introduces `MixedSinker` to optionally produce RGB, luma, and/or HSV outputs per row.
- Implements row-level primitives with scalar + SIMD backends (aarch64 NEON, x86 SSE4.1/AVX2/AVX-512) plus benches and CI coverage/benchmark workflows.
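As a rough illustration of the row-walker plus sink shape described above, here is a minimal sketch. The names `Row`, `RowSink`, `walk_yuv420p`, and `LumaCopy` are hypothetical and are not the crate's API; the sketch also assumes tightly packed planes (stride equal to width).

```rust
/// Hypothetical row view: one luma row plus the chroma rows it shares.
pub struct Row<'a> {
    pub index: usize,
    pub y: &'a [u8],
    pub u_half: &'a [u8],
    pub v_half: &'a [u8],
}

/// Hypothetical sink trait: receives one row at a time.
pub trait RowSink {
    fn process(&mut self, row: Row<'_>);
}

/// Walks a tightly packed 4:2:0 frame (stride == width) row by row.
pub fn walk_yuv420p<S: RowSink>(
    y: &[u8],
    u: &[u8],
    v: &[u8],
    width: usize,
    height: usize,
    sink: &mut S,
) {
    assert!(width % 2 == 0 && height % 2 == 0, "4:2:0 needs even dimensions");
    let cw = width / 2;
    assert!(y.len() >= width * height, "Y plane too short");
    assert!(u.len() >= cw * (height / 2), "U plane too short");
    assert!(v.len() >= cw * (height / 2), "V plane too short");
    for r in 0..height {
        let c = r / 2; // two luma rows share one chroma row
        sink.process(Row {
            index: r,
            y: &y[r * width..(r + 1) * width],
            u_half: &u[c * cw..(c + 1) * cw],
            v_half: &v[c * cw..(c + 1) * cw],
        });
    }
}

/// Example sink: copies each luma row into an owned buffer.
pub struct LumaCopy {
    pub width: usize,
    pub out: Vec<u8>,
}

impl RowSink for LumaCopy {
    fn process(&mut self, row: Row<'_>) {
        let w = self.width;
        self.out[row.index * w..(row.index + 1) * w].copy_from_slice(row.y);
    }
}
```

A sink that produces RGB or HSV would follow the same pattern, converting `row.y`, `row.u_half`, and `row.v_half` instead of copying.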
Reviewed changes
Copilot reviewed 26 out of 26 changed files in this pull request and generated 8 comments.
Show a summary per file
| File | Description |
|---|---|
| src/yuv/yuv420p.rs | Adds YUV420p kernel + Yuv420pRow/Yuv420pSink row model. |
| src/yuv/mod.rs | Exposes YUV module and re-exports YUV420p APIs. |
| src/sinker/mod.rs | Adds sinker module entry + re-export for MixedSinker. |
| src/sinker/mixed.rs | Implements MixedSinker (RGB/Luma/HSV), plus unit tests. |
| src/row/scalar.rs | Adds scalar reference implementations for YUV420→RGB, RGB→HSV, and channel swap. |
| src/row/mod.rs | Adds public row primitive dispatchers and CPU-feature detection helpers. |
| src/row/arch/mod.rs | Wires up arch-specific SIMD backend modules. |
| src/row/arch/neon.rs | Adds aarch64 NEON implementations for row primitives + equivalence tests. |
| src/row/arch/x86_common.rs | Adds shared x86 SIMD helpers (interleave/deinterleave, swap, HSV helper). |
| src/row/arch/x86_sse41.rs | Adds x86_64 SSE4.1 backend + equivalence tests. |
| src/row/arch/x86_avx2.rs | Adds x86_64 AVX2 backend + equivalence tests. |
| src/row/arch/x86_avx512.rs | Adds x86_64 AVX-512 backend + equivalence tests. |
| src/frame.rs | Adds validated Yuv420pFrame with error reporting and tests. |
| src/lib.rs | Establishes crate public API surface and top-level docs/types. |
| docs/color-conversion-functions.md | Adds design/inventory doc for the Sink-based conversion plan. |
| benches/yuv_420_to_rgb.rs | Adds Criterion benchmark for YUV420→RGB row primitive. |
| benches/rgb_to_hsv.rs | Adds Criterion benchmark for RGB→HSV row primitive. |
| benches/foo.rs | Removes placeholder benchmark. |
| ci/miri_tb.sh | Adjusts Miri invocation flags/targets. |
| ci/miri_sb.sh | Adjusts Miri invocation flags/targets. |
| Cargo.toml | Renames crate to colconv, updates features/deps, configures benches. |
| .github/workflows/loc.yml | Updates gist upload key/name and github-script version. |
| .github/workflows/ci.yml | Updates scheduled cadence and cargo-hack invocations; removes loom/old coverage job. |
| .github/workflows/coverage.yml | Adds multi-platform/tier tarpaulin coverage workflow + Codecov upload. |
| .github/workflows/benchmark.yml | Adds multi-platform benchmark workflow + result aggregation/commenting. |
pub fn yuv_420_to_rgb_row(
  y: &[u8],
  u_half: &[u8],
  v_half: &[u8],
  rgb_out: &mut [u8],
  width: usize,
  matrix: ColorMatrix,
  full_range: bool,
  use_simd: bool,
) {
  if use_simd {
    cfg_select! {
      target_arch = "aarch64" => {
        if neon_available() {
          // SAFETY: `neon_available()` verified NEON is present on this
          // CPU. Bounds / parity invariants are the caller's obligation
          // (same contract as the scalar reference); they are checked
          // with `debug_assert` in debug builds.
          unsafe {
            arch::neon::yuv_420_to_rgb_row(y, u_half, v_half, rgb_out, width, matrix, full_range);
          }
          return;
These row dispatchers are pub fn (safe) but may call unsafe SIMD kernels that rely on unchecked pointer arithmetic. In release builds there are only debug_assert!s for width parity and slice lengths, so invalid inputs could cause UB. Add non-debug assert!/length checks before any SIMD call (or make these APIs unsafe and document the invariants).
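One way the suggested release-mode checks could look is sketched below. `assert_yuv420_row_contract` and `convert_row_checked` are hypothetical names, and the loop body is a stand-in for the real kernel call, not the crate's conversion.

```rust
/// Hypothetical release-mode validation for a 4:2:0 row conversion.
/// Panics with a descriptive message instead of letting an unchecked
/// SIMD kernel read or write out of bounds.
pub fn assert_yuv420_row_contract(
    y_len: usize,
    u_len: usize,
    v_len: usize,
    rgb_len: usize,
    width: usize,
) {
    assert!(width % 2 == 0, "width {width} must be even for 4:2:0");
    assert!(y_len >= width, "y row too short: {y_len} < {width}");
    assert!(u_len >= width / 2, "u row too short: {u_len} < {}", width / 2);
    assert!(v_len >= width / 2, "v row too short: {v_len} < {}", width / 2);
    let rgb_needed = width.checked_mul(3).expect("width * 3 overflows usize");
    assert!(rgb_len >= rgb_needed, "rgb row too short: {rgb_len} < {rgb_needed}");
}

/// Safe wrapper sketch: validate first, then dispatch.
pub fn convert_row_checked(
    y: &[u8],
    u_half: &[u8],
    v_half: &[u8],
    rgb_out: &mut [u8],
    width: usize,
) {
    assert_yuv420_row_contract(y.len(), u_half.len(), v_half.len(), rgb_out.len(), width);
    // A real dispatcher would call the SIMD or scalar kernel here. This
    // stand-in writes luma to all three channels so the sketch stays runnable.
    for x in 0..width {
        rgb_out[x * 3..x * 3 + 3].fill(y[x]);
    }
}
```

The checks run once per row, so their cost is small relative to the per-pixel work, which is the usual argument for keeping them in release builds rather than marking the API `unsafe`.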
//! Shared helpers for the x86_64 SIMD backends.
//!
//! Items here use only SSE2 + SSSE3 intrinsics, so they're safe to
//! call from any x86 backend at SSSE3 or above (currently SSE4.1 and
//! AVX2; AVX-512 will reuse them too). `#[inline(always)]` guarantees
//! they inline into the caller, inheriting its `#[target_feature]`
//! context.

use core::arch::x86_64::{
  __m128, __m128i, _mm_add_ps, _mm_blendv_ps, _mm_cmpeq_ps, _mm_cmplt_ps, _mm_cvtepi32_ps,
  _mm_cvtepu8_epi32, _mm_cvttps_epi32, _mm_loadu_si128, _mm_max_ps, _mm_min_ps, _mm_mul_ps,
  _mm_or_si128, _mm_packus_epi16, _mm_packus_epi32, _mm_rcp_ps, _mm_set1_ps, _mm_setr_epi8,
  _mm_setzero_ps, _mm_shuffle_epi8, _mm_srli_si128, _mm_storeu_si128, _mm_sub_ps,
};
The module header claims these helpers use only SSE2+SSSE3, but this file uses SSE4.1 intrinsics like _mm_cvtepu8_epi32, _mm_packus_epi32, and _mm_blendv_ps. Please update the comment (and any feature assumptions) so it’s clear SSE4.1 is required for these helpers.
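To make the feature requirement concrete, the sketch below gates one of the named SSE4.1 intrinsics (`_mm_cvtepu8_epi32`) behind `#[target_feature(enable = "sse4.1")]` and a runtime check. The helper names `widen_u8x4_to_f32` and `widen_sse41` are hypothetical and only illustrate the pattern.

```rust
/// Hypothetical helper: widens four u8 values to f32, using SSE4.1
/// when the CPU reports it and a scalar fallback otherwise.
pub fn widen_u8x4_to_f32(src: [u8; 4]) -> [f32; 4] {
    #[cfg(target_arch = "x86_64")]
    {
        if std::arch::is_x86_feature_detected!("sse4.1") {
            // SAFETY: SSE4.1 availability was just verified at runtime.
            return unsafe { widen_sse41(src) };
        }
    }
    [src[0] as f32, src[1] as f32, src[2] as f32, src[3] as f32]
}

#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "sse4.1")]
unsafe fn widen_sse41(src: [u8; 4]) -> [f32; 4] {
    use std::arch::x86_64::{
        _mm_cvtepi32_ps, _mm_cvtepu8_epi32, _mm_cvtsi32_si128, _mm_storeu_ps,
    };
    unsafe {
        // Place the four bytes in the low 32 bits of a vector (SSE2).
        let packed = _mm_cvtsi32_si128(i32::from_le_bytes(src));
        // Zero-extend u8 -> i32: this is the SSE4.1 step (PMOVZXBD).
        let widened = _mm_cvtepu8_epi32(packed);
        // Convert i32 -> f32 (SSE2).
        let floats = _mm_cvtepi32_ps(widened);
        let mut out = [0.0f32; 4];
        _mm_storeu_ps(out.as_mut_ptr(), floats);
        out
    }
}
```

Because the `sse4.1` attribute is on the function that contains the intrinsic, a caller compiled for a lower tier cannot reach it without going through the runtime check.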
Benchmark Results
Benchmark Results Summary
Date: 2026-04-18 12:26:10 UTC
Benchmark results (each with a System Information section) were posted for:
- macos-aarch64-neon
- macos-aarch64-scalar
- ubuntu-x86_64-avx2-max
- ubuntu-x86_64-default
- ubuntu-x86_64-native
- ubuntu-x86_64-scalar
- ubuntu-x86_64-sse41-max
- windows-x86_64-default
View detailed results: Detailed Criterion results have been uploaded as artifacts. Download them from the workflow run to view charts and detailed statistics.
Pull request overview
Copilot reviewed 26 out of 27 changed files in this pull request and generated 7 comments.
// Luma: YUV420p luma *is* the Y plane. Just copy.
if let Some(luma) = luma.as_deref_mut() {
  luma[idx * w..(idx + 1) * w].copy_from_slice(&row.y()[..w]);
}
luma[idx * w..(idx + 1) * w] will panic with an unhelpful slice error if the caller’s luma buffer is too short for the incoming row index. Consider adding an explicit bounds check/assert with a clearer message (or track/validate expected height up front) to make contract violations easier to diagnose.
let rgb_row: &mut [u8] = match rgb.as_deref_mut() {
  Some(buf) => &mut buf[idx * w * 3..(idx + 1) * w * 3],
  None => {
The RGB output path slices buf[idx * w * 3..(idx + 1) * w * 3] without validating the buffer length against the current row index. This will panic on short buffers and the error won’t clearly indicate which output is undersized. Consider adding an explicit assert (and possibly documenting/encoding expected height) before slicing.
if let Some(hsv) = hsv.as_mut() {
  rgb_to_hsv_row(
    rgb_row,
    &mut hsv.h[idx * w..(idx + 1) * w],
    &mut hsv.s[idx * w..(idx + 1) * w],
    &mut hsv.v[idx * w..(idx + 1) * w],
    w,
HSV output slices (hsv.h[idx * w..], etc.) also assume the caller provided >= width * height bytes per plane, but MixedSinker doesn’t validate this. Adding explicit asserts here (or validating once when the sink is configured) would provide clearer failures than a generic slice panic.
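The three comments on the luma, RGB, and HSV outputs ask for the same kind of check. A minimal sketch of a shared helper follows; `output_row_mut` and its parameters are hypothetical and are not taken from `MixedSinker`.

```rust
/// Hypothetical helper: returns the `idx`-th row of a packed output
/// plane, panicking with a message that names the undersized buffer.
pub fn output_row_mut<'a>(
    buf: &'a mut [u8],
    name: &str,
    idx: usize,
    width: usize,
    channels: usize,
) -> &'a mut [u8] {
    let row_bytes = width.checked_mul(channels).expect("row size overflows usize");
    let start = idx.checked_mul(row_bytes).expect("row offset overflows usize");
    let end = start.checked_add(row_bytes).expect("row end overflows usize");
    assert!(
        end <= buf.len(),
        "{name} buffer too short: row {idx} needs {end} bytes, buffer has {}",
        buf.len()
    );
    &mut buf[start..end]
}
```

With a helper like this, a short luma buffer would fail with a message naming `luma` and the row index rather than a generic slice-index panic. Validating `width * height` once when the sink is configured would be an alternative that moves the failure earlier.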
#[test]
fn yuv420_bgr_black() {
  // Full-range Y=0, neutral chroma → black.
  let y = [0u8; 4];
  let u = [128u8; 2];
  let v = [128u8; 2];
  let mut rgb = [0u8; 12];
  yuv_420_to_rgb_row(&y, &u, &v, &mut rgb, 4, ColorMatrix::Bt601, true);
  assert!(rgb.iter().all(|&c| c == 0), "got {rgb:?}");
}

#[test]
fn yuv420_bgr_white_full_range() {
  let y = [255u8; 4];
  let u = [128u8; 2];
  let v = [128u8; 2];
  let mut rgb = [0u8; 12];
  yuv_420_to_rgb_row(&y, &u, &v, &mut rgb, 4, ColorMatrix::Bt601, true);
  assert!(rgb.iter().all(|&c| c == 255), "got {rgb:?}");
}

#[test]
fn yuv420_bgr_gray_is_gray() {
  let y = [128u8; 4];
  let u = [128u8; 2];
  let v = [128u8; 2];
  let mut rgb = [0u8; 12];
  yuv_420_to_rgb_row(&y, &u, &v, &mut rgb, 4, ColorMatrix::Bt601, true);
  for x in 0..4 {
    let (b, g, r) = (rgb[x * 3], rgb[x * 3 + 1], rgb[x * 3 + 2]);
    assert_eq!(b, g);
Several YUV→RGB scalar tests are named yuv420_bgr_* and destructure pixels as (b, g, r), but yuv_420_to_rgb_row is documented to output packed R,G,B. Even though many assertions pass due to grayscale inputs, the naming/order here is misleading and could mask channel-order regressions—consider renaming tests/variables to match RGB semantics.
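A test that uses a saturated color instead of gray would catch a swapped channel order. The sketch below uses a floating-point full-range BT.601 reference, `bt601_full_range_to_rgb`, which is a hypothetical stand-in for illustration and not the crate's fixed-point kernel.

```rust
/// Float reference for full-range BT.601 YUV -> RGB on one pixel.
/// An illustrative stand-in, not the crate's fixed-point kernel.
pub fn bt601_full_range_to_rgb(y: u8, u: u8, v: u8) -> [u8; 3] {
    let (y, cb, cr) = (y as f32, u as f32 - 128.0, v as f32 - 128.0);
    let r = y + 1.402 * cr;
    let g = y - 0.344_136 * cb - 0.714_136 * cr;
    let b = y + 1.772 * cb;
    // Round, clamp to the u8 range, and keep the R, G, B order.
    [r, g, b].map(|c| c.round().clamp(0.0, 255.0) as u8)
}
```

Feeding near-pure red (Y=76, U=85, V=255) and near-pure blue (Y=29, U=255, V=107) and asserting which output index is bright makes the test fail if R and B are ever swapped, which gray inputs cannot detect.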
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Pull request overview
Copilot reviewed 26 out of 27 changed files in this pull request and generated 6 comments.
Pull request overview
Copilot reviewed 26 out of 27 changed files in this pull request and generated 2 comments.
/// Bit-identical to the scalar reference. Every scalar op has the
/// same SIMD counterpart in the same order: `vmaxq_f32` / `vminq_f32`
/// mirror `f32::max` / `f32::min`; `vdivq_f32` is true f32 division
/// (not reciprocal estimate); branch cascade uses `vbslq_f32` in the
/// same `delta == 0 → v == r → v == g → v == b` priority.
The docstring for rgb_to_hsv_row claims the NEON path is “byte‑identical” / “Bit‑identical” to scalar::rgb_to_hsv_row, but the NEON test suite below explicitly documents and tolerates ±1 LSB differences (and circular hue distance). Please align this API documentation with the actual numerical contract (e.g., “matches within ±1 LSB” and note the hue wrap behavior).
- /// Bit-identical to the scalar reference. Every scalar op has the
- /// same SIMD counterpart in the same order: `vmaxq_f32` / `vminq_f32`
- /// mirror `f32::max` / `f32::min`; `vdivq_f32` is true f32 division
- /// (not reciprocal estimate); branch cascade uses `vbslq_f32` in the
- /// same `delta == 0 → v == r → v == g → v == b` priority.
+ /// Numerically matches the scalar reference within the tolerance
+ /// accepted by the NEON equivalence tests, rather than bit-identically
+ /// for every lane. In particular, `S` and `V` may differ by up to
+ /// ±1 LSB, and `H` is compared with circular wraparound semantics, so
+ /// values that differ by 1 across the 0/255 boundary are equivalent.
+ /// The SIMD path still mirrors the scalar algorithm closely:
+ /// `vmaxq_f32` / `vminq_f32` mirror `f32::max` / `f32::min`;
+ /// `vdivq_f32` is true f32 division (not a reciprocal estimate); and
+ /// the branch cascade uses `vbslq_f32` in the same
+ /// `delta == 0 → v == r → v == g → v == b` priority.
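The tolerance described in the suggested docstring could be checked with helpers along these lines. Both function names are hypothetical, and the `modulus` parameter is an assumption of this sketch: 256 if hue codes span the full `u8` range, as the "0/255 boundary" wording suggests, or 180 for OpenCV-style 8-bit hue.

```rust
/// Circular distance between two hue codes on a wheel of `modulus`
/// steps. Both inputs are assumed to be valid codes below `modulus`.
pub fn hue_distance(a: u8, b: u8, modulus: u16) -> u16 {
    assert!(modulus > 0, "modulus must be non-zero");
    let d = u16::from(a.abs_diff(b)) % modulus;
    // Going the other way around the wheel may be shorter.
    d.min(modulus - d)
}

/// True when two S or V codes differ by at most one least-significant bit.
pub fn within_one_lsb(a: u8, b: u8) -> bool {
    a.abs_diff(b) <= 1
}
```

An equivalence test would then assert `hue_distance(simd_h, scalar_h, modulus) <= 1` for H and `within_one_lsb` for S and V, instead of comparing bytes for equality.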
// The 3-plane × (slice, stride, dim) shape is intrinsic to YUV 4:2:0;
// `div_ceil` on u32 isn't const-stable yet, so the `(x + 1) / 2`
// idiom stays.
#[allow(clippy::too_many_arguments)]
pub const fn try_new(
  y: &'a [u8],
  u: &'a [u8],
  v: &'a [u8],
  width: u32,
  height: u32,
  y_stride: u32,
  u_stride: u32,
  v_stride: u32,
) -> Result<Self, Yuv420pFrameError> {
  if width == 0 || height == 0 {
    return Err(Yuv420pFrameError::ZeroDimension { width, height });
  }
  if width & 1 != 0 || height & 1 != 0 {
    return Err(Yuv420pFrameError::OddDimension { width, height });
  }
  if y_stride < width {
    return Err(Yuv420pFrameError::YStrideTooSmall { width, y_stride });
  }
  let chroma_width = width.div_ceil(2);
  if u_stride < chroma_width {
    return Err(Yuv420pFrameError::UStrideTooSmall {
try_new is a const fn, but the comment says div_ceil “isn't const-stable yet” while the code calls width.div_ceil(2) / height.div_ceil(2). If div_ceil is indeed not const-stable on the crate MSRV, this won’t compile; if it is const-stable now, the comment is misleading. Since width/height are already validated even, consider replacing these with width / 2 and height / 2 (or (x + 1) / 2 if you want to support odd sizes later) and update/remove the comment accordingly.
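If a ceiling half is wanted in a `const fn` without depending on `div_ceil`, one option is sketched below. `half_ceil` and `CHROMA_W` are hypothetical names; for the even dimensions that `try_new` already enforces, plain `width / 2` gives the same result.

```rust
/// Ceiling half written so it is usable in a `const fn` and cannot
/// overflow, unlike `(x + 1) / 2` at `u32::MAX`.
pub const fn half_ceil(x: u32) -> u32 {
    x / 2 + (x & 1)
}

/// Evaluated at compile time, which shows the function is const-callable.
pub const CHROMA_W: u32 = half_ceil(1920);
```

For even inputs the `x & 1` term is zero, so the expression reduces to `x / 2`; the extra term only matters if odd sizes are supported later.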
Resolves the medium-severity Codex finding (round 7): finite WB
gains and CCM coefficients can be arbitrarily large under the
existing `is_finite()` check, and a pathological metadata pipeline
could produce values that overflow per-pixel f32 matmul during
fusion or per-pixel multiply-add — clamps would saturate (Inf →
255) or NaN-cast (NaN → 0), silently corrupting pixels.
Bounded both validators to `1e6` (`WhiteBalance::MAX_GAIN`,
`ColorCorrectionMatrix::MAX_COEFFICIENT_ABS`):
- Real-world WB gains are O(1–10) (extreme tungsten ~3, daylight
~1.5–2). 1e6 is six orders of magnitude over typical, but
21 orders under the value at which f32 matmul could overflow
given 16-bit samples — provides massive headroom while closing
the door on bad metadata.
- Real-world CCM coefficients are O(1–5) (crosstalk subtraction
on the off-diagonals can be negative ~-0.3, diagonal entries
~1.0–1.5). Same 1e6 bound, with both positive and negative
cap (`-MAX..=MAX`).
- Overflow analysis spelled out in the doc comments: the largest
per-channel sum at the bound is `3 * 1e6 * 1e6 * 65535 ≈
1.97e17`, ~21 orders of magnitude under `f32::MAX ≈ 3.4e38`.
No Inf, no NaN possible from validated inputs.
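The overflow arithmetic above can be reproduced directly in `f32`. This is a small sketch; `BOUND` and `worst_case_channel_sum` are hypothetical names that mirror the quoted `1e6` bound rather than the crate's constants.

```rust
/// Bound quoted in the commit message for gains and CCM coefficients.
pub const BOUND: f32 = 1e6;

/// Largest per-channel sum under that bound: three CCM terms, each a
/// coefficient times a gain times a 16-bit sample, all at their maximum.
pub fn worst_case_channel_sum() -> f32 {
    3.0 * BOUND * BOUND * f32::from(u16::MAX)
}
```

The result is about `1.97e17`, and dividing `f32::MAX` by it leaves a ratio above `1e21`, which is the headroom the commit message describes.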
New error variants:
- `WhiteBalanceError::OutOfBounds { channel, value, max }`
- `ColorCorrectionMatrixError::OutOfBounds { row, col, value, max_abs }`
- Both `#[non_exhaustive]`, `IsVariant`-derived, `thiserror::Error`.
- Public constants `WhiteBalance::MAX_GAIN` and
`ColorCorrectionMatrix::MAX_COEFFICIENT_ABS` so callers can
introspect the bound rather than hard-coding.
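A bounded validator of the kind described might look like the following sketch. `Gains` and `GainError` are hypothetical stand-ins for `WhiteBalance` and `WhiteBalanceError`; the real types derive through `thiserror` and are `#[non_exhaustive]`, which this dependency-free version omits.

```rust
/// Hypothetical error type mirroring the variants named above.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum GainError {
    NotFinite { channel: usize, value: f32 },
    Negative { channel: usize, value: f32 },
    OutOfBounds { channel: usize, value: f32, max: f32 },
}

/// Hypothetical validated white-balance gains (one per channel).
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Gains([f32; 3]);

impl Gains {
    /// Inclusive upper bound, exposed so callers need not hard-code it.
    pub const MAX_GAIN: f32 = 1e6;

    pub fn try_new(gains: [f32; 3]) -> Result<Self, GainError> {
        for (channel, &value) in gains.iter().enumerate() {
            if !value.is_finite() {
                return Err(GainError::NotFinite { channel, value });
            }
            if value < 0.0 {
                return Err(GainError::Negative { channel, value });
            }
            // Finite but pathological values are rejected here.
            if value > Self::MAX_GAIN {
                return Err(GainError::OutOfBounds { channel, value, max: Self::MAX_GAIN });
            }
        }
        Ok(Self(gains))
    }
}
```

A CCM validator would follow the same shape with a symmetric check, for example `value.abs() > MAX_COEFFICIENT_ABS`, and without the non-negativity test.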
Codex's high-severity Finding #1 (Bayer16 sample-range failure
panics rather than returning Err) is **rejected** for this PR
with rationale on the GitHub thread: matches the crate-wide
pattern used by `Yuv420pFrame16` / `Yuv422pFrame16` /
`Yuv444pFrame16` / `Yuv440pFrame16` (each pairs `try_new` /
`try_new_checked` the same way), Bayer16 diverging would create
asymmetry, and the recoverable path already exists via
`BayerFrame16::try_new_checked`. Adding a wrapper-error walker
signature for the unchecked-input case is unjustified API surface
for a contract that's well-documented and consistent with the
established crate-wide model.
Tests (6 new, all green):
- `wb_try_new_rejects_extreme_finite_gain` — `1e10` rejected
even though it's finite + non-negative.
- `wb_try_new_accepts_value_at_bound` — exactly `MAX_GAIN`
permitted (boundary inclusive).
- `ccm_try_new_rejects_extreme_finite_coefficient` — `1e30`
rejected as positive overflow risk.
- `ccm_try_new_rejects_extreme_negative_coefficient` — `-1e10`
rejected (symmetric bound).
- `ccm_try_new_accepts_typical_negative_off_diagonal` — real
CCM with crosstalk subtraction validates cleanly.
- `fuse_wb_ccm_at_bounds_with_max_sample_stays_finite` —
worst-case stress: every coefficient at the bound, every
sample at u16 max; per-pixel sum stays finite (no overflow).
Full lib suite: 398 tests passing (was 392; +6). wasm32 +
x86_64 cross-targets clean; `cargo doc` clean.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
First substantive PR landing the v0.1.0 surface of colconv:
- […] `cfg_select!`, Rust 1.95+):
  - `yuv_420_to_rgb_row`: YUV 4:2:0 → packed RGB, chroma upsampled in registers.
  - `rgb_to_hsv_row`: OpenCV-compatible 8-bit HSV (integer LUT on scalar, rcp+Newton on x86, native divide on NEON/wasm).
  - `bgr_to_rgb_row` / `rgb_to_bgr_row`: channel-swap primitive.
- […] `std`, compile-time on `no_std`).
- `PixelSink` trait + per-format subtrait (`Yuv420pSink`); ships `MixedSinker` that writes any subset of {RGB, Luma, HSV} into caller buffers with a lazily-allocated scratch for HSV-without-RGB.
- `Yuv420pFrame` with stride-aware validated constructor.
- `ColorMatrix`: `Bt601` / `Bt709` / `Bt2020Ncl` / `Smpte240m` / `Fcc` / `YCgCo` (non-exhaustive).
- `use_simd` toggle on the dispatcher and `MixedSinker::with_simd` for A/B scalar-vs-SIMD benchmarking on the same input.
- `row`, `yuv`, and the `PixelSink` trait compile with `--no-default-features`; only `MixedSinker` requires `alloc` for its `Vec<u8>` scratch.

CI
- `ci.yml`: rustfmt, clippy (each-feature), feature-powerset build/test on Linux/macOS/Windows, cross-compile matrix (15 targets incl. wasm32, riscv64, powerpc64), Miri (TB + SB) on 7 targets, sanitizers, Intel SDE runner for AVX-512 coverage under Ice Lake emulation.
- `coverage.yml`: 7-tier Codecov matrix (macOS NEON / scalar, Linux x86_64 default / avx2-max / sse41-max / scalar, Windows) so every dispatcher branch is exercised.
- `benchmark.yml`: Criterion benches (`yuv_420_to_rgb`, `rgb_to_hsv`) with per-SIMD-tier runs and PR-comment integration.

Docs
- `docs/color-conversion-functions.md`: full design rationale + 48-entry per-format implementation plan.
- `docs/hardware-decode-with-ffmpeg-next.md`: how to feed VideoToolbox / VA-API / CUDA / D3D11VA hardware-decoded NV12/P010 frames into colconv.

Test plan
- `cargo test --all-features`: 38 passing
- `cargo hack check --feature-powerset` (7 combos incl. `--no-default-features`)
- `cargo hack clippy --each-feature -- -D warnings`
- `cargo fmt --check`

Next
Hardware-decoder follow-ups (NV12 / P010 for VideoToolbox + VA-API HDR path) will land in a separate PR.