feat(simd): missing-lanes sweep — U16x16/U32x8/U64x4/I32x8/I64x4 #179
…ss all backends

PR #178's matrix audit surfaced five 256-bit int lane types that were either entirely missing or stranded in `simd_nightly` only. Adds them across every backend so `crate::simd::{U16x16, U32x8, U64x4, I32x8, I64x4}` resolves uniformly on the v3 / v4 / native / nightly / scalar / aarch64 paths.

`src/simd_avx2.rs`
+ 5× `avx2_int_type!` instantiations producing scalar-storage `[$elem; $lanes]` polyfills (align 64), using the same macro pattern as the existing 512-bit polyfills (U8x64, U16x32, …). Native AVX2 `__m256i` upgrades are TD-SIMD-3.
+ 5× lowercase aliases (`u16x16 = U16x16`, etc.) matching the std::simd convention used by every other lane type in the file.

`src/simd_scalar.rs`
+ 5× `impl_int_type!` instantiations mirroring the AVX2 polyfills above. Consumers on non-x86/non-aarch64 targets (wasm32, riscv, thumb) reach the same type names through `crate::simd::*`.
+ Lowercase aliases.

`src/simd_avx512.rs`
+ Re-export of the new types from `simd_avx2` so the v4 dispatch arm in `simd.rs` can surface them without forking the macro into this file. Both files are already gated on `target_arch = "x86_64"`, so the re-export is cheap. Native `__m256i` upgrades here are also TD-SIMD-3 (same story as the v3 polyfills).

`src/simd_nightly/u_word_types.rs`
+ `U16x16` wrapper backed by `core::simd::u16x16`, with the same API surface as the existing 32-/16-/8-lane wrappers: splat, from_slice, from_array, to_array, copy_to_slice, reduce_{sum,min,max}, simd_min/max, cmpeq_mask, cmpgt_mask, Default.

`src/simd_nightly/i_word_types.rs`
+ `I32x8` and `I64x4` wrappers backed by `core::simd::{i32x8, i64x4}`. Same API surface as their siblings; PartialEq via array comparison.

`src/simd_nightly/mod.rs`
+ Re-exports for the three new types + lowercase aliases.

`src/simd.rs`
+ All 5 dispatch arms (nightly, v4, v3, aarch64, scalar fallback) updated to surface the new types through `crate::simd::*`.
`.claude/knowledge/simd-dispatch-architecture.md`
+ Parity matrix updated: the five rows previously marked ❌ across most backends now show 🟠 polyfill (v3, v4-via-v3, scalar) / 🔵 (nightly via `core::simd`).

Verified: `cargo check` is clean under the default v3 features and under `-Ctarget-cpu=x86-64-v4` (set via `CARGO_TARGET_X86_64_UNKNOWN_LINUX_GNU_RUSTFLAGS` plus an explicit `--target`, so build scripts don't SIGILL on non-AVX-512 runners; same pattern as the tier4-avx512-check job).
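The polyfill-plus-dispatch pattern described above can be sketched in one stable-Rust file. This is an illustration under stated assumptions, not the crate's code: `polyfill_int_type!` and `int_lanes_256!` are hypothetical stand-ins for `avx2_int_type!` / `impl_int_type!` (whose real signatures aren't shown here), only a subset of the listed API is implemented, `cmpeq_mask` returns an assumed `u64` bitmask, and the arms are keyed on `target_arch` / `target_feature` for brevity. The nightly and aarch64 arms are omitted, and the real `simd.rs` may select its arms differently.

```rust
// Hypothetical stand-in for `avx2_int_type!` / `impl_int_type!`: a
// scalar-storage `[$elem; $lanes]` lane type with 64-byte alignment.
macro_rules! polyfill_int_type {
    ($name:ident, $elem:ty, $lanes:expr) => {
        #[derive(Clone, Copy, Debug, PartialEq)]
        #[repr(C, align(64))]
        pub struct $name(pub [$elem; $lanes]);
        impl $name {
            pub fn splat(v: $elem) -> Self { Self([v; $lanes]) }
            pub fn from_array(a: [$elem; $lanes]) -> Self { Self(a) }
            pub fn to_array(self) -> [$elem; $lanes] { self.0 }
            /// Loads one vector's worth of elements; panics if `s` is too short.
            pub fn from_slice(s: &[$elem]) -> Self {
                let mut a = [0 as $elem; $lanes];
                a.copy_from_slice(&s[..$lanes]);
                Self(a)
            }
            /// Stores every lane into the front of `out`; panics if `out` is too short.
            pub fn copy_to_slice(self, out: &mut [$elem]) {
                out[..$lanes].copy_from_slice(&self.0);
            }
            /// Horizontal sum; integer overflow wraps instead of panicking.
            pub fn reduce_sum(self) -> $elem {
                self.0.iter().fold(0 as $elem, |acc, &x| acc.wrapping_add(x))
            }
            /// Lane-wise minimum.
            pub fn simd_min(self, other: Self) -> Self {
                let mut a = self.0;
                for (x, y) in a.iter_mut().zip(other.0) {
                    *x = (*x).min(y);
                }
                Self(a)
            }
            /// Bit `i` is set when lane `i` of `self` equals lane `i` of `other`.
            pub fn cmpeq_mask(self, other: Self) -> u64 {
                let mut mask = 0u64;
                for i in 0..$lanes {
                    if self.0[i] == other.0[i] {
                        mask |= 1u64 << i;
                    }
                }
                mask
            }
        }
        impl Default for $name {
            fn default() -> Self {
                Self([0 as $elem; $lanes])
            }
        }
    };
}

// Expands the five 256-bit int lanes plus std::simd-style lowercase aliases.
macro_rules! int_lanes_256 {
    () => {
        polyfill_int_type!(U16x16, u16, 16);
        polyfill_int_type!(U32x8, u32, 8);
        polyfill_int_type!(U64x4, u64, 4);
        polyfill_int_type!(I32x8, i32, 8);
        polyfill_int_type!(I64x4, i64, 4);
        pub type u16x16 = U16x16;
        pub type u32x8 = U32x8;
        pub type u64x4 = U64x4;
        pub type i32x8 = I32x8;
        pub type i64x4 = I64x4;
    };
}

#[cfg(target_arch = "x86_64")]
mod simd_avx2 {
    #![allow(non_camel_case_types)]
    int_lanes_256!();
}

// v4 backend re-exports the v3 polyfills instead of forking the macro.
#[cfg(target_arch = "x86_64")]
mod simd_avx512 {
    pub use crate::simd_avx2::*;
}

// Portable fallback for every other target (wasm32, riscv, thumb, ...).
#[cfg(not(target_arch = "x86_64"))]
mod simd_scalar {
    #![allow(non_camel_case_types)]
    int_lanes_256!();
}

// Dispatch: callers only ever name `crate::simd::*`.
mod simd {
    #[cfg(all(target_arch = "x86_64", target_feature = "avx512f"))]
    pub use crate::simd_avx512::*;
    #[cfg(all(target_arch = "x86_64", not(target_feature = "avx512f")))]
    pub use crate::simd_avx2::*;
    #[cfg(not(target_arch = "x86_64"))]
    pub use crate::simd_scalar::*;
}
```

Because every backend module exports the same names, callers write `use crate::simd::{U16x16, U32x8, U64x4, I32x8, I64x4};` with no `#[cfg]` of their own. In this sketch, `align(64)` also pads each 32-byte payload to a 64-byte `size_of`.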
f6a12ca to 2ef97c0
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: f6a12cae0b
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
f32x16, f32x8, f64x4, f64x8, i16x16, i16x32, i32x16, i32x8, i64x4, i64x8, i8x32, i8x64, u16x16, u16x32, u32x16,
u32x8, u64x4, u64x8, u8x32, u8x64, BF16x16, BF16x8, F16x16, F32Mask16, F32Mask8, F32x16, F32x8, F64Mask4, F64Mask8,
F64x4, F64x8, I16x16, I16x32, I32x16, I32x8, I64x4, I64x8, I8x32, I8x64, U16x16, U16x32, U32x16, U32x8, U64x4,
Add nightly arithmetic/bitwise ops for newly exported int lanes
This change publicly exposes U16x16, I32x8, and I64x4 through crate::simd::* when nightly-simd is enabled, but src/simd_nightly/ops.rs still only implements impl_int_ops!/impl_int_neg! for the older set (U16x32, U32x16, U32x8, U64x8, U64x4, I16x16, I16x32, I32x16, I64x8). As a result, code that uses operators like +, -, &, |, ^ (and unary - for signed types) on these new lanes will fail to compile only under the nightly backend, creating backend-specific API breakage.
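A minimal sketch of the missing impls the review describes. So that it compiles on stable, it uses hypothetical array-backed stand-ins for the three wrappers and hypothetical macro names (`lane_binop!`, `impl_int_ops_sketch!`, `impl_int_neg_sketch!`). The real `impl_int_ops!` / `impl_int_neg!` in `src/simd_nightly/ops.rs` operate on `core::simd` vectors, and their exact signatures aren't shown here. Arithmetic wraps on overflow, on the assumption that the wrappers should match `core::simd`'s wrapping integer semantics.

```rust
use std::ops::{Add, BitAnd, BitOr, BitXor, Neg, Sub};

// Hypothetical array-backed stand-ins; the real nightly wrappers hold
// `core::simd::{u16x16, i32x8, i64x4}` instead.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct U16x16(pub [u16; 16]);
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct I32x8(pub [i32; 8]);
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct I64x4(pub [i64; 4]);

// One lane-wise binary operator: `$trait::$method` on the wrapper is
// implemented by calling the element method `$lane_fn` on each lane pair.
macro_rules! lane_binop {
    ($name:ident, $trait:ident, $method:ident, $lane_fn:ident) => {
        impl $trait for $name {
            type Output = Self;
            fn $method(self, rhs: Self) -> Self {
                let mut out = self.0;
                for (a, b) in out.iter_mut().zip(rhs.0) {
                    *a = (*a).$lane_fn(b);
                }
                Self(out)
            }
        }
    };
}

// `+`, `-` (wrapping) and `&`, `|`, `^` for every listed lane type.
macro_rules! impl_int_ops_sketch {
    ($($name:ident),* $(,)?) => {$(
        lane_binop!($name, Add, add, wrapping_add);
        lane_binop!($name, Sub, sub, wrapping_sub);
        lane_binop!($name, BitAnd, bitand, bitand);
        lane_binop!($name, BitOr, bitor, bitor);
        lane_binop!($name, BitXor, bitxor, bitxor);
    )*};
}

// Unary `-` (wrapping, so MIN stays MIN) for the signed lane types.
macro_rules! impl_int_neg_sketch {
    ($($name:ident),* $(,)?) => {$(
        impl Neg for $name {
            type Output = Self;
            fn neg(self) -> Self {
                let mut out = self.0;
                for a in out.iter_mut() {
                    *a = a.wrapping_neg();
                }
                Self(out)
            }
        }
    )*};
}

// The review's fix, in sketch form: cover the three newly exported lanes.
impl_int_ops_sketch!(U16x16, I32x8, I64x4);
impl_int_neg_sketch!(I32x8, I64x4);
```

In the actual crate, the fix would presumably amount to adding `U16x16`, `I32x8`, and `I64x4` to the existing `impl_int_ops!` invocations, and `I32x8` / `I64x4` to `impl_int_neg!`, rather than introducing a new macro.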
Summary
Implements the five missing 256-bit int lane types surfaced by PR #178's matrix audit:
- `U16x16`
- `U32x8`
- `U64x4`
- `I32x8`
- `I64x4`

Consumers can now `use crate::simd::{U16x16, U32x8, U64x4, I32x8, I64x4}` on any backend without an `#[cfg]` dance.

Files touched
- `src/simd_avx2.rs`: 5× `avx2_int_type!` instantiations (scalar-storage polyfills, align 64) + lowercase aliases. Native AVX2 `__m256i` upgrade tracked as TD-SIMD-3.
- `src/simd_scalar.rs`: 5× `impl_int_type!` instantiations mirroring the AVX2 polyfills + lowercase aliases. Consumers on wasm32 / riscv / thumb reach the same names.
- `src/simd_avx512.rs`: re-exports the new types from `simd_avx2` so the v4 dispatch arm can surface them without forking the macro. Native `__m256i` upgrade also TD-SIMD-3.
- `src/simd_nightly/u_word_types.rs`: `U16x16` wrapper backed by `core::simd::u16x16`.
- `src/simd_nightly/i_word_types.rs`: `I32x8`, `I64x4` wrappers backed by `core::simd::{i32x8, i64x4}`.
- `src/simd_nightly/mod.rs`: re-exports + lowercase aliases.
- `src/simd.rs`: all 5 dispatch arms (nightly, v4, v3, aarch64, scalar fallback) updated.
- `.claude/knowledge/simd-dispatch-architecture.md`: parity matrix updated.

Verification
- `cargo check -p ndarray --features approx,serde,rayon`: clean (v3 default, 44 s cold).
- `CARGO_TARGET_X86_64_UNKNOWN_LINUX_GNU_RUSTFLAGS="-D warnings -Ctarget-cpu=x86-64-v4" cargo check --target=x86_64-unknown-linux-gnu -p ndarray --features approx,serde,rayon`: clean (v4 path, 17 s cached). Same RUSTFLAGS env / `--target` pattern as the `tier4-avx512-check` CI job, so build scripts don't SIGILL on non-AVX-512 runners.
- `cargo fmt --all --check`: clean.

Out of scope
Native `__m256i` intrinsic upgrades for these wrappers (currently all scalar-storage polyfills on x86) are tracked as TD-SIMD-3, the same task that already covers the existing 512-bit AVX2 polyfills.

Generated by Claude Code