feat(simd): missing-lanes sweep — U16x16/U32x8/U64x4/I32x8/I64x4 by AdaWorldAPI · Pull Request #179 · AdaWorldAPI/ndarray

AdaWorldAPI · 2026-05-20T16:55:44Z

Summary

Implements the five missing 256-bit int lane types surfaced by PR #178's matrix audit:

| Type | Was | Now |
|---|---|---|
| `U16x16` | Missing on every backend (incl. nightly) | 🟠 v3, v4 via re-export, scalar, 🔵 nightly |
| `U32x8` | Nightly-only | 🟠 v3, v4 via re-export, scalar, 🔵 nightly |
| `U64x4` | Nightly-only | 🟠 v3, v4 via re-export, scalar, 🔵 nightly |
| `I32x8` | Missing on every backend | 🟠 v3, v4 via re-export, scalar, 🔵 nightly |
| `I64x4` | Missing on every backend | 🟠 v3, v4 via re-export, scalar, 🔵 nightly |

Consumers can now use `crate::simd::{U16x16, U32x8, U64x4, I32x8, I64x4}` on any backend without an `#[cfg]` dance.
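The "no `#[cfg]` dance" point can be illustrated with a minimal sketch. The two `backend_*` modules and the `lane_sum` helper below are hypothetical stand-ins, not the crate's actual files; the crate root plays the role of `crate::simd`, and only one of the two `pub use` lines is compiled for a given target.

```rust
// Hypothetical stand-ins for two backend modules. The real crate keeps these
// in separate files (simd_avx2.rs, simd_scalar.rs, ...), which are not shown here.
#[allow(dead_code)]
mod backend_x86 {
    /// Eight u32 lanes held in a plain array (scalar-storage polyfill).
    #[derive(Clone, Copy, Debug, PartialEq)]
    pub struct U32x8(pub [u32; 8]);

    impl U32x8 {
        /// Broadcast one value to all eight lanes.
        pub fn splat(v: u32) -> Self {
            Self([v; 8])
        }
    }
}

#[allow(dead_code)]
mod backend_scalar {
    /// Same name and shape as the x86 type, for targets without x86 SIMD.
    #[derive(Clone, Copy, Debug, PartialEq)]
    pub struct U32x8(pub [u32; 8]);

    impl U32x8 {
        /// Broadcast one value to all eight lanes.
        pub fn splat(v: u32) -> Self {
            Self([v; 8])
        }
    }
}

// Dispatch: exactly one of these re-exports survives cfg evaluation,
// so the name `U32x8` always resolves to some backend's type.
#[cfg(target_arch = "x86_64")]
pub use backend_x86::U32x8;
#[cfg(not(target_arch = "x86_64"))]
pub use backend_scalar::U32x8;

/// Consumer code names the type once and carries no cfg attributes itself.
pub fn lane_sum(v: U32x8) -> u32 {
    v.0.iter().copied().fold(0u32, u32::wrapping_add)
}
```

Because every backend exports the same name with the same methods, `lane_sum(U32x8::splat(2))` compiles unchanged on any target.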

Files touched

src/simd_avx2.rs — 5× avx2_int_type! instantiations (scalar-storage polyfills, align 64) + lowercase aliases. Native AVX2 __m256i upgrade tracked as TD-SIMD-3.
src/simd_scalar.rs — 5× impl_int_type! instantiations mirroring the AVX2 polyfills + lowercase aliases. Consumers on wasm32 / riscv / thumb reach the same names.
src/simd_avx512.rs — re-exports the new types from simd_avx2 so the v4 dispatch arm can surface them without forking the macro. Native __m256i upgrade also TD-SIMD-3.
src/simd_nightly/u_word_types.rs — U16x16 wrapper backed by core::simd::u16x16.
src/simd_nightly/i_word_types.rs — I32x8, I64x4 wrappers backed by core::simd::{i32x8, i64x4}.
src/simd_nightly/mod.rs — re-exports + lowercase aliases.
src/simd.rs — all 5 dispatch arms (nightly, v4, v3, aarch64, scalar fallback) updated.
.claude/knowledge/simd-dispatch-architecture.md — parity matrix updated.
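As a rough illustration of what a scalar-storage polyfill macro might look like, here is a minimal sketch. The macro name `polyfill_int_type!`, its argument order, and the wrapping behaviour of `reduce_sum` are assumptions made for this example; the actual `avx2_int_type!` and `impl_int_type!` definitions are not reproduced in this PR text.

```rust
/// Hypothetical macro in the spirit of `avx2_int_type!` / `impl_int_type!`.
/// It generates an array-backed lane type plus a lowercase alias.
macro_rules! polyfill_int_type {
    ($name:ident, $alias:ident, $elem:ty, $lanes:expr) => {
        /// Scalar-storage lane type: a plain array with 64-byte alignment.
        #[derive(Clone, Copy, Debug, PartialEq)]
        #[repr(C, align(64))]
        pub struct $name(pub [$elem; $lanes]);

        impl $name {
            /// Number of lanes in this type.
            pub const LANES: usize = $lanes;

            /// Broadcast one value to every lane.
            pub fn splat(v: $elem) -> Self {
                Self([v; $lanes])
            }

            /// Load the first `LANES` elements; panics if the slice is shorter.
            pub fn from_slice(s: &[$elem]) -> Self {
                let mut out = [0 as $elem; $lanes];
                out.copy_from_slice(&s[..$lanes]);
                Self(out)
            }

            /// Return the lanes as an array.
            pub fn to_array(self) -> [$elem; $lanes] {
                self.0
            }

            /// Horizontal sum across lanes (wrapping on overflow: an assumption).
            pub fn reduce_sum(self) -> $elem {
                self.0.iter().fold(0 as $elem, |acc, &x| acc.wrapping_add(x))
            }
        }

        /// Lowercase alias following the std::simd naming convention.
        #[allow(non_camel_case_types)]
        pub type $alias = $name;
    };
}

// One instantiation per lane type added in this PR.
polyfill_int_type!(U16x16, u16x16, u16, 16);
polyfill_int_type!(U32x8, u32x8, u32, 8);
polyfill_int_type!(U64x4, u64x4, u64, 4);
polyfill_int_type!(I32x8, i32x8, i32, 8);
polyfill_int_type!(I64x4, i64x4, i64, 4);
```

Generating all five types from one macro keeps the x86 and scalar backends textually parallel, which is presumably why the v4 arm can re-export the v3 types instead of instantiating the macro a second time.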

Verification

cargo check -p ndarray --features approx,serde,rayon — clean (v3 default, 44 s cold).
CARGO_TARGET_X86_64_UNKNOWN_LINUX_GNU_RUSTFLAGS="-D warnings -Ctarget-cpu=x86-64-v4" cargo check --target=x86_64-unknown-linux-gnu -p ndarray --features approx,serde,rayon — clean (v4 path, 17 s cached). Same RUSTFLAGS env / --target pattern as the tier4-avx512-check CI job, so build scripts don't SIGILL on non-AVX-512 runners.
cargo fmt --all --check — clean.

Out of scope

Native __m256i intrinsic upgrades for these wrappers (currently all scalar-storage polyfills on x86) are tracked as TD-SIMD-3 — same task that already covers the existing 512-bit AVX2 polyfills.

Generated by Claude Code

…ss all backends

PR #178's matrix audit surfaced five 256-bit int lane types that were either entirely missing or stranded in `simd_nightly` only. Adds them across every backend so `crate::simd::{U16x16, U32x8, U64x4, I32x8, I64x4}` resolves uniformly on v3 / v4 / native / nightly / scalar / aarch64 paths.

`src/simd_avx2.rs`
+ 5× `avx2_int_type!` instantiations producing scalar-storage `[$elem; $lanes]` polyfills (align 64). Same macro pattern as the existing 512-bit polyfills (U8x64, U16x32, …). Native AVX2 `__m256i` upgrades are TD-SIMD-3.
+ 5× lowercase aliases (`u16x16 = U16x16`, etc.) matching the std::simd convention used by every other lane type in the file.

`src/simd_scalar.rs`
+ 5× `impl_int_type!` instantiations mirroring the AVX2 polyfills above. Consumers on non-x86/non-aarch64 (wasm32, riscv, thumb) reach the same type names through `crate::simd::*`.
+ Lowercase aliases.

`src/simd_avx512.rs`
+ Re-export of the new types from `simd_avx2` so the v4 dispatch arm in `simd.rs` can surface them without forking the macro into this file. Both files are already gated on `target_arch = "x86_64"`, so the re-export is cheap. Native `__m256i` upgrades here are TD-SIMD-3 (same story as the v3 polyfills).

`src/simd_nightly/u_word_types.rs`
+ `U16x16` wrapper backed by `core::simd::u16x16`. Same API surface as the existing 32-/16-/8-lane wrappers: splat, from_slice, from_array, to_array, copy_to_slice, reduce_{sum,min,max}, simd_min/max, cmpeq_mask, cmpgt_mask, Default.

`src/simd_nightly/i_word_types.rs`
+ `I32x8` and `I64x4` wrappers backed by `core::simd::{i32x8, i64x4}`. Same API surface as siblings; PartialEq via array compare.

`src/simd_nightly/mod.rs`
+ Re-exports for the three new types + lowercase aliases.

`src/simd.rs`
+ All 5 dispatch arms (nightly, v4, v3, aarch64, scalar fallback) updated to surface the new types through `crate::simd::*`.

`.claude/knowledge/simd-dispatch-architecture.md`
+ Parity matrix updated: the five rows previously marked ❌ across most backends now show 🟠 polyfill (v3, v4-via-v3, scalar) / 🔵 (nightly via `core::simd`).

Verified: `cargo check` clean under default v3 features and under `-Ctarget-cpu=x86-64-v4` (via `CARGO_TARGET_X86_64_UNKNOWN_LINUX_GNU_RUSTFLAGS` + explicit `--target` so build scripts don't SIGILL on non-AVX-512 runners, the same pattern as the tier4-avx512-check job).

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: f6a12cae0b

chatgpt-codex-connector · 2026-05-20T16:59:19Z

+    f32x16, f32x8, f64x4, f64x8, i16x16, i16x32, i32x16, i32x8, i64x4, i64x8, i8x32, i8x64, u16x16, u16x32, u32x16,
+    u32x8, u64x4, u64x8, u8x32, u8x64, BF16x16, BF16x8, F16x16, F32Mask16, F32Mask8, F32x16, F32x8, F64Mask4, F64Mask8,
+    F64x4, F64x8, I16x16, I16x32, I32x16, I32x8, I64x4, I64x8, I8x32, I8x64, U16x16, U16x32, U32x16, U32x8, U64x4,


Add nightly arithmetic/bitwise ops for newly exported int lanes

This change publicly exposes U16x16, I32x8, and I64x4 through crate::simd::* when nightly-simd is enabled, but src/simd_nightly/ops.rs still only implements impl_int_ops!/impl_int_neg! for the older set (U16x32, U32x16, U32x8, U64x8, U64x4, I16x16, I16x32, I32x16, I64x8). As a result, code that uses operators like +, -, &, |, ^ (and unary - for signed types) on these new lanes will fail to compile only under the nightly backend, creating backend-specific API breakage.
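To make the gap concrete, here is a minimal sketch of the kind of operator impls a new lane type needs before `+`, `-`, `&`, `|`, `^` and unary `-` compile. The array-backed `I32x8` and the `impl_lanewise_binop!` macro are hypothetical stand-ins written for stable Rust; the real wrappers are backed by `core::simd` and go through `impl_int_ops!` / `impl_int_neg!` in `src/simd_nightly/ops.rs`, whose definitions are not shown here. Wrapping arithmetic is used because `core::simd` integer operators wrap on overflow.

```rust
use core::ops::{Add, BitAnd, BitOr, BitXor, Neg, Sub};

/// Array-backed stand-in for the nightly wrapper. The real `I32x8` wraps
/// `core::simd::i32x8`, which requires a nightly toolchain.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct I32x8(pub [i32; 8]);

/// Hypothetical macro: implements one binary operator trait lane by lane.
macro_rules! impl_lanewise_binop {
    ($ty:ident, $trait:ident, $method:ident, $f:expr) => {
        impl $trait for $ty {
            type Output = Self;
            fn $method(self, rhs: Self) -> Self {
                let mut out = self.0;
                // Combine each lane of `self` with the matching lane of `rhs`.
                for (o, r) in out.iter_mut().zip(rhs.0.iter()) {
                    *o = $f(*o, *r);
                }
                Self(out)
            }
        }
    };
}

impl_lanewise_binop!(I32x8, Add, add, i32::wrapping_add);
impl_lanewise_binop!(I32x8, Sub, sub, i32::wrapping_sub);
impl_lanewise_binop!(I32x8, BitAnd, bitand, |a: i32, b: i32| a & b);
impl_lanewise_binop!(I32x8, BitOr, bitor, |a: i32, b: i32| a | b);
impl_lanewise_binop!(I32x8, BitXor, bitxor, |a: i32, b: i32| a ^ b);

/// Unary minus only makes sense for the signed lane types.
impl Neg for I32x8 {
    type Output = Self;
    fn neg(self) -> Self {
        Self(self.0.map(i32::wrapping_neg))
    }
}
```

Without impls like these, an expression such as `a + b` on the new lanes is a compile error on whichever backend lacks them, which is the backend-specific breakage the review describes.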

AdaWorldAPI force-pushed the claude/pr-x-missing-lanes branch from f6a12ca to 2ef97c0 Compare May 20, 2026 16:57

AdaWorldAPI merged commit 4556d81 into master May 20, 2026
17 checks passed

AdaWorldAPI merged 1 commit into master from claude/pr-x-missing-lanes
