
feat(simd): missing-lanes sweep — U16x16/U32x8/U64x4/I32x8/I64x4 #179

Merged: AdaWorldAPI merged 1 commit into `master` from `claude/pr-x-missing-lanes` on May 20, 2026.


@AdaWorldAPI (Owner)

Summary

Implements the five missing 256-bit int lane types surfaced by PR #178's matrix audit:

| Type | Was | Now |
|---|---|---|
| `U16x16` | Missing on every backend (incl. nightly) | 🟠 v3, v4 (via re-export), scalar; 🔵 nightly |
| `U32x8` | Nightly-only | 🟠 v3, v4 (via re-export), scalar; 🔵 nightly |
| `U64x4` | Nightly-only | 🟠 v3, v4 (via re-export), scalar; 🔵 nightly |
| `I32x8` | Missing on every backend | 🟠 v3, v4 (via re-export), scalar; 🔵 nightly |
| `I64x4` | Missing on every backend | 🟠 v3, v4 (via re-export), scalar; 🔵 nightly |

🟠 = scalar-storage polyfill; 🔵 = nightly, backed by `core::simd`.

Consumers can now use `crate::simd::{U16x16, U32x8, U64x4, I32x8, I64x4}` on any backend without per-backend `#[cfg]` gating.

Files touched

  • `src/simd_avx2.rs`: 5× `avx2_int_type!` instantiations (scalar-storage polyfills, align 64) plus lowercase aliases. The native AVX2 `__m256i` upgrade is tracked as TD-SIMD-3.
  • `src/simd_scalar.rs`: 5× `impl_int_type!` instantiations mirroring the AVX2 polyfills, plus lowercase aliases, so consumers on wasm32 / riscv / thumb reach the same names.
  • `src/simd_avx512.rs`: re-exports the new types from `simd_avx2` so the v4 dispatch arm can surface them without forking the macro. The native `__m256i` upgrade is also TD-SIMD-3.
  • `src/simd_nightly/u_word_types.rs`: `U16x16` wrapper backed by `core::simd::u16x16`.
  • `src/simd_nightly/i_word_types.rs`: `I32x8` and `I64x4` wrappers backed by `core::simd::{i32x8, i64x4}`.
  • `src/simd_nightly/mod.rs`: re-exports plus lowercase aliases.
  • `src/simd.rs`: all 5 dispatch arms (nightly, v4, v3, aarch64, scalar fallback) updated.
  • `.claude/knowledge/simd-dispatch-architecture.md`: parity matrix updated.

Verification

  • `cargo check -p ndarray --features approx,serde,rayon`: clean (v3 default, 44 s cold).
  • `CARGO_TARGET_X86_64_UNKNOWN_LINUX_GNU_RUSTFLAGS="-D warnings -Ctarget-cpu=x86-64-v4" cargo check --target=x86_64-unknown-linux-gnu -p ndarray --features approx,serde,rayon`: clean (v4 path, 17 s cached). This is the same RUSTFLAGS-env / `--target` pattern as the tier4-avx512-check CI job: with an explicit `--target`, Cargo keeps these flags off host artifacts such as build scripts, so they don't SIGILL on non-AVX-512 runners.
  • `cargo fmt --all --check`: clean.

Out of scope

Native `__m256i` intrinsic upgrades for these wrappers (currently all scalar-storage polyfills on x86) are tracked as TD-SIMD-3, the same task that already covers the existing 512-bit AVX2 polyfills.
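To make the scope of TD-SIMD-3 concrete, here is a minimal sketch of one lane-wise add done both ways: a scalar loop over `[i32; 8]` storage (roughly what a polyfill does today) and a native path over `__m256i`. The `I32x8` struct and its `add` method are hypothetical stand-ins, not the output of `avx2_int_type!`; only the `std::arch::x86_64` intrinsics are real, and wrapping overflow is an assumption about the polyfill semantics.

```rust
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::{__m256i, _mm256_add_epi32, _mm256_loadu_si256, _mm256_storeu_si256};

/// Hypothetical stand-in for a scalar-storage polyfill: eight i32 lanes in a plain array.
#[derive(Clone, Copy, Debug, PartialEq)]
#[repr(C, align(64))]
pub struct I32x8(pub [i32; 8]);

impl I32x8 {
    /// Lane-wise wrapping add. Takes the AVX2 path when the running CPU supports it,
    /// otherwise the scalar loop (assumed to match today's polyfill behaviour).
    pub fn add(self, rhs: Self) -> Self {
        #[cfg(target_arch = "x86_64")]
        {
            if is_x86_feature_detected!("avx2") {
                // SAFETY: AVX2 support was confirmed at runtime just above.
                return unsafe { Self::add_avx2(self, rhs) };
            }
        }
        let mut out = [0i32; 8];
        for i in 0..8 {
            out[i] = self.0[i].wrapping_add(rhs.0[i]);
        }
        I32x8(out)
    }

    #[cfg(target_arch = "x86_64")]
    #[target_feature(enable = "avx2")]
    unsafe fn add_avx2(a: Self, b: Self) -> Self {
        let mut out = [0i32; 8];
        unsafe {
            // Unaligned loads/stores keep the sketch independent of the struct's alignment.
            let va = _mm256_loadu_si256(a.0.as_ptr() as *const __m256i);
            let vb = _mm256_loadu_si256(b.0.as_ptr() as *const __m256i);
            // vpaddd wraps on overflow, matching wrapping_add in the scalar path.
            _mm256_storeu_si256(out.as_mut_ptr() as *mut __m256i, _mm256_add_epi32(va, vb));
        }
        I32x8(out)
    }
}

fn main() {
    let a = I32x8([1, 2, 3, 4, 5, 6, 7, i32::MAX]);
    let b = I32x8([10; 8]);
    println!("{:?}", a.add(b));
}
```

The runtime `is_x86_feature_detected!` check only keeps the sketch runnable on any machine; if the v3 arm is selected at compile time, as the per-arm dispatch in `simd.rs` suggests, that check could be dropped in favour of a `target_feature = "avx2"` cfg.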


Generated by Claude Code

…ss all backends

PR #178's matrix audit surfaced five 256-bit int lane types that were
either entirely missing or stranded in `simd_nightly` only. Adds them
across every backend so `crate::simd::{U16x16, U32x8, U64x4, I32x8,
I64x4}` resolves uniformly on v3 / v4 / native / nightly / scalar /
aarch64 paths.

`src/simd_avx2.rs`
  + 5× `avx2_int_type!` instantiations producing scalar-storage
    `[$elem; $lanes]` polyfills (align 64). Same macro pattern as the
    existing 512-bit polyfills (U8x64, U16x32, …). Native AVX2 `__m256i`
    upgrades are TD-SIMD-3.
  + 5× lowercase aliases (`u16x16 = U16x16`, etc.) matching the
    std::simd convention used by every other lane type in the file.
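A minimal sketch of what one scalar-storage polyfill instantiation plus its lowercase alias can look like. The `int_polyfill!` macro and its method set are hypothetical: they mirror a subset of the API listed for the nightly wrappers (splat, from_slice, to_array, reduce_sum, Default) and keep the `[$elem; $lanes]` storage with align 64 described above, but the real `avx2_int_type!` body is not reproduced.

```rust
/// Hypothetical macro: one `[$elem; $lanes]`-backed lane type per invocation.
macro_rules! int_polyfill {
    ($name:ident, $alias:ident, $elem:ty, $lanes:expr) => {
        #[derive(Clone, Copy, Debug, PartialEq, Eq)]
        #[repr(C, align(64))]
        pub struct $name(pub [$elem; $lanes]);

        impl $name {
            pub const LANES: usize = $lanes;

            /// Broadcast one value to every lane.
            pub fn splat(v: $elem) -> Self {
                Self([v; $lanes])
            }

            /// Load the first `LANES` elements; panics if `s` is shorter.
            pub fn from_slice(s: &[$elem]) -> Self {
                let mut a = [<$elem>::default(); $lanes];
                a.copy_from_slice(&s[..$lanes]);
                Self(a)
            }

            pub fn to_array(self) -> [$elem; $lanes] {
                self.0
            }

            /// Wrapping horizontal sum across all lanes.
            pub fn reduce_sum(self) -> $elem {
                self.0.iter().fold(0, |acc: $elem, &x| acc.wrapping_add(x))
            }
        }

        impl Default for $name {
            fn default() -> Self {
                Self::splat(0)
            }
        }

        // Lowercase alias in the std::simd naming style.
        #[allow(non_camel_case_types)]
        pub type $alias = $name;
    };
}

int_polyfill!(U16x16, u16x16, u16, 16);
int_polyfill!(U32x8, u32x8, u32, 8);
int_polyfill!(U64x4, u64x4, u64, 4);
int_polyfill!(I32x8, i32x8, i32, 8);
int_polyfill!(I64x4, i64x4, i64, 4);

fn main() {
    let v = u16x16::splat(3);
    println!("{}", v.reduce_sum());
}
```

Because the alias is a plain `type` alias, `u16x16::splat(3)` and `U16x16::splat(3)` name the same function and produce the same type.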

`src/simd_scalar.rs`
  + 5× `impl_int_type!` instantiations mirroring the AVX2 polyfills
    above. Consumers on non-x86/non-aarch64 (wasm32, riscv, thumb)
    reach the same type names through `crate::simd::*`.
  + Lowercase aliases.

`src/simd_avx512.rs`
  + Re-export of the new types from `simd_avx2` so the v4 dispatch
    arm in `simd.rs` can surface them without forking the macro into
    this file. Both files are already gated on `target_arch = "x86_64"`,
    so the re-export is cheap. Native `__m256i` upgrades here are
    TD-SIMD-3 (same story as the v3 polyfills).
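The re-export matters because `pub use` does not create a new type: the v4-path name and the v3 name are one and the same, so values move between code written against either module with no conversion. A toy sketch; only the module names come from this PR, while the struct body and `takes_v3` are hypothetical.

```rust
use std::any::TypeId;

mod simd_avx2 {
    /// Stand-in for a polyfill type defined once, in the v3 module.
    #[derive(Clone, Copy, Debug, PartialEq)]
    pub struct U32x8(pub [u32; 8]);
}

mod simd_avx512 {
    // Re-export instead of re-instantiating the macro: same type, not a copy of it.
    pub use super::simd_avx2::U32x8;
}

/// Accepts the v3 type; a value obtained through the v4 name passes straight in.
fn takes_v3(v: simd_avx2::U32x8) -> u32 {
    v.0.iter().sum()
}

fn main() {
    let v = simd_avx512::U32x8([1; 8]);
    println!("{}", takes_v3(v));
    println!("{}", TypeId::of::<simd_avx512::U32x8>() == TypeId::of::<simd_avx2::U32x8>());
}
```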

`src/simd_nightly/u_word_types.rs`
  + `U16x16` wrapper backed by `core::simd::u16x16`. Same API surface
    as the existing 32-/16-/8-lane wrappers — splat, from_slice,
    from_array, to_array, copy_to_slice, reduce_{sum,min,max},
    simd_min/max, cmpeq_mask, cmpgt_mask, Default.
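The commit message lists `cmpeq_mask` and `cmpgt_mask` without fixing their return type. Assuming a bitmask convention (bit *i* set when lane *i* satisfies the predicate), a portable scalar reference that a polyfill or a parity test could be checked against might look like the following. It works on plain `[u16; 16]` arrays rather than `core::simd::u16x16` so it builds on stable; all three free functions are hypothetical.

```rust
/// Lane count for the `u16x16` shape.
const LANES: usize = 16;

/// Hypothetical scalar reference for `cmpeq_mask`: bit i set iff a[i] == b[i].
fn cmpeq_mask(a: [u16; LANES], b: [u16; LANES]) -> u16 {
    let mut m = 0u16;
    for i in 0..LANES {
        if a[i] == b[i] {
            m |= 1u16 << i;
        }
    }
    m
}

/// Hypothetical scalar reference for `cmpgt_mask`: bit i set iff a[i] > b[i],
/// compared as unsigned values because the lanes are u16.
fn cmpgt_mask(a: [u16; LANES], b: [u16; LANES]) -> u16 {
    let mut m = 0u16;
    for i in 0..LANES {
        if a[i] > b[i] {
            m |= 1u16 << i;
        }
    }
    m
}

/// Lane-wise minimum, i.e. the scalar meaning of `simd_min`.
fn simd_min(a: [u16; LANES], b: [u16; LANES]) -> [u16; LANES] {
    std::array::from_fn(|i| a[i].min(b[i]))
}

fn main() {
    let a: [u16; LANES] = std::array::from_fn(|i| i as u16);
    let b = [8u16; LANES];
    println!("{:#06x} {:#06x} {:?}", cmpeq_mask(a, b), cmpgt_mask(a, b), simd_min(a, b));
}
```

For `u16` lanes the comparison has to be unsigned. AVX2's `_mm256_cmpgt_epi16` is a signed compare, so a future native path would need an adjustment (for example flipping the sign bit of both operands first), which is exactly the kind of edge case a scalar reference catches.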

`src/simd_nightly/i_word_types.rs`
  + `I32x8` and `I64x4` wrappers backed by `core::simd::{i32x8, i64x4}`.
    Same API surface as siblings; PartialEq via array compare.
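"PartialEq via array compare" means `==` answers a single question for the whole vector (are all lanes equal?) instead of producing a per-lane mask the way `cmpeq_mask` does. A stable sketch, assuming a hypothetical array-backed `I64x4` as a stand-in for the `core::simd::i64x4`-backed wrapper; `reduce_min` is included to show that signed lanes reduce with signed ordering.

```rust
/// Hypothetical array-backed stand-in for the `core::simd::i64x4`-backed wrapper.
#[derive(Clone, Copy, Debug)]
pub struct I64x4([i64; 4]);

impl I64x4 {
    pub fn from_array(a: [i64; 4]) -> Self {
        Self(a)
    }

    pub fn to_array(self) -> [i64; 4] {
        self.0
    }

    /// Signed minimum across all lanes.
    pub fn reduce_min(self) -> i64 {
        self.0.iter().copied().fold(i64::MAX, Ord::min)
    }
}

/// Whole-vector equality: one bool, true only when every lane matches.
impl PartialEq for I64x4 {
    fn eq(&self, other: &Self) -> bool {
        self.to_array() == other.to_array()
    }
}

fn main() {
    let v = I64x4::from_array([3, -7, 5, 0]);
    println!("{} {}", v == I64x4::from_array([3, -7, 5, 0]), v.reduce_min());
}
```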

`src/simd_nightly/mod.rs`
  + Re-exports for the three new types + lowercase aliases.

`src/simd.rs`
  + All 5 dispatch arms (nightly, v4, v3, aarch64, scalar fallback)
    updated to surface the new types through `crate::simd::*`.
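A stripped-down sketch of the dispatch idea: every backend module exports the same names, and `simd.rs` re-exports exactly one module per build via `#[cfg]`, so consumer code only ever spells `simd::U16x16`. The module bodies, the `BACKEND` constant, and the two-arm cfg split are hypothetical simplifications of the five arms described above.

```rust
mod simd_avx2 {
    #[derive(Clone, Copy, Debug, PartialEq, Default)]
    pub struct U16x16(pub [u16; 16]);
    pub const BACKEND: &str = "v3 polyfill";
}

mod simd_scalar {
    #[derive(Clone, Copy, Debug, PartialEq, Default)]
    pub struct U16x16(pub [u16; 16]);
    pub const BACKEND: &str = "scalar";
}

pub mod simd {
    // x86_64 with AVX2 enabled at compile time: surface the AVX2-module types.
    #[cfg(all(target_arch = "x86_64", target_feature = "avx2"))]
    pub use super::simd_avx2::{U16x16, BACKEND};

    // Every other target falls back to the portable scalar module.
    #[cfg(not(all(target_arch = "x86_64", target_feature = "avx2")))]
    pub use super::simd_scalar::{U16x16, BACKEND};
}

fn main() {
    // Consumer code names one path; the cfg above decides what it resolves to.
    let v = simd::U16x16::default();
    println!("{} {}", simd::BACKEND, v.0[0]);
}
```

Because the two `pub use` predicates are exact complements, each target gets exactly one definition of each name, and a backend module that lacks a type fails at its `use` line rather than somewhere in downstream code.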

`.claude/knowledge/simd-dispatch-architecture.md`
  + Parity matrix updated — the five rows previously marked ❌ across
    most backends now show 🟠 polyfill (v3, v4-via-v3, scalar) /
    🔵 (nightly via `core::simd`).

Verified: `cargo check` clean under default v3 features and under
`-Ctarget-cpu=x86-64-v4` (via `CARGO_TARGET_X86_64_UNKNOWN_LINUX_GNU_RUSTFLAGS`
+ explicit `--target` so build scripts don't SIGILL on non-AVX-512
runners — same pattern as the tier4-avx512-check job).
AdaWorldAPI force-pushed the `claude/pr-x-missing-lanes` branch from f6a12ca to 2ef97c0 on May 20, 2026 16:57.

chatgpt-codex-connector (bot) left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: f6a12cae0b


Comment on `src/simd.rs`, lines +223 to +225:

```rust
f32x16, f32x8, f64x4, f64x8, i16x16, i16x32, i32x16, i32x8, i64x4, i64x8, i8x32, i8x64, u16x16, u16x32, u32x16,
u32x8, u64x4, u64x8, u8x32, u8x64, BF16x16, BF16x8, F16x16, F32Mask16, F32Mask8, F32x16, F32x8, F64Mask4, F64Mask8,
F64x4, F64x8, I16x16, I16x32, I32x16, I32x8, I64x4, I64x8, I8x32, I8x64, U16x16, U16x32, U32x16, U32x8, U64x4,
```

P2: Add nightly arithmetic/bitwise ops for newly exported int lanes

This change publicly exposes `U16x16`, `I32x8`, and `I64x4` through `crate::simd::*` when `nightly-simd` is enabled, but `src/simd_nightly/ops.rs` still only implements `impl_int_ops!`/`impl_int_neg!` for the older set (`U16x32`, `U32x16`, `U32x8`, `U64x8`, `U64x4`, `I16x16`, `I16x32`, `I32x16`, `I64x8`). As a result, code that uses operators like `+`, `-`, `&`, `|`, `^` (and unary `-` for signed types) on these new lanes will fail to compile only under the nightly backend, creating backend-specific API breakage.

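A sketch of what extending the operator coverage could look like. `impl_int_ops!` and `impl_int_neg!` are the macro names the review cites, but the bodies below are hypothetical: they operate on plain arrays so the sketch builds on stable, and the real macros in `src/simd_nightly/ops.rs` are not reproduced. Wrapping overflow is likewise an assumption. The part that carries over is the invocation lists: the new unsigned lane goes in the ops list only, and the two new signed lanes go in both.

```rust
use std::ops::{Add, BitAnd, BitOr, BitXor, Neg, Sub};

// Array-backed stand-ins for the three lanes the review flags.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct U16x16(pub [u16; 16]);
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct I32x8(pub [i32; 8]);
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct I64x4(pub [i64; 4]);

/// Hypothetical body: each binary operator applies `$lane` to every lane pair.
macro_rules! impl_int_ops {
    (@op $t:ident, $tr:ident, $m:ident, $lane:ident) => {
        impl $tr for $t {
            type Output = Self;
            fn $m(self, rhs: Self) -> Self {
                let mut out = self.0;
                for (o, r) in out.iter_mut().zip(rhs.0) {
                    *o = (*o).$lane(r);
                }
                Self(out)
            }
        }
    };
    ($($t:ident),* $(,)?) => {$(
        impl_int_ops!(@op $t, Add, add, wrapping_add);
        impl_int_ops!(@op $t, Sub, sub, wrapping_sub);
        impl_int_ops!(@op $t, BitAnd, bitand, bitand);
        impl_int_ops!(@op $t, BitOr, bitor, bitor);
        impl_int_ops!(@op $t, BitXor, bitxor, bitxor);
    )*};
}

/// Hypothetical body: unary minus for signed lanes only (wrapping, so MIN stays MIN).
macro_rules! impl_int_neg {
    ($($t:ident),* $(,)?) => {$(
        impl Neg for $t {
            type Output = Self;
            fn neg(self) -> Self {
                let mut out = self.0;
                for o in out.iter_mut() {
                    *o = o.wrapping_neg();
                }
                Self(out)
            }
        }
    )*};
}

impl_int_ops!(U16x16, I32x8, I64x4);
impl_int_neg!(I32x8, I64x4);

fn main() {
    let a = I32x8([1, 2, 3, 4, 5, 6, 7, i32::MAX]);
    println!("{:?} {:?}", (a + I32x8([1; 8])).0, (-a).0);
}
```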

AdaWorldAPI merged commit 4556d81 into `master` on May 20, 2026.
17 checks passed