feat(simd): Phase 2 — wire simd_nightly into crate::simd::* dispatch + matrix fix #173
Phase 2 of the integration plan in
`.claude/knowledge/simd-dispatch-architecture.md`.
simd.rs
-------
Adds a top-priority `feature = "nightly-simd"` dispatch arm that
re-exports the full `simd_nightly::*` portable-SIMD type set through
`crate::simd::*`. No `target_arch` constraint — `core::simd` is portable,
so the same arm catches wasm32 / riscv / aarch64 / x86_64.
Tightens the predicate on every other type-re-export arm to
`not(feature = "nightly-simd")`:
* AVX-512 (avx512f)
* AVX-512BF16 (BF16x8/16 types)
* AVX2 baseline (the v3 default arm)
* U8x32 (cross-tier export)
* aarch64 NEON
* non-x86/non-aarch64 scalar fallback
* the inline `mod scalar` declaration itself
Result: when `cargo +nightly --features nightly-simd ...` is used, every
`use crate::simd::F32x16` call site routes to the portable-SIMD
implementation — and miri can actually execute it (it treats `_mm*`
intrinsics as opaque, but `core::simd::*` runs fine).
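The gating described above can be sketched roughly as follows. This is a minimal illustration, not the contents of `simd.rs`: the `portable` and `intrinsic` modules, the `NAME` constant, and `active_backend` are hypothetical stand-ins, and both backends are collapsed into one module so the snippet compiles on its own.

```rust
// Hypothetical facade module standing in for `crate::simd`.
pub mod simd {
    // Stand-in for the portable `simd_nightly` backend (assumption: the real
    // one wraps `core::simd` types; here it is a plain array).
    #[allow(dead_code)]
    mod portable {
        pub const NAME: &str = "portable";

        #[derive(Clone, Copy, Debug, PartialEq)]
        pub struct F32x16(pub [f32; 16]);

        impl F32x16 {
            pub fn splat(v: f32) -> Self {
                Self([v; 16])
            }
        }
    }

    // Stand-in for the intrinsic backends (AVX-512, AVX2, NEON, scalar).
    #[allow(dead_code)]
    mod intrinsic {
        pub const NAME: &str = "intrinsic";

        #[derive(Clone, Copy, Debug, PartialEq)]
        pub struct F32x16(pub [f32; 16]);

        impl F32x16 {
            pub fn splat(v: f32) -> Self {
                Self([v; 16])
            }
        }
    }

    // Top-priority arm: gated on the feature only, no `target_arch` predicate.
    #[cfg(feature = "nightly-simd")]
    pub use self::portable::{F32x16, NAME};

    // Every other arm additionally requires the feature to be off, so the
    // two re-exports of `F32x16` can never collide.
    #[cfg(not(feature = "nightly-simd"))]
    pub use self::intrinsic::{F32x16, NAME};
}

/// Reports which stand-in backend the facade resolved to.
pub fn active_backend() -> &'static str {
    simd::NAME
}
```

Call sites keep writing `simd::F32x16`; only the cfg predicates decide which definition that name resolves to.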
BF16 conversion FUNCTIONS (bf16_to_f32_batch etc.) are NOT gated under
the nightly arm: they're scalar/intrinsic functions taking primitive
slices, not the SIMD types, and they coexist cleanly with the portable
backend.
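To illustrate why such a function is independent of the SIMD type dispatch, here is a scalar sketch of a slice-based bf16 widening. The name `bf16_to_f32_batch_sketch` and its signature are assumptions; the crate's actual `bf16_to_f32_batch` signature is not shown in this PR. The conversion itself relies only on bf16 being the upper 16 bits of an IEEE-754 `f32`.

```rust
/// Widens bfloat16 bit patterns into `f32` values.
/// Hypothetical signature: takes primitive slices, no SIMD types involved.
pub fn bf16_to_f32_batch_sketch(src: &[u16], dst: &mut [f32]) {
    assert_eq!(src.len(), dst.len(), "source and destination lengths differ");
    for (out, &bits) in dst.iter_mut().zip(src) {
        // A bf16 value is the high half of an f32; shift it back into place.
        *out = f32::from_bits((bits as u32) << 16);
    }
}
```

Because nothing in the signature mentions `F32x16` or any other vector type, the function compiles unchanged whichever backend `crate::simd::*` resolves to.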
architecture doc
----------------
Parity matrix updated to reflect what `src/simd_avx2.rs` actually ships.
Previous matrix marked U8x64 / I8x64 / I16x32 / I32x16 / I64x8 /
U16x32 / U32x16 / U64x8 as ❌ in the AVX2 column. On survey those types
exist via the `avx2_int_type!` macro — full API-parity structs with
`[$elem; $lanes]` scalar storage (align 64). New 🟠 marker introduced
to distinguish "struct exists with API, storage is scalar" from "true
two-half SIMD composite" (🟡). I8x32 / I16x16 also corrected: they
share the AVX-512 `__m256i` definition (re-exported through
`simd_avx2`'s `pub use crate::simd_avx512::{i16x16, i8x32, ...}`).
The remaining AVX2 vectorization gap (filling 🟠 → 🟡 with real
two-half SIMD ops) is tracked separately as TD-SIMD-3.
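For readers unfamiliar with the 🟠 category, the shape it describes might look like the sketch below. The macro name `scalar_int_type!` and the methods `splat` and `wrapping_add` are illustrative assumptions, not the body of `avx2_int_type!`; only the `[$elem; $lanes]` storage and the 64-byte alignment come from the description above.

```rust
// Hypothetical macro: a lane-typed struct with API surface but scalar storage.
macro_rules! scalar_int_type {
    ($name:ident, $elem:ty, $lanes:expr) => {
        #[derive(Clone, Copy, Debug, PartialEq)]
        #[repr(C, align(64))]
        pub struct $name(pub [$elem; $lanes]);

        impl $name {
            pub const LANES: usize = $lanes;

            /// Broadcasts one value to every lane.
            pub fn splat(v: $elem) -> Self {
                Self([v; $lanes])
            }

            /// Lane-wise wrapping add, done with a plain scalar loop
            /// rather than vector instructions.
            pub fn wrapping_add(self, rhs: Self) -> Self {
                let mut out = self.0;
                for i in 0..$lanes {
                    out[i] = out[i].wrapping_add(rhs.0[i]);
                }
                Self(out)
            }
        }
    };
}

scalar_int_type!(U8x64, u8, 64);
scalar_int_type!(I32x16, i32, 16);
```

A 🟡 type would expose the same methods but store two half-width vector registers and implement each operation with real SIMD instructions on both halves.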
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: f857a812d1
    pub use crate::simd_nightly::{
        BF16x16, BF16x8, F16x16, F32Mask16, F32Mask8, F32x16, F32x8, F64Mask4, F64Mask8, F64x4, F64x8, I16x16, I16x32,
        I32x16, I64x8, I8x32, I8x64, U16x32, U32x16, U32x8, U64x4, U64x8, U8x32, U8x64,
    };
Preserve lowercase SIMD aliases in nightly dispatch
When nightly-simd is enabled, this arm shadows all other crate::simd::* re-export branches, but it only exports F32x16-style names and drops lowercase aliases like f32x16, f64x8, and u8x64 that were previously available from crate::simd. Any downstream code (and current docs) importing ndarray::simd::f32x16 will fail to compile under cargo +nightly --features nightly-simd, so the feature now causes an API break instead of just swapping backend implementation.
Fixed in a7ca029. Added 17 lowercase aliases (f32x16, f64x8, u8x64, i32x16, i64x8, u32x16, u64x8, f32x8, f64x4, u8x32, u16x32, u32x8, u64x4, i8x64, i8x32, i16x32, i16x16) to src/simd_nightly/mod.rs matching the convention already in simd_avx2.rs, simd_avx512.rs, and src/simd_scalar.rs. Wired them into the nightly dispatch arm's pub use list so they surface through crate::simd::*. cargo +nightly --features nightly-simd is back to "backend swap, not API break."
Generated by Claude Code
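The alias fix can be pictured with the following sketch. It assumes the aliases are plain `pub type` items; the PR only states that `f32x16` = `F32x16`, so the exact mechanism, the `simd_sketch` and `nightly` module names, and the helper `sum_lanes` are all hypothetical.

```rust
// Hypothetical facade standing in for `crate::simd`.
pub mod simd_sketch {
    // Stand-in for `src/simd_nightly/mod.rs`.
    pub mod nightly {
        #[derive(Clone, Copy, Debug, PartialEq)]
        pub struct F32x16(pub [f32; 16]);

        impl F32x16 {
            pub fn splat(v: f32) -> Self {
                Self([v; 16])
            }
        }

        // Lowercase alias for the PascalCase type (assumed mechanism).
        #[allow(non_camel_case_types)]
        pub type f32x16 = F32x16;
    }

    // Both spellings must be listed, or the lowercase one silently
    // disappears from the facade when this arm is active.
    pub use self::nightly::{f32x16, F32x16};
}

/// Downstream-style code that imports the lowercase name.
pub fn sum_lanes(v: simd_sketch::f32x16) -> f32 {
    v.0.iter().sum()
}
```

With both names re-exported, code written against either spelling compiles the same way under every backend.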
…rustfmt

Codex P1 on PR #173: the nightly-simd dispatch arm shadowed every other re-export branch but dropped the lowercase aliases (`f32x16`, `f64x8`, `u8x64`, etc.) that downstream code and docs reach for. Switching on `--features nightly-simd` was therefore an API break instead of a backend swap.

`src/simd_nightly/mod.rs`
+ 17 lowercase aliases (`f32x16` = `F32x16` etc.) matching the convention already present in `simd_avx2.rs`, `simd_avx512.rs`, and `src/simd_scalar.rs`. Reaches every numeric type the other backends expose under their lowercase names.

`src/simd.rs`
+ Lowercase names added to the nightly dispatch arm's `pub use` list so they reach `crate::simd::*` alongside the PascalCase ones. Also runs rustfmt against the file: the multi-clause `#[cfg(all(...))]` attrs that Phase 2 introduced exceed the column budget, and rustfmt prefers the vertical block layout.
Summary
Phase 2 of the integration plan in `.claude/knowledge/simd-dispatch-architecture.md`.

simd.rs
* Adds a top-priority `feature = "nightly-simd"` dispatch arm that re-exports the full `simd_nightly::*` portable-SIMD type set through `crate::simd::*`. No `target_arch` constraint: `core::simd` is portable.
* Every other type-re-export arm is tightened to `not(feature = "nightly-simd")` so the portable backend cleanly overrides intrinsics when opted in (AVX-512, AVX-512BF16, AVX2 baseline, U8x32, aarch64 NEON, non-x86/non-aarch64 scalar fallback, inline `mod scalar`).
* BF16 conversion functions (`bf16_to_f32_batch` etc.) are NOT gated: they take primitive slices, not SIMD types, and coexist cleanly with the portable backend.

Result: `cargo +nightly --features nightly-simd test` and `miri` now route every `use crate::simd::F32x16` call site through `simd_nightly`, whose `core::simd::*` implementation miri can actually execute. The intrinsic backends remain opaque to miri.

Architecture doc
Parity matrix corrected. The previous version marked U8x64 / I8x64 / I16x32 / I32x16 / I64x8 / U16x32 / U32x16 / U64x8 as ❌ in the AVX2 column. On survey those types exist via `simd_avx2`'s `avx2_int_type!` macro: full API-parity structs with `[$elem; $lanes]` scalar storage (align 64). New 🟠 marker added to distinguish "struct exists with API, storage is scalar" from "true two-half SIMD composite" (🟡). I8x32 / I16x16 also corrected: they share the AVX-512 `__m256i` definitions re-exported through `simd_avx2`.

The remaining AVX2 vectorization gap (filling 🟠 → 🟡 with real two-half SIMD ops) is tracked as TD-SIMD-3.

Test plan
* `cargo build`: default path unchanged
* `cargo +nightly build --features nightly-simd`: portable backend reachable via `crate::simd::*`
* `tests/{1.95.0,stable,beta}` paths unchanged

Generated by Claude Code