feat(simd): Phase 2 — wire simd_nightly into crate::simd::* dispatch + matrix fix #173

Merged: AdaWorldAPI merged 2 commits into master from claude/pr-x-phase2-nightly-dispatch on May 20, 2026.
@AdaWorldAPI (Owner)

Summary

Phase 2 of the integration plan in .claude/knowledge/simd-dispatch-architecture.md.

simd.rs

  • Added a top-priority feature = "nightly-simd" dispatch arm that re-exports the full simd_nightly::* portable-SIMD type set through crate::simd::*. No target_arch constraint — core::simd is portable.
  • Tightened every other type-re-export arm to not(feature = "nightly-simd") so that, when opted in, the portable backend cleanly overrides every other backend (AVX-512, AVX-512BF16, AVX2 baseline, U8x32, aarch64 NEON, non-x86/non-aarch64 scalar fallback, inline mod scalar).
  • BF16 conversion functions (bf16_to_f32_batch etc.) are NOT gated — they take primitive slices, not SIMD types, and coexist cleanly with the portable backend.
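A minimal sketch of the gating pattern above, assuming three hypothetical stand-in backend modules (`simd_nightly`, `simd_avx2`, `simd_scalar`) that each export an `F32x16` and a `BACKEND` label. The bodies are scalar placeholders, not the crate's real implementations; only the `cfg` structure mirrors the PR: the nightly arm has no `target_arch` predicate, and every other arm carries `not(feature = "nightly-simd")`, so exactly one re-export is active for any configuration.

```rust
// Hypothetical stand-in backends: scalar placeholders, NOT the crate's real modules.
macro_rules! stand_in_backend {
    ($name:ident, $label:expr) => {
        #[allow(dead_code)]
        pub mod $name {
            pub const BACKEND: &str = $label;

            #[derive(Clone, Copy, Debug, PartialEq)]
            pub struct F32x16(pub [f32; 16]);

            impl F32x16 {
                pub fn splat(v: f32) -> Self {
                    F32x16([v; 16])
                }
                pub fn reduce_sum(self) -> f32 {
                    self.0.iter().sum()
                }
            }
        }
    };
}

stand_in_backend!(simd_nightly, "portable (core::simd)");
stand_in_backend!(simd_avx2, "x86_64 intrinsics");
stand_in_backend!(simd_scalar, "scalar fallback");

/// Dispatch facade: call sites only ever name `crate::simd::F32x16`.
pub mod simd {
    // Top priority, no target_arch constraint: core::simd is portable.
    #[cfg(feature = "nightly-simd")]
    pub use crate::simd_nightly::{F32x16, BACKEND};

    // Every other arm is tightened with not(feature = "nightly-simd").
    #[cfg(all(not(feature = "nightly-simd"), target_arch = "x86_64"))]
    pub use crate::simd_avx2::{F32x16, BACKEND};

    #[cfg(all(not(feature = "nightly-simd"), not(target_arch = "x86_64")))]
    pub use crate::simd_scalar::{F32x16, BACKEND};
}
```

Enabling the `nightly-simd` feature changes which module `simd::F32x16` and `simd::BACKEND` resolve to without touching any call site, which is the "backend swap, not API change" property the PR is after.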

Result: cargo +nightly --features nightly-simd test and miri now route every use crate::simd::F32x16 call site through simd_nightly, whose core::simd::* code miri can actually execute. The intrinsic backends remain opaque to miri.

Architecture doc

Parity matrix corrected. The previous version marked U8x64 / I8x64 / I16x32 / I32x16 / I64x8 / U16x32 / U32x16 / U64x8 as ❌ in the AVX2 column. On inspection, those types do exist via simd_avx2's avx2_int_type! macro: full API-parity structs with [$elem; $lanes] scalar storage (align 64). A new 🟠 marker distinguishes "struct exists with API, storage is scalar" from "true two-half SIMD composite" (🟡). I8x32 / I16x16 were also corrected: they share the AVX-512 __m256i definitions, re-exported through simd_avx2.
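To make the 🟠 marker concrete: such a type has the full API surface, but its lanes live in a plain `[$elem; $lanes]` array aligned to 64 bytes, so each operation is a per-lane scalar loop that the compiler may or may not auto-vectorize. The macro below, `scalar_int_type!`, is a hypothetical reduced stand-in (only `splat`, `from_array`, `to_array`, and a wrapping `+`), not the crate's actual `avx2_int_type!`.

```rust
use core::ops::Add;

// Hypothetical reduced stand-in for an avx2_int_type!-style macro:
// API-parity struct, scalar [$elem; $lanes] storage, 64-byte alignment.
macro_rules! scalar_int_type {
    ($name:ident, $elem:ty, $lanes:expr) => {
        #[derive(Clone, Copy, Debug, PartialEq, Eq)]
        #[repr(C, align(64))]
        pub struct $name([$elem; $lanes]);

        #[allow(dead_code)]
        impl $name {
            pub const LANES: usize = $lanes;

            pub fn splat(v: $elem) -> Self {
                Self([v; $lanes])
            }
            pub fn from_array(a: [$elem; $lanes]) -> Self {
                Self(a)
            }
            pub fn to_array(self) -> [$elem; $lanes] {
                self.0
            }
        }

        impl Add for $name {
            type Output = Self;
            // Per-lane scalar loop over the array storage: what 🟠 means.
            fn add(self, rhs: Self) -> Self {
                let mut out = self.0;
                for (o, r) in out.iter_mut().zip(rhs.0.iter()) {
                    *o = o.wrapping_add(*r);
                }
                Self(out)
            }
        }
    };
}

scalar_int_type!(U8x64, u8, 64);
scalar_int_type!(I32x16, i32, 16);
scalar_int_type!(I64x8, i64, 8);
```

Call sites get the same methods they would on a vectorized type; only the generated machine code differs, which is why the matrix needs a separate marker rather than ✅.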

The remaining AVX2 vectorization gap (filling 🟠 → 🟡 with real two-half SIMD ops) is tracked as TD-SIMD-3.
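For contrast with the 🟠 scalar-storage types, a 🟡 type wraps two 256-bit halves and implements each 512-bit operation as two half-width operations. In the sketch below, `Half256` is a hypothetical scalar stand-in for an AVX2 `__m256i`-backed `U8x32` (it calls no intrinsics, so it runs on any target); what it shows is the delegation structure that filling 🟠 → 🟡 would introduce, not the crate's code.

```rust
use core::ops::Add;

/// Hypothetical stand-in for a 256-bit register-backed half (e.g. U8x32).
/// A real 🟡 implementation would hold an __m256i and call AVX2 intrinsics.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Half256([u8; 32]);

impl Add for Half256 {
    type Output = Self;
    fn add(self, rhs: Self) -> Self {
        let mut out = self.0;
        for (o, r) in out.iter_mut().zip(rhs.0.iter()) {
            *o = o.wrapping_add(*r);
        }
        Half256(out)
    }
}

/// Two-half composite: 64 u8 lanes stored as (lo, hi) 32-lane halves.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct U8x64Composite {
    lo: Half256,
    hi: Half256,
}

impl U8x64Composite {
    pub fn from_array(a: [u8; 64]) -> Self {
        let mut lo = [0u8; 32];
        let mut hi = [0u8; 32];
        lo.copy_from_slice(&a[..32]);
        hi.copy_from_slice(&a[32..]);
        Self { lo: Half256(lo), hi: Half256(hi) }
    }

    pub fn to_array(self) -> [u8; 64] {
        let mut out = [0u8; 64];
        out[..32].copy_from_slice(&self.lo.0);
        out[32..].copy_from_slice(&self.hi.0);
        out
    }
}

impl Add for U8x64Composite {
    type Output = Self;
    // Each 512-bit op becomes two 256-bit ops, one per half.
    fn add(self, rhs: Self) -> Self {
        Self { lo: self.lo + rhs.lo, hi: self.hi + rhs.hi }
    }
}
```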

Test plan

  • cargo build — default path unchanged
  • cargo +nightly build --features nightly-simd — portable backend reachable via crate::simd::*
  • CI green on PR — all existing tests/{1.95.0,stable,beta} paths unchanged

Generated by Claude Code

Phase 2 of the integration plan in
`.claude/knowledge/simd-dispatch-architecture.md`.

simd.rs
-------

Adds a top-priority `feature = "nightly-simd"` dispatch arm that
re-exports the full `simd_nightly::*` portable-SIMD type set through
`crate::simd::*`. No `target_arch` constraint — `core::simd` is portable,
so the same arm catches wasm32 / riscv / aarch64 / x86_64.

Tightens the predicate on every other type-re-export arm to
`not(feature = "nightly-simd")`:
  * AVX-512 (avx512f)
  * AVX-512BF16 (BF16x8/16 types)
  * AVX2 baseline (the v3 default arm)
  * U8x32 (cross-tier export)
  * aarch64 NEON
  * non-x86/non-aarch64 scalar fallback
  * the inline `mod scalar` declaration itself

Result: when `cargo +nightly --features nightly-simd ...` is used, every
`use crate::simd::F32x16` call site routes to the portable-SIMD
implementation — and miri can actually execute it (it treats `_mm*`
intrinsics as opaque, but `core::simd::*` runs fine).

BF16 conversion FUNCTIONS (bf16_to_f32_batch etc.) are NOT gated under
the nightly arm: they're scalar/intrinsic functions taking primitive
slices, not the SIMD types, and they coexist cleanly with the portable
backend.
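Why these functions need no gate is visible in their shape: they operate on primitive slices, so no backend's SIMD type appears in the signature. A scalar reference sketch, assuming BF16 values are passed as raw `u16` bit patterns; the name mirrors `bf16_to_f32_batch`, but this signature is an assumption, not the crate's.

```rust
/// Hypothetical scalar reference for a BF16 -> f32 batch conversion.
/// BF16 is the upper 16 bits of an IEEE-754 f32, so widening is a left shift.
pub fn bf16_to_f32_batch(src: &[u16], dst: &mut [f32]) {
    assert_eq!(src.len(), dst.len(), "src and dst must have equal length");
    for (d, &b) in dst.iter_mut().zip(src) {
        *d = f32::from_bits(u32::from(b) << 16);
    }
}
```

Nothing here depends on whether `crate::simd::F32x16` resolves to the intrinsic or the portable backend, so gating it under `not(feature = "nightly-simd")` would only remove functionality.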

architecture doc
----------------

Parity matrix updated to reflect what `src/simd_avx2.rs` actually ships.
Previous matrix marked U8x64 / I8x64 / I16x32 / I32x16 / I64x8 /
U16x32 / U32x16 / U64x8 as ❌ in the AVX2 column. On survey those types
exist via the `avx2_int_type!` macro — full API-parity structs with
`[$elem; $lanes]` scalar storage (align 64). New 🟠 marker introduced
to distinguish "struct exists with API, storage is scalar" from "true
two-half SIMD composite" (🟡). I8x32 / I16x16 also corrected: they
share the AVX-512 `__m256i` definition (re-exported through
`simd_avx2`'s `pub use crate::simd_avx512::{i16x16, i8x32, ...}`).

The remaining AVX2 vectorization gap (filling 🟠 → 🟡 with real
two-half SIMD ops) is tracked separately as TD-SIMD-3.
@chatgpt-codex-connector (Bot) left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: f857a812d1


Comment thread src/simd.rs, on lines +222 to +225:

```rust
pub use crate::simd_nightly::{
    BF16x16, BF16x8, F16x16, F32Mask16, F32Mask8, F32x16, F32x8, F64Mask4, F64Mask8, F64x4, F64x8, I16x16, I16x32,
    I32x16, I64x8, I8x32, I8x64, U16x32, U32x16, U32x8, U64x4, U64x8, U8x32, U8x64,
};
```

P1: Preserve lowercase SIMD aliases in nightly dispatch

When nightly-simd is enabled, this arm shadows all other crate::simd::* re-export branches, but it only exports F32x16-style names and drops lowercase aliases like f32x16, f64x8, and u8x64 that were previously available from crate::simd. Any downstream code (and current docs) importing ndarray::simd::f32x16 will fail to compile under cargo +nightly --features nightly-simd, so the feature now causes an API break instead of merely swapping the backend implementation.


@AdaWorldAPI (Owner, Author)

Fixed in a7ca029. Added 17 lowercase aliases (f32x16, f64x8, u8x64, i32x16, i64x8, u32x16, u64x8, f32x8, f64x4, u8x32, u16x32, u32x8, u64x4, i8x64, i8x32, i16x32, i16x16) to src/simd_nightly/mod.rs matching the convention already in simd_avx2.rs, simd_avx512.rs, and src/simd_scalar.rs. Wired them into the nightly dispatch arm's pub use list so they surface through crate::simd::*. cargo +nightly --features nightly-simd is back to "backend swap, not API break."
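A sketch of the alias convention, assuming stand-in `F32x16` / `U8x64` structs in a hypothetical `simd_nightly` module (the real module defines all 17 aliases); the `simd` module here models only the nightly arm. `#[allow(non_camel_case_types)]` silences the naming lint that lowercase type names would otherwise trigger. Because a type alias names the same type, `f32x16` and `F32x16` are interchangeable at every call site.

```rust
#[allow(dead_code)]
pub mod simd_nightly {
    // Stand-in PascalCase types (placeholders for the portable-SIMD structs).
    #[derive(Clone, Copy, Debug, PartialEq)]
    pub struct F32x16(pub [f32; 16]);
    #[derive(Clone, Copy, Debug, PartialEq, Eq)]
    pub struct U8x64(pub [u8; 64]);

    // Lowercase aliases, matching the other backends' convention.
    #[allow(non_camel_case_types)]
    pub type f32x16 = F32x16;
    #[allow(non_camel_case_types)]
    pub type u8x64 = U8x64;
}

pub mod simd {
    // The nightly arm must list both spellings, or crate::simd::f32x16
    // stops resolving when the feature is on.
    pub use crate::simd_nightly::{f32x16, u8x64, F32x16, U8x64};
}

// Downstream-style helper written against the lowercase name.
pub fn sum(v: simd::f32x16) -> f32 {
    v.0.iter().sum()
}
```

One caveat: an alias does not re-export a tuple struct's constructor, so `simd::f32x16([0.0; 16])` does not compile; construction goes through the PascalCase name or an associated function.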


…rustfmt

Codex P1 on PR #173: the nightly-simd dispatch arm shadowed every other
re-export branch but dropped the lowercase aliases (`f32x16`, `f64x8`,
`u8x64`, etc.) that downstream code and docs reach for. Switching on
`--features nightly-simd` was therefore an API break instead of a
backend swap.

`src/simd_nightly/mod.rs`
  + 17 lowercase aliases (`f32x16` = `F32x16` etc.) matching the
    convention already present in `simd_avx2.rs`, `simd_avx512.rs`, and
    `src/simd_scalar.rs`. They cover every numeric type the other
    backends expose under lowercase names.

`src/simd.rs`
  + Lowercase names added to the nightly dispatch arm's `pub use`
    list so they reach `crate::simd::*` alongside the PascalCase ones.

  Also runs rustfmt against the file — the multi-clause `#[cfg(all(...))]`
  attrs that Phase 2 introduced exceed the column budget and rustfmt
  prefers the vertical block layout.
@AdaWorldAPI merged commit e9a4d73 into master on May 20, 2026
15 checks passed