feat(simd): Phase 1 — explicit cargo configs + AVX2 dispatch hardening by AdaWorldAPI · Pull Request #172 · AdaWorldAPI/ndarray

AdaWorldAPI · 2026-05-20T11:38:56Z

Summary

Phase 1 of the integration plan in .claude/knowledge/simd-dispatch-architecture.md (PR #171).

.cargo/config.toml: pin target-cpu = "x86-64-v3" for x86_64. The file previously set no target-cpu, so builds used the SSE2 baseline; the __m256 / __m256i intrinsics inside simd_avx2::F32x16 / F64x8 / int wrappers then ran without AVX2 enabled and hit SIGILL on the GH runner. This is the 38-test uniform-timeout failure mode seen in PR #170 ("PR-X12 A1: CTU carrier + quad-tree partition").
.cargo/config-avx512.toml (new): explicit x86-64-v4 for AVX-512 builds.
.cargo/config-native.toml (new): target-cpu = "native" for developer-machine builds.
src/simd.rs: tighten the AVX2 dispatch arm predicate from not(avx512f) to all(avx2, not(avx512f)). Refresh the stale "target-cpu=x86-64-v4 → AVX-512" comment block to describe the new three-config dispatch model.
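
To check which of the three configs actually took effect in a given build, compile-time `cfg!` checks can report the enabled features. The following is an illustrative sketch only: `compiled_simd_level` is a hypothetical helper, not part of the crate, and it approximates each x86-64 level by a few representative features (AVX2 + FMA for v3, AVX-512F for v4). A `native` build can land in any of the branches.

```rust
/// Hypothetical helper (not in the crate): names the x86-64 level the
/// current build was most likely compiled for, judged only from a few
/// representative compile-time target features.
pub fn compiled_simd_level() -> &'static str {
    if cfg!(not(target_arch = "x86_64")) {
        "non-x86_64"
    } else if cfg!(target_feature = "avx512f") {
        // Expected with `--config .cargo/config-avx512.toml`.
        "v4"
    } else if cfg!(all(target_feature = "avx2", target_feature = "fma")) {
        // Expected with the default `.cargo/config.toml` (x86-64-v3).
        "v3"
    } else {
        // Expected when no target-cpu reaches rustc (SSE2 baseline or v2).
        "below-v3"
    }
}
```

Printing this value at the start of a test run would make a silently dropped target-cpu setting visible before any SIMD test executes.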

Scope correction vs PR #171

The architecture doc listed "add ~10 missing AVX2 two-half wrappers" as Phase 1 work. On survey, those wrappers already exist in src/simd_avx2.rs:

F32x16 / F64x8 — true two-half AVX wrappers ((f32x8, f32x8) / (f64x4, f64x4))
U8x32 — native AVX2 __m256i
U8x64 / I8x64 / I16x32 / I32x16 / I64x8 / U16x32 / U32x16 / U64x8 — scalar polyfills via the avx2_int_type! macro (storage = [$elem; $lanes], align 64)

The matrix entry in #171's parity table will be corrected on the doc branch as a follow-up. The remaining gap (AVX2 int wrappers are scalar-polyfill, not vectorized) is its own piece of tech debt and is not in Phase 1's scope.
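
For readers unfamiliar with the scalar-polyfill shape described above, a minimal sketch follows. It assumes only what the description states (array storage `[$elem; $lanes]`, 64-byte alignment); the macro name `polyfill_int_type!` and the methods `splat` and `wrapping_add` are hypothetical and do not reproduce the real `avx2_int_type!` macro.

```rust
/// Hypothetical stand-in for a scalar-polyfill SIMD type generator:
/// each type is a plain array with 64-byte alignment, and every
/// operation is an ordinary per-lane loop rather than an intrinsic.
macro_rules! polyfill_int_type {
    ($name:ident, $elem:ty, $lanes:expr) => {
        #[derive(Clone, Copy, Debug, PartialEq)]
        #[repr(C, align(64))]
        pub struct $name(pub [$elem; $lanes]);

        impl $name {
            /// Broadcast one value to every lane.
            pub fn splat(v: $elem) -> Self {
                Self([v; $lanes])
            }

            /// Lane-wise wrapping addition, computed one lane at a time.
            pub fn wrapping_add(self, rhs: Self) -> Self {
                let mut out = self.0;
                for (o, r) in out.iter_mut().zip(rhs.0.iter()) {
                    *o = o.wrapping_add(*r);
                }
                Self(out)
            }
        }
    };
}

polyfill_int_type!(I32x16, i32, 16);
polyfill_int_type!(U8x64, u8, 64);
```

The compiler may auto-vectorize such loops, but nothing in this shape guarantees it, which is the parity gap noted above.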

Test plan

cargo build — default config → builds with x86-64-v3 (AVX2)
cargo --config .cargo/config-avx512.toml build — AVX-512 build path
cargo --config .cargo/config-native.toml build — native build path
CI on PR — simd_avx2::* / simd_ops::* / simd_soa::* no longer SIGILL on the AVX2-only GitHub runner

Generated by Claude Code

Implements Phase 1 of the integration plan in
`.claude/knowledge/simd-dispatch-architecture.md` (PR #171).

Changes
-------

1. `.cargo/config.toml`: set `target-cpu = "x86-64-v3"` for x86_64. Previously the file declared "no global target-cpu", which compiled binaries to x86-64 generic (SSE2). `simd_avx2::F32x16` and friends wrap `__m256` / `__m256i` intrinsics, which an SSE2-baseline build does not enable, producing the PR #170 SIGILL CI mode (38 tests timing out uniformly at ~19s in `simd_avx2::*` / `simd_ops::*` / `simd_soa::*`).

2. `.cargo/config-avx512.toml` (new): explicit `x86-64-v4` for AVX-512 builds. Triggered by `cargo --config .cargo/config-avx512.toml`.

3. `.cargo/config-native.toml` (new): `target-cpu = "native"` for build-host-tuned binaries (developer machines). Non-portable.

4. `src/simd.rs`: tighten the AVX2 dispatch arm predicate from `not(target_feature = "avx512f")` to `target_feature = "avx2" + not(target_feature = "avx512f")`. Belts-and-braces: under v3 the predicates are equivalent, but the explicit `avx2` requirement means a future "build me without v3" invocation lands on a compile error rather than a SIGILL at run time. Stale "target-cpu=x86-64-v4 → AVX-512" comment refreshed to describe the new three-config dispatch model.

Out of scope for this PR
------------------------

The architecture doc (PR #171) claimed Phase 1 also needed to "add ~10 missing AVX2 two-half wrappers". On survey those wrappers already exist in `src/simd_avx2.rs`:

- `F32x16` / `F64x8`: true two-half AVX wrappers
- `U8x32`: native AVX2 `__m256i`
- `U8x64` / `I8x64` / `I16x32` / `I32x16` / `I64x8` / `U16x32` / `U32x16` / `U64x8`: scalar polyfill via the `avx2_int_type!` macro (storage = `[$elem; $lanes]`, align 64).

The matrix in the architecture doc will be corrected as a follow-up. The parity gap that does exist (scalar-polyfill ints are not vectorized under AVX2) is its own piece of tech debt, tracked separately.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 0d0031812d

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-05-20T11:42:32Z

 pub use crate::simd_avx512::{f32x8, f64x4, i16x16, i8x32, F32x8, F64x4, I16x16, I8x32};

-#[cfg(all(target_arch = "x86_64", not(target_feature = "avx512f")))]
+#[cfg(all(target_arch = "x86_64", target_feature = "avx2", not(target_feature = "avx512f")))]


Restore x86_64 fallback when AVX2 is unavailable

Requiring target_feature = "avx2" on the x86_64 re-export arm removes all F32x16/F64x8/integer SIMD type exports for x86_64 builds that are not compiled with AVX2 (for example downstream users building this crate with default x86_64 flags or x86-64-v2). Because this file defines unconditional APIs like simd_exp_f32(x: F32x16), those builds now fail at compile time due to missing type definitions instead of falling back; .cargo/config.toml in this repo does not protect dependency builds in other workspaces.
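
The gap described in this comment can be made concrete by evaluating both predicates with `cfg!`. This is a sketch under assumptions: the function names are hypothetical, and the AVX-512 arm's predicate is assumed to be `all(x86_64, avx512f)` since it is not shown in the diff.

```rust
/// The predicate before this PR: any x86_64 build without AVX-512F.
pub fn original_arm_matches() -> bool {
    cfg!(all(target_arch = "x86_64", not(target_feature = "avx512f")))
}

/// The tightened predicate from the reviewed commit.
pub fn tightened_arm_matches() -> bool {
    cfg!(all(
        target_arch = "x86_64",
        target_feature = "avx2",
        not(target_feature = "avx512f")
    ))
}

/// Assumed predicate of the AVX-512 arm (not shown in the diff).
pub fn avx512_arm_matches() -> bool {
    cfg!(all(target_arch = "x86_64", target_feature = "avx512f"))
}

/// True when an x86_64 build would match neither arm under the
/// tightened predicate, i.e. the type re-exports would be missing.
pub fn tightened_leaves_export_gap() -> bool {
    cfg!(target_arch = "x86_64") && !avx512_arm_matches() && !tightened_arm_matches()
}
```

On an x86_64 build with default flags, `original_arm_matches()` holds while `tightened_arm_matches()` does not, so `tightened_leaves_export_gap()` returns true.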


Fixed in e3ad707 (already on the merged branch) — reverted the target_feature = "avx2" predicate tightening for exactly this reason. Same root cause surfaced in our CI: RUSTFLAGS="-D warnings" env in ci.yaml overrides .cargo/config.toml rustflags entirely (cargo doesn't merge — env wins), so even our own GitHub runner landed on x86-64 baseline without target_feature = "avx2" set, leaving no matching arm → consumer references to crate::simd::F32x16 failed to compile.

Predicate is back to not(avx512f). Per-function #[target_feature(enable = "avx,avx2,fma")] annotations inside simd_avx2.rs gate the actual intrinsic execution at the symbol level; the struct-field types (__m256 / __m256i) are core::arch declarations that don't require AVX/AVX2 at the type level. Downstream consumers building this crate with default x86_64 flags or x86-64-v2 now keep their type exports.
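
A minimal sketch of the per-function gating pattern mentioned here, for illustration only. The function names are hypothetical, and the runtime check with `is_x86_feature_detected!` is an assumption added to make the example safe to run on any machine; the thread does not say whether `simd_avx2.rs` performs such a check.

```rust
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::{_mm256_add_ps, _mm256_loadu_ps, _mm256_storeu_ps};

/// Compiles even when the crate is built for the x86-64 baseline:
/// the attribute enables AVX/AVX2/FMA for this one function only.
/// Calling it on a CPU without those features is undefined behaviour.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx,avx2,fma")]
unsafe fn add8_avx(a: &[f32; 8], b: &[f32; 8]) -> [f32; 8] {
    let mut out = [0.0f32; 8];
    unsafe {
        let va = _mm256_loadu_ps(a.as_ptr());
        let vb = _mm256_loadu_ps(b.as_ptr());
        _mm256_storeu_ps(out.as_mut_ptr(), _mm256_add_ps(va, vb));
    }
    out
}

/// Safe entry point: uses the AVX path only if the running CPU
/// reports the features (assumed guard), otherwise a scalar loop.
pub fn add8(a: &[f32; 8], b: &[f32; 8]) -> [f32; 8] {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx2") && is_x86_feature_detected!("fma") {
            // SAFETY: the required CPU features were just detected.
            return unsafe { add8_avx(a, b) };
        }
    }
    std::array::from_fn(|i| a[i] + b[i])
}
```

Without a guard of this kind, a baseline build still compiles, but executing the annotated function on a CPU lacking AVX2 would fault at run time.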

Generated by Claude Code

… v3 config

The previous commit tightened the x86_64 dispatch arm to `target_feature = "avx2" + not(avx512f)`. The intent was to make "x86-64 baseline + AVX2 wrappers" a compile error rather than a SIGILL.

CI green-mode disagreed: `.github/workflows/ci.yaml` sets a global `RUSTFLAGS="-D warnings"` env that overrides the rustflags from `.cargo/config.toml` entirely (cargo doesn't merge env + config rustflags; env wins). So in CI the v3 baseline never takes effect, x86-64 generic / SSE2 is what builds, `target_feature = "avx2"` is not set, and the tightened arm leaves no matching dispatch path → consumer references to `crate::simd::F32x16` fail to compile.

The pre-existing wider `not(avx512f)` predicate works at x86-64 baseline because the inner intrinsics in `simd_avx2.rs` use per-function `#[target_feature(enable = "avx,avx2,fma")]` annotations. The ops gate themselves at the symbol level; struct fields like `__m256` / `__m256i` are core::arch type declarations that don't require AVX/AVX2 at the type level (only at execution).

Reverting the predicate. The cargo configs added in the previous commit stay; they're the documented opt-in affordances. Local `cargo build` without env override gets v3; CI runs at baseline + per-function target_feature; explicit AVX-512 via `--config .cargo/config-avx512.toml`.
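
The override described in this commit message follows Cargo's documented rule that rustflags sources are mutually exclusive and the `RUSTFLAGS` environment variable takes precedence over config-file rustflags. The sketch below is a simplified model of that rule, not Cargo's code: `effective_rustflags` is a hypothetical function, it ignores `CARGO_ENCODED_RUSTFLAGS`, and the config flag values are assumed.

```rust
/// Simplified model of Cargo's rustflags selection: if the RUSTFLAGS
/// environment variable is set, it is used alone; config-file
/// rustflags are consulted only when it is absent. Nothing is merged.
pub fn effective_rustflags(
    env_rustflags: Option<&str>,
    config_rustflags: Option<&[&str]>,
) -> Vec<String> {
    if let Some(env) = env_rustflags {
        // Env wins outright; the config value is discarded.
        return env.split_whitespace().map(str::to_owned).collect();
    }
    config_rustflags
        .unwrap_or(&[])
        .iter()
        .map(|s| s.to_string())
        .collect()
}
```

With `RUSTFLAGS="-D warnings"` set, the model returns only `-D warnings`, so an assumed config entry such as `-C target-cpu=x86-64-v3` never reaches the compiler.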

chatgpt-codex-connector Bot reviewed May 20, 2026


AdaWorldAPI merged commit a18366a into master May 20, 2026
15 checks passed

AdaWorldAPI mentioned this pull request May 20, 2026

ci(simd): Phase 6 — AVX-512 dispatch check job #175

Merged

2 tasks

AdaWorldAPI merged 2 commits into master from claude/pr-x-phase1-config-flip
