feat(simd): Phase 1 — explicit cargo configs + AVX2 dispatch hardening #172

Merged
AdaWorldAPI merged 2 commits into master from claude/pr-x-phase1-config-flip
May 20, 2026
@AdaWorldAPI
Owner

Summary

Phase 1 of the integration plan in .claude/knowledge/simd-dispatch-architecture.md (PR #171).

  • .cargo/config.toml — pin target-cpu = "x86-64-v3" for x86_64. The file previously set no target-cpu, so builds defaulted to the SSE2 baseline, and the __m256 / __m256i intrinsics inside the simd_avx2::F32x16 / F64x8 / int wrappers ran under SSE2 and hit SIGILL on the GH runner. This is the 38-test uniform-timeout failure mode seen in PR #170 (PR-X12 A1: CTU carrier + quad-tree partition).
  • .cargo/config-avx512.toml (new) — explicit x86-64-v4 for AVX-512 builds.
  • .cargo/config-native.toml (new) — target-cpu = "native" for developer-machine builds.
  • src/simd.rs — tighten the AVX2 dispatch arm predicate from not(avx512f) to avx2 + not(avx512f). Refresh the stale "target-cpu=x86-64-v4 → AVX-512" comment block to describe the new three-config dispatch model.
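
The dispatch arms named in the src/simd.rs bullet are selected at compile time from the target features rustc was given. A minimal sketch of that selection, assuming a hypothetical `dispatch_arm` helper that is not part of the crate, and with a third "baseline" arm added purely so the example compiles on any target (the PR text does not say that src/simd.rs has such an arm):

```rust
// Illustrative only: `dispatch_arm` is a hypothetical helper, not code from src/simd.rs.
// Exactly one of the three definitions below survives cfg evaluation.

/// Selected when the build enables AVX-512F (for example target-cpu=x86-64-v4).
#[cfg(all(target_arch = "x86_64", target_feature = "avx512f"))]
pub fn dispatch_arm() -> &'static str {
    "avx512"
}

/// Selected when AVX2 is enabled but AVX-512F is not (for example target-cpu=x86-64-v3).
/// This mirrors the tightened `avx2 + not(avx512f)` predicate.
#[cfg(all(
    target_arch = "x86_64",
    target_feature = "avx2",
    not(target_feature = "avx512f")
))]
pub fn dispatch_arm() -> &'static str {
    "avx2"
}

/// Assumption of this sketch: a catch-all arm for every other build
/// (non-x86_64 targets, or x86_64 without AVX2 and AVX-512F).
#[cfg(not(all(
    target_arch = "x86_64",
    any(target_feature = "avx2", target_feature = "avx512f")
)))]
pub fn dispatch_arm() -> &'static str {
    "baseline"
}
```

Calling `dispatch_arm()` in a default build on x86_64 reports whichever arm the effective rustflags selected, which makes it a quick way to check that a given cargo config actually took effect.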

Scope correction vs PR #171

The architecture doc listed "add ~10 missing AVX2 two-half wrappers" as Phase 1 work. On survey, those wrappers already exist in src/simd_avx2.rs:

  • F32x16 / F64x8 — true two-half AVX wrappers ((f32x8, f32x8) / (f64x4, f64x4))
  • U8x32 — native AVX2 __m256i
  • U8x64 / I8x64 / I16x32 / I32x16 / I64x8 / U16x32 / U32x16 / U64x8 — scalar polyfills via the avx2_int_type! macro (storage = [$elem; $lanes], align 64)

The matrix entry in #171's parity table will be corrected on the doc branch as a follow-up. The remaining gap (AVX2 int wrappers are scalar-polyfill, not vectorized) is its own piece of tech debt and is not in Phase 1's scope.
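
To make the "scalar polyfill" shape concrete, here is a small sketch of a macro that produces array-backed, 64-byte-aligned lane types. `polyfill_int_type!`, `splat`, and `wrapping_add` are hypothetical names chosen for illustration; only the storage layout (`[$elem; $lanes]`, align 64) comes from the description above, and the real `avx2_int_type!` macro may differ.

```rust
// Hypothetical sketch: imitates the storage layout described for `avx2_int_type!`
// (plain array, 64-byte alignment). It is not the crate's macro.
macro_rules! polyfill_int_type {
    ($name:ident, $elem:ty, $lanes:expr) => {
        #[derive(Clone, Copy, Debug, PartialEq)]
        #[repr(C, align(64))]
        pub struct $name(pub [$elem; $lanes]);

        impl $name {
            /// Number of lanes in this type.
            pub const LANES: usize = $lanes;

            /// Broadcast one value to every lane.
            pub fn splat(v: $elem) -> Self {
                Self([v; $lanes])
            }

            /// Lane-wise wrapping add written as a scalar loop.
            /// No vector intrinsics are involved, which is the parity gap noted above.
            pub fn wrapping_add(self, rhs: Self) -> Self {
                let mut out = self.0;
                for (o, r) in out.iter_mut().zip(rhs.0.iter()) {
                    *o = o.wrapping_add(*r);
                }
                Self(out)
            }
        }
    };
}

polyfill_int_type!(I32x16, i32, 16);
polyfill_int_type!(U8x64, u8, 64);
```

The compiler may still auto-vectorize such loops, but nothing in the type guarantees it, which is why this is tracked as separate tech debt and not treated as a true AVX2 wrapper.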

Test plan

  • cargo build — default config → builds with x86-64-v3 (AVX2)
  • cargo --config .cargo/config-avx512.toml build — AVX-512 build path
  • cargo --config .cargo/config-native.toml build — native build path
  • CI on PR — simd_avx2::* / simd_ops::* / simd_soa::* no longer SIGILL on the AVX2-only GitHub runner

Generated by Claude Code

Implements Phase 1 of the integration plan in
`.claude/knowledge/simd-dispatch-architecture.md` (PR #171).

Changes
-------

1. `.cargo/config.toml` — set `target-cpu = "x86-64-v3"` for x86_64.
   Previously the file declared "no global target-cpu", which compiled
   binaries to x86-64 generic (SSE2). `simd_avx2::F32x16` and friends
   wrap `__m256` / `__m256i` intrinsics, which then ran inside an
   SSE2-baseline binary, producing the PR #170 SIGILL CI failure mode
   (38 tests timing out uniformly at ~19s in `simd_avx2::*` /
   `simd_ops::*` / `simd_soa::*`).

2. `.cargo/config-avx512.toml` (new) — explicit `x86-64-v4` for AVX-512
   builds. Triggered by `cargo --config .cargo/config-avx512.toml`.

3. `.cargo/config-native.toml` (new) — `target-cpu = "native"` for
   build-host-tuned binaries (developer machines). Non-portable.

4. `src/simd.rs` — tighten the AVX2 dispatch arm predicate from
   `not(target_feature = "avx512f")` to
   `target_feature = "avx2" + not(target_feature = "avx512f")`.
   Belts-and-braces: under v3 the predicates are equivalent, but the
   explicit `avx2` requirement means a future "build me without v3"
   invocation lands on a compile error rather than a SIGILL at run
   time. Stale "target-cpu=x86-64-v4 → AVX-512" comment refreshed to
   describe the new three-config dispatch model.

Out of scope for this PR
------------------------

The architecture doc (PR #171) claimed Phase 1 also needed to "add
~10 missing AVX2 two-half wrappers". On survey those wrappers already
exist in `src/simd_avx2.rs`:
  - `F32x16` / `F64x8`           — true two-half AVX wrappers
  - `U8x32`                      — native AVX2 `__m256i`
  - `U8x64` / `I8x64` / `I16x32` / `I32x16` / `I64x8` / `U16x32` /
    `U32x16` / `U64x8`           — scalar polyfill via the
                                   `avx2_int_type!` macro (storage =
                                   `[$elem; $lanes]` align 64).
The matrix in the architecture doc will be corrected as a follow-up.

The parity gap that does exist (scalar-polyfill ints are not vectorized
under AVX2) is its own piece of tech debt, tracked separately.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 0d0031812d

Comment thread src/simd.rs Outdated
  pub use crate::simd_avx512::{f32x8, f64x4, i16x16, i8x32, F32x8, F64x4, I16x16, I8x32};

- #[cfg(all(target_arch = "x86_64", not(target_feature = "avx512f")))]
+ #[cfg(all(target_arch = "x86_64", target_feature = "avx2", not(target_feature = "avx512f")))]
P1: Restore x86_64 fallback when AVX2 is unavailable

Requiring target_feature = "avx2" on the x86_64 re-export arm removes all F32x16/F64x8/integer SIMD type exports for x86_64 builds that are not compiled with AVX2 (for example downstream users building this crate with default x86_64 flags or x86-64-v2). Because this file defines unconditional APIs like simd_exp_f32(x: F32x16), those builds now fail at compile time due to missing type definitions instead of falling back; .cargo/config.toml in this repo does not protect dependency builds in other workspaces.

Owner Author

Fixed in e3ad707 (already on the merged branch) — reverted the target_feature = "avx2" predicate tightening for exactly this reason. Same root cause surfaced in our CI: RUSTFLAGS="-D warnings" env in ci.yaml overrides .cargo/config.toml rustflags entirely (cargo doesn't merge — env wins), so even our own GitHub runner landed on x86-64 baseline without target_feature = "avx2" set, leaving no matching arm → consumer references to crate::simd::F32x16 failed to compile.

Predicate is back to not(avx512f). Per-function #[target_feature(enable = "avx,avx2,fma")] annotations inside simd_avx2.rs gate the actual intrinsic execution at the symbol level; the struct-field types (__m256 / __m256i) are core::arch declarations that don't require AVX/AVX2 at the type level. Downstream consumers building this crate with default x86_64 flags or x86-64-v2 now keep their type exports.
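
The per-function gating described above can be sketched as follows. This is an illustration under stated assumptions, not code from simd_avx2.rs: `add8_avx2`, `add8_scalar`, and `add8` are hypothetical names, and the run-time `is_x86_feature_detected!` guard is an addition of this sketch (the thread does not say how the crate decides when it is safe to call its AVX2 functions).

```rust
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::{__m256i, _mm256_add_epi32, _mm256_loadu_si256, _mm256_storeu_si256};

/// AVX2 is enabled for this one function only, whatever target-cpu the crate was built with.
/// Naming `__m256i` compiles at the SSE2 baseline; executing the body needs an AVX2 CPU.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx2")]
unsafe fn add8_avx2(a: &[i32; 8], b: &[i32; 8]) -> [i32; 8] {
    let mut out = [0i32; 8];
    unsafe {
        // Unaligned loads and store: the arrays carry no 32-byte alignment guarantee.
        let va = _mm256_loadu_si256(a.as_ptr() as *const __m256i);
        let vb = _mm256_loadu_si256(b.as_ptr() as *const __m256i);
        _mm256_storeu_si256(out.as_mut_ptr() as *mut __m256i, _mm256_add_epi32(va, vb));
    }
    out
}

/// Portable fallback with the same wrapping semantics as the vector add.
fn add8_scalar(a: &[i32; 8], b: &[i32; 8]) -> [i32; 8] {
    let mut out = [0i32; 8];
    for i in 0..8 {
        out[i] = a[i].wrapping_add(b[i]);
    }
    out
}

/// Safe entry point: takes the AVX2 path only when the running CPU reports AVX2.
pub fn add8(a: &[i32; 8], b: &[i32; 8]) -> [i32; 8] {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx2") {
            // SAFETY: AVX2 support was confirmed at run time on the line above.
            return unsafe { add8_avx2(a, b) };
        }
    }
    add8_scalar(a, b)
}
```

Without some guard of this kind, a `#[target_feature]` function still compiles at the x86-64 baseline, but calling it on a CPU that lacks the feature is undefined behavior and typically surfaces as SIGILL.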


Generated by Claude Code

… v3 config

The previous commit tightened the x86_64 dispatch arm to
`target_feature = "avx2" + not(avx512f)`. The intent was to make
"x86-64 baseline + AVX2 wrappers" a compile error rather than a SIGILL.

CI green-mode disagreed: `.github/workflows/ci.yaml` sets a global
`RUSTFLAGS="-D warnings"` env that overrides the rustflags from
`.cargo/config.toml` entirely (cargo doesn't merge env + config
rustflags — env wins). So in CI the v3 baseline never takes effect,
x86-64 generic / SSE2 is what builds, `target_feature = "avx2"` is not
set, and the tightened arm leaves no matching dispatch path → consumer
references to `crate::simd::F32x16` fail to compile.
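
The "env wins, no merge" behavior follows Cargo's documented rule that rustflags sources are mutually exclusive and checked in order. A simplified model of that rule, assuming a hypothetical `effective_rustflags` function and leaving out the `CARGO_ENCODED_RUSTFLAGS` source that Cargo checks first:

```rust
/// Simplified model of Cargo's rustflags precedence (hypothetical helper, not Cargo code).
/// The first source that is present is used on its own; sources are never merged.
pub fn effective_rustflags(
    env_rustflags: Option<&str>,     // the RUSTFLAGS environment variable
    target_config: Option<&[&str]>,  // target.<triple or cfg>.rustflags from config
    build_config: Option<&[&str]>,   // build.rustflags from config
) -> Vec<String> {
    if let Some(env) = env_rustflags {
        // RUSTFLAGS is a whitespace-separated string.
        return env.split_whitespace().map(String::from).collect();
    }
    if let Some(flags) = target_config {
        return flags.iter().map(|f| (*f).to_owned()).collect();
    }
    build_config
        .unwrap_or_default()
        .iter()
        .map(|f| (*f).to_owned())
        .collect()
}
```

With `Some("-D warnings")` as the environment value and `["-C", "target-cpu=x86-64-v3"]` as the config value, the model returns only `["-D", "warnings"]`, which matches the CI outcome described here: the v3 baseline is dropped. One common workaround, not something this PR does, is to repeat the target-cpu flag inside the CI `RUSTFLAGS` value.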

The pre-existing wider `not(avx512f)` predicate works at x86-64
baseline because the inner intrinsics in `simd_avx2.rs` use per-function
`#[target_feature(enable = "avx,avx2,fma")]` annotations: the operations
gate themselves at the symbol level, while struct fields like `__m256` /
`__m256i` are `core::arch` type declarations that don't require AVX/AVX2
at the type level (only at execution).

Reverting the predicate. The cargo configs added in the previous commit
stay — they're the documented opt-in affordances. Local `cargo build`
without env override gets v3; CI runs at baseline + per-function
target_feature; explicit AVX-512 via `--config .cargo/config-avx512.toml`.
@AdaWorldAPI AdaWorldAPI merged commit a18366a into master May 20, 2026
15 checks passed