docs(simd): dispatch architecture + parity matrix + tech debt + integration plan #171

Merged: AdaWorldAPI merged 2 commits into master from claude/simd-dispatch-architecture-doc on May 20, 2026

@AdaWorldAPI
Owner

Summary

Docs-only PR. New design doc .claude/knowledge/simd-dispatch-architecture.md capturing four artifacts in one place per the discussion thread on PR #170:

  1. Dispatch architecture — three explicit cargo configs (v3 default, v4 explicit AVX-512, native explicit) + optional runtime LazyLock<CpuCaps> mode. Each is a conscious cargo invocation; no silent fallback.
  2. Parity matrix — typed lane primitives × backend (simd_avx512 / simd_avx2 / simd_neon / simd_nightly / scalar). Surfaces the AVX2 gap: only F32x16, F64x8, U8x32, F16Scaler exist; the other 14 cross-arch lanes are missing.
  3. Technical debt matrix — 10 issues (TD-SIMD-1..TD-SIMD-10) ranked P0..P3. P0s are the v4-baked-binary default + the AVX2 wrapper gap. P1s are the unwired nightly-simd dispatch arm + NEON parity. P2-P3s are scalar/macro/runtime/CI ergonomics.
  4. Integration plan — six sequenced single-PR phases that take the codebase from current state → target dispatch architecture, each landing green CI before the next starts.
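As a rough illustration of the optional runtime mode in item 1, a `LazyLock<CpuCaps>` can probe the host once and cache the result. This is a minimal sketch, not the code from the design doc: the `CpuCaps` field set, the `detect` and `pick_backend` helpers, and the `Backend` enum are all assumptions made for the example.

```rust
use std::sync::LazyLock;

/// Hypothetical capability record; the real `CpuCaps` fields are not shown in this PR.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct CpuCaps {
    pub avx2: bool,
    pub avx512f: bool,
    pub neon: bool,
}

/// Probe the host CPU with the standard-library detection macros (x86_64).
#[cfg(target_arch = "x86_64")]
pub fn detect() -> CpuCaps {
    CpuCaps {
        avx2: std::arch::is_x86_feature_detected!("avx2"),
        avx512f: std::arch::is_x86_feature_detected!("avx512f"),
        neon: false,
    }
}

/// Probe the host CPU with the standard-library detection macros (aarch64).
#[cfg(target_arch = "aarch64")]
pub fn detect() -> CpuCaps {
    CpuCaps {
        avx2: false,
        avx512f: false,
        neon: std::arch::is_aarch64_feature_detected!("neon"),
    }
}

/// Every other target reports no SIMD capability and falls through to scalar.
#[cfg(not(any(target_arch = "x86_64", target_arch = "aarch64")))]
pub fn detect() -> CpuCaps {
    CpuCaps { avx2: false, avx512f: false, neon: false }
}

/// Detection runs at most once, on first access.
pub static CPU_CAPS: LazyLock<CpuCaps> = LazyLock::new(detect);

/// Hypothetical backend tag used only by this sketch.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Backend {
    Avx512,
    Avx2,
    Neon,
    Scalar,
}

/// Prefer the widest backend the capability record allows.
pub fn pick_backend(caps: CpuCaps) -> Backend {
    if caps.avx512f {
        Backend::Avx512
    } else if caps.avx2 {
        Backend::Avx2
    } else if caps.neon {
        Backend::Neon
    } else {
        Backend::Scalar
    }
}
```

A caller would read `pick_backend(*CPU_CAPS)` once and branch on the result. Unlike the three compile-time configs, this path costs a branch per dispatch, which is presumably why the doc keeps it opt-in.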

1 file changed, +294 / -0 vs master.

Why now

PR #170 (tests/1.95.0 CI run 26151746204/76920666348) showed 38 tests failing uniformly at ~19 s timeouts in simd_avx2::* / simd_amx::* / simd_ops::* / simd_soa::* — the SIGILL pattern from a v4-baked binary running on an AVX2-only GitHub runner. The matrix + tech debt capture the gaps; the integration plan sequences the fix.
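The failure mode can be modelled with the x86-64 microarchitecture levels: v3 includes AVX2 and v4 adds the AVX-512 subsets, so a binary compiled for v4 may emit instructions a v3 host cannot decode. The sketch below is only a toy model of that rule; the `X86Level` enum and `can_run` helper are hypothetical names, not part of the codebase.

```rust
/// x86-64 microarchitecture levels, ordered from least to most capable.
/// v3 includes AVX2; v4 adds AVX-512 (F, BW, CD, DQ, VL).
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum X86Level {
    V1,
    V2,
    V3,
    V4,
}

/// A binary is safe to run only when the level baked in at compile time
/// does not exceed what the host supports; otherwise it risks SIGILL.
pub fn can_run(baked: X86Level, host: X86Level) -> bool {
    baked <= host
}
```

Under this model the PR #170 situation is `can_run(X86Level::V4, X86Level::V3)`, which is false, while a v3 default runs on both v3 and v4 hosts.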

Out of scope

No code changes in this PR. Phase 1 (the actual config flip + AVX2 wrappers + dispatch arm) is a separate follow-up PR.

🤖 Generated with Claude Code



…ration plan

New design doc at .claude/knowledge/simd-dispatch-architecture.md
covering four artifacts in one place:

1. Dispatch architecture — three explicit cargo configs (v3 default,
   v4 explicit, native explicit) + optional runtime LazyLock path.
   Each is a conscious cargo invocation; no silent fallback.

2. Parity matrix — typed lane primitives × backend (avx512 / avx2 /
   neon / nightly / scalar). Surfaces the AVX2 gap: only F32x16,
   F64x8, U8x32, F16Scaler exist; the other 14 cross-arch lanes are
   missing → CI SIGILL on the v3 baseline.

3. Technical debt matrix — 10 issues (TD-SIMD-1..10) ranked P0..P3.
   P0: default config bakes AVX-512 → CI SIGILL on AVX2-only runners.
   P0: AVX2 missing 10 two-half wrappers (U64x8, I32x16, …).
   P1: simd.rs has no `feature = "nightly-simd"` arm → simd_nightly/*
       unreachable despite being the most-complete backend.
   P1: NEON parity gap symmetric to AVX2.
   …through P3 CI-matrix entries.

4. Integration plan — six sequenced phases, each a single-PR worker:
   Phase 1 unblock CI (config flip + AVX2 wrappers + dispatch arms)
   Phase 2 unblock nightly-simd polyfill
   Phase 3 NEON parity
   Phase 4 scalar→file + macro + F16 honesty
   Phase 5 runtime dispatch (opt-in)
   Phase 6 AVX-512 explicit CI job
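For readers unfamiliar with the "two-half wrapper" wording in the P0 item above, the idea is to present a 16-lane type on hardware with 256-bit registers by pairing two 8-lane halves. The following is a portable sketch under stated assumptions: `I32x8` is a plain-array stand-in for what a real AVX2 backend would presumably hold as a `__m256i`, and the method names are illustrative rather than the crate's actual API.

```rust
/// Stand-in for one 256-bit half (eight i32 lanes). Assumption: a real
/// AVX2 backend would wrap a `__m256i` here instead of an array.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct I32x8(pub [i32; 8]);

impl I32x8 {
    /// Lane-wise wrapping add, matching the wraparound of SIMD integer adds.
    pub fn add(self, rhs: Self) -> Self {
        let mut out = [0i32; 8];
        for i in 0..8 {
            out[i] = self.0[i].wrapping_add(rhs.0[i]);
        }
        I32x8(out)
    }
}

/// Sixteen i32 lanes expressed as a low half and a high half.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct I32x16 {
    lo: I32x8,
    hi: I32x8,
}

impl I32x16 {
    pub const LANES: usize = 16;

    /// Split a 16-element array into the two halves.
    pub fn from_array(a: [i32; 16]) -> Self {
        let mut lo = [0i32; 8];
        let mut hi = [0i32; 8];
        lo.copy_from_slice(&a[..8]);
        hi.copy_from_slice(&a[8..]);
        Self { lo: I32x8(lo), hi: I32x8(hi) }
    }

    /// Join the two halves back into one array.
    pub fn to_array(self) -> [i32; 16] {
        let mut out = [0i32; 16];
        out[..8].copy_from_slice(&self.lo.0);
        out[8..].copy_from_slice(&self.hi.0);
        out
    }

    /// Each operation forwards to both halves independently.
    pub fn add(self, rhs: Self) -> Self {
        Self { lo: self.lo.add(rhs.lo), hi: self.hi.add(rhs.hi) }
    }
}
```

Because every operation simply forwards to both halves, the wrappers are mechanical to write, which may be why Phase 4 pairs the scalar cleanup with a macro.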

Pinned by the discussion thread on PR #170 (CI run 26151746204): the
38 simd_avx2/simd_amx/simd_ops/simd_soa test failures at uniform 19 s
timeouts are the v3-runner / v4-baked-binary SIGILL pattern. This
doc captures the architecture target, the gaps, and the path.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 2c1942ba00

Comment on lines +61 to +62
#[cfg(all(feature = "nightly-simd", any(target_arch = "x86_64", target_arch = "aarch64")))]
pub use crate::simd_nightly::{F32x16, F64x8, U8x32, U8x64, U16x32, U32x16, U64x8, I8x32, I8x64, I16x16, I16x32, I32x16, I64x8, F32Mask16, F64Mask8, BF16x16, BF16x8};

P2: Keep nightly-simd dispatch valid on non-x86/aarch64

The proposed nightly-simd arm is constrained to x86_64/aarch64, but the same section describes this mode as usable on “or any” target. With the shown scalar fallback condition (not(..., feature = "nightly-simd")), enabling nightly-simd on targets like wasm32 or riscv would leave no matching re-export path in simd.rs, causing unresolved SIMD type exports. Please either broaden the nightly arm to all targets that should be supported or adjust the scalar fallback predicate so one backend always matches.

Owner Author


Fixed in 9016621. Dropped the target_arch = x86_64 | aarch64 constraint from the nightly arm — core::simd is portable, so the arm is now #[cfg(feature = "nightly-simd")] unconditional and catches wasm32 / riscv too. Also tightened the scalar fallback predicate to the exact negation of arms 1-4 (not(any(nightly-simd, all(x86_64, avx2), aarch64))) so x86_64-without-AVX2 also lands on scalar. Result: exactly one arm matches on every (target, feature) pair.
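The "exactly one arm matches" claim can be checked by modelling the cfg predicates as plain booleans and enumerating every combination. This is a sketch under assumptions: the reply only spells out the nightly arm and the scalar predicate, so the split between the AVX-512 and AVX2 arms (and the rule that every native arm excludes `nightly-simd`) is inferred here, and the `Arch`, `Cfg`, and `matching_arms` names are hypothetical.

```rust
/// Target architecture classes that the predicates distinguish.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Arch {
    X86,     // stands for target_arch = "x86_64"
    Aarch64, // stands for target_arch = "aarch64"
    Other,   // wasm32, riscv, ...
}

/// One (target, feature) combination.
#[derive(Debug, Clone, Copy)]
pub struct Cfg {
    pub arch: Arch,
    pub avx2: bool,
    pub avx512f: bool,
    pub nightly_simd: bool,
}

/// Return every arm whose predicate is true for `c`.
pub fn matching_arms(c: Cfg) -> Vec<&'static str> {
    let x86 = c.arch == Arch::X86;
    let arm64 = c.arch == Arch::Aarch64;
    let mut arms = Vec::new();
    // Nightly arm: unconditional on the feature, as described in the reply.
    if c.nightly_simd {
        arms.push("simd_nightly");
    }
    // Assumed split of the all(x86_64, avx2) region between the two x86 arms.
    if !c.nightly_simd && x86 && c.avx2 && c.avx512f {
        arms.push("simd_avx512");
    }
    if !c.nightly_simd && x86 && c.avx2 && !c.avx512f {
        arms.push("simd_avx2");
    }
    if !c.nightly_simd && arm64 {
        arms.push("simd_neon");
    }
    // Scalar: not(any(nightly-simd, all(x86_64, avx2), aarch64)).
    if !(c.nightly_simd || (x86 && c.avx2) || arm64) {
        arms.push("scalar");
    }
    arms
}

/// Enumerate all 24 combinations and confirm each selects exactly one arm.
pub fn exactly_one_arm_everywhere() -> bool {
    let flags = [false, true];
    for arch in [Arch::X86, Arch::Aarch64, Arch::Other] {
        for avx2 in flags {
            for avx512f in flags {
                for nightly_simd in flags {
                    let c = Cfg { arch, avx2, avx512f, nightly_simd };
                    if matching_arms(c).len() != 1 {
                        return false;
                    }
                }
            }
        }
    }
    true
}
```

In this model, `nightly-simd` on a wasm32-like target selects only `simd_nightly`, and x86_64 without AVX2 selects only `scalar`, which are the two cases the review comment and the fix address.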



| `I16x32` | ✅ `__m512i` | ❌ | ❌ | 🔵 | ✅ |
| `I32x16` | ✅ `__m512i` | ❌ | ❌ | 🔵 | ✅ |
| `I64x8` | ✅ `__m512i` | ❌ | ❌ | 🔵 | ✅ |
| `BF16x8` | ✅ `__m128bh` | ❌ | ❌ | ❌ | ✅ |

P2: Correct BF16x8 parity status for simd_nightly

The parity matrix marks BF16x8 as missing in simd_nightly, but src/simd_nightly/mod.rs already re-exports BF16x8 (alongside BF16x16). This mismatch can mis-prioritize follow-up work by treating an implemented lane as a gap, so the matrix row should be updated to reflect current backend coverage.

Owner Author


Fixed in 9016621. BF16x8 row column 4 (simd_nightly) flipped from ❌ to 🔵 — confirmed by grep BF16x8 src/simd_nightly/mod.rs showing pub use bf16_types::{BF16x16, BF16x8};. Matrix now matches actual backend coverage.



- Drop the `target_arch = x86_64 | aarch64` constraint from the
  nightly-simd arm. `core::simd` is portable, so this arm must match
  on wasm32 / riscv too — otherwise enabling `nightly-simd` on those
  targets leaves no backend (every other arm excludes it).
- Tighten the scalar fallback predicate to the exact negation of arms
  1-4 so x86_64-without-AVX2 also routes to scalar.
- Correct `BF16x8` parity row: `simd_nightly` already re-exports
  `BF16x8` (alongside `BF16x16`); the matrix was stale.
AdaWorldAPI merged commit 8d3c1d7 into master on May 20, 2026
15 checks passed