docs(simd): dispatch architecture + parity matrix + tech debt + integration plan by AdaWorldAPI · Pull Request #171 · AdaWorldAPI/ndarray

AdaWorldAPI · 2026-05-20T10:54:08Z

Summary

Docs-only PR. New design doc .claude/knowledge/simd-dispatch-architecture.md capturing four artifacts in one place per the discussion thread on PR #170:

1. Dispatch architecture: three explicit cargo configs (v3 default, v4 explicit AVX-512, native explicit) plus an optional runtime LazyLock<CpuCaps> mode. Each is a conscious cargo invocation; no silent fallback.
2. Parity matrix: typed lane primitives × backend (simd_avx512 / simd_avx2 / simd_neon / simd_nightly / scalar). Surfaces the AVX2 gap: only F32x16, F64x8, U8x32, F16Scaler exist; the other 14 cross-arch lanes are missing.
3. Technical debt matrix: 10 issues (TD-SIMD-1..TD-SIMD-10) ranked P0..P3. The P0s are the v4-baked-binary default and the AVX2 wrapper gap. The P1s are the unwired nightly-simd dispatch arm and NEON parity. The P2-P3s are scalar/macro/runtime/CI ergonomics.
4. Integration plan: six sequenced single-PR phases that take the codebase from its current state to the target dispatch architecture, each landing green CI before the next starts.
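For readers unfamiliar with the optional runtime mode, here is a minimal sketch of what a `LazyLock<CpuCaps>` probe could look like. Only the names `LazyLock` and `CpuCaps` come from the summary above; the fields of `CpuCaps`, the `Backend` enum, and `select_backend` are hypothetical and are not taken from the design doc.

```rust
use std::sync::LazyLock;

/// Hypothetical capability snapshot; the field set is an assumption.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct CpuCaps {
    pub avx2: bool,
    pub avx512f: bool,
    pub neon: bool,
}

/// Hypothetical backend tag, named after the modules in the parity matrix.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Backend {
    Avx512,
    Avx2,
    Neon,
    Scalar,
}

// One `detect` per architecture; exactly one of these is compiled in.
#[cfg(target_arch = "x86_64")]
fn detect() -> CpuCaps {
    CpuCaps {
        avx2: std::arch::is_x86_feature_detected!("avx2"),
        avx512f: std::arch::is_x86_feature_detected!("avx512f"),
        neon: false,
    }
}

#[cfg(target_arch = "aarch64")]
fn detect() -> CpuCaps {
    CpuCaps {
        avx2: false,
        avx512f: false,
        neon: std::arch::is_aarch64_feature_detected!("neon"),
    }
}

#[cfg(not(any(target_arch = "x86_64", target_arch = "aarch64")))]
fn detect() -> CpuCaps {
    CpuCaps { avx2: false, avx512f: false, neon: false }
}

/// Probed once, on first access, then cached for the life of the process.
pub static CPU_CAPS: LazyLock<CpuCaps> = LazyLock::new(detect);

/// Pick the widest backend the probed CPU supports.
pub fn select_backend(caps: CpuCaps) -> Backend {
    if caps.avx512f {
        Backend::Avx512
    } else if caps.avx2 {
        Backend::Avx2
    } else if caps.neon {
        Backend::Neon
    } else {
        Backend::Scalar
    }
}
```

A caller would write `select_backend(*CPU_CAPS)` once and branch on the result. The sketch only covers detection and selection; how the chosen backend is wired into the typed lanes is left to the design doc.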

1 file, +294 / 0 vs master.

Why now

PR #170 (tests/1.95.0 CI run 26151746204/76920666348) showed 38 tests failing uniformly at ~19 s timeouts in simd_avx2::* / simd_amx::* / simd_ops::* / simd_soa::* — the SIGILL pattern from a v4-baked binary running on an AVX2-only GitHub runner. The matrix + tech debt capture the gaps; the integration plan sequences the fix.
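The failure mode described above comes from the gap between what the compiler was allowed to assume and what the runner's CPU actually supports. The sketch below reports the compile-time side of that gap. The function names are hypothetical, and using `avx512f` as a proxy for the whole v4 level is a simplifying assumption.

```rust
/// Instruction-set level the compiler was told it may assume.
/// This is fixed at compile time and says nothing about the CPU
/// the binary later runs on.
pub fn baked_tier() -> &'static str {
    if cfg!(all(target_arch = "x86_64", target_feature = "avx512f")) {
        "v4"
    } else if cfg!(all(target_arch = "x86_64", target_feature = "avx2")) {
        "v3"
    } else {
        "baseline"
    }
}

/// A binary that bakes in AVX-512 can hit an illegal instruction
/// on a CPU that lacks it; every other combination is safe.
pub fn would_fault(baked_needs_avx512: bool, cpu_has_avx512: bool) -> bool {
    baked_needs_avx512 && !cpu_has_avx512
}
```

Printing `baked_tier()` at the start of a test run is one cheap way to see which config a CI job actually built with.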

Out of scope

No code changes in this PR. Phase 1 (the actual config flip + AVX2 wrappers + dispatch arm) is a separate follow-up PR.

🤖 Generated with Claude Code

…ration plan

New design doc at .claude/knowledge/simd-dispatch-architecture.md covering four artifacts in one place:

1. Dispatch architecture: three explicit cargo configs (v3 default, v4 explicit, native explicit) + optional runtime LazyLock path. Each is a conscious cargo invocation; no silent fallback.
2. Parity matrix: typed lane primitives × backend (avx512 / avx2 / neon / nightly / scalar). Surfaces the AVX2 gap: only F32x16, F64x8, U8x32, F16Scaler exist; the other 14 cross-arch lanes are missing → CI SIGILL on the v3 baseline.
3. Technical debt matrix: 10 issues (TD-SIMD-1..10) ranked P0..P3.
   - P0: default config bakes AVX-512 → CI SIGILL on AVX2-only runners.
   - P0: AVX2 missing 10 two-half wrappers (U64x8, I32x16, …).
   - P1: simd.rs has no `feature = "nightly-simd"` arm → simd_nightly/* unreachable despite being the most-complete backend.
   - P1: NEON parity gap symmetric to AVX2.
   - …through P3 CI-matrix entries.
4. Integration plan: six sequenced phases, each a single-PR worker:
   - Phase 1: unblock CI (config flip + AVX2 wrappers + dispatch arms)
   - Phase 2: unblock nightly-simd polyfill
   - Phase 3: NEON parity
   - Phase 4: scalar→file + macro + F16 honesty
   - Phase 5: runtime dispatch (opt-in)
   - Phase 6: AVX-512 explicit CI job

Pinned by the discussion thread on PR #170 (CI run 26151746204): the 38 simd_avx2/simd_amx/simd_ops/simd_soa test failures at uniform 19 s timeouts are the v3-runner / v4-baked-binary SIGILL pattern. This doc captures the architecture target, the gaps, and the path.
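The "two-half wrappers" mentioned in the commit message can be illustrated with a portable sketch: a 16-lane type stores two 8-lane halves and forwards every operation to both. Here plain arrays stand in for the 256-bit registers an AVX2 backend would presumably hold; `F32x8Half`, `add_lanes`, `from_array`, and `to_array` are hypothetical names, not the crate's API.

```rust
use std::ops::Add;

/// Stand-in for one 256-bit register holding 8 f32 lanes (hypothetical).
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct F32x8Half([f32; 8]);

impl F32x8Half {
    /// Lane-wise addition of two halves.
    fn add_lanes(self, rhs: Self) -> Self {
        let mut out = [0.0f32; 8];
        for i in 0..8 {
            out[i] = self.0[i] + rhs.0[i];
        }
        Self(out)
    }
}

/// 16-lane wrapper built from two 8-lane halves.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct F32x16 {
    lo: F32x8Half,
    hi: F32x8Half,
}

impl F32x16 {
    pub const LANES: usize = 16;

    /// Split a 16-element array into the low and high halves.
    pub fn from_array(a: [f32; 16]) -> Self {
        let mut lo = [0.0f32; 8];
        let mut hi = [0.0f32; 8];
        lo.copy_from_slice(&a[..8]);
        hi.copy_from_slice(&a[8..]);
        Self { lo: F32x8Half(lo), hi: F32x8Half(hi) }
    }

    /// Join the two halves back into one 16-element array.
    pub fn to_array(self) -> [f32; 16] {
        let mut out = [0.0f32; 16];
        out[..8].copy_from_slice(&self.lo.0);
        out[8..].copy_from_slice(&self.hi.0);
        out
    }
}

impl Add for F32x16 {
    type Output = Self;

    /// Each operation is forwarded to both halves independently.
    fn add(self, rhs: Self) -> Self {
        Self {
            lo: self.lo.add_lanes(rhs.lo),
            hi: self.hi.add_lanes(rhs.hi),
        }
    }
}
```

Because the public surface (`LANES`, `from_array`, `to_array`, `+`) is the same as a native 16-lane type would expose, callers do not need to know which backend supplied it.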

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 2c1942ba00


chatgpt-codex-connector · 2026-05-20T10:58:06Z

+#[cfg(all(feature = "nightly-simd", any(target_arch = "x86_64", target_arch = "aarch64")))]
+pub use crate::simd_nightly::{F32x16, F64x8, U8x32, U8x64, U16x32, U32x16, U64x8, I8x32, I8x64, I16x16, I16x32, I32x16, I64x8, F32Mask16, F64Mask8, BF16x16, BF16x8};


Keep nightly-simd dispatch valid on non-x86/aarch64

The proposed nightly-simd arm is constrained to x86_64/aarch64, but the same section describes this mode as usable on “or any” target. With the shown scalar fallback condition (not(..., feature = "nightly-simd")), enabling nightly-simd on targets like wasm32 or riscv would leave no matching re-export path in simd.rs, causing unresolved SIMD type exports. Please either broaden the nightly arm to all targets that should be supported or adjust the scalar fallback predicate so one backend always matches.

Fixed in 9016621. Dropped the target_arch = x86_64 | aarch64 constraint from the nightly arm — core::simd is portable, so the arm is now #[cfg(feature = "nightly-simd")] unconditional and catches wasm32 / riscv too. Also tightened the scalar fallback predicate to the exact negation of arms 1-4 (not(any(nightly-simd, all(x86_64, avx2), aarch64))) so x86_64-without-AVX2 also lands on scalar. Result: exactly one arm matches on every (target, feature) pair.
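The "exactly one arm matches" claim can be checked mechanically by modelling the `cfg` predicates as plain booleans and enumerating the inputs. In the sketch below, the nightly arm and the scalar fallback follow the predicates quoted in the reply above; the predicates for the AVX-512, AVX2, and NEON arms are assumptions, as are the type names and the rule that `avx512f` implies `avx2`.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Arch {
    X86_64,
    Aarch64,
    Wasm32,
    Riscv64,
}

/// One (target, feature) pair, reduced to the inputs the arms look at.
#[derive(Debug, Clone, Copy)]
pub struct BuildCfg {
    pub arch: Arch,
    pub nightly_simd: bool,
    pub avx2: bool,
    pub avx512f: bool,
}

/// Names of every dispatch arm whose predicate holds for `c`.
pub fn matching_arms(c: &BuildCfg) -> Vec<&'static str> {
    let x86 = c.arch == Arch::X86_64;
    let arm64 = c.arch == Arch::Aarch64;
    let arms = [
        // Quoted in the thread: cfg(feature = "nightly-simd"), unconditional.
        ("simd_nightly", c.nightly_simd),
        // Assumed predicates for the three architecture-specific arms.
        ("simd_avx512", !c.nightly_simd && x86 && c.avx512f),
        ("simd_avx2", !c.nightly_simd && x86 && c.avx2 && !c.avx512f),
        ("simd_neon", !c.nightly_simd && arm64),
        // Quoted in the thread: not(any(nightly-simd, all(x86_64, avx2), aarch64)).
        ("scalar", !(c.nightly_simd || (x86 && c.avx2) || arm64)),
    ];
    arms.iter().filter(|(_, hit)| *hit).map(|(name, _)| *name).collect()
}

/// Every combination of inputs, skipping the inconsistent case
/// "avx512f without avx2" (assumed not to occur in practice).
pub fn all_configs() -> Vec<BuildCfg> {
    let mut out = Vec::new();
    for arch in [Arch::X86_64, Arch::Aarch64, Arch::Wasm32, Arch::Riscv64] {
        for nightly_simd in [false, true] {
            for avx2 in [false, true] {
                for avx512f in [false, true] {
                    if avx512f && !avx2 {
                        continue;
                    }
                    out.push(BuildCfg { arch, nightly_simd, avx2, avx512f });
                }
            }
        }
    }
    out
}
```

Asserting `matching_arms(c).len() == 1` for every entry of `all_configs()` covers the two cases raised in review: wasm32 with `nightly-simd` lands on `simd_nightly`, and x86_64 without AVX2 lands on `scalar`.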

chatgpt-codex-connector · 2026-05-20T10:58:06Z

+| `I16x32` | ✅ `__m512i` | ❌ | ❌ | 🔵 | ✅ |
+| `I32x16` | ✅ `__m512i` | ❌ | ❌ | 🔵 | ✅ |
+| `I64x8` | ✅ `__m512i` | ❌ | ❌ | 🔵 | ✅ |
+| `BF16x8` | ✅ `__m128bh` | ❌ | ❌ | ❌ | ✅ |


Correct BF16x8 parity status for simd_nightly

The parity matrix marks BF16x8 as missing in simd_nightly, but src/simd_nightly/mod.rs already re-exports BF16x8 (alongside BF16x16). This mismatch can mis-prioritize follow-up work by treating an implemented lane as a gap, so the matrix row should be updated to reflect current backend coverage.

Fixed in 9016621. BF16x8 row column 4 (simd_nightly) flipped from ❌ to 🔵 — confirmed by grep BF16x8 src/simd_nightly/mod.rs showing pub use bf16_types::{BF16x16, BF16x8};. Matrix now matches actual backend coverage.

- Drop the `target_arch = x86_64 | aarch64` constraint from the nightly-simd arm. `core::simd` is portable, so this arm must match on wasm32 / riscv too; otherwise enabling `nightly-simd` on those targets leaves no backend (every other arm excludes it).
- Tighten the scalar fallback predicate to the exact negation of arms 1-4 so x86_64-without-AVX2 also routes to scalar.
- Correct `BF16x8` parity row: `simd_nightly` already re-exports `BF16x8` (alongside `BF16x16`); the matrix was stale.

chatgpt-codex-connector Bot reviewed May 20, 2026

AdaWorldAPI mentioned this pull request May 20, 2026

feat(simd): Phase 1 — explicit cargo configs + AVX2 dispatch hardening #172

Merged

4 tasks

AdaWorldAPI merged commit 8d3c1d7 into master May 20, 2026
15 checks passed

AdaWorldAPI merged 2 commits into master from claude/simd-dispatch-architecture-doc
