feat(simd): I8/I16 SIMD vectors + slice-level int ops (sprint A3) by AdaWorldAPI · Pull Request #124 · AdaWorldAPI/ndarray

AdaWorldAPI · 2026-04-30T09:58:30Z

Summary

Sprint A3 of burn-ndarray parity sprint v1. Closes items (4)+(5) — I8/I16 SIMD vectors + slice-level int ops.

Rebased onto master post-sprint (includes A1, A4, A5, A6, A7, A10, A12 merges + clippy fix). Conflict in simd_neon.rs resolved: kept both A7's float NEON block (F32x16/F64x8) AND A3's int NEON block (I8x16/I16x8).

What ships

I8x64 / I8x32 / I8x16 SIMD types (AVX-512 native, AVX2, NEON)
I16x32 / I16x16 / I16x8 SIMD types
src/simd_int_ops.rs — slice-level add_i8, sub_i8, dot_i8, min_i8, max_i8, etc.
Re-exports from src/simd.rs

Files (+1399/-3)

src/simd_avx512.rs — I8x64, I8x32, I16x32, I16x16 AVX-512 native impls
src/simd_avx2.rs — AVX2 polyfills
src/simd_neon.rs — NEON I8x16, I16x8 + paired polyfills for I8x32/I8x64/I16x16/I16x32
src/simd.rs — re-exports
src/simd_int_ops.rs — slice-level int operations
src/lib.rs — module declaration

Verification

cargo build: clean (1 pre-existing warning)
cargo test --lib: 1770 passed, 0 failed (70 tests added across entire sprint)
Rebased cleanly onto post-sprint master

https://claude.ai/code/session_01NYGrxVopyszZYgLBxe4hgj

Generated by Claude Code

… + 5) Adds the signed-byte / signed-half SIMD parity surface for the burn↔ndarray sprint: Item 4 — types • simd_avx512.rs: native I8x64 (__m512i) + I16x32 (__m512i) via AVX-512BW intrinsics (add/sub/min/max/cmp_gt/saturating/abs/neg). Plus AVX2-native I8x32 / I16x16 (__m256i) so the 256-bit signed types live in the same module as F32x8 / F64x4. • simd_avx2.rs: scalar-array polyfills for I8x64 / I16x32 (the AVX2 tier doesn't have a 64-byte signed type) and re-exports of the AVX2-native I8x32 / I16x16 from simd_avx512.rs for unified imports. • simd_neon.rs: NEON-native I8x16 (int8x16_t) + I16x8 (int16x8_t) via vaddq_s8 / vminq_s8 / vcgtq_s8 + paired/quadrupled scalar polyfills for I8x32 / I8x64 / I16x16 / I16x32. • simd.rs: scalar fallbacks for non-x86_64/aarch64 targets and re-exports for every active tier so consumers write use ndarray::simd::{I8x32, I8x64, I16x16, I16x32}; Item 5 — slice ops (new file simd_int_ops.rs) add_i8 / add_i16 / sub_i8 / sub_i16 (mutate-in-place, wrapping) dot_i8 -> i32 (overflow-safe accumulator) dot_i16 -> i64 (overflow-safe accumulator) min_i8 / max_i8 / min_i16 / max_i16 Each chunks via the natural SIMD width of the active tier (64-byte AVX-512BW when available, 32-byte AVX2, 16-byte NEON) and finishes with a scalar tail. Tests (+21 lib tests vs master baseline 1741 -> 1762): • simd_avx512::int_simd_tests: 9 tests (gated on target_feature=avx512f) pair-sum 64, signed boundaries, cmp_gt mask, saturating arithmetic. • simd_int_ops::tests: 11 tests misaligned tail lengths (63/65/127/129), 127i8 dot 127i8 x 64 overflow safety, signed boundary min/max, empty-slice identity. • simd_avx2 polyfill build verified with RUSTFLAGS="-C target-feature=-avx512f". Build host (this commit): AVX2 path (no avx512f at compile time -> uses the polyfill in simd_avx2.rs and simd.rs scalar mod for I8x64/I16x32).

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 2b6c043fbc

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-04-30T10:01:49Z

+    assert_eq!(a.len(), b.len(), "dot_i8: length mismatch");
+    let mut acc: i32 = 0;
+    for i in 0..a.len() {
+        acc = acc.wrapping_add((a[i] as i32) * (b[i] as i32));


Prevent dot_i8 accumulator from wrapping overflow

dot_i8 is documented as overflow-safe, but the accumulation uses wrapping_add, so large inputs silently produce incorrect results instead of a true dot product. A concrete case is a=b=[-128; 131072]: each product is 16384, and the mathematical sum exceeds i32::MAX, so this code wraps and returns a corrupted value. This affects any workload with sufficiently long slices or extreme values.

Useful? React with 👍 / 👎.

chatgpt-codex-connector · 2026-04-30T10:01:49Z

+        use crate::simd::I8x64;
+        const L: usize = 64;
+        let chunks = n / L;


Use AVX2-native i8 lanes in x86_64 slice ops

The x86_64 fast path hard-codes I8x64, but on non-AVX512 builds crate::simd::I8x64 resolves to the array-backed scalar polyfill in simd_avx2, not a vectorized AVX2 type. That means add_i8 (and similarly the other x86_64 i8/i16 helpers) executes scalar loops on mainstream AVX2 machines despite I8x32/I16x16 AVX2-native types being available, causing a significant regression from the advertised SIMD behavior.

Useful? React with 👍 / 👎.

AdaWorldAPI merged commit a5c8943 into master Apr 30, 2026
4 of 10 checks passed

chatgpt-codex-connector Bot reviewed Apr 30, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(simd): I8/I16 SIMD vectors + slice-level int ops (sprint A3)#124

feat(simd): I8/I16 SIMD vectors + slice-level int ops (sprint A3)#124
AdaWorldAPI merged 1 commit into
masterfrom
claude/burn-A3-int-simd

AdaWorldAPI commented Apr 30, 2026

Uh oh!

Uh oh!

chatgpt-codex-connector Bot left a comment

Uh oh!

chatgpt-codex-connector Bot Apr 30, 2026

Uh oh!

chatgpt-codex-connector Bot Apr 30, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

AdaWorldAPI commented Apr 30, 2026

Summary

What ships

Files (+1399/-3)

Verification

Uh oh!

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot Apr 30, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector Bot Apr 30, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants