feat(quantized): VNNI INT8 GEMM via VPDPBUSD (sprint W3-C)#128
Conversation
Closes parity item 12 — INT8 GEMM accelerated via AVX-512 VNNI's VPDPBUSD
instruction (4-element u8×i8→i32 dot product). Falls back to scalar
int8_gemm_i32 on hardware without VNNI.
What ships:
- src/hpc/vnni_gemm.rs (387 LOC): int8_gemm_vnni public API,
has_vnni() detection, _mm512_dpbusd_epi32 inner kernel, scalar fallback
- src/hpc/simd_caps.rs: avx512vnni: bool field added to SimdCaps,
is_x86_feature_detected!("avx512vnni") detection wired
- src/hpc/mod.rs: pub mod vnni_gemm declaration
Hardware coverage:
- AVX-512 VNNI: Ice Lake, Sapphire Rapids, Zen 4 (with AVX-512), Tiger Lake
- Fallback: any x86_64 / ARM / scalar
Tests: 11 passing (4×4, 16×16, 17×17 tail, 1×1 edge, mixed values).
Total lib tests: 1817+ pass.
Note: type-cast fix applied to _mm512_loadu_si512 / _mm512_storeu_si512
(*const i32 → *const __m512i, *mut i32 → *mut __m512i) per Rust 1.94
intrinsic signatures.
https://claude.ai/code/session_01NYGrxVopyszZYgLBxe4hgj
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: e24f7aa8b0
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| if caps.has_avx512_vnni() { | ||
| unsafe { int8_gemm_vnni_avx512(a, b, c, m, n, k) } |
There was a problem hiding this comment.
Gate VNNI kernel on all declared target features
int8_gemm_vnni_avx512 is compiled with #[target_feature(enable = "avx512f,avx512vnni,avx512bw")], but dispatch only checks has_avx512_vnni() (avx512f && avx512vnni). If this runs on a CPU/VM that exposes VNNI without avx512bw, the unsafe call enters a function compiled for an unsupported feature set, which is undefined behavior and can surface as SIGILL. The runtime gate should include avx512bw (or the function should stop declaring it if truly unnecessary).
Useful? React with 👍 / 👎.
Closes parity item (12) — INT8 GEMM accelerated via AVX-512 VNNI's
VPDPBUSDinstruction (4-element u8×i8→i32 dot product, 64 multiply-accumulates per zmm). Falls back to scalarint8_gemm_i32on hardware without VNNI.What ships:
src/hpc/vnni_gemm.rs(387 LOC):int8_gemm_vnnipublic API,_mm512_dpbusd_epi32inner kernel, scalar fallback pathsrc/hpc/simd_caps.rs:avx512vnni: boolfield added toSimdCaps,is_x86_feature_detected!("avx512vnni")detection wiredsrc/hpc/mod.rs:pub mod vnni_gemmdeclarationHardware coverage: Ice Lake, Sapphire Rapids, Zen 4 (with AVX-512), Tiger Lake. Fallback works on any x86_64 / ARM / scalar.
Tests: 11 passing (4×4, 16×16, 17×17 tail, 1×1 edge, mixed values).
Note: type-cast fix applied to
_mm512_loadu_si512/_mm512_storeu_si512(*const i32→*const __m512i,*mut i32→*mut __m512i) per Rust 1.94 intrinsic signatures.https://claude.ai/code/session_01NYGrxVopyszZYgLBxe4hgj