feat(kernel): Panama Vector FP32 matmul provider (M5) #557

Merged
michalharakal merged 1 commit into develop from
feature/jvm-panama-fp32-matmul-kernel
Apr 28, 2026

@michalharakal
Contributor

Summary

  • Adds PanamaVectorMatmulKernel (jdk.incubator.vector: FloatVector + fma + reduceLanes), implementing the Fp32MatmulKernel SPI from #554 (feat(kernel): add KernelProvider SPI for matmul dispatch (Scalar baseline)).
  • Adds PanamaVectorKernelProvider (name = "panama-vector", priority = 50). Sits above ScalarKernelProvider (0) and below a future native provider (100).
  • isAvailable() requires JDK 21+, the jdk.incubator.vector module on the path, and respects the existing skainet.cpu.vector.enabled kill switch (-D or SKAINET_CPU_VECTOR_ENABLED).
  • Closes the "Panama-first" half of milestone M5 — CPU backend dispatch in the JVM inference perf roadmap.
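
The priority and availability rules above can be sketched as follows. This is a minimal illustration, not the code in this PR: `SketchKernelProvider`, `SketchFixedProvider`, `SketchAvailability`, and `SketchRegistry` are hypothetical names, and the precedence of the `-D` property over the environment variable is an assumption.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

// Hypothetical stand-in for the provider SPI; the real interface may differ.
interface SketchKernelProvider {
    String name();
    int priority();
    boolean isAvailable();
}

// Test-friendly provider with fixed answers (hypothetical helper).
final class SketchFixedProvider implements SketchKernelProvider {
    private final String name;
    private final int priority;
    private final boolean available;

    SketchFixedProvider(String name, int priority, boolean available) {
        this.name = name;
        this.priority = priority;
        this.available = available;
    }

    public String name() { return name; }
    public int priority() { return priority; }
    public boolean isAvailable() { return available; }
}

final class SketchAvailability {
    // Kill switch: enabled unless explicitly set to "false".
    // Assumption: the system property takes precedence over the env var.
    static boolean vectorEnabled() {
        String v = System.getProperty("skainet.cpu.vector.enabled");
        if (v == null) {
            v = System.getenv("SKAINET_CPU_VECTOR_ENABLED");
        }
        return v == null || !v.equalsIgnoreCase("false");
    }

    // The three conditions listed for isAvailable(): JDK 21+, the incubator
    // module resolved in the boot layer, and the kill switch not engaged.
    static boolean panamaAvailable() {
        return Runtime.version().feature() >= 21
            && ModuleLayer.boot().findModule("jdk.incubator.vector").isPresent()
            && vectorEnabled();
    }
}

final class SketchRegistry {
    private final List<SketchKernelProvider> providers = new ArrayList<>();

    void register(SketchKernelProvider p) {
        providers.add(p);
    }

    // Highest-priority provider among those that report themselves available.
    Optional<SketchKernelProvider> best() {
        return providers.stream()
            .filter(SketchKernelProvider::isAvailable)
            .max(Comparator.comparingInt(SketchKernelProvider::priority));
    }
}
```

With a scalar provider at priority 0 and a Panama provider at priority 50, `best()` returns the Panama provider when it is available and falls back to the scalar one otherwise.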

Why this shape

  • The kernel packs B^T into a contiguous (n, k) buffer so the inner reduction streams sequentially over k for both operands. It uses one pack, one FMA accumulator per output cell, and a scalar tail loop for the trailing elements of k that don't fill a whole vector.
  • Bit-for-bit numerical equivalence with ScalarMatmulKernel is not guaranteed (FMA + reordered accumulation), but parity within a tolerance of 1e-5 * k is asserted across contiguous inputs, strided sub-blocks, non-aligned k (tail loop), and randomized larger sizes. This matches the per-milestone golden-output regression bar in the roadmap.
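
The access pattern and the parity check can be illustrated with a plain-Java sketch. It deliberately avoids `jdk.incubator.vector` so it runs without `--add-modules`: a small `float[]` stands in for the vector accumulator, and the lane count of 8 is an assumption. `PackedMatmulSketch` and its methods are hypothetical names, not the kernel from this PR.

```java
final class PackedMatmulSketch {
    // Assumed lane count (e.g. 256-bit vectors of float); the real kernel
    // would take this from the vector species at runtime.
    static final int LANES = 8;

    // Reference: C[m x n] = A[m x k] * B[k x n], row-major, plain accumulation.
    static float[] scalar(float[] a, float[] b, int m, int k, int n) {
        float[] c = new float[m * n];
        for (int i = 0; i < m; i++) {
            for (int j = 0; j < n; j++) {
                float acc = 0f;
                for (int p = 0; p < k; p++) {
                    acc += a[i * k + p] * b[p * n + j];
                }
                c[i * n + j] = acc;
            }
        }
        return c;
    }

    static float[] packed(float[] a, float[] b, int m, int k, int n) {
        // Pack B^T into a contiguous (n, k) buffer so that row j of bt holds
        // column j of B and the reduction over k is sequential for both operands.
        float[] bt = new float[n * k];
        for (int p = 0; p < k; p++) {
            for (int j = 0; j < n; j++) {
                bt[j * k + p] = b[p * n + j];
            }
        }
        float[] c = new float[m * n];
        int bound = k - (k % LANES); // largest multiple of LANES not exceeding k
        for (int i = 0; i < m; i++) {
            for (int j = 0; j < n; j++) {
                float[] lanes = new float[LANES]; // emulated vector accumulator
                int p = 0;
                for (; p < bound; p += LANES) {
                    for (int l = 0; l < LANES; l++) {
                        lanes[l] = Math.fma(a[i * k + p + l], bt[j * k + p + l], lanes[l]);
                    }
                }
                float acc = 0f;
                for (int l = 0; l < LANES; l++) {
                    acc += lanes[l]; // horizontal add, the role reduceLanes plays
                }
                for (; p < k; p++) {
                    acc = Math.fma(a[i * k + p], bt[j * k + p], acc); // scalar tail
                }
                c[i * n + j] = acc;
            }
        }
        return c;
    }

    static float maxAbsDiff(float[] x, float[] y) {
        float d = 0f;
        for (int i = 0; i < x.length; i++) {
            d = Math.max(d, Math.abs(x[i] - y[i]));
        }
        return d;
    }
}
```

Because the lanes accumulate partial sums in a different order than the scalar loop, and FMA rounds once instead of twice, the two results generally differ in the last bits. A parity test would therefore compare `maxAbsDiff(scalar(...), packed(...))` against `1e-5f * k` rather than require equality.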

Out of scope (follow-ups)

  • Wire DefaultCpuOpsJvm.matmul through the kernel SPI. Today it still calls JvmVectorKernels.matmulFloat / matmulFloatBlocked directly; until that routing change lands, the existing MatmulBench won't exercise this provider end-to-end.
  • JMH evidence for the M5 ≥1.5× target. Best added together with the routing change so the bench numbers reflect the SPI path. A kernel-level microbench in :skainet-backends:benchmarks:jvm-cpu-jmh is the natural home.
  • ServiceLoader auto-discovery in KernelRegistry. The SPI doc explicitly defers this until a second concrete JVM provider exists — that condition is now met, so this is a clean small follow-up PR.
  • Cache-blocked variant of the kernel (8×8×128 tiling, like JvmVectorKernels.matmulFloatBlocked). Useful if the simple FMA path doesn't clear the ≥4× target on 512² that docs/.../perf/jvm-cpu.adoc mentions.
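
For orientation, a cache-blocked loop nest of the kind the last item describes might look like the sketch below. It is not `JvmVectorKernels.matmulFloatBlocked`; `BlockedMatmulSketch` is a hypothetical name, and reading 8×8×128 as an 8×8 output tile with a k-block of 128 is an assumption.

```java
final class BlockedMatmulSketch {
    // Assumed tile sizes: TM x TN output tile, TK-long slice of the reduction.
    static final int TM = 8;
    static final int TN = 8;
    static final int TK = 128;

    // Untiled reference: C[m x n] = A[m x k] * B[k x n], row-major.
    static float[] naive(float[] a, float[] b, int m, int k, int n) {
        float[] c = new float[m * n];
        for (int i = 0; i < m; i++) {
            for (int j = 0; j < n; j++) {
                float acc = 0f;
                for (int p = 0; p < k; p++) {
                    acc += a[i * k + p] * b[p * n + j];
                }
                c[i * n + j] = acc;
            }
        }
        return c;
    }

    static float[] blocked(float[] a, float[] b, int m, int k, int n) {
        float[] c = new float[m * n]; // starts zeroed; tiles accumulate into it
        for (int i0 = 0; i0 < m; i0 += TM) {
            int iMax = Math.min(i0 + TM, m);
            for (int j0 = 0; j0 < n; j0 += TN) {
                int jMax = Math.min(j0 + TN, n);
                for (int p0 = 0; p0 < k; p0 += TK) {
                    int pMax = Math.min(p0 + TK, k);
                    // Work on one TM x TN output tile with one TK slice of k,
                    // so the touched parts of A and B stay small.
                    for (int i = i0; i < iMax; i++) {
                        for (int j = j0; j < jMax; j++) {
                            float acc = c[i * n + j];
                            for (int p = p0; p < pMax; p++) {
                                acc += a[i * k + p] * b[p * n + j];
                            }
                            c[i * n + j] = acc;
                        }
                    }
                }
            }
        }
        return c;
    }
}
```

The `Math.min` bounds handle dimensions that are not multiples of the tile sizes, so no separate edge-tile code is needed in this form.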

Test plan

  • ./gradlew :skainet-backends:skainet-backend-cpu:jvmTest --tests "sk.ainet.exec.kernel.*": 13 new tests pass (8 kernel parity + 5 provider/registry); existing KernelRegistryTest and ScalarMatmulKernelTest still pass.
  • Parity vs ScalarMatmulKernel for: 2×3×4 contiguous, 8×16×32 random, 31×17×23 random, non-aligned k=23 (tail loop), strided A sub-block.
  • Boundary semantics: m=0 or n=0 is a no-op, k=0 zeros the output block, negative dims throw IllegalArgumentException.
  • Provider: name/priority assertions, isAvailable() on test JDK, registry picks Panama over Scalar when both registered, kill-switch via -Dskainet.cpu.vector.enabled=false disables it.
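
The boundary semantics in the plan fall out naturally from a guarded loop nest. The sketch below shows one way to get them; `BoundarySketch.matmul` and its flat-array signature are hypothetical and simpler than the real SPI, which also handles offsets and strides.

```java
final class BoundarySketch {
    // Writes C[m x n] = A[m x k] * B[k x n] into c, row-major, contiguous.
    static void matmul(float[] a, float[] b, float[] c, int m, int k, int n) {
        if (m < 0 || k < 0 || n < 0) {
            throw new IllegalArgumentException(
                "dimensions must be non-negative: m=" + m + ", k=" + k + ", n=" + n);
        }
        if (m == 0 || n == 0) {
            return; // no output cells exist, so c is left untouched
        }
        for (int i = 0; i < m; i++) {
            for (int j = 0; j < n; j++) {
                float acc = 0f;
                for (int p = 0; p < k; p++) { // runs zero times when k == 0
                    acc += a[i * k + p] * b[p * n + j];
                }
                c[i * n + j] = acc; // an empty sum stores 0, zeroing the block
            }
        }
    }
}
```

The k=0 case needs no special branch: the reduction is empty, so every output cell is overwritten with 0 even if the buffer held stale values.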

🤖 Generated with Claude Code

Implements `PanamaVectorMatmulKernel` (jdk.incubator.vector,
FloatVector + fma + reduceLanes) and `PanamaVectorKernelProvider`
against the kernel SPI from PR #554. Picks up automatically over
`ScalarKernelProvider` once registered, and respects the existing
`-Dskainet.cpu.vector.enabled=false` kill switch. Closes the M5
"Panama-first" half of the JVM perf milestone plan.

Routing `DefaultCpuOpsJvm.matmul` through the SPI and adding a
ServiceLoader-based auto-registration are deferred to follow-ups so
this PR stays focused on the kernel itself.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@michalharakal michalharakal marked this pull request as ready for review April 28, 2026 12:22
@michalharakal michalharakal merged commit a5e5f93 into develop Apr 28, 2026
6 checks passed
@michalharakal michalharakal deleted the feature/jvm-panama-fp32-matmul-kernel branch April 28, 2026 12:23