bench(kernel): KernelMatmulBench — scalar vs Panama (M5 evidence) by michalharakal · Pull Request #558 · SKaiNET-developers/SKaiNET

michalharakal · 2026-04-28T13:38:37Z

Summary

Adds KernelMatmulBench to :skainet-backends:benchmarks:jvm-cpu-jmh — a direct Fp32MatmulKernel.matmul JMH harness with provider ∈ {scalar, panama} and size ∈ {256, 512, 1024}.
Validates the M5 milestone target (Panama ≥1.5× scalar through the kernel SPI) without entanglement from ctx.ops.matmul routing — that routing change is the next follow-up.
Adds skainet-backend-api as a direct dep on the bench module so JMH sources can see the SPI types directly.
Documents the new bench in docs/.../perf/jvm-cpu.adoc.

Local run — JDK 21.0.10, M-series macOS

size	scalar (ms/op)	panama (ms/op)	speedup
256	9.454 ± 0.364	1.356 ± 0.041	6.97×
512	79.679 ± 0.754	13.620 ± 0.109	5.85×
1024	862.754 ± 40.256	118.242 ± 0.507	7.30×
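
The speedup column is the scalar mean divided by the Panama mean. As a rough sketch, the arithmetic and the ≥1.5× milestone check can be reproduced as follows; the class and method names here are illustrative and are not part of the bench module, and the error bars from the table are ignored.

```java
import java.util.Locale;

/** Recomputes the speedup column from the mean ms/op values reported above. */
final class SpeedupCheck {
    // M5 milestone target quoted in the PR: Panama at least 1.5x faster than scalar.
    static final double TARGET = 1.5;

    /** Speedup = scalar time / panama time (both in ms/op, lower is better). */
    static double speedup(double scalarMs, double panamaMs) {
        return scalarMs / panamaMs;
    }

    static boolean meetsTarget(double scalarMs, double panamaMs) {
        return speedup(scalarMs, panamaMs) >= TARGET;
    }

    public static void main(String[] args) {
        int[] sizes = {256, 512, 1024};
        double[] scalar = {9.454, 79.679, 862.754};   // mean ms/op from the table
        double[] panama = {1.356, 13.620, 118.242};   // mean ms/op from the table
        for (int i = 0; i < sizes.length; i++) {
            System.out.printf(Locale.ROOT, "%d: %.2fx, target met: %b%n",
                    sizes[i], speedup(scalar[i], panama[i]),
                    meetsTarget(scalar[i], panama[i]));
        }
    }
}
```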

JMH config: --enable-preview --add-modules jdk.incubator.vector, 3 warmup × 10s + 5 measurement × 10s, 1 fork. Same input seeding as the existing MatmulBench so cross-bench comparison is meaningful.

Reference: MatmulBench (full op-level path, BLAS off, vector on) on the same machine clocks 9.74 ms @ 512², about 1.4× faster than this kernel's 13.62 ms @ 512². The gap comes from the cache-blocked tiled implementation in JvmVectorKernels.matmulFloatBlocked that the production routing currently calls; the SPI kernel uses a simpler FMA + B^T pack. Porting the tiling into the SPI kernel to close that gap is a fair follow-up if the production bench numbers regress after the routing change.
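
To make the difference between the two strategies concrete, here is a scalar-only sketch in plain Java. It is not the code in JvmVectorKernels.matmulFloatBlocked or PanamaVectorMatmulKernel: it uses no Vector API, assumes square row-major matrices, and the class name, method names, and tile size are all assumptions chosen for illustration.

```java
import java.util.Random;

final class MatmulVariants {
    /** B^T pack + FMA: transpose B once so every dot product reads two contiguous rows. */
    static float[] packedFma(float[] a, float[] b, int n) {
        float[] bt = new float[n * n];
        for (int k = 0; k < n; k++)
            for (int j = 0; j < n; j++)
                bt[j * n + k] = b[k * n + j];
        float[] c = new float[n * n];
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++) {
                float acc = 0f;
                for (int k = 0; k < n; k++)
                    acc = Math.fma(a[i * n + k], bt[j * n + k], acc);
                c[i * n + j] = acc;
            }
        return c;
    }

    /** Cache-blocked: walk tile x tile sub-blocks so the working set stays cache-resident. */
    static float[] blocked(float[] a, float[] b, int n, int tile) {
        float[] c = new float[n * n];
        for (int ii = 0; ii < n; ii += tile)
            for (int kk = 0; kk < n; kk += tile)
                for (int jj = 0; jj < n; jj += tile) {
                    int iEnd = Math.min(ii + tile, n);
                    int kEnd = Math.min(kk + tile, n);
                    int jEnd = Math.min(jj + tile, n);
                    for (int i = ii; i < iEnd; i++)
                        for (int k = kk; k < kEnd; k++) {
                            float aik = a[i * n + k];
                            for (int j = jj; j < jEnd; j++)
                                c[i * n + j] += aik * b[k * n + j];
                        }
                }
        return c;
    }

    /** Seeded input so repeated runs see identical matrices. */
    static float[] random(int n, long seed) {
        Random r = new Random(seed);
        float[] m = new float[n * n];
        for (int i = 0; i < m.length; i++) m[i] = r.nextFloat();
        return m;
    }

    static float maxAbsDiff(float[] x, float[] y) {
        float d = 0f;
        for (int i = 0; i < x.length; i++) d = Math.max(d, Math.abs(x[i] - y[i]));
        return d;
    }

    public static void main(String[] args) {
        int n = 64;
        float[] a = random(n, 1L), b = random(n, 2L);
        System.out.println("max |diff| = " + maxAbsDiff(packedFma(a, b, n), blocked(a, b, n, 16)));
    }
}
```

Both variants compute the same product up to floating-point rounding; they differ only in memory access pattern, which is where the blocked version would be expected to gain on larger sizes.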

Why direct kernel benching

MatmulBench exercises ctx.ops.matmul, which today still calls JvmVectorKernels directly (not the SPI). Until that routing change lands, only this new bench reflects scalar-vs-Panama through the kernel SPI in isolation. Once routing flips, the existing MatmulBench will exercise the same provider end-to-end and we can decide whether to keep both benches or fold one in.
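
The idea of benching the kernel directly, with a provider parameter selecting the implementation, can be sketched without JMH as below. Everything here is hypothetical: the Kernel interface stands in for the SPI type (the real Fp32MatmulKernel.matmul signature may differ), the "panama" branch is a plain-Java loop reordering rather than a Vector API kernel, and the timing loop has none of JMH's warmup or fork isolation.

```java
import java.util.Arrays;
import java.util.Random;

final class DirectKernelBench {
    /** Hypothetical stand-in for the kernel SPI; not the real interface. */
    interface Kernel {
        void matmul(float[] a, float[] b, float[] c, int n);
    }

    /** Picks a kernel by provider name, mirroring the bench's provider parameter. */
    static Kernel forProvider(String provider) {
        switch (provider) {
            case "scalar":
                return (a, b, c, n) -> {
                    for (int i = 0; i < n; i++)
                        for (int j = 0; j < n; j++) {
                            float acc = 0f;
                            for (int k = 0; k < n; k++) acc += a[i * n + k] * b[k * n + j];
                            c[i * n + j] = acc;
                        }
                };
            case "panama":
                // Stand-in only: i-k-j loop order in plain Java, NOT the Vector API kernel.
                return (a, b, c, n) -> {
                    Arrays.fill(c, 0f);
                    for (int i = 0; i < n; i++)
                        for (int k = 0; k < n; k++) {
                            float aik = a[i * n + k];
                            for (int j = 0; j < n; j++) c[i * n + j] += aik * b[k * n + j];
                        }
                };
            default:
                throw new IllegalArgumentException("unknown provider: " + provider);
        }
    }

    /** Fixed seeds give every provider the same inputs, so results are comparable. */
    static float[] seeded(int n, long seed) {
        Random r = new Random(seed);
        float[] m = new float[n * n];
        for (int i = 0; i < m.length; i++) m[i] = r.nextFloat();
        return m;
    }

    /** Crude average wall-clock ms per call, calling the kernel directly. */
    static double avgMillis(Kernel kernel, int n, int iters) {
        float[] a = seeded(n, 1L), b = seeded(n, 2L), c = new float[n * n];
        long t0 = System.nanoTime();
        for (int it = 0; it < iters; it++) kernel.matmul(a, b, c, n);
        return (System.nanoTime() - t0) / 1e6 / iters;
    }

    public static void main(String[] args) {
        for (String p : new String[] {"scalar", "panama"})
            for (int n : new int[] {64, 128})
                System.out.printf("%s n=%d: %.3f ms/op%n", p, n, avgMillis(forProvider(p), n, 5));
    }
}
```

Because the kernel is invoked through the interface with nothing else in the call path, any difference between providers is attributable to the kernel itself rather than to op-level routing.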

Test plan

./gradlew :skainet-backends:benchmarks:jvm-cpu-jmh:jmhCompileGeneratedClasses — compiles cleanly.
./gradlew :skainet-backends:benchmarks:jvm-cpu-jmh:jmh -Pjmh.include=KernelMatmulBench — produces the numbers above.

Follow-ups (still in M5 hopper)

Route DefaultCpuOpsJvm.matmul through KernelRegistry.
ServiceLoader auto-discovery for kernel providers.
Cache-blocked variant of PanamaVectorMatmulKernel if/when production routing exposes a regression vs the current matmulFloatBlocked path.
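
For the ServiceLoader follow-up, a minimal sketch of what auto-discovery could look like with java.util.ServiceLoader is shown below. The KernelProvider interface, its name() and priority() methods, and the fallback class are hypothetical; the actual SPI in skainet-backend-api and the KernelRegistry design may look quite different.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.ServiceLoader;

final class KernelDiscovery {
    /** Hypothetical provider SPI, for illustration only. */
    public interface KernelProvider {
        String name();
        int priority();   // higher wins
    }

    /** Always-available fallback used when no provider is registered. */
    static final class ScalarFallback implements KernelProvider {
        @Override public String name() { return "scalar"; }
        @Override public int priority() { return 0; }
    }

    /**
     * Collects providers registered via META-INF/services (or module declarations),
     * adds the scalar fallback if none were found, and sorts by descending priority.
     */
    static List<KernelProvider> discover() {
        List<KernelProvider> found = new ArrayList<>();
        for (KernelProvider p : ServiceLoader.load(KernelProvider.class)) {
            found.add(p);
        }
        if (found.isEmpty()) {
            found.add(new ScalarFallback());
        }
        found.sort((x, y) -> Integer.compare(y.priority(), x.priority()));
        return found;
    }

    public static void main(String[] args) {
        // With no registration file on the classpath, only the fallback is listed.
        for (KernelProvider p : discover()) {
            System.out.println(p.name() + " (priority " + p.priority() + ")");
        }
    }
}
```

A provider would be registered by listing its implementation class in a META-INF/services file named after the interface's binary name; a Panama provider could then report a higher priority than the scalar one.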

🤖 Generated with Claude Code

Direct Fp32MatmulKernel.matmul JMH harness, sizes 256/512/1024, provider param toggles ScalarMatmulKernel vs PanamaVectorMatmulKernel. Used to validate the M5 milestone target (Panama ≥1.5× scalar) without entanglement from the rest of the op pipeline.

Local run on JDK 21.0.10 (M-series macOS) clears the target comfortably:

size	scalar	panama	speedup
256	9.454ms	1.356ms	6.97x
512	79.68ms	13.62ms	5.85x
1024	862.8ms	118.2ms	7.30x

Adds skainet-backend-api as a direct dep on the bench module so the JMH source set can see the kernel SPI types, and documents the new bench in docs/.../perf/jvm-cpu.adoc.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

github-actions · 2026-04-28T13:40:57Z

📖 Documentation Preview

The documentation has been built successfully for this PR.

Generated Files:

Operator documentation: docs/modules/operators/_generated_/
JSON schema output: operators.json

Artifacts:

Download the documentation-preview-558 artifact to view the complete documentation locally.

This comment will be updated automatically when the PR is updated.

michalharakal marked this pull request as ready for review April 28, 2026 13:48

michalharakal merged commit 1487c3a into develop Apr 28, 2026
10 checks passed

michalharakal deleted the feature/jvm-panama-kernel-jmh branch April 28, 2026 13:48
