
bench(kernel): KernelMatmulBench — scalar vs Panama (M5 evidence)#558

Merged
michalharakal merged 1 commit into develop from feature/jvm-panama-kernel-jmh
Apr 28, 2026


@michalharakal
Contributor

Summary

  • Adds KernelMatmulBench to the :skainet-backends:benchmarks:jvm-cpu-jmh module, a JMH harness that calls Fp32MatmulKernel.matmul directly, parameterized by provider ∈ {scalar, panama} and size ∈ {256, 512, 1024}.
  • Validates the M5 milestone target (Panama ≥1.5× scalar through the kernel SPI) in isolation from ctx.ops.matmul routing; changing that routing is the next follow-up.
  • Adds skainet-backend-api as a direct dep on the bench module so JMH sources can see the SPI types directly.
  • Documents the new bench in docs/.../perf/jvm-cpu.adoc.
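For readers new to the kernel SPI, the parameter grid the harness sweeps can be pictured roughly as below. This is a hedged sketch, not the bench source: the `Fp32MatmulKernel` signature here (row-major `float[]` buffers plus `m, k, n`) is an assumption, and `ScalarKernel`, `KernelGridSketch`, `seededMatrix` and its seed are hypothetical names. The real harness uses JMH `@Param` fields and also registers `PanamaVectorMatmulKernel`, both omitted so the sketch runs on a plain JDK.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Random;

// Hypothetical stand-in for the kernel SPI; the real Fp32MatmulKernel signature may differ.
interface Fp32MatmulKernel {
    // C[m x n] = A[m x k] * B[k x n], all buffers row-major.
    void matmul(float[] a, float[] b, float[] c, int m, int k, int n);
}

// Naive triple loop, standing in for ScalarMatmulKernel.
final class ScalarKernel implements Fp32MatmulKernel {
    @Override
    public void matmul(float[] a, float[] b, float[] c, int m, int k, int n) {
        for (int i = 0; i < m; i++) {
            for (int j = 0; j < n; j++) {
                float acc = 0f;
                for (int p = 0; p < k; p++) {
                    acc += a[i * k + p] * b[p * n + j];
                }
                c[i * n + j] = acc;
            }
        }
    }
}

final class KernelGridSketch {
    // Mirrors the bench's size axis (square N x N inputs).
    static final int[] SIZES = {256, 512, 1024};

    // Fixed seed so every provider sees identical inputs (the seed value is hypothetical).
    static float[] seededMatrix(int rows, int cols, long seed) {
        Random rng = new Random(seed);
        float[] m = new float[rows * cols];
        for (int i = 0; i < m.length; i++) {
            m[i] = rng.nextFloat() * 2f - 1f; // uniform in [-1, 1)
        }
        return m;
    }

    // Provider axis; only "scalar" is registered because "panama" needs jdk.incubator.vector.
    static Map<String, Fp32MatmulKernel> providers() {
        Map<String, Fp32MatmulKernel> p = new LinkedHashMap<>();
        p.put("scalar", new ScalarKernel());
        return p;
    }
}
```

Calling the kernel interface directly, rather than going through an op-level `matmul`, is what keeps routing decisions out of the measurement.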

Local run (JDK 21.0.10, M-series macOS)

  size   scalar (ms/op)      panama (ms/op)     speedup
  256    9.454 ± 0.364       1.356 ± 0.041      6.97×
  512    79.679 ± 0.754      13.620 ± 0.109     5.85×
  1024   862.754 ± 40.256    118.242 ± 0.507    7.30×
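The speedup column is scalar ms/op divided by Panama ms/op at each size. The small check below recomputes it from the means in the table above and compares it with the M5 target. As an extra, hypothetical sanity check it also computes a pessimistic ratio that treats the reported ± values as bounds (scalar lower bound over Panama upper bound); the class and method names are illustrative only.

```java
final class SpeedupCheck {
    // Values copied from the local run above.
    static final int[] SIZES = {256, 512, 1024};
    static final double[] SCALAR_MS = {9.454, 79.679, 862.754};
    static final double[] SCALAR_ERR = {0.364, 0.754, 40.256};
    static final double[] PANAMA_MS = {1.356, 13.620, 118.242};
    static final double[] PANAMA_ERR = {0.041, 0.109, 0.507};
    static final double M5_TARGET = 1.5; // Panama must be at least 1.5x faster than scalar

    // Lower ms/op is better, so speedup = baseline time / candidate time.
    static double speedup(double scalarMs, double panamaMs) {
        return scalarMs / panamaMs;
    }

    // Pessimistic ratio: fastest plausible scalar vs slowest plausible Panama.
    static double worstCase(double scalarMs, double scalarErr, double panamaMs, double panamaErr) {
        return (scalarMs - scalarErr) / (panamaMs + panamaErr);
    }

    public static void main(String[] args) {
        for (int i = 0; i < SIZES.length; i++) {
            double s = speedup(SCALAR_MS[i], PANAMA_MS[i]);
            double w = worstCase(SCALAR_MS[i], SCALAR_ERR[i], PANAMA_MS[i], PANAMA_ERR[i]);
            // Prints speedups of 6.97x, 5.85x and 7.30x, matching the table.
            System.out.printf("%4d  %.2fx (pessimistic %.2fx)  %s%n",
                    SIZES[i], s, w, w >= M5_TARGET ? "meets M5" : "below M5");
        }
    }
}
```

Even the pessimistic ratio stays above 5.7× at every size, so the ≥1.5× target is not sensitive to run-to-run noise here.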

JMH config: --enable-preview --add-modules jdk.incubator.vector; 3 warmup iterations × 10 s and 5 measurement iterations × 10 s; 1 fork. Inputs are seeded the same way as in the existing MatmulBench, so numbers from the two benches can be compared directly.

Reference: on the same machine, MatmulBench (full op-level path, BLAS off, vector on) clocks 9.74 ms @ 512², about 1.4× faster than this kernel's 13.62 ms @ 512². The gap comes from the cache-blocked, tiled implementation in JvmVectorKernels.matmulFloatBlocked, which production routing currently calls; the SPI kernel uses a simpler FMA loop over a transposed pack of B (B^T). Porting the tiling into the SPI kernel to close that gap is a reasonable follow-up if production bench numbers regress after the routing change.
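To make the two loop structures concrete, here is a scalar sketch of each. It is an illustration under stated assumptions, not either production kernel: the real PanamaVectorMatmulKernel vectorizes the inner dot product with the Vector API, and `TILE = 64` is a hypothetical tile edge, not the value used by JvmVectorKernels.matmulFloatBlocked. Packing B transposed makes both operands of every dot product contiguous; blocking additionally reuses each tile of A, B and C while it is still in cache.

```java
final class MatmulLayouts {
    static final int TILE = 64; // hypothetical tile edge, for illustration only

    // Transpose row-major B[k x n] into bt[n x k] so column j of B becomes a contiguous row.
    static float[] packTransposed(float[] b, int k, int n) {
        float[] bt = new float[n * k];
        for (int p = 0; p < k; p++)
            for (int j = 0; j < n; j++)
                bt[j * k + p] = b[p * n + j];
        return bt;
    }

    // "FMA + B^T pack": each C[i][j] is a fused multiply-add dot product of two contiguous rows.
    static void matmulPacked(float[] a, float[] b, float[] c, int m, int k, int n) {
        float[] bt = packTransposed(b, k, n);
        for (int i = 0; i < m; i++) {
            for (int j = 0; j < n; j++) {
                float acc = 0f;
                for (int p = 0; p < k; p++) {
                    acc = Math.fma(a[i * k + p], bt[j * k + p], acc);
                }
                c[i * n + j] = acc;
            }
        }
    }

    // Cache-blocked: walk TILE x TILE blocks so each block stays hot while it is reused.
    static void matmulBlocked(float[] a, float[] b, float[] c, int m, int k, int n) {
        java.util.Arrays.fill(c, 0, m * n, 0f); // C accumulates across p-blocks, so start from zero
        for (int i0 = 0; i0 < m; i0 += TILE)
            for (int p0 = 0; p0 < k; p0 += TILE)
                for (int j0 = 0; j0 < n; j0 += TILE) {
                    int iMax = Math.min(i0 + TILE, m);
                    int pMax = Math.min(p0 + TILE, k);
                    int jMax = Math.min(j0 + TILE, n);
                    for (int i = i0; i < iMax; i++)
                        for (int p = p0; p < pMax; p++) {
                            float aip = a[i * k + p]; // reused across the whole j-row of the tile
                            for (int j = j0; j < jMax; j++)
                                c[i * n + j] = Math.fma(aip, b[p * n + j], c[i * n + j]);
                        }
                }
    }
}
```

Both variants compute the same product; they differ only in memory access order, which is exactly what the 9.74 ms vs 13.62 ms comparison at 512² is sensitive to.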

Why direct kernel benching

MatmulBench exercises ctx.ops.matmul, which today still calls JvmVectorKernels directly rather than going through the SPI. Until that routing change lands, this new bench is the only one that measures scalar vs Panama through the kernel SPI in isolation. Once routing flips, the existing MatmulBench will exercise the same provider end-to-end, and we can decide whether to keep both benches or fold one into the other.

Test plan

  • ./gradlew :skainet-backends:benchmarks:jvm-cpu-jmh:jmhCompileGeneratedClasses — compiles cleanly.
  • ./gradlew :skainet-backends:benchmarks:jvm-cpu-jmh:jmh -Pjmh.include=KernelMatmulBench — produces the numbers above.

Follow-ups (still in M5 hopper)

  • Route DefaultCpuOpsJvm.matmul through KernelRegistry.
  • ServiceLoader auto-discovery for kernel providers.
  • Cache-blocked variant of PanamaVectorMatmulKernel if/when production routing exposes a regression vs the current matmulFloatBlocked path.
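The first two follow-ups could fit together roughly as sketched below. This is hedged: KernelRegistry's real API is not shown in this PR, so `MatmulProvider`, `priority()`, `isAvailable()`, `register`, `discover` and `best` are all hypothetical. `java.util.ServiceLoader` is the standard JDK discovery mechanism; implementations would be listed in a `META-INF/services/` file named after the interface's fully qualified name, so in a standalone run it finds nothing and only explicitly registered providers are used.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.ServiceLoader;

// Hypothetical provider contract; real SPI type and method names may differ.
interface MatmulProvider {
    String name();
    int priority();         // higher wins, e.g. panama above scalar
    boolean isAvailable();  // e.g. false when jdk.incubator.vector is not resolved
    void matmul(float[] a, float[] b, float[] c, int m, int k, int n);
}

final class KernelRegistrySketch {
    private final List<MatmulProvider> providers = new ArrayList<>();

    void register(MatmulProvider p) {
        providers.add(p);
    }

    // Auto-discovery: adds every implementation listed under META-INF/services.
    void discover() {
        for (MatmulProvider p : ServiceLoader.load(MatmulProvider.class)) {
            providers.add(p);
        }
    }

    // What a routed DefaultCpuOpsJvm.matmul might call instead of JvmVectorKernels directly.
    MatmulProvider best() {
        return providers.stream()
                .filter(MatmulProvider::isAvailable)
                .max(Comparator.comparingInt(MatmulProvider::priority))
                .orElseThrow(() -> new IllegalStateException("no matmul provider available"));
    }
}
```

In this sketch, filtering on `isAvailable()` before ranking lets the scalar provider take over on JVMs launched without `--add-modules jdk.incubator.vector`, instead of failing at the first matmul call.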

🤖 Generated with Claude Code

Direct Fp32MatmulKernel.matmul JMH harness, sizes 256/512/1024,
provider param toggles ScalarMatmulKernel vs PanamaVectorMatmulKernel.
Used to validate the M5 milestone target (Panama ≥1.5× scalar)
without entanglement from the rest of the op pipeline.

Local run on JDK 21.0.10 (M-series macOS) clears the target
comfortably:

  size  scalar   panama   speedup
  256   9.454ms  1.356ms  6.97x
  512   79.68ms  13.62ms  5.85x
  1024  862.8ms  118.2ms  7.30x

Adds skainet-backend-api as a direct dep on the bench module so the
JMH source set can see the kernel SPI types, and documents the new
bench in docs/.../perf/jvm-cpu.adoc.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@github-actions

📖 Documentation Preview

The documentation has been built successfully for this PR.

Generated Files:

  • Operator documentation: docs/modules/operators/_generated_/
  • JSON schema output: operators.json

Artifacts:

  • Download the documentation-preview-558 artifact to view the complete documentation locally.

This comment will be updated automatically when the PR is updated.

@michalharakal michalharakal marked this pull request as ready for review April 28, 2026 13:48
@michalharakal michalharakal merged commit 1487c3a into develop Apr 28, 2026
10 checks passed
@michalharakal michalharakal deleted the feature/jvm-panama-kernel-jmh branch April 28, 2026 13:48
