Native macos accelerate simd #449

Merged
michalharakal merged 2 commits into develop from feature/native-macos-accelerate-simd on Apr 6, 2026

@michalharakal
Contributor

No description provided.

michalharakal and others added 2 commits March 30, 2026 00:25
The CPU backend on Kotlin/Native macOS currently uses scalar loops for all
tensor operations. This commit documents the plan to integrate Apple's
Accelerate framework (cblas_sgemm, vDSP_*) for a projected 5-20x speedup on
native macOS.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Replace scalar DefaultCpuOps on Apple platforms with AccelerateCpuOps
that dispatches hot-path operations to Apple's Accelerate framework:

- matmul: cblas_sgemm (NEON + AMX hardware acceleration)
- add/subtract/multiply/divide: vDSP_vadd/vDSP_vsub/vDSP_vmul/vDSP_vdiv
- sum/mean: vDSP_sve/vDSP_meanv
- relu: vDSP_vthres
- silu: optimized scalar loop on contiguous buffer
- transpose: vDSP_mtrans
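
On the relu mapping in the list above: vDSP_vthres is a threshold-with-zero-fill routine, so a threshold of 0 yields relu. A minimal scalar reference in Kotlin, of the kind one might compare an accelerated path against, could look like the following. The function names are illustrative and are not taken from this PR.

```kotlin
// Scalar reference for threshold-with-zero-fill: values below the threshold
// become 0, all other values pass through unchanged.
// Hypothetical helper, not part of the PR.
fun thresholdZeroFill(input: FloatArray, threshold: Float): FloatArray =
    FloatArray(input.size) { i -> if (input[i] >= threshold) input[i] else 0f }

// relu expressed as a zero threshold.
fun reluReference(input: FloatArray): FloatArray = thresholdZeroFill(input, 0f)
```
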

All operations fall through to DefaultCpuOpsBase for non-FP32,
non-contiguous, or complex broadcasting cases.
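
The fall-through rule above can be sketched roughly as below. This is not the PR's code: `TensorView`, `ScalarOps`, `FastPathOps`, and the injected `vectorAdd` function are all hypothetical stand-ins, and the native call is replaced by a plain function so the sketch runs on any platform.

```kotlin
// All names below are hypothetical stand-ins, not the PR's actual classes.

// Simplified tensor: flat FP32 storage plus the flags the dispatch checks.
class TensorView(
    val shape: List<Int>,
    val data: FloatArray,
    val isFp32: Boolean = true,
    val isContiguous: Boolean = true,
)

// Scalar fallback, playing the role of DefaultCpuOpsBase. A real base class
// would also handle strides and broadcasting; this sketch does not.
open class ScalarOps {
    open fun add(a: TensorView, b: TensorView): TensorView {
        require(a.shape == b.shape) { "sketch handles equal shapes only" }
        val out = FloatArray(a.data.size) { i -> a.data[i] + b.data[i] }
        return TensorView(a.shape, out, a.isFp32, isContiguous = true)
    }
}

// Fast-path wrapper. `vectorAdd` stands in for a native call such as
// vDSP_vadd; it is injected so the sketch has no platform dependency.
class FastPathOps(
    private val vectorAdd: (FloatArray, FloatArray, FloatArray) -> Unit,
) : ScalarOps() {
    var fastPathCalls = 0
        private set

    override fun add(a: TensorView, b: TensorView): TensorView {
        val eligible = a.isFp32 && b.isFp32 &&
            a.isContiguous && b.isContiguous &&
            a.shape == b.shape // no broadcasting on the fast path
        if (!eligible) return super.add(a, b)

        fastPathCalls++
        val out = FloatArray(a.data.size)
        vectorAdd(a.data, b.data, out)
        return TensorView(a.shape, out)
    }
}
```

Keeping the eligibility check in one place means every operation can share the same guard, and the scalar path remains the single source of truth for the harder cases.
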

Split PlatformCpuOpsFactory from single nativeMain actual into:
- appleMain (macOS + iOS) → AccelerateCpuOps
- linuxMain → DefaultCpuOps (scalar fallback)

All 23 existing test suites pass on macosArm64 with zero failures.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@michalharakal merged commit a2fa299 into develop on Apr 6, 2026
5 checks passed
@michalharakal deleted the feature/native-macos-accelerate-simd branch on April 6, 2026 08:28
github-actions Bot commented Apr 6, 2026

📖 Documentation Preview

The documentation has been built successfully for this PR.

Generated Files:

  • Operator documentation: docs/modules/operators/_generated_/
  • JSON schema output: operators.json

Artifacts:

  • Download the documentation-preview-449 artifact to view the complete documentation locally.

This comment will be updated automatically when the PR is updated.
