
cleanup(gpu): delete GPU stubs and migrate native benchmark to DSL #131

Merged
michalharakal merged 1 commit into develop from cleanup/drop-gpu-stubs on May 4, 2026

Conversation

@michalharakal
Contributor

Summary

There is no real GPU support in this repo — the GPU code paths in
`:llm-runtime:kllama` and the native benchmark engine are placeholders
that have never run on GPU and always fall back to CPU. This PR deletes
them and migrates the native benchmark to the DSL path, mirroring the
JVM cleanup from #127.

  • Deleted: `GpuAttentionBackend.kt`, `GpuTensorBridge.kt` from kllama commonMain
  • Stripped from kllama backend expect/actual (linux/macos/ios): `createGpuTensorBridge` and `createGraphAccelerator` (the latter was unused dead code — `GraphAccelerator` interface and the JVM `FusedQKVAccelerator` impl are CPU-side and stay)
  • Stripped from llm-performance macosMain: `createMetalContext`, `createMlxContext`, `createGpuBridge` (only `availableNativeBackends` remains)
  • Rewrote `NativeBenchmarkEngine`: dropped `GpuNativeLlamaAdapter` and the Metal/MLX scenario adapters; renamed scenario `native-backend-throughput` → `native-cpu-throughput`; migrated CPU adapter from legacy `LlamaRuntime` + `LlamaIngestion` to the DSL path (`DecoderGgufWeightLoader` + `LlamaNetworkLoader.fromWeights` + `OptimizedLLMRuntime` DIRECT)
  • Updated `AttentionBackend` kdoc to drop the `GpuAttentionBackend` reference

Net: +73 / −479 across 9 files.
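To make the "stubs that always fall back to CPU" claim concrete, here is a minimal, self-contained sketch of the pattern being deleted. All names and signatures are illustrative stand-ins, not the repo's actual `AttentionBackend` API: a "GPU" backend that contains no GPU dispatch at all and unconditionally delegates to the CPU implementation, which is why removing it cannot change behavior.

```kotlin
// Hypothetical sketch of the stub pattern this PR removes. Names are
// illustrative; the real repo interfaces may differ.

interface AttentionBackend {
    val name: String
    fun score(query: FloatArray, key: FloatArray): Float
}

class CpuAttentionBackend : AttentionBackend {
    override val name = "cpu"
    // Plain dot product as a stand-in for the real attention kernel.
    override fun score(query: FloatArray, key: FloatArray): Float =
        query.zip(key) { q, k -> q * k }.sum()
}

class GpuAttentionBackend(
    private val fallback: AttentionBackend = CpuAttentionBackend()
) : AttentionBackend {
    override val name = "gpu"
    override fun score(query: FloatArray, key: FloatArray): Float {
        // No GPU dispatch was ever implemented; every call lands here.
        return fallback.score(query, key)
    }
}

fun main() {
    val q = floatArrayOf(1f, 2f, 3f)
    val k = floatArrayOf(4f, 5f, 6f)
    val cpu = CpuAttentionBackend().score(q, k)
    val gpu = GpuAttentionBackend().score(q, k)
    // Identical results: the "GPU" backend is pure indirection over CPU.
    println("cpu=$cpu gpu=$gpu identical=${cpu == gpu}")
}
```

Because the GPU class is pure indirection, callers see byte-identical results with or without it, so the deletion is behavior-preserving by construction.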

Test plan

  • `./gradlew :llm-runtime:kllama:compileKotlinJvm :llm-runtime:kllama:compileKotlinLinuxX64 :llm-runtime:kllama:compileKotlinMacosArm64 :llm-runtime:kllama:compileKotlinWasmJs` — all green
  • `./gradlew :llm-performance:compileKotlinJvm :llm-performance:compileKotlinMacosArm64 :llm-performance:compileKotlinWasmJs` — all green
  • `./gradlew :llm-runtime:kllama:jvmTest :llm-performance:jvmTest` — pass
  • (optional) Run `bash tests/smoke/smoke-test.sh` to confirm CLI smoke harness still passes

🤖 Generated with Claude Code

Removes the placeholder GPU code paths in :llm-runtime:kllama and the
native benchmark engine. There is no real GPU support in this repo —
GpuAttentionBackend, GpuTensorBridge, and the createGpuBridge /
createMetalContext / createMlxContext expect/actual chains were stubs
that always fell back to CPU.

- Delete GpuAttentionBackend.kt and GpuTensorBridge.kt
- Strip createGpuTensorBridge / createGraphAccelerator from kllama
  BackendExpect.kt and the linux/macos/ios actuals (createGraphAccelerator
  was unused dead code)
- Drop createMetalContext / createMlxContext / createGpuBridge from
  llm-performance macosMain; only availableNativeBackends remains
- Rewrite NativeBenchmarkEngine: drop GpuNativeLlamaAdapter and the
  Metal/MLX scenario adapters; rename scenario to native-cpu-throughput
  and migrate the CPU adapter to the DSL path (DecoderGgufWeightLoader
  + LlamaNetworkLoader.fromWeights + OptimizedLLMRuntime DIRECT),
  mirroring #127's JVM cleanup
- Drop GpuAttentionBackend reference from AttentionBackend kdoc

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@michalharakal michalharakal merged commit 3d5cd31 into develop on May 4, 2026
2 checks passed
@michalharakal michalharakal deleted the cleanup/drop-gpu-stubs branch May 4, 2026 22:16
