
fix: add GPU EP to Kokoro/Orpheus/Silero ORT sessions (#964 series PR #2) #991

Merged

joelteply merged 1 commit into canary from mac-pr/gpu-ep-tts-vad-v3 on May 1, 2026
Conversation

@joelteply
Contributor

What

Continues the GPU-fallback-removal series started in #985. PR #1 (#985) fixed the 3 sites with broken `feature = "coreml"` cfg gates (embedding, piper, moonshine). This PR (#2) covers the 4 sites that configured NO Execution Provider at all — they relied on ORT's implicit CPU EP, which is the same silent-fallback shape per Joel's architectural rule (2026-05-01: "lack of GPU integration is forbidden, GPU acceleration in all cases").

Sites updated (all use the centralized helper from #985)

| File | Component |
| --- | --- |
| `live/audio/tts/kokoro.rs` | Kokoro TTS |
| `live/audio/tts/orpheus.rs` | Orpheus SNAC decoder |
| `live/audio/vad/silero.rs` | Silero VAD |
| `live/audio/vad/silero_raw.rs` | Silero VAD (raw ORT) |

Each call site is identical in shape: insert one `build_ort_gpu_execution_providers()` call between `Session::builder()` and `with_optimization_level()`. No other behaviour change.
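A minimal sketch of that shape, assuming the ort 2.x builder API (the surrounding session setup is illustrative; only `build_ort_gpu_execution_providers()` is the actual #985 helper):

```rust
use ort::{GraphOptimizationLevel, Session};

// Illustrative loader: the one-line change is the with_execution_providers
// call, which attaches the GPU EPs (CoreML/CUDA) instead of leaving ORT
// on its implicit CPU EP.
fn load_model(path: &str) -> ort::Result<Session> {
    Session::builder()?
        .with_execution_providers(build_ort_gpu_execution_providers())?
        .with_optimization_level(GraphOptimizationLevel::Level3)?
        .commit_from_file(path)
}
```

Because the EPs are registered before graph optimization, ORT is still free to assign individual ops back to the CPU where that is cheaper, which is the behaviour the Silero note below leans on.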

Note on Silero VAD perf

Silero is small (<2 MB) and per-frame; on its own a CPU EP would arguably be faster than CoreML/CUDA due to host↔GPU transfer overhead. But ORT's runtime decides per-op assignment once it sees the model graph + the GPU device profile, so any genuine perf trade-off is ORT's call. Per the architectural rule, we provide the GPU EP — ORT optimises from there.

Test

• `cargo check -p continuum-core --features metal`: PASSES (verified locally on M5; the new EP attachment compiles and integrates with the existing helper from #985).

Note on PR cycle

Branch named `mac-pr/...` to disambiguate from another AI working in the same workspace; this PR was rescued via a SHA-to-ref push after a parallel-git race contaminated three earlier branch names. Per-AI git worktrees are being set up as the permanent fix.

Out of scope (queued for PR #3 + later in series)

  • `gpu/memory_manager.rs:799 detect_cpu_fallback()` — silent "no GPU, use 25% RAM" fallback. Replace with a hard fail (rough shape sketched after this list).
  • `persona/allocator.rs:165` — explicit `"cpu"` GPU-type branch.
  • ROCm / DirectML / OpenVINO EP coverage in `ort_providers.rs`.
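A rough sketch of what that hard fail could look like, assuming nothing about the repo beyond this description (every name below is hypothetical, not the actual `memory_manager` API):

```rust
// Hypothetical shape for PR #3: replace detect_cpu_fallback()'s silent
// "no GPU, use 25% of host RAM" branch with a loud, typed failure.
#[derive(Debug)]
struct GpuInfo {
    vram_bytes: u64,
}

#[derive(Debug)]
enum GpuError {
    NoGpuDetected,
}

fn gpu_memory_budget(gpu: Option<GpuInfo>) -> Result<u64, GpuError> {
    match gpu {
        Some(g) => Ok(g.vram_bytes),
        // Old behavior: silently budget 25% of system RAM and keep going.
        // New behavior: hard-fail so a missing GPU cannot hide.
        None => Err(GpuError::NoGpuDetected),
    }
}
```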

🤖 Generated with Claude Code

joelteply merged commit 02b2379 into canary on May 1, 2026
3 checks passed
joelteply deleted the mac-pr/gpu-ep-tts-vad-v3 branch on May 1, 2026 at 23:54
joelteply added a commit that referenced this pull request May 2, 2026
…#1000)

Per Joel's "100% free OOTB on MacBook Air on up, canary e2e working
from curl, Carl's case" — the existing smoke probe only validates the
page renders, not that a chat actually gets an AI reply. That's the
true Carl-impact gate: if Carl types "hello" + gets nothing, the
install isn't shippable, regardless of whether /health returned 200.

This extends the smoke script with a 4th phase:

  4. End-to-end chat:
     - Locate jtag binary (3 search paths)
     - Send a unique probe message to #general
     - Detect #994's "no listener" warning → exit 6 (distinct failure)
     - Poll chat/export for an AI reply (default 90s timeout)
     - On reply: report latency in PASS banner
     - On timeout: list root-cause diagnostic commands per #964/#980 series

Exit codes (extending the existing 0-3):
  4 — chat/send command failed (system not ready for chat at all)
  5 — no AI reply within timeout (the main Carl-blocker shape — silent AI)
  6 — chat/send accepted but reported NO PERSONAS (#994 warning)
      — distinct from 5: "no AI" vs "AI didn't respond"

CARL_CHAT_TIMEOUT_SEC env override (default 90s) for slow first-runs
where DMR is cold-loading the persona model.
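
The poll phase reduces to a small wait loop; a hypothetical Rust rendering of it is below (the actual probe is a shell script, and only the `CARL_CHAT_TIMEOUT_SEC` name and the 90s default come from this commit; everything else is illustrative):

```rust
use std::time::{Duration, Instant};

// Poll until check_for_reply() sees an AI reply or the timeout elapses.
// Returns the observed latency (reported in the PASS banner) on success,
// or None for the exit-5 shape: no AI reply within the timeout.
fn wait_for_reply(mut check_for_reply: impl FnMut() -> bool) -> Option<Duration> {
    let timeout = Duration::from_secs(
        std::env::var("CARL_CHAT_TIMEOUT_SEC")
            .ok()
            .and_then(|v| v.parse::<u64>().ok())
            .unwrap_or(90), // slow first-runs: DMR cold-loading the model
    );
    let start = Instant::now();
    while start.elapsed() < timeout {
        if check_for_reply() {
            return Some(start.elapsed());
        }
        std::thread::sleep(Duration::from_secs(2));
    }
    None
}
```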

The diagnostic message on exit 5 lists the post-#980 fix points so a
future regression has an obvious starting checklist:
  - #997's 'local' default routing (cloud fallback dropped)
  - DMR running (Docker Desktop 4.62+ check from install.sh)
  - GPU EP cfg (#985/#991 fixed broken cfg gates)
  - Persona model pulled into DMR
  - NEW-A SIGABRT (tracked upstream as ggml-org/llama.cpp#22593)

Now CI's carl-install-smoke gate proves the OOTB chain works
end-to-end, not just up to the page render.

Co-authored-by: Test <test@test.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
