fix(audio/tts): orpheus must fail-closed on no-Metal, no CPU fallback by joelteply · Pull Request #1312 · CambrianTech/continuum

joelteply · 2026-05-16T15:04:56Z

Summary

vhsm-d1f4 (Joel via vHSM cwd) flagged `orpheus.rs:181-190` in audit pass 1 (2026-05-16): explicit Metal→CPU fallback with a friendly `"Orpheus: Using CPU (with Accelerate BLAS)"` log. The fallback evaded `tests/no_cpu_fallback_contract.rs` because that test only inspects llamacpp.rs/ort_providers.rs/llamacpp_adapter.rs — Candle-side TTS slipped through.

Joel attributes the 900% CPU pathology seen during chat to this class of silent fallback: the render loop is sacred per the README and the main thread should not be doing inference, yet Orpheus on CPU + Accelerate BLAS via candle ends up doing exactly that.

What changes

- `select_device() -> Device` → `select_device() -> Result<Device, TTSError>`
- On Metal failure: returns `TTSError::ModelNotLoaded("Orpheus requires Metal GPU; no CPU fallback. Device::new_metal(0) failed: {e}")`
- Caller at line 550 propagates with `?`
- The "Using CPU" log line is gone; only the success-path Metal log remains
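The signature change above can be sketched roughly as follows. This is a minimal illustration, not the code in `orpheus.rs`: `Device` and `TTSError` are local stand-ins for candle's `Device` and the crate's own error enum, and `try_new_metal` (with its `metal_available` flag) is a hypothetical placeholder for `Device::new_metal(0)` so the example compiles without a GPU or external crates.

```rust
use std::fmt;

/// Stand-in for candle's `Device` (assumption: only the variant needed here).
#[derive(Debug, PartialEq)]
pub enum Device {
    Metal(usize),
}

/// Stand-in for the crate's `TTSError`; only the variant named in this PR.
#[derive(Debug, PartialEq)]
pub enum TTSError {
    ModelNotLoaded(String),
}

impl fmt::Display for TTSError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            TTSError::ModelNotLoaded(msg) => write!(f, "model not loaded: {msg}"),
        }
    }
}

impl std::error::Error for TTSError {}

/// Hypothetical placeholder for `Device::new_metal(0)`.
/// The `metal_available` flag simulates whether the host has a Metal GPU.
pub fn try_new_metal(ordinal: usize, metal_available: bool) -> Result<Device, String> {
    if metal_available {
        Ok(Device::Metal(ordinal))
    } else {
        Err("no Metal device found".to_string())
    }
}

/// Fail-closed selection: either a Metal device or a typed error, never CPU.
pub fn select_device(metal_available: bool) -> Result<Device, TTSError> {
    try_new_metal(0, metal_available).map_err(|e| {
        TTSError::ModelNotLoaded(format!(
            "Orpheus requires Metal GPU; no CPU fallback. Device::new_metal(0) failed: {e}"
        ))
    })
}

/// The caller propagates the failure with `?` instead of continuing on CPU.
pub fn load_model(metal_available: bool) -> Result<Device, TTSError> {
    let device = select_device(metal_available)?;
    Ok(device)
}
```

With this shape, `load_model(false)` yields `Err(TTSError::ModelNotLoaded(..))` and there is no code path that constructs a CPU device.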

What does NOT change

- Behavior on Metal-capable hosts: identical
- SNAC decoder ORT path already required GPU EP (lines 196-208); this PR brings the GGUF/candle path to the same standard

Vision Pro test (per Joel's TS-vs-Rust criterion)

The Rust crate compiles and tests standalone with this change. Any caller (Swift on Vision Pro, Node on Mac, or another host) sees a typed `TTSError` if Metal is unavailable. No JS runtime is involved.

Why this is safe

- `cargo check` and clippy are clean (146 warnings, matching the baseline of 146; no regression)
- All Mac dev hosts have Metal; production runtime contract per README requires GPU
- Error surface is typed so callers can fall through to alternative TTS engines if registered, or fail loud; no silent CPU drift
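The fall-through behavior that a typed error enables might look like the sketch below. Everything here is hypothetical and for illustration only: the `TtsEngine` trait, `StubEngine`, and `first_loadable` are not taken from the repository, and `TTSError` is a local stand-in with just the variant named in this PR.

```rust
/// Stand-in for the crate's `TTSError` (assumption: one variant is enough here).
#[derive(Debug, PartialEq)]
pub enum TTSError {
    ModelNotLoaded(String),
}

/// Hypothetical engine interface; the real registry API may differ.
pub trait TtsEngine {
    fn name(&self) -> &str;
    fn load(&self) -> Result<(), TTSError>;
}

/// Test double: loads only when its (simulated) GPU requirement is met.
pub struct StubEngine {
    pub name: &'static str,
    pub gpu_ok: bool,
}

impl TtsEngine for StubEngine {
    fn name(&self) -> &str {
        self.name
    }

    fn load(&self) -> Result<(), TTSError> {
        if self.gpu_ok {
            Ok(())
        } else {
            Err(TTSError::ModelNotLoaded(format!(
                "{} requires a GPU; no CPU fallback",
                self.name
            )))
        }
    }
}

/// Try engines in registration order. Return the first one that loads,
/// or every collected error so the caller can fail loud.
pub fn first_loadable<'a>(
    engines: &'a [Box<dyn TtsEngine>],
) -> Result<&'a dyn TtsEngine, Vec<TTSError>> {
    let mut errors = Vec::new();
    for engine in engines {
        match engine.load() {
            Ok(()) => return Ok(engine.as_ref()),
            Err(e) => errors.push(e),
        }
    }
    Err(errors)
}
```

The point of the design is that a refusal to load is visible to the caller as data, so choosing another engine or surfacing the failure is an explicit decision rather than a silent device switch.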

VDD note (per vhsm-d1f4 audit pass 2)

This PR is defensive — it removes the SOURCE of the CPU pathology rather than tuning the hot path. tok/s measurement isn't applicable to a deletion. Whoever owns Phase A.8 next can measure aggregate tok/s with Orpheus load now correctly gated.

Follow-ups (separate PRs)

- `src/workers/inference-grpc/src/model.rs:275-295`: same CUDA→Metal→CPU fallback (vhsm-d1f4 finding #2)
- Widen `tests/no_cpu_fallback_contract.rs` to grep the whole workers tree for `Device::Cpu`, with an allow-list that requires justification (finding #3)
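One way the widened contract check could work is sketched below, using only the standard library. This is an assumption about the approach, not the test that was eventually written: the function names, the suffix-based allow-list, and the plain substring match on `Device::Cpu` are all illustrative.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Recursively collect every `.rs` file under `root`.
fn rust_files(root: &Path, out: &mut Vec<PathBuf>) -> io::Result<()> {
    for entry in fs::read_dir(root)? {
        let path = entry?.path();
        if path.is_dir() {
            rust_files(&path, out)?;
        } else if path.extension().and_then(|ext| ext.to_str()) == Some("rs") {
            out.push(path);
        }
    }
    Ok(())
}

/// Return the files under `root` that mention `Device::Cpu` and are not
/// covered by `allow_list`. Allow-list entries are path suffixes, compared
/// component-wise (e.g. "benches/cpu_baseline.rs").
pub fn cpu_fallback_violations(root: &Path, allow_list: &[&str]) -> io::Result<Vec<PathBuf>> {
    let mut files = Vec::new();
    rust_files(root, &mut files)?;

    let mut violations = Vec::new();
    for file in files {
        let text = fs::read_to_string(&file)?;
        let allowed = allow_list.iter().any(|suffix| file.ends_with(suffix));
        if text.contains("Device::Cpu") && !allowed {
            violations.push(file);
        }
    }
    // Sort so the failure message is stable across platforms.
    violations.sort();
    Ok(violations)
}
```

A contract test would then assert that the returned list is empty for the workers tree, so any new `Device::Cpu` use has to be added to the allow-list deliberately.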

🤖 Generated with Claude Code

vhsm-d1f4 (Joel via vHSM cwd) flagged this in the 2026-05-16 audit pass: orpheus.rs:179-191 had explicit Metal→CPU fallback with a friendly "Orpheus: Using CPU (with Accelerate BLAS)" log. The fallback evaded tests/no_cpu_fallback_contract.rs because that test only inspects llamacpp.rs/ort_providers.rs/llamacpp_adapter.rs, so Candle-side TTS slipped through.

Joel's audit attributes the 900% CPU pathology seen during chat to this class of silent fallback: render loop is sacred per the README, main thread should not be doing inference, but Orpheus CPU+Accelerate BLAS via candle ends up doing exactly that.

What changes:
- select_device() -> Device becomes select_device() -> Result<Device, TTSError>
- On Metal failure, returns TTSError::ModelNotLoaded with explicit "Orpheus requires Metal GPU; no CPU fallback. Device::new_metal(0) failed: {e}"
- Caller at line 550 propagates with ?
- The "Using CPU" log line is gone; only the success-path Metal log remains

What does NOT change:
- Behavior on Metal-capable hosts: identical
- SNAC decoder ORT path already required GPU EP (lines 196-208); this PR brings the GGUF/candle path to the same standard
- TTS engine selection elsewhere: if Orpheus refuses to load, the caller can register a different TTS engine or surface to operator

Why this is safe:
- cargo check + clippy clean (146 warnings, baseline 146 = no regression)
- All Mac dev hosts have Metal; production runtime contract per README requires GPU
- Error surface is typed (TTSError::ModelNotLoaded) so callers can fall through to alternative TTS engines if registered, or fail-loud otherwise; no silent CPU drift

VDD note (per vhsm-d1f4 audit pass 2): this PR is defensive (prevents the CPU pathology); tok/s measurement isn't applicable because it removes the SOURCE of the pathology rather than tuning the hot path. Whoever owns Phase A.8 next can measure aggregate tok/s with Orpheus load now correctly gated.

Follow-ups (separate PRs):
- src/workers/inference-grpc/src/model.rs:275-295 same CUDA→Metal→CPU fallback (vhsm-d1f4 finding #2)
- Widen tests/no_cpu_fallback_contract.rs to grep whole workers tree for Device::Cpu, require allow-list justification (finding #3)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Companion to codex's #1312 (orpheus same-shape fix). Closes the inference-grpc CPU-fallback path supervisor vhsm-d1f4 flagged in audit pass 1 finding #2 (2026-05-16). Evaded the codified no_cpu_fallback_contract.rs test (only inspects llamacpp / ort_providers / llamacpp_adapter, not workers/inference-grpc). Pre-fix select_best_device tried CUDA, tried Metal, then printed 'Using CPU (no GPU acceleration)' and returned Device::Cpu.

- select_best_device now returns Result<Device, Box<dyn Error>>
- caller propagates via ?, no behavior change on GPU-available hosts
- Error message names what to do
- cargo check clean: --features metal

Co-authored-by: Test <test@test.com>
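For the companion change, the CUDA-then-Metal-then-error order described in that commit message could be sketched like this. It is illustrative only: `Device` and `Probes` are local stand-ins, `try_cuda`/`try_metal` are hypothetical placeholders for the real device constructors, and the error text is an assumed example of a message that "names what to do", not the wording used in `model.rs`.

```rust
use std::error::Error;

/// Stand-in for candle's `Device`, deliberately without a CPU variant.
#[derive(Debug, PartialEq)]
pub enum Device {
    Cuda(usize),
    Metal(usize),
}

/// Hypothetical probe results that simulate which backends the host has.
pub struct Probes {
    pub cuda: bool,
    pub metal: bool,
}

/// Placeholder for the real CUDA device constructor.
fn try_cuda(p: &Probes) -> Result<Device, String> {
    if p.cuda { Ok(Device::Cuda(0)) } else { Err("CUDA unavailable".to_string()) }
}

/// Placeholder for the real Metal device constructor.
fn try_metal(p: &Probes) -> Result<Device, String> {
    if p.metal { Ok(Device::Metal(0)) } else { Err("Metal unavailable".to_string()) }
}

/// Try CUDA, then Metal. If both fail, return an error instead of a CPU device.
pub fn select_best_device(p: &Probes) -> Result<Device, Box<dyn Error>> {
    let cuda_err = match try_cuda(p) {
        Ok(device) => return Ok(device),
        Err(e) => e,
    };
    let metal_err = match try_metal(p) {
        Ok(device) => return Ok(device),
        Err(e) => e,
    };
    // Assumed wording: keep both causes and tell the operator what to do next.
    Err(format!(
        "No GPU available ({cuda_err}; {metal_err}). \
         Run on a GPU host with a GPU feature enabled; CPU fallback is disabled."
    )
    .into())
}
```

Keeping both underlying failures in the message means the operator sees why each backend was rejected, which a single "Using CPU" log line never conveyed.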


github-actions Bot added the size: S label May 16, 2026

joelteply mentioned this pull request May 16, 2026

fix(gpu): inference-grpc hard-fail on no-GPU (companion to #1312) #1314

Merged

3 tasks

joelteply merged commit b4845f4 into canary May 16, 2026
3 checks passed

joelteply deleted the fix/orpheus-no-cpu-fallback branch May 16, 2026 21:18

joelteply mentioned this pull request May 16, 2026

test(contract): widen no_cpu_fallback_contract to cover Candle-side paths #1341

Merged

joelteply added a commit that referenced this pull request May 16, 2026

test(contract): widen no_cpu_fallback_contract to cover Candle-side p…

6db36a9

…aths (PIECE-5 + #1314 + #1312 layers) (#1341) Co-authored-by: Test <test@test.com>

joelteply merged 1 commit into canary from fix/orpheus-no-cpu-fallback
