fix(audio/tts): orpheus must fail-closed on no-Metal, no CPU fallback #1312

Merged
joelteply merged 1 commit into canary from fix/orpheus-no-cpu-fallback on May 16, 2026

@joelteply
Contributor

Summary

vhsm-d1f4 (Joel via vHSM cwd) flagged `orpheus.rs:181-190` in audit pass 1 (2026-05-16): explicit Metal→CPU fallback with a friendly `"Orpheus: Using CPU (with Accelerate BLAS)"` log. The fallback evaded `tests/no_cpu_fallback_contract.rs` because that test only inspects llamacpp.rs/ort_providers.rs/llamacpp_adapter.rs — Candle-side TTS slipped through.

Joel attributes the 900% CPU pathology seen during chat to this class of silent fallback: render loop is sacred per README, main thread should not be doing inference, but Orpheus CPU+Accelerate BLAS via candle ends up doing exactly that.

What changes

  • `select_device() -> Device` → `select_device() -> Result<Device, TTSError>`
  • On Metal failure: returns `TTSError::ModelNotLoaded("Orpheus requires Metal GPU; no CPU fallback. Device::new_metal(0) failed: {e}")`
  • Caller at line 550 propagates with `?`
  • The "Using CPU" log line is gone; only the success-path Metal log remains
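
The signature change above can be sketched roughly as follows. This is a minimal illustration, not the code in `orpheus.rs`: `Device` and `TTSError` are stand-ins for the candle and crate types, `new_metal` is a hypothetical probe standing in for `Device::new_metal(0)`, and the `metal_available` flag exists only so the sketch runs on any host (the real `select_device()` takes no arguments). The success-path log is omitted because its wording is not quoted here.

```rust
use std::fmt;

/// Stand-in for candle's `Device`; only the variant this sketch needs.
#[derive(Debug, Clone, PartialEq)]
pub enum Device {
    Metal(usize),
}

/// Stand-in for the crate's `TTSError`; only the variant named in this PR.
#[derive(Debug, PartialEq)]
pub enum TTSError {
    ModelNotLoaded(String),
}

impl fmt::Display for TTSError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            TTSError::ModelNotLoaded(msg) => write!(f, "model not loaded: {msg}"),
        }
    }
}

impl std::error::Error for TTSError {}

/// Hypothetical probe standing in for `Device::new_metal(0)`.
/// `metal_available` simulates the host so the sketch runs anywhere.
fn new_metal(ordinal: usize, metal_available: bool) -> Result<Device, String> {
    if metal_available {
        Ok(Device::Metal(ordinal))
    } else {
        Err("no Metal device found".to_string())
    }
}

/// Fail-closed selection: a Metal device or a typed error, never a CPU device.
pub fn select_device(metal_available: bool) -> Result<Device, TTSError> {
    new_metal(0, metal_available).map_err(|e| {
        TTSError::ModelNotLoaded(format!(
            "Orpheus requires Metal GPU; no CPU fallback. Device::new_metal(0) failed: {e}"
        ))
    })
}

/// Hypothetical caller showing the `?` propagation described above.
pub fn load_orpheus(metal_available: bool) -> Result<Device, TTSError> {
    let device = select_device(metal_available)?;
    // Model loading on `device` would follow here.
    Ok(device)
}
```

With this shape there is no branch left that can hand back a CPU device, so the refusal is enforced by the return type rather than by a log line.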

What does NOT change

  • Behavior on Metal-capable hosts: identical
  • SNAC decoder ORT path already required GPU EP (lines 196-208); this PR brings the GGUF/candle path to the same standard

Vision Pro test (per Joel's TS-vs-Rust criterion)

The Rust crate compiles and tests standalone with this change. Any caller (Swift on Vision Pro, Node on Mac, or another host) sees a typed `TTSError` if Metal is unavailable. No JS runtime is involved.

Why this is safe

  • `cargo check` and `cargo clippy` are clean (146 warnings, matching the baseline of 146; no regression)
  • All Mac dev hosts have Metal; production runtime contract per README requires GPU
  • Error surface is typed so callers can fall through to alternative TTS engines if registered, or fail-loud — no silent CPU drift
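
One way a caller might use the typed error to fall through to another engine is sketched below. Everything here is an assumption for illustration: the `TtsEngine` trait, `load_first_available`, and `FakeEngine` are hypothetical names, and the crate's real engine registry may look quite different.

```rust
/// Stand-in for the crate's `TTSError`; only the variant named in this PR.
#[derive(Debug, PartialEq)]
pub enum TTSError {
    ModelNotLoaded(String),
}

/// Hypothetical engine interface; the real crate's trait may differ.
pub trait TtsEngine {
    fn name(&self) -> &str;
    fn load(&mut self) -> Result<(), TTSError>;
}

/// Tries each registered engine in order. Returns the index of the first
/// engine that loads, or every collected error so the failure stays loud.
pub fn load_first_available(
    engines: &mut [Box<dyn TtsEngine>],
) -> Result<usize, Vec<TTSError>> {
    let mut errors = Vec::new();
    for (index, engine) in engines.iter_mut().enumerate() {
        match engine.load() {
            Ok(()) => return Ok(index),
            Err(e) => {
                // Report the refusal instead of silently degrading.
                eprintln!("{} refused to load: {e:?}", engine.name());
                errors.push(e);
            }
        }
    }
    Err(errors)
}

/// Test double: loads only when its simulated host has a GPU.
pub struct FakeEngine {
    pub name: &'static str,
    pub has_gpu: bool,
}

impl TtsEngine for FakeEngine {
    fn name(&self) -> &str {
        self.name
    }

    fn load(&mut self) -> Result<(), TTSError> {
        if self.has_gpu {
            Ok(())
        } else {
            Err(TTSError::ModelNotLoaded(format!(
                "{} requires a GPU; no CPU fallback",
                self.name
            )))
        }
    }
}
```

If no engine loads, the caller gets the full list of refusals back and can surface them to the operator.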

VDD note (per vhsm-d1f4 audit pass 2)

This PR is defensive — it removes the SOURCE of the CPU pathology rather than tuning the hot path. tok/s measurement isn't applicable to a deletion. Whoever owns Phase A.8 next can measure aggregate tok/s with Orpheus load now correctly gated.

Follow-ups (separate PRs)

  • `src/workers/inference-grpc/src/model.rs:275-295` has the same CUDA→Metal→CPU fallback (vhsm-d1f4 finding #2)
  • Widen `tests/no_cpu_fallback_contract.rs` to grep the whole workers tree for `Device::Cpu` and require allow-list justification (finding #3)

🤖 Generated with Claude Code

vhsm-d1f4 (Joel via vHSM cwd) flagged this in the 2026-05-16 audit pass:
orpheus.rs:179-191 had explicit Metal→CPU fallback with a friendly
"Orpheus: Using CPU (with Accelerate BLAS)" log. The fallback evaded
tests/no_cpu_fallback_contract.rs because that test only inspects
llamacpp.rs/ort_providers.rs/llamacpp_adapter.rs — Candle-side TTS
slipped through.

Joel's audit attributes the 900% CPU pathology seen during chat to this
class of silent fallback: render loop is sacred per the README, main
thread should not be doing inference, but Orpheus CPU+Accelerate BLAS
via candle ends up doing exactly that.

What changes:
- select_device() -> Device becomes select_device() -> Result<Device, TTSError>
- On Metal failure, returns TTSError::ModelNotLoaded with explicit
  "Orpheus requires Metal GPU; no CPU fallback. Device::new_metal(0)
  failed: {e}"
- Caller at line 550 propagates with ?
- The "Using CPU" log line is gone; only the success-path Metal log
  remains

What does NOT change:
- Behavior on Metal-capable hosts: identical
- SNAC decoder ORT path already required GPU EP (lines 196-208); this
  PR brings the GGUF/candle path to the same standard
- TTS engine selection elsewhere — if Orpheus refuses to load, the
  caller can register a different TTS engine or surface to operator

Why this is safe:
- cargo check + clippy clean (146 warnings, baseline 146 = no regression)
- All Mac dev hosts have Metal; production runtime contract per README
  requires GPU
- Error surface is typed (TTSError::ModelNotLoaded) so callers can
  fall through to alternative TTS engines if registered, or fail-loud
  otherwise — no silent CPU drift

VDD note (per vhsm-d1f4 audit pass 2): this PR is defensive (prevents
the CPU pathology); tok/s measurement isn't applicable because it
removes the SOURCE of the pathology rather than tuning the hot path.
Whoever owns Phase A.8 next can measure aggregate tok/s with Orpheus
load now correctly gated.

Follow-ups (separate PRs):
- src/workers/inference-grpc/src/model.rs:275-295 same CUDA→Metal→CPU
  fallback (vhsm-d1f4 finding #2)
- Widen tests/no_cpu_fallback_contract.rs to grep whole workers tree
  for Device::Cpu, require allow-list justification (finding #3)
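
A widened contract check along the lines of finding #3 could look roughly like this. It is a sketch under assumptions, not the contents of tests/no_cpu_fallback_contract.rs: the function name, the per-line `cpu-fallback-allowed:` marker, and the choice to scan only `.rs` files are all hypothetical.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical marker a line must carry to justify a CPU device.
const ALLOW_MARKER: &str = "cpu-fallback-allowed:";

/// Recursively collects (file, 1-based line) pairs under `root` where a
/// `.rs` file mentions `Device::Cpu` without the allow marker on that line.
pub fn find_cpu_fallbacks(root: &Path) -> io::Result<Vec<(PathBuf, usize)>> {
    let mut hits = Vec::new();
    let mut pending = vec![root.to_path_buf()];

    // Iterative directory walk; avoids recursion and extra dependencies.
    while let Some(dir) = pending.pop() {
        for entry in fs::read_dir(&dir)? {
            let path = entry?.path();
            if path.is_dir() {
                pending.push(path);
            } else if path.extension().and_then(|ext| ext.to_str()) == Some("rs") {
                let text = fs::read_to_string(&path)?;
                for (idx, line) in text.lines().enumerate() {
                    if line.contains("Device::Cpu") && !line.contains(ALLOW_MARKER) {
                        hits.push((path.clone(), idx + 1));
                    }
                }
            }
        }
    }

    // Sort so the report is stable regardless of directory iteration order.
    hits.sort();
    Ok(hits)
}
```

A contract test would then assert that the returned list is empty for the workers tree and print the offending locations otherwise. Scanning the tree instead of a fixed file list is what would have caught the Candle-side path.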

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
joelteply added a commit that referenced this pull request May 16, 2026
Companion to codex's #1312 (orpheus same-shape fix). Closes the
inference-grpc CPU-fallback path supervisor vhsm-d1f4 flagged in
audit pass 1 finding #2 (2026-05-16). Evaded the codified
no_cpu_fallback_contract.rs test (only inspects llamacpp /
ort_providers / llamacpp_adapter, not workers/inference-grpc).

Pre-fix select_best_device tried CUDA, tried Metal, then printed
'Using CPU (no GPU acceleration)' and returned Device::Cpu.

- select_best_device now returns Result<Device, Box<dyn Error>>
- caller propagates via ?, no behavior change on GPU-available hosts
- Error message names what to do
- cargo check clean: --features metal
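
The post-fix shape of select_best_device might be sketched as below. This is an assumption-laden illustration rather than the code in model.rs: `Device` is a stand-in for the candle type, `Host`, `new_cuda`, and `new_metal` are hypothetical probes that simulate device creation, and the error wording is invented to show a message that names what to do.

```rust
use std::error::Error;

/// Stand-in for candle's `Device`; note there is no CPU variant to return.
#[derive(Debug, Clone, PartialEq)]
pub enum Device {
    Cuda(usize),
    Metal(usize),
}

/// Simulated host capabilities so the sketch runs on any machine.
#[derive(Debug, Clone, Copy)]
pub struct Host {
    pub cuda: bool,
    pub metal: bool,
}

/// Hypothetical probe standing in for CUDA device creation.
fn new_cuda(ordinal: usize, host: Host) -> Result<Device, String> {
    if host.cuda {
        Ok(Device::Cuda(ordinal))
    } else {
        Err("CUDA unavailable".to_string())
    }
}

/// Hypothetical probe standing in for Metal device creation.
fn new_metal(ordinal: usize, host: Host) -> Result<Device, String> {
    if host.metal {
        Ok(Device::Metal(ordinal))
    } else {
        Err("Metal unavailable".to_string())
    }
}

/// Tries CUDA, then Metal, then fails with an actionable error
/// instead of returning a CPU device.
pub fn select_best_device(host: Host) -> Result<Device, Box<dyn Error>> {
    let cuda_err = match new_cuda(0, host) {
        Ok(device) => return Ok(device),
        Err(e) => e,
    };
    let metal_err = match new_metal(0, host) {
        Ok(device) => return Ok(device),
        Err(e) => e,
    };
    // Assumed wording; the real message may differ.
    Err(format!(
        "no GPU device available ({cuda_err}; {metal_err}); refusing CPU fallback. \
         Build with a GPU feature (for example --features metal) and run on a GPU host."
    )
    .into())
}
```

Keeping both probe errors in the final message lets the operator see which backend failed and why.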

Co-authored-by: Test <test@test.com>
@joelteply joelteply merged commit b4845f4 into canary May 16, 2026
3 checks passed
@joelteply joelteply deleted the fix/orpheus-no-cpu-fallback branch May 16, 2026 21:18
joelteply added a commit that referenced this pull request May 16, 2026
…aths (PIECE-5 + #1314 + #1312 layers) (#1341)

Co-authored-by: Test <test@test.com>