fix(gpu): inference-grpc hard-fail on no-GPU (companion to #1312)#1314
Merged
Conversation
Companion to codex's #1312 (orpheus same-shape fix). Closes the inference-grpc CPU-fallback path supervisor vhsm-d1f4 flagged in audit pass 1 finding #2 (2026-05-16). Evaded the codified no_cpu_fallback_contract.rs test (only inspects llamacpp / ort_providers / llamacpp_adapter, not workers/inference-grpc). Pre-fix select_best_device tried CUDA, tried Metal, then printed 'Using CPU (no GPU acceleration)' and returned Device::Cpu. - select_best_device now returns Result<Device, Box<dyn Error>> - caller propagates via ?, no behavior change on GPU-available hosts - Error message names what to do - cargo check clean: --features metal
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Closes the inference-grpc CPU-fallback path supervisor vhsm-d1f4 flagged in audit pass 1 finding #2 (2026-05-16). Same shape + rationale as codex's #1312 for orpheus.
Pre-fix
select_best_devicetried CUDA, tried Metal, then printed "Using CPU (no GPU acceleration)" and returnedDevice::Cpu. Same "code in fallbacks" anti-pattern Joel flagged at 900% CPU.Change
select_best_device→Result<Device, Box<dyn Error + Send + Sync>>?— no behavior change on GPU-available hosts--featureson a host with the matching GPUn_gpu_layers: -1GPU-only contractno_cpu_fallback_contract.rstest (only inspects llamacpp / ort_providers / llamacpp_adapter, not workers/inference-grpc)Test plan
cargo check --features metalclean🤖 Generated with Claude Code