fix(gpu): hard-fail on no-GPU instead of silent CPU 25%-RAM fallback #998

Merged
joelteply merged 1 commit into canary from mac/gpu-fallback-removal-memory-manager on May 2, 2026

Conversation

@joelteply
Contributor

Joel "GPU 100%" rule applied to memory_manager.rs. Pre-fix: silent CPU degrade with fake "CPU (no GPU)" device. Post-fix: panic with same actionable diagnostic message install.sh uses on IC_GPU_PATH=unsupported.

🤖 Generated with Claude Code

Per Joel's architectural rule "lack of GPU integration is forbidden,
GPU acceleration in all cases" (#964 series, GPU-fallback audit).

memory_manager.rs's detect_gpu() chained Metal → CUDA → CPU fallbacks;
the CPU arm returned a budget of 25% of system RAM under the device
name "CPU (no GPU)". That is exactly the silent-degrade vector the
rule forbids — continuum-core would silently start with a fake "GPU"
budget carved from system RAM, then run inference on CPU through
whatever path picked it up.

Fix: panic with the same actionable message install.sh's
`IC_GPU_PATH=unsupported` branch uses — name supported paths, point
at diagnostic commands per platform, link to the issue tracker.

Removed:
  - CPU_FALLBACK_RAM_PCT constant (only consumer was the deleted fn)
  - detect_cpu_fallback() function
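
A minimal sketch of the post-fix shape, assuming GpuInfo and the
detect_* signatures (only the Metal → CUDA → panic ordering and the
spirit of the diagnostic are confirmed by this PR):

```rust
// Sketch only: GpuInfo and the detect_* signatures are assumptions;
// the Metal -> CUDA -> panic ordering is what this PR establishes.
pub struct GpuInfo {
    pub device_name: String,
    pub budget_bytes: u64,
}

#[cfg(feature = "metal")]
fn detect_metal() -> Option<GpuInfo> { None /* Metal probe elided */ }

#[cfg(feature = "cuda")]
fn detect_cuda() -> Option<GpuInfo> { None /* nvidia-smi probe elided */ }

pub fn detect_gpu() -> GpuInfo {
    #[cfg(feature = "metal")]
    if let Some(gpu) = detect_metal() {
        return gpu;
    }
    #[cfg(feature = "cuda")]
    if let Some(gpu) = detect_cuda() {
        return gpu;
    }
    // No CPU fallback: fail loudly with the same actionable guidance
    // install.sh emits for IC_GPU_PATH=unsupported (paraphrased here).
    panic!(
        "no supported GPU found; supported paths are Metal (macOS) and \
         CUDA (Linux, --features cuda). Diagnose with `system_profiler \
         SPDisplaysDataType` (macOS) or `nvidia-smi` (Linux), or file \
         an issue on the tracker"
    );
}
```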

Behaviour delta:
  - macOS without Metal-capable GPU: previously silent 25%-RAM "GPU";
    now panics with diagnostic
  - Linux without CUDA-capable GPU + no --features cuda: same
  - Mac with Metal: unchanged (detect_metal returns Some)
  - Linux with --features cuda + working nvidia-smi: unchanged
    (detect_cuda returns Some)

Test (cargo check --features metal,accelerate): clean.

Out of scope (next PRs in series):
  - persona/allocator.rs:165 — explicit "cpu" GPU-type branch
  - ROCm / Vulkan / OpenVINO EP coverage in inference/ort_providers.rs

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
joelteply merged commit 4a192f4 into canary on May 2, 2026
3 checks passed
joelteply deleted the mac/gpu-fallback-removal-memory-manager branch on May 2, 2026 at 02:28
joelteply added a commit that referenced this pull request May 2, 2026
…gpu_type (#999)

Per #964-series GPU-fallback audit + Joel's "lack of GPU integration is
forbidden" rule. PR #998 made memory_manager::detect_gpu() panic when
no GPU is found, so a "cpu" gpu_name can never reach detect_gpu_type
in production. Removing the branch cleans up the dead path.

If somehow a "cpu" gpu_name still arrives (e.g. a test stub), it now
falls back to the OS-default GPU type ("metal" on Mac, "cuda" on
Linux) — a best-guess that lets the caller proceed against a real GPU
subsystem rather than configuring a non-existent "cpu" subsystem that
no inference path actually serves.
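
A hedged sketch of the resulting function (the vendor-matching arms
are illustrative; only the cfg-gated OS-default tail is what this
commit adds):

```rust
// Sketch: the vendor-matching arms are assumptions about the existing
// allocator.rs logic; the OS-default tail is the new behaviour.
pub fn detect_gpu_type(gpu_name: &str) -> &'static str {
    let name = gpu_name.to_lowercase();
    if name.contains("nvidia") {
        "cuda"
    } else if name.contains("apple") {
        "metal"
    } else if cfg!(target_os = "macos") {
        // "cpu" (or anything unrecognised) lands here: best-guess the
        // platform's real GPU subsystem instead of a dead "cpu" one.
        "metal"
    } else {
        "cuda"
    }
}
```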

Test updated:
  - assert_eq!(detect_gpu_type("CPU"), "cpu") removed
  - replaced with cfg-gated assertions matching new OS-default behaviour
  - real GPU detections (NVIDIA, Apple M-series) unchanged

cargo test --features metal,accelerate --lib persona::allocator::tests::test_detect_gpu_type: PASS.
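
The replacement assertions, roughly (the exact test body is an
assumption):

```rust
#[test]
fn test_detect_gpu_type() {
    // Real-GPU detections unchanged.
    assert_eq!(detect_gpu_type("NVIDIA GeForce RTX 5090"), "cuda");
    assert_eq!(detect_gpu_type("Apple M5"), "metal");
    // "cpu" input now resolves to the OS default.
    #[cfg(target_os = "macos")]
    assert_eq!(detect_gpu_type("CPU"), "metal");
    #[cfg(target_os = "linux")]
    assert_eq!(detect_gpu_type("CPU"), "cuda");
}
```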

Co-authored-by: Test <test@test.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
joelteply added a commit that referenced this pull request May 2, 2026
… just CUDA (#1002)

* feat(gpu): add ROCm / DirectML / OpenVINO ORT EP cfg branches

Per Joel's "OOTB on all architectures from Docker" + "5090 Windows box
available later." Extends the ORT GPU EP coverage from #985 (Mac/CUDA
only) to the full Carl-OOTB matrix:

  --features rocm     → AMD GPU (Linux). ROCmExecutionProvider.
  --features directml → Windows-native, any DX12 GPU (Nvidia/AMD/Intel).
  --features openvino → Intel CPU/GPU/VPU (Linux + Windows).

Each is a cfg-gated branch in build_ort_gpu_execution_providers(). The
no-GPU-EP-configured error message now lists all 5 features so a
contributor on a new arch sees the right --features incantation.

Cargo.toml feature definitions added at lines ~199-207. Per Joel's
"GPU 100%" rule the EPs only activate when explicitly built with the
matching feature flag — no runtime CPU fallback.
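
A pattern-level sketch of the gating (EP construction is elided
because the concrete ort-crate API is version-dependent; the function
name here is hypothetical, standing in for
build_ort_gpu_execution_providers()):

```rust
// Illustrative shape only: strings stand in for real ort
// ExecutionProvider values.
fn gpu_execution_provider_names() -> Result<Vec<&'static str>, String> {
    let mut eps = Vec::new();
    #[cfg(feature = "metal")]
    eps.push("CoreML/Metal EP");
    #[cfg(feature = "cuda")]
    eps.push("CUDA EP");
    #[cfg(feature = "rocm")]
    eps.push("ROCm EP");
    #[cfg(feature = "directml")]
    eps.push("DirectML EP");
    #[cfg(feature = "openvino")]
    eps.push("OpenVINO EP");
    if eps.is_empty() {
        // Names all 5 features so a new-arch contributor sees the
        // right --features incantation; no runtime CPU fallback.
        return Err("no GPU execution provider configured; rebuild with \
                    one of --features metal|cuda|rocm|directml|openvino"
            .into());
    }
    Ok(eps)
}
```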

Build verified: cargo check --features metal,accelerate clean (the
new cfg branches don't fire on this Mac, no compile cost).

Validation needed on real hardware:
  - BigMama or 5090 Windows box: --features cuda + --features directml
  - Linux+AMD box (when available): --features rocm
  - Intel-Arc Linux box (rarer): --features openvino

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(install): cargo-features.sh detects ROCm + Vulkan + DirectML, not just CUDA

Per Joel's "OOTB on all architectures from Docker" + the ORT EP
coverage added in #1001. Pre-fix the script only mapped Mac→metal +
Linux+Nvidia→cuda; ROCm was commented out, Vulkan absent, and
Windows-native unhandled entirely.

Detection order on Linux (mirrored in the Rust sketch below):
  1. nvidia-smi → cuda (highest priority — full ORT/llama.cpp/Candle)
  2. rocminfo  → rocm (AMD with ROCm runtime, full ORT EP)
  3. vulkaninfo → vulkan (AMD/Intel without ROCm; llama.cpp Vulkan
                  path; ORT EPs absent — will hard-fail at session
                  create per #985's helper, surfacing the gap clearly)
  4. else: empty → continuum-core panics at startup per #998 (no CPU
     fallback per architectural rule)

Windows-native (MINGW/MSYS/CYGWIN):
  - DirectML always (DX12 universal on Win10+)
  - +CUDA if nvidia-smi present (ORT picks CUDA first, DirectML for
    non-CUDA-supported ops)
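
The same decision tree, mirrored in Rust for illustration only (the
real logic is shell; have() stands in for the script's `command -v`
probe):

```rust
use std::process::Command;

// Stand-in for `command -v <tool>` in cargo-features.sh.
fn have(tool: &str) -> bool {
    Command::new("sh")
        .arg("-c")
        .arg(format!("command -v {tool} >/dev/null 2>&1"))
        .status()
        .map(|s| s.success())
        .unwrap_or(false)
}

fn gpu_features() -> &'static str {
    if cfg!(target_os = "macos") {
        return "metal,accelerate"; // Darwin branch, unchanged
    }
    if cfg!(target_os = "windows") {
        // DirectML always; CUDA added when nvidia-smi is present.
        return if have("nvidia-smi") { "cuda,directml" } else { "directml" };
    }
    // Linux priority order from the list above.
    if have("nvidia-smi") {
        "cuda,load-dynamic-ort"
    } else if have("rocminfo") {
        "rocm,load-dynamic-ort"
    } else if have("vulkaninfo") {
        "vulkan,load-dynamic-ort"
    } else {
        "" // empty: continuum-core panics at startup per #998
    }
}
```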

Tested on this Mac: still resolves to "--features metal,accelerate"
(unchanged — Darwin branch).

Validation needed on real hardware:
  - 5090 Windows box: should resolve to "--features cuda,directml"
  - BigMama Linux+Nvidia: still "--features cuda,load-dynamic-ort"
    (unchanged)
  - Future Linux+AMD: will resolve to "--features rocm,load-dynamic-ort"
  - Future Linux+Intel-Arc with Vulkan loader: "--features vulkan,
    load-dynamic-ort"

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Test <test@test.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
joelteply added a commit that referenced this pull request May 2, 2026
…ok Air on up" (#1003)

* feat(install): tier hardware (MBA / mid / primary) for "OOTB on MacBook Air on up"

Per Joel's "100% free OOTB on MacBook Air on up, accessible, high
school computer" + "we are just trying to make a viable release
candidate." Pre-fix install.sh required 28GB physical RAM and rejected
16GB MBAs with "Get a 32GB+ M-series" — categorically wrong for the
stated MBA target.

Three tiers based on Mac physical RAM:

| Tier    | RAM       | Native budget | PERSONA_MODEL                   |
|---------|-----------|---------------|---------------------------------|
| MBA     | 16-23GB   | 5GB           | qwen3.5-0.8b-general-forged (~500MB) |
| mid     | 24-31GB   | 8GB           | qwen3.5-2b-general-forged (~1.4GB)  |
| primary | 32GB+     | 12GB          | qwen3.5-4b-code-forged-GGUF (~2.7GB; original) |
| reject  | <16GB     | n/a           | hard-fail with actionable message |
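
The same cut as a sketch (thresholds and model names come from the
table above; the function itself is hypothetical, since the real
logic lives in install.sh):

```rust
// Hypothetical Rust rendering of install.sh's tier cut.
// Returns (tier, native budget in GB, PERSONA_MODEL).
fn tier_for_ram_gb(ram_gb: u64) -> Result<(&'static str, u64, &'static str), &'static str> {
    match ram_gb {
        0..=15 => Err("below 16GB: hard-fail with actionable message"),
        16..=23 => Ok(("MBA", 5, "qwen3.5-0.8b-general-forged")),
        24..=31 => Ok(("mid", 8, "qwen3.5-2b-general-forged")),
        _ => Ok(("primary", 12, "qwen3.5-4b-code-forged-GGUF")),
    }
}
```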

Previously hardcoded NATIVE_RESERVE_MIB=12GB + DOCKER_FLOOR=10GB =
22GB headroom alone (28GB+ total). Now MBA tier needs 5+6+4 = 15GB
total minimum, which fits a 16GB MBA with ~1GB headroom for working
set spikes.

PERSONA_MODEL tiering uses the existing public continuum-ai org models
(all gated:False per earlier audit). All three remain HF-public so
Carl never needs an HF token regardless of tier.

CONTINUUM_TIER env var is exported so future code paths (compose env,
runtime feature gates for Bevy/vision/audio) can consult it. This PR
doesn't yet skip Bevy/vision pull on MBA tier — that's a follow-up
once the runtime supports a chat-only mode flag.
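
A hypothetical future consumer, for illustration only (per this
commit, nothing reads the variable yet):

```rust
// Hypothetical: gate heavyweight subsystems (Bevy/vision/audio) on
// the tier exported by install.sh.
fn chat_only_mode() -> bool {
    std::env::var("CONTINUUM_TIER")
        .map(|tier| tier == "MBA")
        .unwrap_or(false)
}
```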

Failure message rewritten to be actionable:
  - Names the specific minimums + what each subsystem reserves
  - Says "16GB MBA: chat-only OOTB works (smaller model). For 32GB+:
    full multimodal experience." — gives the user a sense of what
    they get at each tier instead of just a price-tag rejection.

Validation needed:
  - 16GB MBA (when available): expect tier=MBA, install completes,
    chat works with 0.8B model
  - 32GB M-series (Joel's M5 today): expect tier=primary, no
    behavior change from current (same model, same budgets)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Test <test@test.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
joelteply added a commit that referenced this pull request May 2, 2026
…us (#1004)

* docs(gap-analysis): catalogue today's 23-PR Carl-OOTB push + chain status

End-of-day snapshot: 23 PRs landed today targeting "100% free OOTB
on MacBook Air on up, install→chat with AI flawlessly" (Joel). Lists
each PR + the Carl-OOTB chain status post-push, with explicit callouts
for what's known broken / unfixed (#980 Bug 9 leak — needs live RCA;
#75 echo loops dev-tab scope; NEW-A upstream tracking).

Also documents the worktree-based parallel-AI workflow lesson learned
the hard way (3× commit cross-contamination during today's session
before switching to per-AI worktrees + SHA-to-ref push escape valve).

Pure docs change. Tomorrow's work has a clean baseline.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Test <test@test.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>