fix(gpu): hard-fail on no-GPU instead of silent CPU 25%-RAM fallback #998

Merged
joelteply merged 1 commit into canary from mac/gpu-fallback-removal-memory-manager on May 2, 2026

Conversation

@joelteply
Contributor

Joel "GPU 100%" rule applied to memory_manager.rs. Pre-fix: silent CPU degrade with fake "CPU (no GPU)" device. Post-fix: panic with same actionable diagnostic message install.sh uses on IC_GPU_PATH=unsupported.

🤖 Generated with Claude Code

Per Joel's architectural rule "lack of GPU integration is forbidden,
GPU acceleration in all cases" (#964 series, GPU-fallback audit).

memory_manager.rs's detect_gpu() chained Metal → CUDA → CPU fallbacks;
the CPU arm returned a budget of 25% of system RAM under the device
name "CPU (no GPU)". That is exactly the silent-degrade vector the
rule forbids — continuum-core would silently start with a fake "GPU"
budget carved from system RAM, then run inference on CPU through
whatever path picked it up.

Fix: panic with the same actionable message install.sh's
`IC_GPU_PATH=unsupported` branch uses — name supported paths, point
at diagnostic commands per platform, link to the issue tracker.

Removed:
  - CPU_FALLBACK_RAM_PCT constant (only consumer was the deleted fn)
  - detect_cpu_fallback() function
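
A minimal sketch of the post-fix shape, assuming GpuInfo and the
detect_* signatures (only the Metal → CUDA → panic ordering and the
spirit of the diagnostic are confirmed by this PR):

```rust
// Sketch only: GpuInfo and the detect_* signatures are assumptions;
// the Metal -> CUDA -> panic ordering is what this PR establishes.
pub struct GpuInfo {
    pub device_name: String,
    pub budget_bytes: u64,
}

#[cfg(feature = "metal")]
fn detect_metal() -> Option<GpuInfo> { None /* Metal probe elided */ }

#[cfg(feature = "cuda")]
fn detect_cuda() -> Option<GpuInfo> { None /* nvidia-smi probe elided */ }

pub fn detect_gpu() -> GpuInfo {
    #[cfg(feature = "metal")]
    if let Some(gpu) = detect_metal() {
        return gpu;
    }
    #[cfg(feature = "cuda")]
    if let Some(gpu) = detect_cuda() {
        return gpu;
    }
    // No CPU fallback: fail loudly with the same actionable guidance
    // install.sh emits for IC_GPU_PATH=unsupported (paraphrased here).
    panic!(
        "no supported GPU found; supported paths are Metal (macOS) and \
         CUDA (Linux, --features cuda). Diagnose with `system_profiler \
         SPDisplaysDataType` (macOS) or `nvidia-smi` (Linux), or file \
         an issue on the tracker"
    );
}
```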

Behaviour delta:
  - macOS without Metal-capable GPU: previously silent 25%-RAM "GPU";
    now panics with diagnostic
  - Linux without CUDA-capable GPU + no --features cuda: same
  - Mac with Metal: unchanged (detect_metal returns Some)
  - Linux with --features cuda + working nvidia-smi: unchanged
    (detect_cuda returns Some)

Test (cargo check --features metal,accelerate): clean.

Out of scope (next PRs in series):
  - persona/allocator.rs:165 — explicit "cpu" GPU-type branch
  - ROCm / Vulkan / OpenVINO EP coverage in inference/ort_providers.rs

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
joelteply merged commit 4a192f4 into canary on May 2, 2026
3 checks passed
joelteply deleted the mac/gpu-fallback-removal-memory-manager branch on May 2, 2026 at 02:28
joelteply added a commit that referenced this pull request May 2, 2026
…gpu_type (#999)

Per #964-series GPU-fallback audit + Joel's "lack of GPU integration is
forbidden" rule. PR #998 made memory_manager::detect_gpu() panic when
no GPU is found, so a "cpu" gpu_name can never reach detect_gpu_type
in production. Removing the branch cleans up the dead path.

If somehow a "cpu" gpu_name still arrives (e.g. a test stub), it now
falls back to the OS-default GPU type ("metal" on Mac, "cuda" on
Linux) — a best-guess that lets the caller proceed against a real GPU
subsystem rather than configuring a non-existent "cpu" subsystem that
no inference path actually serves.
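
A hedged sketch of the resulting function (the vendor-matching arms
are illustrative; only the cfg-gated OS-default tail is what this
commit adds):

```rust
// Sketch: the vendor-matching arms are assumptions about the existing
// allocator.rs logic; the OS-default tail is the new behaviour.
pub fn detect_gpu_type(gpu_name: &str) -> &'static str {
    let name = gpu_name.to_lowercase();
    if name.contains("nvidia") {
        "cuda"
    } else if name.contains("apple") {
        "metal"
    } else if cfg!(target_os = "macos") {
        // "cpu" (or anything unrecognised) lands here: best-guess the
        // platform's real GPU subsystem instead of a dead "cpu" one.
        "metal"
    } else {
        "cuda"
    }
}
```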

Test updated:
  - assert_eq!(detect_gpu_type("CPU"), "cpu") removed
  - replaced with cfg-gated assertions matching new OS-default behaviour
  - real GPU detections (NVIDIA, Apple M-series) unchanged

cargo test --features metal,accelerate --lib persona::allocator::tests::test_detect_gpu_type: PASS.
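
The replacement assertions, roughly (the exact test body is an
assumption):

```rust
#[test]
fn test_detect_gpu_type() {
    // Real-GPU detections unchanged.
    assert_eq!(detect_gpu_type("NVIDIA GeForce RTX 5090"), "cuda");
    assert_eq!(detect_gpu_type("Apple M5"), "metal");
    // "cpu" input now resolves to the OS default.
    #[cfg(target_os = "macos")]
    assert_eq!(detect_gpu_type("CPU"), "metal");
    #[cfg(target_os = "linux")]
    assert_eq!(detect_gpu_type("CPU"), "cuda");
}
```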

Co-authored-by: Test <test@test.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
joelteply added a commit that referenced this pull request May 2, 2026
… just CUDA (#1002)

* feat(gpu): add ROCm / DirectML / OpenVINO ORT EP cfg branches

Per Joel's "OOTB on all architectures from Docker" + "5090 Windows box
available later." Extends the ORT GPU EP coverage from #985 (Mac/CUDA
only) to the full Carl-OOTB matrix:

  --features rocm     → AMD GPU (Linux). ROCmExecutionProvider.
  --features directml → Windows-native, any DX12 GPU (Nvidia/AMD/Intel).
  --features openvino → Intel CPU/GPU/VPU (Linux + Windows).

Each is a cfg-gated branch in build_ort_gpu_execution_providers(). The
no-GPU-EP-configured error message now lists all 5 features so a
contributor on a new arch sees the right --features incantation.

Cargo.toml feature definitions added at lines ~199-207. Per Joel's
"GPU 100%" rule the EPs only activate when explicitly built with the
matching feature flag — no runtime CPU fallback.
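
A pattern-level sketch of the gating (EP construction is elided
because the concrete ort-crate API is version-dependent; the function
name here is hypothetical, standing in for
build_ort_gpu_execution_providers()):

```rust
// Illustrative shape only: strings stand in for real ort
// ExecutionProvider values.
fn gpu_execution_provider_names() -> Result<Vec<&'static str>, String> {
    let mut eps = Vec::new();
    #[cfg(feature = "metal")]
    eps.push("CoreML/Metal EP");
    #[cfg(feature = "cuda")]
    eps.push("CUDA EP");
    #[cfg(feature = "rocm")]
    eps.push("ROCm EP");
    #[cfg(feature = "directml")]
    eps.push("DirectML EP");
    #[cfg(feature = "openvino")]
    eps.push("OpenVINO EP");
    if eps.is_empty() {
        // Names all 5 features so a new-arch contributor sees the
        // right --features incantation; no runtime CPU fallback.
        return Err("no GPU execution provider configured; rebuild with \
                    one of --features metal|cuda|rocm|directml|openvino"
            .into());
    }
    Ok(eps)
}
```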

Build verified: cargo check --features metal,accelerate clean (the
new cfg branches don't fire on this Mac, no compile cost).

Validation needed on real hardware:
  - BigMama or 5090 Windows box: --features cuda + --features directml
  - Linux+AMD box (when available): --features rocm
  - Intel-Arc Linux box (rarer): --features openvino

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(install): cargo-features.sh detects ROCm + Vulkan + DirectML, not just CUDA

Per Joel's "OOTB on all architectures from Docker" + the ORT EP
coverage added in #1001. Pre-fix the script only mapped Mac→metal +
Linux+Nvidia→cuda; ROCm was commented out, Vulkan absent, and
Windows-native unhandled entirely.

Detection order on Linux (mirrored in the Rust sketch below):
  1. nvidia-smi → cuda (highest priority — full ORT/llama.cpp/Candle)
  2. rocminfo  → rocm (AMD with ROCm runtime, full ORT EP)
  3. vulkaninfo → vulkan (AMD/Intel without ROCm; llama.cpp Vulkan
                  path; ORT EPs absent — will hard-fail at session
                  create per #985's helper, surfacing the gap clearly)
  4. else: empty → continuum-core panics at startup per #998 (no CPU
     fallback per architectural rule)

Windows-native (MINGW/MSYS/CYGWIN):
  - DirectML always (DX12 universal on Win10+)
  - +CUDA if nvidia-smi present (ORT picks CUDA first, DirectML for
    non-CUDA-supported ops)
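
The same decision tree, mirrored in Rust for illustration only (the
real logic is shell; have() stands in for the script's `command -v`
probe):

```rust
use std::process::Command;

// Stand-in for `command -v <tool>` in cargo-features.sh.
fn have(tool: &str) -> bool {
    Command::new("sh")
        .arg("-c")
        .arg(format!("command -v {tool} >/dev/null 2>&1"))
        .status()
        .map(|s| s.success())
        .unwrap_or(false)
}

fn gpu_features() -> &'static str {
    if cfg!(target_os = "macos") {
        return "metal,accelerate"; // Darwin branch, unchanged
    }
    if cfg!(target_os = "windows") {
        // DirectML always; CUDA added when nvidia-smi is present.
        return if have("nvidia-smi") { "cuda,directml" } else { "directml" };
    }
    // Linux priority order from the list above.
    if have("nvidia-smi") {
        "cuda,load-dynamic-ort"
    } else if have("rocminfo") {
        "rocm,load-dynamic-ort"
    } else if have("vulkaninfo") {
        "vulkan,load-dynamic-ort"
    } else {
        "" // empty: continuum-core panics at startup per #998
    }
}
```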

Tested on this Mac: still resolves to "--features metal,accelerate"
(unchanged — Darwin branch).

Validation needed on real hardware:
  - 5090 Windows box: should resolve to "--features cuda,directml"
  - BigMama Linux+Nvidia: still "--features cuda,load-dynamic-ort"
    (unchanged)
  - Future Linux+AMD: will resolve to "--features rocm,load-dynamic-ort"
  - Future Linux+Intel-Arc with Vulkan loader: "--features vulkan,
    load-dynamic-ort"

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Test <test@test.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
joelteply added a commit that referenced this pull request May 2, 2026
…ok Air on up" (#1003)

* feat(install): tier hardware (MBA / mid / primary) for "OOTB on MacBook Air on up"

Per Joel's "100% free OOTB on MacBook Air on up, accessible, high
school computer" + "we are just trying to make a viable release
candidate." Pre-fix install.sh required 28GB physical RAM and rejected
16GB MBAs with "Get a 32GB+ M-series" — categorically wrong for the
stated MBA target.

Three tiers based on Mac physical RAM:

| Tier    | RAM       | Native budget | PERSONA_MODEL                   |
|---------|-----------|---------------|---------------------------------|
| MBA     | 16-23GB   | 5GB           | qwen3.5-0.8b-general-forged (~500MB) |
| mid     | 24-31GB   | 8GB           | qwen3.5-2b-general-forged (~1.4GB)  |
| primary | 32GB+     | 12GB          | qwen3.5-4b-code-forged-GGUF (~2.7GB; original) |
| reject  | <16GB     | n/a           | hard-fail with actionable message |
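
The same cut as a sketch (thresholds and model names come from the
table above; the function itself is hypothetical, since the real
logic lives in install.sh):

```rust
// Hypothetical Rust rendering of install.sh's tier cut.
// Returns (tier, native budget in GB, PERSONA_MODEL).
fn tier_for_ram_gb(ram_gb: u64) -> Result<(&'static str, u64, &'static str), &'static str> {
    match ram_gb {
        0..=15 => Err("below 16GB: hard-fail with actionable message"),
        16..=23 => Ok(("MBA", 5, "qwen3.5-0.8b-general-forged")),
        24..=31 => Ok(("mid", 8, "qwen3.5-2b-general-forged")),
        _ => Ok(("primary", 12, "qwen3.5-4b-code-forged-GGUF")),
    }
}
```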

Previously hardcoded NATIVE_RESERVE_MIB=12GB + DOCKER_FLOOR=10GB =
22GB headroom alone (28GB+ total). Now MBA tier needs 5+6+4 = 15GB
total minimum, which fits a 16GB MBA with ~1GB headroom for working
set spikes.

PERSONA_MODEL tiering uses the existing public continuum-ai org models
(all gated:False per earlier audit). All three remain HF-public so
Carl never needs an HF token regardless of tier.

CONTINUUM_TIER env var is exported so future code paths (compose env,
runtime feature gates for Bevy/vision/audio) can consult it. This PR
doesn't yet skip Bevy/vision pull on MBA tier — that's a follow-up
once the runtime supports a chat-only mode flag.
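
A hypothetical future consumer, for illustration only (per this
commit, nothing reads the variable yet):

```rust
// Hypothetical: gate heavyweight subsystems (Bevy/vision/audio) on
// the tier exported by install.sh.
fn chat_only_mode() -> bool {
    std::env::var("CONTINUUM_TIER")
        .map(|tier| tier == "MBA")
        .unwrap_or(false)
}
```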

Failure message rewritten to be actionable:
  - Names the specific minimums + what each subsystem reserves
  - Says "16GB MBA: chat-only OOTB works (smaller model). For 32GB+:
    full multimodal experience." — gives the user a sense of what
    they get at each tier instead of just a price-tag rejection.

Validation needed:
  - 16GB MBA (when available): expect tier=MBA, install completes,
    chat works with 0.8B model
  - 32GB M-series (Joel's M5 today): expect tier=primary, no
    behavior change from current (same model, same budgets)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Test <test@test.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
joelteply added a commit that referenced this pull request May 2, 2026
…us (#1004)

* docs(gap-analysis): catalogue today's 23-PR Carl-OOTB push + chain status

End-of-day snapshot: 23 PRs landed today targeting "100% free OOTB
on MacBook Air on up, install→chat with AI flawlessly" (Joel). Lists
each PR + the Carl-OOTB chain status post-push, with explicit callouts
for what's known broken / unfixed (#980 Bug 9 leak — needs live RCA;
#75 echo loops dev-tab scope; NEW-A upstream tracking).

Also documents the worktree-based parallel-AI workflow lesson learned
the hard way (3× commit cross-contamination during today's session
before switching to per-AI worktrees + SHA-to-ref push escape valve).

Pure docs change. Tomorrow's work has a clean baseline.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Test <test@test.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>