feat(gpu): add ROCm / DirectML / OpenVINO ORT EP cfg branches#1001
Merged
Conversation
Per Joel's "OOTB on all architectures from Docker" + "5090 Windows box available later." Extends the ORT GPU EP coverage from #985 (Mac/CUDA only) to the full Carl-OOTB matrix:

- `--features rocm` → AMD GPU (Linux). ROCmExecutionProvider.
- `--features directml` → Windows-native, any DX12 GPU (Nvidia/AMD/Intel).
- `--features openvino` → Intel CPU/GPU/VPU (Linux + Windows).

Each is a cfg-gated branch in `build_ort_gpu_execution_providers()`. The no-GPU-EP-configured error message now lists all 5 features, so a contributor on a new arch sees the right `--features` incantation. Cargo.toml feature definitions added at lines ~199-207.

Per Joel's "GPU 100%" rule, the EPs only activate when explicitly built with the matching feature flag; no runtime CPU fallback.

Build verified: `cargo check --features metal,accelerate` is clean (the new cfg branches don't fire on this Mac, so there is no compile cost).

Validation needed on real hardware:

- BigMama or 5090 Windows box: `--features cuda` + `--features directml`
- Linux+AMD box (when available): `--features rocm`
- Intel-Arc Linux box (rarer): `--features openvino`

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
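The coverage matrix above amounts to one `--features` incantation per target. A minimal shell sketch of that mapping (the target names and the `features_for_target` helper are illustrative only, not part of the codebase; the feature names are the ones listed in this PR):

```shell
# Hypothetical sketch: map a build target to the --features flags this PR
# describes. Target labels and function name are invented for illustration.
features_for_target() {
  case "$1" in
    mac)          echo "metal,accelerate" ;;
    linux-nvidia) echo "cuda" ;;
    linux-amd)    echo "rocm" ;;
    windows)      echo "directml" ;;
    linux-intel)  echo "openvino" ;;
    *)            echo "" ;;  # empty -> the no-GPU-EP-configured error path
  esac
}

features_for_target linux-amd   # prints: rocm
```

An empty result corresponds to the error branch in `build_ort_gpu_execution_providers()` that lists all five features.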
joelteply added a commit that referenced this pull request on May 2, 2026
… just CUDA (#1002)

* feat(gpu): add ROCm / DirectML / OpenVINO ORT EP cfg branches

  (full message as in this PR's description above)

* fix(install): cargo-features.sh detects ROCm + Vulkan + DirectML, not just CUDA

Per Joel's "OOTB on all architectures from Docker" + the ORT EP coverage added in #1001. Pre-fix, the script only mapped Mac→metal and Linux+Nvidia→cuda; ROCm was commented out, Vulkan was absent, and Windows-native was unhandled entirely.

Detection order on Linux:

1. nvidia-smi → cuda (highest priority; full ORT/llama.cpp/Candle)
2. rocminfo → rocm (AMD with ROCm runtime, full ORT EP)
3. vulkaninfo → vulkan (AMD/Intel without ROCm; llama.cpp Vulkan path; ORT EPs absent, so session create hard-fails per #985's helper, surfacing the gap clearly)
4. else: empty → continuum-core panics at startup per #998 (no CPU fallback per architectural rule)

Windows-native (MINGW/MSYS/CYGWIN):

- DirectML always (DX12 universal on Win10+)
- +CUDA if nvidia-smi present (ORT picks CUDA first, DirectML for non-CUDA-supported ops)

Tested on this Mac: still resolves to "--features metal,accelerate" (unchanged; Darwin branch).

Validation needed on real hardware:

- 5090 Windows box: should resolve to "--features cuda,directml"
- BigMama Linux+Nvidia: still "--features cuda,load-dynamic-ort" (unchanged)
- Future Linux+AMD: will resolve to "--features rocm,load-dynamic-ort"
- Future Linux+Intel-Arc with Vulkan loader: "--features vulkan,load-dynamic-ort"

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Test <test@test.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
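The four-step Linux cascade can be sketched as a small shell function. `detect_linux_features` and its mocked tool list are hypothetical stand-ins so the priority order can be exercised without real GPU tooling; the real script presumably probes for the tools with `command -v`:

```shell
# Illustrative sketch of the Linux detection cascade described above.
# $1 is a space-separated list of tools "present" on the box (a mock),
# so the priority order is testable without actual GPU runtimes installed.
detect_linux_features() {
  present="$1"
  has() { case " $present " in *" $1 "*) return 0 ;; *) return 1 ;; esac; }
  if   has nvidia-smi; then echo "cuda"     # 1. Nvidia wins outright
  elif has rocminfo;   then echo "rocm"     # 2. AMD with ROCm runtime
  elif has vulkaninfo; then echo "vulkan"   # 3. Vulkan-only fallback path
  else echo ""                              # 4. empty -> startup panic, no CPU fallback
  fi
}

detect_linux_features "rocminfo vulkaninfo"   # prints: rocm (ROCm outranks Vulkan)
```

Note the ordering is the whole point: a box with both rocminfo and vulkaninfo resolves to rocm, never vulkan.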
joelteply added a commit that referenced this pull request on May 2, 2026
…ok Air on up" (#1003)

* feat(gpu): add ROCm / DirectML / OpenVINO ORT EP cfg branches
* fix(install): cargo-features.sh detects ROCm + Vulkan + DirectML, not just CUDA

  (full messages as in #1002 above)

* feat(install): tier hardware (MBA / mid / primary) for "OOTB on MacBook Air on up"

Per Joel's "100% free OOTB on MacBook Air on up, accessible, high school computer" + "we are just trying to make a viable release candidate." Pre-fix, install.sh required 28GB physical RAM and rejected 16GB MBAs with "Get a 32GB+ M-series", which is categorically wrong for the stated MBA target.

Three tiers based on Mac physical RAM:

| Tier    | RAM     | Native budget | PERSONA_MODEL                                  |
|---------|---------|---------------|------------------------------------------------|
| MBA     | 16-23GB | 5GB           | qwen3.5-0.8b-general-forged (~500MB)           |
| mid     | 24-31GB | 8GB           | qwen3.5-2b-general-forged (~1.4GB)             |
| primary | 32GB+   | 12GB          | qwen3.5-4b-code-forged-GGUF (~2.7GB; original) |
| reject  | <16GB   | n/a           | hard-fail with actionable message              |

Previously, hardcoded NATIVE_RESERVE_MIB=12GB + DOCKER_FLOOR=10GB meant 22GB of headroom alone (28GB+ total). Now the MBA tier needs 5+6+4 = 15GB total minimum, which fits a 16GB MBA with ~1GB headroom for working-set spikes.

PERSONA_MODEL tiering uses the existing public continuum-ai org models (all gated:False per earlier audit). All three remain HF-public, so Carl never needs an HF token regardless of tier.

CONTINUUM_TIER env var is exported so future code paths (compose env, runtime feature gates for Bevy/vision/audio) can consult it. This PR doesn't yet skip the Bevy/vision pull on MBA tier; that's a follow-up once the runtime supports a chat-only mode flag.

Failure message rewritten to be actionable:

- Names the specific minimums + what each subsystem reserves
- Says "16GB MBA: chat-only OOTB works (smaller model). For 32GB+: full multimodal experience." This gives the user a sense of what they get at each tier instead of just a price-tag rejection.

Validation needed:

- 16GB MBA (when available): expect tier=MBA, install completes, chat works with 0.8B model
- 32GB M-series (Joel's M5 today): expect tier=primary, no behavior change from current (same model, same budgets)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Test <test@test.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
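The tier boundaries in the table reduce to a simple range check on physical RAM. A sketch (the `tier_for_ram_gb` name is illustrative, not necessarily what install.sh calls it; the thresholds are the ones stated above):

```shell
# Hypothetical sketch of the RAM tiering rule described in this commit.
# $1 = physical RAM in whole GB; echoes the tier name.
tier_for_ram_gb() {
  if   [ "$1" -ge 32 ]; then echo "primary"  # 32GB+: full multimodal, 12GB native budget
  elif [ "$1" -ge 24 ]; then echo "mid"      # 24-31GB: 8GB native budget
  elif [ "$1" -ge 16 ]; then echo "MBA"      # 16-23GB: chat-only OOTB, 5GB native budget
  else                       echo "reject"   # <16GB: hard-fail with actionable message
  fi
}

tier_for_ram_gb 16   # prints: MBA
```

Checking from the top down keeps the boundaries exclusive without needing explicit upper bounds on each range.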
joelteply added a commit that referenced this pull request on May 2, 2026
…us (#1004)

* feat(gpu): add ROCm / DirectML / OpenVINO ORT EP cfg branches
* fix(install): cargo-features.sh detects ROCm + Vulkan + DirectML, not just CUDA
* feat(install): tier hardware (MBA / mid / primary) for "OOTB on MacBook Air on up"

  (full messages as in #1002 and #1003 above)

* docs(gap-analysis): catalogue today's 23-PR Carl-OOTB push + chain status

End-of-day snapshot: 23 PRs landed today targeting "100% free OOTB on MacBook Air on up, install→chat with AI flawlessly" (Joel). Lists each PR + the Carl-OOTB chain status post-push, with explicit callouts for what's known broken / unfixed (#980 Bug 9 leak, which needs live RCA; #75 echo loops dev-tab scope; NEW-A upstream tracking).

Also documents the worktree-based parallel-AI workflow lesson learned the hard way (3× commit cross-contamination during today's session before switching to per-AI worktrees + SHA-to-ref push escape valve).

Pure docs change. Tomorrow's work has a clean baseline.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Test <test@test.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Joel: "OOTB on all architectures from Docker." Extends #985 ORT EP coverage from Mac/CUDA-only to ROCm (Linux+AMD), DirectML (Windows-native DX12), OpenVINO (Intel). cfg-gated; only fires when explicitly built with the matching feature.
🤖 Generated with Claude Code