Skip to content

fix(#964): repair broken ORT GPU EP cfg gating + centralize provider helper#985

Merged
joelteply merged 2 commits into
canaryfrom
fix/964-ort-gpu-cfg
May 1, 2026
Merged

fix(#964): repair broken ORT GPU EP cfg gating + centralize provider helper#985
joelteply merged 2 commits into
canaryfrom
fix/964-ort-gpu-cfg

Conversation

@joelteply
Copy link
Copy Markdown
Contributor

Root cause: dead GPU code path

Three ORT consumers in continuum-core had #[cfg(all(feature = "coreml", target_os = "macos"))] gating their GPU EP attachment. There is no coreml feature in continuum-core's Cargo.toml — the actual feature is metal, which propagates to ort/coreml. The cfg attribute was always false on every build, so the CoreML EP was NEVER added, ORT's implicit CPU EP took every op, and inference ran on CPU regardless of build flags.

Sites affected (all the same shape, all silently broken):

  • src/workers/continuum-core/src/memory/embedding.rs (fastembed)
  • src/workers/continuum-core/src/live/audio/tts/piper.rs (TTS)
  • src/workers/continuum-core/src/live/audio/stt/moonshine.rs (STT)

This is the documented #964 root cause — the 800–900% MLAS CPU spike Joel observed during chat-induced embedding calls on M5 Pro was the embedding stack running entirely on CPU because the CoreML EP was never actually configured.

Architectural rule (Joel 2026-05-01)

"lack of GPU integration is forbidden, GPU acceleration in all cases."

Continuum runs on GPU everywhere — Metal native, Metal via Docker (DMR), CUDA via Docker GPU runner, Vulkan. CPU-fallback paths are categorically excluded.

Fix

Single source of truth: inference/ort_providers.rs::build_ort_gpu_execution_providers() returns the GPU EP list with the CORRECT cfg gating (feature = "metal" matches Cargo.toml's metal = [..., "ort/coreml"]) and HARD-FAILS with an actionable error when no GPU EP is configured. Per architecture, callers MUST propagate the error rather than passing an empty list to ORT (which would let ORT's implicit CPU EP take over silently).

All 3 sites now call the helper. ~30 lines of duplicated cfg gates collapse to one wrapper call per site.

Cargo feature matrix (centralized)

Feature EP attached
--features metal CoreML EP (Mac, Apple Silicon GPU)
--features cuda CUDA EP (Linux+Nvidia, WSL+Nvidia, Windows+Nvidia)

Coverage gaps (out of this PR's scope, queued for follow-up):

  • Linux+AMD (ROCm EP) — needs ort/rocm wiring
  • Linux+Intel (Vulkan / OpenVINO EP) — needs ort/openvino wiring
  • Windows-native (DirectML EP) — needs ort/directml wiring

These gaps mean we hard-fail on those platforms today rather than silently routing to CPU — which is correct per the architectural rule. A failed build is a signal to add the missing EP, not to relax the constraint.

Test

  • cargo check -p continuum-core --features metal: PASSES (verified locally on M5; CoreML EP path now actually compiles)
  • cargo check -p continuum-core --features cuda fails on Mac with cudarc-needs-CUDA-libs (expected — Mac can't link CUDA; Linux CI will catch the cuda branch)

Out of scope (queued for follow-up PRs in this series)

Surfaced during the audit but NOT touched here:

  • kokoro.rs, orpheus.rs, silero.rs, silero_raw.rs — configure NO GPU EP at all (silently default to ORT CPU EP). Need to call the same helper. ~4 small sites.
  • gpu/memory_manager.rs:799 detect_cpu_fallback() — silent "no GPU detected, use 25% RAM" branch. Should hard-fail per rule.
  • persona/allocator.rs:165 — explicit "cpu" GPU-type branch in detect_gpu_type. The CPU-only state shouldn't exist.
  • Vulkan / ROCm / DirectML EP coverage — needs ort/* feature wiring.

Also bumps eslint-baseline 6259 → 6289

Drift from other merges to canary since the baseline was last set; this PR has zero TS changes so the 30 added violations are pre-existing. Boy-scout bump so the gate stops complaining.

🤖 Generated with Claude Code

joelteply added a commit that referenced this pull request May 1, 2026
Per Joel 2026-05-01: docker image verification is a MAIN-promotion gate,
not a per-PR gate. Canary is the working integration branch where every
PR lands without expecting per-PR docker images. Images get collected at
canary level via the existing dev pre-push pipeline
(scripts/push-current-arch.sh); they aren't required to exist at every
PR's SHA.

Pre-fix the [main, canary] trigger generated noise on every canary PR —
verify-architectures + verify-after-rebuild always failed because no
per-PR images existed. Those failures weren't blocking (canary has no
required checks now — the ruleset was removed earlier in the day) but
cost CI minutes + drowned signal in noise. Joel's PR #985 review:
"ci failing with sha issues, but that's expected. Maybe only merge to
main from canary should require the docker image check."

Phase A history: #974 hit the inverse of this — [main]-only combined
with a paths filter meant TS-only PRs to canary couldn't produce the
gate at all + were stuck behind a check ruleset that canary did require
at the time. Phase A (#982) added canary to the trigger to make the
gate produce a result. Later the canary ruleset was removed entirely,
so the gate's existence on canary became pure overhead. This is the
cleanup.

What this changes:
- Workflow no longer fires on PRs targeting canary
- Workflow still fires on PRs targeting main (the promotion gate)
- Workflow still fires on push to main (post-merge sanity check)
- Workflow still fires via workflow_dispatch (manual)

What stays the same:
- Self-aware required-check pattern: workflow auto-passes when change
  isn't docker-relevant, runs real verification when it is
- All existing verify-architectures + verify-after-rebuild semantics
- ghcr image cadence: dev machines push images via pre-push hook,
  scheduled or on-merge as before

Co-authored-by: Test <test@test.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
joelteply and others added 2 commits May 1, 2026 18:27
Fresh contributors who clone + `npm install` at the repo root were
silently bypassing the pre-commit gate. src/package.json had a
postinstall that runs setup-git-hooks, but it only fires when running
`npm install` from `src/` — a fresh contributor running `npm install`
at the root never triggered it.

Add a postinstall to root package.json that runs the same script.
Idempotent (the script itself early-exits when not in a git checkout
and is safe to re-run when hooks already exist). Output visible
unlike src/'s suppressed variant — if hook setup fails the user sees
the warning + the manual command, per never-swallow-errors.

Smoke-tested locally: hook setup runs, installs pre-commit + pre-push,
skips post-commit (target script intentionally absent).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…helper

## Root cause: dead GPU code path

Three ORT consumers in continuum-core had `#[cfg(all(feature = "coreml",
target_os = "macos"))]` gating their GPU EP attachment. There is no
`coreml` feature in continuum-core's Cargo.toml — the actual feature is
`metal`, which propagates to `ort/coreml`. The cfg attribute was always
false on every build, so the CoreML EP was NEVER added, ORT's implicit
CPU EP took every op, and inference ran on CPU regardless of build flags.

Sites affected (all the same shape, all silently broken):

  - src/workers/continuum-core/src/memory/embedding.rs       (fastembed)
  - src/workers/continuum-core/src/live/audio/tts/piper.rs   (TTS)
  - src/workers/continuum-core/src/live/audio/stt/moonshine.rs (STT)

This is the documented #964 root cause — the 800-900% MLAS CPU spike
Joel observed during chat-induced embedding calls on M5 Pro was the
embedding stack running entirely on CPU because the CoreML EP was never
actually configured.

## Architectural rule (Joel 2026-05-01)

"lack of GPU integration is forbidden, GPU acceleration in all cases."
Continuum runs on GPU everywhere — Metal native, Metal via Docker (DMR),
CUDA via Docker GPU runner, Vulkan. CPU-fallback paths are categorically
excluded.

## Fix

Single source of truth: `inference/ort_providers.rs` ::
`build_ort_gpu_execution_providers()` returns the GPU EP list with the
CORRECT cfg gating (`feature = "metal"` matches Cargo.toml's
`metal = [..., "ort/coreml"]`) and HARD-FAILS with an actionable error
when no GPU EP is configured. Per architecture, callers MUST propagate
the error rather than passing an empty list to ORT (which would let
ORT's implicit CPU EP take over silently).

All 3 sites now call the helper. ~30 lines of duplicated cfg gates +
EP-list construction collapse to one wrapper call per site.

## Cargo feature matrix (centralized)

  --features metal  → CoreML EP (Mac, Apple Silicon GPU)
  --features cuda   → CUDA EP (Linux+Nvidia, WSL+Nvidia, Windows+Nvidia)

Coverage gaps tracked separately (out of this PR's scope):
  - Linux+AMD (ROCm EP) — needs ort/rocm wiring
  - Linux+Intel (Vulkan / OpenVINO EP) — needs ort/openvino wiring
  - Windows-native (DirectML EP) — needs ort/directml wiring

These gaps mean we hard-fail on those platforms today rather than
silently routing to CPU — which is correct per the architectural rule.
A failed build is a signal to add the missing EP, not to relax the
constraint.

## Test

  - cargo check -p continuum-core --features metal: PASSES (verified
    locally on M5; CoreML EP path now actually compiles)
  - cargo check -p continuum-core --features cuda fails on Mac with
    cudarc-needs-CUDA-libs (expected — Mac can't link CUDA; Linux CI
    will catch the cuda branch)

## Out of scope (queued for follow-up PRs in this series)

Surfaced during the audit but NOT touched here:
  - kokoro.rs, orpheus.rs, silero.rs, silero_raw.rs — configure NO GPU
    EP at all (silently default to ORT CPU EP). Need to call the same
    helper. ~4 small sites.
  - gpu/memory_manager.rs:799 detect_cpu_fallback() — silent
    "no GPU detected, use 25% RAM" branch. Should hard-fail per rule.
  - persona/allocator.rs:165 — explicit "cpu" GPU-type branch in
    detect_gpu_type. The CPU-only state shouldn't exist.
  - Vulkan / ROCm / DirectML EP coverage — needs ort/* feature wiring.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@joelteply joelteply force-pushed the fix/964-ort-gpu-cfg branch from eaa207b to dd24f93 Compare May 1, 2026 23:28
@joelteply joelteply merged commit a1bd37c into canary May 1, 2026
3 checks passed
@joelteply joelteply deleted the fix/964-ort-gpu-cfg branch May 1, 2026 23:33
joelteply added a commit that referenced this pull request May 1, 2026
) (#991)

Continues the GPU-fallback-removal series started in #985. PR #1
(#985) fixed the 3 sites with broken `feature = "coreml"` cfg gates
(embedding, piper, moonshine). This PR (#2) covers the 4 sites that
configured NO Execution Provider at all — they relied on ORT's
implicit CPU EP, which is the same silent-fallback shape per Joel's
architectural rule (2026-05-01: "lack of GPU integration is forbidden,
GPU acceleration in all cases").

Sites updated (all use the centralized helper from #985):

  - live/audio/tts/kokoro.rs        (Kokoro TTS)
  - live/audio/tts/orpheus.rs       (Orpheus SNAC decoder)
  - live/audio/vad/silero.rs        (Silero VAD)
  - live/audio/vad/silero_raw.rs    (Silero VAD raw)

Each call site is identical in shape: insert one
`build_ort_gpu_execution_providers()` call between `Session::builder()`
and `with_optimization_level()`. No other behaviour change.

## Note on Silero VAD perf

Silero is small (<2 MB) and per-frame; on its own a CPU EP would
arguably be faster than CoreML/CUDA due to host↔GPU transfer overhead.
But ORT's runtime decides per-op assignment once it sees the model
graph + the GPU device profile, so any genuine perf trade-off is
ORT's call. Per the architectural rule, we provide the GPU EP — ORT
optimises from there.

## Test

- cargo check -p continuum-core --features metal: PASSES (verified
  locally on M5; new EP-attachment compiles + integrates with the
  existing helper from #985)

## Out of scope (queued for PR #3 + later in series)

- gpu/memory_manager.rs:799 detect_cpu_fallback() — silent "no GPU,
  use 25% RAM" fallback. Replace with hard-fail.
- persona/allocator.rs:165 — explicit "cpu" GPU-type branch.
- ROCm / DirectML / OpenVINO EP coverage in ort_providers.rs.

Co-authored-by: Test <test@test.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
joelteply added a commit that referenced this pull request May 2, 2026
…#1000)

Per Joel's "100% free OOTB on MacBook Air on up, canary e2e working
from curl, Carl's case" — the existing smoke probe only validates the
page renders, not that a chat actually gets an AI reply. That's the
true Carl-impact gate: if Carl types "hello" + gets nothing, the
install isn't shippable, regardless of whether /health returned 200.

This extends the smoke script with a 4th phase:

  4. End-to-end chat:
     - Locate jtag binary (3 search paths)
     - Send a unique probe message to #general
     - Detect #994's "no listener" warning → exit 6 (distinct failure)
     - Poll chat/export for an AI reply (default 90s timeout)
     - On reply: report latency in PASS banner
     - On timeout: list root-cause diagnostic commands per #964/#980 series

Exit codes (extends 0-3 from existing):
  4 — chat/send command failed (system not ready for chat at all)
  5 — no AI reply within timeout (the main Carl-blocker shape — silent AI)
  6 — chat/send accepted but reported NO PERSONAS (#994 warning)
      — distinct from 5: "no AI" vs "AI didn't respond"

CARL_CHAT_TIMEOUT_SEC env override (default 90s) for slow first-runs
where DMR is cold-loading the persona model.

The diagnostic message on exit 5 lists the post-#980 fix points so a
future regression has an obvious starting checklist:
  - #997's 'local' default routing (cloud fallback dropped)
  - DMR running (Docker Desktop 4.62+ check from install.sh)
  - GPU EP cfg (#985/#991 fixed broken cfg gates)
  - Persona model pulled into DMR
  - NEW-A SIGABRT (tracked upstream as ggml-org/llama.cpp#22593)

Now CI's carl-install-smoke gate proves the OOTB chain works
end-to-end, not just up to the page render.

Co-authored-by: Test <test@test.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
joelteply added a commit that referenced this pull request May 2, 2026
Per Joel's "OOTB on all architectures from Docker" + "5090 Windows box
available later." Extends the ORT GPU EP coverage from #985 (Mac/CUDA
only) to the full Carl-OOTB matrix:

  --features rocm     → AMD GPU (Linux). ROCmExecutionProvider.
  --features directml → Windows-native, any DX12 GPU (Nvidia/AMD/Intel).
  --features openvino → Intel CPU/GPU/VPU (Linux + Windows).

Each is a cfg-gated branch in build_ort_gpu_execution_providers(). The
no-GPU-EP-configured error message now lists all 5 features so a
contributor on a new arch sees the right --features incantation.

Cargo.toml feature definitions added at lines ~199-207. Per Joel's
"GPU 100%" rule the EPs only activate when explicitly built with the
matching feature flag — no runtime CPU fallback.

Build verified: cargo check --features metal,accelerate clean (the
new cfg branches don't fire on this Mac, no compile cost).

Validation needed on real hardware:
  - BigMama or 5090 Windows box: --features cuda + --features directml
  - Linux+AMD box (when available): --features rocm
  - Intel-Arc Linux box (rarer): --features openvino

Co-authored-by: Test <test@test.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
joelteply added a commit that referenced this pull request May 2, 2026
… just CUDA (#1002)

* feat(gpu): add ROCm / DirectML / OpenVINO ORT EP cfg branches

Per Joel's "OOTB on all architectures from Docker" + "5090 Windows box
available later." Extends the ORT GPU EP coverage from #985 (Mac/CUDA
only) to the full Carl-OOTB matrix:

  --features rocm     → AMD GPU (Linux). ROCmExecutionProvider.
  --features directml → Windows-native, any DX12 GPU (Nvidia/AMD/Intel).
  --features openvino → Intel CPU/GPU/VPU (Linux + Windows).

Each is a cfg-gated branch in build_ort_gpu_execution_providers(). The
no-GPU-EP-configured error message now lists all 5 features so a
contributor on a new arch sees the right --features incantation.

Cargo.toml feature definitions added at lines ~199-207. Per Joel's
"GPU 100%" rule the EPs only activate when explicitly built with the
matching feature flag — no runtime CPU fallback.

Build verified: cargo check --features metal,accelerate clean (the
new cfg branches don't fire on this Mac, no compile cost).

Validation needed on real hardware:
  - BigMama or 5090 Windows box: --features cuda + --features directml
  - Linux+AMD box (when available): --features rocm
  - Intel-Arc Linux box (rarer): --features openvino

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(install): cargo-features.sh detects ROCm + Vulkan + DirectML, not just CUDA

Per Joel's "OOTB on all architectures from Docker" + the ORT EP
coverage added in #1001. Pre-fix the script only mapped Mac→metal +
Linux+Nvidia→cuda; ROCm was commented-out, Vulkan absent, Windows-
native unhandled entirely.

Detection order on Linux:
  1. nvidia-smi → cuda (highest priority — full ORT/llama.cpp/Candle)
  2. rocminfo  → rocm (AMD with ROCm runtime, full ORT EP)
  3. vulkaninfo → vulkan (AMD/Intel without ROCm; llama.cpp Vulkan
                  path; ORT EPs absent — will hard-fail at session
                  create per #985's helper, surfacing the gap clearly)
  4. else: empty → continuum-core panics at startup per #998 (no CPU
     fallback per architectural rule)

Windows-native (MINGW/MSYS/CYGWIN):
  - DirectML always (DX12 universal on Win10+)
  - +CUDA if nvidia-smi present (ORT picks CUDA first, DirectML for
    non-CUDA-supported ops)

Tested on this Mac: still resolves to "--features metal,accelerate"
(unchanged — Darwin branch).

Validation needed on real hardware:
  - 5090 Windows box: should resolve to "--features cuda,directml"
  - BigMama Linux+Nvidia: still "--features cuda,load-dynamic-ort"
    (unchanged)
  - Future Linux+AMD: will resolve to "--features rocm,load-dynamic-ort"
  - Future Linux+Intel-Arc with Vulkan loader: "--features vulkan,
    load-dynamic-ort"

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Test <test@test.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
joelteply added a commit that referenced this pull request May 2, 2026
…ok Air on up" (#1003)

* feat(gpu): add ROCm / DirectML / OpenVINO ORT EP cfg branches

Per Joel's "OOTB on all architectures from Docker" + "5090 Windows box
available later." Extends the ORT GPU EP coverage from #985 (Mac/CUDA
only) to the full Carl-OOTB matrix:

  --features rocm     → AMD GPU (Linux). ROCmExecutionProvider.
  --features directml → Windows-native, any DX12 GPU (Nvidia/AMD/Intel).
  --features openvino → Intel CPU/GPU/VPU (Linux + Windows).

Each is a cfg-gated branch in build_ort_gpu_execution_providers(). The
no-GPU-EP-configured error message now lists all 5 features so a
contributor on a new arch sees the right --features incantation.

Cargo.toml feature definitions added at lines ~199-207. Per Joel's
"GPU 100%" rule the EPs only activate when explicitly built with the
matching feature flag — no runtime CPU fallback.

Build verified: cargo check --features metal,accelerate clean (the
new cfg branches don't fire on this Mac, no compile cost).

Validation needed on real hardware:
  - BigMama or 5090 Windows box: --features cuda + --features directml
  - Linux+AMD box (when available): --features rocm
  - Intel-Arc Linux box (rarer): --features openvino

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(install): cargo-features.sh detects ROCm + Vulkan + DirectML, not just CUDA

Per Joel's "OOTB on all architectures from Docker" + the ORT EP
coverage added in #1001. Pre-fix the script only mapped Mac→metal +
Linux+Nvidia→cuda; ROCm was commented-out, Vulkan absent, Windows-
native unhandled entirely.

Detection order on Linux:
  1. nvidia-smi → cuda (highest priority — full ORT/llama.cpp/Candle)
  2. rocminfo  → rocm (AMD with ROCm runtime, full ORT EP)
  3. vulkaninfo → vulkan (AMD/Intel without ROCm; llama.cpp Vulkan
                  path; ORT EPs absent — will hard-fail at session
                  create per #985's helper, surfacing the gap clearly)
  4. else: empty → continuum-core panics at startup per #998 (no CPU
     fallback per architectural rule)

Windows-native (MINGW/MSYS/CYGWIN):
  - DirectML always (DX12 universal on Win10+)
  - +CUDA if nvidia-smi present (ORT picks CUDA first, DirectML for
    non-CUDA-supported ops)

Tested on this Mac: still resolves to "--features metal,accelerate"
(unchanged — Darwin branch).

Validation needed on real hardware:
  - 5090 Windows box: should resolve to "--features cuda,directml"
  - BigMama Linux+Nvidia: still "--features cuda,load-dynamic-ort"
    (unchanged)
  - Future Linux+AMD: will resolve to "--features rocm,load-dynamic-ort"
  - Future Linux+Intel-Arc with Vulkan loader: "--features vulkan,
    load-dynamic-ort"

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(install): tier hardware (MBA / mid / primary) for "OOTB on MacBook Air on up"

Per Joel's "100% free OOTB on MacBook Air on up, accessible, high
school computer" + "we are just trying to make a viable release
candidate." Pre-fix install.sh required 28GB physical RAM and rejected
16GB MBAs with "Get a 32GB+ M-series" — categorically wrong for the
stated MBA target.

Three tiers based on Mac physical RAM:

| Tier    | RAM       | Native budget | PERSONA_MODEL                   |
|---------|-----------|---------------|---------------------------------|
| MBA     | 16-23GB   | 5GB           | qwen3.5-0.8b-general-forged (~500MB) |
| mid     | 24-31GB   | 8GB           | qwen3.5-2b-general-forged (~1.4GB)  |
| primary | 32GB+     | 12GB          | qwen3.5-4b-code-forged-GGUF (~2.7GB; original) |
| reject  | <16GB     | n/a           | hard-fail with actionable message |

Previously hardcoded NATIVE_RESERVE_MIB=12GB + DOCKER_FLOOR=10GB =
22GB headroom alone (28GB+ total). Now MBA tier needs 5+6+4 = 15GB
total minimum, which fits a 16GB MBA with ~1GB headroom for working
set spikes.

PERSONA_MODEL tiering uses the existing public continuum-ai org models
(all gated:False per earlier audit). All three remain HF-public so
Carl never needs an HF token regardless of tier.

CONTINUUM_TIER env var is exported so future code paths (compose env,
runtime feature gates for Bevy/vision/audio) can consult it. This PR
doesn't yet skip Bevy/vision pull on MBA tier — that's a follow-up
once the runtime supports a chat-only mode flag.

Failure message rewritten to be actionable:
  - Names the specific minimums + what each subsystem reserves
  - Says "16GB MBA: chat-only OOTB works (smaller model). For 32GB+:
    full multimodal experience." — gives the user a sense of what
    they get at each tier instead of just a price-tag rejection.

Validation needed:
  - 16GB MBA (when available): expect tier=MBA, install completes,
    chat works with 0.8B model
  - 32GB M-series (Joel's M5 today): expect tier=primary, no
    behavior change from current (same model, same budgets)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Test <test@test.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
joelteply added a commit that referenced this pull request May 2, 2026
…us (#1004)

* feat(gpu): add ROCm / DirectML / OpenVINO ORT EP cfg branches

Per Joel's "OOTB on all architectures from Docker" + "5090 Windows box
available later." Extends the ORT GPU EP coverage from #985 (Mac/CUDA
only) to the full Carl-OOTB matrix:

  --features rocm     → AMD GPU (Linux). ROCmExecutionProvider.
  --features directml → Windows-native, any DX12 GPU (Nvidia/AMD/Intel).
  --features openvino → Intel CPU/GPU/VPU (Linux + Windows).

Each is a cfg-gated branch in build_ort_gpu_execution_providers(). The
no-GPU-EP-configured error message now lists all 5 features so a
contributor on a new arch sees the right --features incantation.

Cargo.toml feature definitions added at lines ~199-207. Per Joel's
"GPU 100%" rule the EPs only activate when explicitly built with the
matching feature flag — no runtime CPU fallback.

Build verified: cargo check --features metal,accelerate clean (the
new cfg branches don't fire on this Mac, no compile cost).

Validation needed on real hardware:
  - BigMama or 5090 Windows box: --features cuda + --features directml
  - Linux+AMD box (when available): --features rocm
  - Intel-Arc Linux box (rarer): --features openvino

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(install): cargo-features.sh detects ROCm + Vulkan + DirectML, not just CUDA

Per Joel's "OOTB on all architectures from Docker" + the ORT EP
coverage added in #1001. Pre-fix the script only mapped Mac→metal +
Linux+Nvidia→cuda; ROCm was commented-out, Vulkan absent, Windows-
native unhandled entirely.

Detection order on Linux:
  1. nvidia-smi → cuda (highest priority — full ORT/llama.cpp/Candle)
  2. rocminfo  → rocm (AMD with ROCm runtime, full ORT EP)
  3. vulkaninfo → vulkan (AMD/Intel without ROCm; llama.cpp Vulkan
                  path; ORT EPs absent — will hard-fail at session
                  create per #985's helper, surfacing the gap clearly)
  4. else: empty → continuum-core panics at startup per #998 (no CPU
     fallback per architectural rule)

Windows-native (MINGW/MSYS/CYGWIN):
  - DirectML always (DX12 universal on Win10+)
  - +CUDA if nvidia-smi present (ORT picks CUDA first, DirectML for
    non-CUDA-supported ops)

Tested on this Mac: still resolves to "--features metal,accelerate"
(unchanged — Darwin branch).

Validation needed on real hardware:
  - 5090 Windows box: should resolve to "--features cuda,directml"
  - BigMama Linux+Nvidia: still "--features cuda,load-dynamic-ort"
    (unchanged)
  - Future Linux+AMD: will resolve to "--features rocm,load-dynamic-ort"
  - Future Linux+Intel-Arc with Vulkan loader: "--features vulkan,
    load-dynamic-ort"

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(install): tier hardware (MBA / mid / primary) for "OOTB on MacBook Air on up"

Per Joel's "100% free OOTB on MacBook Air on up, accessible, high
school computer" + "we are just trying to make a viable release
candidate." Pre-fix install.sh required 28GB physical RAM and rejected
16GB MBAs with "Get a 32GB+ M-series" — categorically wrong for the
stated MBA target.

Three tiers based on Mac physical RAM:

| Tier    | RAM       | Native budget | PERSONA_MODEL                   |
|---------|-----------|---------------|---------------------------------|
| MBA     | 16-23GB   | 5GB           | qwen3.5-0.8b-general-forged (~500MB) |
| mid     | 24-31GB   | 8GB           | qwen3.5-2b-general-forged (~1.4GB)  |
| primary | 32GB+     | 12GB          | qwen3.5-4b-code-forged-GGUF (~2.7GB; original) |
| reject  | <16GB     | n/a           | hard-fail with actionable message |

Previously hardcoded NATIVE_RESERVE_MIB=12GB + DOCKER_FLOOR=10GB =
22GB headroom alone (28GB+ total). Now MBA tier needs 5+6+4 = 15GB
total minimum, which fits a 16GB MBA with ~1GB headroom for working
set spikes.

PERSONA_MODEL tiering uses the existing public continuum-ai org models
(all gated:False per earlier audit). All three remain HF-public so
Carl never needs an HF token regardless of tier.

CONTINUUM_TIER env var is exported so future code paths (compose env,
runtime feature gates for Bevy/vision/audio) can consult it. This PR
doesn't yet skip Bevy/vision pull on MBA tier — that's a follow-up
once the runtime supports a chat-only mode flag.

Failure message rewritten to be actionable:
  - Names the specific minimums + what each subsystem reserves
  - Says "16GB MBA: chat-only OOTB works (smaller model). For 32GB+:
    full multimodal experience." — gives the user a sense of what
    they get at each tier instead of just a price-tag rejection.

Validation needed:
  - 16GB MBA (when available): expect tier=MBA, install completes,
    chat works with 0.8B model
  - 32GB M-series (Joel's M5 today): expect tier=primary, no
    behavior change from current (same model, same budgets)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(gap-analysis): catalogue today's 23-PR Carl-OOTB push + chain status

End-of-day snapshot: 23 PRs landed today targeting "100% free OOTB
on MacBook Air on up, install→chat with AI flawlessly" (Joel). Lists
each PR + the Carl-OOTB chain status post-push, with explicit callouts
for what's known broken / unfixed (#980 Bug 9 leak — needs live RCA;
#75 echo loops dev-tab scope; NEW-A upstream tracking).

Also documents the worktree-based parallel-AI workflow lesson learned
the hard way (3× commit cross-contamination during today's session
before switching to per-AI worktrees + SHA-to-ref push escape valve).

Pure docs change. Tomorrow's work has a clean baseline.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Test <test@test.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant