docs(gap-analysis): catalogue today's 23-PR Carl-OOTB push + chain status #1004
Merged
Per Joel's "OOTB on all architectures from Docker" + "5090 Windows box available later." Extends the ORT GPU EP coverage from #985 (Mac/CUDA only) to the full Carl-OOTB matrix:
- --features rocm → AMD GPU (Linux), ROCmExecutionProvider
- --features directml → Windows-native, any DX12 GPU (Nvidia/AMD/Intel)
- --features openvino → Intel CPU/GPU/VPU (Linux + Windows)
Each is a cfg-gated branch in build_ort_gpu_execution_providers(). The no-GPU-EP-configured error message now lists all 5 features so a contributor on a new arch sees the right --features incantation. Cargo.toml feature definitions added at lines ~199-207.
Per Joel's "GPU 100%" rule the EPs only activate when explicitly built with the matching feature flag — no runtime CPU fallback.
Build verified: cargo check --features metal,accelerate is clean (the new cfg branches don't fire on this Mac, so there is no compile cost).
Validation needed on real hardware:
- BigMama or 5090 Windows box: --features cuda + --features directml
- Linux+AMD box (when available): --features rocm
- Intel-Arc Linux box (rarer): --features openvino
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
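The cfg-gated branching described above can be sketched as follows. This is a minimal illustration, not the real build_ort_gpu_execution_providers(): the actual function returns ort ExecutionProvider values rather than strings, and the metal→CoreML pairing shown here is an assumption.

```rust
// Sketch of cfg-gated EP selection. Each branch compiles in only when
// its Cargo feature is enabled; with no GPU feature, the function
// errors instead of silently falling back to CPU ("GPU 100%" rule).
fn build_ort_gpu_execution_providers() -> Result<Vec<&'static str>, String> {
    let mut eps: Vec<&'static str> = Vec::new();
    #[cfg(feature = "cuda")]
    eps.push("CUDAExecutionProvider");
    #[cfg(feature = "metal")]
    eps.push("CoreMLExecutionProvider"); // assumed Mac mapping
    #[cfg(feature = "rocm")]
    eps.push("ROCmExecutionProvider");
    #[cfg(feature = "directml")]
    eps.push("DmlExecutionProvider");
    #[cfg(feature = "openvino")]
    eps.push("OpenVINOExecutionProvider");
    if eps.is_empty() {
        // The error names every supported feature so a contributor on a
        // new arch sees the right --features incantation.
        return Err(
            "no GPU execution provider configured; rebuild with one of \
             --features cuda|metal|rocm|directml|openvino"
                .to_string(),
        );
    }
    Ok(eps)
}
```

Built with no GPU feature (as in a plain cargo check), the list stays empty and the caller gets the actionable error instead of a CPU session.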
… just CUDA
Per Joel's "OOTB on all architectures from Docker" + the ORT EP coverage added in #1001. Pre-fix, the script only mapped Mac→metal and Linux+Nvidia→cuda; ROCm was commented out, Vulkan absent, Windows-native unhandled entirely.
Detection order on Linux:
1. nvidia-smi → cuda (highest priority — full ORT/llama.cpp/Candle)
2. rocminfo → rocm (AMD with ROCm runtime, full ORT EP)
3. vulkaninfo → vulkan (AMD/Intel without ROCm; llama.cpp Vulkan path; ORT EPs absent — will hard-fail at session create per #985's helper, surfacing the gap clearly)
4. else: empty → continuum-core panics at startup per #998 (no CPU fallback per architectural rule)
Windows-native (MINGW/MSYS/CYGWIN):
- DirectML always (DX12 is universal on Win10+)
- +CUDA if nvidia-smi present (ORT picks CUDA first, DirectML for non-CUDA-supported ops)
Tested on this Mac: still resolves to "--features metal,accelerate" (unchanged — Darwin branch).
Validation needed on real hardware:
- 5090 Windows box: should resolve to "--features cuda,directml"
- BigMama Linux+Nvidia: still "--features cuda,load-dynamic-ort" (unchanged)
- Future Linux+AMD: will resolve to "--features rocm,load-dynamic-ort"
- Future Linux+Intel-Arc with Vulkan loader: "--features vulkan,load-dynamic-ort"
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
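The Linux detection order above can be sketched as a pure function. The real implementation is shell probing for tools on PATH; `linux_feature_flags` and the `has` predicate are illustrative names used so the precedence is testable in isolation.

```rust
// Sketch of the Linux branch of GPU detection. `has` answers whether a
// probe tool (nvidia-smi, rocminfo, vulkaninfo) is available; the first
// match wins, mirroring the priority order in the script.
fn linux_feature_flags(has: impl Fn(&str) -> bool) -> Option<&'static str> {
    if has("nvidia-smi") {
        Some("cuda,load-dynamic-ort") // full ORT/llama.cpp/Candle
    } else if has("rocminfo") {
        Some("rocm,load-dynamic-ort") // AMD with ROCm runtime, full ORT EP
    } else if has("vulkaninfo") {
        Some("vulkan,load-dynamic-ort") // llama.cpp Vulkan; no ORT EP
    } else {
        None // continuum-core panics at startup: no CPU fallback
    }
}
```

Note the precedence: a box with both nvidia-smi and rocminfo resolves to cuda, because CUDA has the fullest EP coverage.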
…ok Air on up"
Per Joel's "100% free OOTB on MacBook Air on up, accessible, high
school computer" + "we are just trying to make a viable release
candidate." Pre-fix install.sh required 28GB physical RAM and rejected
16GB MBAs with "Get a 32GB+ M-series" — categorically wrong for the
stated MBA target.
Three tiers based on Mac physical RAM:
| Tier | RAM | Native budget | PERSONA_MODEL |
|---------|-----------|---------------|---------------------------------|
| MBA | 16-23GB | 5GB | qwen3.5-0.8b-general-forged (~500MB) |
| mid | 24-31GB | 8GB | qwen3.5-2b-general-forged (~1.4GB) |
| primary | 32GB+ | 12GB | qwen3.5-4b-code-forged-GGUF (~2.7GB; original) |
| reject | <16GB | n/a | hard-fail with actionable message |
Previously hardcoded NATIVE_RESERVE_MIB=12GB + DOCKER_FLOOR=10GB =
22GB headroom alone (28GB+ total). Now MBA tier needs 5+6+4 = 15GB
total minimum, which fits a 16GB MBA with ~1GB headroom for working
set spikes.
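The tier cutoffs from the table can be sketched as a small selection function. The source of truth is shell in install.sh; `select_tier` is a hypothetical name, and the sub-16GB error text here is abbreviated rather than the full rewritten failure message.

```rust
// Sketch of RAM-based tier selection per the table above.
// Returns (tier name, PERSONA_MODEL) or an error below the 16GB floor.
fn select_tier(physical_ram_gb: u32) -> Result<(&'static str, &'static str), &'static str> {
    match physical_ram_gb {
        0..=15 => Err("below the 16GB minimum; see install.sh for per-subsystem reserves"),
        16..=23 => Ok(("MBA", "qwen3.5-0.8b-general-forged")),
        24..=31 => Ok(("mid", "qwen3.5-2b-general-forged")),
        _ => Ok(("primary", "qwen3.5-4b-code-forged-GGUF")),
    }
}
```

A 16GB MBA lands in the MBA tier with the ~500MB 0.8B model; 32GB+ keeps today's behavior unchanged.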
PERSONA_MODEL tiering uses the existing public continuum-ai org models
(all gated:False per earlier audit). All three remain HF-public so
Carl never needs an HF token regardless of tier.
CONTINUUM_TIER env var is exported so future code paths (compose env,
runtime feature gates for Bevy/vision/audio) can consult it. This PR
doesn't yet skip Bevy/vision pull on MBA tier — that's a follow-up
once the runtime supports a chat-only mode flag.
Failure message rewritten to be actionable:
- Names the specific minimums + what each subsystem reserves
- Says "16GB MBA: chat-only OOTB works (smaller model). For 32GB+:
full multimodal experience." — gives the user a sense of what
they get at each tier instead of just a price-tag rejection.
Validation needed:
- 16GB MBA (when available): expect tier=MBA, install completes,
chat works with 0.8B model
- 32GB M-series (Joel's M5 today): expect tier=primary, no
behavior change from current (same model, same budgets)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…atus
End-of-day snapshot: 23 PRs landed today targeting "100% free OOTB on MacBook Air on up, install→chat with AI flawlessly" (Joel). Lists each PR + the Carl-OOTB chain status post-push, with explicit callouts for what's known broken / unfixed (#980 Bug 9 leak — needs live RCA; #75 echo loops dev-tab scope; NEW-A upstream tracking).
Also documents the worktree-based parallel-AI workflow lesson learned the hard way (3× commit cross-contamination during today's session before switching to per-AI worktrees + the SHA-to-ref push escape valve).
Pure docs change. Tomorrow's work has a clean baseline.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Pure docs. End-of-day snapshot of 23 PRs landed + the Carl-OOTB chain status post-push. Tomorrow has a clean baseline.
🤖 Generated with Claude Code