fix(inference,#1262): delete dead compute_router.rs (no-CPU-fallback alpha)#1277
Merged
Merged
Conversation
inference/compute_router.rs declared a CPU-vs-GPU dispatch policy that
sequential_always_cpu=true on Apple Silicon and routed any matmul under
500K flops to CPU. The file had ZERO callers anywhere in the crate (only
its own tests use ComputeRouter). Production hot path goes through
LlamaCppAdapter -> LlamaCppBackend -> llama.cpp Metal/CUDA which already
loud-fails on no-GPU per inference/model.rs:82
("CPU fallback is disabled.").
Carrying dead code that contradicts the no-CPU-fallback alpha contract
on paper but never executes is the same anti-pattern this card was
filed against. Delete to remove the misleading signal; if a future
tier-aware router is needed, build it then.
Audit findings + 3 sibling cards (#1273 verify+delete Candle qwen3.5,
#1274 delete metal_deltanet.rs, #1275 regression test) posted in
#1262 (comment).
Verified:
- cargo check --features metal: clean (0 errors, pre-existing warnings)
- cargo test --lib --features metal: 2092 passed, 0 failed
Lane: alpha flywheel #1272 lane 6.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
joelteply
added a commit
that referenced
this pull request
May 16, 2026
… document map; lane status truth-up (#1316) * docs(alpha): refresh status against 2026-05-16 canary Three changes to ALPHA-GAP-ANALYSIS.md: 1. Header date 2026-05-13 -> 2026-05-16. Add explicit cross-link to CBAR-SUBSTRATE-ARCHITECTURE.md as the runtime substrate spec. 2. Restructure the Document Map (was a flat list) into categorized references (Runtime substrate / Cognition migration / Memory paging / Model registry / Grid), and add the precedence rule: if any supporting doc disagrees with ALPHA-GAP on the substrate contract (concurrency, scheduling, memory, pressure, telemetry, artifact handles), defer to CBAR-SUBSTRATE-ARCHITECTURE.md. 3. Refresh the Current Snapshot table against canary @ 2026-05-16: - Rust core row reflects the PressureBroker bootstrap stack (#1307 / #1308 / #1310), runtime lease broker (#1313), cognition oxidization (#1284 / #1290 / #1291 / #1293 / #1298 / #1301 / #1303 / #1292), dead-Candle deletes (#1277 / #1279 / #1281 / #1288), and the inference-grpc fail-closed (#1314). GRID-INFERENCE-ROUTING PR-1 announcer in flight on feat/grid-inference-routing-pr2-announcer. - Node/TS row notes net-negative trend (~2500 LOC TS deleted via the 8-PR cognition stacks). - Docker row records Docker tier Phase 1 (#1297). - Config row records SQLite-first default (#1271). - Tests row records the no-CPU-fallback contract gap: the existing regression test in workers/continuum-core covers llama.cpp / ORT only, not the Candle-side paths where the orpheus + inference-grpc fallbacks lived before #1314. * docs(alpha): refresh lane status table and immediate-next-actions Two updates to ALPHA-GAP-ANALYSIS.md: 1. Lane status table now reflects actual state @ 2026-05-16, not aspiration: - Lane A: in progress, model_registry/ exists with admission resolver. - Lane B: Phase 1 landed (#1297 docker-tier-stats); GPU profile + tier-pool eviction (#1238 / #1239) still open. - Lane C: structured RuntimeMetric emits from inference paths; vdd-report-command not yet bound. - Lane D: UNSTARTED — flagged as the highest-leverage open lane because Lane E (PressureBroker) and the inbox coalescing pattern both presuppose RuntimeFrame / CognitionTurnFrame. - Lane E: bootstrap landed (#1307 / #1308 / #1310 / #1313); paging and pre-broker concurrency-hack deletion remain. Concrete deletion target called out: get_num_workers() in inference-grpc/main.rs, which reads INFERENCE_WORKERS from config.env and otherwise picks worker count from system memory at startup — both branches violate the "we do not hard code" / "dynamic, broker-owned concurrency" rule. - Lane F: ~2500 LOC TS deleted manually this session; mechanical CI ratchet still not landed (deletion is reversible until it is). - Lane G: refresh in flight on joel/docs-alpha-refresh. Adds an "adjacent active workstream" note for GRID-INFERENCE-ROUTING (PR-1 announcer + probe + registry in flight on feat/grid-inference-routing-pr2-announcer) as the grid-side counterpart to Lane A. 2. Immediate Next Actions reordered by alpha leverage, not by who is online. Top three items are Lane D claim, the universal-trait "for free" triplet (RuntimeModule base trait + derive macro + scaffold generator from CBAR-SUBSTRATE-ARCHITECTURE.md), and the get_num_workers() deletion. Adds the Lane C VDD report command and the widening of no_cpu_fallback_contract.rs to cover Candle paths. Adds doc-refresh follow-ups so each supporting doc gets cross-linked back into the Document Map. * docs(alpha): add Lane H + GENOME-FOUNDRY-SENTINEL cross-links Three updates to ALPHA-GAP-ANALYSIS.md following continuum#1327: 1. Lane H added to the lane status table: Substrate governor + tiered genome cache. Sibling to Lane E (broker owns admission; governor owns sizing). 7-PR implementation sequence detailed in GENOME-FOUNDRY-SENTINEL.md Part 13. Currently Proposed, needs owner claim. 2. Lane claim update at end of the lane discussion: Lane H proposed via continuum#1327 with full design pinned to that doc; sibling to Lane E with the boundary stated explicitly. 3. Document Map gets GENOME-FOUNDRY-SENTINEL entry under "Runtime substrate (load-bearing)" — the artifact-sharing economy on top of the CBAR substrate. Tiered genome cache, page faults, foundry as JIT, sentinel-AI as profile-guided optimizer, demand-aligned recall, composer + speculator, SubstrateGovernor (DVFS). 4. Immediate Next Actions step 9 added: claim Lane H. Step 10 (formerly step 9) updated to reflect what's landed in this doc batch (CBAR-SUBSTRATE refinement via #1324, CONTINUUM-ARCHITECTURE refresh via #1317, CONTINUUM-VISION refresh via #1320, GENOME-FOUNDRY-SENTINEL via #1327) and what's next (CLAUDE.md substrate pointer; stale-section deprecations in UNIVERSAL-SENSORY / LEARNING / QUEUE-DRIVEN-COGNITION). --------- Co-authored-by: Test <test@test.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
inference/compute_router.rs— pure dead code withsequential_always_cpu=truepolicy on Apple Silicon and amatmul_cpu_ceiling=500_000flop threshold that routed any small matmul to CPU. Greps confirm zero callers (only its own internalmod testsblock usesComputeRouter).pub mod compute_router;frominference/mod.rs.Why
Production hot path goes through
LlamaCppAdapter→LlamaCppBackend→ llama.cpp Metal/CUDA, which already loud-fails on no-GPU perinference/model.rs:82(panic!("No GPU device available for inference. CPU fallback is disabled.")). The compute_router file shipped a CPU-fallback policy that contradicted the no-CPU-fallback alpha contract on paper but never actually executed — exactly the misleading-dead-code anti-pattern the parent card was filed against.Audit context
This is PR 1 of 4 for #1262 (silent CPU model fallbacks audit). Full audit findings + sibling cards filed at:
#1262 (comment)
Sibling follow-ons: #1273 (verify+delete Candle qwen3.5 path), #1274 (delete vendored/metal_deltanet.rs), #1275 (regression test for no-CPU-fallback contract).
Lane: alpha flywheel #1272 lane 6.
Test plan
cargo check --features metal— clean (0 errors, only pre-existing warnings)cargo test --lib --features metal -- --test-threads=1— 2092 passed, 0 failed🤖 Generated with Claude Code