fix(inference,#1262): delete dead compute_router.rs (no-CPU-fallback alpha)#1277

Merged
joelteply merged 1 commit into canary from fix/delete-dead-compute-router-1262 on May 15, 2026

@joelteply
Contributor

Summary

  • Delete inference/compute_router.rs — pure dead code with sequential_always_cpu=true policy on Apple Silicon and a matmul_cpu_ceiling=500_000 flop threshold that routed any small matmul to CPU. Greps confirm zero callers (only its own internal mod tests block uses ComputeRouter).
  • Drop pub mod compute_router; from inference/mod.rs.
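
For readers who never saw the file, the policy described above might have looked roughly like the sketch below. This is a hypothetical reconstruction, not the deleted source: only the names `ComputeRouter`, `sequential_always_cpu`, and `matmul_cpu_ceiling` and the `500_000` value come from this PR. The `Device` enum, both `route_*` methods, the `apple_silicon` constructor, and the `2 * m * k * n` flop estimate are assumptions made for illustration.

```rust
/// Hypothetical target for a routed operation (not taken from the deleted file).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Device {
    Cpu,
    Gpu,
}

/// Field names follow the PR description; everything else is assumed.
struct ComputeRouter {
    sequential_always_cpu: bool,
    matmul_cpu_ceiling: u64,
}

impl ComputeRouter {
    /// Assumed constructor carrying the Apple Silicon values quoted in the PR.
    fn apple_silicon() -> Self {
        Self {
            sequential_always_cpu: true,
            matmul_cpu_ceiling: 500_000,
        }
    }

    /// Sends any matmul below the flop ceiling to the CPU.
    /// The 2*m*k*n estimate is an assumption about how flops were counted.
    fn route_matmul(&self, m: u64, k: u64, n: u64) -> Device {
        let flops = 2 * m * k * n;
        if flops < self.matmul_cpu_ceiling {
            Device::Cpu
        } else {
            Device::Gpu
        }
    }

    /// Sends sequential work to the CPU whenever the flag is set.
    fn route_sequential(&self) -> Device {
        if self.sequential_always_cpu {
            Device::Cpu
        } else {
            Device::Gpu
        }
    }
}
```

Under these assumptions an 8x8x8 matmul (1,024 flops) would be routed to the CPU, which is the kind of silent CPU placement the no-CPU-fallback contract rules out.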

Why

The production hot path goes through LlamaCppAdapter → LlamaCppBackend → llama.cpp Metal/CUDA, which already loud-fails on no-GPU per inference/model.rs:82 (panic!("No GPU device available for inference. CPU fallback is disabled.")). The compute_router file shipped a CPU-fallback policy that contradicted the no-CPU-fallback alpha contract on paper but never actually executed, which is exactly the misleading-dead-code anti-pattern the parent card was filed against.
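
The loud-fail behaviour can be illustrated with a minimal sketch. Only the panic message is quoted from this PR; the `GpuBackend` enum and the `require_gpu` function are hypothetical names, and the real code at inference/model.rs:82 may be structured quite differently.

```rust
/// Hypothetical GPU backends; the PR only names Metal and CUDA.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum GpuBackend {
    Metal,
    Cuda,
}

/// Returns the detected backend, or panics instead of falling back to CPU.
/// `detected` stands in for whatever device probe the real code performs.
fn require_gpu(detected: Option<GpuBackend>) -> GpuBackend {
    match detected {
        Some(backend) => backend,
        // Message quoted from the PR description; there is deliberately no CPU arm.
        None => panic!("No GPU device available for inference. CPU fallback is disabled."),
    }
}
```

The point of the shape is that the type has no CPU variant at all, so a fallback cannot be selected by accident; the only outcomes are a GPU backend or a panic.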

Audit context

This is PR 1 of 4 for #1262 (silent CPU model fallbacks audit). Full audit findings + sibling cards filed at:
#1262 (comment)

Sibling follow-ons: #1273 (verify+delete Candle qwen3.5 path), #1274 (delete vendored/metal_deltanet.rs), #1275 (regression test for no-CPU-fallback contract).
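
One possible shape for the regression test tracked in #1275 is sketched below. It is not the actual test: `select_backend` is a hypothetical stand-in for the real backend selection, and `panic_message` is a helper invented here. The sketch only shows how a test could assert that the no-GPU path panics with the contract message rather than returning a CPU backend.

```rust
use std::panic;

/// Hypothetical stand-in for the real backend selection.
fn select_backend(gpu_available: bool) -> &'static str {
    if !gpu_available {
        panic!("No GPU device available for inference. CPU fallback is disabled.");
    }
    "gpu"
}

/// Runs the selection and returns the panic message, if it panicked.
fn panic_message(gpu_available: bool) -> Option<String> {
    match panic::catch_unwind(|| select_backend(gpu_available)) {
        Ok(_) => None,
        // A panic payload is usually a &str or a String; handle both.
        Err(payload) => payload
            .downcast_ref::<&str>()
            .map(|s| s.to_string())
            .or_else(|| payload.downcast_ref::<String>().cloned()),
    }
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn no_gpu_panics_instead_of_falling_back() {
        let msg = panic_message(false).expect("selection should panic without a GPU");
        assert!(msg.contains("CPU fallback is disabled"));
    }

    #[test]
    fn gpu_present_does_not_panic() {
        assert_eq!(panic_message(true), None);
    }
}
```

Matching on the message text, not just on the fact of a panic, keeps an unrelated crash from satisfying the test.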

Lane: alpha flywheel #1272 lane 6.

Test plan

  • cargo check --features metal — clean (0 errors, only pre-existing warnings)
  • cargo test --lib --features metal -- --test-threads=1 — 2092 passed, 0 failed
  • Precommit (TypeScript + browser ping): PASSED

🤖 Generated with Claude Code

inference/compute_router.rs declared a CPU-vs-GPU dispatch policy that set
sequential_always_cpu=true on Apple Silicon and routed any matmul under
500K flops to CPU. The file had ZERO callers anywhere in the crate (only
its own tests use ComputeRouter). Production hot path goes through
LlamaCppAdapter -> LlamaCppBackend -> llama.cpp Metal/CUDA which already
loud-fails on no-GPU per inference/model.rs:82
("CPU fallback is disabled.").

Carrying dead code that contradicts the no-CPU-fallback alpha contract
on paper but never executes is the same anti-pattern this card was
filed against. Delete to remove the misleading signal; if a future
tier-aware router is needed, build it then.

Audit findings + 3 sibling cards (#1273 verify+delete Candle qwen3.5,
#1274 delete metal_deltanet.rs, #1275 regression test) posted in
#1262 (comment).

Verified:
- cargo check --features metal: clean (0 errors, pre-existing warnings)
- cargo test --lib --features metal: 2092 passed, 0 failed

Lane: alpha flywheel #1272 lane 6.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@joelteply joelteply merged commit 364b1c4 into canary May 15, 2026
3 checks passed
@joelteply joelteply deleted the fix/delete-dead-compute-router-1262 branch May 15, 2026 17:21
joelteply added a commit that referenced this pull request May 16, 2026
… document map; lane status truth-up (#1316)

* docs(alpha): refresh status against 2026-05-16 canary

Three changes to ALPHA-GAP-ANALYSIS.md:

1. Header date 2026-05-13 -> 2026-05-16. Add explicit cross-link to
   CBAR-SUBSTRATE-ARCHITECTURE.md as the runtime substrate spec.

2. Restructure the Document Map (was a flat list) into categorized
   references (Runtime substrate / Cognition migration / Memory paging /
   Model registry / Grid), and add the precedence rule: if any supporting
   doc disagrees with ALPHA-GAP on the substrate contract (concurrency,
   scheduling, memory, pressure, telemetry, artifact handles), defer to
   CBAR-SUBSTRATE-ARCHITECTURE.md.

3. Refresh the Current Snapshot table against canary @ 2026-05-16:
   - Rust core row reflects the PressureBroker bootstrap stack
     (#1307 / #1308 / #1310), runtime lease broker (#1313), cognition
     oxidization (#1284 / #1290 / #1291 / #1293 / #1298 / #1301 / #1303
     / #1292), dead-Candle deletes (#1277 / #1279 / #1281 / #1288), and
     the inference-grpc fail-closed (#1314). GRID-INFERENCE-ROUTING
     PR-1 announcer in flight on feat/grid-inference-routing-pr2-announcer.
   - Node/TS row notes net-negative trend (~2500 LOC TS deleted via the
     8-PR cognition stacks).
   - Docker row records Docker tier Phase 1 (#1297).
   - Config row records SQLite-first default (#1271).
   - Tests row records the no-CPU-fallback contract gap: the existing
     regression test in workers/continuum-core covers llama.cpp / ORT
     only, not the Candle-side paths where the orpheus + inference-grpc
     fallbacks lived before #1314.

* docs(alpha): refresh lane status table and immediate-next-actions

Two updates to ALPHA-GAP-ANALYSIS.md:

1. Lane status table now reflects actual state @ 2026-05-16, not aspiration:
   - Lane A: in progress, model_registry/ exists with admission resolver.
   - Lane B: Phase 1 landed (#1297 docker-tier-stats); GPU profile +
     tier-pool eviction (#1238 / #1239) still open.
   - Lane C: structured RuntimeMetric emits from inference paths;
     vdd-report-command not yet bound.
   - Lane D: UNSTARTED — flagged as the highest-leverage open lane because
     Lane E (PressureBroker) and the inbox coalescing pattern both
     presuppose RuntimeFrame / CognitionTurnFrame.
   - Lane E: bootstrap landed (#1307 / #1308 / #1310 / #1313); paging and
     pre-broker concurrency-hack deletion remain. Concrete deletion target
     called out: get_num_workers() in inference-grpc/main.rs, which reads
     INFERENCE_WORKERS from config.env and otherwise picks worker count
     from system memory at startup — both branches violate the
     "we do not hard code" / "dynamic, broker-owned concurrency" rule.
   - Lane F: ~2500 LOC TS deleted manually this session; mechanical CI
     ratchet still not landed (deletion is reversible until it is).
   - Lane G: refresh in flight on joel/docs-alpha-refresh.

   Adds an "adjacent active workstream" note for GRID-INFERENCE-ROUTING
   (PR-1 announcer + probe + registry in flight on
   feat/grid-inference-routing-pr2-announcer) as the grid-side counterpart
   to Lane A.

2. Immediate Next Actions reordered by alpha leverage, not by who is
   online. Top three items are Lane D claim, the universal-trait "for free"
   triplet (RuntimeModule base trait + derive macro + scaffold generator
   from CBAR-SUBSTRATE-ARCHITECTURE.md), and the
   get_num_workers() deletion. Adds the Lane C VDD report command and the
   widening of no_cpu_fallback_contract.rs to cover Candle paths. Adds
   doc-refresh follow-ups so each supporting doc gets cross-linked back
   into the Document Map.

* docs(alpha): add Lane H + GENOME-FOUNDRY-SENTINEL cross-links

Four updates to ALPHA-GAP-ANALYSIS.md following continuum#1327:

1. Lane H added to the lane status table: Substrate governor + tiered
   genome cache. Sibling to Lane E (broker owns admission; governor
   owns sizing). 7-PR implementation sequence detailed in
   GENOME-FOUNDRY-SENTINEL.md Part 13. Currently Proposed, needs owner
   claim.

2. Lane claim update at end of the lane discussion: Lane H proposed
   via continuum#1327 with full design pinned to that doc; sibling to
   Lane E with the boundary stated explicitly.

3. Document Map gets GENOME-FOUNDRY-SENTINEL entry under "Runtime
   substrate (load-bearing)" — the artifact-sharing economy on top of
   the CBAR substrate. Tiered genome cache, page faults, foundry as JIT,
   sentinel-AI as profile-guided optimizer, demand-aligned recall,
   composer + speculator, SubstrateGovernor (DVFS).

4. Immediate Next Actions step 9 added: claim Lane H. Step 10 (formerly
   step 9) updated to reflect what's landed in this doc batch
   (CBAR-SUBSTRATE refinement via #1324, CONTINUUM-ARCHITECTURE refresh
   via #1317, CONTINUUM-VISION refresh via #1320, GENOME-FOUNDRY-SENTINEL
   via #1327) and what's next (CLAUDE.md substrate pointer; stale-section
   deprecations in UNIVERSAL-SENSORY / LEARNING / QUEUE-DRIVEN-COGNITION).

---------

Co-authored-by: Test <test@test.com>