fix(inference,#1273): delete dead Candle Qwen3.5 GGUF backend (~1100 LOC) #1279
Merged
Conversation
Remove the Candle-side Qwen3.5 inference path (the hybrid DeltaNet + Attention recurrence loop in `vendored/quantized_qwen35.rs` and its `ModelBackend` wrapper in `backends/qwen35_gguf.rs`). 1100+ LOC removed.

Why it was dead:
- `AIProviderModule::register_adapters` (`modules/ai_provider.rs:221`) only registers `LlamaCppAdapter` for local inference. `CandleAdapter` is imported but never instantiated.
- `Qwen35GgufBackend` was only reachable via `backends::load_gguf_backend`, whose only callers were unregistered (`CandleAdapter`, `ContinuumModel`, `bin/*` utilities) - none in the production hot path.
- Production Qwen3.5 chat goes through llama.cpp (vendored, statically linked) via `LlamaCppAdapter` → `LlamaCppBackend`.

Scope-down from initial #1273 plan: the original plan was to delete the entire Candle inference chain (`CandleAdapter`, `ContinuumModel`, `quantized.rs`, vendored qwen2/llama backends). `cargo check` confirmed the broader scope is entangled with plasticity LoRA training tests, which use `compact_llama_safetensors` + `rebuild_with_stacked_lora`. That broader deletion needs a separate audit of plasticity's production reachability and is deferred to a follow-up card. This PR keeps everything plasticity touches (`model.rs`, `candle_adapter.rs`, `quantized.rs`, `llama_safetensors.rs`, `compact_llama_safetensors.rs`, vendored qwen2/llama) and only deletes the Qwen3.5-specific Candle path that has no plasticity dependency.

Wire change:
- `backends::load_gguf_backend` now returns a typed error for the `qwen3`/`qwen35` architectures pointing callers at `LlamaCppAdapter`, rather than silently dispatching to the deleted Candle backend.

Verified:
- `cargo check --features metal`: clean (0 errors, 61 pre-existing warnings)
- `cargo test --lib --features metal`: 2096 passed, 0 failed (4 more than baseline - removing the vendored qwen35 module registration cleared dead-code warnings that were eating test discovery)

Lane: alpha flywheel #1272 lane 6.
Audit context: #1262 (comment)
Verification: #1273 (comment)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
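The dead-registration argument above can be illustrated with a minimal sketch. This is a hypothetical reduction, not the actual `modules/ai_provider.rs` code: the trait, the registry shape, and the `name()` strings are invented for illustration; only the adapter names and the "registered vs. merely imported" distinction come from the PR description.

```rust
// Hypothetical reduction of AIProviderModule::register_adapters: only
// LlamaCppAdapter is pushed into the registry, so any code reachable
// solely through CandleAdapter is dead.
trait InferenceAdapter {
    fn name(&self) -> &'static str;
}

struct LlamaCppAdapter;
impl InferenceAdapter for LlamaCppAdapter {
    fn name(&self) -> &'static str { "llama_cpp" }
}

// Imported but never instantiated in the production registration path.
#[allow(dead_code)]
struct CandleAdapter;
impl InferenceAdapter for CandleAdapter {
    fn name(&self) -> &'static str { "candle" }
}

fn register_adapters() -> Vec<Box<dyn InferenceAdapter>> {
    vec![Box::new(LlamaCppAdapter)] // CandleAdapter never appears here
}

fn main() {
    let names: Vec<_> = register_adapters().iter().map(|a| a.name()).collect();
    assert_eq!(names, vec!["llama_cpp"]); // the Candle path is unreachable
}
```

Since nothing ever constructs `CandleAdapter`, every function reachable only through it (including `Qwen35GgufBackend` via `load_gguf_backend`) is dead by construction.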
This was referenced May 15, 2026

joelteply added a commit that referenced this pull request on May 16, 2026:
… document map; lane status truth-up (#1316)

* docs(alpha): refresh status against 2026-05-16 canary

Three changes to ALPHA-GAP-ANALYSIS.md:

1. Header date 2026-05-13 -> 2026-05-16. Add explicit cross-link to CBAR-SUBSTRATE-ARCHITECTURE.md as the runtime substrate spec.
2. Restructure the Document Map (was a flat list) into categorized references (Runtime substrate / Cognition migration / Memory paging / Model registry / Grid), and add the precedence rule: if any supporting doc disagrees with ALPHA-GAP on the substrate contract (concurrency, scheduling, memory, pressure, telemetry, artifact handles), defer to CBAR-SUBSTRATE-ARCHITECTURE.md.
3. Refresh the Current Snapshot table against canary @ 2026-05-16:
   - Rust core row reflects the PressureBroker bootstrap stack (#1307 / #1308 / #1310), runtime lease broker (#1313), cognition oxidization (#1284 / #1290 / #1291 / #1293 / #1298 / #1301 / #1303 / #1292), dead-Candle deletes (#1277 / #1279 / #1281 / #1288), and the inference-grpc fail-closed (#1314). GRID-INFERENCE-ROUTING PR-1 announcer in flight on feat/grid-inference-routing-pr2-announcer.
   - Node/TS row notes the net-negative trend (~2500 LOC TS deleted via the 8-PR cognition stacks).
   - Docker row records Docker tier Phase 1 (#1297).
   - Config row records the SQLite-first default (#1271).
   - Tests row records the no-CPU-fallback contract gap: the existing regression test in workers/continuum-core covers llama.cpp / ORT only, not the Candle-side paths where the orpheus + inference-grpc fallbacks lived before #1314.

* docs(alpha): refresh lane status table and immediate-next-actions

Two updates to ALPHA-GAP-ANALYSIS.md:

1. Lane status table now reflects actual state @ 2026-05-16, not aspiration:
   - Lane A: in progress; model_registry/ exists with admission resolver.
   - Lane B: Phase 1 landed (#1297 docker-tier-stats); GPU profile + tier-pool eviction (#1238 / #1239) still open.
   - Lane C: structured RuntimeMetric emits from inference paths; vdd-report-command not yet bound.
   - Lane D: UNSTARTED - flagged as the highest-leverage open lane because Lane E (PressureBroker) and the inbox coalescing pattern both presuppose RuntimeFrame / CognitionTurnFrame.
   - Lane E: bootstrap landed (#1307 / #1308 / #1310 / #1313); paging and pre-broker concurrency-hack deletion remain. Concrete deletion target called out: get_num_workers() in inference-grpc/main.rs, which reads INFERENCE_WORKERS from config.env and otherwise picks a worker count from system memory at startup - both branches violate the "we do not hard code" / "dynamic, broker-owned concurrency" rule.
   - Lane F: ~2500 LOC TS deleted manually this session; the mechanical CI ratchet has still not landed (deletion is reversible until it does).
   - Lane G: refresh in flight on joel/docs-alpha-refresh. Adds an "adjacent active workstream" note for GRID-INFERENCE-ROUTING (PR-1 announcer + probe + registry in flight on feat/grid-inference-routing-pr2-announcer) as the grid-side counterpart to Lane A.
2. Immediate Next Actions reordered by alpha leverage, not by who is online. The top three items are the Lane D claim, the universal-trait "for free" triplet (RuntimeModule base trait + derive macro + scaffold generator from CBAR-SUBSTRATE-ARCHITECTURE.md), and the get_num_workers() deletion. Adds the Lane C VDD report command and the widening of no_cpu_fallback_contract.rs to cover Candle paths. Adds doc-refresh follow-ups so each supporting doc gets cross-linked back into the Document Map.

* docs(alpha): add Lane H + GENOME-FOUNDRY-SENTINEL cross-links

Four updates to ALPHA-GAP-ANALYSIS.md following continuum#1327:

1. Lane H added to the lane status table: substrate governor + tiered genome cache. Sibling to Lane E (broker owns admission; governor owns sizing). The 7-PR implementation sequence is detailed in GENOME-FOUNDRY-SENTINEL.md Part 13. Currently Proposed; needs an owner claim.
2. Lane claim update at the end of the lane discussion: Lane H proposed via continuum#1327 with the full design pinned to that doc; sibling to Lane E with the boundary stated explicitly.
3. Document Map gets a GENOME-FOUNDRY-SENTINEL entry under "Runtime substrate (load-bearing)" - the artifact-sharing economy on top of the CBAR substrate. Tiered genome cache, page faults, foundry as JIT, sentinel-AI as profile-guided optimizer, demand-aligned recall, composer + speculator, SubstrateGovernor (DVFS).
4. Immediate Next Actions step 9 added: claim Lane H. Step 10 (formerly step 9) updated to reflect what has landed in this doc batch (CBAR-SUBSTRATE refinement via #1324, CONTINUUM-ARCHITECTURE refresh via #1317, CONTINUUM-VISION refresh via #1320, GENOME-FOUNDRY-SENTINEL via #1327) and what's next (CLAUDE.md substrate pointer; stale-section deprecations in UNIVERSAL-SENSORY / LEARNING / QUEUE-DRIVEN-COGNITION).

---------

Co-authored-by: Test <test@test.com>
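The Lane E deletion target called out above can be sketched to show why both branches are flagged. This is an illustrative reduction, not the actual inference-grpc/main.rs code: the memory thresholds are made up, and the env override is passed in as a parameter here (the real code reads `INFERENCE_WORKERS` from config.env) so the sketch stays deterministic.

```rust
// Illustrative sketch of the flagged get_num_workers() pattern. Both
// branches fix the worker count once at startup: a static config override,
// or a one-shot heuristic from total system memory. Neither can respond
// to runtime pressure, which is why broker-owned sizing replaces them.
fn get_num_workers(env_override: Option<&str>, system_mem_gb: u64) -> usize {
    // Branch 1: static override (INFERENCE_WORKERS in the real code).
    if let Some(raw) = env_override {
        if let Ok(n) = raw.parse::<usize>() {
            return n;
        }
    }
    // Branch 2: startup heuristic from system memory (thresholds invented).
    match system_mem_gb {
        0..=8 => 1,
        9..=32 => 2,
        _ => 4,
    }
}

fn main() {
    assert_eq!(get_num_workers(None, 16), 2);      // heuristic branch
    assert_eq!(get_num_workers(Some("7"), 16), 7); // override branch
}
```

In both branches the concurrency decision is frozen before any load exists, which is exactly the "we do not hard code" violation the lane status flags; a PressureBroker-owned count can shrink or grow as memory pressure changes.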
Summary
Delete the Candle-side Qwen3.5 inference path:
- `inference/backends/qwen35_gguf.rs` - `Qwen35GgufBackend` `ModelBackend` wrapper (194 LOC)
- `inference/vendored/quantized_qwen35.rs` - vendored hybrid DeltaNet + Attention recurrence loop (919 LOC)
- `backends/mod.rs` and `vendored/mod.rs` entries
- the `qwen3`/`qwen35` arm of `backends::load_gguf_backend`, replaced with a typed error pointing callers at `LlamaCppAdapter`

Net: +14 / -1132 LOC.
Why it was dead
- `AIProviderModule::register_adapters` (`modules/ai_provider.rs:221`) only registers `LlamaCppAdapter` for local inference. `CandleAdapter` is imported but never instantiated.
- `Qwen35GgufBackend` was only reachable via `backends::load_gguf_backend`, whose only callers were unregistered (`CandleAdapter`, `ContinuumModel`, `bin/*` debug utilities) - none in the production hot path.
- Production Qwen3.5 chat goes through llama.cpp via `LlamaCppAdapter` → `LlamaCppBackend`.

Scope-down from initial #1273 plan
The original plan (#1273 (comment)) was to delete the entire Candle inference chain (`CandleAdapter`, `ContinuumModel`, `quantized.rs`, vendored qwen2/llama backends). `cargo check` confirmed the broader scope is entangled with plasticity LoRA training tests, which use `compact_llama_safetensors` + `model::rebuild_with_stacked_lora`. That broader deletion needs a separate audit of plasticity's production reachability and is deferred to a follow-up card.

This PR keeps everything plasticity touches (`model.rs`, `candle_adapter.rs`, `quantized.rs`, `llama_safetensors.rs`, `compact_llama_safetensors.rs`, vendored qwen2/llama) and only deletes the Qwen3.5-specific Candle code that has no plasticity dependency.

Audit context
This is PR 2 of 4 for #1262 (silent CPU model fallbacks audit). Audit findings:
#1262 (comment)
Sibling PRs:
- `compute_router.rs` (213 LOC dead-code policy)
- `vendored/metal_deltanet.rs` (zero callers)

Lane: alpha flywheel #1272 lane 6.
Wire change
`backends::load_gguf_backend` for the `qwen3`/`qwen35` architecture now returns a typed error ("Qwen3.5 GGUF routing through the Candle backend was removed in #1273. Use LlamaCppAdapter"). Callers in dead code paths (`ContinuumModel`, `CandleAdapter`) fail loudly rather than silently dispatching to deleted code.

Test plan
- `cargo check --features metal` - clean (0 errors, 61 pre-existing warnings)
- `cargo test --lib --features metal -- --test-threads=1` - 2096 passed, 0 failed (4 more than baseline; vendored qwen35 module removal cleared dead-code warnings that suppressed test discovery)

🤖 Generated with Claude Code
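The wire change can be sketched as follows. The error enum, the `Ok` payload, and the function signature are assumptions for illustration; only the architecture strings and the error message text follow the PR description.

```rust
// Sketch of the replaced match arm in backends::load_gguf_backend.
// BackendError and the Ok payload are stand-ins for the real types.
#[derive(Debug, PartialEq)]
enum BackendError {
    RemovedArchitecture(String),
}

fn load_gguf_backend(architecture: &str) -> Result<&'static str, BackendError> {
    match architecture {
        // Previously dispatched to the deleted Candle backend; now a loud,
        // typed failure that points callers at LlamaCppAdapter.
        "qwen3" | "qwen35" => Err(BackendError::RemovedArchitecture(
            "Qwen3.5 GGUF routing through the Candle backend was removed in \
             #1273. Use LlamaCppAdapter"
                .to_string(),
        )),
        // Other architectures keep their existing dispatch (elided here).
        _ => Ok("existing-backend"),
    }
}

fn main() {
    assert!(load_gguf_backend("qwen35").is_err()); // loud-fail, not silent
    assert!(load_gguf_backend("llama").is_ok());   // other arms untouched
}
```

Returning a typed error rather than deleting the arm outright means any remaining dead-code caller surfaces immediately with an actionable message instead of a missing-symbol compile error in unrelated modules.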