fix(inference,#1273): delete dead Candle Qwen3.5 GGUF backend (~1100 LOC) #1279
Merged
Conversation
Remove the Candle-side Qwen3.5 inference path (the hybrid DeltaNet + Attention recurrence loop in `vendored/quantized_qwen35.rs` and its `ModelBackend` wrapper in `backends/qwen35_gguf.rs`). 1100+ LOC removed.

Why it was dead:
- `AIProviderModule::register_adapters` (`modules/ai_provider.rs:221`) only registers `LlamaCppAdapter` for local inference. `CandleAdapter` is imported but never instantiated.
- `Qwen35GgufBackend` was only reachable via `backends::load_gguf_backend`, whose only callers were unregistered (`CandleAdapter`, `ContinuumModel`, `bin/*` utilities) - none in the production hot path.
- Production Qwen3.5 chat goes through llama.cpp (vendored, statically linked) via `LlamaCppAdapter` → `LlamaCppBackend`.

Scope-down from initial #1273 plan: the original plan was to delete the entire Candle inference chain (`CandleAdapter`, `ContinuumModel`, `quantized.rs`, vendored qwen2/llama backends). `cargo check` confirmed the broader scope is entangled with plasticity LoRA training tests, which use `compact_llama_safetensors` + `rebuild_with_stacked_lora`. That broader deletion needs a separate audit of plasticity's production reachability and is deferred to a follow-up card. This PR keeps everything plasticity touches (`model.rs`, `candle_adapter.rs`, `quantized.rs`, `llama_safetensors.rs`, `compact_llama_safetensors.rs`, vendored qwen2/llama) and only deletes the Qwen3.5-specific Candle path that has no plasticity dependency.

Wire change:
- `backends::load_gguf_backend` now returns a typed error for the `qwen3`/`qwen35` architectures pointing callers at `LlamaCppAdapter`, rather than silently dispatching to the deleted Candle backend.

Verified:
- `cargo check --features metal`: clean (0 errors, 61 pre-existing warnings)
- `cargo test --lib --features metal`: 2096 passed, 0 failed (4 more than baseline - removing the vendored qwen35 module registration cleared dead-code warnings that were eating test discovery)

Lane: alpha flywheel #1272 lane 6.
Audit context: #1262 (comment)
Verification: #1273 (comment)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
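The dead-registration argument above can be illustrated with a minimal sketch. This is a hypothetical reduction, not the actual `modules/ai_provider.rs` code: the trait, the registry shape, and the `name()` strings are invented for illustration; only the adapter names and the "registered vs. merely imported" distinction come from the PR description.

```rust
// Hypothetical reduction of AIProviderModule::register_adapters: only
// LlamaCppAdapter is pushed into the registry, so any code reachable
// solely through CandleAdapter is dead.
trait InferenceAdapter {
    fn name(&self) -> &'static str;
}

struct LlamaCppAdapter;
impl InferenceAdapter for LlamaCppAdapter {
    fn name(&self) -> &'static str { "llama_cpp" }
}

// Imported but never instantiated in the production registration path.
#[allow(dead_code)]
struct CandleAdapter;
impl InferenceAdapter for CandleAdapter {
    fn name(&self) -> &'static str { "candle" }
}

fn register_adapters() -> Vec<Box<dyn InferenceAdapter>> {
    vec![Box::new(LlamaCppAdapter)] // CandleAdapter never appears here
}

fn main() {
    let names: Vec<_> = register_adapters().iter().map(|a| a.name()).collect();
    assert_eq!(names, vec!["llama_cpp"]); // the Candle path is unreachable
}
```

Since nothing ever constructs `CandleAdapter`, every function reachable only through it (including `Qwen35GgufBackend` via `load_gguf_backend`) is dead by construction.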
This was referenced May 15, 2026

joelteply added a commit that referenced this pull request on May 16, 2026:
… document map; lane status truth-up (#1316)

* docs(alpha): refresh status against 2026-05-16 canary

Three changes to ALPHA-GAP-ANALYSIS.md:

1. Header date 2026-05-13 -> 2026-05-16. Add explicit cross-link to CBAR-SUBSTRATE-ARCHITECTURE.md as the runtime substrate spec.
2. Restructure the Document Map (was a flat list) into categorized references (Runtime substrate / Cognition migration / Memory paging / Model registry / Grid), and add the precedence rule: if any supporting doc disagrees with ALPHA-GAP on the substrate contract (concurrency, scheduling, memory, pressure, telemetry, artifact handles), defer to CBAR-SUBSTRATE-ARCHITECTURE.md.
3. Refresh the Current Snapshot table against canary @ 2026-05-16:
   - Rust core row reflects the PressureBroker bootstrap stack (#1307 / #1308 / #1310), runtime lease broker (#1313), cognition oxidization (#1284 / #1290 / #1291 / #1293 / #1298 / #1301 / #1303 / #1292), dead-Candle deletes (#1277 / #1279 / #1281 / #1288), and the inference-grpc fail-closed (#1314). GRID-INFERENCE-ROUTING PR-1 announcer in flight on feat/grid-inference-routing-pr2-announcer.
   - Node/TS row notes the net-negative trend (~2500 LOC TS deleted via the 8-PR cognition stacks).
   - Docker row records Docker tier Phase 1 (#1297).
   - Config row records the SQLite-first default (#1271).
   - Tests row records the no-CPU-fallback contract gap: the existing regression test in workers/continuum-core covers llama.cpp / ORT only, not the Candle-side paths where the orpheus + inference-grpc fallbacks lived before #1314.

* docs(alpha): refresh lane status table and immediate-next-actions

Two updates to ALPHA-GAP-ANALYSIS.md:

1. Lane status table now reflects actual state @ 2026-05-16, not aspiration:
   - Lane A: in progress; model_registry/ exists with admission resolver.
   - Lane B: Phase 1 landed (#1297 docker-tier-stats); GPU profile + tier-pool eviction (#1238 / #1239) still open.
   - Lane C: structured RuntimeMetric emits from inference paths; vdd-report-command not yet bound.
   - Lane D: UNSTARTED - flagged as the highest-leverage open lane because Lane E (PressureBroker) and the inbox coalescing pattern both presuppose RuntimeFrame / CognitionTurnFrame.
   - Lane E: bootstrap landed (#1307 / #1308 / #1310 / #1313); paging and pre-broker concurrency-hack deletion remain. Concrete deletion target called out: get_num_workers() in inference-grpc/main.rs, which reads INFERENCE_WORKERS from config.env and otherwise picks a worker count from system memory at startup - both branches violate the "we do not hard code" / "dynamic, broker-owned concurrency" rule.
   - Lane F: ~2500 LOC TS deleted manually this session; the mechanical CI ratchet has still not landed (deletion is reversible until it does).
   - Lane G: refresh in flight on joel/docs-alpha-refresh. Adds an "adjacent active workstream" note for GRID-INFERENCE-ROUTING (PR-1 announcer + probe + registry in flight on feat/grid-inference-routing-pr2-announcer) as the grid-side counterpart to Lane A.
2. Immediate Next Actions reordered by alpha leverage, not by who is online. The top three items are the Lane D claim, the universal-trait "for free" triplet (RuntimeModule base trait + derive macro + scaffold generator from CBAR-SUBSTRATE-ARCHITECTURE.md), and the get_num_workers() deletion. Adds the Lane C VDD report command and the widening of no_cpu_fallback_contract.rs to cover Candle paths. Adds doc-refresh follow-ups so each supporting doc gets cross-linked back into the Document Map.

* docs(alpha): add Lane H + GENOME-FOUNDRY-SENTINEL cross-links

Four updates to ALPHA-GAP-ANALYSIS.md following continuum#1327:

1. Lane H added to the lane status table: substrate governor + tiered genome cache. Sibling to Lane E (broker owns admission; governor owns sizing). The 7-PR implementation sequence is detailed in GENOME-FOUNDRY-SENTINEL.md Part 13. Currently Proposed; needs an owner claim.
2. Lane claim update at the end of the lane discussion: Lane H proposed via continuum#1327 with the full design pinned to that doc; sibling to Lane E with the boundary stated explicitly.
3. Document Map gets a GENOME-FOUNDRY-SENTINEL entry under "Runtime substrate (load-bearing)" - the artifact-sharing economy on top of the CBAR substrate. Tiered genome cache, page faults, foundry as JIT, sentinel-AI as profile-guided optimizer, demand-aligned recall, composer + speculator, SubstrateGovernor (DVFS).
4. Immediate Next Actions step 9 added: claim Lane H. Step 10 (formerly step 9) updated to reflect what has landed in this doc batch (CBAR-SUBSTRATE refinement via #1324, CONTINUUM-ARCHITECTURE refresh via #1317, CONTINUUM-VISION refresh via #1320, GENOME-FOUNDRY-SENTINEL via #1327) and what's next (CLAUDE.md substrate pointer; stale-section deprecations in UNIVERSAL-SENSORY / LEARNING / QUEUE-DRIVEN-COGNITION).

---------

Co-authored-by: Test <test@test.com>
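The Lane E deletion target called out above can be sketched to show why both branches are flagged. This is an illustrative reduction, not the actual inference-grpc/main.rs code: the memory thresholds are made up, and the env override is passed in as a parameter here (the real code reads `INFERENCE_WORKERS` from config.env) so the sketch stays deterministic.

```rust
// Illustrative sketch of the flagged get_num_workers() pattern. Both
// branches fix the worker count once at startup: a static config override,
// or a one-shot heuristic from total system memory. Neither can respond
// to runtime pressure, which is why broker-owned sizing replaces them.
fn get_num_workers(env_override: Option<&str>, system_mem_gb: u64) -> usize {
    // Branch 1: static override (INFERENCE_WORKERS in the real code).
    if let Some(raw) = env_override {
        if let Ok(n) = raw.parse::<usize>() {
            return n;
        }
    }
    // Branch 2: startup heuristic from system memory (thresholds invented).
    match system_mem_gb {
        0..=8 => 1,
        9..=32 => 2,
        _ => 4,
    }
}

fn main() {
    assert_eq!(get_num_workers(None, 16), 2);      // heuristic branch
    assert_eq!(get_num_workers(Some("7"), 16), 7); // override branch
}
```

In both branches the concurrency decision is frozen before any load exists, which is exactly the "we do not hard code" violation the lane status flags; a PressureBroker-owned count can shrink or grow as memory pressure changes.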
Summary
Delete the Candle-side Qwen3.5 inference path:
- `inference/backends/qwen35_gguf.rs` - `Qwen35GgufBackend` `ModelBackend` wrapper (194 LOC)
- `inference/vendored/quantized_qwen35.rs` - vendored hybrid DeltaNet + Attention recurrence loop (919 LOC)
- `backends/mod.rs` and `vendored/mod.rs` entries
- the `qwen3`/`qwen35` arm of `backends::load_gguf_backend`, replaced with a typed error pointing callers at `LlamaCppAdapter`

Net: +14 / -1132 LOC.
Why it was dead
- `AIProviderModule::register_adapters` (`modules/ai_provider.rs:221`) only registers `LlamaCppAdapter` for local inference. `CandleAdapter` is imported but never instantiated.
- `Qwen35GgufBackend` was only reachable via `backends::load_gguf_backend`, whose only callers were unregistered (`CandleAdapter`, `ContinuumModel`, `bin/*` debug utilities) - none in the production hot path.
- Production Qwen3.5 chat goes through llama.cpp via `LlamaCppAdapter` → `LlamaCppBackend`.

Scope-down from initial #1273 plan
The original plan (#1273 (comment)) was to delete the entire Candle inference chain (`CandleAdapter`, `ContinuumModel`, `quantized.rs`, vendored qwen2/llama backends). `cargo check` confirmed the broader scope is entangled with plasticity LoRA training tests, which use `compact_llama_safetensors` + `model::rebuild_with_stacked_lora`. That broader deletion needs a separate audit of plasticity's production reachability and is deferred to a follow-up card.

This PR keeps everything plasticity touches (`model.rs`, `candle_adapter.rs`, `quantized.rs`, `llama_safetensors.rs`, `compact_llama_safetensors.rs`, vendored qwen2/llama) and only deletes the Qwen3.5-specific Candle code that has no plasticity dependency.

Audit context
This is PR 2 of 4 for #1262 (silent CPU model fallbacks audit). Audit findings:
#1262 (comment)
Sibling PRs:
- `compute_router.rs` (213 LOC dead-code policy)
- `vendored/metal_deltanet.rs` (zero callers)

Lane: alpha flywheel #1272 lane 6.
Wire change
`backends::load_gguf_backend` for the `qwen3`/`qwen35` architecture now returns a typed error ("Qwen3.5 GGUF routing through the Candle backend was removed in #1273. Use LlamaCppAdapter"). Callers in dead code paths (`ContinuumModel`, `CandleAdapter`) fail loudly rather than silently dispatching to deleted code.

Test plan
- `cargo check --features metal` - clean (0 errors, 61 pre-existing warnings)
- `cargo test --lib --features metal -- --test-threads=1` - 2096 passed, 0 failed (4 more than baseline; vendored qwen35 module removal cleared dead-code warnings that suppressed test discovery)

🤖 Generated with Claude Code
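The wire change can be sketched as follows. The error enum, the `Ok` payload, and the function signature are assumptions for illustration; only the architecture strings and the error message text follow the PR description.

```rust
// Sketch of the replaced match arm in backends::load_gguf_backend.
// BackendError and the Ok payload are stand-ins for the real types.
#[derive(Debug, PartialEq)]
enum BackendError {
    RemovedArchitecture(String),
}

fn load_gguf_backend(architecture: &str) -> Result<&'static str, BackendError> {
    match architecture {
        // Previously dispatched to the deleted Candle backend; now a loud,
        // typed failure that points callers at LlamaCppAdapter.
        "qwen3" | "qwen35" => Err(BackendError::RemovedArchitecture(
            "Qwen3.5 GGUF routing through the Candle backend was removed in \
             #1273. Use LlamaCppAdapter"
                .to_string(),
        )),
        // Other architectures keep their existing dispatch (elided here).
        _ => Ok("existing-backend"),
    }
}

fn main() {
    assert!(load_gguf_backend("qwen35").is_err()); // loud-fail, not silent
    assert!(load_gguf_backend("llama").is_ok());   // other arms untouched
}
```

Returning a typed error rather than deleting the arm outright means any remaining dead-code caller surfaces immediately with an actionable message instead of a missing-symbol compile error in unrelated modules.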