
fix(inference,#1273): delete dead Candle Qwen3.5 GGUF backend (~1100 LOC) #1279

Merged
joelteply merged 1 commit into canary from fix/delete-dead-candle-qwen-path-1273
May 15, 2026

Conversation

@joelteply
Contributor

Summary

Delete the Candle-side Qwen3.5 inference path:

  • inference/backends/qwen35_gguf.rs — Qwen35GgufBackend ModelBackend wrapper (194 LOC)
  • inference/vendored/quantized_qwen35.rs — vendored hybrid DeltaNet + Attention recurrence loop (919 LOC)
  • Module declarations in backends/mod.rs and vendored/mod.rs
  • Replace the qwen3/qwen35 arm of backends::load_gguf_backend with a typed error pointing callers at LlamaCppAdapter

Net: +14 / -1132 LOC.

Why it was dead

AIProviderModule::register_adapters (modules/ai_provider.rs:221) only registers LlamaCppAdapter for local inference. CandleAdapter is imported but never instantiated. Qwen35GgufBackend was only reachable via backends::load_gguf_backend, whose only callers were unregistered (CandleAdapter, ContinuumModel, bin/* debug utilities) — none in the production hot path. Production Qwen3.5 chat goes through llama.cpp via LlamaCppAdapter → LlamaCppBackend.
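
For illustration, a minimal sketch of that registration shape; the Adapter trait and function signature are invented here, and only the adapter names come from this PR:

```rust
// Hypothetical shape of AIProviderModule::register_adapters; the real
// trait and signature in modules/ai_provider.rs differ.
trait Adapter {}

struct LlamaCppAdapter;
impl Adapter for LlamaCppAdapter {}

#[allow(dead_code)]
struct CandleAdapter; // imported in the real module, but never constructed

fn register_adapters(adapters: &mut Vec<Box<dyn Adapter>>) {
    // Only the llama.cpp path is registered for local inference.
    adapters.push(Box::new(LlamaCppAdapter));
    // No CandleAdapter is ever pushed, so everything reachable only
    // through it (backends::load_gguf_backend -> Qwen35GgufBackend)
    // is dead code.
}
```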

Scope-down from initial #1273 plan

The original plan (#1273 (comment)) was to delete the entire Candle inference chain (CandleAdapter, ContinuumModel, quantized.rs, vendored qwen2/llama backends). cargo check confirmed that the broader scope is entangled with plasticity LoRA training tests, which use compact_llama_safetensors + model::rebuild_with_stacked_lora. That broader deletion needs a separate audit of plasticity's production reachability and is deferred to a follow-up card.

This PR keeps everything plasticity touches (model.rs, candle_adapter.rs, quantized.rs, llama_safetensors.rs, compact_llama_safetensors.rs, vendored qwen2/llama) and only deletes the Qwen3.5-specific Candle code that has no plasticity dependency.

Audit context

This is PR 2 of 4 for #1262 (silent CPU model fallbacks audit). Audit findings:
#1262 (comment)

Sibling PRs:

Lane: alpha flywheel #1272 lane 6.

Wire change

backends::load_gguf_backend for the qwen3/qwen35 architectures now returns a typed error ("Qwen3.5 GGUF routing through the Candle backend was removed in #1273. Use LlamaCppAdapter"). Callers in dead code paths (ContinuumModel, CandleAdapter) now fail loudly rather than silently dispatching to deleted code.
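
A minimal sketch of the new arm, assuming a thiserror-style error enum; the real load_gguf_backend returns a backend rather than (), and its error type may differ:

```rust
// Hypothetical sketch; only the qwen3/qwen35 arm and its error message
// are taken from this PR.
#[derive(Debug, thiserror::Error)]
enum BackendError {
    #[error("Qwen3.5 GGUF routing through the Candle backend was removed \
             in #1273. Use LlamaCppAdapter")]
    Qwen35Removed,
    #[error("unsupported GGUF architecture `{0}`")]
    Unsupported(String),
}

fn load_gguf_backend(architecture: &str) -> Result<(), BackendError> {
    match architecture {
        // Previously dispatched to Qwen35GgufBackend; now a loud, typed
        // failure instead of a silent route into deleted code.
        "qwen3" | "qwen35" => Err(BackendError::Qwen35Removed),
        // Other architectures keep their existing dispatch (elided here).
        other => Err(BackendError::Unsupported(other.to_string())),
    }
}
```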

Test plan

  • cargo check --features metal — clean (0 errors, 61 pre-existing warnings)
  • cargo test --lib --features metal -- --test-threads=1 — 2096 passed, 0 failed (4 more than baseline; vendored qwen35 module removal cleared dead-code warnings that suppressed test discovery)
  • Precommit (TypeScript + browser ping): PASSED

🤖 Generated with Claude Code

Remove the Candle-side Qwen3.5 inference path (the hybrid DeltaNet +
Attention recurrence loop in vendored/quantized_qwen35.rs and its
ModelBackend wrapper in backends/qwen35_gguf.rs). 1100+ LOC removed.

Why it was dead:
- AIProviderModule::register_adapters (modules/ai_provider.rs:221) only
  registers LlamaCppAdapter for local inference. CandleAdapter is
  imported but never instantiated.
- Qwen35GgufBackend was only reachable via backends::load_gguf_backend,
  whose only callers were unregistered (CandleAdapter, ContinuumModel,
  bin/* utilities) — none in the production hot path.
- Production Qwen3.5 chat goes through llama.cpp (vendored,
  statically linked) via LlamaCppAdapter → LlamaCppBackend.

Scope-down from initial #1273 plan:
The original plan was to delete the entire Candle inference chain
(CandleAdapter, ContinuumModel, quantized.rs, vendored qwen2/llama
backends). cargo check confirmed broader scope is entangled with
plasticity LoRA training tests, which use compact_llama_safetensors
+ rebuild_with_stacked_lora. That broader deletion needs a separate
audit of plasticity's production reachability and is deferred to a
follow-up card.

This PR keeps everything plasticity touches (model.rs,
candle_adapter.rs, quantized.rs, llama_safetensors.rs,
compact_llama_safetensors.rs, vendored qwen2/llama) and only deletes
the qwen3.5-specific Candle path that has no plasticity dependency.

Wire change:
- backends::load_gguf_backend now returns a typed error for
  "qwen3"|"qwen35" architectures pointing callers at LlamaCppAdapter,
  rather than silently dispatching to the deleted Candle backend.

Verified:
- cargo check --features metal: clean (0 errors, 61 pre-existing warnings)
- cargo test --lib --features metal: 2096 passed, 0 failed (4 more than
  baseline — removing the vendored qwen35 module cleared dead-code
  warnings that were suppressing test discovery)

Lane: alpha flywheel #1272 lane 6.
Audit context: #1262 (comment)
Verification: #1273 (comment)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@joelteply joelteply merged commit 4180fe1 into canary May 15, 2026
3 checks passed
@joelteply joelteply deleted the fix/delete-dead-candle-qwen-path-1273 branch May 15, 2026 17:40
joelteply added a commit that referenced this pull request May 16, 2026
… document map; lane status truth-up (#1316)

* docs(alpha): refresh status against 2026-05-16 canary

Three changes to ALPHA-GAP-ANALYSIS.md:

1. Header date 2026-05-13 -> 2026-05-16. Add explicit cross-link to
   CBAR-SUBSTRATE-ARCHITECTURE.md as the runtime substrate spec.

2. Restructure the Document Map (was a flat list) into categorized
   references (Runtime substrate / Cognition migration / Memory paging /
   Model registry / Grid), and add the precedence rule: if any supporting
   doc disagrees with ALPHA-GAP on the substrate contract (concurrency,
   scheduling, memory, pressure, telemetry, artifact handles), defer to
   CBAR-SUBSTRATE-ARCHITECTURE.md.

3. Refresh the Current Snapshot table against canary @ 2026-05-16:
   - Rust core row reflects the PressureBroker bootstrap stack
     (#1307 / #1308 / #1310), runtime lease broker (#1313), cognition
     oxidization (#1284 / #1290 / #1291 / #1293 / #1298 / #1301 / #1303
     / #1292), dead-Candle deletes (#1277 / #1279 / #1281 / #1288), and
     the inference-grpc fail-closed (#1314). GRID-INFERENCE-ROUTING
     PR-1 announcer in flight on feat/grid-inference-routing-pr2-announcer.
   - Node/TS row notes net-negative trend (~2500 LOC TS deleted via the
     8-PR cognition stacks).
   - Docker row records Docker tier Phase 1 (#1297).
   - Config row records SQLite-first default (#1271).
   - Tests row records the no-CPU-fallback contract gap: the existing
     regression test in workers/continuum-core covers llama.cpp / ORT
     only, not the Candle-side paths where the orpheus + inference-grpc
     fallbacks lived before #1314.

* docs(alpha): refresh lane status table and immediate-next-actions

Two updates to ALPHA-GAP-ANALYSIS.md:

1. Lane status table now reflects actual state @ 2026-05-16, not aspiration:
   - Lane A: in progress, model_registry/ exists with admission resolver.
   - Lane B: Phase 1 landed (#1297 docker-tier-stats); GPU profile +
     tier-pool eviction (#1238 / #1239) still open.
   - Lane C: structured RuntimeMetric emits from inference paths;
     vdd-report-command not yet bound.
   - Lane D: UNSTARTED — flagged as the highest-leverage open lane because
     Lane E (PressureBroker) and the inbox coalescing pattern both
     presuppose RuntimeFrame / CognitionTurnFrame.
   - Lane E: bootstrap landed (#1307 / #1308 / #1310 / #1313); paging and
     pre-broker concurrency-hack deletion remain. Concrete deletion target
     called out: get_num_workers() in inference-grpc/main.rs, which reads
     INFERENCE_WORKERS from config.env and otherwise picks worker count
     from system memory at startup — both branches violate the
     "we do not hard code" / "dynamic, broker-owned concurrency" rule.
   - Lane F: ~2500 LOC TS deleted manually this session; mechanical CI
     ratchet still not landed (deletion is reversible until it is).
   - Lane G: refresh in flight on joel/docs-alpha-refresh.

   Adds an "adjacent active workstream" note for GRID-INFERENCE-ROUTING
   (PR-1 announcer + probe + registry in flight on
   feat/grid-inference-routing-pr2-announcer) as the grid-side counterpart
   to Lane A.

2. Immediate Next Actions reordered by alpha leverage, not by who is
   online. Top three items are Lane D claim, the universal-trait "for free"
   triplet (RuntimeModule base trait + derive macro + scaffold generator
   from CBAR-SUBSTRATE-ARCHITECTURE.md), and the
   get_num_workers() deletion. Adds the Lane C VDD report command and the
   widening of no_cpu_fallback_contract.rs to cover Candle paths. Adds
   doc-refresh follow-ups so each supporting doc gets cross-linked back
   into the Document Map.
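
For reference, a hypothetical reconstruction of the get_num_workers() pattern that the Lane E item above flags for deletion; the memory probe is a placeholder and the actual inference-grpc/main.rs code may differ:

```rust
use std::env;

// Both branches below fix a worker count at startup, which is what the
// broker-owned-concurrency rule forbids.
fn get_num_workers() -> usize {
    // Branch 1: hard-coded override read from config.env.
    if let Ok(raw) = env::var("INFERENCE_WORKERS") {
        if let Ok(n) = raw.parse::<usize>() {
            return n;
        }
    }
    // Branch 2: one-shot sizing from total system memory, fixed for the
    // life of the process instead of brokered dynamically at runtime.
    let total_mem_gb: usize = 16; // placeholder for a real memory probe
    (total_mem_gb / 4).max(1)
}

fn main() {
    println!("workers = {}", get_num_workers());
}
```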

* docs(alpha): add Lane H + GENOME-FOUNDRY-SENTINEL cross-links

Four updates to ALPHA-GAP-ANALYSIS.md following continuum#1327:

1. Lane H added to the lane status table: Substrate governor + tiered
   genome cache. Sibling to Lane E (broker owns admission; governor
   owns sizing). 7-PR implementation sequence detailed in
   GENOME-FOUNDRY-SENTINEL.md Part 13. Currently Proposed, needs owner
   claim.

2. Lane claim update at end of the lane discussion: Lane H proposed
   via continuum#1327 with full design pinned to that doc; sibling to
   Lane E with the boundary stated explicitly.

3. Document Map gets GENOME-FOUNDRY-SENTINEL entry under "Runtime
   substrate (load-bearing)" — the artifact-sharing economy on top of
   the CBAR substrate. Tiered genome cache, page faults, foundry as JIT,
   sentinel-AI as profile-guided optimizer, demand-aligned recall,
   composer + speculator, SubstrateGovernor (DVFS).

4. Immediate Next Actions step 9 added: claim Lane H. Step 10 (formerly
   step 9) updated to reflect what's landed in this doc batch
   (CBAR-SUBSTRATE refinement via #1324, CONTINUUM-ARCHITECTURE refresh
   via #1317, CONTINUUM-VISION refresh via #1320, GENOME-FOUNDRY-SENTINEL
   via #1327) and what's next (CLAUDE.md substrate pointer; stale-section
   deprecations in UNIVERSAL-SENSORY / LEARNING / QUEUE-DRIVEN-COGNITION).

---------

Co-authored-by: Test <test@test.com>