fix(inference,#1280): delete dead Candle adapter chain (Phase 1, ~2500 LOC) #1288
Merged
Conversation
fix(inference,#1280): delete dead Candle adapter chain (Phase 1, ~2500 LOC)

Per the plasticity reachability audit on #1280 (#1280 (comment)), production routes local inference exclusively through `LlamaCppAdapter`. The Candle-side chain — `CandleAdapter`, `ContinuumModel`, `select_best_device`, `load_model_by_id`, `quantized.rs::load_*_quantized`, `backends::generate`, `backends::load_gguf_backend` — was reachable only through itself or orphaned `bin/*` files. Plasticity's IPC handlers (`plasticity/{analyze,compact,compress,topology,pipeline}`) work on safetensors files via plasticity's own helpers and don't touch this chain.

Deleted:
- `inference/candle_adapter.rs` (1486 LOC)
- `inference/quantized.rs` (287 LOC)
- `inference/model.rs` collapsed from 857 → 167 LOC, retaining only `rebuild_with_stacked_lora` (used by `backends/llama_safetensors.rs::CompactLlamaSafetensorsBackend`, test-only, slated for Phase 2 deletion alongside the safetensors backends once plasticity LoRA training is migrated or retired)

Wire updates:
- `ai/mod.rs`: drop the `pub use crate::inference::CandleAdapter` re-export
- `inference/mod.rs`: drop the `candle_adapter`/`quantized` modules and their re-exports; keep `model::rebuild_with_stacked_lora` only
- `modules/ai_provider.rs`: drop the dead `CandleAdapter` import (imported but never instantiated by `register_adapters`)

Contract relocation (the audit's flagged risk): the no-CPU-fallback `panic!("...CPU fallback is disabled")` in `select_best_device` was deleted along with the rest of the dead chain. The contract's actual production enforcement was already on llama.cpp: `LlamaCppConfig::default()` sets `n_gpu_layers: -1` (= "all layers on GPU"), and llama.cpp's loader hard-fails when no GPU is available. `tests/no_cpu_fallback_contract.rs` is updated atomically to assert the `n_gpu_layers: -1` invariant in `backends/llamacpp.rs` rather than the deleted panic site. The `ort_providers` and `LlamaCppAdapter` assertions survive unchanged.

Net: 7 files changed, +92 / -2546 LOC.

Verified:
- `cargo check --features metal`: clean (52 pre-existing warnings, 0 errors)
- `cargo test --test no_cpu_fallback_contract`: 3 passed (new contract assertion `llamacpp_default_config_requires_full_gpu_offload` green)
- `cargo test --lib --features metal`: 2166 passed, 0 failed

Phase 2 (deferred): delete safetensors backends + vendored qwen2/llama backends + `rebuild_with_stacked_lora` once plasticity's production reachability allows.

Audit: #1262 (comment)
Mission: Joel 2026-05-15 — "eliminate slop and slowly oxidize this project"

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
Phase 1 of #1280 — delete the Candle inference chain that's been vestigial since the llama.cpp migration. Production routes local inference exclusively through `LlamaCppAdapter`; this PR removes ~2500 LOC that was reachable only from itself or orphaned bin/* files.
Net: 7 files changed, +92 / -2546 LOC.
Deleted
- `inference/candle_adapter.rs` (1486 LOC)
- `inference/quantized.rs` (287 LOC)
- `inference/model.rs` collapsed from 857 → 167 LOC, retaining only `rebuild_with_stacked_lora` (used by the test-only `backends/llama_safetensors.rs::CompactLlamaSafetensorsBackend`; slated for Phase 2 deletion alongside the safetensors backends once plasticity LoRA training is migrated or retired)
Wire updates
- `ai/mod.rs`: drop the `pub use crate::inference::CandleAdapter` re-export
- `inference/mod.rs`: drop the `candle_adapter`/`quantized` modules and their re-exports; keep `model::rebuild_with_stacked_lora` only (sketched below)
- `modules/ai_provider.rs`: drop the dead `CandleAdapter` import (it was imported but never instantiated by `register_adapters`)
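A minimal sketch of what `inference/mod.rs` looks like after these drops — only the retained module and re-export are confirmed by this PR; the comment summarizes the removals:

```rust
// inference/mod.rs after Phase 1 (sketch).
// Removed: `pub mod candle_adapter;`, `pub mod quantized;` and their
// re-exports. Retained: the single LoRA helper still used by the
// test-only safetensors backend, pending Phase 2.
pub mod model;

pub use model::rebuild_with_stacked_lora;
```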
Contract relocation (audit-flagged risk)
The audit (#1280 (comment)) flagged that the no-CPU-fallback `panic!("...CPU fallback is disabled")` lived in `select_best_device` (deleted with the dead chain). The contract's actual production enforcement was already on llama.cpp: `LlamaCppConfig::default()` sets `n_gpu_layers: -1` (= "all layers on GPU"); llama.cpp's loader hard-fails when no GPU is available.
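For context, a minimal sketch of that default as the PR describes it — only `n_gpu_layers: -1` is confirmed; the struct shape around it is an assumption:

```rust
pub struct LlamaCppConfig {
    /// -1 = offload all layers to the GPU. llama.cpp's loader
    /// hard-fails at model load when no GPU backend is available,
    /// so this default is the production no-CPU-fallback enforcement.
    pub n_gpu_layers: i32,
}

impl Default for LlamaCppConfig {
    fn default() -> Self {
        Self { n_gpu_layers: -1 }
    }
}
```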
`tests/no_cpu_fallback_contract.rs` is updated atomically to assert the `n_gpu_layers: -1` invariant in `backends/llamacpp.rs` rather than the deleted panic site:
```rust
#[test]
fn llamacpp_default_config_requires_full_gpu_offload() {
    assert!(LLAMACPP_BACKEND_SOURCE.contains("n_gpu_layers: -1"), "...");
}
```
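The PR doesn't show where `LLAMACPP_BACKEND_SOURCE` comes from; the usual pattern for a source-text contract test like this is an `include_str!` of the backend file, roughly (path assumed):

```rust
// Assumed definition — include_str! embeds the backend source at
// compile time so the test can assert on it as plain text.
const LLAMACPP_BACKEND_SOURCE: &str =
    include_str!("../src/backends/llamacpp.rs");
```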
The `ort_providers` and `LlamaCppAdapter::NoLocalModelLoadable` assertions survive unchanged.
Why safe
Per the plasticity reachability audit (#1280 audit comment): plasticity's IPC handlers (`plasticity/{analyze,compact,compress,topology,pipeline}`) work on safetensors files via plasticity's own `compactor::` / `pipeline::` helpers and don't import anything from the deleted Candle chain. Only plasticity's `#[cfg(test)] mod tests` block touches the safetensors backends — those tests are preserved (they go in Phase 2 alongside the safetensors backends).
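Concretely, the reachability split the audit describes looks like this — a sketch with illustrative paths; only the module names and the test-only backend are named in this PR:

```rust
// Plasticity's IPC handlers import only plasticity's own helpers,
// never anything from the deleted Candle chain (paths illustrative).
use crate::plasticity::compactor;
use crate::plasticity::pipeline;

// The one surviving consumer of the safetensors backends is gated
// behind #[cfg(test)] inside plasticity; it goes in Phase 2.
#[cfg(test)]
mod tests {
    use crate::backends::llama_safetensors::CompactLlamaSafetensorsBackend;
}
```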
Mission context
Joel 2026-05-15: "mission to eliminate slop and slowly oxidize this project". This PR is the second-largest single dead-code deletion in the #1262 audit's follow-on series (after #1279's qwen3.5 deletion at 1132 LOC).
Test plan
- `cargo check --features metal`: clean (52 pre-existing warnings, 0 errors)
- `cargo test --test no_cpu_fallback_contract`: 3 passed, including the new `llamacpp_default_config_requires_full_gpu_offload` assertion
- `cargo test --lib --features metal`: 2166 passed, 0 failed
Phase 2 (deferred)
Delete safetensors backends (`llama_safetensors.rs`, `compact_llama_safetensors.rs`, `qwen2_safetensors.rs`, `llama_gguf.rs`) + vendored Candle modules (`compact_llama`, `quantized_llama`, `qwen2`) + `rebuild_with_stacked_lora` once plasticity's LoRA training infrastructure is migrated to llama.cpp or formally retired. Estimated additional ~2500 LOC.
🤖 Generated with Claude Code