
fix(inference,#1280): delete dead Candle adapter chain (Phase 1, ~2500 LOC)#1288

Merged
joelteply merged 1 commit into canary from fix/delete-dead-candle-chain-1280 on May 15, 2026

Conversation

@joelteply
Contributor

Summary

Phase 1 of #1280 — delete the Candle inference chain that's been vestigial since the llama.cpp migration. Production routes local inference exclusively through `LlamaCppAdapter`; this PR removes ~2500 LOC that was reachable only from itself or orphaned bin/* files.

Net: 7 files changed, +92 / -2546 LOC.

Deleted

  • `inference/candle_adapter.rs` (1486 LOC) — `CandleAdapter`, never instantiated by `AIProviderModule::register_adapters`
  • `inference/quantized.rs` (287 LOC) — `load_quantized_model` / `load_default_quantized`, only consumer was `candle_adapter.rs`
  • `inference/model.rs` collapsed from 857 → 167 LOC, retaining only `rebuild_with_stacked_lora` (used by `backends/llama_safetensors.rs::CompactLlamaSafetensorsBackend`, test-only, slated for Phase 2)

Wire updates

  • `ai/mod.rs` — drop `pub use crate::inference::CandleAdapter`
  • `inference/mod.rs` — drop the `candle_adapter` / `quantized` modules and their re-exports (sketched after this list)
  • `modules/ai_provider.rs` — drop dead `CandleAdapter` import (was imported but never instantiated)
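
For illustration, the `inference/mod.rs` change amounts to roughly the following. This is a sketch only: apart from `candle_adapter`, `quantized`, `model`, `backends`, and `LlamaCppAdapter`, which the PR names, the module layout and re-export path are assumptions, not the actual file contents.

```rust
// Sketch of inference/mod.rs after the deletion (hypothetical layout).
// removed: pub mod candle_adapter;
// removed: pub mod quantized;
// removed: pub use candle_adapter::CandleAdapter;

pub mod backends;
pub mod model; // kept only for rebuild_with_stacked_lora (Phase 2 candidate)

pub use backends::llamacpp::LlamaCppAdapter; // assumed path; the production adapter
```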

Contract relocation (audit-flagged risk)

The audit (#1280 (comment)) flagged that the no-CPU-fallback `panic!("...CPU fallback is disabled")` lived in `select_best_device` (deleted with the dead chain). The contract's actual production enforcement was already on llama.cpp: `LlamaCppConfig::default()` sets `n_gpu_layers: -1` (= "all layers on GPU"); llama.cpp's loader hard-fails when no GPU is available.
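
For context, the enforcement point looks roughly like this. It is a minimal sketch: everything except the `n_gpu_layers: -1` default is an assumption about `backends/llamacpp.rs`, not its actual contents.

```rust
// Sketch of the no-CPU-fallback enforcement described above. Field and type
// names other than LlamaCppConfig / n_gpu_layers are assumptions.
pub struct LlamaCppConfig {
    /// -1 = offload all layers to the GPU. Per the PR, llama.cpp's loader
    /// hard-fails at model-load time when no GPU device is available, so
    /// there is no silent CPU fallback path.
    pub n_gpu_layers: i32,
}

impl Default for LlamaCppConfig {
    fn default() -> Self {
        Self { n_gpu_layers: -1 }
    }
}
```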

`tests/no_cpu_fallback_contract.rs` is updated atomically to assert the `n_gpu_layers: -1` invariant in `backends/llamacpp.rs` rather than the deleted panic site:

```rust
#[test]
fn llamacpp_default_config_requires_full_gpu_offload() {
    assert!(LLAMACPP_BACKEND_SOURCE.contains("n_gpu_layers: -1"), "...");
}
```
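
The snippet assumes the contract test embeds the backend source as a string constant. A plausible definition follows; the constant name comes from the snippet above, but the `include_str!` path is an assumption.

```rust
// Hypothetical wiring for the contract test above; the path is a guess.
const LLAMACPP_BACKEND_SOURCE: &str =
    include_str!("../src/inference/backends/llamacpp.rs");
```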

The `ort_providers` and `LlamaCppAdapter::NoLocalModelLoadable` assertions survive unchanged.

Why safe

Per the plasticity reachability audit (#1280 audit comment): plasticity's IPC handlers (`plasticity/{analyze,compact,compress,topology,pipeline}`) work on safetensors files via plasticity's own `compactor::` / `pipeline::` helpers and don't import anything from the deleted Candle chain. Only plasticity's `#[cfg(test)] mod tests` block touches the safetensors backends — those tests are preserved (they go in Phase 2 alongside the safetensors backends).

Mission context

Joel 2026-05-15: "mission to eliminate slop and slowly oxidize this project". This PR is the second-largest single dead-code deletion in the #1262 audit's follow-on series (after #1279's qwen3.5 deletion at 1132 LOC).

Test plan

  • `cargo check --features metal` — clean (52 pre-existing warnings, 0 errors)
  • `cargo test --test no_cpu_fallback_contract --features metal` — 3 passed (new `llamacpp_default_config_requires_full_gpu_offload` green, alongside the surviving ort_providers + NoLocalModelLoadable assertions)
  • `cargo test --lib --features metal -- --test-threads=1` — 2166 passed, 0 failed
  • Precommit (TypeScript + browser ping): PASSED

Phase 2 (deferred)

Delete safetensors backends (`llama_safetensors.rs`, `compact_llama_safetensors.rs`, `qwen2_safetensors.rs`, `llama_gguf.rs`) + vendored Candle modules (`compact_llama`, `quantized_llama`, `qwen2`) + `rebuild_with_stacked_lora` once plasticity's LoRA training infrastructure is migrated to llama.cpp or formally retired. Estimated additional ~2500 LOC.

🤖 Generated with Claude Code

fix(inference,#1280): delete dead Candle adapter chain (Phase 1, ~2500 LOC)

Per the plasticity reachability audit on #1280
(#1280 (comment)),
production routes local inference exclusively through `LlamaCppAdapter`.
The Candle-side chain — `CandleAdapter`, `ContinuumModel`,
`select_best_device`, `load_model_by_id`, `quantized.rs::load_*_quantized`,
`backends::generate`, `backends::load_gguf_backend` — was reachable only
through itself or orphaned `bin/*` files. Plasticity's IPC handlers
(`plasticity/{analyze,compact,compress,topology,pipeline}`) work on
safetensors files via plasticity's own helpers and don't touch this
chain.

Deleted:
- `inference/candle_adapter.rs` (1486 LOC)
- `inference/quantized.rs` (287 LOC)
- `inference/model.rs` collapsed from 857 → 167 LOC, retaining only
  `rebuild_with_stacked_lora` (used by `backends/llama_safetensors.rs::CompactLlamaSafetensorsBackend`,
  test-only, slated for Phase 2 deletion alongside the safetensors
  backends once plasticity LoRA training is migrated or retired)

Wire updates:
- `ai/mod.rs`: drop `pub use crate::inference::CandleAdapter` re-export
- `inference/mod.rs`: drop `candle_adapter`/`quantized` modules + their
  re-exports; keep `model::rebuild_with_stacked_lora` only
- `modules/ai_provider.rs`: drop dead `CandleAdapter` import (it was
  imported but never instantiated by `register_adapters`)

Contract relocation (the audit's flagged risk):
The no-CPU-fallback `panic!("...CPU fallback is disabled")` in
`select_best_device` was deleted along with the rest of the dead chain.
The contract's actual production enforcement was already on llama.cpp:
`LlamaCppConfig::default()` sets `n_gpu_layers: -1` (= "all layers on
GPU"), and llama.cpp's loader hard-fails when no GPU is available.
`tests/no_cpu_fallback_contract.rs` is updated atomically to assert the
`n_gpu_layers: -1` invariant in `backends/llamacpp.rs` rather than the
deleted panic site. The `ort_providers` and `LlamaCppAdapter` assertions
survive unchanged.

Net: 7 files changed, +92 / -2546 LOC.

Verified:
- cargo check --features metal: clean (52 pre-existing warnings, 0 errors)
- cargo test --test no_cpu_fallback_contract: 3 passed (new contract
  assertion `llamacpp_default_config_requires_full_gpu_offload` green)
- cargo test --lib --features metal: 2166 passed, 0 failed

Phase 2 (deferred): delete safetensors backends + vendored
qwen2/llama backends + `rebuild_with_stacked_lora` once plasticity's
production reachability allows.

Audit: #1262 (comment)
Mission: Joel 2026-05-15 — "eliminate slop and slowly oxidize this project"

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
joelteply merged commit 3a34535 into canary on May 15, 2026
3 checks passed
joelteply deleted the fix/delete-dead-candle-chain-1280 branch on May 15, 2026 at 19:27