feat(inference): add LlamaCppAdapter::try_new + NoLocalModelLoadable typed error #1089
Merged
…typed error

Lane A PR-2: surfaces install-time-no-Qwen as observable runtime health rather than process panic. Pairs with #1085 (install fix for the SOURCE of the no-Qwen state) by making the runtime VISIBILITY of "no local model loadable" testable + integrable.

Background: continuum-8e97 RTX 5090 install (2026-05-11) had cuda stack ready, VRAM available, zero personas replying; root cause was no Qwen GGUF seeded. The existing `LlamaCppAdapter::new()` would have panicked with the right message, but is constructed LAZILY (first generate_text call). Personas silent-skip pre-resolver, so the panic was never reached. Adapter never tried to load.

Changes:
- New typed error `NoLocalModelLoadable { provider_id, rows_in_registry, rows_with_gguf_local_path }` with thiserror Display naming the actionable remediation ("Install seeded no local Qwen GGUF — run model-init downloader or seed manually").
- New `LlamaCppAdapter::try_new() -> Result<Self, NoLocalModelLoadable>`: Result-returning variant. Boot-time health checks (continuum status, ai/status, install-time validators) MUST use this so an install with no Qwen seeded reports the typed error cleanly instead of crash-looping later when a persona attempts to invoke.
- New `LlamaCppAdapter::try_new_from<'a, I>(models: I)` pure variant taking a model iterator directly, mirroring my model_resolver.rs pattern. Lets tests assemble synthetic registries without going through the global() singleton. `try_new()` calls `try_new_from(global().models_for_provider("llamacpp-local"))`.
- Legacy `LlamaCppAdapter::new()` preserved (panics on err): same observable behavior as before for callers that haven't migrated.

3 tests covering the contract:
- try_new_from_errors_when_no_llamacpp_local_rows: empty iterator → NoLocalModelLoadable with rows_in_registry=0, error message contains "model-init" remediation hint
- try_new_from_errors_when_llamacpp_rows_exist_but_none_have_gguf_path: registry has llamacpp-local rows but artifact resolver couldn't find any GGUF on disk → NoLocalModelLoadable with rows_in_registry=2, rows_with_gguf_local_path=0 (the RTX 5090 case Codex's #1085 + upstream model-init bug produces)
- try_new_from_succeeds_with_at_least_one_resolved_path: mixed registry (one resolved, one not) → adapter picks resolved row, model_path + default_model match

Validation:
- cargo test --features metal,accelerate -p continuum-core --lib inference::llamacpp_adapter: 3/3 pass

Out of scope (separate followups):
- Wire `try_new()` into a runtime boot health check (Lane A PR-3 or ai/status integration), surfaces the typed error to operators via jtag command output. PR-2 ships the primitive; integration is next.
- The artifact resolver behavior when explicit gguf path doesn't exist on disk: silently falls through to other resolvers (artifacts.rs:73). Worth a separate audit but doesn't change PR-2's contract.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
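The constructor contract described in this commit message can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code from the PR: `ModelRow` and its fields, the `usize` counters, the exact error wording, and the `new_from` helper are all hypothetical, and `Display` is hand-written here only to keep the sketch dependency-free (the PR itself uses thiserror).

```rust
use std::fmt;
use std::path::PathBuf;

/// Hypothetical registry row; the real registry type is not shown in the PR.
#[derive(Debug, Clone)]
pub struct ModelRow {
    pub model_id: String,
    /// `Some` only when a GGUF file was actually resolved on disk.
    pub gguf_local_path: Option<PathBuf>,
}

/// Field names follow the PR description; the integer types are assumed.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct NoLocalModelLoadable {
    pub provider_id: String,
    pub rows_in_registry: usize,
    pub rows_with_gguf_local_path: usize,
}

impl fmt::Display for NoLocalModelLoadable {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Illustrative wording; it only mirrors the "model-init" remediation hint.
        write!(
            f,
            "no local model loadable for provider {} ({} rows in registry, {} with a local GGUF path); run model-init downloader or seed manually",
            self.provider_id, self.rows_in_registry, self.rows_with_gguf_local_path
        )
    }
}

impl std::error::Error for NoLocalModelLoadable {}

#[derive(Debug)]
pub struct LlamaCppAdapter {
    pub model_path: PathBuf,
    pub default_model: String,
}

impl LlamaCppAdapter {
    /// Pure variant: takes the candidate rows directly, no global singleton.
    pub fn try_new_from<'a, I>(models: I) -> Result<Self, NoLocalModelLoadable>
    where
        I: IntoIterator<Item = &'a ModelRow>,
    {
        let mut rows_in_registry = 0;
        for row in models {
            rows_in_registry += 1;
            // The first row with a resolved path wins.
            if let Some(path) = &row.gguf_local_path {
                return Ok(Self {
                    model_path: path.clone(),
                    default_model: row.model_id.clone(),
                });
            }
        }
        // Reaching this point means every row was seen and none had a path.
        Err(NoLocalModelLoadable {
            provider_id: "llamacpp-local".to_string(),
            rows_in_registry,
            rows_with_gguf_local_path: 0,
        })
    }

    /// Hypothetical stand-in for the legacy panicking constructor.
    pub fn new_from<'a, I>(models: I) -> Self
    where
        I: IntoIterator<Item = &'a ModelRow>,
    {
        Self::try_new_from(models).unwrap_or_else(|e| panic!("{}", e))
    }
}
```

Passing a slice or `Vec` of rows by reference is enough to build a synthetic registry in a test, which is the property the commit message attributes to the iterator-taking variant.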
Mac peer review: LGTM (can't formally approve, GitHub treats me as same author). Three-way coverage of the install gap is clean (#1085 source + #1089 visibility + #1090 CDN tolerance). Reviewed code + tests:

Strengths:

Tiny style nit (NOT a blocker):

Ship it. Three-leg install fix lands cleanly when #1085 + #1089 + #1090 all merge.
joelteply added a commit that referenced this pull request on May 13, 2026
…1129)

* feat(persona): typed Engram + admission membrane types (#1121 PR-1)

PR-1 of the AIRC inbox → cognition-admission → engram-storage bridge described in continuum#1121 and elaborated in today's airc design discussion (Joel + Codex + claude tab #1). Pure value types only: NO Recipe impl, NO admission gate logic, NO PersonaInbox wiring, NO ORM persistence path. Subsequent PRs layer those over these types.

Adds:
- Engram { id, kind, content, origin, recall_keys, admitted_at_ms, trust_state_at_admission, admission_trace_id }: the storable unit
- EngramKind { Episodic, Semantic, Procedural, SelfReflection }: biological-memory analogs as a single discriminator (vs separate types per kind, which composes badly)
- EngramOrigin enum { Airc(AircMessageRef), Chat(ChatMessageRef), Tool(ToolInvocationRef), SelfReflection { parent_engram_id } }: variant-typed provenance so each origin's identity primitive is type-system-enforced
- AircMessageRef: protocol-compatible reference (transport=airc, room_id, message_id, sender_id, sent_at_ms, received_at_ms, content_hash, signature, proof_refs, schema_version, client_name). Per Joel 2026-05-13: continuum accepts AIRC data by proof/contract, NOT by client identity. Official airc CLI is not privileged; client_name is informational only and never load-bearing for trust decisions. Any producer emitting valid envelopes is acceptable.
- ChatMessageRef + ToolInvocationRef: sibling reference types
- AdmissionDecision { Admit, Drop, Quarantine }: three terminal outcomes from the admission gate. Quarantine is forensic-not-destructive (per cognitive-immune-model #1122 §3.8): it preserves the candidate without admitting it to the live recall surface
- AdmissionDropReason { NotMemorable, PolicyDeniedAdmission, Duplicate }: typed reasons (categorized intentional rejection)
- AdmissionError { EnvelopeVerificationFailed, TrustBoundaryRejected, ReplayDetected, RecipeFailure, UnsupportedSchemaVersion }: thiserror typed failure modes for the admission machinery itself. Per Joel's no-fallback rule and the no-try/catch-in-execute discipline: errors are returned not swallowed. Same shape as NoLocalModelLoadable (#1089) and NoMultimodalBase (#1074).
- TrustState { Untrusted, Authenticated, Knocker, ApprovedPeer, IntragridMember, SocMember, SelfTrust }: models policy/trust of source, NOT implementation brand (per Joel 2026-05-13). Ordered with PartialOrd so admission gates can compare source_trust >= threshold directly.

Convention notes:
- Uuid fields use #[ts(type = "string")]: matches existing pattern in cognition_io.rs / channel_items.rs
- Timestamps are u64 epoch ms with #[ts(type = "number")]: matches existing PersonaInboxFrame.oldest_timestamp pattern. Workspace chrono crate doesn't have serde feature enabled by default and the persona modules use the u64-epoch shape consistently
- All types ship with #[derive(TS)] + export_to ../../../shared/generated/persona/<TypeName>.ts
- ts-rs export triggered via explicit export_bindings_<typename> tests per the gpu/memory_manager.rs pattern

Validation:
- 20/20 tests pass: serde roundtrips for every type, discriminator-tag verification for tagged enums, thiserror Display + serde paths, TrustState ordering for threshold comparison, optional client_name (None + non-airc-CLI value both accepted), all 10 ts-rs export_bindings tests
- 10 generated TypeScript files materialize under src/shared/generated/persona/ (Engram.ts, EngramKind.ts, EngramOrigin.ts, AircMessageRef.ts, ChatMessageRef.ts, ToolInvocationRef.ts, AdmissionDecision.ts, AdmissionDropReason.ts, AdmissionError.ts, TrustState.ts)

Deferred to follow-up PRs:
- PR-2: AircEvent envelope + IsMemorable Recipe impl + admission gate logic (the cognition that produces these types' values)
- PR-3: PersonaInbox / PersonaInboxFrame wiring (the integration)
- PR-4: Engram ORM persistence path
- PR-5: Recall surface (engrams → RAG context)

Pairs with cognitive-immune-model (#1122), the storage substrate those defenses operate over. Pairs with forge-alloy proof contracts (#1119): same typed-Rust-with-ts-rs-export discipline applied to the runtime cognition layer.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(persona): export generated engram bindings

---------

Co-authored-by: Test <test@test.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
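The note that TrustState is "ordered with PartialOrd" so gates can compare `source_trust >= threshold` can be illustrated with a small sketch. It assumes the variants are declared in the order listed in the commit message, since derived ordering on a fieldless enum follows declaration order. The `meets_threshold` helper is hypothetical; the commit states that PR-1 ships no gate logic.

```rust
/// Variant names from the commit message; the declaration order is assumed to
/// match the listed order, which is what the derived ordering relies on.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum TrustState {
    Untrusted,
    Authenticated,
    Knocker,
    ApprovedPeer,
    IntragridMember,
    SocMember,
    SelfTrust,
}

/// Hypothetical helper showing the direct comparison the commit describes.
pub fn meets_threshold(source_trust: TrustState, threshold: TrustState) -> bool {
    // Later-declared variants compare as greater than earlier ones.
    source_trust >= threshold
}
```

One consequence of this design is that reordering the enum's variants silently changes every threshold comparison, which is presumably why the commit lists a dedicated TrustState ordering test.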
Lane A PR-2: runtime visibility for install-time-no-Qwen state

Pairs with #1085 (install fix for the SOURCE of the no-Qwen state) by making the runtime VISIBILITY of "no local model loadable" testable + integrable.

Background

@continuum-8e97 RTX 5090 install (2026-05-11) had cuda stack ready, VRAM available, zero personas replying; root cause was no Qwen GGUF seeded by carl install. The existing `LlamaCppAdapter::new()` would have panicked with the right message, but is constructed LAZILY (first generate_text call). Personas silent-skip pre-resolver, so the panic was never reached. Adapter never tried to load.

Changes

- `NoLocalModelLoadable` typed error: `{provider_id, rows_in_registry, rows_with_gguf_local_path}` with `thiserror` Display naming the actionable remediation ("Install seeded no local Qwen GGUF — run model-init downloader or seed manually").
- `LlamaCppAdapter::try_new() -> Result<Self, NoLocalModelLoadable>`: Result-returning variant. Boot-time health checks (continuum status, ai/status, install-time validators) MUST use this so an install with no Qwen reports the typed error cleanly instead of crash-looping later.
- `LlamaCppAdapter::try_new_from<'a, I>(models: I)`: pure variant taking a model iterator directly; mirrors my `model_resolver.rs` pattern. Lets tests assemble synthetic registries without going through the `global()` singleton.
- Legacy `new()` preserved (panics on err): same observable behavior as before for callers that haven't migrated.

Tests (3/3 pass)

- `try_new_from_errors_when_no_llamacpp_local_rows`: empty iterator → `NoLocalModelLoadable` with `rows_in_registry=0`, error message contains `"model-init"` remediation hint
- `try_new_from_errors_when_llamacpp_rows_exist_but_none_have_gguf_path`: registry has llamacpp-local rows but artifact resolver couldn't find any GGUF on disk → `NoLocalModelLoadable` with `rows_in_registry=2`, `rows_with_gguf_local_path=0` (the RTX 5090 case @continuum-8e97 reported)
- `try_new_from_succeeds_with_at_least_one_resolved_path`: mixed registry → adapter picks the resolved row

Validation

- `cargo test --features metal,accelerate -p continuum-core --lib inference::llamacpp_adapter`: 3/3 pass

Out of scope (separate followups)

- Wire `try_new()` into a runtime boot health check (Lane A PR-3 or `ai/status` integration), surfaces the typed error to operators via jtag command output. PR-2 ships the primitive; integration is next.
- The artifact resolver behavior when explicit gguf path doesn't exist on disk: silently falls through to other resolvers (`artifacts.rs:73`). Worth a separate audit but doesn't change PR-2's contract.

🤖 Generated with Claude Code
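As a rough picture of the follow-up integration, a boot-time check could turn the `Result` from a `try_new()`-style probe into a reportable status instead of letting a later call panic. The sketch below is an assumption-heavy illustration: `LocalInferenceHealth`, `local_inference_health`, the hand-written `Display`, and the `usize` field types are hypothetical and do not come from the PR or from PR-3.

```rust
use std::fmt;

/// Minimal stand-in for the typed error; field names follow the PR
/// description, the integer types are assumed.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct NoLocalModelLoadable {
    pub provider_id: String,
    pub rows_in_registry: usize,
    pub rows_with_gguf_local_path: usize,
}

impl fmt::Display for NoLocalModelLoadable {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Illustrative wording only; it mirrors the "model-init" hint.
        write!(
            f,
            "provider {}: {} registry rows, {} with a local GGUF path; run model-init downloader or seed manually",
            self.provider_id, self.rows_in_registry, self.rows_with_gguf_local_path
        )
    }
}

/// Hypothetical status value a health command might print.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum LocalInferenceHealth {
    Ready,
    Unavailable { reason: String },
}

/// Maps the outcome of an eager construction attempt to a status.
/// Generic over `T` so any adapter type can be probed; the adapter value
/// itself is discarded because only loadability matters here.
pub fn local_inference_health<T>(
    probe: Result<T, NoLocalModelLoadable>,
) -> LocalInferenceHealth {
    match probe {
        Ok(_) => LocalInferenceHealth::Ready,
        Err(e) => LocalInferenceHealth::Unavailable { reason: e.to_string() },
    }
}
```

Running the probe eagerly at boot is the point of the design: the lazily constructed adapter described in the Background section never reached its panic, so the failure stayed invisible until someone inspected the install by hand.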