
feat(inference): add LlamaCppAdapter::try_new + NoLocalModelLoadable typed error #1089

Merged
joelteply merged 1 commit into canary from feat/local-provider-health-check
May 11, 2026

@joelteply
Contributor

Lane A PR-2 — runtime visibility for install-time-no-Qwen state

Pairs with #1085 (install fix for the SOURCE of the no-Qwen state) by making the runtime VISIBILITY of "no local model loadable" testable + integrable.

Background

The @continuum-8e97 RTX 5090 install (2026-05-11) had the CUDA stack ready and VRAM available, yet zero personas replied. The root cause was that carl install seeded no Qwen GGUF. The existing LlamaCppAdapter::new() would have panicked with the right message, but it is constructed lazily (on the first generate_text call). Personas silent-skip pre-resolver, so the adapter never tried to load and the panic was never reached.

Changes

  • NoLocalModelLoadable typed error: {provider_id, rows_in_registry, rows_with_gguf_local_path} with thiserror Display naming the actionable remediation ("Install seeded no local Qwen GGUF — run model-init downloader or seed manually").

  • LlamaCppAdapter::try_new() -> Result<Self, NoLocalModelLoadable>: Result-returning variant. Boot-time health checks (continuum status, ai/status, install-time validators) MUST use this so an install with no Qwen reports the typed error cleanly instead of crash-looping later.

  • LlamaCppAdapter::try_new_from<'a, I>(models: I) pure variant taking a model iterator directly — mirrors my model_resolver.rs pattern. Lets tests assemble synthetic registries without going through the global() singleton.

  • Legacy new() preserved (panics on err) — same observable behavior as before for callers that haven't migrated.
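The shape of the change can be sketched roughly as follows. This is a minimal, std-only illustration, not the code in this PR: `ModelRow` and its fields are hypothetical stand-ins for the registry row type, the `Display` impl is written by hand where the PR uses thiserror, and the message wording is approximate (only the "model-init" hint is taken from the description above).

```rust
use std::fmt;
use std::path::PathBuf;

/// Hypothetical registry row; the real row type is not shown in this PR.
#[derive(Debug, Clone)]
pub struct ModelRow {
    pub id: String,
    pub gguf_local_path: Option<PathBuf>,
}

/// Typed error carrying the three fields named in the PR description.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct NoLocalModelLoadable {
    pub provider_id: String,
    pub rows_in_registry: usize,
    pub rows_with_gguf_local_path: usize,
}

// Hand-written Display (the PR derives it with thiserror); wording is approximate.
impl fmt::Display for NoLocalModelLoadable {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(
            f,
            "no local model loadable for provider '{}' ({} registry rows, {} with a GGUF path): \
             run model-init downloader or seed manually",
            self.provider_id, self.rows_in_registry, self.rows_with_gguf_local_path
        )
    }
}

impl std::error::Error for NoLocalModelLoadable {}

#[derive(Debug)]
pub struct LlamaCppAdapter {
    pub model_path: PathBuf,
    pub default_model: String,
}

impl LlamaCppAdapter {
    /// Pure variant: takes the candidate rows directly, so no global state is touched.
    pub fn try_new_from<'a, I>(models: I) -> Result<Self, NoLocalModelLoadable>
    where
        I: IntoIterator<Item = &'a ModelRow>,
    {
        let candidates: Vec<&'a ModelRow> = models.into_iter().collect();
        // Keep only rows whose GGUF path was resolved, paired with that path.
        let with_path: Vec<(&ModelRow, &PathBuf)> = candidates
            .iter()
            .copied()
            .filter_map(|m| m.gguf_local_path.as_ref().map(|p| (m, p)))
            .collect();

        match with_path.first() {
            Some((row, path)) => Ok(Self {
                model_path: path.to_path_buf(),
                default_model: row.id.clone(),
            }),
            None => Err(NoLocalModelLoadable {
                provider_id: "llamacpp-local".to_string(),
                rows_in_registry: candidates.len(),
                rows_with_gguf_local_path: with_path.len(),
            }),
        }
    }
}
```

Under these assumptions, `try_new()` would delegate to `try_new_from` with the rows from the global registry, and the legacy `new()` would unwrap that result and panic on `Err`.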

Tests (3/3 pass)

  • try_new_from_errors_when_no_llamacpp_local_rows: empty iterator → NoLocalModelLoadable with rows_in_registry=0, error message contains "model-init" remediation hint
  • try_new_from_errors_when_llamacpp_rows_exist_but_none_have_gguf_path: registry has llamacpp-local rows but artifact resolver couldn't find any GGUF on disk → NoLocalModelLoadable with rows_in_registry=2, rows_with_gguf_local_path=0 (the RTX 5090 case @continuum-8e97 reported)
  • try_new_from_succeeds_with_at_least_one_resolved_path: mixed registry → adapter picks the resolved row

Validation

  • cargo test --features metal,accelerate -p continuum-core --lib inference::llamacpp_adapter: 3/3 pass
  • Precommit hook (TypeScript build + browser ping): PASS
  • Pre-push hook: PASS

Out of scope (separate followups)

  • Wire try_new() into a runtime boot health check (Lane A PR-3 or ai/status integration) that surfaces the typed error to operators via jtag command output. PR-2 ships the primitive; integration is next.
  • When an explicit gguf path doesn't exist on disk, the artifact resolver silently falls through to other resolvers (artifacts.rs:73). Worth a separate audit, but it doesn't change PR-2's contract.
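For orientation only, the follow-up wiring could look something like the sketch below. Nothing here exists in this PR: `LocalProviderHealth` and `local_provider_health` are hypothetical names, the error struct is re-declared so the snippet compiles on its own, and the `Display` wording is approximate. The idea is simply that a health check consumes the `Result` from a `try_new()`-style probe and reports it instead of panicking.

```rust
use std::fmt;

/// Re-declared here so the sketch stands alone; fields follow the PR description.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct NoLocalModelLoadable {
    pub provider_id: String,
    pub rows_in_registry: usize,
    pub rows_with_gguf_local_path: usize,
}

impl fmt::Display for NoLocalModelLoadable {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(
            f,
            "no local model loadable for '{}' ({} rows, {} with GGUF path): \
             run model-init downloader or seed manually",
            self.provider_id, self.rows_in_registry, self.rows_with_gguf_local_path
        )
    }
}

/// Hypothetical status value a boot-time check might hand to operator tooling.
#[derive(Debug, PartialEq, Eq)]
pub enum LocalProviderHealth {
    Ready,
    Unavailable { detail: String },
}

/// Maps the outcome of a try_new()-style probe to a reportable status.
/// The error is carried forward as text; it is never swallowed or defaulted.
pub fn local_provider_health<T>(probe: Result<T, NoLocalModelLoadable>) -> LocalProviderHealth {
    match probe {
        Ok(_) => LocalProviderHealth::Ready,
        Err(e) => LocalProviderHealth::Unavailable { detail: e.to_string() },
    }
}
```

A caller would pass the result of the adapter probe straight in, so an install with no seeded GGUF shows up as `Unavailable` with the remediation hint at boot rather than on the first persona invocation.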

🤖 Generated with Claude Code

…typed error

Lane A PR-2 — surfaces install-time-no-Qwen as observable runtime health
rather than process panic. Pairs with #1085 (install fix for the SOURCE
of the no-Qwen state) by making the runtime VISIBILITY of "no local
model loadable" testable + integrable.

Background: continuum-8e97 RTX 5090 install (2026-05-11) had cuda stack
ready, VRAM available, zero personas replying — root cause was no Qwen
GGUF seeded. The existing `LlamaCppAdapter::new()` would have panicked
with the right message, but is constructed LAZILY (first generate_text
call). Personas silent-skip pre-resolver, so the panic was never reached.
Adapter never tried to load.

Changes:

- New typed error `NoLocalModelLoadable { provider_id, rows_in_registry,
  rows_with_gguf_local_path }` with thiserror Display naming the
  actionable remediation ("Install seeded no local Qwen GGUF — run
  model-init downloader or seed manually").

- New `LlamaCppAdapter::try_new() -> Result<Self, NoLocalModelLoadable>`:
  Result-returning variant. Boot-time health checks (continuum status,
  ai/status, install-time validators) MUST use this so an install with
  no Qwen seeded reports the typed error cleanly instead of crash-looping
  later when a persona attempts to invoke.

- New `LlamaCppAdapter::try_new_from<'a, I>(models: I)` pure variant
  taking a model iterator directly, mirroring my model_resolver.rs
  pattern. Lets tests assemble synthetic registries without going
  through the global() singleton. `try_new()` calls
  `try_new_from(global().models_for_provider("llamacpp-local"))`.

- Legacy `LlamaCppAdapter::new()` preserved (panics on err) — same
  observable behavior as before for callers that haven't migrated.

3 tests covering the contract:

- try_new_from_errors_when_no_llamacpp_local_rows: empty iterator →
  NoLocalModelLoadable with rows_in_registry=0, error message contains
  "model-init" remediation hint
- try_new_from_errors_when_llamacpp_rows_exist_but_none_have_gguf_path:
  registry has llamacpp-local rows but artifact resolver couldn't find
  any GGUF on disk → NoLocalModelLoadable with rows_in_registry=2,
  rows_with_gguf_local_path=0 (the RTX 5090 case Codex's #1085 +
  upstream model-init bug produces)
- try_new_from_succeeds_with_at_least_one_resolved_path: mixed registry
  (one resolved, one not) → adapter picks resolved row, model_path +
  default_model match

Validation:
- cargo test --features metal,accelerate -p continuum-core --lib
  inference::llamacpp_adapter: 3/3 pass

Out of scope (separate followups):
- Wire `try_new()` into a runtime boot health check (Lane A PR-3 or
  ai/status integration) that surfaces the typed error to operators via
  jtag command output. PR-2 ships the primitive; integration is next.
- The artifact resolver behavior when explicit gguf path doesn't exist
  on disk — silently falls through to other resolvers (artifacts.rs:73).
  Worth a separate audit but doesn't change PR-2's contract.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@joelteply
Contributor Author

Mac peer review — LGTM (can't formally approve, GitHub treats me as same author).

Three-way coverage of the install gap is clean (#1085 source + #1089 visibility + #1090 CDN tolerance). Reviewed code + tests:

Strengths:

  • Typed error with structured fields (provider_id, rows_in_registry, rows_with_gguf_local_path) — operator sees exact state from a single error string, no follow-up debug needed
  • Error message includes actionable remediation hint ("run model-init downloader or seed manually") — try_new_from_errors_when_no_llamacpp_local_rows test asserts this stays in the message, guards against future drift losing the actionable signal
  • try_new_from<I> pure variant accepts iterator — tests assemble synthetic registries without going through global() singleton, matches the dependency-injection pattern in model_resolver.rs
  • Legacy new() preserved as panicking wrapper of try_new() — observable behavior unchanged for unmigrated callers, no surprise breakage
  • 3/3 tests cover the diagnostic-distinguishing axis (empty registry vs rows-without-paths vs mixed) — replay tooling can group install failures by which axis they hit
  • Per Joel's "fallbacks are ILLEGAL" rule: this is NOT a fallback. Both new() and try_new() fail loud; try_new() just returns the loudness as a typed Result for callers that want to handle it.

Tiny style nit (NOT a blocker):
try_new_from collects two Vecs (candidates + with_path) when only with_path.first() is consumed. Could collapse to a single models.iter().find(|m| m.gguf_local_path.is_some()).ok_or_else(...). Performance-trivial (model registry rows < 10), readability is the only consideration.
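One way the nit could be addressed while still filling in the error's count fields is a single pass with an early return. This is a sketch under assumptions, not the merged code: `ModelRow` and `first_loadable` are hypothetical names, and the error struct is re-declared so the snippet stands alone.

```rust
use std::path::PathBuf;

/// Hypothetical registry row; the real type is not shown in this PR.
#[derive(Debug)]
pub struct ModelRow {
    pub id: String,
    pub gguf_local_path: Option<PathBuf>,
}

#[derive(Debug, PartialEq, Eq)]
pub struct NoLocalModelLoadable {
    pub provider_id: String,
    pub rows_in_registry: usize,
    pub rows_with_gguf_local_path: usize,
}

/// Returns the first row with a resolved GGUF path without building any Vec.
pub fn first_loadable<'a, I>(models: I) -> Result<&'a ModelRow, NoLocalModelLoadable>
where
    I: IntoIterator<Item = &'a ModelRow>,
{
    let mut rows_in_registry = 0usize;
    for row in models {
        rows_in_registry += 1;
        if row.gguf_local_path.is_some() {
            // Early exit: the first resolved row is the one the adapter would use.
            return Ok(row);
        }
    }
    // Reaching this point means every row was visited and none had a path,
    // so the total is exact and the resolved count is necessarily zero.
    Err(NoLocalModelLoadable {
        provider_id: "llamacpp-local".to_string(),
        rows_in_registry,
        rows_with_gguf_local_path: 0,
    })
}
```

The error path only runs after the iterator is exhausted, so `rows_in_registry` stays accurate even though the success path stops early.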

Ship it. Three-leg install fix lands cleanly when #1085 + #1089 + #1090 all merge.

@joelteply joelteply merged commit 05481f3 into canary May 11, 2026
3 checks passed
@joelteply joelteply deleted the feat/local-provider-health-check branch May 11, 2026 21:04
joelteply added a commit that referenced this pull request May 13, 2026
…1129)

* feat(persona): typed Engram + admission membrane types (#1121 PR-1)

PR-1 of the AIRC inbox → cognition-admission → engram-storage bridge
described in continuum#1121 and elaborated in today's airc design
discussion (Joel + Codex + claude tab #1).

Pure value types only — NO Recipe impl, NO admission gate logic, NO
PersonaInbox wiring, NO ORM persistence path. Subsequent PRs layer
those over these types.

Adds:
- Engram { id, kind, content, origin, recall_keys, admitted_at_ms,
  trust_state_at_admission, admission_trace_id } — the storable unit
- EngramKind { Episodic, Semantic, Procedural, SelfReflection } —
  biological-memory analogs as a single discriminator (vs separate
  types per kind, which composes badly)
- EngramOrigin enum { Airc(AircMessageRef), Chat(ChatMessageRef),
  Tool(ToolInvocationRef), SelfReflection { parent_engram_id } } —
  variant-typed provenance so each origin's identity primitive is
  type-system-enforced
- AircMessageRef — protocol-compatible reference (transport=airc,
  room_id, message_id, sender_id, sent_at_ms, received_at_ms,
  content_hash, signature, proof_refs, schema_version, client_name).
  Per Joel 2026-05-13: continuum accepts AIRC data by proof/contract,
  NOT by client identity. Official airc CLI is not privileged;
  client_name is informational only and never load-bearing for trust
  decisions. Any producer emitting valid envelopes is acceptable.
- ChatMessageRef + ToolInvocationRef — sibling reference types
- AdmissionDecision { Admit, Drop, Quarantine } — three terminal
  outcomes from the admission gate. Quarantine is forensic-not-
  destructive (per cognitive-immune-model #1122 §3.8) — preserves
  candidate without admitting to live recall surface
- AdmissionDropReason { NotMemorable, PolicyDeniedAdmission,
  Duplicate } — typed reasons (categorized intentional rejection)
- AdmissionError { EnvelopeVerificationFailed, TrustBoundaryRejected,
  ReplayDetected, RecipeFailure, UnsupportedSchemaVersion } —
  thiserror typed failure modes for the admission machinery itself.
  Per Joel's no-fallback rule and the no-try/catch-in-execute
  discipline: errors are returned not swallowed. Same shape as
  NoLocalModelLoadable (#1089) and NoMultimodalBase (#1074).
- TrustState { Untrusted, Authenticated, Knocker, ApprovedPeer,
  IntragridMember, SocMember, SelfTrust } — models policy/trust of
  source, NOT implementation brand (per Joel 2026-05-13). Ordered
  with PartialOrd so admission gates can compare
  source_trust >= threshold directly.
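
The ordered-enum idea behind TrustState can be illustrated with a minimal sketch. It assumes the variants are declared in the order listed above (derived `PartialOrd` on an enum compares by declaration order); `meets_threshold` is a hypothetical helper, and the serde and ts-rs derives mentioned in the convention notes are omitted to keep the snippet std-only.

```rust
/// Variants listed from least to most trusted; the derived ordering
/// follows this declaration order.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum TrustState {
    Untrusted,
    Authenticated,
    Knocker,
    ApprovedPeer,
    IntragridMember,
    SocMember,
    SelfTrust,
}

/// Hypothetical gate helper: true when the source's trust is at or above the threshold.
pub fn meets_threshold(source_trust: TrustState, threshold: TrustState) -> bool {
    source_trust >= threshold
}
```

Because the comparison rests entirely on declaration order, reordering or inserting variants changes gate behavior, which is presumably why the commit includes ordering tests.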

Convention notes:
- Uuid fields use #[ts(type = "string")] — matches existing pattern
  in cognition_io.rs / channel_items.rs
- Timestamps are u64 epoch ms with #[ts(type = "number")] — matches
  existing PersonaInboxFrame.oldest_timestamp pattern. Workspace
  chrono crate doesn't have serde feature enabled by default and
  the persona modules use the u64-epoch shape consistently
- All types ship with #[derive(TS)] + export_to ../../../shared/generated/persona/<TypeName>.ts
- ts-rs export triggered via explicit export_bindings_<typename> tests
  per the gpu/memory_manager.rs pattern

Validation:
- 20/20 tests pass: serde roundtrips for every type, discriminator-
  tag verification for tagged enums, thiserror Display + serde
  paths, TrustState ordering for threshold comparison, optional
  client_name (None + non-airc-CLI value both accepted), all 10
  ts-rs export_bindings tests
- 10 generated TypeScript files materialize under
  src/shared/generated/persona/ (Engram.ts, EngramKind.ts,
  EngramOrigin.ts, AircMessageRef.ts, ChatMessageRef.ts,
  ToolInvocationRef.ts, AdmissionDecision.ts, AdmissionDropReason.ts,
  AdmissionError.ts, TrustState.ts)

Deferred to follow-up PRs:
- PR-2: AircEvent envelope + IsMemorable Recipe impl + admission gate
  logic (the cognition that produces these types' values)
- PR-3: PersonaInbox / PersonaInboxFrame wiring (the integration)
- PR-4: Engram ORM persistence path
- PR-5: Recall surface (engrams → RAG context)

Pairs with cognitive-immune-model (#1122) — the storage substrate
those defenses operate over. Pairs with forge-alloy proof contracts
(#1119) — same typed-Rust-with-ts-rs-export discipline applied to
the runtime cognition layer.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(persona): export generated engram bindings

---------

Co-authored-by: Test <test@test.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>