feat(inference): inference-llm PR-3b — InferenceLlmModule auto-publishes via bus hook (pure Rust) by joelteply · Pull Request #1393 · CambrianTech/continuum

joelteply · 2026-05-18T17:01:47Z

Summary

PR-3b of inference-llm. Wires the bus helpers from PR-3a (#1392) INTO InferenceLlmModule::handle_command so every successful inference response auto-publishes InferenceComplete + FirstTokenEmitted to the trace bus.

Closes the inference-llm bus loop end-to-end, fully in Rust:

producer (command) → engine (stub for now) → response (CommandResult)
                                            ↓
                              bus dispatch (complete + first_token)
                                            ↓
                              subscribers (sentinel / VDD / audit-recorder)

Zero TS. Zero shim. Zero fallback.

What lands

BusHook private struct holding Arc<MessageBus> + Arc<ModuleRegistry>. Same shape as my genome BusHook (feat(genome): working-set-manager PR-5 — LocalWorkingSetManager auto-publishes via bus hook (+MessageBus borrow fix) #1362).
InferenceLlmModule.bus_hook: Option<BusHook> — None = bus-less PR-2 behavior; Some = auto-publish.
with_bus(bus, registry) constructor — wires both Arcs at module construction (prevents the "bus added mid-service" race).
handle_request body: on success, spawns publish_inference_complete + publish_first_token_emitted into the current tokio runtime via Handle::try_current. Spawn pattern (not await) avoids the DashMap borrow-across-await lifetime issue inside Send-bounded async_trait — same workaround as my genome LocalWorkingSetManager (feat(genome): working-set-manager PR-5 — LocalWorkingSetManager auto-publishes via bus hook (+MessageBus borrow fix) #1362).
spawn_publish_inference_complete + spawn_publish_first_token_emitted module-private helpers.

Design choices

Publishing is best-effort observability. Authoritative response goes back through CommandResult regardless of publish success.
Error paths do NOT publish. Unknown command + invalid payload silent on the bus — typed Err in Result is the authoritative signal. Tests pin this.
Two separate spawns (one per event) rather than one bundled publish. Lets subscribers see first_token even if complete hasn't dispatched (race-tolerant TTFT observability).

Test plan

cargo test --lib --features metal,accelerate inference::llm_module_service — 12/12 pass:
- 4 new PR-3b bus tests:
  - handle_command_with_bus_auto_publishes_complete_and_first_token (end-to-end)
  - handle_command_without_bus_does_not_publish (backwards-compat with PR-2)
  - handle_command_unknown_with_bus_does_not_publish (error path silent)
  - handle_command_invalid_payload_with_bus_does_not_publish (error path silent)
- 8 PR-2 tests still pass unchanged
No regressions across other 2957 lib tests

Stack

feat(inference): inference-llm PR-1 — typed event surface (MODULE-CATALOG §II) #1387 — inference-llm PR-1: typed event surface
feat(inference): inference-llm PR-2 — InferenceLlmModule ServiceModule impl (stub-backed) #1391 — inference-llm PR-2: ServiceModule impl (stub-backed)
feat(inference): inference-llm PR-3a — canonical ArtifactKeys + publishing helpers #1392 — inference-llm PR-3a: bus keys + publishing helpers
This PR — inference-llm PR-3b: auto-publish wiring
NEXT — PR-4: real LlamaCppAdapter invoke (the stub stays in place; PR-4 swaps under the same external contract)

🤖 Generated with Claude Code

…hes via bus hook PR-3b of inference-llm. Wires the bus helpers from PR-3a (#1392) INTO InferenceLlmModule's handle_command so every successful inference response auto-publishes InferenceComplete + FirstTokenEmitted to the trace bus. Closes the inference-llm bus loop: producer (command) → engine (stub for now) → response (CommandResult) → bus dispatch (complete + first_token) → subscriber (sentinel/VDD/audit). What lands - BusHook private struct: { bus: Arc<MessageBus>, registry: Arc<ModuleRegistry> }. Same shape as genome::local_manager BusHook (#1362). - InferenceLlmModule.bus_hook: Option<BusHook> — None = bus-less PR-2 behavior; Some = auto-publish on every successful handle_command. - with_bus(bus, registry) constructor — wires both Arcs at module construction; no in-flight switching (prevents the "bus added mid-service" race). - handle_request body: on success, spawns publish_inference_complete and publish_first_token_emitted into the current tokio runtime via Handle::try_current. Spawn pattern (not await) avoids the DashMap borrow-across-await lifetime issue inside Send-bounded async_trait — same workaround as my genome LocalWorkingSetManager (#1362). - spawn_publish_inference_complete + spawn_publish_first_token_emitted module-private helpers — Arcs cloned out before spawn so the &BusHook borrow doesn't outlive the spawn. Design choices - Publishing is best-effort observability. The authoritative response goes back through the CommandResult arm regardless of publish success — callers who need to know if a generation happened look at the Result, not the bus. - Error paths (unknown command + invalid payload) do NOT publish. Tests pin this — bus events represent successful generations; errors are loud in the Result and silent on the bus. - Two separate spawns (one per event) rather than one bundled publish. Lets subscribers see first_token even if the complete event hasn't dispatched yet (race-tolerant TTFT observability). Tests 4 new bus tests (12 total): - handle_command_with_bus_auto_publishes_complete_and_first_token — end-to-end: register subscriber, run handle_command, yield for spawn, verify both events landed with matching requestId - handle_command_without_bus_does_not_publish — backwards-compat with PR-2 new() constructor - handle_command_unknown_with_bus_does_not_publish — error paths silent on bus - handle_command_invalid_payload_with_bus_does_not_publish — same invariant 12/12 pass on inference::llm_module_service. No regressions across other 2957 lib tests. Stack - #1387 — inference-llm PR-1: typed event surface - #1391 — inference-llm PR-2: ServiceModule impl (stub-backed) - #1392 — inference-llm PR-3a: bus keys + publishing helpers - THIS PR — inference-llm PR-3b: auto-publish wiring - NEXT — PR-4: real LlamaCppAdapter invoke + tokenizer + streaming (the stub stays in place until then; PR-4 swaps under the same external contract) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…n layer + new constructors) (#1395) Bridges the substrate's typed InferenceRequest/InferenceComplete surface to the existing AIProviderAdapter trait (LlamaCppAdapter for local llama.cpp). PR-5 ships the LlamaCppAdapter Runtime wiring + the end-to-end stub-adapter test; PR-4 ships the translation logic + new constructors so PR-5 is just plumbing. What lands - InferenceRequest.prompt_text: Option<String> — PR-4 wire addition for adapter-based engines that tokenize internally. Backwards-compat (Option = optional on wire). - InferenceComplete.completion_text: Option<String> — wire addition for adapter-based engines that return text not tokens. - InferenceLlmModule.adapter: Option<Arc<dyn AIProviderAdapter>>. - with_adapter(adapter) constructor: real-inference + no bus. - with_bus_and_adapter(bus, registry, adapter) constructor: the full production wiring (adapter + bus publishing). - handle_request: routes via adapter when wired + prompt_text present; refuses loud when adapter wired + no prompt_text (raw- token path not yet implemented — never silent fallback); falls back to PR-2 stub when no adapter. - run_adapter_inference(adapter, request, prompt_text) — translates InferenceRequest → TextGenerationRequest, calls adapter, translates TextGenerationResponse → (InferenceComplete, FirstTokenEmitted). - translate_adapter_response(request, response) — pure-function body of the response-side translation. - translate_adapter_finish_reason(adapter_reason) — cross-enum mapping: Stop→Stop, Length→MaxTokens, ToolUse→Error{reason} (loud refusal — inference-llm doesn't model tool-use), Error→ Error{reason}. Wire-shape decisions - max_tokens=0 in substrate's GenerationBudget translates to None on adapter's wire. Substrate convention: 0=unlimited, caller takes duration responsibility. Adapter convention: None=unlimited, 0=stop immediately. The substrate's "stop immediately" doesn't have an encoding because no caller would ask for it. - stop_sequences: empty Vec on substrate translates to None on adapter (adapter convention: None = no caller stop sequences). - persona_id propagates to adapter as stringified UUID for per-persona resource attribution (matches existing adapter convention from PersonaResponseGenerator). - purpose hardcoded "inference-llm" for adapter routing diagnostics. Sub-fix: missing TS bindings from PR-1 PR-1 (#1387) shipped the Rust types but the shared/generated/inference_llm/ directory of TS exports wasn't included in the commit (regen produced them locally; they didn't get staged). PR-4 ships all 10 TS files + the barrel index. Closes a wire-contract gap. Tests 13 new behavioral tests (44 total in inference::llm_module + inference::llm_module_service + inference::llm_module_bus): - translate_adapter_response_carries_text_and_usage — completion_text + tokens_generated mapping - translate_finish_reason_covers_all_adapter_variants — cross-enum mapping pin - with_adapter_constructor_routes_via_adapter_path — constructors compile + no-adapter regression - 8 existing PR-2 + 4 existing PR-3b tests still pass (no regressions) End-to-end "stub adapter via Arc<dyn AIProviderAdapter>" tests deferred to PR-5: the AIProviderAdapter trait has 8+ methods (provider_id / api_style / default_model / get_available_models / health_check / model_metadata / capabilities / initialize / shutdown / generate_text / create_embedding) and implementing all of them on a test stub here would pull in ProviderHealth + AdapterCapabilities + ApiStyle + ModelInfo + their dependencies — bigger than atomic-slice. PR-5 will wire LlamaCppAdapter directly through Runtime registration. 44/44 inference::llm_module tests pass. No regressions across other 2928 lib tests. Stack - #1387 — inference-llm PR-1: typed event surface - #1391 — inference-llm PR-2: ServiceModule impl (stub-backed) - #1392 — inference-llm PR-3a: bus keys + publishing helpers - #1393 — inference-llm PR-3b: auto-publish wiring - THIS PR — inference-llm PR-4: adapter integration (translation + constructors) - NEXT — PR-5: LlamaCppAdapter Runtime wiring + end-to-end integration test through real (or test-mock) adapter Co-authored-by: Test <test@test.com> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Wires InferenceLlmModule into the Runtime so it's callable from the cognition path via inference/llm/request commands. What lands - Add "inference-llm" to EXPECTED_MODULES in runtime/runtime.rs - runtime.register(Arc::new(InferenceLlmModule::new())) in ipc/mod.rs alongside the existing InferenceModule registration Design choices - Constructed via the .new() (bus-less, stub-backed) constructor rather than .with_bus_and_adapter(). Reason: the with_bus_and_adapter constructor requires an AIProviderAdapter Arc, which would couple PR-5's runtime registration to a specific LlamaCppAdapter init lifecycle. The substrate's LlamaCppAdapter is owned by AIProviderModule's adapter registry with its own initialization phase; threading the adapter Arc here would either duplicate the registration or create an init-ordering dependency this slice shouldn't introduce. - The stub-backed registration is still useful: it exposes the inference/llm/request command surface to the cognition path so downstream PRs (turn-execute that chains drain-turn-frame → response_prompt → inference/llm/request) can wire against the real command name. Bus + adapter integration is a follow-up PR that updates the construction call here. What is NOT changed - AIProviderModule + LlamaCppAdapter unchanged - All InferenceLlmModule trait impl logic unchanged (PR-2/3/4 work intact) - The stub vs real-adapter swap point stays exactly where PR-4 put it: with_bus_and_adapter constructor + run_adapter_inference function Tests - cargo build --features metal,accelerate --lib clean (no new test fixtures needed — the module's existing 44/44 tests cover the trait-impl correctness; this PR just plumbs construction into runtime startup) - EXPECTED_MODULES enforcement validates at boot: if the registration is missing the runtime fails with "missing inference-llm" error - Pre-push gate clean Stack - #1387 PR-1: typed event surface - #1391 PR-2: ServiceModule impl (stub-backed) - #1392 PR-3a: bus keys + publishing helpers - #1393 PR-3b: auto-publish wiring - #1395 PR-4: adapter integration (translation + new constructors) - THIS PR — PR-5: Runtime registration - FOLLOW-UP — adapter Arc wiring when LlamaCppAdapter init phase is integrated with Runtime startup Co-authored-by: Test <test@test.com> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

joelteply merged commit d8c66c1 into canary May 18, 2026
3 checks passed

joelteply deleted the feat/inference-llm-autopub-pr3b branch May 18, 2026 17:02

github-actions Bot added the size: L label May 18, 2026

joelteply mentioned this pull request May 18, 2026

feat(inference): inference-llm PR-4 — adapter integration (translation + new constructors) #1395

Merged

3 tasks

joelteply mentioned this pull request May 18, 2026

feat(inference): inference-llm PR-5 — Runtime registration #1404

Merged

4 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(inference): inference-llm PR-3b — InferenceLlmModule auto-publishes via bus hook (pure Rust)#1393

feat(inference): inference-llm PR-3b — InferenceLlmModule auto-publishes via bus hook (pure Rust)#1393
joelteply merged 1 commit into
canaryfrom
feat/inference-llm-autopub-pr3b

joelteply commented May 18, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

joelteply commented May 18, 2026

Summary

What lands

Design choices

Test plan

Stack

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant