feat(inference): inference-llm PR-5 — Runtime registration#1404

Merged: joelteply merged 1 commit into canary from feat/inference-llm-runtime-register-pr5 on May 18, 2026
@joelteply
Contributor

Summary

PR-5 of inference-llm. Wires InferenceLlmModule into the Runtime so it's callable from the cognition path via inference/llm/request commands.

Pure Rust, zero TS, 20-line diff.

What lands

  • Add "inference-llm" to EXPECTED_MODULES in runtime/runtime.rs
  • runtime.register(Arc::new(InferenceLlmModule::new())) in ipc/mod.rs alongside the existing InferenceModule registration

Design choices

  • Constructed via .new() (bus-less, stub-backed) rather than .with_bus_and_adapter(). Reason: with_bus_and_adapter requires an AIProviderAdapter Arc, which would couple PR-5's runtime registration to a specific LlamaCppAdapter init lifecycle. The substrate's LlamaCppAdapter is owned by AIProviderModule's adapter registry with its own initialization phase; threading the adapter Arc here would either duplicate the registration or create an init-ordering dependency this slice shouldn't introduce.
  • The stub-backed registration is still useful: it exposes the inference/llm/request command surface to the cognition path so downstream PRs (turn-execute, which chains drain-turn-frame → response_prompt → inference/llm/request) can wire against the real command name. Bus + adapter integration is a follow-up PR that updates the construction call here.
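
A minimal sketch of the two construction paths discussed above, to show why `.new()` carries no init-ordering dependency while `.with_bus_and_adapter()` does. Only the names `InferenceLlmModule`, `new`, `with_bus_and_adapter`, and `AIProviderAdapter` come from this PR; the `Bus` type, the `Backend` enum, `handle_request`, `is_stub_backed`, and the trait's `generate` method are hypothetical stand-ins, not the substrate's real signatures.

```rust
use std::sync::Arc;

/// Hypothetical stand-in for the AIProviderAdapter trait named in the PR.
pub trait AIProviderAdapter: Send + Sync {
    fn generate(&self, prompt: &str) -> String;
}

/// Hypothetical stand-in for the bus handle.
pub struct Bus;

/// Assumed internal switch between the stub and a real adapter.
enum Backend {
    Stub,
    Adapter {
        _bus: Arc<Bus>,
        adapter: Arc<dyn AIProviderAdapter>,
    },
}

pub struct InferenceLlmModule {
    backend: Backend,
}

impl InferenceLlmModule {
    /// Bus-less and stub-backed: needs nothing from any other module,
    /// so it can be constructed at any point during runtime startup.
    pub fn new() -> Self {
        Self { backend: Backend::Stub }
    }

    /// Needs an adapter Arc, so the caller must have initialized the
    /// adapter first. This is the ordering dependency PR-5 avoids.
    pub fn with_bus_and_adapter(bus: Arc<Bus>, adapter: Arc<dyn AIProviderAdapter>) -> Self {
        Self {
            backend: Backend::Adapter { _bus: bus, adapter },
        }
    }

    pub fn is_stub_backed(&self) -> bool {
        matches!(self.backend, Backend::Stub)
    }

    /// The command surface is the same either way; only the backend differs.
    pub fn handle_request(&self, prompt: &str) -> String {
        match &self.backend {
            Backend::Stub => format!("[stub] {}", prompt),
            Backend::Adapter { adapter, .. } => adapter.generate(prompt),
        }
    }
}
```

Under these assumptions, the follow-up PR would only change which constructor the registration site calls; callers of the command would not need to change.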

Test plan

  • cargo build --features metal,accelerate --lib clean
  • EXPECTED_MODULES enforcement validates at boot: if the registration is missing, the runtime fails with a "missing inference-llm" error
  • Pre-push gate clean
  • No new test fixtures needed — the module's existing 44/44 tests cover the trait-impl correctness; this PR just plumbs construction into runtime startup
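
The boot-time check mentioned in the test plan can be sketched roughly as follows. This is an illustration under assumptions, not the code in runtime/runtime.rs: the `Runtime` struct layout, the `verify_expected` method, the `&mut self` receiver on `register`, and the exact error format are hypothetical, and `"inference"` appears in the list only as an illustrative second entry.

```rust
use std::collections::HashMap;
use std::sync::Arc;

/// Assumed shape of the expected-module list. Only "inference-llm" is
/// confirmed by this PR; "inference" is here for illustration.
pub const EXPECTED_MODULES: &[&str] = &["inference", "inference-llm"];

/// Simplified stand-in for the ServiceModule trait.
pub trait ServiceModule: Send + Sync {
    fn name(&self) -> &'static str;
}

pub struct InferenceModule;
pub struct InferenceLlmModule;

impl ServiceModule for InferenceModule {
    fn name(&self) -> &'static str {
        "inference"
    }
}

impl ServiceModule for InferenceLlmModule {
    fn name(&self) -> &'static str {
        "inference-llm"
    }
}

#[derive(Default)]
pub struct Runtime {
    modules: HashMap<&'static str, Arc<dyn ServiceModule>>,
}

impl Runtime {
    /// Stores the module under its own name.
    pub fn register(&mut self, module: Arc<dyn ServiceModule>) {
        self.modules.insert(module.name(), module);
    }

    /// Hypothetical boot check: returns Err naming every expected
    /// module that was never registered.
    pub fn verify_expected(&self) -> Result<(), String> {
        let missing: Vec<&str> = EXPECTED_MODULES
            .iter()
            .copied()
            .filter(|name| !self.modules.contains_key(name))
            .collect();
        if missing.is_empty() {
            Ok(())
        } else {
            Err(format!("missing {}", missing.join(", ")))
        }
    }
}
```

The point of listing the name in `EXPECTED_MODULES` as well as registering the module is that a dropped `register` call surfaces as a startup failure rather than as an unknown-command error at the first request.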

Stack

🤖 Generated with Claude Code

Wires InferenceLlmModule into the Runtime so it's callable from
the cognition path via inference/llm/request commands.

What lands

- Add "inference-llm" to EXPECTED_MODULES in runtime/runtime.rs
- runtime.register(Arc::new(InferenceLlmModule::new())) in
  ipc/mod.rs alongside the existing InferenceModule registration

Design choices

- Constructed via the .new() (bus-less, stub-backed) constructor
  rather than .with_bus_and_adapter(). Reason: the
  with_bus_and_adapter constructor requires an AIProviderAdapter
  Arc, which would couple PR-5's runtime registration to a
  specific LlamaCppAdapter init lifecycle. The substrate's
  LlamaCppAdapter is owned by AIProviderModule's adapter registry
  with its own initialization phase; threading the adapter Arc
  here would either duplicate the registration or create an
  init-ordering dependency this slice shouldn't introduce.
- The stub-backed registration is still useful: it exposes the
  inference/llm/request command surface to the cognition path so
  downstream PRs (turn-execute that chains drain-turn-frame →
  response_prompt → inference/llm/request) can wire against the
  real command name. Bus + adapter integration is a follow-up
  PR that updates the construction call here.

What is NOT changed

- AIProviderModule + LlamaCppAdapter unchanged
- All InferenceLlmModule trait impl logic unchanged (PR-2/3/4
  work intact)
- The stub vs real-adapter swap point stays exactly where PR-4
  put it: with_bus_and_adapter constructor + run_adapter_inference
  function

Tests

- cargo build --features metal,accelerate --lib clean (no new
  test fixtures needed — the module's existing 44/44 tests cover
  the trait-impl correctness; this PR just plumbs construction
  into runtime startup)
- EXPECTED_MODULES enforcement validates at boot: if the registration
  is missing the runtime fails with "missing inference-llm" error
- Pre-push gate clean

Stack

- #1387 PR-1: typed event surface
- #1391 PR-2: ServiceModule impl (stub-backed)
- #1392 PR-3a: bus keys + publishing helpers
- #1393 PR-3b: auto-publish wiring
- #1395 PR-4: adapter integration (translation + new constructors)
- THIS PR — PR-5: Runtime registration
- FOLLOW-UP — adapter Arc wiring when LlamaCppAdapter init phase
  is integrated with Runtime startup

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@joelteply joelteply merged commit 1c0656b into canary May 18, 2026
1 check passed
@joelteply joelteply deleted the feat/inference-llm-runtime-register-pr5 branch May 18, 2026 18:22
joelteply added a commit that referenced this pull request May 18, 2026
…mpt -> inference (#1409) (#1415)

* feat(persona): Lane D — Rust persona/turn-execute chains drain -> prompt -> inference (#1409)

Adds the `persona/turn-execute` command in CognitionModule that
executes a full persona turn in ONE Rust hop:

  drain inbox
    -> wrap in PersonaTurnFrame
    -> derive ResponsePrompt (lazy)
    -> build InferenceRequest (prompt_text path)
    -> dispatch `inference/llm/request` via the global
       command_executor (routes to InferenceLlmModule registered
       in PR-5 #1404)
    -> bundle replayRecord + inferenceResponse
    -> persist replay record (v2 schema with response_prompt
       captured from #1412)

Files changed:

* src/persona/turn_frame.rs: new `ResponsePrompt::to_prompt_text`
  helper that flattens system_prompt + chat messages into a single
  deterministic plain-text prompt for adapter-based engines
  (LlamaCppAdapter, cloud adapters). Format:
    "<system>\n\nrole: content\nrole: content\n..."
  Empty system_prompt produces no leading paragraph; lowercase
  role matches the on-the-wire PromptRole serde format.

* src/modules/cognition.rs: new `persona/turn-execute` command.
  Inputs:
    - persona_id (required)
    - window_ms (default 80), max_items (default 16)
    - composition_artifact_id (default Uuid::nil())
    - max_tokens (default 512), max_duration_ms (default 10_000)
  Returns:
    { "replayRecord": PersonaTurnFrameReplayRecord | null,
      "inferenceResponse": InferenceResponse | null }
  Empty drain returns the null pair (no-op, not Err). Missing
  persona returns typed Err per Joel's never-swallow rule.
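
The `to_prompt_text` format described above can be sketched as below. The type and method names `ResponsePrompt`, `PromptRole`, and `to_prompt_text` come from the commit message; the field names, the `PromptMessage` struct, the set of role variants, and the trailing newline after the last message are assumptions made for illustration.

```rust
/// Assumed role set; the real PromptRole may have other variants.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum PromptRole {
    System,
    User,
    Assistant,
}

impl PromptRole {
    /// Lowercase names, matching the lowercase wire format the commit mentions.
    fn as_str(self) -> &'static str {
        match self {
            PromptRole::System => "system",
            PromptRole::User => "user",
            PromptRole::Assistant => "assistant",
        }
    }
}

/// Hypothetical message shape.
pub struct PromptMessage {
    pub role: PromptRole,
    pub content: String,
}

/// Hypothetical field layout for ResponsePrompt.
pub struct ResponsePrompt {
    pub system_prompt: String,
    pub messages: Vec<PromptMessage>,
}

impl ResponsePrompt {
    /// Flattens to "<system>\n\nrole: content\nrole: content\n".
    /// An empty system prompt contributes no leading paragraph.
    pub fn to_prompt_text(&self) -> String {
        let mut out = String::new();
        if !self.system_prompt.is_empty() {
            out.push_str(&self.system_prompt);
            out.push_str("\n\n");
        }
        // Messages are emitted in their original order, one per line.
        for message in &self.messages {
            out.push_str(message.role.as_str());
            out.push_str(": ");
            out.push_str(&message.content);
            out.push('\n');
        }
        out
    }
}
```

Because the output depends only on the field values and their order, the same `ResponsePrompt` always flattens to the same text, which is what lets the replay record carry the exact prompt that fed inference.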

Tests (+9, all green):

* persona::turn_frame (6 new, total 18):
  - to_prompt_text_renders_each_message_as_role_colon_content
  - to_prompt_text_prepends_system_prompt_when_present
  - to_prompt_text_skips_empty_system_prompt
  - to_prompt_text_handles_mixed_roles_in_order
  - to_prompt_text_handles_no_messages
  - to_prompt_text_empty_prompt_returns_empty_string

* modules::cognition::turn_execute_tests (3 new):
  - turn_execute_persona_not_found_returns_typed_error
  - turn_execute_empty_drain_returns_null_bundle
  - turn_execute_bad_max_items_returns_typed_error

The dispatch-success path (drain -> dispatch -> inference response)
runs through `command_executor::executor()` which is only
initialized at runtime startup (ipc/mod.rs). Tests that exercise
the executor live in the integration suite; unit-tests here cover
the param-parse + short-circuit + persona-not-found paths.
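
The three unit-tested paths (bad parameter, persona not found, empty drain) can be illustrated with a simplified sketch. Everything here is hypothetical except the behaviors named in the commit message: the in-memory inbox map, the `dispatch` closure standing in for the command executor, the error enum, the string-typed bundle fields, the ordering of the checks, and the assumption that "bad" `max_items` means zero are all illustrative. The defaults of 16 and 512 are taken from the input list above.

```rust
use std::collections::HashMap;

/// Hypothetical typed errors for the command.
#[derive(Debug, PartialEq)]
pub enum TurnExecuteError {
    PersonaNotFound(String),
    BadParam(&'static str),
}

/// Simplified bundle: both fields are None on an empty drain.
#[derive(Debug, PartialEq)]
pub struct TurnBundle {
    pub replay_record: Option<String>,
    pub inference_response: Option<String>,
}

pub struct TurnExecuteParams {
    pub persona_id: String,
    pub max_items: usize,
    pub max_tokens: u32,
}

impl TurnExecuteParams {
    /// Defaults from the commit message: max_items 16, max_tokens 512.
    pub fn for_persona(persona_id: &str) -> Self {
        Self {
            persona_id: persona_id.to_string(),
            max_items: 16,
            max_tokens: 512,
        }
    }
}

/// `inboxes` maps persona_id to pending messages; `dispatch` stands in
/// for routing `inference/llm/request` through the command executor.
pub fn turn_execute<F>(
    inboxes: &mut HashMap<String, Vec<String>>,
    params: &TurnExecuteParams,
    dispatch: F,
) -> Result<TurnBundle, TurnExecuteError>
where
    F: FnOnce(&str, u32) -> String,
{
    // Assumption: zero is the "bad" max_items value.
    if params.max_items == 0 {
        return Err(TurnExecuteError::BadParam("max_items must be > 0"));
    }
    // A missing persona is a typed error, never a silent no-op.
    let inbox = inboxes
        .get_mut(&params.persona_id)
        .ok_or_else(|| TurnExecuteError::PersonaNotFound(params.persona_id.clone()))?;
    let take = params.max_items.min(inbox.len());
    let drained: Vec<String> = inbox.drain(..take).collect();
    // Empty drain short-circuits to the null pair without dispatching.
    if drained.is_empty() {
        return Ok(TurnBundle { replay_record: None, inference_response: None });
    }
    let prompt: String = drained.iter().map(|m| format!("user: {}\n", m)).collect();
    let response = dispatch(&prompt, params.max_tokens);
    Ok(TurnBundle {
        replay_record: Some(prompt),
        inference_response: Some(response),
    })
}
```

Passing the dispatcher in as a closure is one way to make the success path unit-testable without a global executor; the commit instead keeps those tests in the integration suite, which this sketch does not attempt to reproduce.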

Builds atop #1412 (v2 schema with response_prompt) and #1404
(InferenceLlmModule runtime registration). Closes alpha card
#1409.

Why one command: the TS persona loop previously executed each
stage with its own IPC round-trip (drain, then build prompt,
then call inference) — 3 round-trips per turn, prompt-building
lived in TS. Lane D pulls all three into the substrate so
(a) the prompt is built in Rust where the turn-frame lives,
(b) the production replay record carries the exact prompt that
fed inference, (c) the persona turn becomes one observable unit
on the bus.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(persona,#1409): force turn-execute through Rust registry (#1417)

* fix(persona,#1409): force turn-execute through Rust registry

* fix(runtime,#1409): use unlimited concurrency contract for cognition

---------

Co-authored-by: Test <test@test.com>

---------

Co-authored-by: Test <test@test.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>