feat(cognition,#1385): generate_response PR-2: async evaluate_response + cognition/generate-response IPC handler #1390
Merged
…se + cognition/generate-response IPC handler

Stacks on PR-1 #1388 (pure types + prompt builder + identity-reminder template). PR-2 wires the async path: build_response_messages → adapter.generate_text (existing local Qwen router via global_registry) → result with timing + tokio::time::timeout replacing the TS Promise.race.

## What this ships (PR-2)

- `evaluate_response(GenerateResponseRequest) -> Result<GenerateResponseResult, GenerateResponseError>`: async composer. Honors per-request model/temperature/max_tokens/timeout overrides; defaults match TS (Qwen3.5 / 0.7 / 150 / 180_000ms).
- `GenerateResponseError`: typed, with variants NoAdapter, Generation, Timeout. No silent default-on-error; caller picks fail-open vs fail-closed.
- `build_response_generation_request(&request, model, start_ms) -> TextGenerationRequest`: pure helper. Pins wire shape (provider="local", response_format=Text, purpose="cognition/generate-response", persona/room attribution).
- `result_from_response(response, model, start_ms, end_ms) -> GenerateResponseResult`: pure helper. Trims text, stamps model + timing, populates tokens_used only when total_tokens > 0 (mirrors TS truthiness).
- `cognition/generate-response` command arm in CognitionModule.

## Discipline

- `tokio::time::timeout` wraps `adapter.generate_text`, giving a clean Timeout variant on the error enum (TS Promise.race equivalent).
- Saturating subtraction on response_time_ms: a clock-backwards artifact (NTP adjustment mid-call) reports 0, not a wrapped huge u64.
- tokens_used = None when provider reports zeros, which avoids emitting fake {0,0,0} measurements for providers that don't instrument usage.
- response_format=Text (TS default): local Qwen takes plain text, no JSON-mode constraint.
- All constants are documented (DEFAULT_GENERATE_PROVIDER/MODEL/TEMPERATURE/MAX_TOKENS/TIMEOUT_MS).

## Tests (10 new; full module now 39 passing)

build_response_generation_request:
- defaults: provider=local, model=Qwen-default, temp=0.7, max=150, response_format=Text, purpose="cognition/generate-response", persona/room attribution, message count
- overrides honored (custom model + temp + max)
- caller timestamp embedded in identity reminder (time-flow through layers)

result_from_response:
- trims surrounding whitespace
- stamps model + timing
- populates tokens when provider reports total > 0
- tokens None when provider reports 0
- response_time saturates clock-backwards

GenerateResponseError:
- NoAdapter Display carries provider + model
- Timeout Display includes duration

Full cognition regression: 335/335 pass.

## NOT in this PR

- **PR-3**: TS shim. AIDecisionService.generateResponse delegates to RustCoreIPCClient.cognitionGenerateResponse + cognition mixin binding.
- **PR-4**: Delete dead TS. buildResponseMessages helper + inline identity-reminder template (~250 LOC removed).

Ref: #1385 oxidizer card, #1388 PR-1 (MERGED).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
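The typed-error contract described in this commit message can be sketched as below. This is a minimal illustration, not the code from the PR: the variant payloads (`provider`, `model`, `timeout_ms`), the message wording, and the `reply_or_fallback` helper are all assumptions made for the example. It shows only the two properties the tests name (NoAdapter's Display carries provider + model, Timeout's Display includes the duration) and how a caller, rather than the library, picks fail-open vs fail-closed.

```rust
use std::fmt;

/// Hypothetical stand-in for the PR's error type; field names are assumed.
#[derive(Debug, PartialEq)]
enum GenerateResponseError {
    /// No adapter is registered for the requested provider/model pair.
    NoAdapter { provider: String, model: String },
    /// The adapter ran but reported a failure.
    Generation(String),
    /// The call did not finish within the deadline.
    Timeout { timeout_ms: u64 },
}

impl fmt::Display for GenerateResponseError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Self::NoAdapter { provider, model } => {
                write!(f, "no adapter for provider '{provider}' (model '{model}')")
            }
            Self::Generation(msg) => write!(f, "generation failed: {msg}"),
            Self::Timeout { timeout_ms } => {
                write!(f, "generation timed out after {timeout_ms}ms")
            }
        }
    }
}

impl std::error::Error for GenerateResponseError {}

/// Caller-side policy (hypothetical): fail open on Timeout by returning no
/// reply, fail closed on every other error by propagating it.
fn reply_or_fallback(
    result: Result<String, GenerateResponseError>,
) -> Result<Option<String>, GenerateResponseError> {
    match result {
        Ok(text) => Ok(Some(text)),
        Err(GenerateResponseError::Timeout { .. }) => Ok(None),
        Err(other) => Err(other),
    }
}

fn main() {
    let missing = GenerateResponseError::NoAdapter {
        provider: "local".to_string(),
        model: "demo-model".to_string(),
    };
    println!("{missing}");

    let timed_out = Err(GenerateResponseError::Timeout { timeout_ms: 180_000 });
    println!("{:?}", reply_or_fallback(timed_out));

    let failed = Err(GenerateResponseError::Generation("adapter error".to_string()));
    println!("{:?}", reply_or_fallback(failed));
}
```

Because the error is an enum rather than a defaulted result, the fail-open decision lives in one visible `match` at the call site.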
4 tasks
joelteply
pushed a commit
that referenced
this pull request
May 18, 2026
… TS (PR-4 folded)

Stacks on PR-2 #1390 (async evaluate_response + cognition/generate-response IPC handler). AIDecisionService.generateResponse now delegates to RustCoreIPCClient.cognitionGenerateResponse; ~110 LOC of TS prompt assembly + timeout race + token decoding deleted. Mirrors codex's check_redundancy PR-3 #1383 shape (folded PR-4 dead-code delete in).

## What this ships

- `AIDecisionService.generateResponse` is now a thin shim:
  - InferenceCoordinator.requestSlot (TS owns slot coordination, a platform concern)
  - client.cognitionGenerateResponse(request): single IPC call
  - InferenceCoordinator.releaseSlot
  - logError + rethrow on failure (no fail-open silent default)
- New TS binding method `cognitionGenerateResponse(GenerateResponseRequest) -> Promise<GenerateResponseResult>` in the cognition mixin
- `GenerateResponseRequest` + `GenerateResponseResult` re-exported from the generated barrel (already present from PR-1)

## Dead TS deleted (PR-4 folded in)

- `private static buildResponseMessages(context)` helper (~115 LOC): system-prompt injection, conversation history with [HH:MM] prefix, hour-gap markers, ~50-line identity-reminder template. All moved to Rust in PR-1.
- `import { AIProviderDaemon }`: no longer referenced after both checkRedundancy (#1383) + generateResponse migrations.
- `import type { TextGenerationRequest, TextGenerationResponse }`: ditto, only used by deleted helper.
- Inline timeout Promise.race code: replaced by Rust-side tokio::time::timeout in PR-2.

After this PR, `AIDecisionService.ts` contains only:

- evaluateGating (already shim to cognition/should-respond)
- checkRedundancy (already shim to cognition/check-redundancy)
- generateResponse (now shim to cognition/generate-response)
- InferenceCoordinator slot management (TS-owned platform concern)
- logging helpers (TS-owned platform concern)

## Discipline

- No fail-open path: errors throw, caller decides (consistent with codex's check_redundancy shim pattern).
- Cast `context as unknown as RustAIDecisionContext` matches the pattern in cognitionShouldRespond + cognitionCheckRedundancy. TS RAGContext.identity wraps the system prompt; TS already resolves to context.systemPrompt before sending.
- Slot coordination explicitly stays TS. That's the seam codex drew with check_redundancy, preserved here.
- Token shape preserved: `result.tokensUsed` is `TokenUsage | None`; TS just passes through (Rust already mapped from provider's UsageMetrics, returning None for zero-token providers).

## Stack progress

- #1385 PR-1 (pure types + prompt builder + identity-reminder template): #1388 MERGED
- #1385 PR-2 (async evaluate_response + IPC handler): #1390 OPEN
- #1385 PR-3 (TS shim + dead-TS delete): **this PR**
- #1385 PR-4 (dead-TS delete): **folded into this PR**

## Refs

- #1385 sub-card
- #1388 PR-1 (MERGED)
- #1390 PR-2 (in flight)
- #1383 codex's check_redundancy PR-3, same shape
- #1248 umbrella

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
joelteply
added a commit
that referenced
this pull request
May 18, 2026
… TS (PR-4 folded) (#1402)
This was referenced May 18, 2026
Summary
Stacks on PR-1 #1388 (pure types + prompt builder + identity-reminder template, MERGED at 872e84a). PR-2 wires the async path:
`build_response_messages` → `adapter.generate_text` (existing local Qwen router via `global_registry`) → result with timing + `tokio::time::timeout` replacing the TS `Promise.race`.

What this ships
- `evaluate_response(GenerateResponseRequest) -> Result<GenerateResponseResult, GenerateResponseError>`: async composer. Honors per-request `model` / `temperature` / `max_tokens` / `timeout_ms` overrides; defaults match TS (Qwen3.5 / 0.7 / 150 / 180_000ms).
- `GenerateResponseError`: typed, with variants `NoAdapter`, `Generation`, `Timeout`. No silent default-on-error; caller picks fail-open vs fail-closed.
- `build_response_generation_request(&request, model, start_ms) -> TextGenerationRequest`: pure helper. Pins wire shape (provider="local", response_format=Text, purpose="cognition/generate-response", persona/room attribution).
- `result_from_response(response, model, start_ms, end_ms) -> GenerateResponseResult`: pure helper. Trims text, stamps model + timing, populates `tokens_used` only when `total_tokens > 0` (mirrors TS truthiness check on usage object).
- `cognition/generate-response` command arm in `CognitionModule::handle_command`.

Discipline
- `tokio::time::timeout` wraps `adapter.generate_text`, giving a clean `Timeout` variant on the error enum (TS `Promise.race` equivalent).
- `tokens_used = None` when provider reports zeros, which avoids emitting fake `{0,0,0}` measurements for providers that don't instrument usage.
- `response_format=Text` (TS default): local Qwen takes plain text, no JSON-mode constraint.

Tests (10 new; full module now 39 passing)
build_response_generation_request (3): defaults shape (provider/model/temp/max/response_format/purpose/attribution/message count), overrides honored, caller timestamp embedded in identity reminder.
result_from_response (5): trims whitespace, stamps model + timing, populates tokens when provider reports total > 0, tokens None when zero, response_time saturates on clock-backwards.
GenerateResponseError (2): NoAdapter Display carries provider + model, Timeout Display includes duration.
Full cognition regression: 335/335 pass.
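The `result_from_response` behaviour covered by the tests above can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the PR's code: the struct shapes (`ProviderResponse`, `TokenUsage` and its field names, the fields of `GenerateResponseResult`) are hypothetical, and only the three behaviours named in this PR are modelled: trimming, saturating timing, and `tokens_used = None` for zero totals.

```rust
/// Hypothetical usage counters; field names are assumed for illustration.
#[derive(Debug, Clone, Copy, PartialEq)]
struct TokenUsage {
    input_tokens: u32,
    output_tokens: u32,
    total_tokens: u32,
}

/// Hypothetical provider response carrying raw text plus usage counters.
struct ProviderResponse {
    text: String,
    usage: TokenUsage,
}

/// Hypothetical result shape returned to the caller.
#[derive(Debug, PartialEq)]
struct GenerateResponseResult {
    text: String,
    model: String,
    response_time_ms: u64,
    tokens_used: Option<TokenUsage>,
}

/// Pure helper: no clock reads and no I/O, so it can be unit-tested directly.
fn result_from_response(
    response: ProviderResponse,
    model: &str,
    start_ms: u64,
    end_ms: u64,
) -> GenerateResponseResult {
    GenerateResponseResult {
        // Trim surrounding whitespace from the generated text.
        text: response.text.trim().to_string(),
        model: model.to_string(),
        // If the clock stepped backwards mid-call, report 0 instead of wrapping.
        response_time_ms: end_ms.saturating_sub(start_ms),
        // A zero total means the provider did not instrument usage.
        tokens_used: (response.usage.total_tokens > 0).then_some(response.usage),
    }
}

fn main() {
    let measured = ProviderResponse {
        text: "  hello there \n".to_string(),
        usage: TokenUsage { input_tokens: 10, output_tokens: 5, total_tokens: 15 },
    };
    println!("{:?}", result_from_response(measured, "demo-model", 1_000, 1_250));

    // Clock went backwards (end < start) and the provider reported no usage.
    let unmeasured = ProviderResponse {
        text: "ok".to_string(),
        usage: TokenUsage { input_tokens: 0, output_tokens: 0, total_tokens: 0 },
    };
    println!("{:?}", result_from_response(unmeasured, "demo-model", 2_000, 1_500));
}
```

Passing `start_ms` and `end_ms` in as arguments, rather than reading the clock inside the helper, is what makes the clock-backwards case testable without mocking time.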
NOT in this PR
- PR-3 (TS shim): `AIDecisionService.generateResponse` delegates to `RustCoreIPCClient.cognitionGenerateResponse` + cognition mixin binding.
- PR-4 (delete dead TS): `buildResponseMessages` helper + inline identity-reminder template (~250 LOC removed).

After PR-3 + PR-4, `AIDecisionService.ts` is pure slot-coordination + IPC shim code.

Refs
Test plan
- `cargo test --package continuum-core --lib --features metal,accelerate cognition::generate_response::` (39/39 pass)
- `cargo test --package continuum-core --lib --features metal,accelerate cognition::` (335/335 pass)

🤖 Generated with Claude Code