feat(cognition,#1385): generate_response PR-2: async evaluate_response + cognition/generate-response IPC handler #1390

Merged: joelteply merged 1 commit into canary from feat/oxidizer-generate-response-pr2 on May 18, 2026
Summary

Stacks on PR-1 #1388 (pure types + prompt builder + identity-reminder template, MERGED at 872e84a). PR-2 wires the async path: build_response_messages → adapter.generate_text (existing local Qwen router via global_registry) → result with timing, with tokio::time::timeout replacing the TS Promise.race.
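The "race the call against a deadline, surface a typed Timeout" shape described above can be illustrated without an async runtime. The sketch below is a std-only stand-in, not the PR's code: it uses a worker thread and `mpsc::Receiver::recv_timeout` in place of `tokio::time::timeout`, and `SketchError` and `generate_with_timeout` are hypothetical names invented for illustration.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

/// Hypothetical error type for this sketch (the PR's real enum is
/// GenerateResponseError; its exact fields are not shown in the source).
#[derive(Debug, PartialEq)]
enum SketchError {
    Generation(String),
    Timeout { timeout_ms: u64 },
}

/// Run `generate` on a worker thread and stop waiting after `timeout_ms`.
/// Std-only analog of wrapping a future in tokio::time::timeout.
fn generate_with_timeout<F>(generate: F, timeout_ms: u64) -> Result<String, SketchError>
where
    F: FnOnce() -> Result<String, String> + Send + 'static,
{
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        // A send error only means the caller already gave up; ignore it.
        let _ = tx.send(generate());
    });
    match rx.recv_timeout(Duration::from_millis(timeout_ms)) {
        Ok(Ok(text)) => Ok(text),
        Ok(Err(message)) => Err(SketchError::Generation(message)),
        Err(mpsc::RecvTimeoutError::Timeout) => Err(SketchError::Timeout { timeout_ms }),
        // The worker dropped its sender without sending (for example, it panicked).
        Err(mpsc::RecvTimeoutError::Disconnected) => {
            Err(SketchError::Generation("worker exited without a result".to_string()))
        }
    }
}
```

One difference worth keeping in mind: this thread-based analog leaves the worker running after the deadline, whereas a tokio timeout drops the wrapped future when it elapses.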

What this ships

  • evaluate_response(GenerateResponseRequest) -> Result<GenerateResponseResult, GenerateResponseError> — async composer. Honors per-request model/temperature/max_tokens/timeout_ms overrides; defaults match TS (Qwen3.5 / 0.7 / 150 / 180_000ms).
  • GenerateResponseError — typed: NoAdapter, Generation, Timeout. No silent default-on-error; caller picks fail-open vs fail-closed.
  • build_response_generation_request(&request, model, start_ms) -> TextGenerationRequest — pure helper. Pins wire shape (provider="local", response_format=Text, purpose="cognition/generate-response", persona/room attribution).
  • result_from_response(response, model, start_ms, end_ms) -> GenerateResponseResult — pure helper. Trims text, stamps model + timing, populates tokens_used only when total_tokens > 0 (mirrors TS truthiness check on usage object).
  • cognition/generate-response command arm in CognitionModule::handle_command.
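As a rough illustration of how per-request overrides could fall back to the documented defaults, here is a minimal sketch. The constant names follow the `DEFAULT_GENERATE_*` names mentioned in the commit message; the struct names, field types, `resolve_settings`, and the `"qwen3.5"` model string are assumptions made for this example, not the PR's actual definitions.

```rust
// Defaults quoted in the PR description (Qwen3.5 / 0.7 / 150 / 180_000 ms).
// The exact model identifier is not given in the source; "qwen3.5" is a placeholder.
const DEFAULT_GENERATE_PROVIDER: &str = "local";
const DEFAULT_GENERATE_MODEL: &str = "qwen3.5";
const DEFAULT_GENERATE_TEMPERATURE: f32 = 0.7;
const DEFAULT_GENERATE_MAX_TOKENS: u32 = 150;
const DEFAULT_GENERATE_TIMEOUT_MS: u64 = 180_000;

/// Hypothetical subset of the request: only the overridable knobs.
#[derive(Debug, Default)]
struct RequestOverrides {
    model: Option<String>,
    temperature: Option<f32>,
    max_tokens: Option<u32>,
    timeout_ms: Option<u64>,
}

/// Hypothetical resolved view of what would go on the wire.
#[derive(Debug, PartialEq)]
struct ResolvedSettings {
    provider: &'static str,
    model: String,
    temperature: f32,
    max_tokens: u32,
    timeout_ms: u64,
    purpose: &'static str,
}

/// Pure helper: each override wins when present, otherwise the default applies.
fn resolve_settings(overrides: &RequestOverrides) -> ResolvedSettings {
    ResolvedSettings {
        provider: DEFAULT_GENERATE_PROVIDER,
        model: overrides
            .model
            .clone()
            .unwrap_or_else(|| DEFAULT_GENERATE_MODEL.to_string()),
        temperature: overrides.temperature.unwrap_or(DEFAULT_GENERATE_TEMPERATURE),
        max_tokens: overrides.max_tokens.unwrap_or(DEFAULT_GENERATE_MAX_TOKENS),
        timeout_ms: overrides.timeout_ms.unwrap_or(DEFAULT_GENERATE_TIMEOUT_MS),
        purpose: "cognition/generate-response",
    }
}
```

Keeping this resolution in a pure function is what makes the "defaults shape" and "overrides honored" tests possible without an adapter or a runtime.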

Discipline

  • tokio::time::timeout wraps adapter.generate_text — clean Timeout variant on error enum (TS Promise.race equivalent).
  • Saturating subtraction on response_time_ms — clock-backwards artifact (NTP adjustment mid-call) reports 0, not a wrapped huge u64.
  • tokens_used = None when provider reports zeros — avoids emitting fake {0,0,0} measurements for providers that don't instrument usage.
  • response_format=Text (TS default) — local Qwen takes plain text, no JSON-mode constraint.
  • All constants documented with their TS-default origins.
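The saturating-subtraction and zero-usage rules above are easy to show in a pure function. The following is a minimal sketch under assumed types: `TokenUsage`, `ProviderResponse`, and `GenerateResult` are simplified stand-ins whose field names are guesses, and only the function name and argument order come from the PR description.

```rust
/// Simplified stand-in for the provider's usage metrics (field names assumed).
#[derive(Debug, Clone, Copy, PartialEq)]
struct TokenUsage {
    input_tokens: u32,
    output_tokens: u32,
    total_tokens: u32,
}

/// Simplified stand-in for the adapter's response.
struct ProviderResponse {
    text: String,
    usage: TokenUsage,
}

/// Simplified stand-in for GenerateResponseResult.
#[derive(Debug, PartialEq)]
struct GenerateResult {
    text: String,
    model: String,
    response_time_ms: u64,
    tokens_used: Option<TokenUsage>,
}

fn result_from_response(
    response: ProviderResponse,
    model: &str,
    start_ms: u64,
    end_ms: u64,
) -> GenerateResult {
    GenerateResult {
        // Trim surrounding whitespace from the generated text.
        text: response.text.trim().to_string(),
        model: model.to_string(),
        // If the clock moved backwards mid-call, report 0 instead of wrapping.
        response_time_ms: end_ms.saturating_sub(start_ms),
        // A provider that reports zero total tokens is treated as uninstrumented.
        tokens_used: if response.usage.total_tokens > 0 {
            Some(response.usage)
        } else {
            None
        },
    }
}
```

With plain `end_ms - start_ms`, a backwards clock would panic in debug builds and wrap to a huge value in release builds; `saturating_sub` makes both cases report 0.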

Tests (10 new — full module now 39 passing)

build_response_generation_request (3): defaults shape (provider/model/temp/max/response_format/purpose/attribution/message count), overrides honored, caller timestamp embedded in identity reminder.

result_from_response (5): trims whitespace, stamps model + timing, populates tokens when provider reports total > 0, tokens None when zero, response_time saturates on clock-backwards.

GenerateResponseError (2): NoAdapter Display carries provider + model, Timeout Display includes duration.
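For readers unfamiliar with the pattern, a typed error with a `Display` impl along these lines would satisfy the two Display checks described above. The variant names come from the PR description; the fields, the message wording, and the `fail_open` helper are assumptions for illustration only.

```rust
use std::fmt;

/// Sketch of a typed error enum; variant payloads are assumed, not taken from the PR.
#[derive(Debug)]
enum GenerateResponseError {
    NoAdapter { provider: String, model: String },
    Generation(String),
    Timeout { timeout_ms: u64 },
}

impl fmt::Display for GenerateResponseError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Self::NoAdapter { provider, model } => {
                write!(f, "no adapter for provider '{provider}' (model '{model}')")
            }
            Self::Generation(message) => write!(f, "generation failed: {message}"),
            Self::Timeout { timeout_ms } => {
                write!(f, "generation timed out after {timeout_ms}ms")
            }
        }
    }
}

impl std::error::Error for GenerateResponseError {}

/// Hypothetical caller that chooses fail-open: any error becomes "no reply".
/// A fail-closed caller would propagate the error with `?` instead.
fn fail_open(result: Result<String, GenerateResponseError>) -> Option<String> {
    result.ok()
}
```

Because the function returns the error rather than substituting a default, the fail-open versus fail-closed decision stays visible at each call site.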

Full cognition regression: 335/335 pass.

NOT in this PR

  • PR-3: TS shim — AIDecisionService.generateResponse delegates to RustCoreIPCClient.cognitionGenerateResponse + cognition mixin binding.
  • PR-4: Delete dead TS — buildResponseMessages helper + inline identity-reminder template (~250 LOC removed). After PR-3 + PR-4, AIDecisionService.ts is pure slot-coordination + IPC shim code.

Refs

Test plan

  • cargo test --package continuum-core --lib --features metal,accelerate cognition::generate_response:: — 39/39 pass
  • cargo test --package continuum-core --lib --features metal,accelerate cognition:: — 335/335 pass
  • Pre-push TS clean, ESLint baseline held, Rust compile clean
  • CI green
  • PR-3 (TS shim) + PR-4 (dead-TS delete) stack on top

🤖 Generated with Claude Code

…se + cognition/generate-response IPC handler

Stacks on PR-1 #1388 (pure types + prompt builder + identity-reminder
template). PR-2 wires the async path: build_response_messages →
adapter.generate_text (existing local Qwen router via global_registry)
→ result with timing + tokio::time::timeout replacing the TS
Promise.race.

## What this ships (PR-2)

- `evaluate_response(GenerateResponseRequest) -> Result<GenerateResponseResult, GenerateResponseError>`
  — async composer. Honors per-request model/temperature/max_tokens/
  timeout overrides; defaults match TS (Qwen3.5 / 0.7 / 150 / 180_000ms).
- `GenerateResponseError` — typed: NoAdapter, Generation, Timeout. No
  silent default-on-error; caller picks fail-open vs fail-closed.
- `build_response_generation_request(&request, model, start_ms) -> TextGenerationRequest`
  — pure helper. Pins wire shape (provider="local", response_format=Text,
  purpose="cognition/generate-response", persona/room attribution).
- `result_from_response(response, model, start_ms, end_ms) -> GenerateResponseResult`
  — pure helper. Trims text, stamps model + timing, populates
  tokens_used only when total_tokens > 0 (mirrors TS truthiness).
- `cognition/generate-response` command arm in CognitionModule.

## Discipline

- `tokio::time::timeout` wraps `adapter.generate_text` — clean Timeout
  variant on the error enum (TS Promise.race equivalent).
- Saturating subtraction on response_time_ms — clock-backwards artifact
  (NTP adjustment mid-call) reports 0, not a wrapped huge u64.
- tokens_used = None when provider reports zeros — avoids emitting
  fake {0,0,0} measurements for providers that don't instrument usage.
- response_format=Text (TS default) — local Qwen takes plain text,
  no JSON-mode constraint.
- All constants are documented (DEFAULT_GENERATE_PROVIDER/MODEL/
  TEMPERATURE/MAX_TOKENS/TIMEOUT_MS).

## Tests (10 new — full module now 39 passing)

build_response_generation_request:
- defaults: provider=local, model=Qwen-default, temp=0.7, max=150,
  response_format=Text, purpose="cognition/generate-response",
  persona/room attribution, message count
- overrides honored (custom model + temp + max)
- caller timestamp embedded in identity reminder (time-flow through layers)

result_from_response:
- trims surrounding whitespace
- stamps model + timing
- populates tokens when provider reports total > 0
- tokens None when provider reports 0
- response_time saturates clock-backwards

GenerateResponseError:
- NoAdapter Display carries provider + model
- Timeout Display includes duration

Full cognition regression: 335/335 pass.

## NOT in this PR

- **PR-3**: TS shim — AIDecisionService.generateResponse delegates to
  RustCoreIPCClient.cognitionGenerateResponse + cognition mixin
  binding.
- **PR-4**: Delete dead TS — buildResponseMessages helper + inline
  identity-reminder template (~250 LOC removed).

Ref: #1385 oxidizer card, #1388 PR-1 (MERGED).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@joelteply joelteply merged commit 3fe8ecd into canary May 18, 2026
3 checks passed
@joelteply joelteply deleted the feat/oxidizer-generate-response-pr2 branch May 18, 2026 16:24
joelteply pushed a commit that referenced this pull request May 18, 2026
… TS (PR-4 folded)

Stacks on PR-2 #1390 (async evaluate_response + cognition/generate-response
IPC handler). AIDecisionService.generateResponse now delegates to
RustCoreIPCClient.cognitionGenerateResponse; ~110 LOC of TS prompt
assembly + timeout race + token decoding deleted. Mirrors codex's
check_redundancy PR-3 #1383 shape (folded PR-4 dead-code delete in).

## What this ships

- `AIDecisionService.generateResponse` now a thin shim:
  - InferenceCoordinator.requestSlot (TS owns slot coordination — platform concern)
  - client.cognitionGenerateResponse(request) — single IPC call
  - InferenceCoordinator.releaseSlot
  - logError + rethrow on failure (no fail-open silent default)
- New TS binding method `cognitionGenerateResponse(GenerateResponseRequest)
  -> Promise<GenerateResponseResult>` in the cognition mixin
- `GenerateResponseRequest` + `GenerateResponseResult` re-exported
  from the generated barrel (already present from PR-1)

## Dead TS deleted (PR-4 folded in)

- `private static buildResponseMessages(context)` helper (~115 LOC):
  system-prompt injection, conversation history with [HH:MM] prefix,
  hour-gap markers, ~50-line identity-reminder template — all moved
  to Rust in PR-1.
- `import { AIProviderDaemon }` — no longer referenced after both
  checkRedundancy (#1383) + generateResponse migrations.
- `import type { TextGenerationRequest, TextGenerationResponse }` —
  ditto, only used by deleted helper.
- Inline timeout Promise.race code — replaced by Rust-side
  tokio::time::timeout in PR-2.

After this PR, `AIDecisionService.ts` contains only:
  - evaluateGating (already shim to cognition/should-respond)
  - checkRedundancy (already shim to cognition/check-redundancy)
  - generateResponse (now shim to cognition/generate-response)
  - InferenceCoordinator slot management (TS-owned platform concern)
  - logging helpers (TS-owned platform concern)

## Discipline

- No fail-open path — errors throw, caller decides (consistent with
  codex's check_redundancy shim pattern).
- Cast `context as unknown as RustAIDecisionContext` matches the
  pattern in cognitionShouldRespond + cognitionCheckRedundancy —
  TS RAGContext.identity wraps the system prompt; TS already
  resolves to context.systemPrompt before sending.
- Slot coordination explicitly stays TS — that's the seam codex
  drew with check_redundancy, preserved here.
- Token shape preserved: `result.tokensUsed` is `TokenUsage | None`;
  TS just passes through (Rust already mapped from provider's
  UsageMetrics, returning None for zero-token providers).

## Stack progress

- #1385 PR-1 (pure types + prompt builder + identity-reminder
  template): #1388 MERGED
- #1385 PR-2 (async evaluate_response + IPC handler): #1390 OPEN
- #1385 PR-3 (TS shim + dead-TS delete): **this PR**
- #1385 PR-4 (dead-TS delete): **folded into this PR**

## Refs

- #1385 sub-card
- #1388 PR-1 (MERGED)
- #1390 PR-2 (in flight)
- #1383 codex's check_redundancy PR-3 — same shape
- #1248 umbrella

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
joelteply added a commit that referenced this pull request May 18, 2026
… TS (PR-4 folded) (#1402)

Co-authored-by: Test <test@test.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>