Conversation

@ThomasK33
Member

Fixes a restart edge case where ask_user_question was treated as an interrupted stream.

When Mux is closed while the agent is blocked on ask_user_question, we now treat that tool call as a durable waiting-for-input state:

  • No Retry/Interrupted UI on restart
  • No auto-resume that re-runs the LLM call and re-asks the questions
  • Interrupt keybinds (Esc/Ctrl+C) + command palette interrupt are disabled while awaiting questions

On restart, answering the questions now works even though the in-memory pending tool call is gone:

  • Backend persists the tool result into partial.json (or chat history) and emits a synthetic tool-call-end
  • Frontend triggers a manual resume check so the assistant continues immediately after answers

📋 Implementation Plan

🤖 Plan: Persist ask_user_question as a true “waiting for input” state

Goal

When a workspace is blocked on ask_user_question and the user closes/reopens Mux, do not treat that as an interrupted stream that must be auto-resumed. Instead:

  • Restore the existing question UI
  • Allow answering the questions
  • Resume the assistant only after answers are submitted
  • Do not let Esc cancel/interrupt while we’re awaiting an ask_user_question

Recommended approach (minimal + consistent)

Make ask_user_question “resume-safe” by treating it as a special waiting state (not an interruption).

What changes, behavior-wise

  1. After app restart with a partial message whose last part is an unfinished ask_user_question tool call:

    • Tool UI shows as executing (answerable), not interrupted.
    • We do not show the “Interrupted” chat barrier.
    • We do not show the RetryBarrier.
    • We do not auto-call resumeStream().
  2. While actively awaiting ask_user_question (stream is still “running” but blocked on user input):

    • Esc / Ctrl+C interrupt keybind becomes a no-op for this state.
    • UI hints should not advertise interrupting; they should point to answering/cancel-by-chat.
  3. When the user submits answers after a restart (no active stream exists anymore):

    • Backend updates the persisted partial (or history) message to mark the tool call as output-available with { questions, answers }.
    • Backend emits a synthetic tool-call-end event so the renderer updates immediately.
    • Frontend triggers a resume check (manual) so the assistant continues promptly.

Why this works with the current architecture

  • On restart, we usually have partial.json with the assistant message containing the tool call.
  • Today, we mark unfinished tools in partial messages as interrupted, which triggers:
    • Retry UI + auto-resume manager → extra LLM call → new tool call
  • By keeping the tool answerable and suppressing the “interrupted” UX + auto-resume, we avoid re-running the LLM just to re-create the questions.

Implementation steps

1) Frontend: classify ask_user_question in partial messages as “executing”

Files:

  • src/browser/utils/messages/StreamingMessageAggregator.ts

Change: In getDisplayedMessages() tool status mapping:

  • Current: input-available && message.metadata.partial → status = "interrupted"
  • New: if toolName === "ask_user_question", treat input-available as "executing" even when partial (see the sketch after this step).

Also tighten hasAwaitingUserQuestion():

  • Only consider the latest displayed message (or latest tool message) to avoid “stale waiting” if the user continues the chat.
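
A minimal sketch of both changes, assuming simplified part and status shapes (the real types live in StreamingMessageAggregator.ts):

```ts
// Simplified shapes; the real types live in StreamingMessageAggregator.ts.
type ToolStatus = "executing" | "interrupted" | "completed";

interface ToolPart {
  toolName: string;
  state: "input-available" | "output-available";
}

function toolStatus(part: ToolPart, isPartialMessage: boolean): ToolStatus {
  if (part.state === "output-available") return "completed";
  // New: ask_user_question stays answerable across restarts instead of
  // being flagged as an interrupted stream.
  if (part.toolName === "ask_user_question") return "executing";
  return isPartialMessage ? "interrupted" : "executing";
}

// Tightened: only the latest displayed tool part counts, so a question the
// user has already moved past never registers as "still waiting".
function hasAwaitingUserQuestion(latestToolPart: ToolPart | undefined): boolean {
  return (
    latestToolPart !== undefined &&
    latestToolPart.toolName === "ask_user_question" &&
    latestToolPart.state === "input-available"
  );
}
```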

2) Frontend: suppress “Interrupted” + Retry + auto-resume for that state

Files:

  • src/browser/utils/messages/retryEligibility.ts
  • src/browser/utils/messages/messageUtils.ts
  • src/browser/components/AIView.tsx (optional defense-in-depth)

Change:

  • hasInterruptedStream(...): if the last message is a tool message with toolName === "ask_user_question" and status === "executing", return false (sketched after this list).
    • This automatically disables:
      • RetryBarrier
      • useResumeManager auto-resume
  • shouldShowInterruptedBarrier(msg): return false for the same tool message type.
  • (Optional) In AIView, also gate showRetryBarrier by !awaitingUserQuestion for extra safety.
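
A sketch of the eligibility gate under an assumed DisplayedMessage shape; only the ask_user_question check is new, the rest of hasInterruptedStream stays as-is:

```ts
// Assumed simplified message shape, standing in for the real type in
// src/browser/utils/messages/messageUtils.ts.
type DisplayedMessage =
  | { kind: "tool"; toolName: string; status: "executing" | "interrupted" | "completed" }
  | { kind: "user" | "assistant" };

function isAwaitingUserQuestion(msg: DisplayedMessage | undefined): boolean {
  return (
    msg !== undefined &&
    msg.kind === "tool" &&
    msg.toolName === "ask_user_question" &&
    msg.status === "executing"
  );
}

function hasInterruptedStream(messages: DisplayedMessage[]): boolean {
  const last = messages[messages.length - 1];
  // Waiting on answers is not an interruption: returning false here keeps
  // both the RetryBarrier and useResumeManager's auto-resume switched off.
  if (isAwaitingUserQuestion(last)) return false;
  // ... existing interruption checks continue below ...
  return false;
}
```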

3) Frontend: disable interrupt keybind while awaiting questions

File:

  • src/browser/hooks/useAIViewKeybinds.ts

Change:

  • When the interrupt keybind is pressed:
    • If aggregator?.hasAwaitingUserQuestion() is true, do not call workspace.interruptStream and do not toggle autoRetry.
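
A sketch of the gate, with the hook's dependencies reduced to the pieces named above (the real hook wiring differs):

```ts
// Assumed dependency shapes; the real hook lives in useAIViewKeybinds.ts.
interface InterruptDeps {
  aggregator?: { hasAwaitingUserQuestion(): boolean };
  workspace: { interruptStream(): void };
  setAutoRetry: (enabled: boolean) => void;
}

function handleInterruptKeybind({ aggregator, workspace, setAutoRetry }: InterruptDeps): void {
  // While blocked on ask_user_question the "stream" is just waiting for
  // answers, so Esc / Ctrl+C must not cancel it or toggle autoRetry.
  if (aggregator?.hasAwaitingUserQuestion()) return;
  workspace.interruptStream();
  setAutoRetry(false);
}
```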

4) Frontend: stop advertising Esc for this state

Files:

  • src/browser/components/ChatInput/index.tsx
  • src/browser/components/AIView.tsx

Change (small UX polish):

  • Add awaitingUserQuestion as a prop to ChatInput so the placeholder/hints avoid “Esc to interrupt” and instead reflect “Answer above / type a message to respond”.
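
A sketch of the hint selection; awaitingUserQuestion is the new prop from the plan, the other inputs and the exact strings are illustrative:

```ts
function chatInputPlaceholder(awaitingUserQuestion: boolean, isStreaming: boolean): string {
  if (awaitingUserQuestion) {
    // Do not advertise "Esc to interrupt" while answers are pending.
    return "Answer the questions above, or type a message to respond";
  }
  return isStreaming ? "Esc to interrupt" : "Type a message…";
}
```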

5) Backend: allow answering after restart (no active stream)

Files:

  • src/node/services/workspaceService.ts

Change: make answerAskUserQuestion(...) async and implement a fallback (see the sketch after the guardrails below):

  1. Try the current in-memory path:
    • If the tool is actually pending in askUserQuestionManager, resolve it (existing behavior).
  2. Otherwise (restart case):
    • Read partial.json via partialService.readPartial(workspaceId).
      • If the partial message contains the ask_user_question toolCallId, update that tool part to output-available with { questions, answers } and write back via partialService.writePartial.
    • Else: locate the message in chat.jsonl via historyService.getHistory and update via historyService.updateHistory.
    • Emit a synthetic tool-call-end chat event using session.emitChatEvent(...) so the UI updates immediately.

Important guardrails (defensive programming):

  • Validate the tool part’s input matches AskUserQuestionToolArgs shape before using it.
  • Refuse to answer if the tool call is stale (e.g., the message is not the latest assistant message in history), returning a clear error.
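
A sketch of the fallback flow covering the guardrails above; every service interface here is a simplified assumption rather than the real API surface, and the chat-history branch is only stubbed:

```ts
// Simplified stand-ins for the real services in workspaceService.ts.
interface ToolPart {
  toolCallId: string;
  state: "input-available" | "output-available";
  input: unknown;
  output?: unknown;
}
interface PartialMessage {
  parts: ToolPart[];
}

declare const askUserQuestionManager: {
  // Existing in-memory path: resolves the pending call if it is still live.
  resolveIfPending(toolCallId: string, answers: Record<string, string>): boolean;
};
declare const partialService: {
  readPartial(workspaceId: string): Promise<PartialMessage | undefined>;
  writePartial(workspaceId: string, message: PartialMessage): Promise<void>;
};
declare const session: {
  emitChatEvent(event: { type: "tool-call-end"; toolCallId: string; output: unknown }): void;
};

// Guardrail: validate the stored input against the expected args shape.
function isAskUserQuestionArgs(input: unknown): input is { questions: unknown[] } {
  return (
    typeof input === "object" &&
    input !== null &&
    Array.isArray((input as { questions?: unknown }).questions)
  );
}

async function answerAskUserQuestion(
  workspaceId: string,
  toolCallId: string,
  answers: Record<string, string>
): Promise<void> {
  // 1) Live path: the stream is still running and the tool call is pending.
  if (askUserQuestionManager.resolveIfPending(toolCallId, answers)) return;

  // 2) Restart path: persist the tool result into partial.json.
  const partial = await partialService.readPartial(workspaceId);
  const part = partial?.parts.find((p) => p.toolCallId === toolCallId);
  if (partial && part) {
    if (!isAskUserQuestionArgs(part.input)) {
      throw new Error("stored input does not match AskUserQuestionToolArgs");
    }
    part.state = "output-available";
    part.output = { questions: part.input.questions, answers };
    await partialService.writePartial(workspaceId, partial);
    // Synthetic event so the renderer flips the tool UI immediately.
    session.emitChatEvent({ type: "tool-call-end", toolCallId, output: part.output });
    return;
  }

  // 3) Fall back to chat.jsonl via the history service (same update + event;
  //    omitted here), refusing stale calls that are not the latest message.
  throw new Error(`tool call ${toolCallId} not found in partial state`);
}
```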

6) Frontend: after successful submit, request resume immediately

File:

  • src/browser/components/tools/AskUserQuestionToolCall.tsx

Change: after answerAskUserQuestion succeeds:

  • Dispatch CUSTOM_EVENTS.RESUME_CHECK_REQUESTED with { workspaceId, isManual: true }.
    • This bypasses autoRetry=false state and resumes promptly.
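
A sketch of the submit handler tail; CUSTOM_EVENTS.RESUME_CHECK_REQUESTED is named in the plan, while the dispatch mechanism (a DOM CustomEvent) and the submit callback are assumptions:

```ts
declare const CUSTOM_EVENTS: { RESUME_CHECK_REQUESTED: string };

async function onAnswersSubmitted(
  workspaceId: string,
  submitAnswers: () => Promise<void> // wraps the answerAskUserQuestion IPC call
): Promise<void> {
  await submitAnswers();
  // isManual: true bypasses the autoRetry=false gate in the resume manager,
  // so the assistant continues promptly instead of waiting for a retry tick.
  window.dispatchEvent(
    new CustomEvent(CUSTOM_EVENTS.RESUME_CHECK_REQUESTED, {
      detail: { workspaceId, isManual: true },
    })
  );
}
```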

Tests

Frontend unit tests

  • src/browser/utils/messages/StreamingMessageAggregator.status.test.ts
    • New case: partial message with unfinished ask_user_question → tool status is executing and hasAwaitingUserQuestion() is true.
  • src/browser/utils/messages/retryEligibility.test.ts
    • New case: last message is partial ask_user_question executing → hasInterruptedStream is false.

Backend unit tests

  • Add a small pure helper (new file or colocated) that:
    • finds a tool part by toolCallId,
    • builds { questions, answers },
    • returns the updated message (one possible shape is sketched after the test list below).
  • Test:
    • success path (input has questions)
    • failure path (toolCallId missing)
    • stale-guard path (message not last)
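
One possible shape for that pure helper, with illustrative names (applyAnswers is hypothetical, not the actual function); it exercises all three test paths above:

```ts
interface ToolPart {
  toolCallId: string;
  state: string;
  input: { questions: unknown[] };
  output?: unknown;
}
interface ChatMessage {
  parts: ToolPart[];
}

function applyAnswers(
  message: ChatMessage,
  isLatestAssistantMessage: boolean,
  toolCallId: string,
  answers: Record<string, string>
): ChatMessage {
  // Stale-guard path: refuse to answer a question that is no longer current.
  if (!isLatestAssistantMessage) {
    throw new Error("refusing to answer a stale ask_user_question");
  }
  // Failure path: the toolCallId must exist in the message.
  const part = message.parts.find((p) => p.toolCallId === toolCallId);
  if (!part) throw new Error(`tool call ${toolCallId} not found`);
  // Success path: mark the part output-available with { questions, answers }.
  return {
    ...message,
    parts: message.parts.map((p) =>
      p.toolCallId === toolCallId
        ? { ...p, state: "output-available", output: { questions: p.input.questions, answers } }
        : p
    ),
  };
}
```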

Rollout / validation

  1. Manual repro:
    • Trigger ask_user_question.
    • Quit Mux completely.
    • Relaunch → ensure questions are still answerable, no retry UI, no auto-resume.
    • Submit answers → ensure Mux resumes and continues.
  2. Confirm Esc does not interrupt while awaiting questions.
  3. Confirm “real” interrupted streams (non-ask_user_question) still show RetryBarrier + auto-resume.

Net LoC estimate (product code only)

  • Recommended approach: ~180–260 LoC
    • Frontend classification + eligibility + keybind gating: ~60–100
    • Backend fallback answer persistence + event emission: ~120–160

Alternatives considered

A) Persist/resume the actual in-flight model request

  • Would require provider-specific “resume from tool call” / response-id continuation.
  • High complexity, brittle across providers.
  • Not recommended.

B) Pass ignoreIncompleteToolCalls=false to convertToModelMessages for ask_user_question

  • Risks sending incomplete tool calls to providers (API validation failures).
  • Still doesn’t solve “answer after restart” unless we persist the tool result.
  • Not recommended.

Generated with mux • Model: openai:gpt-5.2 • Thinking: xhigh

@chatgpt-codex-connector

Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits.
Repo admins can enable using credits for code reviews in their settings.

Change-Id: I0ae14c18a3380a752e787012e4b3a3ee88429b54
Signed-off-by: Thomas Kosiewski <tk@coder.com>
Change-Id: Id57c2cf78994d5713020f7922bb414e5af773410
Signed-off-by: Thomas Kosiewski <tk@coder.com>
@ThomasK33 force-pushed the ask-user-questions-interruption branch from 26ae06e to 8420997 on December 14, 2025 20:24
@ThomasK33 added this pull request to the merge queue on December 14, 2025
@github-merge-queue bot removed this pull request from the merge queue due to failed status checks on December 14, 2025
@ThomasK33 merged commit f50fb3a into main on December 14, 2025
20 checks passed
@ThomasK33 deleted the ask-user-questions-interruption branch on December 14, 2025 20:52