fix: drain pending replay notifications before emitting replay_complete #163
… creation triggers

Adds [session]-prefixed console.log statements at key points:
- createDraftSession: when a frontend draft is created
- promoteDraft: when a draft is promoted for the first message send
- acpPrepareSession: before/after backend session creation, with the trigger reason
- acp:session_bound: when the backend confirms session binding
- setSessionAcpId: when the frontend links to a backend session ID
- loadSessions: summary of backend sessions, including the empty-session count
- sendMessage error/abort: when a message send fails after session creation

These logs help diagnose empty chat sessions by tracing what triggered backend session creation and whether the subsequent message send succeeded.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replaces [session] lifecycle logs with [replay] logs that trace the full session history loading pipeline.

Frontend (useAcpStream/AppShell):
- Log when loadSessionMessages starts, with the goose session ID and messageCount
- Count replay events (user messages, assistant messages, total) as they arrive
- Log replay_complete with event counts and buffer size
- Log buffer flush with a message role breakdown
- Log timeout with accumulated event counts to detect stalled replays

Backend (Rust dispatcher/session_ops):
- Log load_session start with the session ID mapping and working dir
- Count replayed user/assistant messages in SessionRoute
- Log load_session completion with event counts and elapsed time

These logs help diagnose sessions that load with missing messages by showing exactly how many events the backend emitted vs. how many the frontend received and buffered.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
After load_session().await resolves, the ACP RPC layer may still have queued notification tasks (session/update with message chunks) that haven't been processed yet on the single-threaded runtime. This caused non-deterministic message loss when loading older sessions. Yield repeatedly after load_session returns, waiting for the replay event count to stabilize, before finalizing and emitting replay_complete. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
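The drain pattern described above can be sketched as follows. This is a conceptual TypeScript analogue with hypothetical names (the actual fix lives in the Rust dispatcher): yield to the event loop between polls, and only finish once the counter has held steady for several consecutive polls.

```typescript
// Sketch of the yield-until-stable drain idea (hypothetical helper; the
// real implementation is Rust). `readCount` reports how many replay
// events have been observed so far.
async function waitForReplayDrain(
  readCount: () => Promise<number>,
  stabilityPolls = 3,
): Promise<number> {
  let last = await readCount();
  let stable = 0;
  while (stable < stabilityPolls) {
    // Yield so queued notification tasks get a chance to run first.
    await new Promise<void>((resolve) => setTimeout(resolve, 0));
    const now = await readCount();
    if (now === last) {
      stable += 1; // unchanged: one poll closer to "drained"
    } else {
      last = now; // a late notification landed: reset the window
      stable = 0;
    }
  }
  return last;
}
```

Only after this resolves would the caller finalize replay and emit replay_complete, so late-queued chunks are counted rather than dropped.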
Tests all 6 exported functions from replayBuffer.ts:
- ensureReplayBuffer: creation, identity, session isolation
- getBufferedMessage: lookup by id, missing cases
- getReplayBufferSize: unknown session, accurate count
- getAndDeleteReplayBuffer: returns and removes buffer
- clearReplayBuffer: removes without returning
- findLatestUnpairedToolRequest: edge cases for paired/unpaired tool requests

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
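A minimal sketch of what a store with that surface could look like, assuming a per-session `Map` of buffered messages. Only the six function names come from the source; the message shape and internals are illustrative assumptions.

```typescript
// Hypothetical replayBuffer-style store keyed by session ID.
// The BufferedMessage shape is an assumption for illustration.
type BufferedMessage = {
  id: string;
  role: "user" | "assistant" | "toolRequest" | "toolResponse";
  pairedId?: string; // for a toolRequest: id of its toolResponse, if any
};

const buffers = new Map<string, Map<string, BufferedMessage>>();

function ensureReplayBuffer(sessionId: string): Map<string, BufferedMessage> {
  let buf = buffers.get(sessionId);
  if (!buf) {
    buf = new Map();
    buffers.set(sessionId, buf);
  }
  return buf; // same instance on repeated calls (identity)
}

function getBufferedMessage(sessionId: string, id: string): BufferedMessage | undefined {
  return buffers.get(sessionId)?.get(id);
}

function getReplayBufferSize(sessionId: string): number {
  return buffers.get(sessionId)?.size ?? 0; // 0 for unknown sessions
}

function getAndDeleteReplayBuffer(sessionId: string): Map<string, BufferedMessage> | undefined {
  const buf = buffers.get(sessionId);
  buffers.delete(sessionId);
  return buf; // returns and removes in one step
}

function clearReplayBuffer(sessionId: string): void {
  buffers.delete(sessionId); // removes without returning
}

function findLatestUnpairedToolRequest(sessionId: string): BufferedMessage | undefined {
  const buf = buffers.get(sessionId);
  if (!buf) return undefined;
  let latest: BufferedMessage | undefined;
  for (const msg of buf.values()) {
    if (msg.role === "toolRequest" && msg.pairedId === undefined) latest = msg;
  }
  return latest; // Map preserves insertion order, so this is the newest
}
```

Keeping sessions in separate inner maps makes the "session isolation" cases from the test list fall out naturally.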
This reverts commit dd51a72.
Extract the yield-and-wait-for-stability loop from load_session_inner into a generic wait_for_replay_drain() function that accepts an async counter callback, making it testable in isolation.

Add 4 tests:
- replay_drain_returns_immediately_when_count_is_zero
- replay_drain_returns_stable_count (already-stable counter)
- replay_drain_waits_for_spawned_notifications (tokio::spawn simulating async ACP notifications arriving over multiple yields)
- replay_drain_resets_stability_on_late_arrival (counter bumps just before the stability threshold, verifying the reset logic)

The last two tests fail when the drain logic is replaced with a single poll (the pre-fix behavior), confirming the fix is necessary.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Remove the [session] and [replay] info-level console.log/log::info statements added during the empty-chats investigation. Revert the dotenv-load justfile addition. Simplify dispatcher replay counting to a single total (the only field the drain fix actually reads). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add a MAX_DRAIN_ITERATIONS (100) safety cap to wait_for_replay_drain so a runaway counter cannot spin-yield forever and freeze the UI. Log a warning when the cap is hit, returning the partial count. Strengthen the late-arrival test to assert poll_count >= 7, confirming the stability window truly resets rather than coincidentally seeing the bumped value. Add a new test for the iteration cap behavior. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
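The capped loop can be sketched like this, again as a hypothetical TypeScript analogue of the Rust helper (only MAX_DRAIN_ITERATIONS = 100 and the warn-and-return-partial behavior come from the source; names and the yield mechanism are illustrative):

```typescript
// Drain loop with a safety cap: a runaway counter can no longer
// spin-yield forever. Hypothetical analogue of the Rust helper.
const MAX_DRAIN_ITERATIONS = 100;

async function waitForReplayDrainCapped(
  readCount: () => Promise<number>,
  stabilityPolls = 3,
): Promise<number> {
  let last = await readCount();
  let stable = 0;
  for (let i = 0; i < MAX_DRAIN_ITERATIONS && stable < stabilityPolls; i++) {
    await Promise.resolve(); // yield so queued tasks can run
    const now = await readCount();
    if (now === last) {
      stable += 1;
    } else {
      last = now;
      stable = 0; // late arrival resets the stability window
    }
  }
  if (stable < stabilityPolls) {
    // Cap hit: warn and return whatever partial count we have.
    console.warn(`replay drain hit iteration cap; partial count ${last}`);
  }
  return last;
}
```

With a counter that changes on every poll, the loop exits after exactly MAX_DRAIN_ITERATIONS iterations with the partial count, instead of freezing the UI.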
…path

Merge the two consecutive routes lock acquisitions in the UserMessageChunk replay handler into a single lock scope. The first lock finalized any in-progress assistant message, and the second incremented replay_events, both on the same route entry. Combining them reduces contention and simplifies the code.

Tighten the file size limit for dispatcher.rs accordingly (545 → 540).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: ed559e808f
```rust
let mut routes = self.routes.lock().await;
if let Some(route) = routes.get_mut(&goose_session_id) {
    route.replay_message_id = Some(new_id.clone());
    route.replay_events += 1;
```
Count every replay event before finalizing replay
wait_for_replay_drain waits for replay_events to stabilize, but this counter is only incremented when a new replay assistant message is created (plus user chunks), not for subsequent AgentMessageChunk/tool replay updates on the same message. That lets load_session_inner call finalize_replay and emit acp:replay_complete while replay chunks are still queued, and those late chunks can be dropped once the frontend marks the message completed (shouldTrackStreamingEvent in useAcpStream). Please increment the drain counter for every replay notification that can still mutate buffered replay state, or use a completion signal tied to the notification queue instead of message creation.
The debug logging cleanup commit (5fdab4e) accidentally removed [perf:load] and [perf:stream] console.log instrumentation added by Bradley Axen, as well as a "Failed to prepare ACP session" console.error added by Taylor Ho. These were pre-existing logs, not debug logs added during the investigation. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Summary
- Added a `wait_for_replay_drain` helper that yields to the async runtime and waits for the replay event counter to stabilise before finalizing replay, fixing a race where `replay_complete` fired before all replay notifications were dispatched
- Replay events are now counted on `SessionRoute` via a new `replay_events` field, with a `get_replay_event_count` accessor on the dispatcher
- Removed debug logging in `AppShell.tsx` and `useAcpStream.ts` that was added during investigation
- Adjusted `check-file-sizes.mjs` to account for the new code

Test plan

- Unit tests for `wait_for_replay_drain` cover: immediate zero, stable count, spawned notifications, late arrival reset, iteration cap

🤖 Generated with Claude Code