feat(chat): regenerate last response (CLI, WebSocket, Web UI)#349
Merged
Collaborator
Thanks for your contribution! Just done what I'm going to do lol
…-response diagnostics

Add a "regenerate last response" capability that re-runs the prior user message in a session as a fresh turn without duplicating the user row. Exposed across all three entry points:

- WebSocket: new `type: "regenerate"` message accepting an optional `overrides` field; mirrors the `start_turn` flow including `subscribe_turn`. Errors `regenerate_busy` / `nothing_to_regenerate` surface as non-fatal `error` events.
- CLI: `/regenerate` (alias `/retry`) inside `deeptutor chat`, backed by a shared `regenerate_and_render` helper.
- Web UI: per-message Regenerate button (RefreshCcw icon) sits next to Copy on the last assistant message; uses a new `regenerateLastMessage()` action that pops the trailing assistant optimistically and sends the new WS message. Replaces the previous Retry button (which depended on an in-memory `requestSnapshot` and silently disappeared after page reload), so regeneration now also works for historical sessions hydrated from the store.

Backend mechanics:

- `SQLiteSessionStore.delete_message` and `get_last_message` helpers for tail rollback (append-only invariant preserved).
- `TurnRuntimeManager.regenerate_last_turn` deletes the trailing assistant (if any), rebuilds the payload from session preferences and the stored user message (including attachments), sets `_persist_user_message=False` and `_regenerate=True`, then delegates to `start_turn`. New `runtime_only_keys` and SESSION-event metadata carry `regenerated_from_message_id` and `superseded_turn_id` so clients can reconcile the UI without a schema migration.
- `_run_turn` skips `memory_service.refresh_from_turn` on regeneration to avoid double-updating long-term memory for the same user prompt.
- Rejects when an active turn is already running (`regenerate_busy`) and when no prior user message exists (`nothing_to_regenerate`).
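As a rough illustration of the backend mechanics described above, the sketch below replays the last user message against a toy in-memory store. It is not the project's code: `InMemorySessionStore`, `Message`, `RegenerateError`, `build_regenerate_payload`, the `active_sessions` set, and the order in which preferences and overrides are merged are all assumptions made for the example. Only the helper names `delete_message` / `get_last_message`, the flags `_persist_user_message` / `_regenerate`, the `regenerated_from_message_id` key, and the two error codes come from the commit message. The meaning given to `regenerated_from_message_id` here (the id of the removed assistant row) is likewise a guess.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any, Optional


class RegenerateError(Exception):
    """Non-fatal rejection; `reason` carries one of the error codes above."""

    def __init__(self, reason: str) -> None:
        super().__init__(reason)
        self.reason = reason


@dataclass
class Message:
    id: int
    role: str  # "user" or "assistant"
    content: str


@dataclass
class InMemorySessionStore:
    """Toy stand-in for the real session store (hypothetical)."""

    messages: dict[str, list[Message]] = field(default_factory=dict)

    def get_last_message(self, session_id: str, role: Optional[str] = None) -> Optional[Message]:
        # Walk backwards so the newest matching row wins.
        for msg in reversed(self.messages.get(session_id, [])):
            if role is None or msg.role == role:
                return msg
        return None

    def delete_message(self, session_id: str, message_id: int) -> bool:
        rows = self.messages.get(session_id, [])
        for index, msg in enumerate(rows):
            if msg.id == message_id:
                del rows[index]
                return True
        return False


def build_regenerate_payload(
    store: InMemorySessionStore,
    session_id: str,
    active_sessions: set[str],
    preferences: Optional[dict[str, Any]] = None,
    overrides: Optional[dict[str, Any]] = None,
) -> dict[str, Any]:
    """Roll back the trailing assistant row and build a replay payload."""
    if session_id in active_sessions:
        raise RegenerateError("regenerate_busy")
    last_user = store.get_last_message(session_id, role="user")
    if last_user is None:
        raise RegenerateError("nothing_to_regenerate")

    # Only the trailing row is ever removed, and only if it is an assistant.
    tail = store.get_last_message(session_id)
    removed_id = None
    if tail is not None and tail.role == "assistant":
        store.delete_message(session_id, tail.id)
        removed_id = tail.id

    # Assumed precedence: stored preferences first, then caller overrides.
    payload: dict[str, Any] = {**(preferences or {}), **(overrides or {})}
    payload.update(
        session_id=session_id,
        content=last_user.content,
        _persist_user_message=False,  # the user row already exists
        _regenerate=True,
        regenerated_from_message_id=removed_id,
    )
    return payload
```

In this sketch the resulting payload would then be handed to whatever starts a normal turn, which is the point of the design: regeneration reuses the regular execution path instead of adding a second one.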
Operational improvement bundled in:

- `_stage_responding` now warns and emits a non-terminal `error` event when the LLM returns an empty stream (zero non-empty chunks). The warning includes model, chunk count, prompt char size, and observation size so empty assistant messages can be diagnosed from logs and the Trace panel without code instrumentation.

Tests: `tests/services/session/test_regenerate.py` (18 tests) covers store deletion helpers, runtime regeneration paths (assistant tail, user tail, empty/missing session, busy guard, override precedence) and an end-to-end flow asserting the user row is preserved, the assistant is replaced, the new SESSION event carries `regenerate=true` / `regenerated_from_message_id`, and `refresh_from_turn` is not called again on regeneration.
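The empty-stream diagnostic mentioned above could look roughly like the following. This is a minimal sketch under stated assumptions, not the pipeline's actual `_stage_responding`: the function name `collect_response`, the logger name, the event dictionary shape (`type`, `terminal`, `metadata`), and the metadata keys are all invented for illustration. It only shows the idea of counting chunks, detecting that none were non-empty, and reporting model, chunk count, prompt size, and observation size.

```python
import logging
from typing import Any, Iterable, Optional

logger = logging.getLogger("empty_stream_sketch")  # hypothetical logger name


def collect_response(
    chunks: Iterable[str],
    *,
    model: str,
    prompt_chars: int,
    observation_chars: int,
) -> tuple[str, Optional[dict[str, Any]]]:
    """Join streamed chunks; return a non-terminal error event if all were empty."""
    parts: list[str] = []
    chunk_count = 0
    for chunk in chunks:
        chunk_count += 1
        if chunk:  # only non-empty chunks count as real output
            parts.append(chunk)

    if parts:
        return "".join(parts), None

    diagnostics = {
        "model": model,
        "chunk_count": chunk_count,
        "prompt_chars": prompt_chars,
        "observation_chars": observation_chars,
    }
    logger.warning(
        "LLM returned an empty stream: model=%s chunks=%d prompt_chars=%d observation_chars=%d",
        model,
        chunk_count,
        prompt_chars,
        observation_chars,
    )
    # Assumed event shape: the turn continues, the client can hint at Regenerate.
    event = {"type": "error", "terminal": False, "metadata": diagnostics}
    return "", event
```

Returning the event instead of raising keeps the turn alive, which matches the "non-terminal" wording: the caller can forward the event to the client and still finish the turn normally.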
Collapses single-line test signatures that fit within the project's 100-char line length, matching what `uvx pre-commit run` would do on this file. No behavioural change.
1233ebe to f37c385
Summary
Adds a "regenerate last response" capability that re-runs the prior user message in a session as a fresh turn without duplicating the user row. Exposed across all three entry points:
- WebSocket: new `type: "regenerate"` message accepting an optional `overrides` field; mirrors the `start_turn` flow including `subscribe_turn`. Errors `regenerate_busy` / `nothing_to_regenerate` surface as non-fatal `error` events.
- CLI: `/regenerate` (alias `/retry`) inside `deeptutor chat`, backed by a shared `regenerate_and_render` helper.
- Web UI: per-message Regenerate button (`RefreshCcw` icon) sits next to Copy on the last assistant message. Uses a new `regenerateLastMessage()` action that pops the trailing assistant optimistically and sends the new WS message. Replaces the previous Retry button (which depended on an in-memory `requestSnapshot` and silently disappeared after page reload), so regeneration now also works for historical sessions hydrated from the store.

Design
- `SQLiteSessionStore.delete_message` and `get_last_message` helpers added for tail rollback (append-only invariant preserved: only the trailing row is ever removed, so `summary_up_to_msg_id` is unaffected).
- `TurnRuntimeManager.regenerate_last_turn(session_id, overrides=None)`:
  - rejects when an active turn is already running (`regenerate_busy`),
  - rejects when no prior user message exists (`nothing_to_regenerate`),
  - deletes the trailing assistant message (if any),
  - rebuilds the payload from session preferences and the stored user message (including attachments),
  - sets `_persist_user_message=False` and `_regenerate=True`, then delegates to the regular `start_turn` to keep a single execution path.
- New `runtime_only_keys` plus SESSION-event metadata (`regenerated_from_message_id`, `superseded_turn_id`, `regenerate=true`) so clients can reconcile the UI without a schema migration. Old turn rows are kept untouched (no destructive history mutation).
- `_run_turn` skips `memory_service.refresh_from_turn` on regeneration to avoid double-updating long-term memory for the same user prompt.

Operational improvement bundled in
`AgenticChatPipeline._stage_responding` now warns and emits a non-terminal `error` event when the LLM returns an empty stream (zero non-empty chunks). The warning includes model, chunk count, prompt char size, and observation size so empty assistant messages can be diagnosed from logs and the Trace panel without further code instrumentation. This is independent of regenerate but materially improves UX when a model returns nothing: the user now sees a clear hint to hit Regenerate instead of staring at a silent empty bubble.

Test plan
`tests/services/session/test_regenerate.py` (18 tests):

- `_extract_regenerate_flag` parsing.
- Store helpers: `delete_message` removes only the targeted row; `get_last_message` with/without role filter and on empty sessions.
- Runtime regeneration paths through `start_turn`:
  - tail is `assistant` → assistant deleted, payload replays user content / capability / tools / kb / language / attachments;
  - tail is `user` → no deletion, new turn proceeds;
  - empty or missing session → `nothing_to_regenerate`;
  - active turn already running → `regenerate_busy`;
  - override precedence.
- End-to-end flow: the user row is preserved, the assistant is replaced, the new SESSION event carries `regenerate=true` and `regenerated_from_message_id`, and `refresh_from_turn` is not invoked for the regenerated turn.

All 18 + neighbouring tests in `tests/services/session/`, `tests/api/test_unified_ws_turn_runtime.py`, `tests/cli/test_chat_cli.py`, `tests/services/test_app_facade.py` pass locally.

Manual smoke
- CLI: `deeptutor chat`, send a message, then `/regenerate` → re-runs the previous prompt with the same preferences; `/retry` is an alias.
- WebSocket: `{ "type": "regenerate", "session_id": "..." }` → `subscribe_turn` for the new turn id; `regenerate_busy` / `nothing_to_regenerate` returned as `error` event with `metadata.reason`.

Notes
- `unified_ws.py` module docstring updated to document the new CLI command and WS message type.
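For readers wiring up a client, the WebSocket exchange described in the Summary and Manual smoke sections might be handled along these lines. This is only a sketch: the helper names `build_regenerate_message` and `regenerate_rejection_reason` are hypothetical, and the assumption that server events carry a top-level `type` key next to `metadata.reason` is inferred from the description rather than taken from `unified_ws.py`. The message fields (`type`, `session_id`, optional `overrides`) and the two reason codes are the ones named in this PR.

```python
import json
from typing import Any, Optional

# The two non-fatal rejection codes named in this PR.
NON_FATAL_REASONS = {"regenerate_busy", "nothing_to_regenerate"}


def build_regenerate_message(
    session_id: str, overrides: Optional[dict[str, Any]] = None
) -> str:
    """Serialize a `regenerate` request; `overrides` is omitted when empty."""
    message: dict[str, Any] = {"type": "regenerate", "session_id": session_id}
    if overrides:
        message["overrides"] = overrides
    return json.dumps(message)


def regenerate_rejection_reason(event: dict[str, Any]) -> Optional[str]:
    """Return the reason if `event` is a non-fatal regenerate rejection."""
    if event.get("type") != "error":  # assumed event discriminator
        return None
    reason = (event.get("metadata") or {}).get("reason")
    return reason if reason in NON_FATAL_REASONS else None
```

A client could call `regenerate_rejection_reason` on each incoming event and, when it returns a reason, restore the assistant message it removed optimistically instead of treating the error as the end of the session.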