🤖 feat: AI-generated sidebar status via small-model loop #3238
@codex review Please review this PR. Key areas:
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: f1c982f6c7
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
@codex review Addressed the partial.json feedback:
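The partial.json fix means the transcript the small model sees, and the hash used for dedup, includes the in-progress assistant message. This is a hypothetical reconstruction, not the PR's buildTrailingTranscript: TranscriptMessage, buildTrailingTranscriptSketch, and the 4-characters-per-token estimate standing in for TokenizerService are all assumptions. The 80-message, 8000-token, and 4000-character limits come from the PR description.

```typescript
import { createHash } from "node:crypto";

// Hypothetical message shape standing in for what getLastMessages(80) + readPartial return.
interface TranscriptMessage {
  role: "user" | "assistant";
  text: string;
}

const MAX_MESSAGES = 80;
const MAX_TOKENS = 8000;
const MAX_CHARS_PER_MESSAGE = 4000;

// Stand-in for TokenizerService: a rough 4-characters-per-token estimate (assumption).
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

function buildTrailingTranscriptSketch(
  history: TranscriptMessage[],
  partial: TranscriptMessage | null,
): { transcript: string; hash: string } {
  // Append the in-progress (partial) assistant message so the hash changes mid-stream.
  const recent = history.slice(-MAX_MESSAGES);
  const messages = partial ? [...recent, partial] : recent;
  // Cap each message before formatting so one huge tool output cannot dominate.
  const lines = messages.map((m) => `${m.role}: ${m.text.slice(0, MAX_CHARS_PER_MESSAGE)}`);
  // Trim the oldest entries until the joined transcript fits the token budget.
  while (lines.length > 0 && estimateTokens(lines.join("\n")) > MAX_TOKENS) {
    lines.shift();
  }
  const transcript = lines.join("\n");
  return { transcript, hash: createHash("sha256").update(transcript).digest("hex") };
}
```

Hashing the already-trimmed transcript means the dedup key reflects exactly what the model would be given.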
💡 Codex Review
Reviewed commit: b52db3d1d4
@codex review Addressed the fairness feedback:
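The fairness fix is a round-robin without an explicit queue: due workspaces are ordered by lastRanAt ascending before the single MAX_CONCURRENT=1 slot is filled. A minimal sketch, assuming a hypothetical WorkspaceSchedule shape and pickDueWorkspaces helper; the 30s/120s focused/unfocused cadences are the values from the PR description.

```typescript
// Hypothetical per-workspace scheduler state; names are illustrative.
interface WorkspaceSchedule {
  id: string;
  lastRanAt: number; // epoch ms of the last generation attempt (0 = never ran)
}

const FOCUSED_INTERVAL_MS = 30_000;
const UNFOCUSED_INTERVAL_MS = 120_000;
const MAX_CONCURRENT = 1;

function pickDueWorkspaces(
  workspaces: WorkspaceSchedule[],
  now: number,
  focused: boolean,
  inFlight: number,
): string[] {
  const interval = focused ? FOCUSED_INTERVAL_MS : UNFOCUSED_INTERVAL_MS;
  const slots = Math.max(0, MAX_CONCURRENT - inFlight);
  return workspaces
    .filter((w) => now - w.lastRanAt >= interval) // cadence gate
    .sort((a, b) => a.lastRanAt - b.lastRanAt) // least recently run first
    .slice(0, slots)
    .map((w) => w.id);
}
```

Because Array.prototype.sort is stable, workspaces that have never run keep their list order, and each run moves a workspace to the back of the line.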
💡 Codex Review
Reviewed commit: 265f907bf5
@codex review Fixed the persist-vs-hash ordering.
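Persist-then-hash means the dedup hash advances, and the activity snapshot is emitted, only after the disk write succeeds; a failed write leaves the hash alone so the next tick retries the same transcript. A sketch with hypothetical persist/emit callbacks standing in for extensionMetadata.setAiStatus and workspaceService.emitWorkspaceActivity:

```typescript
// Hypothetical minimal state; only the dedup hash matters for this ordering.
interface StatusState {
  lastInputHash: string | null;
}

async function persistThenHash(
  state: StatusState,
  inputHash: string,
  status: string,
  persist: (status: string) => Promise<void>,
  emit: (status: string) => void,
): Promise<boolean> {
  try {
    await persist(status); // disk errors surface here instead of being swallowed
  } catch {
    // Leave lastInputHash untouched so the next tick regenerates this transcript.
    return false;
  }
  emit(status); // emit only after the write has succeeded
  state.lastInputHash = inputHash;
  return true;
}
```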
💡 Codex Review
Reviewed commit: 3d2ac73d1c
@codex review Made
💡 Codex Review
Reviewed commit: 3b118b76b2
@codex review Two follow-ups:
💡 Codex Review
Reviewed commit: 2264113ab8
@codex review Addressed the lifecycle hazard:
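The lifecycle hazard is that an await inside runForWorkspace can resume after stop(). One way to express the guard, assuming a hypothetical StopAwareRunner class, is to re-check a stopped flag after every await boundary. The PR description names post-generation and post-persist checks; the pre-generation check mirrors the "pre-generator stop guard" mentioned later in the thread.

```typescript
class StopAwareRunner {
  private stopped = false;
  readonly writes: string[] = [];
  private readonly generate: () => Promise<string>;

  constructor(generate: () => Promise<string>) {
    this.generate = generate;
  }

  stop(): void {
    this.stopped = true;
  }

  async runForWorkspace(): Promise<void> {
    if (this.stopped) return; // pre-generation guard
    const status = await this.generate();
    if (this.stopped) return; // post-generation guard: stop() may have run during the await
    await this.persist(status);
    if (this.stopped) return; // post-persist guard: never emit after shutdown
    this.writes.push(`emit:${status}`);
  }

  private async persist(status: string): Promise<void> {
    this.writes.push(`persist:${status}`);
  }
}
```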
Codex Review: Didn't find any major issues. Breezy!
@codex review Refactor pass for simplicity / minimal LoC / maintenance burden. Net delta: -438 lines.
All previously resolved review concerns (round-robin fairness, persist-then-hash, partial-stream inclusion, lifecycle stop guard, transient status precedence) are preserved and still tested.
💡 Codex Review
Reviewed commit: 5a67d08ddf
@codex review Addressed P1:
💡 Codex Review
Reviewed commit: 5325acf7a2
@codex review Addressed P2: added a
Codex Review: Didn't find any major issues. More of your lovely PRs please.
@codex review Addressed three review threads:
Net diff: +83 / -163. All previously resolved Codex concerns (recency preservation, pre-generator stop guard, partial-stream inclusion, round-robin fairness, persist-then-hash, lifecycle stop guard) are preserved.
💡 Codex Review
Reviewed commit: 9b9aafe40b
@codex review Addressed P2: inverted the precedence in the active-workspace branch so Added a regression test that gives the workspace BOTH live aggregator todos AND a persisted activity todoStatus, asserting the persisted one wins.
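Combining the frontend precedence from the PR description (transientStatus ?? aiStatus ?? todoStatus) with this P2 fix, the resolution can be sketched as follows. SidebarStatusInputs and resolveSidebarStatus are hypothetical, and splitting todoStatus into persisted and live inputs is an assumption about how the active-workspace branch is shaped.

```typescript
// Illustrative inputs; field names follow the PR description, the shape is an assumption.
interface SidebarStatusInputs {
  transientStatus?: string; // displayStatus (inactive) or aggregator.getAgentStatus() (active)
  aiStatus?: string; // small-model status from extension metadata
  persistedTodoStatus?: string; // todoStatus from the persisted activity snapshot
  liveTodoStatus?: string; // todoStatus derived from live aggregator todos
}

function resolveSidebarStatus(s: SidebarStatusInputs): string | undefined {
  // Per the P2 fix, the persisted activity todoStatus beats live aggregator todos.
  const todoStatus = s.persistedTodoStatus ?? s.liveTodoStatus;
  // Explicit system-set messages first, then the small-model status, then TODOs.
  return s.transientStatus ?? s.aiStatus ?? todoStatus;
}
```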
Codex Review: Didn't find any major issues. You're on a roll.
@codex review Bug fix prompted by deployed behavior: most workspaces were showing the literal phrase "Awaiting next task" in the sidebar. Root cause was the small-model prompt itself, which both:
Reframed around "most recent activity", dropped the special idle branch, and explicitly forbade generic placeholders so the model names the concrete activity from the transcript instead.
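The fix here is in the prompt, but the settleOnTranscript refactor later in the thread also mentions a placeholder-rejection branch in runForWorkspace. A hypothetical post-check of that kind might look like this; the denylist contents and the isGenericPlaceholder name are illustrative assumptions, not the PR's actual rule.

```typescript
// Hypothetical denylist of generic phrases; the real rejection criteria may differ.
const GENERIC_PLACEHOLDERS = new Set([
  "awaiting next task",
  "idle",
  "waiting for input",
  "no current activity",
]);

function isGenericPlaceholder(status: string): boolean {
  // Normalize case, surrounding whitespace, and trailing punctuation before comparing.
  const normalized = status.trim().toLowerCase().replace(/[.!]+$/, "");
  return normalized.length === 0 || GENERIC_PLACEHOLDERS.has(normalized);
}
```

A prompt-level instruction reduces how often such phrases appear; a post-check like this only catches the ones that slip through.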
Codex Review: Didn't find any major issues. Already looking forward to the next diff.
💡 Codex Review
Reviewed commit: fc33fa5cbe
💡 Codex Review
Reviewed commit: 32a90729c5
@codex review Handled the startup/no-settled-hash variant too: a recent first recency bump now waits one scheduler tick instead of generating against potentially stale pre-pivot history; the still-unconsumed recency can then force the refresh once the user message is in history.
💡 Codex Review
Reviewed commit: fd88f3f4a7
@codex review Fixed the empty-transcript starvation path: non-racy empty transcripts now consume observed recency so they do not keep sorting ahead of populated workspaces under max-concurrency=1, while the recent first-recency guard still protects the sendMessage recency/history race.
💡 Codex Review
Reviewed commit: 40d38dd213
@codex review Fixed the pre-provider failure starvation path: config/auth failures now consume the observed recency priority while leaving
💡 Codex Review
Reviewed commit: bb21f74090
@codex review Addressed the pre-provider/history race by tracking the last transcript hash the scheduler examined separately from the settled dedup hash. Recent recency bumps now wait if history still shows the same transcript, including pre-provider retry states, while older/config-failure recency is still consumed to avoid starvation.
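Together with the startup-variant fix above, the scheduler keeps two hashes: the settled dedup hash and the last hash it examined. A recent recency bump waits while history still shows the examined transcript (or nothing has been examined yet); otherwise an unchanged transcript consumes the bump. A sketch of that decision, where RecencyState, decideTick, and the 10s guard length are assumptions; consuming recency after a successful generation is left to the caller.

```typescript
type TickDecision = "generate" | "skip" | "wait";

interface RecencyState {
  lastInputHash: string | null; // settled dedup hash
  lastExaminedHash: string | null; // last transcript hash the scheduler looked at
  observedRecency: number; // newest recency value already consumed
}

// Assumption: the history-catchup guard lasts about one 10s scheduler tick.
const HISTORY_CATCHUP_GUARD_MS = 10_000;

function decideTick(state: RecencyState, hash: string, recency: number, now: number): TickDecision {
  const bumped = recency > state.observedRecency;
  const recentBump = bumped && now - recency < HISTORY_CATCHUP_GUARD_MS;
  // Nothing examined yet (startup) or same transcript as last look: history may lag the new turn.
  const historyMayBeStale = state.lastExaminedHash === null || state.lastExaminedHash === hash;
  state.lastExaminedHash = hash;
  if (recentBump && historyMayBeStale) return "wait";
  if (hash === state.lastInputHash) {
    // Unchanged transcript and guard expired (or no bump): consume recency so the
    // workspace stops bypassing cadence and cannot starve others.
    state.observedRecency = Math.max(state.observedRecency, recency);
    return "skip";
  }
  return "generate";
}
```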
💡 Codex Review
Reviewed commit: 37a9dc9a72
@codex review Fixed stale dedup recency priority: once the recent history-catchup guard has expired, unchanged transcript dedup skips now consume observed recency so the workspace stops bypassing cadence and cannot starve other workspaces.
💡 Codex Review
Reviewed commit: be1bf725a0
@codex review Fixed startup behavior too: AgentStatusService.start() now kicks off one immediate runTick in addition to scheduling future interval ticks, matching the no-startup-delay comment while preserving max-concurrency=1.
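A minimal sketch of that start() shape: one immediate tick plus a recurring interval, with a re-entrancy flag so ticks never overlap. ImmediateTickLoop and its running flag are hypothetical; the PR description says the real service tracks in-flight work with inFlightPromises instead.

```typescript
class ImmediateTickLoop {
  private timer: ReturnType<typeof setInterval> | undefined;
  private running = false;
  ticks = 0;

  start(intervalMs: number): void {
    void this.runTick(); // immediate first tick, no startup delay
    this.timer = setInterval(() => void this.runTick(), intervalMs);
  }

  stop(): void {
    if (this.timer !== undefined) clearInterval(this.timer);
    this.timer = undefined;
  }

  private async runTick(): Promise<void> {
    if (this.running) return; // an earlier tick is still in flight: do not overlap
    this.running = true;
    try {
      this.ticks++;
      await Promise.resolve(); // placeholder for dispatching and awaiting due workspaces
    } finally {
      this.running = false;
    }
  }
}
```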
💡 Codex Review
Reviewed commit: 07e2ef0c88
@codex review Fixed in-flight stale writes: after generation returns, AgentStatusService now reloads the workspace snapshot and drops the result if recency advanced beyond the value observed at dispatch, so a slow provider response cannot persist/emit a pre-pivot status after a newer user turn.
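The stale-write check compares the workspace recency observed at dispatch with a fresh snapshot taken after the provider returns. A sketch with hypothetical loadSnapshot/generate callbacks and a WorkspaceSnapshot shape:

```typescript
// Hypothetical snapshot shape; only recency matters for this check.
interface WorkspaceSnapshot {
  recency: number;
}

async function generateUnlessSuperseded(
  loadSnapshot: () => Promise<WorkspaceSnapshot>,
  generate: () => Promise<string>,
): Promise<string | null> {
  const atDispatch = await loadSnapshot();
  const status = await generate(); // may be slow; a new user turn can land meanwhile
  const afterGeneration = await loadSnapshot(); // reload instead of trusting the old snapshot
  if (afterGeneration.recency > atDispatch.recency) {
    return null; // superseded: drop the pre-pivot status
  }
  return status;
}
```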
💡 Codex Review
Reviewed commit: c8f8d35d80
@codex review Fixed the atomicity issue: ExtensionMetadataService.setSidebarStatus now accepts a serialized-mutation guard (
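The guard's exact signature is not shown in this comment. One common way to get this kind of atomicity is to run the check and the write inside the same serialized mutation, so a recency bump cannot land between them. The sketch assumes a promise-chain queue and hypothetical names (SerializedStatusStore, setStatusIfCurrent); it is not ExtensionMetadataService's API.

```typescript
interface StatusRecord {
  status: string | null;
  recency: number;
}

class SerializedStatusStore {
  private queue: Promise<unknown> = Promise.resolve();
  private readonly records = new Map<string, StatusRecord>();

  // Run fn after every earlier mutation so read-check-write sequences never interleave.
  private serialize<T>(fn: () => T | Promise<T>): Promise<T> {
    const result = this.queue.then(fn);
    this.queue = result.catch(() => undefined); // a failed mutation must not wedge the queue
    return result;
  }

  bumpRecency(id: string, recency: number): Promise<void> {
    return this.serialize(() => {
      const rec = this.records.get(id) ?? { status: null, recency: 0 };
      this.records.set(id, { ...rec, recency });
    });
  }

  // The guard sees the record as it is inside the serialized section, not a stale copy.
  setStatusIfCurrent(id: string, status: string, guard: (current: StatusRecord) => boolean): Promise<boolean> {
    return this.serialize(() => {
      const rec = this.records.get(id) ?? { status: null, recency: 0 };
      if (!guard(rec)) return false;
      this.records.set(id, { ...rec, status });
      return true;
    });
  }

  get(id: string): StatusRecord | undefined {
    return this.records.get(id);
  }
}
```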
Codex Review: Didn't find any major issues. Swish!
The post-provider-failure / placeholder-rejection / successful-persist branches in runForWorkspace each ran the same two-line pair: markRecencyObserved() to consume any observed recency bump, then state.lastInputHash = inputHash to advance the dedup hash so the next tick won't regenerate against the same transcript. Hoist that pair into a settleOnTranscript() closure (next to the existing markRecencyObserved closure) so the three settlement branches read as a single named intent.
The pre-provider-failure and empty/dedup-hit branches keep using bare markRecencyObserved() because they should still retry against the same transcript when conditions change. This is captured in a comment on the new helper so a future reader doesn't reintroduce the duplicated pair.
Pure refactor: emitted JS, all six callsite branches, and the existing 25-test agentStatusService suite (including the post-provider/placeholder/success/pre-provider/pre-provider-retry/dedup regression cases added in #3238) are all unchanged.
🤖 Generated with mux
Co-Authored-By: Mux <noreply@coder.com>
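The refactor can be sketched as two closures over per-workspace state. The Outcome union, RunState shape, and switch-based dispatch are illustrative assumptions; in runForWorkspace the calls sit in separate branches rather than one switch.

```typescript
interface RunState {
  lastInputHash: string | null; // dedup hash: transcript already settled
  observedRecency: number; // newest recency bump already consumed
}

type Outcome =
  | "post-provider-failure"
  | "placeholder-rejected"
  | "persisted"
  | "pre-provider-failure"
  | "empty"
  | "dedup-hit";

function settleAfterOutcome(state: RunState, inputHash: string, recency: number, outcome: Outcome): void {
  const markRecencyObserved = (): void => {
    state.observedRecency = Math.max(state.observedRecency, recency);
  };
  // Settled outcomes: consume the bump AND advance the dedup hash so the same
  // transcript is not regenerated on the next tick.
  const settleOnTranscript = (): void => {
    markRecencyObserved();
    state.lastInputHash = inputHash;
  };

  switch (outcome) {
    case "post-provider-failure":
    case "placeholder-rejected":
    case "persisted":
      settleOnTranscript();
      break;
    default:
      // Pre-provider failures, empty transcripts, and dedup hits only consume the
      // bump; they should retry against the same transcript when conditions change.
      markRecencyObserved();
  }
}
```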
Summary
Adds an AgentStatusService that produces the sidebar agent-status text using the same "small model" path as workspace title generation (NAME_GEN_PREFERRED_MODELS: Claude Haiku → GPT-mini), instead of inferring it from the workspace's TODO list. The status is regenerated on a focus-aware cadence, skips when the trailing transcript hash is unchanged, and is fed at most ~8k trailing transcript tokens.
Background
Today the sidebar's per-workspace status string is derived from deriveTodoStatus, which picks the first in-progress / pending / last-completed TODO. That's a noisy proxy for "what is the agent doing right now":
- The TODO list only changes on todo_write, so a long bash run or a multi-step assistant turn shows a stale TODO.
The user asked for the status to come from a small LLM call instead, with sensible cost controls (token cap, cadence, dedup, focus gating).
Implementation
Constants (src/constants/agentStatus.ts)
- Focused cadence 30s, unfocused cadence 120s, scheduler tick 10s, startup delay 30s.
- Trailing transcript: ≤80 messages, ≤8000 tokens, with a per-message char cap of 4000.
- At most 1 global generation in flight.
Small-model path (src/node/services/workspaceStatusGenerator.ts)
- Follows the same structure as workspaceTitleGenerator.ts.
- Uses streamText + a new propose_status tool (added to toolDefinitions.ts); reuses mapNameGenerationError / mapModelCreationError so all the existing 401/402/403/429/5xx classifications apply.
- The prompt wraps the transcript in <transcript> markers and instructs the model to focus on current activity rather than overall task scope.
- propose_status is excluded from default tools.ts registration with the same comment as propose_name.
Scheduler (src/node/services/agentStatusService.ts)
- One setInterval ticking every 10s. Per-workspace lastRanAt + the current focused/unfocused interval gate the actual generation.
- getLastMessages(80) + readPartial → format → trim oldest entries until under the token budget (using TokenizerService with a fallback model constant), then SHA-256 the transcript.
- Workspaces are sorted by lastRanAt ascending so MAX_CONCURRENT=1 doesn't starve workspaces deeper in the list (round-robin without an explicit queue).
- runTick awaits all dispatched workspaces (via tracked inFlightPromises) instead of fire-and-forget, so production schedulers and tests can observe completion deterministically.
- Two if (this.stopped) return guards (post-generation, post-persist) ensure a stop() mid-flight never leaks writes past the service's lifecycle.
Focus tracking (src/node/services/windowService.ts)
- WindowService.isFocused() delegates to BrowserWindow.isFocused() directly. It falls back to true in non-desktop contexts (CLI server, tests) so backend services don't throttle themselves to "unfocused" cadence when there is no window.
Schema + persistence
- New aiStatus field on ExtensionMetadata + WorkspaceActivitySnapshotSchema.
- The service calls extensionMetadata.setAiStatus directly (bypassing the historical emitWorkspaceActivityUpdate wrapper that would have swallowed disk errors) and emits the activity snapshot via workspaceService.emitWorkspaceActivity only after the disk write resolves successfully.
- ExtensionMetadataService.save() re-throws on failure to support this; initialize() keeps its startup-safety semantics by wrapping its mutation calls in a logged try/catch per AGENTS.md.
Frontend (src/browser/stores/WorkspaceStore.ts)
- Precedence: transientStatus ?? aiStatus ?? todoStatus.
- transientStatus collapses displayStatus (inactive workspaces) and aggregator.getAgentStatus() (active workspaces) so explicit system-set messages always beat aiStatus regardless of which branch produces them.
- applyWorkspaceActivitySnapshot diffs aiStatus so a fresh small-model status triggers a re-render.
Validation
- workspaceStatusGenerator.test.ts: prompt structure (transcript delimiters, empty-transcript sentinel) + empty-candidates short-circuit.
- agentStatusService.test.ts: generates on first tick, dedups when transcript is unchanged, regenerates after history changes, includes the partial assistant message mid-stream, skips empty workspaces, focused-vs-unfocused cadence boundaries, round-robin fairness across multiple workspaces, archived workspaces are skipped, stops cleanly mid-generation, persist failures don't advance the dedup hash.
- WorkspaceStore.test.ts: aiStatus beats todoStatus; displayStatus still beats aiStatus.
- make static-check clean (eslint, typecheck, prettier, doc-link check).
- make test shows only pre-existing flaky-timing failures present on main (CLI auth + GitStatusStore subset). My new files pass cleanly.
Risks
- mapNameGenerationError retry budget capped at 3 candidates.
Codex review history (resolved)
- Partial-stream inclusion: the transcript was built only from chat.jsonl. Fixed by appending readPartial so the hash refreshes mid-stream.
- Round-robin fairness: with MAX_CONCURRENT=1 and fixed iteration order, deep workspaces would starve. Fixed with lastRanAt ascending sort.
- Persist-then-hash: updateAiStatus went through a wrapper that silently swallowed disk errors. Fixed by calling setAiStatus directly and updating the hash only on success.
- save() swallowing errors: ExtensionMetadataService.save() itself swallowed errors. Fixed by re-throwing; initialize() wraps its calls in try/catch to preserve startup safety.
- Transient status precedence: aggregator.getAgentStatus() was being buried under aiStatus for active workspaces. Fixed with the transientStatus tier.
- Lifecycle stop guard: runForWorkspace could persist/emit after stop(). Fixed with two this.stopped checks at await boundaries.
Simplification refactor
After the original implementation landed and Codex approved, the diff was simplified for minimal LoC / maintenance burden (8 files, +232 / -670 lines):
- The persisted aiStatusInputHash round-trip turned out to add ~50 LoC of complexity (hydration sweep, getter method, conditional-spread coercion) to save roughly one extra LLM call per workspace per app launch. Removed the field from the type, the coercion, the setter signature, and getAiStatusInputHash entirely; in-session dedup still works.
- WindowService: dropped EventEmitter inheritance, setFocused, the focused field, focus/blur listeners, and the test-stub defensive check. isFocused() delegates to BrowserWindow.isFocused() directly.
- AgentStatusService: dropped __test__ exports, inFlightCount (use inFlightPromises.size), the markInFlight helper, the drainInFlight helper, hashesHydrated + hydratePersistedHashes, and resolveTokenizerModel. Merged tick() + runTick(). Used the FALLBACK_TOKENIZER_MODEL constant in buildTrailingTranscript so we don't double-call getWorkspaceTitleModelCandidates per workspace.
Generated with mux • Model: anthropic:claude-opus-4-7 • Thinking: max • Cost: $58.06