fix: sandbox monitoring lifecycle bounds#287
Conversation
PR SummaryMedium Risk Overview Improves monitoring charts to better reflect lifecycle state: killed spans are treated as inactive gaps, charts can draw synthetic dashed connectors for active windows with missing metrics, and connectors now support an Refines live monitoring UX by gating “live” indicators/polling on recent metric timestamps (new Reviewed by Cursor Bugbot for commit 9ce28ec. Bugbot is set up for automated code reviews on this repo. Configure here. |
|
The latest updates on your projects. Learn more about Vercel for GitHub.
|
There was a problem hiding this comment.
Pull request overview
This PR tightens how sandbox lifecycle “inactive” periods are derived and displayed in monitoring, avoiding incorrect fallback timestamps and handling duplicate lifecycle events more defensively.
Changes:
- Stop using
endAtas a fallback source forpausedAtin both serversandbox.detailsand dashboard sandbox lifecycle state. - Update
deriveSandboxLifecycleFromEventsto ignore duplicate paused/killed events (until a resume) and to clear historicalendedAton resume. - Update monitoring chart lifecycle windowing to treat killed spans as inactive until the next resume, and merge overlapping inactive windows; add/adjust unit tests accordingly.
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 1 comment.
Show a summary per file
| File | Description |
|---|---|
| src/server/api/routers/sandbox.ts | Removes pausedAt fallback; relies on derived lifecycle events. |
| src/server/api/models/sandboxes.models.ts | Introduces lifecycle status tracking to bound paused/killed state transitions. |
| src/features/dashboard/sandbox/monitoring/utils/chart-lifecycle.ts | Reworks inactive window derivation (pause+kill) and merges overlaps. |
| src/features/dashboard/sandbox/context.tsx | Removes pausedAt fallback in client lifecycle state. |
| src/test/unit/sandbox-monitoring-chart-model.test.ts | Adds coverage for killed-as-inactive-until-resume behavior. |
| src/test/unit/sandbox-lifecycle.test.ts | Updates/adds tests around duplicates and resume-clearing behavior. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| if (event.type === SANDBOX_LIFECYCLE_RESUMED_EVENT) { | ||
| const pausedAtMs = parseEventTimestampMs(pausedAt) | ||
| if (pausedAtMs !== null && pausedAtMs <= timestampMs) { | ||
| if (pausedAtMs === null || pausedAtMs <= timestampMs) { | ||
| pausedAt = null | ||
| endedAt = null | ||
| status = 'running' |
There was a problem hiding this comment.
parseEventTimestampMs expects a string, but this code passes pausedAt which is typed as string | null. With strict: true, this won’t type-check. Guard pausedAt before parsing (e.g., treat null as null ms) or widen parseEventTimestampMs to accept string | null | undefined and handle it consistently.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 6ed77fb549
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| if (status !== 'killed') { | ||
| endedAt = event.timestamp | ||
| pausedAt = null | ||
| status = 'killed' |
There was a problem hiding this comment.
Keep endedAt updated for later killed events
The new status !== 'killed' guard suppresses all subsequent sandbox.lifecycle.killed events until a resumed appears. If the lifecycle stream is incomplete (which can happen because event pagination stops on fetch errors), a newer kill can be present while its matching resume is missing, and endedAt remains stuck on an older timestamp. That produces incorrect stop times and lifecycle bounds for killed sandboxes.
Useful? React with 👍 / 👎.
| if (inactiveStartMs === null) { | ||
| inactiveStartMs = eventTimestampMs | ||
| } |
There was a problem hiding this comment.
Advance inactive window on newer kill events
This logic pins inactiveStartMs to the first paused/killed event and ignores later kills unless a resume is observed. When lifecycle events are partial, that can null-out chart data from an old kill all the way to rangeEnd, even if later metrics belong to a newer execution. The inactive start should be revised for newer kill boundaries (or disambiguated by execution id) rather than remaining fixed.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
LGTM — clean state-machine fix with good test coverage.
Extended reasoning...
Overview
This PR fixes sandbox monitoring lifecycle bounds across three layers: the event-derived lifecycle model (), the monitoring chart inactive-window builder (), and the context/router that assembles the final . A helper is added and the duplicated logic is correctly removed from both the router and context.
Security Risks
None. The changes are purely client-side display logic and server-side data mapping with no auth, crypto, or permission-related code touched.
Level of Scrutiny
Moderate. The logic is non-trivial (a state machine with deduplication semantics), but the changes are self-contained and well-covered by new unit tests. The test cases exercise the key edge cases: duplicate paused events, duplicate killed events, and kill-then-resume sequences. The state machine is correct: clears ; a subsequent with still resets and , which is the intended behavior for a kill-then-resume flow.
Other Factors
No prior reviews from me. The Vercel build failure appears to be in the project (not ), and the preview build for is still pending at timeline snapshot time — this looks like a pre-existing infra issue unrelated to the PR logic. No bugs were found by the bug hunting system.
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.
Reviewed by Cursor Bugbot for commit 8840f8a. Configure here.
matthewlouisbrockman
left a comment
There was a problem hiding this comment.
might lose some indicator on the start -> first metric but can do that in another pr

No description provided.