Skip to content

fix: sandbox monitoring lifecycle bounds#287

Merged
ben-fornefeld merged 16 commits intomainfrom
fix/sandbox-monitoring-paused-lifecycle
Apr 6, 2026
Merged

fix: sandbox monitoring lifecycle bounds#287
ben-fornefeld merged 16 commits intomainfrom
fix/sandbox-monitoring-paused-lifecycle

Conversation

@ben-fornefeld
Copy link
Copy Markdown
Member

No description provided.

Copilot AI review requested due to automatic review settings April 4, 2026 00:16
@cursor
Copy link
Copy Markdown

cursor bot commented Apr 4, 2026

PR Summary

Medium Risk
Updates lifecycle derivation and monitoring chart rendering/polling behavior; regressions could cause incorrect time bounds, gaps, or live indicators in the dashboard charts.

Overview
Fixes sandbox lifecycle bounding by deriving pausedAt/endedAt solely from the last lifecycle event (paused/killed), avoiding stale constraints after subsequent resume/update events.

Improves monitoring charts to better reflect lifecycle state: killed spans are treated as inactive gaps, charts can draw synthetic dashed connectors for active windows with missing metrics, and connectors now support an isSynthetic style.

Refines live monitoring UX by gating “live” indicators/polling on recent metric timestamps (new SANDBOX_MONITORING_LIVE_DATA_THRESHOLD_MS) and by rendering charts from a stable snapshot (with initial loading state) to reduce flicker during refetches; adds/extends unit tests to cover these cases.

Reviewed by Cursor Bugbot for commit 9ce28ec. Bugbot is set up for automated code reviews on this repo. Configure here.

@vercel
Copy link
Copy Markdown

vercel bot commented Apr 4, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
web Ready Ready Preview, Comment Apr 6, 2026 5:37pm
web-juliett Ready Ready Preview, Comment Apr 6, 2026 5:37pm

Request Review

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR tightens how sandbox lifecycle “inactive” periods are derived and displayed in monitoring, avoiding incorrect fallback timestamps and handling duplicate lifecycle events more defensively.

Changes:

  • Stop using endAt as a fallback source for pausedAt in both server sandbox.details and dashboard sandbox lifecycle state.
  • Update deriveSandboxLifecycleFromEvents to ignore duplicate paused/killed events (until a resume) and to clear historical endedAt on resume.
  • Update monitoring chart lifecycle windowing to treat killed spans as inactive until the next resume, and merge overlapping inactive windows; add/adjust unit tests accordingly.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
src/server/api/routers/sandbox.ts Removes pausedAt fallback; relies on derived lifecycle events.
src/server/api/models/sandboxes.models.ts Introduces lifecycle status tracking to bound paused/killed state transitions.
src/features/dashboard/sandbox/monitoring/utils/chart-lifecycle.ts Reworks inactive window derivation (pause+kill) and merges overlaps.
src/features/dashboard/sandbox/context.tsx Removes pausedAt fallback in client lifecycle state.
src/test/unit/sandbox-monitoring-chart-model.test.ts Adds coverage for killed-as-inactive-until-resume behavior.
src/test/unit/sandbox-lifecycle.test.ts Updates/adds tests around duplicates and resume-clearing behavior.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +126 to +131
if (event.type === SANDBOX_LIFECYCLE_RESUMED_EVENT) {
const pausedAtMs = parseEventTimestampMs(pausedAt)
if (pausedAtMs !== null && pausedAtMs <= timestampMs) {
if (pausedAtMs === null || pausedAtMs <= timestampMs) {
pausedAt = null
endedAt = null
status = 'running'
Copy link

Copilot AI Apr 4, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

parseEventTimestampMs expects a string, but this code passes pausedAt which is typed as string | null. With strict: true, this won’t type-check. Guard pausedAt before parsing (e.g., treat null as null ms) or widen parseEventTimestampMs to accept string | null | undefined and handle it consistently.

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 6ed77fb549

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment on lines +137 to +140
if (status !== 'killed') {
endedAt = event.timestamp
pausedAt = null
status = 'killed'
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Keep endedAt updated for later killed events

The new status !== 'killed' guard suppresses all subsequent sandbox.lifecycle.killed events until a resumed appears. If the lifecycle stream is incomplete (which can happen because event pagination stops on fetch errors), a newer kill can be present while its matching resume is missing, and endedAt remains stuck on an older timestamp. That produces incorrect stop times and lifecycle bounds for killed sandboxes.

Useful? React with 👍 / 👎.

Comment on lines +169 to +171
if (inactiveStartMs === null) {
inactiveStartMs = eventTimestampMs
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Advance inactive window on newer kill events

This logic pins inactiveStartMs to the first paused/killed event and ignores later kills unless a resume is observed. When lifecycle events are partial, that can null-out chart data from an old kill all the way to rangeEnd, even if later metrics belong to a newer execution. The inactive start should be revised for newer kill boundaries (or disambiguated by execution id) rather than remaining fixed.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown

@claude claude bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM — clean state-machine fix with good test coverage.

Extended reasoning...

Overview

This PR fixes sandbox monitoring lifecycle bounds across three layers: the event-derived lifecycle model (), the monitoring chart inactive-window builder (), and the context/router that assembles the final . A helper is added and the duplicated logic is correctly removed from both the router and context.

Security Risks

None. The changes are purely client-side display logic and server-side data mapping with no auth, crypto, or permission-related code touched.

Level of Scrutiny

Moderate. The logic is non-trivial (a state machine with deduplication semantics), but the changes are self-contained and well-covered by new unit tests. The test cases exercise the key edge cases: duplicate paused events, duplicate killed events, and kill-then-resume sequences. The state machine is correct: clears ; a subsequent with still resets and , which is the intended behavior for a kill-then-resume flow.

Other Factors

No prior reviews from me. The Vercel build failure appears to be in the project (not ), and the preview build for is still pending at timeline snapshot time — this looks like a pre-existing infra issue unrelated to the PR logic. No bugs were found by the bug hunting system.

Comment thread src/features/dashboard/sandbox/monitoring/utils/chart-lifecycle.ts Outdated
Copy link
Copy Markdown

@cursor cursor bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Fix All in Cursor

❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.

Reviewed by Cursor Bugbot for commit 8840f8a. Configure here.

Comment thread src/features/dashboard/sandbox/monitoring/utils/chart-lifecycle.ts
Copy link
Copy Markdown

@matthewlouisbrockman matthewlouisbrockman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

might lose some indicator on the start -> first metric but can do that in another pr

@ben-fornefeld ben-fornefeld merged commit c3fb2d5 into main Apr 6, 2026
11 checks passed
@ben-fornefeld ben-fornefeld deleted the fix/sandbox-monitoring-paused-lifecycle branch April 6, 2026 17:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants