
feat: agents desktop recordings backend#23894

Merged
hugodutka merged 1 commit into main from hugodutka/desktop-recordings-backend
Apr 2, 2026

Conversation

@hugodutka hugodutka commented Apr 1, 2026

This PR introduces screen recording of the computer use agent using the virtual desktop.

  • Screen recording is triggered by a wait_agent tool call. Recording stops after a successful wait_agent tool call or when there has been no desktop activity for 10 minutes.
  • Recordings are handled by the portabledesktop CLI via the record command. The videos are sped up during periods of inactivity.
  • Recordings are saved to the chat_files table in the database. There's a hard limit of 100 MB per recording; larger recordings are dropped.
  • A successful wait_agent tool call on a computer use subagent returns a recording_file_id, which later allows the frontend to display the corresponding video.

@github-actions github-actions bot added the community Pull Requests and issues created by the community. label Apr 1, 2026
@hugodutka hugodutka force-pushed the hugodutka/desktop-recordings-backend branch from 0116573 to db2414f Compare April 1, 2026 12:12
@hugodutka hugodutka marked this pull request as ready for review April 1, 2026 12:12
// MaxRecordingSize is the largest desktop recording (in bytes)
// that will be accepted. Used by both the agent-side stop handler
// and the server-side storage pipeline.
const MaxRecordingSize = 100 << 20 // 100 MB
Member

How large are these? Do they ever actually reach 100 MB?

Contributor Author

In my manual testing it's ~150 kB/second, so you'd need a 10-12 minute recording to get to 100 MB. These videos are massively sped up though, so I think the agent would need to work for hours and hours to actually produce a 100 MB video.

Member

@mafredri mafredri left a comment

Solid approach to the recording lifecycle. The three-state idempotency in StartRecording, stopOnce coordination, and the idle monitor's timer-reset-with-remaining-budget are well-engineered. The decision to leave recordings running on timeout for seamless continuation is a good design choice.

Severity count: 1 P1, 6 P2, 4 P3. No P0s.

The P1 is an authorization ordering issue: recordings start on a target agent before verifying the caller owns the target chat. The P2s cluster around two themes: (1) the stop/store pipeline sharing timeouts and buffering 100MB in memory, and (2) the API mutex held during blocking Desktop calls. All have straightforward fixes.

As Killua put it: "The decision to leave recordings running across wait_agent timeouts and resume seamlessly on the next call is well thought out. Huh, that's clean."

🤖 This review was automatically generated with Coder Agents.

return
}

a.mu.Lock()
Member

P2 handleRecordingStart holds a.mu for the entire handler, including blocking Desktop calls. (Meruem P2, Killua P2, Mafuuu P2, Kurapika P3)

a.mu.Lock() at line 566 with defer. While held, calls a.desktop.Start(ctx) (line 576, process spawn on cold desktop) and a.desktop.StartRecording(ctx, recordingID) (line 584, process spawn). Meanwhile, every handleAction call blocks at line 143 waiting to update lastDesktopActionAt.

On a warm desktop the lock hold time is <10ms. But if the desktop process died and needs respawning, Start can block for hundreds of milliseconds to seconds. During that time, every mouse/keyboard/screenshot action from the computer-use agent stalls.

Fix: acquire a.mu only for the a.closed check and idle monitor map insert. Call desktop.Start and desktop.StartRecording outside the lock. The Desktop has its own lock.

a.mu.Lock()
if a.closed { a.mu.Unlock(); /* return error */ }
a.mu.Unlock()

a.desktop.Start(ctx)
a.desktop.StartRecording(ctx, recordingID)

a.mu.Lock()
// start idle monitor
a.mu.Unlock()

🤖

Contributor Author

This is intentional. Holding the mutex for the entire duration of handleRecordingStart makes it easier to reason about whether the code is concurrency-safe. In a previous iteration the agent was being clever with mutex usage, and that led to subtle race conditions. Won't change.

Member

I would be more comfortable changing this behavior tbh. This can have repercussions all the way to shutting down the workspace. By keeping this server blocked from shutting down there's a chance we prevent proper workspace shutdown as well.

What are we actually guarding against here? Multiple Starts? An implementation that handles that could be fairly simple; in addition to what the agent suggested above, something like:

a.mu.Lock()
if a.closed {
	a.mu.Unlock()
	return
}
if ch := a.starting; ch != nil {
	a.mu.Unlock()
	<-ch
	// do something?, a.mu.Lock(), return, etc.
} else {
	starting := make(chan struct{})
	defer close(starting)
	a.starting = starting
}
a.mu.Unlock()

// start...

This is just off the top of my head without fully understanding the goal of this design.

We should always be very careful about locking interactions with Close, it should be possible to Close whenever without waiting for long-running operations.

Contributor Author

@hugodutka hugodutka Apr 2, 2026

What are we actually guarding against here?

We're guarding against concurrent StartRecording and StopRecording calls. Holding a mutex for the entire duration of these functions makes it simple to reason about whether their interactions are safe.

It's true that this blocks all other desktop operations including Close, but I believe that the simplicity of the code is worth the tradeoff. Is it such a big deal if, in the worst-case (quite rare) scenario where the workspace agent is being shut down at the exact same time as a desktop recording is being started, Close is delayed by a second?

If we implemented your suggestion, I'm also unsure how Close should work for processes in the starting state. Would it just leave them hanging?

@jakehwll jakehwll removed the community Pull Requests and issues created by the community. label Apr 1, 2026
@hugodutka hugodutka force-pushed the hugodutka/desktop-recordings-backend branch from db2414f to 5e16133 Compare April 1, 2026 14:29
Member

@mafredri mafredri left a comment

Round 2. 9 of 11 R1 findings resolved. The P1 (auth ordering) fix is correct: isSubagentDescendant now runs before any agent connection or recording start. The semaphore (cap 25) and 90s timeout address the memory and timing concerns.

Two R1 findings were contested (mutex scope, temp file lifecycle). Both defenses are reasonable given the current usage pattern.

Three new findings this round, all minor. The most notable gap is the portableDesktop recording logic (200+ lines) having zero unit tests at the implementation layer. The rest is cosmetic.

Severity count: 0 P0, 0 P1, 1 P2, 2 P3.

As Bisky noted about the idle timeout test: "TestRecordingIdleTimeoutResetsOnAction is especially well-crafted: quartz traps, precise clock advancement, verified both the 'reset' and 'eventual fire' branches. These are real gems."

🤖 This review was automatically generated with Coder Agents.

@hugodutka hugodutka force-pushed the hugodutka/desktop-recordings-backend branch from 5e16133 to 51d8d08 Compare April 1, 2026 15:34
Member

@mafredri mafredri left a comment

Cool feature 👍🏻. I do have some concerns about the agentdesktop API mutex design though for two reasons:

  1. They may block workspace agent shutdown
  2. They're called from chatd and may block there as well

The portabledesktop implementation has similar concerns, but I did not go through those as thoroughly.

I think the mutex design can easily be made more safe.


return
}

if err := a.desktop.StartRecording(ctx, recordingID); err != nil {
Member

What if we clobber the recordingID here? E.g. an intermittent network failure: the first request went through, then a retried second one also goes through.

Shouldn't we have a check earlier before we retry Start + StartRecording?

Contributor Author

What if we clobber the recordingID here?

What do you mean? I'm not sure if this answers your question, but repeated calls in short succession to StartRecording will be idempotent.

return
}

if info.Size() > workspacesdk.MaxRecordingSize {
Member

Do we actually have any protection that stops recording once we hit max size? Should we consider returning a truncated video?

Contributor Author

We don't. I'd consider this a possible future improvement. The portabledesktop cli would need support for truncation.

@hugodutka hugodutka force-pushed the hugodutka/desktop-recordings-backend branch 8 times, most recently from e172782 to e6fae35 Compare April 2, 2026 15:23
Member

@mafredri mafredri left a comment

Round 5, post-refactor. The restructuring is a significant improvement: recording management consolidated in portableDesktop under a single mutex, the API layer is thin, and the RecordingArtifact abstraction cleanly separates file ownership from the recording lifecycle. Most of the human review findings from R4 are resolved.

Severity count: 0 P0, 0 P1, 5 P2, 3 P3.

The most interesting new finding: on Windows, interruptRecordingProcess calls Kill() but the caller never marks rec.killed = true, so every graceful stop silently serves a corrupt MP4 as valid. Currently dormant (X11 required), but structurally wrong. As Hisoka put it: "The code presents one face (interruptRecordingProcess means 'graceful') and hides another (Kill() means 'destroyed'). What happens when someone notices the videos don't play?"

Gon identified a naming collision worth fixing: RecordActivity() in a recording package reads as "start recording the activity" rather than "note that activity happened."

A few Gon observations not posted inline: the 15s/30s/90s timeout chain across three files forms an undocumented dependency. If any timeout is changed independently, the chain breaks silently.

🤖 This review was automatically generated with Coder Agents.

lastAction := time.Unix(0, lastNano)
elapsed := p.clock.Since(lastAction)
if elapsed >= idleTimeout {
p.mu.Lock()
Member

P2 Idle monitor holds p.mu for up to 15 seconds during graceful stop, blocking Close() and delaying workspace shutdown. (Mafuuu P2)

monitorRecordingIdle (line 736) acquires p.mu, then calls lockedStopRecordingProcess(ctx, rec, false). The false flag means SIGINT followed by a 15-second timer wait. p.mu is held for the entire duration. Close() needs p.mu.Lock() to begin cleanup. If the idle monitor is mid-stop, Close is blocked for up to 15 seconds.

The prior defense (#comment-3028467808) argued "delayed by a second" was acceptable. That was about Start. This is a 15-second graceful SIGINT stop triggered by a background goroutine. The idle monitor knows nobody is actively waiting for a clean artifact, so force=true (or releasing mu before the wait) would avoid blocking Close.

🤖

Contributor Author

Waiting on these processes is essential to create valid recordings. If we used force=true, recordings would get corrupted. It's possible that an idle recording will be read by a future tool call.

@hugodutka hugodutka force-pushed the hugodutka/desktop-recordings-backend branch 2 times, most recently from 4bf9386 to 31df469 Compare April 2, 2026 16:09
Member

@mafredri mafredri left a comment

Thanks for the refactor, looks a lot better 👍🏻. I flagged a few more/new things but don't foresee that fixing them should cause any issues.

}
}
return active
}
Member

This works, so consider this minor bike-shedding (optional), but each active recording could instead increment and decrement a counter (e.g. via atomic.Add* or atomic.Int64) rather than scanning the map.

OrganizationID: ws.OrganizationID,
Name: fmt.Sprintf("recording-%s.mp4", p.clock.Now().UTC().Format("2006-01-02T15-04-05Z")),
Mimetype: "video/mp4",
Data: data,
Member

Wish we would switch from lib/pq to pgx so we could stream the data rather than allocating it upfront. 🥲

@hugodutka hugodutka force-pushed the hugodutka/desktop-recordings-backend branch 2 times, most recently from bbe92eb to 96af52e Compare April 2, 2026 16:40
@hugodutka hugodutka enabled auto-merge (squash) April 2, 2026 16:42
@hugodutka hugodutka force-pushed the hugodutka/desktop-recordings-backend branch from 96af52e to 88fcec6 Compare April 2, 2026 16:59
@hugodutka hugodutka merged commit 17dec2a into main Apr 2, 2026
25 of 26 checks passed
@hugodutka hugodutka deleted the hugodutka/desktop-recordings-backend branch April 2, 2026 17:23
@github-actions github-actions bot locked and limited conversation to collaborators Apr 2, 2026