feat: agents desktop recordings backend #23894
Conversation
Force-pushed 0116573 to db2414f
// MaxRecordingSize is the largest desktop recording (in bytes)
// that will be accepted. Used by both the agent-side stop handler
// and the server-side storage pipeline.
const MaxRecordingSize = 100 << 20 // 100 MB
How large are these? Do they ever actually reach 100 MB?
in my manual testing it's ~150 kB/second, so you'd need a 10-12 minute recording to get to 100 MB. These videos are massively sped up though, so I think the agent would need to work for hours and hours to actually produce a 100 MB video.
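A quick sanity check of that estimate (the constant and helper names here are invented for this sketch, not from the PR):

```go
package main

import "fmt"

// maxRecordingSize matches the 100 MB cap in the diff above.
const maxRecordingSize = 100 << 20

// minutesToCap returns how many minutes of recording fit under the cap
// at the given byte rate.
func minutesToCap(bytesPerSecond float64) float64 {
	return float64(maxRecordingSize) / bytesPerSecond / 60
}

func main() {
	// ~150 kB/s was observed in manual testing per the comment above.
	fmt.Printf("%.1f minutes\n", minutesToCap(150*1024))
}
```

At ~150 kB/s the cap is reached after roughly 11.4 minutes of raw output, consistent with the 10-12 minute estimate.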
mafredri
left a comment
Solid approach to the recording lifecycle. The three-state idempotency in StartRecording, stopOnce coordination, and the idle monitor's timer-reset-with-remaining-budget are well-engineered. The decision to leave recordings running on timeout for seamless continuation is a good design choice.
Severity count: 1 P1, 6 P2, 4 P3. No P0s.
The P1 is an authorization ordering issue: recordings start on a target agent before verifying the caller owns the target chat. The P2s cluster around two themes: (1) the stop/store pipeline sharing timeouts and buffering 100MB in memory, and (2) the API mutex held during blocking Desktop calls. All have straightforward fixes.
As Killua put it: "The decision to leave recordings running across wait_agent timeouts and resume seamlessly on the next call is well thought out. Huh, that's clean."
🤖 This review was automatically generated with Coder Agents.
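The three-state idempotency the summary praises isn't shown in the thread; as a rough illustration only, a minimal sketch of the idea (the Recorder type and state names are invented here, not the PR's actual code):

```go
package main

import (
	"fmt"
	"sync"
)

// recState models a three-state start lifecycle: repeated StartRecording
// calls are no-ops while a recording is starting or active.
type recState int

const (
	stateIdle recState = iota
	stateStarting
	stateActive
)

type Recorder struct {
	mu    sync.Mutex
	state recState
	id    string
}

// StartRecording starts a new recording only from the idle state and
// otherwise returns the in-flight recording's ID, making retries safe.
func (r *Recorder) StartRecording(id string) (string, error) {
	r.mu.Lock()
	defer r.mu.Unlock()
	if r.state != stateIdle {
		return r.id, nil // idempotent: reuse the existing recording
	}
	r.state = stateActive // a real implementation would pass through stateStarting
	r.id = id
	return id, nil
}

func main() {
	r := &Recorder{}
	first, _ := r.StartRecording("rec-1")
	second, _ := r.StartRecording("rec-2") // ignored: already active
	fmt.Println(first, second)
}
```

This mirrors the later reply that repeated StartRecording calls in short succession are idempotent.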
agent/x/agentdesktop/api.go
Outdated
	return
}

a.mu.Lock()
P2 handleRecordingStart holds a.mu for the entire handler, including blocking Desktop calls. (Meruem P2, Killua P2, Mafuuu P2, Kurapika P3)
a.mu.Lock() at line 566 with defer. While held, it calls a.desktop.Start(ctx) (line 576, process spawn on a cold desktop) and a.desktop.StartRecording(ctx, recordingID) (line 584, process spawn). Meanwhile, every handleAction call blocks at line 143 waiting to update lastDesktopActionAt.
On a warm desktop the lock hold time is <10ms. But if the desktop process died and needs respawning, Start can block for hundreds of milliseconds to seconds. During that time, every mouse/keyboard/screenshot action from the computer-use agent stalls.
Fix: acquire a.mu only for the a.closed check and idle monitor map insert. Call desktop.Start and desktop.StartRecording outside the lock. The Desktop has its own lock.
a.mu.Lock()
if a.closed {
	a.mu.Unlock()
	// return error
}
a.mu.Unlock()
a.desktop.Start(ctx)
a.desktop.StartRecording(ctx, recordingID)
a.mu.Lock()
// start idle monitor
a.mu.Unlock()

🤖
This is intentional. Holding the mutex for the entire duration of handleRecordingStart makes it easier to reason whether the code is concurrency-safe. In a previous iteration the agent was being clever with mutex usage and that led to subtle race conditions. Won't change.
I would be more comfortable changing this behavior tbh. This can have repercussions all the way to shutting down the workspace. By keeping this server blocked from shutting down there's a chance we prevent proper workspace shutdown as well.
What are we actually guarding against here? Multiple Starts? An implementation that handles that could be fairly simple; in addition to what the agent suggested above, something like:
a.mu.Lock()
if a.closed {
a.mu.Unlock()
return
}
if ch := a.starting; ch != nil {
a.mu.Unlock()
<-ch
// do something?, a.mu.Lock(), return, etc.
} else {
starting := make(chan struct{})
defer close(starting)
a.starting = starting
}
a.mu.Unlock()
// start...

This is just off the top of my head without fully understanding the goal of this design.
We should always be very careful about locking interactions with Close, it should be possible to Close whenever without waiting for long-running operations.
What are we actually guarding against here?
We're guarding against concurrent StartRecording and StopRecording calls. Holding a mutex for the entire duration of these functions makes it simple to reason about whether their interactions are safe.
It's true that this blocks all other desktop operations including Close, but I believe the simplicity of the code is worth the tradeoff. Is it such a big deal that in the worst-case (quite rare) scenario, where the workspace agent is shut down at the exact moment a desktop recording is being started, Close is delayed by a second?
If we implemented your suggestion, I'm also unsure how Close should work for processes in the starting state. Would it just leave them hanging?
Force-pushed db2414f to 5e16133
mafredri
left a comment
Round 2. 9 of 11 R1 findings resolved. The P1 (auth ordering) fix is correct: isSubagentDescendant now runs before any agent connection or recording start. The semaphore (cap 25) and 90s timeout address the memory and timing concerns.
Two R1 findings were contested (mutex scope, temp file lifecycle). Both defenses are reasonable given the current usage pattern.
Three new findings this round, all minor. The most notable gap is the portableDesktop recording logic (200+ lines) having zero unit tests at the implementation layer. The rest is cosmetic.
Severity count: 0 P0, 0 P1, 1 P2, 2 P3.
As Bisky noted about the idle timeout test: "TestRecordingIdleTimeoutResetsOnAction is especially well-crafted: quartz traps, precise clock advancement, verified both the 'reset' and 'eventual fire' branches. These are real gems."
🤖 This review was automatically generated with Coder Agents.
Force-pushed 5e16133 to 51d8d08
mafredri
left a comment
Cool feature 👍🏻. I do have some concerns about the agentdesktop API mutex design though for two reasons:
- They may block workspace agent shutdown
- They're called from chatd and may block there as well
The portabledesktop implementation has similar concerns, but I did not go through those as thoroughly.
I think the mutex design can easily be made safer.
agent/x/agentdesktop/api.go
Outdated
	return
}

if err := a.desktop.StartRecording(ctx, recordingID); err != nil {
What if we clobber the recordingID here? E.g. network request intermittent failure, first one went through, then second one also goes through.
Shouldn't we have a check earlier before we retry Start + StartRecording?
What if we clobber the recordingID here?
What do you mean? I'm not sure if this answers your question, but repeated calls in short succession to StartRecording will be idempotent.
agent/x/agentdesktop/api.go
Outdated
	return
}

if info.Size() > workspacesdk.MaxRecordingSize {
Do we actually have any protection that stops recording once we hit max size? Should we consider returning a truncated video?
We don't. I'd consider this a possible future improvement. The portabledesktop cli would need support for truncation.
Force-pushed e172782 to e6fae35
mafredri
left a comment
Round 5, post-refactor. The restructuring is a significant improvement: recording management consolidated in portableDesktop under a single mutex, the API layer is thin, and the RecordingArtifact abstraction cleanly separates file ownership from the recording lifecycle. Most of the human review findings from R4 are resolved.
Severity count: 0 P0, 0 P1, 5 P2, 3 P3.
The most interesting new finding: on Windows, interruptRecordingProcess calls Kill() but the caller never marks rec.killed = true, so every graceful stop silently serves a corrupt MP4 as valid. Currently dormant (X11 required), but structurally wrong. As Hisoka put it: "The code presents one face (interruptRecordingProcess means 'graceful') and hides another (Kill() means 'destroyed'). What happens when someone notices the videos don't play?"
Gon identified a naming collision worth fixing: RecordActivity() in a recording package reads as "start recording the activity" rather than "note that activity happened."
A few Gon observations not posted inline: the 15s/30s/90s timeout chain across three files forms an undocumented dependency. If any timeout is changed independently, the chain breaks silently.
🤖 This review was automatically generated with Coder Agents.
lastAction := time.Unix(0, lastNano)
elapsed := p.clock.Since(lastAction)
if elapsed >= idleTimeout {
	p.mu.Lock()
P2 Idle monitor holds p.mu for up to 15 seconds during graceful stop, blocking Close() and delaying workspace shutdown. (Mafuuu P2)
monitorRecordingIdle (line 736) acquires p.mu, then calls lockedStopRecordingProcess(ctx, rec, false). The false flag means SIGINT followed by a 15-second timer wait. p.mu is held for the entire duration. Close() needs p.mu.Lock() to begin cleanup. If the idle monitor is mid-stop, Close is blocked for up to 15 seconds.
The prior defense (#comment-3028467808) argued "delayed by a second" was acceptable. That was about Start. This is a 15-second graceful SIGINT stop triggered by a background goroutine. The idle monitor knows nobody is actively waiting for a clean artifact, so force=true (or releasing mu before the wait) would avoid blocking Close.
🤖
Waiting on these processes is essential to create valid recordings. If we used force=true, recordings would get corrupted. It's possible that an idle recording will be read by a future tool call.
Force-pushed 4bf9386 to 31df469
mafredri
left a comment
Thanks for the refactor, looks a lot better 👍🏻. I flagged a few more/new things, but I don't foresee fixing them causing any issues.
	}
}
return active
}
This works, so consider this minor bike-shedding (optional), but each active recording could instead increment and decrement an int (e.g. atomic.Add*, etc.).
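A sketch of that suggestion using Go 1.19's atomic.Int64 (the tracker type is invented for illustration):

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// tracker counts active recordings with an atomic counter instead of
// scanning a map on every query.
type tracker struct {
	active atomic.Int64
}

func (t *tracker) recordingStarted() { t.active.Add(1) }
func (t *tracker) recordingStopped() { t.active.Add(-1) }

func main() {
	var t tracker
	t.recordingStarted()
	t.recordingStarted()
	t.recordingStopped()
	fmt.Println(t.active.Load())
}
```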
OrganizationID: ws.OrganizationID,
Name: fmt.Sprintf("recording-%s.mp4", p.clock.Now().UTC().Format("2006-01-02T15-04-05Z")),
Mimetype: "video/mp4",
Data: data,
Wish we would switch from lib/pq to pgx so we could stream the data rather than allocating it upfront. 🥲
Force-pushed bbe92eb to 96af52e
Force-pushed 96af52e to 88fcec6
This PR introduces screen recording of the computer use agent using the virtual desktop.
- Recording is started by a wait_agent tool call. Recording is stopped by a successful wait_agent tool call or when there hasn't been any desktop activity for 10 minutes.
- Recording is done by the portabledesktop cli via the record command. The videos are sped up in periods of inactivity.
- Recordings are stored in the chat_files table. There's a hard limit of 100 MB per recording. Larger recordings are dropped.
- wait_agent on a computer use subagent tool call returns a recording_file_id, later allowing the frontend to display the corresponding video.
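The hard-limit rule from the description, as a minimal sketch (withinLimit is an invented helper, not the PR's code; the real pipeline stores accepted recordings in the chat_files table):

```go
package main

import "fmt"

// MaxRecordingSize is the 100 MB hard limit from the PR description.
const MaxRecordingSize = 100 << 20

// withinLimit sketches the drop rule: recordings over the hard limit
// are rejected rather than stored.
func withinLimit(sizeBytes int64) bool {
	return sizeBytes <= MaxRecordingSize
}

func main() {
	fmt.Println(withinLimit(50<<20), withinLimit(101<<20))
}
```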