Merged
6 changes: 5 additions & 1 deletion .oxfmtrc.json
@@ -11,6 +11,10 @@
"dist",
"node_modules",
"package-lock.json",
"aube-lock.yaml"
"aube-lock.yaml",
"dogfood/agent-uses-agent-tty/README.md",
"dogfood/agent-uses-agent-tty/promoted-run-summary.md",
"dogfood/agent-uses-agent-tty/manifest.json",
"dogfood/agent-uses-agent-tty/artifacts"
]
}
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -14,6 +14,7 @@

## Changed

- The README hero GIF (`assets/hero.tape`) and the `dogfood/agent-uses-agent-tty/` Codex/Claude recordings now record inside a tmux two-pane split: the agent (or, in the hero, plain `agent-tty` CLI calls) drives a session on the left while `agent-tty dashboard` live-mirrors it on the right — showing the dashboard reacting as sessions are created and modified. Both panes share one `AGENT_TTY_HOME` so the dashboard auto-follows the newest session; the status bar is disabled so VHS's whole-screen `Wait+Screen` scrape stays unambiguous, and each run uses an isolated, reaped tmux server socket. The hero hides the tmux split plumbing and instead launches the dashboard on camera — typing `agent-tty dashboard` into the right pane and hopping back with the tmux prefix — and its panes/session run `bash --norc` with a minimal prompt so the live mirror stays free of personal shell-prompt clutter. It runs against this checkout's freshly-built CLI, since `agent-tty dashboard` is unreleased. A new `mise run demo:hero` task (which `depends` on `build`) regenerates the hero GIF, joining `mise run demo:agent-uses-agent-tty` for the agent recordings. `tmux` (`>= 3.1`, pinned to `3.6` in `mise`) is now a recorder prerequisite alongside `vhs`/`ttyd`/`ffmpeg`. The agent recordings now run concurrently via a bounded worker pool (`--concurrency`, default `2`) — each run is mostly an idle review-window sleep, so overlapping the two agents roughly halves wall-clock; raising the cap also overlaps an agent's own retry attempts at the cost of more CPU and shared-account load, while same-agent attempts stay serialized so two sessions of one account never record at once.
- Spawned shells now default `PROMPT_EOL_MARK=` (empty) in the session environment, suppressing the inverse-video `%` end-of-partial-line marker that `zsh` prints when output lacks a trailing newline. agent-tty strips a hidden completion-marker postamble after each `run`, which desynced the rendered cursor and left that `%` in snapshots, screenshots, and recordings; the default keeps captures clean. The marker is zsh-only and inert in other shells. Opt back in per session with `agent-tty create --env PROMPT_EOL_MARK='%B%S%#%s%b' -- <shell>` to restore zsh's styled default (a lone `'%'` expands to nothing), or pass any explicit `--env PROMPT_EOL_MARK=...` value. The default is applied at PTY spawn time and is not written to the manifest, so `inspect`, `list`, and `create --json` env maps are unchanged ([#114](https://github.com/coder/agent-tty/pull/114)).
- `inspect` collects renderer state and the session snapshot in a single synchronous tick before awaiting, so concurrent RPC handlers cannot interleave a mutated renderer state with a stale session snapshot ([#104](https://github.com/coder/agent-tty/pull/104)).
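
The concurrency note in the first entry above can be illustrated with a minimal shell sketch. This is not the recorder's implementation: `xargs -P` stands in for the bounded worker pool behind `--concurrency`, and the inline `sleep 2` is a hypothetical placeholder for one mostly idle recording run.

```shell
#!/bin/sh
# Sketch only: xargs -P imitates a bounded worker pool; it is an assumption,
# not how the agent-tty recorder schedules its runs.
set -eu

start=$(date +%s)

# Two placeholder "recordings" (codex, claude), at most 2 running at once,
# mirroring the documented default of --concurrency 2.
printf '%s\n' codex claude |
  xargs -P 2 -I{} sh -c 'sleep 2; echo "done: $1"' _ {}

end=$(date +%s)
elapsed=$((end - start))
echo "elapsed: ${elapsed}s"
```

With a cap of 2 the two sleeps overlap, so the elapsed time should be close to one sleep rather than two. Changing `-P 2` to `-P 1` serializes the runs and shows the difference.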

6 changes: 3 additions & 3 deletions README.md
@@ -7,7 +7,7 @@ Drive and inspect long-lived terminal sessions from the CLI, with reviewable sna
[![License: Apache-2.0](https://img.shields.io/badge/license-Apache--2.0-blue.svg)](./LICENSE)
![Node](https://img.shields.io/node/v/agent-tty)

![agent-tty: drive a terminal session and inspect it as reviewable text](./assets/hero.gif)
![agent-tty: drive a terminal session and watch it live in the dashboard](./assets/hero.gif)

`agent-tty` keeps a real PTY-backed terminal session alive across separate CLI invocations. You `run` a command in it, `wait` for the screen to reach a condition instead of sleeping, then capture what happened as a semantic text snapshot, a PNG screenshot, an asciinema-compatible `.cast`, or a WebM. The recording is the point: a human — or an AI coding agent — can replay and verify exactly what the terminal did, instead of trusting a blind script.

@@ -101,8 +101,8 @@ Real Codex and Claude TUIs discovering the `agent-tty` skill, driving `nvim --cl
<th width="50%">Claude</th>
</tr>
<tr>
<td><video src="https://github.com/user-attachments/assets/27cc3b9b-9b91-4cd9-a3a5-1bbb61c33e19" controls width="100%"></video></td>
<td><video src="https://github.com/user-attachments/assets/36221ef7-97c4-4b06-b673-21ac623a5f0a" controls width="100%"></video></td>
<td><video src="https://github.com/user-attachments/assets/f1823164-330c-4962-8adf-2b825080e06f" controls width="100%"></video></td>
<td><video src="https://github.com/user-attachments/assets/966bed35-9383-444e-b06a-1d103ccba49a" controls width="100%"></video></td>
</tr>
</table>

Binary file modified assets/hero.gif
150 changes: 102 additions & 48 deletions assets/hero.tape
@@ -1,81 +1,135 @@
# assets/hero.tape — generates the README hero GIF with VHS (charmbracelet/vhs).
#
# Run from the repo root:
# vhs assets/hero.tape # writes assets/hero.gif
# Regenerate (from the repo root):
# mise run demo:hero # builds the dev CLI, then runs this tape
#
# Prereqs (vhs 0.11.0 + jq are pinned in mise's [tools]):
# - `agent-tty` on PATH (npm i -g agent-tty, or `npm run build && npm link`)
# - `jq`
# It uses the fast, browser-free `libghostty-vt` renderer so the GIF stays short
# (no Chromium boot). For a crisper, tighter, more "terminal" look, tune
# FontFamily (must be installed!), FontSize, LineHeight, LetterSpacing, and
# Padding below, plus the per-step Sleeps. The session is sized to 72x18 so a
# 26pt font fills the frame without the 80-col default wrapping. Regenerate,
# then replace the HERO DEMO comment block in README.md with:
# ![agent-tty: drive a terminal session and inspect it as reviewable text](./assets/hero.gif)
# That task `depends` on `build` and puts vhs/ffmpeg/tmux/jq on PATH. To run the
# tape directly instead, build first and provide those tools yourself:
# npm run build && vhs assets/hero.tape
#
# Why build first: this hero shows `agent-tty dashboard`, which is unreleased —
# a globally-installed `agent-tty` (e.g. `npm i -g agent-tty`) won't have it. So
# the hidden setup below points PATH at THIS checkout's freshly-built CLI
# (dist/cli/main.js) instead of `Require agent-tty`. It also needs the
# dashboard's optional renderer `@coder/libghostty-vt-node` (a normal `npm i`
# fetches it; check with `agent-tty doctor`).
#
# Prereqs (jq + tmux are pinned in mise's [tools]; tmux >= 3.1 for `-l 50%`):
# - `npm run build` has been run in this checkout
# - `jq`, `tmux`
# - `ttyd` (VHS needs it): mise installs it on Linux, but it has no macOS
# binary — on macOS run `brew install ttyd` yourself (mise finds it on PATH).
#
# Structure: the hidden setup builds a clean tmux split (left = operator shell,
# right = idle shell). The panes and the agent-tty session all run `bash --norc`
# (with a minimal `$ ` prompt on the panes via PS1) so they — and the dashboard's
# live mirror — stay free of personal prompt clutter; `--norc` is what keeps the
# user's interactive shell config (e.g. zsh `%{…%}` prompt escapes) out of the
# frame. The visible recording then *types* `agent-tty dashboard` into the right
# pane (so viewers see how it's launched), hops back to the left with the tmux
# prefix, and drives a session whose changes the dashboard mirrors live.
# AGENT_TTY_HOME is exported before tmux so the server and both panes share it.
# (For a minimal `$ ` prompt in the mirror too, add `--env 'PS1=$ '` to create.)
#
# TUNING (do a visual pass after regenerating): Width/Height/FontSize and the
# split percentage (`-l 50%`) trade off readability of the two panes. The split
# is 50/50; FontSize is 18 so the longest CLI lines fit one un-split half — bump
# the font only if you also widen the frame, or the operator-pane lines wrap.
# The session defaults to 80x24, a touch wider than the half-width dashboard
# pane, so its mirror clips to the top-left (where the `echo` output lands); add
# `--cols/--rows` to `create` for a tighter, fully-visible mirror. FontFamily fallbacks if FiraCode
# isn't installed: "Menlo", "SF Mono", "JetBrains Mono". Keep the README hero
# pointing at ./assets/hero.gif:
# ![agent-tty: drive a terminal session and watch it live in the dashboard](./assets/hero.gif)

Output assets/hero.gif

Require agent-tty
Require jq
Require tmux

Set Shell bash
# Use a font that's actually installed (VHS silently falls back to an ugly
# default otherwise). FiraCode Nerd Font Mono is on this machine and reads clean.
# Bulletproof alternatives: "Menlo", "SF Mono", "Monaco", "JetBrains Mono".
Set FontFamily "FiraCode Nerd Font Mono"
Set FontSize 26
Set Width 1280
Set Height 640
Set Padding 16 # tighter frame (was 28)
Set LineHeight 1.0 # tight lines; nudge to ~1.15 if they touch
Set LetterSpacing 0 # no extra tracking
Set Theme "Catppuccin Mocha" # any VHS theme works; try "Dracula", "Nord"
# 18pt (not 20) so each half of the 50/50 split is wide enough for the longest
# CLI lines (the `create … | jq` and `run --no-wait …` lines) without wrapping.
Set FontSize 18
Set Width 1920
Set Height 720
Set Padding 16
Set LineHeight 1.0
Set LetterSpacing 0
Set Theme "Catppuccin Mocha"
Set TypingSpeed 40ms
Set PlaybackSpeed 1.0

# --- hidden setup: isolated home + fast native renderer, then a clean screen ---
# --- hidden setup: dev CLI on PATH + a clean tmux split (operator | idle) ---
# AGENT_TTY_HOME, PATH, and PS1 are exported BEFORE tmux so the server and both
# `bash --norc` panes inherit them. The split runs idle shells; the dashboard is
# launched visibly below. kill-server first makes regeneration idempotent.
Hide
Type "export AGENT_TTY_HOME=$(mktemp -d) AGENT_TTY_RENDERER=libghostty-vt" Enter
Type "clear" Enter
Type "export HERO_BIN=$(mktemp -d)" Enter
Type 'chmod +x dist/cli/main.js && ln -sf "$PWD/dist/cli/main.js" "$HERO_BIN/agent-tty"' Enter
Type 'export PATH="$HERO_BIN:$PATH"' Enter
Type "export PS1='$ '" Enter
Type "tmux -L hero kill-server 2>/dev/null; true" Enter
Type "tmux -f /dev/null -L hero new-session -d -s hero 'bash --norc' \; set -g status off \; split-window -h -l 50% -t hero 'bash --norc' \; select-pane -t hero.0 \; attach -t hero" Enter
Show

Sleep 800ms
Type "# open a long-lived terminal session" Enter
Sleep 1000ms
Type "# drive a real terminal session with plain CLI calls" Enter
Sleep 500ms
Type 'SID=$(agent-tty create --json --cols 72 --rows 18 -- bash | jq -r .result.sessionId)' Enter
Type 'SID=$(agent-tty create --json -- bash --norc | jq -r .result.sessionId)' Enter
Sleep 1200ms

Type "# run a command inside it" Enter
Type "# open the dashboard on the right to watch it live →" Enter
Sleep 600ms
# hop to the right pane with the tmux prefix (Ctrl+B then o = next pane)
Ctrl+B
Type "o"
Sleep 500ms
Type "agent-tty dashboard --all" Enter
Sleep 2200ms
# hop back to the left pane to keep driving the session
Ctrl+B
Type "o"
Sleep 600ms

Type "# run a command inside it — watch the dashboard mirror it live →" Enter
Sleep 500ms
Type 'agent-tty run "$SID" "echo hello from agent-tty"' Enter
Sleep 1500ms
Sleep 2200ms

Type "# wait for the screen — no sleeps, no grep" Enter
Type "# fire a slow command, then wait for its OUTPUT — no sleeps, no polling →" Enter
Sleep 500ms
Type 'agent-tty wait "$SID" --text "hello from agent-tty"' Enter
Sleep 1500ms
# Two things make the wait meaningful here:
# 1. $RANDOM is single-quoted so the SESSION expands it — the echoed command
# line shows the literal "$RANDOM", not a number.
# 2. We therefore wait on a DIGIT after the colon (--regex), NOT the phrase
# "your random number is:". That phrase is already on screen from the echoed
# command, so a --text wait would match instantly and prove nothing; the
# digits only appear once the command actually prints, after the 6s sleep.
Type "agent-tty run --no-wait $SID 'sleep 6; echo your random number is: $RANDOM'" Enter
# `wait` is typed right after the fire (no comment between) so the bulk of the 6s
# sleep elapses WHILE wait blocks — the deterministic wait is visible on camera.
Sleep 400ms
Type 'agent-tty wait "$SID" --regex "random number is: [0-9]+"' Enter
Sleep 5000ms

Type "# inspect the rendered screen as text you can diff" Enter
Type "# and snapshot the result as text you can diff" Enter
Sleep 500ms
Type 'agent-tty snapshot "$SID" --format text' Enter
Sleep 2800ms

Type "# screenshots, asciicasts and WebM export come from the same session" Enter
Sleep 1200ms

# --- hidden teardown ---
# --- hidden teardown: stays hidden to the end, so the GIF's last frame is the
# split — NOT the bare outer shell that `kill-server` drops back to (a trailing
# `Show` here flashes that shell + a "[server exited]" line, which looks ugly) ---
Hide
Type 'agent-tty destroy "$SID" >/dev/null 2>&1' Enter
Show
Sleep 500ms

# -----------------------------------------------------------------------------
# ALTERNATIVE — dogfood it: record the loop with agent-tty itself, then convert.
# Drive the same create/run/wait/snapshot sequence, then:
# agent-tty record export "$SID" --format webm --out demo.webm
# ffmpeg -i demo.webm -vf "fps=12,scale=1200:-1:flags=lanczos" assets/hero.gif
# This makes the hero GIF literally the tool's own output ("recorded by the tool
# it documents"), at the cost of the Chromium-backed ghostty-web render path.
# ffmpeg isn't in mise's [tools] (it can't be cross-locked on Linux via conda);
# install it yourself for this path (brew install ffmpeg / apt-get install ffmpeg).
Type "tmux -L hero kill-server" Enter
# kill-server returns to the outer shell, which still holds these vars; clean up
# the temp home + bin so repeated local runs don't litter $TMPDIR.
Type 'rm -rf "$AGENT_TTY_HOME" "$HERO_BIN"' Enter
# Hidden settle time so the cleanup completes before the tape ends (no Show).
Sleep 1s
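
The tape's comments on the `--regex` wait explain why the phrase alone is a poor wait condition. A minimal shell sketch of that reasoning follows. It imitates the rendered screen with plain strings and uses `grep` as a stand-in matcher. That `agent-tty wait --text` behaves like a fixed-string search and `--regex` like a regular-expression search over the screen is an assumption here, and `12345` is a made-up value.

```shell
#!/bin/sh
# Sketch only: grep stands in for agent-tty's wait matching (assumed semantics).
set -eu

# Simulated screen right after `run --no-wait`: only the echoed command line.
# Single quotes keep $RANDOM literal, as the session would show it.
screen_before='$ sleep 6; echo your random number is: $RANDOM'

# Simulated screen once the command has printed (12345 is a made-up value).
screen_after="$screen_before
your random number is: 12345"

# check <grep-mode-flag> <pattern> <screen> -> prints "match" or "no-match"
check() {
  if printf '%s\n' "$3" | grep -q "$1" -e "$2"; then
    echo match
  else
    echo no-match
  fi
}

text_before=$(check -F 'your random number is:' "$screen_before")
regex_before=$(check -E 'random number is: [0-9]+' "$screen_before")
regex_after=$(check -E 'random number is: [0-9]+' "$screen_after")

echo "text  before output: $text_before"
echo "regex before output: $regex_before"
echo "regex after  output: $regex_after"
```

The fixed-string check succeeds before any output exists because the echoed command line already contains the phrase. The digit-anchored pattern succeeds only once the command has printed.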
4 changes: 2 additions & 2 deletions dogfood/agent-uses-agent-tty/README.md
@@ -14,13 +14,13 @@ The Outer Hero Demo column embeds the uploaded H.264 MP4 recordings as inline Gi
</tr>
<tr>
<td>Codex</td>
<td><video src="https://github.com/user-attachments/assets/27cc3b9b-9b91-4cd9-a3a5-1bbb61c33e19" controls width="320"></video></td>
<td><video src="https://github.com/user-attachments/assets/f1823164-330c-4962-8adf-2b825080e06f" controls width="320"></video></td>
<td><a href="./artifacts/codex-inner-nvim.cast">cast</a>, <a href="./artifacts/codex-inner-nvim.webm">WebM</a></td>
<td><a href="./artifacts/codex-final-file-proof.txt">proof</a></td>
</tr>
<tr>
<td>Claude</td>
<td><video src="https://github.com/user-attachments/assets/36221ef7-97c4-4b06-b673-21ac623a5f0a" controls width="320"></video></td>
<td><video src="https://github.com/user-attachments/assets/966bed35-9383-444e-b06a-1d103ccba49a" controls width="320"></video></td>
<td><a href="./artifacts/claude-inner-nvim.cast">cast</a>, <a href="./artifacts/claude-inner-nvim.webm">WebM</a></td>
<td><a href="./artifacts/claude-final-file-proof.txt">proof</a></td>
</tr>
9 changes: 6 additions & 3 deletions dogfood/agent-uses-agent-tty/VIDEO_PLAYBACK.md
@@ -35,13 +35,16 @@ mise run demo:agent-uses-agent-tty:upload-assets
The task uses the pinned `ffmpeg`/`ffprobe` from `mise.toml`. For each agent it
prepends ~0.3s of `artifacts/<agent>-thumbnail.png` as the opening frames, encodes
H.264 MP4, writes ffprobe metadata, and writes checksums under `.debug/video-upload/`.
The upload MP4 is encoded at the recording's own probed resolution, so it always
preserves the source aspect ratio (no squish if the recording dimensions change).

Expected constraints for the promoted 2026-05-21 recordings:
Expected constraints for the current promoted recordings (dimensions track the
recording resolution, currently 1920x900):

| Agent | Upload file | Expected codec | Expected dimensions | Expected size |
| ------ | ------------------------------------------- | ----------------- | ------------------- | ------------- |
| Codex | `.debug/video-upload/codex-outer-h264.mp4` | H.264 / `yuv420p` | 1600x900 | ~3.5 MB |
| Claude | `.debug/video-upload/claude-outer-h264.mp4` | H.264 / `yuv420p` | 1600x900 | ~3.4 MB |
| Codex | `.debug/video-upload/codex-outer-h264.mp4` | H.264 / `yuv420p` | 1920x900 | ~3.4 MB |
| Claude | `.debug/video-upload/claude-outer-h264.mp4` | H.264 / `yuv420p` | 1920x900 | ~4.0 MB |

Both expected sizes are below GitHub's 10 MB video attachment limit for free plans.
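
A size check like the one implied above could be sketched as follows. This is not part of the `upload-assets` task: `check_upload_size` is a hypothetical helper, and treating 10 MB as 10 * 1024 * 1024 bytes is an assumption. The demo uses a throwaway file rather than the real MP4s.

```shell
#!/bin/sh
# Sketch only: check_upload_size is a hypothetical helper, not a project script.
set -eu

# Limit taken from the text above; the binary interpretation of "MB" is assumed.
limit_bytes=$((10 * 1024 * 1024))

# Print "ok" if the file is strictly below the limit, "too-large" otherwise.
check_upload_size() {
  size=$(wc -c < "$1" | tr -d ' ')
  if [ "$size" -lt "$limit_bytes" ]; then
    echo ok
  else
    echo too-large
  fi
}

# Demo with a ~4 MB placeholder file; real inputs would be paths such as
# .debug/video-upload/codex-outer-h264.mp4.
demo=$(mktemp)
dd if=/dev/zero of="$demo" bs=1000 count=4000 2>/dev/null
result=$(check_upload_size "$demo")
echo "$result"
rm -f "$demo"
```

Running the helper over both upload files before attaching them would catch a recording that grows past the limit after a resolution change.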
