Skip to content

feat(broker): snapshot module + dump-pty debug command (#869)#870

Merged
willwashburn merged 5 commits into
mainfrom
feat/869-snapshot-dump-pty
May 17, 2026
Merged

feat(broker): snapshot module + dump-pty debug command (#869)#870
willwashburn merged 5 commits into
mainfrom
feat/869-snapshot-dump-pty

Conversation

@willwashburn
Copy link
Copy Markdown
Member

Summary

Adds a self-contained PTY screen snapshot for running workers and exposes
it three ways:

  • Snapshot::capture(&pty) — copies the alacritty VT grid out from under
    the term lock, then renders to either plain UTF-8 (to_plain) or an
    ANSI reproduction byte stream (to_ansi). The render methods run
    against owned data so they cannot block the PTY reader thread.
  • GET /api/spawned/{name}/snapshot?format=plain|ansi — the broker
    proxies the request to the worker subprocess via a new snapshot_pty
    snapshot_response frame pair, correlates the response by
    request_id, and returns { rows, cols, cursor: [r, c], screen } to
    the caller. ANSI bytes are base64-encoded because they contain control
    characters. A 5s timeout sweep in the broker's existing reap_tick
    protects against a worker dying mid-request.
  • agent-relay-broker dump-pty <name> [--format plain|ansi] — admin CLI
    that discovers the broker from .agent-relay/connection.json (or
    --broker-url / RELAY_BROKER_URL) and prints the rendered screen.

Resolves #869.

Test plan

  • cargo fmt --check clean
  • cargo clippy --all-targets clean (no new warnings)
  • cargo test — 600 pass (was 588), 0 failures
  • 8 new snapshot unit tests (plain render, cursor placement, green
    SGR, ANSI round-trip through a fresh Term, empty grid, live
    PtySession::capture)
  • 4 new HTTP route tests (plain default, ansi pass-through,
    invalid-format 400 without broker call, agent_not_found → 404)
  • dump-pty --help renders correctly

Notes / open questions

  • to_ansi v1 supports foreground / background named + 256-palette +
    truecolor + bold + reverse + underline. Italic, strikeout, dim,
    hyperlinks, undercurl colour, and zerowidth chars are deliberately
    dropped — round-trip fidelity is preserved for the common case (the
    test asserts a fresh Term ends up with an identical grid).
  • The integration "spawn a real worker, hit the HTTP route" test in the
    issue's test plan is punted: the broker HTTP server isn't easy to boot
    in-process (it owns the broker loop), and the existing unit + route
    test surface already covers each end of the JSON-frame contract.

🤖 Generated with Claude Code

Adds `src/snapshot.rs` with `Snapshot::capture` / `to_plain` / `to_ansi`,
exposes it over `GET /api/spawned/{name}/snapshot?format=plain|ansi`, and
ships an `agent-relay-broker dump-pty <name>` admin CLI that wraps the
route. Renderers walk the existing alacritty VT grid (#836) once under
the term lock, then drop it — captured state is self-contained so the PTY
reader thread keeps advancing while callers render.

The HTTP request flows broker → PTY worker subprocess via a new
`snapshot_pty` / `snapshot_response` frame pair keyed by request_id, with
a 5s timeout sweep in the broker's reap_tick so a crashed worker can't
hang the HTTP oneshot. The `ansi` payload is base64 because the bytes
contain control characters; `dump-pty --format ansi` decodes and writes
raw bytes to stdout for terminal re-render.

12 new tests (8 snapshot unit + 4 HTTP route), 600 total pass (was 588).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 17, 2026

Review Change Stack

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 9bd54df6-fafd-4865-95af-bad0b9f83fa6

📥 Commits

Reviewing files that changed from the base of the PR and between 3ed3530 and 48704de.

📒 Files selected for processing (1)
  • .trajectories/index.json
🚧 Files skipped from review as they are similar to previous changes (1)
  • .trajectories/index.json

📝 Walkthrough

Walkthrough

Implements snapshot capture and dual rendering (plain text and ANSI) of PTY session screens via a new Snapshot module, exposed through an HTTP API endpoint and dump-pty CLI command, with broker orchestration for request correlation and worker-side frame handling.

Changes

Snapshot Capture, Rendering, and API Integration

Layer / File(s) Summary
Snapshot capture and rendering core
src/snapshot.rs
Defines SnapshotCell and public Snapshot types with dimensions and 1-indexed cursor. Implements Snapshot::capture (clones from live PtySession), Snapshot::from_term (extracts from standalone Term), to_plain (trimmed per-row lines), and to_ansi (ANSI clear/home, per-cell SGR transitions, CUP cursor positioning). Includes SGR encoding (reduced state, color mapping, named-color handling) and unit tests covering shape, cursor alignment, colors, ANSI replay round-trip, and wide-character spacer skipping; async integration test spawns live PTY and validates captured text.
PtySession accessor
src/pty.rs
Adds PtySession::with_term closure-based method to safely access internal Term while holding the mutex lock.
Crate module export
src/lib.rs
Exports the new snapshot module from the crate root.
Worker snapshot_pty handler
src/pty_worker.rs
Handles inbound snapshot_pty frame: captures snapshot via Snapshot::capture, renders plain or base64-encoded ansi format, returns snapshot_response payload or invalid_format error; adds base64 and snapshot imports.
HTTP snapshot endpoint and routing
src/listen_api.rs
Adds SnapshotFormat enum (Plain, Ansi) with wire/parse helpers; adds ListenApiRequest::Snapshot variant; implements GET /api/spawned/{name}/snapshot handler with format query param parsing, validation, and error responses; extends classify_error to map unsupported_runtime (409) and snapshot_timeout (504); test suite covers format defaults/passthrough, invalid format rejection, and error code mappings.
CLI dump-pty and broker orchestration
src/main.rs
Adds DumpPtyCommand/DumpPtyFormat CLI args and run_dump_pty runner (broker connection discovery, HTTP GET, JSON parsing, plain/base64-decoded output); broker adds pending_snapshots map and snapshot_request_deadlines for correlation; ListenApiRequest::Snapshot handler validates worker/PTY, generates request_id, sends snapshot_pty to worker, registers oneshot with 5s deadline; worker event handler correlates snapshot_response back to pending callers; periodic sweep expires deadlines and replies with snapshot_timeout error.
API and CLI documentation
web/content/docs/reference-broker-api.mdx
Documents broker listen API base URL/port discovery, API-key authentication, route inventory, full GET /api/spawned/{name}/snapshot endpoint semantics (format defaults, payload shapes, cursor 1-indexing, ANSI base64 decoding, status codes, curl example), dump-pty CLI wrapper flags and behavior, event stream /ws with replay/gap frames, error envelope format, and end-to-end control flow example.
Completion trajectory and metadata
.trajectories/completed/2026-05/traj_cbmwd07phhm2.md, .trajectories/completed/2026-05/traj_cbmwd07phhm2.json, .trajectories/index.json
Adds completed trajectory markdown and JSON documenting snapshot module/route/CLI implementation with design decisions (PTY worker execution, owned snapshot struct, 5s timeout sweep); updates trajectory index with lastUpdated timestamp and new trajectory entry; trajectory notes 12 new tests added, total count 588→600.
Dependency addition
Cargo.toml
Adds base64 = "0.22" dependency for ANSI base64 encoding.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

  • AgentWorkforce/relay#836: Both PRs touch src/pty.rs; #836 introduces the PtySession shared term lock for grid/state, and this PR adds PtySession::with_term that enables safe snapshot capture from the live Term.
  • AgentWorkforce/relay#865: This PR extends the broker API reference docs added by #865 by documenting the new snapshot endpoint and dump-pty CLI behavior.

Suggested reviewers

  • khaliqgant

Poem

🐰 A snapshot dance in ANSI light,
Grid cells captured, plain and bright,
Five seconds waits the pending frame,
Worker speaks, broker hears its name,
Cursor flies to grid's delight! 📸

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Title check ✅ Passed The PR title accurately and concisely summarizes the main change: adding a snapshot module and dump-pty debug command to the broker.
Description check ✅ Passed The PR description fully covers the summary, test plan, and notes sections matching the repository template with comprehensive details on implementation and testing.
Linked Issues check ✅ Passed All primary objectives from issue #869 are met: Snapshot struct with capture/to_plain/to_ansi methods [#869], HTTP route for snapshots [#869], CLI dump-pty command [#869], unit and route tests [#869], and documentation updates [#869].
Out of Scope Changes check ✅ Passed All changes align with the snapshot module and dump-pty command objectives; no unrelated modifications are present in the changeset.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat/869-snapshot-dump-pty

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In @.trajectories/completed/2026-05/traj_cbmwd07phhm2.json:
- Line 75: The committed trajectory metadata contains a machine-local absolute
path in the JSON key "projectId", which leaks local environment details; update
the value stored under "projectId" to a portable identifier (e.g., repository
name, git remote/commit hash, or a relative path/slug) and remove any /Users/…
absolute paths, and change the code that generates this metadata to emit the
portable identifier instead of the local absolute path (ensure any
serializer/metadata writer that sets "projectId" is updated and re-run to
replace the current value).

In `@src/main.rs`:
- Around line 867-870: The code currently sets search_roots to probe both
cwd.join(".agent-relay") and cwd.clone(), which allows autodiscovery to read
<cwd>/connection.json; update the match that builds search_roots (the variable
named search_roots in main.rs, using state_dir and cwd) to only include the
dedicated broker metadata directory (cwd.join(".agent-relay")) when state_dir is
None so we no longer probe the repository root for connection.json and therefore
won't accidentally read unrelated config files or leak broker credentials.
- Around line 2164-2193: The Snapshot handler currently sends snapshot_pty to
all workers but run_headless_worker does not handle that RPC, so requests to
headless workers time out; before calling workers.send_to_worker in
ListenApiRequest::Snapshot check the worker's capabilities (e.g. via an existing
method like workers.has_worker + a capability/query method such as
workers.is_headless / workers.supports_pty / workers.get_worker_kind) and
immediately reply with an error (e.g. "unsupported_runtime: snapshot_pty not
supported for headless worker") when the target is a headless/non-PTY worker
instead of forwarding the request and inserting into pending_snapshots; keep the
current error path for missing workers and only call workers.send_to_worker when
the worker actually supports PTY snapshots.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: a4fb9eca-b1f1-4c56-9e5e-37f3343f8acd

📥 Commits

Reviewing files that changed from the base of the PR and between 89c2c36 and 0b0518a.

⛔ Files ignored due to path filters (1)
  • Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (13)
  • .trajectories/completed/2026-05/traj_cbmwd07phhm2.json
  • .trajectories/completed/2026-05/traj_cbmwd07phhm2.md
  • .trajectories/index.json
  • Cargo.toml
  • docs/reference-broker-api.md
  • src/lib.rs
  • src/listen_api.rs
  • src/main.rs
  • src/pty.rs
  • src/pty_worker.rs
  • src/snapshot.rs
  • web/content/docs/reference-broker-api.mdx
  • web/lib/docs-nav.ts

Comment thread .trajectories/completed/2026-05/traj_cbmwd07phhm2.json Outdated
Comment thread src/main.rs Outdated
Comment thread src/main.rs
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 0b0518a2da

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/main.rs Outdated
Comment on lines +845 to +850
return Ok(BrokerConnection {
base_url: url.trim_end_matches('/').to_string(),
api_key: explicit_api_key
.map(ToString::to_string)
.or_else(|| std::env::var("RELAY_BROKER_API_KEY").ok())
.filter(|value| !value.trim().is_empty()),
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Preserve API key fallback when broker URL is overridden

When --broker-url is provided, this early return skips reading connection.json, so dump-pty can only authenticate via --api-key or RELAY_BROKER_API_KEY. In the common case where the broker key is only in .agent-relay/connection.json, dump-pty --broker-url ... sends an unauthenticated request and fails with 401. Keep the URL override, but still allow API-key fallback from connection.json to match the documented fallback behavior.

Useful? React with 👍 / 👎.

Comment thread src/main.rs
Comment on lines +3510 to +3512
let _ = reply.send(Err(
"snapshot_timeout: worker did not respond in time".into(),
));
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Return server error for timed-out snapshot requests

Timed-out snapshot requests are tagged as snapshot_timeout, but listen_api_snapshot classifies unknown prefixes as a client error (400), so worker crashes/timeouts are surfaced as bad requests instead of server-side failures (e.g., 5xx/504). This makes callers mis-handle transient broker/worker faults as user input problems; emit an error code that maps to server failure or extend classification for snapshot_timeout.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown

@cubic-dev-ai cubic-dev-ai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

9 issues found across 14 files

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name=".trajectories/completed/2026-05/traj_cbmwd07phhm2.json">

<violation number="1" location=".trajectories/completed/2026-05/traj_cbmwd07phhm2.json:75">
P2: The `projectId` field leaks a developer's local filesystem path (including username) into the repository. Consider using a relative path or an opaque project identifier instead.</violation>
</file>

<file name=".trajectories/index.json">

<violation number="1" location=".trajectories/index.json:586">
P2: Path points into a temporary `.claude/worktrees/` directory that won't exist after the worktree is cleaned up. Use a relative path (like some other entries) or an absolute path to the main project's `.trajectories/` directory so the trajectory file remains accessible.</violation>
</file>

<file name="src/main.rs">

<violation number="1" location="src/main.rs:844">
P2: When `--broker-url` is provided, this early return skips reading `connection.json`, so the API key stored there is never loaded. In the common case where the broker key lives only in `.agent-relay/connection.json`, `dump-pty --broker-url http://...` sends an unauthenticated request and gets a 401. Consider still reading `connection.json` for the API key even when the URL is overridden.</violation>

<violation number="2" location="src/main.rs:870">
P2: Including `cwd.clone()` in `search_roots` means any file named `connection.json` in the current working directory — even one unrelated to the broker — can be picked up as broker metadata. This could send the snapshot request (and the broker API key) to a wrong URL. Restrict autodiscovery to `cwd.join(".agent-relay")` only.</violation>

<violation number="3" location="src/main.rs:1361">
P2: Comment claims snapshot entries are cleaned up in the worker-exit handler, but no such cleanup exists there. The timeout sweep in `reap_tick` is the only mechanism that removes stale entries. Either add explicit cleanup in the worker-exit path (which would give faster feedback to the HTTP caller) or correct the comment to say entries are cleaned by the timeout sweep.</violation>

<violation number="4" location="src/main.rs:2170">
P2: Snapshot requests for headless (non-PTY) workers are not rejected up front. Since `run_headless_worker` doesn't handle `snapshot_pty` frames, it returns `worker_error` which is never mapped back to `pending_snapshots`. The request silently waits ~5 seconds until the timeout sweep fires, then returns a misleading `snapshot_timeout` error. Check the worker's runtime type here and return an immediate descriptive error for non-PTY workers.</violation>

<violation number="5" location="src/main.rs:3510">
P2: Timed-out snapshot requests emit `"snapshot_timeout: ..."` which `classify_error` doesn't recognize, so it falls through to a 400 Bad Request. A server-side timeout (worker died or is unresponsive) should map to a 5xx status (e.g. 504 Gateway Timeout), not a client error. Either add `"snapshot_timeout"` to the `classify_error` mapping or use an existing prefix that maps to a server error.</violation>
</file>

<file name="src/snapshot.rs">

<violation number="1" location="src/snapshot.rs:175">
P2: Wide character spacer cells are not skipped, causing misalignment when the ANSI output is replayed. When a wide char (CJK, emoji) is present, alacritty stores a spacer cell (space + `WIDE_CHAR_SPACER` flag) after it. Outputting that space as a real character will shift all subsequent content by one column. The spacer should be skipped in the render loop (similar to how alacritty's own `line_content` method works).</violation>

<violation number="2" location="src/snapshot.rs:352">
P2: `named_color_sgr` always returns `Some(...)` — the `Option` return type and the caller's `None` handling are dead code. For semantic defaults (Foreground, Background, Cursor, etc.) return `None` so the caller skips emission, matching the documented intent and avoiding redundant `;39`/`;49` after a reset.</violation>
</file>

Tip: instead of fixing issues one by one fix them all with cubic
Re-trigger cubic

Comment thread .trajectories/completed/2026-05/traj_cbmwd07phhm2.json Outdated
Comment thread .trajectories/index.json Outdated
Comment thread src/main.rs Outdated
Comment thread src/snapshot.rs
Comment thread src/snapshot.rs Outdated
Comment thread src/main.rs Outdated
Comment thread src/main.rs Outdated
Comment thread src/main.rs
request_id = %req_id,
"snapshot request timed out before worker responded"
);
let _ = reply.send(Err(
Copy link
Copy Markdown

@cubic-dev-ai cubic-dev-ai Bot May 17, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2: Timed-out snapshot requests emit "snapshot_timeout: ..." which classify_error doesn't recognize, so it falls through to a 400 Bad Request. A server-side timeout (worker died or is unresponsive) should map to a 5xx status (e.g. 504 Gateway Timeout), not a client error. Either add "snapshot_timeout" to the classify_error mapping or use an existing prefix that maps to a server error.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At src/main.rs, line 3510:

<comment>Timed-out snapshot requests emit `"snapshot_timeout: ..."` which `classify_error` doesn't recognize, so it falls through to a 400 Bad Request. A server-side timeout (worker died or is unresponsive) should map to a 5xx status (e.g. 504 Gateway Timeout), not a client error. Either add `"snapshot_timeout"` to the `classify_error` mapping or use an existing prefix that maps to a server error.</comment>

<file context>
@@ -3197,6 +3484,35 @@ async fn run_init(cmd: InitCommand, telemetry: TelemetryClient) -> Result<()> {
+                            request_id = %req_id,
+                            "snapshot request timed out before worker responded"
+                        );
+                        let _ = reply.send(Err(
+                            "snapshot_timeout: worker did not respond in time".into(),
+                        ));
</file context>
Fix with Cubic

Comment thread src/main.rs
- Reject `snapshot_pty` for headless workers up front instead of waiting
  on the 5 s timeout sweep and returning a misleading `snapshot_timeout`.
  Maps to HTTP 409 via a new `unsupported_runtime` error class.
- Map `snapshot_timeout` to 504 Gateway Timeout in `classify_error`
  instead of falling through to 400 Bad Request — worker stalls are a
  server-side fault, not bad input.
- Restrict `dump-pty` broker-connection autodiscovery to
  `<cwd>/.agent-relay/connection.json`. Dropping the bare-cwd probe
  prevents an unrelated `connection.json` in a repo root from silently
  redirecting the request and leaking the broker API key.
- Still pull the API key out of `connection.json` when only
  `--broker-url` / `RELAY_BROKER_URL` is supplied, so users don't have
  to repeat `--api-key` on every dump-pty invocation.
- Fix `snapshot::to_ansi` to skip `WIDE_CHAR_SPACER` cells; emitting the
  placeholder space pushed every subsequent column one to the right on
  replay (CJK / emoji round-trip).
- Make `named_color_sgr` actually return `None` for semantic defaults
  (Foreground / Background / Cursor / Bright|DimForeground) — the
  surrounding reset already covers them, and avoiding the redundant
  `;39` / `;49` shrinks the byte stream and removes dead `None`-handling.
- Fix the misleading comment claiming the worker-exit handler reaps
  `pending_snapshots` (the deadline sweep in `reap_tick` is the only
  cleanup; reference issue #871 for the planned generalisation).
- Strip local absolute paths (`/Users/will/.../worktrees/…`) leaked into
  the new trajectory JSON.
- Add route tests for the 409 / 504 mappings and a wide-char round-trip
  + `named_color_sgr` defaults test for snapshot.

`cargo fmt` / `cargo clippy --all-targets --all-features -- -D warnings`
/ `cargo test` all clean; broker suite now 604 passed (was 600).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@willwashburn
Copy link
Copy Markdown
Member Author

Pushed d771a6e3 addressing the review feedback. Recap by reviewer:

@coderabbitai

  • (Major) Restrict autodiscovery to .agent-relay/connection.json — Fixed. discover_broker_connection no longer probes the bare cwd, so a stray connection.json in a repo root can't silently redirect the request or leak the broker API key.
  • (Major) Reject snapshot for non-PTY workers up front — Fixed. ListenApiRequest::Snapshot now matches on handle.spec.runtime: missing → 404 agent_not_found, headless → 409 unsupported_runtime, pty → forwarded. No more 5-second wait into a misleading snapshot_timeout.
  • ⏭️ (Major) Trajectory projectId absolute path — Fixed alongside cubic's identical finding below.

@chatgpt-codex-connector

  • (P2) Preserve API-key fallback when --broker-url is overridden — Fixed. Both the --broker-url and RELAY_BROKER_URL paths now route through a shared resolve_api_key closure that still consults .agent-relay/connection.json after --api-key and RELAY_BROKER_API_KEY are unset. No more accidental 401s when the URL is overridden but the key lives in connection.json.
  • (P2) snapshot_timeout mis-classified as 400 — Fixed. classify_error now maps snapshot_timeout to 504 Gateway Timeout (worker stalled / died is a server-side fault) and unsupported_runtime to 409 Conflict. Two new tests in listen_api.rs lock in both mappings.

@cubic-dev-ai

  • (P2) projectId leaks /Users/will/... absolute path — Fixed. Trajectory JSON now uses "projectId": "relay" per the suggested edit.
  • (P2) .trajectories/index.json entry pointing into the temp worktree — Fixed. Replaced with a repo-relative path matching the prevailing style for other recent entries.
  • (P2) Comment lies about pending_snapshots cleanup — Fixed. Updated the comment to point at the reap_tick deadline sweep (the actual mechanism) and reference broker: unify worker request/response correlation (factor out the pending_snapshots pattern) #871 for the planned generalisation of the request/response correlation pattern.
  • (P2) Snapshot vs. headless worker — Same fix as the CodeRabbit Major above.
  • (P2) cwd.clone() in search_roots — Same fix as the CodeRabbit Major above.
  • (P2) snapshot_timeout → 5xx — Same fix as Codex's P2 above (504).
  • (P2) --broker-url skips API-key fallback — Same fix as Codex's P2 above.
  • (P2) Wide-char spacer cells emitted in to_ansi — Fixed. snapshot::to_ansi now skips cells carrying Flags::WIDE_CHAR_SPACER, mirroring what alacritty's own Term::line_content does. Without this, a wide glyph (CJK / emoji) pushed every subsequent column one to the right on replay. New test ansi_skips_wide_char_spacer_so_layout_round_trips round-trips "文X" through a fresh Term to lock it down.
  • (P2) named_color_sgr dead None branch — Fixed. The function now actually returns None for semantic defaults (Foreground / Background / Cursor / Bright|DimForeground). The surrounding \x1b[0 reset covers those cases, so we save the redundant ;39 / ;49 bytes and the caller's if let Some(code) is no longer dead code. New test named_color_sgr_returns_none_for_semantic_defaults.

Docs (reference-broker-api.md + .mdx) updated to document the new 409 / 504 status codes.

CI

  • The failing Preview Web (SST) check is a CloudFront account-cap on the AWS preview account (TooManyCachePolicies / EntityLimitExceeded creating WebKvStore), not anything in this PR. Every other check — Clippy, Format, Rust Tests on both runners, E2E, Stress, CodeQL, License, Secrets — is green.

cargo fmt / cargo clippy --all-targets --all-features -- -D warnings / cargo test all clean. Full broker suite: 604 passed, 0 failed, 2 ignored (was 600 + 4 new tests for the fixes above).

@chatgpt-codex-connector
Copy link
Copy Markdown

To use Codex here, create an environment for this repo.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 17, 2026

@willwashburn Excellent work — every item in the recap is addressed cleanly and the test count confirms the new coverage landed. A few highlights worth calling out:

  • The unsupported_runtime → 409 / snapshot_timeout → 504 HTTP status mapping is now semantically correct and locked in with dedicated tests.
  • The wide-char spacer skip in to_ansi is an easy thing to miss but would have caused subtle layout corruption on CJK/emoji content — glad it has a round-trip test.
  • The named_color_sgr None fix eliminates dead code and trims unnecessary SGR bytes in the common case.

The Preview Web (SST) CloudFront cap is clearly an infra-level issue unrelated to this PR. Everything else is green.

🐇 ✨

✅ Actions performed

Comments resolved. Approval is disabled; enable reviews.request_changes_workflow to allow explicit top-level @coderabbitai resolve or @coderabbitai approve commands.

Copy link
Copy Markdown

@cubic-dev-ai cubic-dev-ai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1 issue found across 7 files (changes from recent commits).

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="src/main.rs">

<violation number="1" location="src/main.rs:854">
P2: The `current_dir().ok()?` early-returns `None` from the closure even when `state_dir` is `Some(...)` and `cwd` isn't needed. Move the `current_dir()` call into the `None` arm so that an explicit `state_dir` isn't silently ignored when cwd is unavailable.</violation>
</file>

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.
Fix all with cubic | Re-trigger cubic

Comment thread src/main.rs
Comment on lines +854 to +858
let cwd = std::env::current_dir().ok()?;
let roots: Vec<PathBuf> = match state_dir {
Some(dir) => vec![dir.to_path_buf()],
None => vec![cwd.join(".agent-relay")],
};
Copy link
Copy Markdown

@cubic-dev-ai cubic-dev-ai Bot May 17, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2: The current_dir().ok()? early-returns None from the closure even when state_dir is Some(...) and cwd isn't needed. Move the current_dir() call into the None arm so that an explicit state_dir isn't silently ignored when cwd is unavailable.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At src/main.rs, line 854:

<comment>The `current_dir().ok()?` early-returns `None` from the closure even when `state_dir` is `Some(...)` and `cwd` isn't needed. Move the `current_dir()` call into the `None` arm so that an explicit `state_dir` isn't silently ignored when cwd is unavailable.</comment>

<file context>
@@ -832,22 +832,58 @@ struct BrokerConnection {
+    // pull out a stored `api_key`. Lets `--broker-url` reuse the broker's
+    // saved key when the env var and `--api-key` are both unset.
+    let api_key_from_connection_file = || -> Option<String> {
+        let cwd = std::env::current_dir().ok()?;
+        let roots: Vec<PathBuf> = match state_dir {
+            Some(dir) => vec![dir.to_path_buf()],
</file context>
Suggested change
let cwd = std::env::current_dir().ok()?;
let roots: Vec<PathBuf> = match state_dir {
Some(dir) => vec![dir.to_path_buf()],
None => vec![cwd.join(".agent-relay")],
};
let roots: Vec<PathBuf> = match state_dir {
Some(dir) => vec![dir.to_path_buf()],
None => {
let cwd = std::env::current_dir().ok()?;
vec![cwd.join(".agent-relay")]
}
};
Fix with Cubic

`/docs/*.md` is a legacy mirror of the public site that's gone stale
— only 5 pages survive there vs 40+ in `web/content/docs/*.mdx`.
New pages don't belong in it. The `.claude/rules/docs-sync.md` rule
that asked for the mirror is being updated separately.

This drops the `docs/reference-broker-api.md` file PR #870 added; the
canonical reference at `web/content/docs/reference-broker-api.mdx`
is untouched and already covers the snapshot endpoint.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The previous version of `web/content/docs/reference-broker-api.mdx`
on this branch only contained the new Snapshots section — it
assumed the full broker API reference (which exists on PR #865's
branch as 247 lines covering auth, routes, /ws, error envelope)
was already on main. It isn't.

If this PR merges first, the published docs page would have only a
header + the snapshot section, missing the rest of the API surface.

Folds PR #865's full reference content in and slots the snapshot
section under "Routes" between "PTY interaction" and "Event stream"
(promoted from `##` to `###` to match the section's nesting). Adds
a snapshot step to the end-to-end worked example so the new
endpoint is exercised there too.

Also brings the `.claude/rules/docs-sync.md` rewrite from PR #865
onto this branch — the old "mirror to /docs/*.md" rule was what
sent the first version of this doc down the wrong path.

Either merge order now works: if PR #865 lands first there's a
trivial conflict on the duplicated base content; if this PR lands
first the doc is already complete.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@web/content/docs/reference-broker-api.mdx`:
- Line 230: Update the docs for the `--state-dir` flag: remove the “(or the
current directory)” phrase and clarify that broker discovery looks for
`.agent-relay/connection.json` under the working directory by default unless an
explicit `--state-dir` is provided; update the table entry that currently reads
"`--state-dir`  | Directory containing `connection.json` when discovering the
broker. Defaults to `.agent-relay/` (or the current directory).`" to instead
state the default as `.agent-relay/connection.json` under the working directory
and mention that providing `--state-dir` overrides this behavior.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 7eb9ab3e-8387-48f1-af62-0076d9120bb2

📥 Commits

Reviewing files that changed from the base of the PR and between d771a6e and 3ed3530.

📒 Files selected for processing (2)
  • .claude/rules/docs-sync.md
  • web/content/docs/reference-broker-api.mdx

| `--format` | `plain` (default) or `ansi`. `plain` writes the rendered screen as UTF-8; `ansi` writes the decoded reproduction bytes. |
| `--broker-url` | Override the broker base URL. Falls back to `RELAY_BROKER_URL`, then to `.agent-relay/connection.json` in the current directory. |
| `--api-key` | Override the API key. Falls back to `RELAY_BROKER_API_KEY`, then to the value in `.agent-relay/connection.json`. |
| `--state-dir` | Directory containing `connection.json` when discovering the broker. Defaults to `.agent-relay/` (or the current directory). |
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Clarify --state-dir default to avoid implying bare-CWD probing.

Line 230 currently says the default is .agent-relay/ “(or the current directory)”, which conflicts with the tightened autodiscovery behavior. Please remove the “or the current directory” phrasing and state that discovery uses .agent-relay/connection.json under the working directory unless --state-dir is explicitly provided.

Suggested doc tweak
-| `--state-dir`  | Directory containing `connection.json` when discovering the broker. Defaults to `.agent-relay/` (or the current directory).      |
+| `--state-dir`  | Directory containing `connection.json` when discovering the broker. Defaults to `.agent-relay/` under the current working directory. |
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
| `--state-dir` | Directory containing `connection.json` when discovering the broker. Defaults to `.agent-relay/` (or the current directory). |
| `--state-dir` | Directory containing `connection.json` when discovering the broker. Defaults to `.agent-relay/` under the current working directory. |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@web/content/docs/reference-broker-api.mdx` at line 230, Update the docs for
the `--state-dir` flag: remove the “(or the current directory)” phrase and
clarify that broker discovery looks for `.agent-relay/connection.json` under the
working directory by default unless an explicit `--state-dir` is provided;
update the table entry that currently reads "`--state-dir`  | Directory
containing `connection.json` when discovering the broker. Defaults to
`.agent-relay/` (or the current directory).`" to instead state the default as
`.agent-relay/connection.json` under the working directory and mention that
providing `--state-dir` overrides this behavior.

…p-pty

# Conflicts:
#	web/content/docs/reference-broker-api.mdx
#	web/lib/docs-nav.ts
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

broker: snapshot module + agent-relay-broker dump-pty debug command

1 participant