
CLI: send screenshots from studio code to Telegram remote sessions #3272

Merged
gcsecsey merged 10 commits into trunk from gcsecsey/screenshot-support
May 1, 2026

Conversation

@gcsecsey
Contributor

@gcsecsey gcsecsey commented Apr 28, 2026

Related issues

How AI was used in this PR

Claude wrote the bulk of the implementation and the tests. I reviewed and tested the changes end-to-end in my sandbox using the companion PR.

Proposed Changes

The Telegram remote-session bridge is currently text-only. When the agent finishes a visible task, the user gets a prose summary but no image. This PR lets the local agent deliver screenshots inline:

  • New share_screenshot tool. Captures a 16:9 above-the-fold view of a URL by default and emits a media.share JSON event. fullPage: true is opt-in for the rare case where the user wants the whole scroll length. take_screenshot stays unchanged as the model-internal reasoning tool.
  • Remote-session controller (turn-runner + poll-loop) collects media.share events from the spawned studio code --json child and posts each photo before the text reply.
  • respondMessage now picks transport based on payload. Photo present means multipart/form-data with a photo file part (matches the wpcom contract); text-only stays on the existing JSON path.
  • Spawned child gets STUDIO_REMOTE_SESSION=1 so the system prompt knows to keep replies short, deliver visible work via share_screenshot, follow up with a "Want me to publish this as a preview site?" line, and stop fabricating "gist stored / preview link saved" epilogues that aren't backed by any actual storage.
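The payload-based transport choice described for respondMessage can be sketched roughly as follows. This is a minimal illustration, not the PR's code: the `RespondInput` shape, the `buildRespondBody` name, the `caption` and `text` field names, and the `screenshot.png` filename are all assumptions. Only the `photo` part name and the multipart-versus-JSON split come from the description above.

```typescript
// Hypothetical input shape; only the `photo` part name comes from the PR text.
interface RespondInput {
  text?: string;
  photoBase64?: string;
  mimeType?: string;
}

type RespondBody =
  | { kind: 'multipart'; body: FormData }
  | { kind: 'json'; body: string };

// Photo present: multipart/form-data with a `photo` file part.
// Text only: stay on the JSON path.
function buildRespondBody(input: RespondInput): RespondBody {
  if (input.photoBase64) {
    // Decode base64 into raw bytes for the file part.
    const bytes = Uint8Array.from(atob(input.photoBase64), (ch) => ch.charCodeAt(0));
    const form = new FormData();
    form.append(
      'photo',
      new Blob([bytes], { type: input.mimeType ?? 'image/png' }),
      'screenshot.png' // assumed filename
    );
    if (input.text) {
      form.append('caption', input.text); // assumed field name
    }
    return { kind: 'multipart', body: form };
  }
  return { kind: 'json', body: JSON.stringify({ text: input.text ?? '' }) };
}
```

When a `FormData` body is passed to `fetch`, the runtime sets the multipart boundary header itself, so a caller would normally not set `Content-Type` by hand on that branch.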

Testing Instructions

Prerequisites

  • Be an Automattician (backend gates on is_automattician())
  • Be logged in via studio auth login so the bearer falls through from ~/.studio/shared.json
  • Follow the testing steps on 213611-ghe-Automattic/wpcom to apply the backend changes on your sandbox

Testing

  • Build the CLI: STUDIO_ENABLE_REMOTE_SESSION=true npm run cli:build
  • Start the bridge: STUDIO_ENABLE_REMOTE_SESSION=true node apps/cli/dist/cli/main.mjs code --remote-session
  • In a second terminal, tail the log: tail -F ~/.studio/remote-session.log
  • From Telegram, send: "send to my local agent: take a screenshot of my site and show me"
  • Check in Telegram:
    • A 1280x720 above-the-fold screenshot arrives inline
    • A follow-up text message asks about publishing a preview site
  • Test for the text-only regressions by sending a non-visual request like "list my local sites"
  • Ask explicitly for a full page screenshot and confirm share_screenshot is called with fullPage: true and the long capture is delivered

Pre-merge Checklist

  • Have you checked for TypeScript, React or other console errors?

…ions

Adds a `share_screenshot` tool that captures a 16:9 above-the-fold view of
a URL and emits a `media.share` JSON event the remote-session controller
forwards to Telegram via the existing `/local-agent-respond` endpoint as
multipart/form-data with a `photo` part. The agent uses this to deliver
visible results back to the user; `take_screenshot` stays internal for
visual reasoning.

Also threads `STUDIO_REMOTE_SESSION=1` to the spawned child so the system
prompt can favor short, visual replies and steers the agent away from
fabricating "gist stored / preview link saved" epilogues.
@gcsecsey gcsecsey changed the title apps/cli: send screenshots from studio code to Telegram remote sessions CLI: send screenshots from studio code to Telegram remote sessions Apr 28, 2026
@gcsecsey gcsecsey requested a review from Copilot April 29, 2026 10:06
Contributor

Copilot AI left a comment


Pull request overview

This PR extends the existing Telegram remote-session bridge for studio code to support inline screenshot delivery by introducing a new user-facing screenshot tool and a new JSON event type that the remote-session controller forwards as Telegram photos.

Changes:

  • Add a new share_screenshot tool that emits a media.share JSON event (plus returns the image to the agent) for user-visible screenshot delivery.
  • Extend the remote-session controller to collect media.share events and post photos before the text reply.
  • Update Telegram response transport to use multipart/form-data when a photo is present (JSON for text-only), plus add remote-session-specific system prompt guidance toggled by STUDIO_REMOTE_SESSION=1.

Reviewed changes

Copilot reviewed 13 out of 13 changed files in this pull request and generated 2 comments.

Show a summary per file
| File | Description |
| --- | --- |
| tools/common/ai/tools.ts | Adds display name + URL detail extraction for the new share_screenshot tool. |
| tools/common/ai/json-events.ts | Introduces MediaShareEvent and extends JsonEvent union with media.share. |
| apps/cli/ai/tools.ts | Implements share_screenshot, refactors screenshot capture into captureScreenshotPng, and registers the new tool. |
| apps/cli/ai/system-prompt.ts | Adds Telegram remote-session guidance addendum (including share_screenshot usage expectations). |
| apps/cli/ai/agent.ts | Enables the remote-session system prompt addendum when STUDIO_REMOTE_SESSION=1. |
| apps/cli/remote-session/turn-runner.ts | Collects media.share events from the subprocess and returns them in TurnOutcome. |
| apps/cli/remote-session/poll-loop.ts | Posts collected media shares to Telegram before posting the text reply; avoids “no result” warning when media exists. |
| apps/cli/remote-session/telegram-client.ts | Updates respondMessage to support multipart photo uploads + caption; logs partial failures without throwing. |
| apps/cli/remote-session/tests/* | Adds/updates unit tests for media collection, ordering, and multipart photo transport behavior. |
| apps/cli/remote-session/tests/fixtures/mock-studio-code.mjs | Adds a media-share fixture scenario emitting media.share events. |
| apps/cli/ai/tests/system-prompt.test.ts | New tests verifying remote-session prompt addendum is included/excluded appropriately. |


Comment thread apps/cli/ai/tools.ts Outdated
Comment thread apps/cli/remote-session/turn-runner.ts Outdated
gcsecsey and others added 2 commits April 29, 2026 13:40
- Skip the full-document scroll in `captureScreenshotPng` when
  `fullPage: false` and only wait on images intersecting the first
  viewport, so above-the-fold `share_screenshot` captures stay quick on
  long pages.
- Rename the `bytes` field on the `media.share` debug log to
  `base64_chars` since it's the base64 character count, not the decoded
  byte length.

Both fixes from Copilot review on PR #3272.
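The `bytes` to `base64_chars` rename matters because the two numbers differ by roughly a third. A small sketch of the relationship, assuming standard padded base64 (the `decodedByteLength` helper is hypothetical and not part of the PR):

```typescript
// Decoded byte length of a padded base64 string: every 4 characters carry
// 3 bytes, minus one byte per trailing '=' padding character.
function decodedByteLength(base64: string): number {
  const padding = base64.endsWith('==') ? 2 : base64.endsWith('=') ? 1 : 0;
  return (base64.length / 4) * 3 - padding;
}

const sample = 'aGVsbG8='; // base64 of the 5-byte ASCII string "hello"

// What the renamed log field reports versus what `bytes` would have implied.
const logFields = {
  base64_chars: sample.length,
  decoded_bytes: decodedByteLength(sample),
};
```

Here `base64_chars` is 8 while the decoded payload is 5 bytes, which is why labelling the character count as `bytes` was misleading.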
@gcsecsey gcsecsey marked this pull request as ready for review April 29, 2026 13:46
@wpmobilebot
Collaborator

wpmobilebot commented Apr 29, 2026

📊 Performance Test Results

Comparing 7a331d0 vs trunk

app-size

| Metric | trunk | 7a331d0 | Diff | Change |
| --- | --- | --- | --- | --- |
| App Size (Mac) | 1511.23 MB | 1511.29 MB | +0.06 MB | ⚪ 0.0% |

site-editor

| Metric | trunk | 7a331d0 | Diff | Change |
| --- | --- | --- | --- | --- |
| load | 1478 ms | 1775 ms | +297 ms | 🔴 20.1% |

site-startup

| Metric | trunk | 7a331d0 | Diff | Change |
| --- | --- | --- | --- | --- |
| siteCreation | 8102 ms | 8093 ms | -9 ms | ⚪ 0.0% |
| siteStartup | 4953 ms | 4960 ms | +7 ms | ⚪ 0.0% |

Results are median values from multiple test runs.

Legend: 🟢 Improvement (faster) | 🔴 Regression (slower) | ⚪ No change (<50ms diff)

…-time delivery

- `share_screenshot` returns a confirmation string instead of the image
  bytes, so the model doesn't waste tokens reasoning over a screenshot it
  just sent. Tool description and remote-session system prompt now lead
  with the fire-and-forget framing and mandate calling the tool after any
  visible change.
- New `MediaStreamer` posts photos to Telegram as soon as the
  `media.share` event arrives, in parallel with the model writing its
  follow-up text. Drains in-flight POSTs at turn end so the text reply
  still lands after the photo. Removes the post-turn collection step and
  the `mediaShares` field on `TurnOutcome` — turn-runner just forwards
  events now.
- Truncate captions to 1024 chars at the client to match the wpcom
  endpoint's hard limit (matches Automattic/wpcom#213611).
- Compact the remote-session system prompt addendum.
- Drop `system-prompt.test.ts`; add `media-streamer.test.ts` covering
  serial ordering, failure isolation, and event filtering.
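The serial ordering, failure isolation, and drain behaviour that the new tests cover could look roughly like the sketch below. It is an illustration under assumptions, not the actual `MediaStreamer`: the `SerialPhotoQueue` name, the `MediaShare` shape, and the injected `post` callback are hypothetical. The `posted` and `failed` counters mirror the summary fields that appear in the poll-loop log.

```typescript
// Hypothetical payload; field names follow the `media.share` event described above.
type MediaShare = { dataBase64: string; mimeType: string; caption?: string };
type PostPhoto = (share: MediaShare) => Promise<void>;

class SerialPhotoQueue {
  private tail: Promise<void> = Promise.resolve();
  private posted = 0;
  private failed = 0;
  private readonly post: PostPhoto;

  constructor(post: PostPhoto) {
    this.post = post;
  }

  // Chain each upload onto the previous one so photos arrive in event order.
  enqueue(share: MediaShare): void {
    this.tail = this.tail.then(async () => {
      try {
        await this.post(share);
        this.posted += 1;
      } catch {
        // A failed upload is counted but never blocks later photos.
        this.failed += 1;
      }
    });
  }

  // Resolve once every queued upload has settled.
  async drain(): Promise<{ posted: number; failed: number }> {
    await this.tail;
    return { posted: this.posted, failed: this.failed };
  }
}
```

Because each upload is chained rather than started immediately, the queue trades some throughput for predictable chat order, which matches the stated goal of landing the text reply after the photo.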
Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 14 out of 14 changed files in this pull request and generated 2 comments.

Comments suppressed due to low confidence (1)

apps/cli/remote-session/poll-loop.ts:190

  • handleTurn() always awaits mediaStreamer.drain() before checking signal.aborted. This means a user-initiated detach can be delayed by in-flight photo uploads (and MediaStreamer doesn’t pass the abort signal into respondMessage, so those POSTs can’t be cancelled). Consider short-circuiting the drain when signal.aborted/timeout, or threading the abort signal into MediaStreamer so pending uploads are cancelled and detach can complete promptly.
	// Wait for any in-flight photos to finish posting so a text reply that
	// follows them lands in chat order, even if the photo POST is still
	// running when the turn ends.
	const mediaSummary = await mediaStreamer.drain();

	if ( outcome.sessionId && outcome.sessionId !== sessionId ) {
		await deps.writeSession( target.chatId, outcome.sessionId );
	}

	const duration = Date.now() - started;
	deps.logger.info( 'Turn finished', {
		chat_id: target.chatId,
		duration_ms: duration,
		status: outcome.status,
		exit_code: outcome.exitCode,
		chars_out: outcome.replyText?.length ?? 0,
		session_id: outcome.sessionId,
		aborted: signal.aborted,
		media_posted: mediaSummary.posted,
		media_failed: mediaSummary.failed,
	} );

	// Detach was requested mid-turn. Skip posting any reply — the detach flow
	// will announce "🔴 Local agent detached." on its own.
	if ( signal.aborted ) {
		return;
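
One way to read the short-circuit suggestion in the suppressed comment above is to stop waiting on the drain as soon as the abort signal fires. The sketch below is a hypothetical helper (`drainUnlessAborted` is not in the PR), and it only stops waiting: it does not cancel in-flight uploads, which would still require threading the signal into the POSTs themselves.

```typescript
// Resolve with the drain result, or with undefined as soon as the signal aborts.
function drainUnlessAborted<T>(
  drain: Promise<T>,
  signal: AbortSignal
): Promise<T | undefined> {
  if (signal.aborted) {
    return Promise.resolve(undefined);
  }
  return new Promise<T | undefined>((resolve, reject) => {
    const onAbort = () => resolve(undefined);
    signal.addEventListener('abort', onAbort, { once: true });
    drain.then(
      (value) => {
        signal.removeEventListener('abort', onAbort);
        resolve(value);
      },
      (error) => {
        signal.removeEventListener('abort', onAbort);
        reject(error);
      }
    );
  });
}
```

A caller would treat an `undefined` result as "detach won the race" and skip both the summary logging of media counts and the text reply.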


Comment thread apps/cli/remote-session/media-streamer.ts
Comment thread apps/cli/remote-session/telegram-client.ts Outdated
Pass `deviceScaleFactor: 2` to Playwright for `share_screenshot` so the
captured PNG has retina pixel density (2560x1440 raw on the desktop
viewport) without changing the CSS layout the page renders against.
Telegram compresses on display, but starting from 2x means noticeably
more detail survives — especially text and icons on desktop clients.

`take_screenshot` keeps the default 1x DPR — the model doesn't benefit
from retina rendering for visual reasoning.
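
The relationship between the CSS viewport and the raw capture size can be checked with a little arithmetic. In this sketch the constant and helper names are hypothetical; the 1280x720 viewport and the factor of 2 come from the PR text. The options object has the shape that Playwright's `browser.newContext()` accepts (`viewport` and `deviceScaleFactor`), but no Playwright call is made here.

```typescript
interface Viewport {
  width: number;
  height: number;
}

// CSS layout size the page renders against (16:9, above the fold).
const SHARE_VIEWPORT: Viewport = { width: 1280, height: 720 };

// Options in the shape of Playwright's browser.newContext() argument.
const shareContextOptions = {
  viewport: SHARE_VIEWPORT,
  deviceScaleFactor: 2,
};

// Raw pixel dimensions of the PNG: CSS size multiplied by the scale factor.
function rawPixelSize(viewport: Viewport, deviceScaleFactor: number): Viewport {
  return {
    width: viewport.width * deviceScaleFactor,
    height: viewport.height * deviceScaleFactor,
  };
}

const rawSize = rawPixelSize(
  shareContextOptions.viewport,
  shareContextOptions.deviceScaleFactor
);
```

With these values `rawSize` is 2560 by 1440, while media queries and layout still see a 1280 by 720 viewport.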
- `MediaStreamer.onEvent` now validates the `media.share` event shape
  before using fields. A malformed line (missing/wrong-typed
  `dataBase64` or `mimeType`) is logged and skipped instead of throwing
  out of the readline `'line'` listener and risking the poll loop.
- `respondMessage` normalizes empty-string `text` and `photo` to
  "absent" up front so the early guard, body-builder branch, and debug
  log all agree on what counts as present. No realistic input path
  produces empty strings today, but the inconsistency was misleading.

Both fixes from Copilot review on PR #3272.
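
A shape check of the kind described for `MediaStreamer.onEvent` might look like the following type guard. This is a sketch: the `type` discriminator and the optional `caption` field are assumptions about the event layout, while `dataBase64` and `mimeType` are the fields named in the commit message. `parseMediaShareLine` is a hypothetical wrapper.

```typescript
interface MediaShareEvent {
  type: 'media.share';
  dataBase64: string;
  mimeType: string;
  caption?: string; // assumed optional field
}

// Narrow an unknown parsed JSON value to a well-formed media.share event.
function isMediaShareEvent(value: unknown): value is MediaShareEvent {
  if (typeof value !== 'object' || value === null) {
    return false;
  }
  const candidate = value as Record<string, unknown>;
  return (
    candidate.type === 'media.share' &&
    typeof candidate.dataBase64 === 'string' &&
    typeof candidate.mimeType === 'string'
  );
}

// Parse one JSON line; malformed or unrelated lines yield undefined instead of throwing.
function parseMediaShareLine(line: string): MediaShareEvent | undefined {
  try {
    const parsed: unknown = JSON.parse(line);
    return isMediaShareEvent(parsed) ? parsed : undefined;
  } catch {
    return undefined;
  }
}
```

Returning `undefined` keeps the failure local to the one bad line, so a listener can log it and carry on rather than letting an exception escape.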
@gcsecsey gcsecsey requested a review from epeicher April 30, 2026 14:37
The tool is only useful when the remote-session controller drives the
agent over Telegram, and its system-prompt guidance is already gated by
that flow. Excluding it from the toolset for everyone else keeps the
default tool surface focused on what the prompt actually documents.
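
Gating a tool out of the default toolset can be sketched as an exclusion-set filter. Everything here apart from the `share_screenshot` tool name is hypothetical: the `resolveToolNames` function, the `remoteSessionEnabled` option, and the example tool list are for illustration only.

```typescript
// Return the tool names to expose, dropping share_screenshot unless the
// remote-session feature is enabled.
function resolveToolNames(
  allTools: readonly string[],
  options: { remoteSessionEnabled: boolean }
): string[] {
  const excluded = new Set<string>();
  if (!options.remoteSessionEnabled) {
    excluded.add('share_screenshot');
  }
  return allTools.filter((name) => !excluded.has(name));
}

const exampleTools = ['take_screenshot', 'share_screenshot']; // illustrative list
const defaultSurface = resolveToolNames(exampleTools, { remoteSessionEnabled: false });
const remoteSurface = resolveToolNames(exampleTools, { remoteSessionEnabled: true });
```

Collecting exclusions in one set means further conditionally hidden tools can be added without touching the filter itself.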
Contributor

@epeicher epeicher left a comment


Thanks for this @gcsecsey! I have tested it, and it works great 🙌 I have also added a commit to gate the new tool mcp__studio__share_screenshot behind the existing feature flag so it is not exposed until we release the changes.

Feature flag enabled Feature flag disabled
Image Image

I also want to share my testing here: it works like a charm!

Image

@gcsecsey gcsecsey enabled auto-merge (squash) April 30, 2026 19:31
- `media-streamer.test.ts`: `RemoteSessionLogger`'s constructor signature
  changed in trunk (#3282) from `(logPath: string)` to
  `(options: RemoteSessionLoggerOptions)`. Update the five test
  call sites to use `{ logPath: '/dev/null' }`.
- `top-bar.tsx`: drop unused `useEffect`, `useRef`, `useAppDispatch`,
  and `openWapuuWorld` imports (leftover from #3270).
@gcsecsey gcsecsey merged commit c7dbc6f into trunk May 1, 2026
10 checks passed
@gcsecsey gcsecsey deleted the gcsecsey/screenshot-support branch May 1, 2026 10:10
youknowriad added a commit that referenced this pull request May 4, 2026
Brings in 53 commits since the last merge. Conflict resolution + ports:

- `apps/cli/ai/agent.ts`: kept the dual-runtime dispatch from this branch;
  absorbed trunk's drop of `maxTurns` (no longer destructured, no longer
  threaded to runtimes). Body contention with #3272's inline `query()`
  changes (mcpServers, remoteSession, STUDIO_REMOTE_SESSION env detection)
  was resolved by deferring those to the Anthropic runtime layer where the
  `query()` call now lives.
- `apps/cli/ai/runtimes/anthropic/index.ts`: ported #3272's improvements —
  `STUDIO_REMOTE_SESSION` env → `remoteSession` flag → systemPromptOptions
  (the Telegram bridge sets this when it spawns `studio code --json`).
  Dropped maxTurns from query() options to match trunk.
- `apps/cli/ai/runtimes/types.ts`, `apps/cli/ai/runtimes/openai/index.ts`,
  tests: dropped maxTurns from `AgentRuntimeConfig` and all callers.
- `apps/cli/ai/tools.ts` deleted on our side, modified on trunk: deleted
  in resolution. Trunk's #3272 additions were ported into our split-tools
  layout:
    - `apps/cli/ai/tools/screenshot-helpers.ts` (new): shared
      `captureScreenshotPng` + viewport constants.
    - `apps/cli/ai/tools/share-screenshot.ts` (new): the `share_screenshot`
      tool that emits a `media.share` event for the remote-session bridge.
    - `apps/cli/ai/tools/take-screenshot.ts`: refactored onto the shared
      helper; updated description to point at share_screenshot.
    - `apps/cli/ai/tools/index.ts`: registry includes shareScreenshotTool;
      `resolveStudioToolDefinitions` now uses an exclusion-set model so
      both preview-steering and (when not enabled) share_screenshot get
      filtered consistently. `createRemoteSiteTools` includes
      shareScreenshotTool when `isRemoteSessionEnabled()`.
- `apps/cli/ai/tools/utils.ts`: ported #3286's progress-update coalescing
  — `update: true` now overwrites the last progress message in
  `captureCommandOutput` instead of appending.
- `apps/cli/ai/tools/wp-cli.ts`: ported #3264's smarter `--post_content=`
  quote handling (flags after a quoted post_content are now parsed
  correctly) and the typographic-dash rejection that catches
  `‐porcelain` / `–color` etc. before they hit WP-CLI as silent garbage.
- `apps/ui/src/components/session-view/composer/index.tsx`: trunk
  converted Composer to `forwardRef` with a `ComposerHandle.appendDraft`
  imperative method (used by trunk's annotate-toolbar hand-off). Merged
  the forwardRef structure with our composer-owned cross-family swap
  state, dialog, and hooks. Kept trunk's Escape-to-interrupt keydown
  handler and the textareaRef.
- `apps/ui/src/components/session-view/index.tsx`: absorbed trunk's
  annotation handler (`handleAnnotationsDone`), the Annotation type
  import, and the SitePreview `onAnnotationsDone` prop. Dropped the
  unused-by-the-merged-body `useConnector` import.

Verified: typecheck has only the pre-existing #3320 (`@wp-playground` bump)
error in `apply-blueprint-form-values.ts` that's also present on plain
trunk; 1532 tests pass; lint clean; CLI builds.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
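
The typographic-dash rejection mentioned in the wp-cli.ts item above can be illustrated with a small check. This is a sketch under assumptions: the exact character set the CLI rejects is not stated in the commit message, so the range used here (U+2010 through U+2015 plus U+2212) and the `findTypographicDashArgs` name are hypothetical.

```typescript
// Arguments that begin with a Unicode dash look like flags to a human but are
// not parsed as flags, because flags start with ASCII '-' (U+002D).
const LEADING_TYPOGRAPHIC_DASH = /^[\u2010-\u2015\u2212]/;

function findTypographicDashArgs(args: readonly string[]): string[] {
  return args.filter((arg) => LEADING_TYPOGRAPHIC_DASH.test(arg));
}

const suspicious = findTypographicDashArgs([
  'post',
  'list',
  '\u2010porcelain', // hyphen (U+2010) instead of "--"
  '--format=json',
  '\u2013color', // en dash (U+2013) instead of "--"
]);
```

A caller could reject the command with a message listing `suspicious` rather than passing the arguments through unchanged.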
chubes4 added a commit that referenced this pull request May 4, 2026
After PR #3272 (CLI: send screenshots from studio code to Telegram
remote sessions, merged 2026-05-01) the codebase has two screenshot
tools that capture a URL: take_screenshot for agent-internal visual
reasoning, and share_screenshot for fire-and-forget delivery to the
user. This branch's resolveScreenshotUrl helper was originally only
applied to take_screenshot, but share_screenshot has the same
URL-resolution gap: agents that know a local Studio site by name
must already know its public URL to share a screenshot of it.

Apply the same nameOrPath/path schema and resolveScreenshotUrl call
to share_screenshot. Both tools now accept the same shape, and both
error out cleanly when neither url nor nameOrPath is provided.
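
The shared resolution behaviour could be sketched as below. This is not the branch's `resolveScreenshotUrl`: the function name, the injected `lookupSiteUrl` callback, the error messages, and the example site are all hypothetical. Only the `url`, `nameOrPath`, and `path` inputs and the "error when neither is provided" rule come from the commit message.

```typescript
interface ScreenshotTarget {
  url?: string;
  nameOrPath?: string;
  path?: string;
}

// Hypothetical lookup from a site name or folder to its base URL.
type SiteLookup = (nameOrPath: string) => string | undefined;

function resolveScreenshotUrlSketch(
  target: ScreenshotTarget,
  lookupSiteUrl: SiteLookup
): string {
  // An explicit URL always wins.
  if (target.url) {
    return target.url;
  }
  if (!target.nameOrPath) {
    throw new Error('Provide either url or nameOrPath');
  }
  const base = lookupSiteUrl(target.nameOrPath);
  if (!base) {
    throw new Error(`Unknown site: ${target.nameOrPath}`);
  }
  // Compose the optional path onto the site's base URL.
  return new URL(target.path ?? '/', base).toString();
}

const exampleLookup: SiteLookup = (name) =>
  name === 'my-site' ? 'http://localhost:8881' : undefined;
const resolved = resolveScreenshotUrlSketch(
  { nameOrPath: 'my-site', path: '/about' },
  exampleLookup
);
```

Keeping the lookup injectable lets both tools share one resolver and lets a test stub the site registry.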

Slim the test diff to a single regression test. The original branch
added four behavior tests plus a mockScreenshotBrowser helper, but
the regression we're guarding against is specifically 'agent passed
nameOrPath, screenshot resolved correctly'. The other cases (explicit
URL, path composition, error path) are nice-to-haves and can land in
follow-up coverage if needed; types and existing trunk tests cover
the gating and tool-list shape.

## AI assistance
- **AI assistance:** Yes
- **Tool(s):** Claude Code (Sonnet 4.5)
- **Used for:** Drafted the share_screenshot extension and slimmed
  the test diff under Chris's direction. Chris reviewed the scope
  trade-off (extend to both tools vs leave share_screenshot for
  later) and chose the unconditional both-tools fix per the
  fix-upstream-first rule.
4 participants