CLI: send screenshots from studio code to Telegram remote sessions #3272
…ions Adds a `share_screenshot` tool that captures a 16:9 above-the-fold view of a URL and emits a `media.share` JSON event the remote-session controller forwards to Telegram via the existing `/local-agent-respond` endpoint as multipart/form-data with a `photo` part. The agent uses this to deliver visible results back to the user; `take_screenshot` stays internal for visual reasoning. Also threads `STUDIO_REMOTE_SESSION=1` to the spawned child so the system prompt can favor short, visual replies and steers the agent away from fabricating "gist stored / preview link saved" epilogues.
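The following is a minimal sketch of what emitting a `media.share` event as a single JSON line could look like. It assumes a `type` discriminant shared with the other JSON events and a hypothetical optional `caption` field. Only `dataBase64` and `mimeType` are field names that appear elsewhere in this PR, and `formatMediaShareLine` is a made-up helper name.

```typescript
// Hypothetical shape of a media.share event. The `type` discriminant and
// `caption` are assumptions; dataBase64/mimeType follow the PR text.
interface MediaShareEvent {
  type: 'media.share';
  mimeType: string;
  dataBase64: string;
  caption?: string;
}

// Serialize one event as a newline-terminated JSON line, so a line-oriented
// reader (e.g. readline) in the controller can split events apart.
function formatMediaShareLine(png: Uint8Array, caption?: string): string {
  let binary = '';
  for (let i = 0; i < png.length; i++) {
    binary += String.fromCharCode(png[i]);
  }
  const event: MediaShareEvent = {
    type: 'media.share',
    mimeType: 'image/png',
    dataBase64: btoa(binary), // base64 of the raw PNG bytes
    ...(caption ? { caption } : {}),
  };
  return JSON.stringify(event) + '\n';
}

// In the spawned child this line would go to stdout, e.g.:
// process.stdout.write(formatMediaShareLine(pngBytes, 'Homepage'));
```

Keeping each event on one line means the controller never has to buffer partial JSON objects. It can parse each line independently and ignore anything that is not a `media.share` event.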
Pull request overview
This PR extends the existing Telegram remote-session bridge for studio code to support inline screenshot delivery by introducing a new user-facing screenshot tool and a new JSON event type that the remote-session controller forwards as Telegram photos.
Changes:
- Add a new `share_screenshot` tool that emits a `media.share` JSON event (plus returns the image to the agent) for user-visible screenshot delivery.
- Extend the remote-session controller to collect `media.share` events and post photos before the text reply.
- Update Telegram response transport to use `multipart/form-data` when a photo is present (JSON for text-only), plus add remote-session-specific system prompt guidance toggled by `STUDIO_REMOTE_SESSION=1`.
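Below is a sketch of the transport choice described in the list above. It assumes a hypothetical `RespondPayload` shape: with no photo the body stays JSON, and with a photo it becomes `FormData` carrying a `photo` file part. The `caption` field name and the `buildRespondBody` helper are assumptions and do not reproduce the actual `respondMessage` code. The sketch relies on the `FormData`, `Blob`, and `atob` globals available in Node 18+.

```typescript
// Hypothetical payload: text is optional, photo arrives as base64.
interface RespondPayload {
  text?: string;
  photo?: { dataBase64: string; mimeType: string };
}

type RespondRequest =
  | { kind: 'json'; headers: Record<string, string>; body: string }
  | { kind: 'multipart'; body: FormData };

function buildRespondBody(payload: RespondPayload): RespondRequest {
  if (!payload.photo) {
    // Text-only replies keep the plain JSON transport.
    return {
      kind: 'json',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ text: payload.text ?? '' }),
    };
  }
  // Decode base64 into raw bytes for the file part.
  const binary = atob(payload.photo.dataBase64);
  const bytes = Uint8Array.from(binary, (ch) => ch.charCodeAt(0));
  const form = new FormData();
  // The filename is arbitrary; the part name `photo` is what matters.
  form.append('photo', new Blob([bytes], { type: payload.photo.mimeType }), 'screenshot.png');
  // Text rides along as the caption (hypothetical field name).
  if (payload.text) form.append('caption', payload.text);
  // No Content-Type header: fetch sets multipart/form-data with a boundary.
  return { kind: 'multipart', body: form };
}
```

Leaving `Content-Type` unset in the multipart case lets `fetch` add the boundary parameter itself. Setting that header by hand is a common way to break multipart uploads.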
Reviewed changes
Copilot reviewed 13 out of 13 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| tools/common/ai/tools.ts | Adds display name + URL detail extraction for the new share_screenshot tool. |
| tools/common/ai/json-events.ts | Introduces MediaShareEvent and extends JsonEvent union with media.share. |
| apps/cli/ai/tools.ts | Implements share_screenshot, refactors screenshot capture into captureScreenshotPng, and registers the new tool. |
| apps/cli/ai/system-prompt.ts | Adds Telegram remote-session guidance addendum (including share_screenshot usage expectations). |
| apps/cli/ai/agent.ts | Enables the remote-session system prompt addendum when STUDIO_REMOTE_SESSION=1. |
| apps/cli/remote-session/turn-runner.ts | Collects media.share events from the subprocess and returns them in TurnOutcome. |
| apps/cli/remote-session/poll-loop.ts | Posts collected media shares to Telegram before posting the text reply; avoids “no result” warning when media exists. |
| apps/cli/remote-session/telegram-client.ts | Updates respondMessage to support multipart photo uploads + caption; logs partial failures without throwing. |
| apps/cli/remote-session/tests/* | Adds/updates unit tests for media collection, ordering, and multipart photo transport behavior. |
| apps/cli/remote-session/tests/fixtures/mock-studio-code.mjs | Adds a media-share fixture scenario emitting media.share events. |
| apps/cli/ai/tests/system-prompt.test.ts | New tests verifying remote-session prompt addendum is included/excluded appropriately. |
- Skip the full-document scroll in `captureScreenshotPng` when `fullPage: false` and only wait on images intersecting the first viewport, so above-the-fold `share_screenshot` captures stay quick on long pages.
- Rename the `bytes` field on the `media.share` debug log to `base64_chars`, since it holds the base64 character count, not the decoded byte length.
Both fixes from Copilot review on PR #3272.
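At its core, "only wait on images intersecting the first viewport" is a rectangle-overlap filter. This sketch keeps the rectangles as plain data so the logic stands on its own. In a real capture they would come from `getBoundingClientRect()` inside something like Playwright's `page.evaluate`, and `complete` mirrors `HTMLImageElement.complete`. All names here are hypothetical.

```typescript
// Viewport-relative box, as getBoundingClientRect() would report it.
interface BoxRect {
  top: number;
  bottom: number;
  left: number;
  right: number;
}

// True when the box overlaps the visible viewport area at all.
function intersectsViewport(rect: BoxRect, viewportWidth: number, viewportHeight: number): boolean {
  return rect.bottom > 0 && rect.top < viewportHeight && rect.right > 0 && rect.left < viewportWidth;
}

// Pick which still-loading images the capture should wait for:
// every one for a full-page shot, only above-the-fold ones otherwise.
function imagesToAwait<T extends { rect: BoxRect; complete: boolean }>(
  images: T[],
  fullPage: boolean,
  viewportWidth: number,
  viewportHeight: number,
): T[] {
  const pending = images.filter((img) => !img.complete);
  if (fullPage) return pending;
  return pending.filter((img) => intersectsViewport(img.rect, viewportWidth, viewportHeight));
}
```

On a long page this skips the lazy-loaded images far below the fold. Those images would otherwise force a scroll and a wait for pixels that never appear in a 16:9 capture.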
📊 Performance Test Results, comparing 7a331d0 vs trunk (metrics: app-size, site-editor, site-startup)
Results are median values from multiple test runs. Legend: 🟢 Improvement (faster) | 🔴 Regression (slower) | ⚪ No change (<50ms diff)
…-time delivery
- `share_screenshot` returns a confirmation string instead of the image bytes, so the model doesn't waste tokens reasoning over a screenshot it just sent. The tool description and remote-session system prompt now lead with the fire-and-forget framing and mandate calling the tool after any visible change.
- A new `MediaStreamer` posts photos to Telegram as soon as the `media.share` event arrives, in parallel with the model writing its follow-up text. It drains in-flight POSTs at turn end so the text reply still lands after the photo. This removes the post-turn collection step and the `mediaShares` field on `TurnOutcome`; turn-runner now just forwards events.
- Truncate captions to 1024 chars at the client to match the wpcom endpoint's hard limit (matches Automattic/wpcom#213611).
- Compact the remote-session system prompt addendum.
- Drop `system-prompt.test.ts`; add `media-streamer.test.ts` covering serial ordering, failure isolation, and event filtering.
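Here is a minimal sketch of the streaming behaviour this commit describes, assuming a caller-supplied `post` function that uploads one photo. Uploads start as events arrive and run one at a time in arrival order. A failure is counted without blocking later photos, and `drain()` waits for whatever is still in flight. `SerialMediaStreamer` is a hypothetical stand-in and is not the PR's `MediaStreamer`.

```typescript
interface MediaShare {
  dataBase64: string;
  mimeType: string;
  caption?: string;
}

// Caption limit from the commit message. Note: slice() counts UTF-16 code
// units, which may differ from how the endpoint counts characters.
const MAX_CAPTION_CHARS = 1024;

function truncateCaption(caption: string): string {
  return caption.length > MAX_CAPTION_CHARS ? caption.slice(0, MAX_CAPTION_CHARS) : caption;
}

class SerialMediaStreamer {
  // Each upload is chained onto the previous one, which keeps chat order.
  private chain: Promise<void> = Promise.resolve();
  private posted = 0;
  private failed = 0;

  constructor(private readonly post: (media: MediaShare) => Promise<void>) {}

  enqueue(media: MediaShare): void {
    const prepared =
      media.caption === undefined ? media : { ...media, caption: truncateCaption(media.caption) };
    this.chain = this.chain.then(async () => {
      try {
        await this.post(prepared);
        this.posted++;
      } catch {
        // Failure isolation: count it and keep the chain alive.
        this.failed++;
      }
    });
  }

  // Resolve once every queued upload has settled.
  async drain(): Promise<{ posted: number; failed: number }> {
    await this.chain;
    return { posted: this.posted, failed: this.failed };
  }
}
```

Chaining the uploads, instead of firing them all in parallel, is what keeps photos in event order even when an earlier upload is slower than a later one.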
Pull request overview
Copilot reviewed 14 out of 14 changed files in this pull request and generated 2 comments.
Comments suppressed due to low confidence (1)
apps/cli/remote-session/poll-loop.ts:190
`handleTurn()` always awaits `mediaStreamer.drain()` before checking `signal.aborted`. This means a user-initiated detach can be delayed by in-flight photo uploads (and `MediaStreamer` doesn't pass the abort `signal` into `respondMessage`, so those POSTs can't be cancelled). Consider short-circuiting the drain when `signal.aborted`/timeout, or threading the abort signal into `MediaStreamer` so pending uploads are cancelled and detach can complete promptly.
```ts
// Wait for any in-flight photos to finish posting so a text reply that
// follows them lands in chat order, even if the photo POST is still
// running when the turn ends.
const mediaSummary = await mediaStreamer.drain();
if ( outcome.sessionId && outcome.sessionId !== sessionId ) {
	await deps.writeSession( target.chatId, outcome.sessionId );
}
const duration = Date.now() - started;
deps.logger.info( 'Turn finished', {
	chat_id: target.chatId,
	duration_ms: duration,
	status: outcome.status,
	exit_code: outcome.exitCode,
	chars_out: outcome.replyText?.length ?? 0,
	session_id: outcome.sessionId,
	aborted: signal.aborted,
	media_posted: mediaSummary.posted,
	media_failed: mediaSummary.failed,
} );
// Detach was requested mid-turn. Skip posting any reply — the detach flow
// will announce "🔴 Local agent detached." on its own.
if ( signal.aborted ) {
	return;
}
```
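One way to act on the reviewer's first suggestion is to race the drain against the abort signal. This is a sketch with a hypothetical `drainOrAbort` helper. It stops `handleTurn()` from blocking on uploads after a detach, but the uploads themselves keep running unless the signal is also threaded into the underlying requests (the reviewer's second suggestion).

```typescript
// Resolve with the drain result, or with 'aborted' as soon as the signal
// fires, whichever comes first. Does NOT cancel the uploads themselves.
async function drainOrAbort<T>(drain: Promise<T>, signal: AbortSignal): Promise<T | 'aborted'> {
  // Already detached: don't wait at all.
  if (signal.aborted) return 'aborted';
  let resolveAborted!: (value: 'aborted') => void;
  const aborted = new Promise<'aborted'>((resolve) => {
    resolveAborted = resolve;
  });
  const onAbort = () => resolveAborted('aborted');
  signal.addEventListener('abort', onAbort, { once: true });
  try {
    // Whichever settles first wins: the finished drain or the abort.
    return await Promise.race([drain, aborted]);
  } finally {
    // Avoid leaking a listener when the drain wins the race.
    signal.removeEventListener('abort', onAbort);
  }
}
```

In the excerpt above this would replace the bare `await mediaStreamer.drain()`, and the caller would treat an `'aborted'` result the same way as `signal.aborted`.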
Pass `deviceScaleFactor: 2` to Playwright for `share_screenshot` so the captured PNG has retina pixel density (2560x1440 raw on the desktop viewport) without changing the CSS layout the page renders against. Telegram compresses on display, but starting from 2x means noticeably more detail survives — especially text and icons on desktop clients. `take_screenshot` keeps the default 1x DPR — the model doesn't benefit from retina rendering for visual reasoning.
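The numbers in this commit imply a 1280×720 CSS viewport (2560 / 2 by 1440 / 2), which is 16:9. The sketch below shows that arithmetic and includes the Playwright context options only in a comment. `DESKTOP_VIEWPORT` and `rawPixelSize` are illustrative names, and the viewport size is inferred from the commit's figures, not read from the code.

```typescript
interface Size {
  width: number;
  height: number;
}

// Inferred 16:9 desktop viewport in CSS pixels (assumption).
const DESKTOP_VIEWPORT: Size = { width: 1280, height: 720 };

// Raw PNG dimensions = CSS viewport dimensions x device scale factor.
function rawPixelSize(viewport: Size, deviceScaleFactor: number): Size {
  return {
    width: Math.round(viewport.width * deviceScaleFactor),
    height: Math.round(viewport.height * deviceScaleFactor),
  };
}

// With Playwright, the scale factor is a browser-context option, so the page
// still lays out against a 1280px-wide CSS viewport:
//   const context = await browser.newContext({ viewport: DESKTOP_VIEWPORT, deviceScaleFactor: 2 });
//   const page = await context.newPage();
//   await page.goto(url);
//   const png = await page.screenshot();
```

Because only the pixel density changes, breakpoints and layout match what a 1x capture would show. The extra pixels go into sharper text and icons.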
- `MediaStreamer.onEvent` now validates the `media.share` event shape before using fields. A malformed line (missing/wrong-typed `dataBase64` or `mimeType`) is logged and skipped instead of throwing out of the readline `'line'` listener and risking the poll loop.
- `respondMessage` normalizes empty-string `text` and `photo` to "absent" up front so the early guard, body-builder branch, and debug log all agree on what counts as present. No realistic input path produces empty strings today, but the inconsistency was misleading.
Both fixes from Copilot review on PR #3272.
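The two defensive fixes can be sketched as follows, using hypothetical helper names. A type guard rejects a malformed `media.share` line, leaving the caller to log and skip it, instead of letting it throw inside the readline `'line'` listener. A normalizer treats empty strings as absent.

```typescript
interface MediaShareEvent {
  type: 'media.share';
  dataBase64: string;
  mimeType: string;
}

// Type guard: only accept objects whose fields have the expected types.
function isMediaShareEvent(value: unknown): value is MediaShareEvent {
  if (typeof value !== 'object' || value === null) return false;
  const v = value as Record<string, unknown>;
  return v.type === 'media.share' && typeof v.dataBase64 === 'string' && typeof v.mimeType === 'string';
}

// Parse one stdout line. Invalid JSON or a malformed event yields null,
// so the line listener never throws; the caller logs and skips it.
function parseMediaShareLine(line: string): MediaShareEvent | null {
  try {
    const parsed: unknown = JSON.parse(line);
    return isMediaShareEvent(parsed) ? parsed : null;
  } catch {
    return null;
  }
}

// Treat '' the same as a missing value so every later check agrees.
function normalizeOptional(value: string | undefined): string | undefined {
  return value === undefined || value === '' ? undefined : value;
}
```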
The tool is only useful when the remote-session controller drives the agent over Telegram, and its system-prompt guidance is already gated by that flow. Excluding it from the toolset for everyone else keeps the default tool surface focused on what the prompt actually documents.
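The sketch below gates a tool out of the default tool list with an exclusion set. It assumes registry entries are keyed by the bare tool name (`share_screenshot` rather than the MCP-prefixed `mcp__studio__share_screenshot`). The feature-flag check is passed in as a boolean, and `resolveToolDefinitions` is a hypothetical name.

```typescript
interface ToolDefinition {
  name: string;
}

// Build the set of names to hide, then filter the full registry once.
function resolveToolDefinitions(
  all: ToolDefinition[],
  remoteSessionEnabled: boolean,
  extraExclusions: Iterable<string> = [],
): ToolDefinition[] {
  const excluded = new Set<string>(extraExclusions);
  // Only expose share_screenshot when the remote-session flow is enabled.
  if (!remoteSessionEnabled) excluded.add('share_screenshot');
  return all.filter((tool) => !excluded.has(tool.name));
}
```

With a single exclusion set, every reason to hide a tool goes through the same filter, so a new gate only adds a name instead of a new code path.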
epeicher left a comment
Thanks for this @gcsecsey! I have tested it, and it works great 🙌. I have also added a commit to gate the new `mcp__studio__share_screenshot` tool behind the existing feature flag so it is not exposed until we release the changes.
[Screenshots: feature flag enabled | feature flag disabled]
I also want to share my testing here; it works like a charm!
- `media-streamer.test.ts`: `RemoteSessionLogger`'s constructor signature changed in trunk (#3282) from `(logPath: string)` to `(options: RemoteSessionLoggerOptions)`. Update the five test call sites to use `{ logPath: '/dev/null' }`.
- `top-bar.tsx`: drop unused `useEffect`, `useRef`, `useAppDispatch`, and `openWapuuWorld` imports (leftover from #3270).
Brings in 53 commits since the last merge.
Conflict resolution + ports:
- `apps/cli/ai/agent.ts`: kept the dual-runtime dispatch from this branch; absorbed trunk's drop of `maxTurns` (no longer destructured, no longer threaded to runtimes). Body contention with #3272's inline `query()` changes (mcpServers, remoteSession, STUDIO_REMOTE_SESSION env detection) was resolved by deferring those to the Anthropic runtime layer where the `query()` call now lives.
- `apps/cli/ai/runtimes/anthropic/index.ts`: ported #3272's improvements (`STUDIO_REMOTE_SESSION` env → `remoteSession` flag → systemPromptOptions); the Telegram bridge sets this when it spawns `studio code --json`. Dropped maxTurns from query() options to match trunk.
- `apps/cli/ai/runtimes/types.ts`, `apps/cli/ai/runtimes/openai/index.ts`, tests: dropped maxTurns from `AgentRuntimeConfig` and all callers.
- `apps/cli/ai/tools.ts` deleted on our side, modified on trunk: deleted in resolution. Trunk's #3272 additions were ported into our split-tools layout:
  - `apps/cli/ai/tools/screenshot-helpers.ts` (new): shared `captureScreenshotPng` + viewport constants.
  - `apps/cli/ai/tools/share-screenshot.ts` (new): the `share_screenshot` tool that emits a `media.share` event for the remote-session bridge.
  - `apps/cli/ai/tools/take-screenshot.ts`: refactored onto the shared helper; updated description to point at share_screenshot.
  - `apps/cli/ai/tools/index.ts`: registry includes shareScreenshotTool; `resolveStudioToolDefinitions` now uses an exclusion-set model so both preview-steering and (when not enabled) share_screenshot get filtered consistently. `createRemoteSiteTools` includes shareScreenshotTool when `isRemoteSessionEnabled()`.
- `apps/cli/ai/tools/utils.ts`: ported #3286's progress-update coalescing, so `update: true` now overwrites the last progress message in `captureCommandOutput` instead of appending.
- `apps/cli/ai/tools/wp-cli.ts`: ported #3264's smarter `--post_content=` quote handling (flags after a quoted post_content are now parsed correctly) and the typographic-dash rejection that catches `‐porcelain` / `–color` etc. before they hit WP-CLI as silent garbage.
- `apps/ui/src/components/session-view/composer/index.tsx`: trunk converted Composer to `forwardRef` with a `ComposerHandle.appendDraft` imperative method (used by trunk's annotate-toolbar hand-off). Merged the forwardRef structure with our composer-owned cross-family swap state, dialog, and hooks. Kept trunk's Escape-to-interrupt keydown handler and the textareaRef.
- `apps/ui/src/components/session-view/index.tsx`: absorbed trunk's annotation handler (`handleAnnotationsDone`), the Annotation type import, and the SitePreview `onAnnotationsDone` prop. Dropped the `useConnector` import, which the merged body no longer uses.
Verified: typecheck has only the pre-existing #3320 (`@wp-playground` bump) error in `apply-blueprint-form-values.ts` that's also present on plain trunk; 1532 tests pass; lint clean; CLI builds.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
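The typographic-dash check mentioned in this merge commit can be sketched as a prefix test on each argument. The exact character set is an assumption: Unicode hyphen, non-breaking hyphen, figure dash, en dash, em dash, horizontal bar, and minus sign. The commit itself only gives `‐porcelain` and `–color` as examples, and the helper names are hypothetical.

```typescript
// Arguments that START with a typographic dash instead of ASCII '-'.
// Character set is an assumption: U+2010..U+2015 and U+2212 (minus sign).
const TYPOGRAPHIC_DASH_PREFIX = /^[\u2010\u2011\u2012\u2013\u2014\u2015\u2212]+/;

function findTypographicDashArgs(args: string[]): string[] {
  return args.filter((arg) => TYPOGRAPHIC_DASH_PREFIX.test(arg));
}

// Fail loudly before the command runs, rather than letting WP-CLI treat
// the argument as a positional value.
function assertNoTypographicDashes(args: string[]): void {
  const bad = findTypographicDashArgs(args);
  if (bad.length > 0) {
    throw new Error(`Flags must start with ASCII "-", got: ${bad.join(', ')}`);
  }
}
```

Only the leading characters are checked, so a value that merely contains an en dash (for example inside a title) passes through untouched.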
After PR #3272 (CLI: send screenshots from studio code to Telegram remote sessions, merged 2026-05-01) the codebase has two screenshot tools that capture a URL: `take_screenshot` for agent-internal visual reasoning, and `share_screenshot` for fire-and-forget delivery to the user. This branch's `resolveScreenshotUrl` helper was originally applied only to `take_screenshot`, but `share_screenshot` has the same URL-resolution gap: agents that know a local Studio site by name must already know its public URL to share a screenshot of it.
Apply the same `nameOrPath`/`path` schema and `resolveScreenshotUrl` call to `share_screenshot`. Both tools now accept the same shape, and both error out cleanly when neither `url` nor `nameOrPath` is provided.
Slim the test diff to a single regression test. The original branch added four behavior tests plus a `mockScreenshotBrowser` helper, but the regression we're guarding against is specifically "agent passed `nameOrPath`, screenshot resolved correctly". The other cases (explicit URL, path composition, error path) are nice-to-haves and can land in follow-up coverage if needed; types and existing trunk tests cover the gating and tool-list shape.
## AI assistance
- **AI assistance:** Yes
- **Tool(s):** Claude Code (Sonnet 4.5)
- **Used for:** Drafted the `share_screenshot` extension and slimmed the test diff under Chris's direction. Chris reviewed the scope trade-off (extend to both tools vs. leave `share_screenshot` for later) and chose the unconditional both-tools fix per the fix-upstream-first rule.
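This is a sketch of the shared resolution step, assuming a hypothetical `lookupSiteUrl` callback that maps a site name or path to its local URL. The precedence, where an explicit `url` wins over `nameOrPath`, is also an assumption. The commit only states that both tools accept the same shape and fail cleanly when neither is given.

```typescript
interface ScreenshotInput {
  url?: string;
  nameOrPath?: string;
  path?: string;
}

// Hypothetical resolver shared by take_screenshot and share_screenshot.
function resolveScreenshotUrl(
  input: ScreenshotInput,
  lookupSiteUrl: (nameOrPath: string) => string | undefined,
): string {
  // Assumed precedence: an explicit URL is used as-is.
  if (input.url) return input.url;
  if (!input.nameOrPath) {
    throw new Error('Provide either `url` or `nameOrPath`.');
  }
  const base = lookupSiteUrl(input.nameOrPath);
  if (!base) throw new Error(`No Studio site found for "${input.nameOrPath}".`);
  // Resolve the optional path against the site's base URL.
  return new URL(input.path ?? '/', base).toString();
}
```

Putting the lookup behind a callback keeps the resolution logic testable without a running site, which fits the single `nameOrPath` regression test described above.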


Related issues
studio code (PoC) #3196
How AI was used in this PR
Claude wrote the bulk of the implementation and the tests. I reviewed and tested the changes end-to-end in my sandbox using the companion PR.
Proposed Changes
The Telegram remote-session bridge is currently text-only. When the agent finishes a visible task, the user gets a prose summary but no image. This PR lets the local agent deliver screenshots inline:
- `share_screenshot` tool. Captures a 16:9 above-the-fold view of a URL by default and emits a `media.share` JSON event. `fullPage: true` is opt-in for the rare case where the user wants the whole scroll length. `take_screenshot` stays unchanged as the model-internal reasoning tool.
- The remote-session controller (`turn-runner` + `poll-loop`) collects `media.share` events from the spawned `studio code --json` child and posts each photo before the text reply.
- `respondMessage` now picks transport based on payload. Photo present means `multipart/form-data` with a `photo` file part (matches the wpcom contract); text-only stays on the existing JSON path.
- The spawned child gets `STUDIO_REMOTE_SESSION=1` so the system prompt knows to keep replies short, deliver visible work via `share_screenshot`, follow up with a "Want me to publish this as a preview site?" line, and stop fabricating "gist stored / preview link saved" epilogues that aren't backed by any actual storage.
Testing Instructions
Prerequisites
- … (`is_automattician()`)
- `studio auth login` so the bearer falls through from `~/.studio/shared.json`
Testing
- `STUDIO_ENABLE_REMOTE_SESSION=true npm run cli:build`
- `STUDIO_ENABLE_REMOTE_SESSION=true node apps/cli/dist/cli/main.mjs code --remote-session`
- `tail -F ~/.studio/remote-session.log`
- … `share_screenshot` is called with `fullPage: true` and the long capture is delivered
Pre-merge Checklist