Improve agent discovery and text expansion for large text surfaces #253

@thymikee

Summary

Large text surfaces such as code editors, document views, chats, logs, and rich text panes are currently awkward for agent use.

Today, snapshot -i may expose these surfaces by dumping a partial text blob, which is expensive in tokens and still incomplete for discovery. In addition, get text is snapshot-derived, so if the snapshot text is truncated, there is no reliable way to expand that surface into its full visible text.

A concrete example is Android Studio on macOS: the editor is exposed as a TextView, but the interactive snapshot shows only a partial code fragment, so important visible content may be missing.

Problem

We need a cleaner split between:

  • discovery: which visible text surfaces exist and which one the agent should inspect next
  • extraction: retrieving the text for a chosen surface after discovery

Without that split, snapshots either:

  • spend too many tokens on giant text blobs, or
  • truncate text in a way that makes the hidden content unreachable

Goals

  • Keep large visible text surfaces discoverable in snapshot -i
  • Make snapshot -i summarize those surfaces semantically instead of dumping long text bodies
  • Add a reliable way to expand a selected text surface after discovery
  • Keep the design transferable across macOS, iOS, and Android

Proposed Direction

  1. Add runner-backed text extraction for element-targeted reads instead of relying only on the stored snapshot node.
  2. Update snapshot -i rendering for large text surfaces (TextView, TextField, editor-like panes, etc.) to prefer semantic labels plus a short preview.
  3. Mark truncation explicitly so agents know more content exists.
  4. Include useful metadata when available, such as editable, scrollable, focused, or similar state.

Example desired shape:

@e32 [text-view] "Editor for MainActivity.kt" [editable] [scrollable] [preview:"package com.example..."] [truncated]
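
As an illustration of how a renderer could produce that shape, here is a minimal TypeScript sketch. The TextSurfaceNode interface, its field names, the renderTextSurface function, and the default preview limit are all hypothetical, chosen for this example only; they are not the tool's actual snapshot schema.

```typescript
// Hypothetical node shape for illustration; not the tool's real schema.
interface TextSurfaceNode {
  ref: string; // element reference, e.g. "e32"
  role: string; // e.g. "text-view"
  label: string; // semantic label shown instead of the body
  text: string; // text known to the snapshot for this surface
  editable?: boolean;
  scrollable?: boolean;
  focused?: boolean;
}

// Render one discovery line: label, state flags, short preview, truncation marker.
// previewLimit is an assumed knob; 40 is an arbitrary default.
function renderTextSurface(node: TextSurfaceNode, previewLimit: number = 40): string {
  // Collapse whitespace so the preview stays on a single line.
  const flat = node.text.replace(/\s+/g, " ").trim();
  const truncated = flat.length > previewLimit;
  const preview = truncated ? flat.slice(0, previewLimit) + "..." : flat;

  const parts: string[] = [`@${node.ref}`, `[${node.role}]`, JSON.stringify(node.label)];
  if (node.editable) parts.push("[editable]");
  if (node.scrollable) parts.push("[scrollable]");
  if (node.focused) parts.push("[focused]");
  parts.push(`[preview:${JSON.stringify(preview)}]`);
  // Only mark truncation when content was actually cut.
  if (truncated) parts.push("[truncated]");
  return parts.join(" ");
}
```

With a preview limit of 19 and a body starting with "package com.example.app", this sketch yields the example line above. Short surfaces are rendered without the [truncated] marker, so the marker stays a reliable signal that a follow-up read is worthwhile.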

Acceptance Criteria

  • snapshot -i shows large visible text surfaces as first-class nodes without dumping the full body by default.
  • Agents can follow up with an element-targeted text read and retrieve the visible or full text for that surface.
  • Truncation is explicit in discovery output.
  • Behavior works consistently enough to support desktop editors on macOS and similar large text surfaces on iOS/Android.
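
The follow-up read in the second criterion could look roughly like the TypeScript sketch below. It assumes a hypothetical TextRunner that can read an element's live text and falls back to the stored snapshot node when it cannot. All names here (SnapshotNode, TextRunner, readElementText) are invented for illustration, and a real runner call would most likely be asynchronous; it is kept synchronous here for brevity.

```typescript
// Hypothetical types for illustration only.
interface SnapshotNode {
  ref: string;
  text: string; // possibly truncated text stored at snapshot time
  truncated: boolean;
}

interface TextRunner {
  // Returns the element's live text, or undefined if the runner cannot read it.
  readText(ref: string): string | undefined;
}

interface TextResult {
  text: string;
  source: "runner" | "snapshot";
  complete: boolean; // false when the text is known to be cut off
}

// Prefer a runner-backed read; fall back to the snapshot and report truncation.
function readElementText(
  ref: string,
  snapshot: Record<string, SnapshotNode>,
  runner: TextRunner
): TextResult {
  const live = runner.readText(ref);
  if (live !== undefined) {
    return { text: live, source: "runner", complete: true };
  }
  const node = snapshot[ref];
  if (!node) {
    throw new Error(`unknown element ref: ${ref}`);
  }
  return { text: node.text, source: "snapshot", complete: !node.truncated };
}
```

Returning the source and a complete flag lets an agent tell a full runner-backed read apart from a truncated snapshot fallback, instead of silently treating partial text as the whole surface.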

Non-Goals

  • OCR fallback is out of scope for now. Agents can use screenshots separately when needed.
