Read what's on screen without taking a screenshot.
ScreenRead gives AI agents access to the macOS accessibility tree — the same structured data that powers VoiceOver and other screen readers. Instead of capturing pixels and feeding them through vision models, your agent gets instant, structured text describing every UI element on screen.
~100ms instead of 1-3 seconds. Zero hallucination — it reads what the OS knows, not what a model thinks it sees.
Most AI agent tooling uses screenshots to "see" the screen:
- Capture PNG (~200ms)
- Base64 encode and transfer (~500KB-2MB)
- Vision model processes pixels (expensive, slow)
- Model describes what it thinks it sees (sometimes wrong)
But ~90% of agent tasks are text-based: "what does the error say?", "is this button visible?", "what's the page title?". Screenshots are overkill.
ScreenRead skips all of that. It asks macOS directly: "What UI elements exist in this window?" and returns structured text instantly.
|  | Screenshots | ScreenRead |
|---|---|---|
| Speed | 1-3 seconds | ~100ms |
| Token cost | High (vision model) | Low (text) |
| Accuracy | Can hallucinate text | Exact (reads from OS) |
| Scope | Web only (Playwright) or full screen | Any macOS app |
| Good for | Visual checks (layout, colors) | Content verification, UI state |
Use ScreenRead for the 90% of tasks that are about content and structure. Keep screenshots for the 10% that need pixels.
```sh
git clone https://github.com/Bambushu/screenread.git
cd screenread
swift build -c release
cp .build/release/screenread ~/.local/bin/
cp .build/release/screenread-mcp ~/.local/bin/
```

Requires:
- macOS 13+ (Ventura or later)
- Accessibility permission (System Settings > Privacy & Security > Accessibility)
```sh
# Read the frontmost app
screenread

# Read a specific app
screenread --app Safari

# Fuzzy match a window title
screenread --window "inbox"

# Text only (no structure)
screenread --app Warp --text-only

# Shallow read (depth 2)
screenread --app Finder --shallow

# Full text, no truncation
screenread --app Terminal --full

# JSON output
screenread --app Safari --json

# Search for text across all open windows
screenread --find "error"
screenread --find "Submit"

# List all open windows
screenread --list

# Filter by role
screenread --app Safari --role AXButton,AXLink

# Exclude roles
screenread --app Safari --ignore AXGroup,AXScrollArea

# List interactive elements with click coordinates
screenread --clickable --app Safari
screenread --clickable --app Finder --json

# Watch for UI changes (poll every 2s, Ctrl+C to stop)
screenread --watch --app Safari
screenread --watch --app Safari --interval 5

# Stream JSONL (one JSON object per node per line)
screenread --stream --app Safari
screenread --stream --app Safari | jq 'select(.role == "AXButton")'
```

Add to your MCP config:
Claude Code (project-scoped `.mcp.json` in your project root):
```json
{
  "mcpServers": {
    "screenread": {
      "command": "screenread-mcp"
    }
  }
}
```

Claude Desktop (`~/Library/Application Support/Claude/claude_desktop_config.json`):
```json
{
  "mcpServers": {
    "screenread": {
      "command": "/path/to/screenread-mcp"
    }
  }
}
```

This exposes five tools to any MCP-compatible client:
`screenread_snapshot`: reads the accessibility tree of a specific app or window. With no parameters, it reads the frontmost (active) app.
| Parameter | Type | Description |
|---|---|---|
| `app` | string | App name (e.g. "Safari") |
| `window` | string | Fuzzy match on window title |
| `pid` | integer | Target by process ID |
| `depth` | integer | Max tree depth (default: 5); use 0 for unlimited, which may be slow on large apps |
| `textOnly` | boolean | Text only, no structure |
| `roles` | string | Comma-separated AX roles to include (e.g. "AXButton,AXLink") |
| `ignore` | string | Comma-separated AX roles to exclude (e.g. "AXGroup,AXScrollArea") |
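As an illustration, an MCP client would invoke this tool with a standard JSON-RPC `tools/call` request over the server's framed stdio transport (the request shape follows the MCP specification; the argument values here are arbitrary):

```json
{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "tools/call",
  "params": {
    "name": "screenread_snapshot",
    "arguments": { "app": "Safari", "depth": 3, "roles": "AXButton,AXLink" }
  }
}
```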
List all open windows. Returns one line per window in the format: AppName [PID] — Window Title. No parameters.
Search for visible text across all open windows. Plain substring match (no regex).
| Parameter | Type | Description |
|---|---|---|
| `query` | string | Plain-text substring to search for (required) |
| `caseSensitive` | boolean | Case-sensitive matching (default: false) |
Results are capped at 100 matches. Use `screenread_snapshot` with a specific app for more targeted results.
List interactive elements (buttons, links, text fields) with their click coordinates.
| Parameter | Type | Description |
|---|---|---|
| `app` | string | App name (e.g. "Safari") |
| `window` | string | Fuzzy match on window title |
| `pid` | integer | Target by process ID |
| `roles` | string | Override the default interactive roles (e.g. "AXButton,AXLink") |
Returns a table with role, label, center x/y coordinates, and state (enabled/disabled/focused/selected).
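The coordinates are meant for driving a separate click/automation tool. As a sketch of consuming them (the tab-separated column layout here is an assumption for illustration, not the tool's documented output format), you could pull out the x/y for a given role with awk:

```sh
# Hypothetical --clickable row: role, label, x, y, state (tab-separated).
# Extract the center coordinates of the first AXButton.
printf 'AXButton\tSubmit\t512\t384\tenabled\n' |
  awk -F'\t' '$1 == "AXButton" { print $3, $4 }'
# → 512 384
```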
Watch an app for UI changes over a duration.
| Parameter | Type | Description |
|---|---|---|
| `app` | string | App name (e.g. "Safari") |
| `window` | string | Fuzzy match on window title |
| `pid` | integer | Target by process ID |
| `duration` | integer | How long to watch, in seconds (default: 10, max: 60) |
| `interval` | integer | Poll interval in seconds (default: 2, min: 1) |
| `textOnly` | boolean | Compare text content only (default: false) |
Polls the accessibility tree at the given interval and reports additions, removals, and value/state changes.
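Conceptually, each poll is a diff between successive snapshots. A rough shell analogue of what the differ reports (the snapshot lines below are made up; the real tool compares full AX trees, not flat text files):

```sh
# Simulate two successive snapshots of an app's text content and diff them.
a=$(mktemp); b=$(mktemp)
printf 'AXButton Submit\nAXStaticText Hello\n' > "$a"
printf 'AXButton Submit\nAXStaticText Goodbye\n' > "$b"
diff "$a" "$b" || true   # "<" lines were removed, ">" lines were added
rm -f "$a" "$b"
```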
```
screenread/
├── Sources/
│   ├── ScreenReadCore/       # Shared library
│   │   ├── AXHelpers.swift       # Shared AX attribute accessors
│   │   ├── AXTreeWalker.swift    # Recursive accessibility tree traversal + streaming callback
│   │   ├── Formatter.swift       # Text tree, text-only, JSON, clickable output
│   │   ├── MCPProtocol.swift     # JSON-RPC types, tool dispatch, parameter validation
│   │   ├── StreamFormatter.swift # JSONL single-node encoder
│   │   ├── TargetResolver.swift  # App/window/PID resolution with fuzzy matching
│   │   ├── TreeDiffer.swift      # Compare two tree snapshots for changes
│   │   └── Types.swift           # AXNode, WalkResult, WindowInfo, errors
│   ├── screenread/           # CLI (uses ArgumentParser)
│   └── screenread-mcp/       # MCP server (Content-Length framed stdio)
└── Tests/
    └── ScreenReadCoreTests/  # 27 tests across 4 suites
```
The core library (ScreenReadCore) does all the work. Both the CLI and MCP server are thin wrappers around it.
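Content-Length framing means each JSON-RPC message is preceded by a `Content-Length: N` header, a blank line, then exactly N bytes of JSON. A minimal sketch of building one frame by hand (the `initialize` method is the standard MCP handshake; whether this server accepts this exact request is not verified here):

```sh
# Frame a JSON-RPC request for a Content-Length framed stdio transport.
body='{"jsonrpc":"2.0","id":1,"method":"initialize","params":{}}'
len=$(printf '%s' "$body" | wc -c | tr -d ' ')   # byte length of the body
printf 'Content-Length: %s\r\n\r\n%s' "$len" "$body"
```

Piping framed bytes like these into `screenread-mcp` and reading the framed response back is, in principle, all an MCP client needs to do.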
macOS only. ScreenRead uses Apple's AXUIElement accessibility API, which has no equivalent on other platforms. Linux would need AT-SPI, Windows needs UI Automation — fundamentally different APIs.
MIT