A tiny screen-aware assistant for macOS. Press ⌘⇧Space, type a question about what's on your screen, and Claude answers.
- Press ⌘⇧Space anywhere.
- ScreenPilot snapshots the full screen.
- A minimal floating input appears. Type your question, hit return.
- The screenshot + question goes to Claude (
claude-sonnet-4-6). - The answer appears in a floating panel you can dismiss with the ✕.
Press ⌘⇧Space again to dismiss any open overlay without asking anything.
- macOS 13 or later
- Xcode command line tools (
xcode-select --install) - An Anthropic API key
-
Add your API key. Either export it:
export ANTHROPIC_API_KEY="sk-ant-..."
or edit
Sources/ScreenPilot/API/Config.swiftand replace theYOUR_ANTHROPIC_API_KEY_HEREplaceholder. -
Build the app bundle.
./build-app.sh
This produces
.build/ScreenPilot.app. -
Run it.
open .build/ScreenPilot.app
The first time you launch, macOS will ask for two permissions:
- Accessibility — required so ScreenPilot can listen for the global ⌘⇧Space hotkey via
CGEventTap. - Screen Recording — required so ScreenPilot can grab a full-screen snapshot via
CGWindowListCreateImage.
ScreenPilot will show a prompt with a button that jumps straight to the relevant System Settings pane. After enabling each permission you must quit and relaunch (macOS doesn't reload these live for running processes).
If the hotkey doesn't fire, the Accessibility permission is almost always the cause.
Sources/ScreenPilot/
├── main.swift NSApplication bootstrap
├── AppDelegate.swift permissions + wiring
├── AssistantCoordinator.swift core loop orchestrator
├── System/
│ ├── HotkeyManager.swift CGEventTap for ⌘⇧Space
│ ├── ScreenshotCapture.swift CGWindowListCreateImage → PNG
│ └── PermissionsManager.swift Accessibility + Screen Recording
├── API/
│ ├── Config.swift model + API key + system prompt
│ ├── ChatMessage.swift message/content types + RequestContext
│ └── ClaudeClient.swift Anthropic messages API client
└── Overlay/
├── OverlayPanel.swift frameless always-on-top NSPanel
├── InputOverlayController.swift panel lifecycle for input
├── InputView.swift SwiftUI text field
├── ResponseOverlayController.swift
└── ResponseView.swift SwiftUI response panel
AppKit runs the plumbing (windows, event taps, permissions). SwiftUI renders the two overlays. That split keeps each concern in whichever framework is best at it.
- Voice input. Input enters through
InputOverlayController. AVoiceOverlayControllerwould satisfy the same contract (onSubmit(String)). The coordinator doesn't care which one fed it text. - Conversation memory.
ClaudeClient.askalready takes ahistory: [ChatMessage]array. V1 passes[]each call; wiring persistent session history is a field onAssistantCoordinator. - Active app / window context.
ClaudeClient.askaccepts aRequestContext?struct. Populating it (viaNSWorkspace.shared.frontmostApplication, AX APIs, etc.) and passing it through is an additive change. - Pointing / highlighting.
OverlayPanelis the shared base for any floating layer. AHighlightOverlayControllercan sit alongside the input/response ones and draw boxes/arrows on screen coordinates Claude returns. - Automation / click actions. These go in a new module (
Automation/) and are triggered by the coordinator, not the client. The API layer stays pure.