Your invisible desktop AI assistant. Powered by Backboard.
```bash
git clone [repository-url]
cd caddy
npm install
```

Create a `.env` file in the root:

```
BACKBOARD_API_KEY=your_api_key_here
```

The app requires these permissions (System Settings > Privacy & Security):
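The backend reads this key from the environment. A minimal sketch of that lookup (the helper name and error message here are illustrative, not taken from the app's source):

```python
import os

def get_backboard_key(env=None):
    """Fetch the Backboard API key, failing fast with a clear error if missing."""
    env = os.environ if env is None else env
    key = env.get("BACKBOARD_API_KEY")
    if not key:
        raise RuntimeError("BACKBOARD_API_KEY is not set; add it to your .env file")
    return key
```

Failing at startup with an explicit message is friendlier than a confusing SDK error later.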
| Permission | Why | Grant via |
|---|---|---|
| Screen Recording | Screenshot capture and screen watch OCR | Privacy & Security > Screen Recording |
| Microphone | Live audio transcription | Prompted on first use |
| Accessibility | Global keyboard shortcuts (Cmd+B, Cmd+H) | Privacy & Security > Accessibility |
If shortcuts or screenshots aren't working, check that the Electron app (or Terminal, during development) is listed and enabled under each permission.
By default the app records your microphone. To capture system audio (e.g., a Zoom call, a recorded meeting), install BlackHole and create a Multi-Output Device so you can hear audio AND the app can capture it simultaneously.
1. Install BlackHole
```bash
brew install blackhole-2ch
```

2. Create a Multi-Output Device

- Open Audio MIDI Setup (Spotlight search or `/Applications/Utilities/`)
- Click + in the bottom-left corner > Create Multi-Output Device
- Check both:
- Your headphones or speakers
- BlackHole 2ch
- Rename it to something like "Headphones + BlackHole" (right-click the name)
3. Route audio
- Set your Mac's Sound Output to the new Multi-Output Device (System Settings > Sound > Output)
- In the app, click the ▾ dropdown next to the mic button and select BlackHole 2ch as the audio input
Now audio plays through your headphones (so you can hear) and is simultaneously routed to BlackHole (so the app can transcribe it).
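Apps that enumerate audio devices typically pick out BlackHole by name. A sketch of that lookup, operating on plain dicts shaped like the entries `sounddevice.query_devices()` returns (whether Caddy actually uses that library is an assumption):

```python
def find_input_device(devices, name_substring="BlackHole"):
    """Return the index of the first capture-capable device whose name matches."""
    for i, dev in enumerate(devices):
        if name_substring.lower() in dev["name"].lower() and dev["max_input_channels"] > 0:
            return i
    return None  # no matching input device found
```

The `max_input_channels > 0` check matters: the Multi-Output Device itself is output-only, so only the BlackHole 2ch entry qualifies as an input.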
```bash
./start.sh
```

This starts the transcription server on port 5111 and the Electron app together.
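Roughly, the start script launches the two processes and hands back their handles. A Python sketch of the same idea (the transcription server's path and entry point are assumptions; the real `start.sh` may differ):

```python
import subprocess

# Assumed commands; check start.sh for the actual invocations.
COMMANDS = [
    ["python", "transcribe/server.py", "--port", "5111"],  # transcription server
    ["npm", "start"],                                      # Electron app
]

def launch(commands, spawn=subprocess.Popen):
    """Spawn each command and return the handles so a caller can wait or kill them."""
    return [spawn(cmd) for cmd in commands]
```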
Caddy is an Electron overlay that sits invisibly on top of your screen. It captures screenshots, transcribes audio, and routes everything through Backboard for LLM analysis and memory.
| Component | Role |
|---|---|
| `electron/` | Invisible overlay, screenshot capture, keyboard shortcuts |
| `api/` | Flask backend — all LLM + memory via Backboard SDK |
| `transcribe/` | Local Whisper transcription + Tesseract OCR server |
| `src/` | React frontend (queue, solutions, chat) |
| Shortcut | Action |
|---|---|
| `Cmd+B` | Toggle window visibility |
| `Cmd+H` | Take screenshot |
| `Cmd+Enter` | Get solution |
| `Cmd+Arrow Keys` | Move window |
| `Cmd+Q` | Quit |
- Invisible overlay — translucent, always-on-top, undetectable
- Screenshot analysis — capture anything on screen, get instant AI answers
- Audio intelligence — real-time transcription and analysis
- Contextual chat — ask follow-up questions with full conversation memory
- Model selector — switch between any LLM provider available on Backboard
- Screen watch — periodic OCR + LLM analysis of what's on screen
- Cross-platform — macOS, Windows, Linux
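The screen-watch feature boils down to a periodic capture → OCR → analyze loop. One tick of that loop might look like this sketch; the skip-unchanged-text optimization is an assumption, not confirmed app behavior:

```python
def watch_step(capture, ocr, analyze, last_text=""):
    """One screen-watch tick: OCR the current screen and analyze it only
    when the visible text has changed since the previous tick."""
    text = ocr(capture())
    if text and text != last_text:
        return analyze(text), text   # new analysis result, updated state
    return None, last_text           # nothing new on screen; skip the LLM call
```

Deduplicating on the OCR text keeps the loop from re-sending an unchanged screen to the LLM every interval.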
If the app doesn't start, make sure nothing is using ports 5111 (transcription) or 5180 (vite dev server):
```bash
lsof -i :5111 -i :5180
```

For Sharp/Python build errors:

```bash
rm -rf node_modules package-lock.json
SHARP_IGNORE_GLOBAL_LIBVIPS=1 npm install --ignore-scripts
npm rebuild sharp
```

ISC