Shadow is a floating macOS AI agent, powered by the Gemini Live API, that helps you control your computer with voice and vision. Talk to Shadow, and it sees your screen, opens apps, organizes files, searches the web, creates documents, and more.
Watch the 4-minute demo video →
- Voice Control — Tap the character to talk. Shadow listens via Gemini's native audio streaming.
- Screen Vision — Shares your screen in real-time. Shadow can see and interact with any app.
- 30 Tools — File management, app control, mouse/keyboard automation, Google Search, image generation, PDF/DOCX creation & editing, and more.
- Two Modes:
  - Guided: Asks before every action, highlights targets on screen
  - Auto: Acts immediately, narrates what it's doing
- Multilingual — Switch languages on the fly (English, Spanish, French, German, Portuguese, Arabic, Chinese, Japanese, Korean, Hindi)
- Document Creation — Creates and edits Word documents (.docx) with AI-generated images, fills PDF forms
- Persistent Memory — Remembers your preferences across sessions
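
The difference between the two modes can be sketched as a small gate in front of every tool call. This is an illustrative sketch only, not Shadow's actual implementation: `Mode`, `run_action`, and the `confirm` and `narrate` callbacks are hypothetical names introduced here.

```python
from enum import Enum
from typing import Callable, Optional


class Mode(Enum):
    GUIDED = "guided"
    AUTO = "auto"


def run_action(
    mode: Mode,
    description: str,
    action: Callable[[], str],
    confirm: Callable[[str], bool],
    narrate: Callable[[str], None],
) -> Optional[str]:
    """Run `action` according to the mode; return None if the user declines."""
    if mode is Mode.GUIDED:
        # Guided: ask before every action and skip it if the user says no.
        if not confirm(f"May I {description}?"):
            return None
    else:
        # Auto: act immediately and narrate what is happening.
        narrate(f"I am going to {description}.")
    return action()
```

In a real agent, `confirm` would presumably route through the voice session and `narrate` through audio output; here they are plain callables so the gating logic can be exercised on its own.
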
| Component | Technology |
|---|---|
| Live Voice + Vision | gemini-2.5-flash-native-audio via Gemini Live API |
| Tool Planning + Search | gemini-3-flash-preview with Google Search grounding |
| Image Generation | gemini-3.1-flash-image-preview |
| Screen Vision | gemini-3.1-flash-lite-preview |
| macOS App | Swift, AppKit, AVFoundation, ScreenCaptureKit |
| Backend | Python, FastAPI, WebSocket, Google GenAI SDK |
| Deployment | Google Cloud Run + Artifact Registry |
- macOS 14+ (Sonoma or later)
- Xcode 15+
- Python 3.11+
- Google AI API key (get one here)
git clone https://github.com/AnassKartit/shadow.git
cd shadow/backend
# Create virtual environment
python3 -m venv venv
source venv/bin/activate
# Install dependencies
pip install -r requirements.txt
# Set your API key
echo "GOOGLE_API_KEY=your-key-here" > .env
# Start the backend
python main.py

The backend runs at ws://localhost:8000/ws. Health check: curl http://localhost:8000/health
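
The same health check can be scripted, for example to wait for the backend before launching the app. The sketch below uses only the Python standard library. It assumes that a healthy backend answers `/health` with HTTP 200; the response body is not documented here, so it is ignored. `health_url` and `backend_is_up` are hypothetical helper names.

```python
import urllib.request
from urllib.parse import urlsplit, urlunsplit


def health_url(ws_url: str) -> str:
    """Derive the HTTP health-check URL from the WebSocket URL."""
    parts = urlsplit(ws_url)
    # ws -> http, wss -> https; same host and port, path replaced by /health.
    scheme = "https" if parts.scheme == "wss" else "http"
    return urlunsplit((scheme, parts.netloc, "/health", "", ""))


def backend_is_up(ws_url: str = "ws://localhost:8000/ws", timeout: float = 2.0) -> bool:
    """Return True if the health endpoint answers with HTTP 200 (assumed)."""
    try:
        with urllib.request.urlopen(health_url(ws_url), timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # Connection refused, timeouts, and HTTP error statuses all land here.
        return False
```

Calling `backend_is_up()` with no arguments targets the local backend started above; pass a `wss://` URL to check a remote deployment.
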
# Open in Xcode
open Shadow/Shadow.xcodeproj
# Or build from command line
cd Shadow && xcodebuild -scheme Shadow -configuration Debug build

- Grant Microphone and Screen Recording permissions when prompted
- The Shadow character appears as a floating panel on the right side of your screen
- Click the character to start talking
Or deploy manually:
cd backend
gcloud run deploy shadow-backend \
--source . \
--region us-central1 \
--allow-unauthenticated \
--set-env-vars "GOOGLE_API_KEY=your-key-here,SHADOW_AUTH_TOKEN=your-secret-token"

Click the gear icon in Shadow to open settings:
- Gemini API Key: Optional if your backend already has GOOGLE_API_KEY set
- Auth Token: Must match the SHADOW_AUTH_TOKEN on your backend (protects your endpoint from unauthorized use)
- Backend: Choose Cloud (default), Local, or Custom URL for your own deployment
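
For a custom deployment, the token match could be enforced along these lines. This is a sketch under stated assumptions, not the check Shadow's backend actually performs: `token_is_valid` is a hypothetical helper, and rejecting the request when either token is missing is an assumed policy. The constant-time comparison via `hmac.compare_digest` avoids leaking how many leading characters of a guess were correct.

```python
import hmac
import os
from typing import Optional


def token_is_valid(presented: Optional[str], expected: Optional[str] = None) -> bool:
    """Compare a client-supplied token with SHADOW_AUTH_TOKEN in constant time."""
    if expected is None:
        # Fall back to the environment variable set at deploy time.
        expected = os.environ.get("SHADOW_AUTH_TOKEN")
    if not expected or not presented:
        # Assumption: a missing token on either side means rejection.
        return False
    return hmac.compare_digest(presented.encode(), expected.encode())
```

A WebSocket handler would typically call this before accepting the connection and close the socket when it returns `False`.
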
shadow/
├── Shadow/Shadow/ # Swift macOS app
│ ├── ShadowApp.swift # App delegate, setup, message routing
│ ├── FloatingPanel.swift # Floating character UI
│ ├── OverlayWindow.swift # Transparent highlight overlay
│ ├── AudioManager.swift # Mic capture + audio playback
│ ├── ScreenCapture.swift # Screen frame capture
│ ├── BackendClient.swift # WebSocket client
│ └── HotkeyManager.swift # Global keyboard shortcuts
├── backend/
│ ├── main.py # FastAPI WebSocket + Gemini Live session
│ ├── shadow_agent/
│ │ ├── agent.py # System prompt + tool registration
│ │ └── tools.py # 30 computer-use tools
│ ├── Dockerfile # Cloud Run deployment
│ └── requirements.txt
└── README.md
| Category | Tools |
|---|---|
| Files (9) | list_files, search_files, read_file, create_folder, move_files, organize_desktop_files, find_duplicates, find_large_files, find_recent_file |
| Apps & System (6) | open_app, open_file, open_url, focus_app, applescript, run_command |
| Screen & Input (5) | computer_action, click_mouse, type_text, press_key, analyze_screen |
| Documents (7) | read_pdf, read_docx, create_docx, edit_docx, fill_pdf, get_pdf_fields, create_note_with_images |
| Search (1) | google_search_and_open (with grounding) |
| Images (2) | generate_explainer, generate_and_paste |
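
One common way to expose tools like these to a model is a name-to-function registry with a single dispatch point. The sketch below illustrates that pattern only; it is not taken from `tools.py`. The `tool` decorator, the `TOOLS` dict, `dispatch`, and the stand-in body of `create_folder` are all hypothetical.

```python
from typing import Any, Callable, Dict

# Hypothetical registry; the real tools.py may be organized differently.
TOOLS: Dict[str, Callable[..., Any]] = {}


def tool(fn: Callable[..., Any]) -> Callable[..., Any]:
    """Register a function under its own name so the model can call it."""
    TOOLS[fn.__name__] = fn
    return fn


@tool
def create_folder(path: str) -> str:
    # Stand-in body: reports the intent instead of touching the filesystem.
    return f"would create {path}"


def dispatch(name: str, args: Dict[str, Any]) -> Dict[str, Any]:
    """Look up a tool by name and call it with keyword arguments."""
    fn = TOOLS.get(name)
    if fn is None:
        return {"ok": False, "error": f"unknown tool: {name}"}
    try:
        return {"ok": True, "result": fn(**args)}
    except TypeError as exc:
        # Typically wrong or missing arguments supplied by the model.
        return {"ok": False, "error": str(exc)}
```

Returning a structured error instead of raising lets the model see what went wrong and retry with corrected arguments.
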
| Requirement | How Shadow Meets It |
|---|---|
| Gemini Live API | gemini-2.5-flash-native-audio for real-time voice + screen streaming |
| Gemini model | gemini-3-flash-preview for tool planning and search |
| Google GenAI SDK | Direct SDK integration for Live API + all tool calls |
| Google Cloud service | Backend deployed on Cloud Run (us-central1) |
| Multimodal | Voice in + screen vision + voice out + file/app actions |
| Beyond text box | Floating character, screen highlights, direct computer control |
MIT
