AnassKartit/shadow

Shadow — Your AI Coworker for macOS

Shadow is a floating macOS AI agent, powered by the Gemini Live API, that lets you control your computer with voice and vision. Talk to Shadow, and it sees your screen, opens apps, organizes files, searches the web, creates documents, and more.

Demo

Watch the 4-minute demo video →

Features

  • Voice Control — Tap the character to talk. Shadow listens via Gemini's native audio streaming.
  • Screen Vision — Shares your screen in real-time. Shadow can see and interact with any app.
  • 30 Tools — File management, app control, mouse/keyboard automation, Google Search, image generation, PDF/DOCX creation & editing, and more.
  • Two Modes:
    • Guided — Asks before every action, highlights targets on screen
    • Auto — Acts immediately, narrates what it's doing
  • Multilingual — Switch languages on the fly (English, Spanish, French, German, Portuguese, Arabic, Chinese, Japanese, Korean, Hindi)
  • Document Creation — Creates and edits Word documents (.docx) with AI-generated images, fills PDF forms
  • Persistent Memory — Remembers your preferences across sessions
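
The difference between Guided and Auto mode can be pictured as a small gate in front of every action. The sketch below is only an illustration of that idea, not Shadow's actual code: `Mode`, `run_action`, and the `confirm` and `narrate` callbacks are hypothetical names introduced here.

```python
from enum import Enum
from typing import Callable, Optional


class Mode(Enum):
    GUIDED = "guided"  # ask before every action
    AUTO = "auto"      # act immediately and narrate


def run_action(
    mode: Mode,
    description: str,
    action: Callable[[], str],
    confirm: Callable[[str], bool],
    narrate: Callable[[str], None],
) -> Optional[str]:
    """Run `action` according to `mode`; return its result, or None if declined."""
    if mode is Mode.GUIDED:
        # Guided: nothing happens until the user approves.
        if not confirm(f"OK to {description}?"):
            narrate(f"Skipped: {description}")
            return None
    else:
        # Auto: no prompt, just announce what is about to happen.
        narrate(f"Doing: {description}")
    return action()
```

In a real app, `confirm` would presumably be a spoken or on-screen prompt and `narrate` would feed the voice output; here they are plain callables so the gate can be exercised without any UI.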

Architecture

[Figure: Shadow architecture diagram]

Tech Stack

Component | Technology
Live Voice + Vision | gemini-2.5-flash-native-audio via Gemini Live API
Tool Planning + Search | gemini-3-flash-preview with Google Search grounding
Image Generation | gemini-3.1-flash-image-preview
Screen Vision | gemini-3.1-flash-lite-preview
macOS App | Swift, AppKit, AVFoundation, ScreenCaptureKit
Backend | Python, FastAPI, WebSocket, Google GenAI SDK
Deployment | Google Cloud Run + Artifact Registry
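
Because four different Gemini models share the work, a backend like this one typically keeps the role-to-model mapping in a single place. The following is a minimal sketch of such a mapping. The model names are copied from the table above, while the role keys and the `model_for` helper are hypothetical and may not match the repository.

```python
# Model names come from the Tech Stack table; the role keys are illustrative labels.
MODELS = {
    "live": "gemini-2.5-flash-native-audio",     # real-time voice + vision
    "planning": "gemini-3-flash-preview",        # tool planning + search
    "image": "gemini-3.1-flash-image-preview",   # image generation
    "vision": "gemini-3.1-flash-lite-preview",   # screen analysis
}


def model_for(role: str) -> str:
    """Return the model name for a role, failing loudly on typos."""
    try:
        return MODELS[role]
    except KeyError:
        raise ValueError(
            f"unknown role {role!r}; expected one of {sorted(MODELS)}"
        ) from None
```

Keeping the names in one dictionary makes it easier to swap a preview model for its successor without touching call sites.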

Quick Start

Prerequisites

  • macOS 14+ (Sonoma or later)
  • Xcode 15+
  • Python 3.11+
  • Google AI API key (get one here)

1. Clone & Setup Backend

git clone https://github.com/AnassKartit/shadow.git
cd shadow/backend

# Create virtual environment
python3 -m venv venv
source venv/bin/activate

# Install dependencies
pip install -r requirements.txt

# Set your API key
echo "GOOGLE_API_KEY=your-key-here" > .env

# Start the backend
python main.py

The backend listens for WebSocket connections at ws://localhost:8000/ws. To check that it is up, run: curl http://localhost:8000/health
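
The same check can be scripted, for example before launching the app. This sketch uses only the standard library and assumes the /health endpoint answers with HTTP 200 when the backend is ready (the README does not specify the response body). `backend_urls` and `is_healthy` are hypothetical helper names.

```python
import urllib.error
import urllib.request


def backend_urls(host: str = "localhost", port: int = 8000, secure: bool = False) -> dict:
    """Build the health-check and WebSocket URLs for a backend host."""
    http, ws = ("https", "wss") if secure else ("http", "ws")
    return {
        "health": f"{http}://{host}:{port}/health",
        "ws": f"{ws}://{host}:{port}/ws",
    }


def is_healthy(health_url: str, timeout: float = 2.0) -> bool:
    """Return True if the health endpoint answers 200, False on any network error."""
    try:
        with urllib.request.urlopen(health_url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, or an HTTP error status.
        return False
```

For a local backend, `is_healthy(backend_urls()["health"])` performs the equivalent of the curl command above.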

2. Build & Run the Swift App

# Open in Xcode
open Shadow/Shadow.xcodeproj

# Or build from command line
cd Shadow && xcodebuild -scheme Shadow -configuration Debug build

After launching the app:

  1. Grant Microphone and Screen Recording permissions when prompted
  2. The Shadow character appears as a floating panel on the right side of your screen
  3. Click the character to start talking

3. Deploy Backend to Google Cloud (One Click)

Run on Google Cloud

Or deploy manually:

cd backend
gcloud run deploy shadow-backend \
  --source . \
  --region us-central1 \
  --allow-unauthenticated \
  --set-env-vars "GOOGLE_API_KEY=your-key-here,SHADOW_AUTH_TOKEN=your-secret-token"

4. Configure the App

Click the gear icon in Shadow to open settings:

  • Gemini API Key — Optional if your backend already has GOOGLE_API_KEY set
  • Auth Token — Must match the SHADOW_AUTH_TOKEN on your backend (protects your endpoint from unauthorized use)
  • Backend — Choose Cloud (default), Local, or Custom URL for your own deployment
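
On the backend side, matching the Auth Token comes down to comparing the value the client presents with SHADOW_AUTH_TOKEN. The sketch below shows one careful way to do that comparison. It is an assumption about how such a check could look, not the repository's code: the README does not say how the token travels (header, query parameter, or first message), and `token_is_valid` is a hypothetical name.

```python
import hmac
import os
from typing import Optional


def token_is_valid(presented: Optional[str], expected: Optional[str] = None) -> bool:
    """Compare a client-supplied token with the configured secret."""
    if expected is None:
        # Fall back to the environment variable named in the deploy command.
        expected = os.environ.get("SHADOW_AUTH_TOKEN", "")
    if not expected or not presented:
        # Fail closed: a missing secret or missing token never authenticates.
        return False
    # compare_digest takes time independent of where the strings differ,
    # which avoids leaking the secret through response timing.
    return hmac.compare_digest(presented.encode(), expected.encode())
```

Failing closed when the secret is unset matters here because the manual deploy command uses --allow-unauthenticated, so this token would be the only barrier in front of the endpoint.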

Project Structure

shadow/
├── Shadow/Shadow/           # Swift macOS app
│   ├── ShadowApp.swift      # App delegate, setup, message routing
│   ├── FloatingPanel.swift  # Floating character UI
│   ├── OverlayWindow.swift  # Transparent highlight overlay
│   ├── AudioManager.swift   # Mic capture + audio playback
│   ├── ScreenCapture.swift  # Screen frame capture
│   ├── BackendClient.swift  # WebSocket client
│   └── HotkeyManager.swift  # Global keyboard shortcuts
├── backend/
│   ├── main.py              # FastAPI WebSocket + Gemini Live session
│   ├── shadow_agent/
│   │   ├── agent.py         # System prompt + tool registration
│   │   └── tools.py         # 30 computer-use tools
│   ├── Dockerfile           # Cloud Run deployment
│   └── requirements.txt
└── README.md
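
The split between agent.py (tool registration) and tools.py (the tools themselves) suggests some kind of name-to-function registry. One common way to build that is a decorator, sketched below. `tool`, `TOOLS`, `CATEGORIES`, and `call_tool` are hypothetical names, and the two example tools are simplified stand-ins that only borrow their names from the tool list.

```python
import os
from typing import Callable, Dict, List

TOOLS: Dict[str, Callable] = {}
CATEGORIES: Dict[str, List[str]] = {}


def tool(category: str):
    """Register a function as a callable tool under the given category."""
    def decorator(fn: Callable) -> Callable:
        if fn.__name__ in TOOLS:
            raise ValueError(f"duplicate tool name: {fn.__name__}")
        TOOLS[fn.__name__] = fn
        CATEGORIES.setdefault(category, []).append(fn.__name__)
        return fn
    return decorator


@tool("Files")
def create_folder(path: str) -> str:
    os.makedirs(path, exist_ok=True)
    return f"Created {path}"


@tool("Files")
def list_files(path: str) -> List[str]:
    return sorted(os.listdir(path))


def call_tool(name: str, **kwargs):
    """Dispatch a tool call by name, as a model's function call might request."""
    if name not in TOOLS:
        return f"Unknown tool: {name}"
    return TOOLS[name](**kwargs)
```

With a registry like this, the per-category counts shown in the tools table could be derived from `CATEGORIES` rather than maintained by hand.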

Tools (30)

Category | Tools
Files (9) | list_files, search_files, read_file, create_folder, move_files, organize_desktop_files, find_duplicates, find_large_files, find_recent_file
Apps & System (6) | open_app, open_file, open_url, focus_app, applescript, run_command
Screen & Input (5) | computer_action, click_mouse, type_text, press_key, analyze_screen
Documents (7) | read_pdf, read_docx, create_docx, edit_docx, fill_pdf, get_pdf_fields, create_note_with_images
Search (1) | google_search_and_open (with grounding)
Images (2) | generate_explainer, generate_and_paste
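
To give a sense of what a file tool such as find_duplicates involves, here is a minimal sketch that groups files by size and then by SHA-256 of their contents. It is an illustration under assumptions: the real tool's signature, return format, and matching strategy are not documented in this README.

```python
import hashlib
from collections import defaultdict
from pathlib import Path
from typing import List


def find_duplicates(folder: str) -> List[List[str]]:
    """Return groups of paths under `folder` whose file contents are identical."""
    # Pass 1: bucket by size, since files of different sizes cannot match.
    by_size = defaultdict(list)
    for path in Path(folder).rglob("*"):
        if path.is_file() and not path.is_symlink():
            by_size[path.stat().st_size].append(path)

    # Pass 2: hash only the files that share a size with another file.
    by_hash = defaultdict(list)
    for same_size in by_size.values():
        if len(same_size) < 2:
            continue
        for path in same_size:
            digest = hashlib.sha256()
            with path.open("rb") as fh:
                # Read in 1 MiB chunks so large files are not loaded whole.
                for chunk in iter(lambda: fh.read(1 << 20), b""):
                    digest.update(chunk)
            by_hash[digest.hexdigest()].append(str(path))

    return sorted(sorted(group) for group in by_hash.values() if len(group) > 1)
```

Bucketing by size first keeps the expensive hashing step limited to files that could plausibly be duplicates.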

Hackathon Requirements

Requirement | How Shadow Meets It
Gemini Live API | gemini-2.5-flash-native-audio for real-time voice + screen streaming
Gemini model | gemini-3-flash-preview for tool planning and search
Google GenAI SDK | Direct SDK integration for Live API + all tool calls
Google Cloud service | Backend deployed on Cloud Run (us-central1)
Multimodal | Voice in + screen vision + voice out + file/app actions
Beyond text box | Floating character, screen highlights, direct computer control

License

MIT

About

Shadow — AI Desktop Assistant for macOS. Voice + vision powered by Gemini Live API. Built for Gemini Live Agent Challenge 2026.
