PixelPilot is a Windows desktop AI agent for real computer work. It combines a desktop shell, a Python runtime, provider-aware live/request model adapters, and optional hosted backend services so users can automate tasks through natural language.
- Runs as a Windows desktop agent with a compact overlay UI.
- Uses PixelPilot Live for typed and voice-driven interaction with native realtime providers (Gemini, OpenAI) plus local Ollama live mode with local ASR, Kokoro ONNX speech output, and low-FPS visual context refresh.
- Automates desktop tasks with keyboard, mouse, UI Automation, OCR, and vision fallbacks.
- Supports "Hey Pixie" Wake Word detection for hands-free interaction.
- Supports Speaker Identification (Voiceprint) for personalized/secure control.
- Supports browser-first account login for hosted backend mode.
- Supports direct mode with provider keys such as `GEMINI_API_KEY`, `OPENAI_API_KEY`, `ANTHROPIC_API_KEY`, `XAI_API_KEY`, `OPENROUTER_API_KEY`, `VERCEL_AI_GATEWAY_API_KEY`, or a local Ollama endpoint.
- GUIDANCE: read-only assistance, no desktop mutations.
- SAFE: mutating actions require confirmation.
- AUTO: mutating actions run without per-step confirmation.
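
The three operation modes above can be read as a gate in front of every proposed action. The following is a minimal sketch of that idea, not PixelPilot's actual code: `Mode`, `gate_action`, and the `"run"`/`"confirm"`/`"block"` return values are hypothetical names chosen for illustration, and it assumes read-only actions are allowed in every mode.

```python
from enum import Enum


class Mode(Enum):
    GUIDANCE = "guidance"  # read-only assistance, no desktop mutations
    SAFE = "safe"          # mutating actions require confirmation
    AUTO = "auto"          # mutating actions run without per-step confirmation


def gate_action(mode: Mode, mutating: bool) -> str:
    """Return 'run', 'confirm', or 'block' for a proposed action.

    Hypothetical helper for illustration only.
    """
    if not mutating:
        return "run"  # assumption: read-only actions pass in every mode
    if mode is Mode.GUIDANCE:
        return "block"
    if mode is Mode.SAFE:
        return "confirm"
    return "run"  # Mode.AUTO
```

For example, `gate_action(Mode.SAFE, mutating=True)` yields `"confirm"`, while the same action under `Mode.GUIDANCE` yields `"block"`.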
PixelPilot uses a granular permission system via settings.json. You can define rules for specific tools:
{
"toolPolicy": {
"allow": ["Browser(open)", "Media(*)"],
"deny": ["System(lock)"],
"ask": ["*"]
}
}

Rules support exact matches or wildcards (`*`).
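
To make the rule semantics concrete, here is a small sketch of how such a policy could be evaluated. It is an illustration under stated assumptions, not the shipped matcher: the precedence order (deny, then allow, then ask), the fallback to `"ask"`, and the `decide` function name are all assumptions, and Python's `fnmatchcase` also treats `?` and `[...]` as wildcards, which the real rules may not.

```python
from fnmatch import fnmatchcase

# Same policy as in the settings.json example above.
TOOL_POLICY = {
    "allow": ["Browser(open)", "Media(*)"],
    "deny": ["System(lock)"],
    "ask": ["*"],
}


def decide(tool_call: str, tool_policy: dict) -> str:
    """Return 'deny', 'allow', or 'ask' for a tool call such as 'Media(play)'.

    Assumed precedence: deny wins over allow, allow wins over ask.
    """
    for verdict in ("deny", "allow", "ask"):
        patterns = tool_policy.get(verdict, [])
        if any(fnmatchcase(tool_call, pattern) for pattern in patterns):
            return verdict
    return "ask"  # assumed default when no rule matches
```

With the example policy, `decide("Media(play)", TOOL_POLICY)` matches the `Media(*)` wildcard and returns `"allow"`, while an unlisted call such as `"Files(delete)"` falls through to the catch-all `"*"` rule and returns `"ask"`.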
PixelPilot supports two primary ways to start:
- Direct mode: Add a provider key locally, or configure `PIXELPILOT_MODEL_PROVIDER=ollama` and `PIXELPILOT_LIVE_PROVIDER=ollama`, and launch the app without a login gate.
- Hosted backend mode: Configure `BACKEND_URL`, launch the app, and sign in from the browser. The browser returns to the desktop app through the `pixelpilot://` deep-link flow.
Desktop backend sessions are stored in Windows Credential Manager.
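
As a rough illustration of how the two start modes relate to configuration, the sketch below guesses the mode from a dictionary of environment variables. It is not the app's real startup logic: `startup_mode` and `PROVIDER_KEYS` are hypothetical, and the provider identifiers other than `gemini` and `ollama` are assumed to be the lowercase prefixes of their key names.

```python
# Assumed mapping from provider identifier to its API key variable.
# Only "gemini" and "ollama" appear as provider values in this README.
PROVIDER_KEYS = {
    "gemini": "GEMINI_API_KEY",
    "openai": "OPENAI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
    "xai": "XAI_API_KEY",
    "openrouter": "OPENROUTER_API_KEY",
}


def startup_mode(env: dict) -> str:
    """Return 'direct', 'hosted', or 'unconfigured' (illustrative only)."""
    provider = env.get("PIXELPILOT_MODEL_PROVIDER", "").strip().lower()
    if provider == "ollama" and env.get("OLLAMA_BASE_URL"):
        return "direct"  # local Ollama endpoint, no key needed
    key_name = PROVIDER_KEYS.get(provider)
    if key_name and env.get(key_name):
        return "direct"  # matching provider key is present
    if env.get("BACKEND_URL"):
        return "hosted"  # sign in through the browser
    return "unconfigured"
```

Calling `startup_mode(dict(os.environ))` after `import os` would apply the same check to the current process environment.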
Use the Windows installer and launch PixelPilot.
- Configure environment variables. Create a root `.env` from `env.example`:

  PIXELPILOT_MODEL_PROVIDER=gemini
  PIXELPILOT_LIVE_PROVIDER=gemini
  PIXELPILOT_MODEL=gemini-3-flash-preview
  PIXELPILOT_LIVE_MODEL=gemini-3.1-flash-live-preview
  GEMINI_API_KEY=your_api_key_here
  BACKEND_URL=http://localhost:8000
  WEB_URL=http://localhost:5173

  For local Ollama live mode:

  PIXELPILOT_MODEL_PROVIDER=ollama
  PIXELPILOT_LIVE_PROVIDER=ollama
  PIXELPILOT_MODEL=gemma4:e2b-it-bf16
  PIXELPILOT_LIVE_MODEL=gemma4:e2b-it-bf16
  OLLAMA_BASE_URL=http://localhost:11434
  LOCAL_ASR_MODEL=base.en
  LOCAL_TTS_ENABLED=true
  OLLAMA_LIVE_FRAME_LOOP_ENABLED=true
  OLLAMA_LIVE_FRAME_LOOP_FPS=1.0

  Download the Kokoro model assets (`kokoro-v1.0.onnx` and `voices-v1.0.bin`) to the `models/` directory.

- Set up the Python runtime. The desktop app requires a local Python virtual environment in the root directory:

  python -m venv venv
  .\venv\Scripts\activate
  pip install -r requirements.txt

- Install and build the desktop app:

  cd desktop
  npm install
  npm run build
  npm start
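
Before launching, the root `.env` described above can be sanity-checked with a few lines of Python. This is a minimal sketch, not a PixelPilot tool: `parse_env`, `missing_keys`, and `OLLAMA_LIVE_KEYS` are hypothetical names, the parser handles only plain `KEY=VALUE` lines, and the choice of which variables count as required for Ollama live mode is an assumption.

```python
# Variables assumed to be the minimum for local Ollama live mode.
OLLAMA_LIVE_KEYS = [
    "PIXELPILOT_MODEL_PROVIDER",
    "PIXELPILOT_LIVE_PROVIDER",
    "PIXELPILOT_MODEL",
    "PIXELPILOT_LIVE_MODEL",
    "OLLAMA_BASE_URL",
]


def parse_env(text: str) -> dict:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    env = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")  # split on the first '=' only
        env[key.strip()] = value.strip().strip('"').strip("'")
    return env


def missing_keys(env: dict, required: list) -> list:
    """Return the required keys that are absent or empty, in order."""
    return [key for key in required if not env.get(key)]
```

A typical use would be `missing_keys(parse_env(open(".env").read()), OLLAMA_LIVE_KEYS)`, which returns an empty list when every listed variable is set.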
Backend services live in backend/.
cd backend
pip install -r requirements.txt
uvicorn main:app --host 0.0.0.0 --port 8000 --reload

Minimal `backend/.env`:
PIXELPILOT_MODEL_PROVIDER=gemini
PIXELPILOT_LIVE_PROVIDER=gemini
GEMINI_API_KEY=your_backend_key
MONGODB_URI=your_mongodb_uri
REDIS_URI=redis://localhost:6379
JWT_SECRET=your_jwt_secret
GOOGLE_CLIENT_ID=your_google_client_id
GOOGLE_CLIENT_SECRET=your_google_client_secret
GOOGLE_REDIRECT_URI=http://localhost:8000/auth/google/callback
WEB_URL=http://localhost:5173
LIVE_SESSION_SECONDS_PER_DAY=600

The public site and hosted auth pages live in web/.
cd web
npm install
npm run dev

Use `web/.env.local`:
VITE_BACKEND_URL=http://localhost:8000

- `desktop/`: Electron shell, renderer UI, preload bridge, Windows packaging.
- `src/`: Python runtime containing:
  - `live/`: Gemini Live, OpenAI Realtime, and local Ollama live orchestration.
  - `tools/`: Automation tools, OCR, and app indexing.
  - `uac/`: SYSTEM-level orchestrator for Secure Desktop / UAC prompts.
  - `wakeword/`: OpenWakeWord detection logic ("Hey Pixie").
  - `runtime/`: Bootstrap and service glue for packaged binaries.
- `backend/`: FastAPI auth, Google OAuth, OCR services, rate limits, Live relay.
- `web/`: Landing site, hosted auth pages, public docs.
# Desktop tests
cd desktop
npm test
# Web production build
cd web
npm run build
# Python diagnostics
python src/main.py doctor

- No login-free startup: check `PIXELPILOT_MODEL_PROVIDER` and the matching provider key or `OLLAMA_BASE_URL`.
- Ollama local live mode: install `gemma4:e2b-it-bf16`, keep `OLLAMA_BASE_URL=http://localhost:11434`, and install `faster-whisper` and `kokoro-onnx` from `requirements.txt`. Ensure `kokoro-v1.0.onnx` and `voices-v1.0.bin` are in the `models/` directory. If audio input fails on GPU systems, ensure you have the required CUDA/cuDNN DLLs (e.g., `cublas64_12.dll`); the app automatically falls back to CPU mode if they are missing. Run `python src/main.py doctor` to verify.
- Hosted sign-in issues: check `BACKEND_URL`, `WEB_URL`, MongoDB, Redis, and Google OAuth config.
- Runtime issues: check `logs/pixelpilot.log`.
- UAC issues: reinstall from the MSI as Administrator.
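
For the Ollama local live mode item above, the presence of the Kokoro assets can also be checked by hand. The snippet below is a small sketch that is independent of the `doctor` command: `missing_kokoro_assets` is a hypothetical helper, and it assumes the script runs from the repository root so that `models/` resolves correctly.

```python
from pathlib import Path

# File names taken from the troubleshooting note above.
KOKORO_ASSETS = ("kokoro-v1.0.onnx", "voices-v1.0.bin")


def missing_kokoro_assets(models_dir: str = "models") -> list:
    """Return the Kokoro asset file names not found in models_dir."""
    root = Path(models_dir)
    return [name for name in KOKORO_ASSETS if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_kokoro_assets()
    if missing:
        print("Missing Kokoro assets:", ", ".join(missing))
    else:
        print("Kokoro assets found.")
```

An empty list means both files are in place; any names returned are the files still to download.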