YonatanBest/Pixel-Pilot

PixelPilot

PixelPilot is a Windows desktop AI agent for real computer work. It combines a desktop shell, a Python runtime, provider-aware model adapters for both live (realtime) and request/response sessions, and optional hosted backend services, so users can automate tasks through natural language.

Tech Stack

Electron, React, TypeScript, Tailwind CSS, Python, FastAPI, MongoDB, Redis

What PixelPilot Does

  • Runs as a Windows desktop agent with a compact overlay UI.
  • Uses PixelPilot Live for typed and voice-driven interaction, backed either by native realtime providers (Gemini, OpenAI) or by a local Ollama live mode that pairs local ASR with Kokoro ONNX speech output and a low-FPS visual context refresh.
  • Automates desktop tasks with keyboard, mouse, UI Automation, OCR, and vision fallbacks.
  • Supports "Hey Pixie" wake-word detection for hands-free interaction.
  • Supports speaker identification (voiceprint) for personalized, more secure control.
  • Supports browser-first account login for hosted backend mode.
  • Supports direct mode with provider keys such as GEMINI_API_KEY, OPENAI_API_KEY, ANTHROPIC_API_KEY, XAI_API_KEY, OPENROUTER_API_KEY, VERCEL_AI_GATEWAY_API_KEY, or a local Ollama endpoint.
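The low-FPS visual context refresh can be pictured as a fixed-rate capture loop. The sketch below is hypothetical and not PixelPilot's code: run_frame_loop and the capture callback are illustrative names, and the clock and sleep functions are injectable so the pacing can be tested without real waiting. With OLLAMA_LIVE_FRAME_LOOP_FPS=1.0, as in the example Ollama configuration further down, it would capture about one frame per second.

```python
import time
from typing import Callable


def run_frame_loop(
    capture: Callable[[], None],
    fps: float,
    max_frames: int,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call capture() at most fps times per second, max_frames times in total.

    Hypothetical sketch: capture stands in for whatever grabs a screenshot
    and forwards it to the live model.
    """
    if fps <= 0:
        raise ValueError("fps must be positive")
    interval = 1.0 / fps
    next_due = clock()
    frames = 0
    while frames < max_frames:
        capture()
        frames += 1
        if frames >= max_frames:
            break
        # Schedule against the previous due time so small delays do not drift.
        next_due += interval
        delay = next_due - clock()
        if delay > 0:
            sleep(delay)
        else:
            # We fell behind (slow capture): resynchronize instead of bursting.
            next_due = clock()
    return frames
```

Resynchronizing when a capture overruns keeps a slow screenshot from triggering a burst of back-to-back frames, which matters when each frame is sent to a local model.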

Main Modes

  • GUIDANCE: read-only assistance, no desktop mutations.
  • SAFE: mutating actions require confirmation.
  • AUTO: mutating actions run without per-step confirmation.
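One way to read the three modes is as a gate in front of every tool call. The sketch below is an assumption about the semantics described above, not PixelPilot's implementation; Mode, gate_action, and the confirm callback are hypothetical names, and it also assumes read-only actions are allowed in every mode.

```python
from enum import Enum
from typing import Callable


class Mode(Enum):
    GUIDANCE = "guidance"  # read-only: never mutate the desktop
    SAFE = "safe"          # mutating actions need explicit confirmation
    AUTO = "auto"          # mutating actions run without per-step confirmation


def gate_action(
    mode: Mode,
    mutating: bool,
    confirm: Callable[[str], bool],
    description: str,
) -> bool:
    """Return True if the action may run under mode (hypothetical sketch)."""
    if not mutating:
        # Assumption: read-only actions are permitted in every mode.
        return True
    if mode is Mode.GUIDANCE:
        return False
    if mode is Mode.SAFE:
        # Ask the user (overlay prompt, voice, etc.) before each mutation.
        return confirm(description)
    return True  # Mode.AUTO
```

Passing confirmation in as a callback lets an overlay dialog, a voice prompt, or a test stub supply the answer without changing the gate itself.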

Operation & Permissions

PixelPilot uses a granular permission system via settings.json. You can define rules for specific tools:

{
  "toolPolicy": {
    "allow": ["Browser(open)", "Media(*)"],
    "deny": ["System(lock)"],
    "ask": ["*"]
  }
}

Rules support exact matches or wildcards (*).
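The README does not say how overlapping rules are resolved, so the matcher below assumes deny wins over allow, allow wins over ask, and anything unmatched falls back to ask. It is a hypothetical sketch, not PixelPilot's implementation; resolve is an illustrative name, and it uses Python's fnmatch.fnmatchcase so that * matches any run of characters. Under those assumptions, the example policy above resolves Browser(open) and Media(play) to allow, System(lock) to deny, and everything else to ask.

```python
import json
from fnmatch import fnmatchcase


def resolve(tool: str, policy: dict) -> str:
    """Return "deny", "allow", or "ask" for a tool string like "Browser(open)".

    Assumed precedence (not documented upstream): deny > allow > ask,
    with "ask" as the fallback when no rule matches.
    Note: fnmatchcase also treats "?" and "[...]" as glob syntax.
    """
    rules = policy.get("toolPolicy", {})
    for decision in ("deny", "allow", "ask"):
        if any(fnmatchcase(tool, pattern) for pattern in rules.get(decision, [])):
            return decision
    return "ask"


settings = json.loads("""
{
  "toolPolicy": {
    "allow": ["Browser(open)", "Media(*)"],
    "deny": ["System(lock)"],
    "ask": ["*"]
  }
}
""")

for tool in ("Browser(open)", "Media(play)", "System(lock)", "Browser(close)"):
    print(tool, "->", resolve(tool, settings))
```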

Login Model

PixelPilot supports two primary ways to start:

  1. Direct mode: add a provider key locally (or set PIXELPILOT_MODEL_PROVIDER=ollama and PIXELPILOT_LIVE_PROVIDER=ollama) and launch the app without a login gate.

  2. Hosted backend mode: configure BACKEND_URL, launch the app, and sign in from the browser. The browser then hands control back to the desktop app through the pixelpilot:// deep-link flow.

Desktop backend sessions are stored in Windows Credential Manager.
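The deep-link handoff amounts to the desktop app receiving a pixelpilot:// URL and extracting a one-time value from it. The README does not document the URL layout, so the auth/callback target and the code and state query parameters below are hypothetical; only the pixelpilot:// scheme comes from the text. The state comparison illustrates the usual OAuth-style check that a callback answers a sign-in the app actually started. Persisting the resulting session (which the README says goes to Windows Credential Manager) is outside this sketch.

```python
import hmac
from urllib.parse import parse_qs, urlparse


def parse_login_callback(url: str, expected_state: str) -> str:
    """Extract the one-time code from a hypothetical pixelpilot:// callback.

    Assumed shape: pixelpilot://auth/callback?code=...&state=...
    Raises ValueError if the URL is not a matching callback.
    """
    parts = urlparse(url)
    if parts.scheme != "pixelpilot":
        raise ValueError("not a pixelpilot:// URL")
    if (parts.netloc, parts.path) != ("auth", "/callback"):
        raise ValueError("unexpected deep-link target")
    params = parse_qs(parts.query)
    code = params.get("code", [""])[0]
    state = params.get("state", [""])[0]
    # Constant-time comparison (on bytes, so non-ASCII input cannot raise TypeError).
    if not code or not hmac.compare_digest(state.encode(), expected_state.encode()):
        raise ValueError("missing code or state mismatch")
    return code
```

In practice the expected state would be a random value (for example from secrets.token_urlsafe) generated when the app opens the browser sign-in page.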

Quick Start

End users

Use the Windows installer and launch PixelPilot.

Local desktop development

  1. Configure environment variables: create a root .env from env.example, for example:

    PIXELPILOT_MODEL_PROVIDER=gemini
    PIXELPILOT_LIVE_PROVIDER=gemini
    PIXELPILOT_MODEL=gemini-3-flash-preview
    PIXELPILOT_LIVE_MODEL=gemini-3.1-flash-live-preview
    GEMINI_API_KEY=your_api_key_here
    BACKEND_URL=http://localhost:8000
    WEB_URL=http://localhost:5173

    For local Ollama live mode:

    PIXELPILOT_MODEL_PROVIDER=ollama
    PIXELPILOT_LIVE_PROVIDER=ollama
    PIXELPILOT_MODEL=gemma4:e2b-it-bf16
    PIXELPILOT_LIVE_MODEL=gemma4:e2b-it-bf16
    OLLAMA_BASE_URL=http://localhost:11434
    LOCAL_ASR_MODEL=base.en
    LOCAL_TTS_ENABLED=true
    OLLAMA_LIVE_FRAME_LOOP_ENABLED=true
    OLLAMA_LIVE_FRAME_LOOP_FPS=1.0

    Then download the Kokoro model assets (kokoro-v1.0.onnx and voices-v1.0.bin) into the models/ directory.

  2. Set up the Python runtime: the desktop app requires a local Python virtual environment (venv) in the repository root:

    python -m venv venv
    .\venv\Scripts\activate
    pip install -r requirements.txt
  3. Install, build, and start the desktop app:

    cd desktop
    npm install
    npm run build
    npm start
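Before launching in local Ollama live mode, it can help to confirm that the server at OLLAMA_BASE_URL is reachable and already has the configured model. The sketch below is not part of PixelPilot (the repository's own check is python src/main.py doctor); it queries Ollama's GET /api/tags endpoint, which lists locally available models, and model_available and fetch_tags are hypothetical helper names.

```python
import json
import os
from urllib.request import urlopen


def model_available(tags_payload: dict, wanted: str) -> bool:
    """Return True if wanted appears in an Ollama /api/tags response."""
    # Ollama reports untagged model names as "<name>:latest".
    if ":" not in wanted:
        wanted += ":latest"
    return any(m.get("name") == wanted for m in tags_payload.get("models", []))


def fetch_tags(base_url: str, timeout: float = 5.0) -> dict:
    """Fetch the list of locally pulled models from an Ollama server."""
    with urlopen(f"{base_url.rstrip('/')}/api/tags", timeout=timeout) as resp:
        return json.load(resp)


if __name__ == "__main__":
    base = os.environ.get("OLLAMA_BASE_URL", "http://localhost:11434")
    model = os.environ.get("PIXELPILOT_LIVE_MODEL", "gemma4:e2b-it-bf16")
    try:
        ok = model_available(fetch_tags(base), model)
    except OSError as exc:
        # URLError and timeouts are OSError subclasses, so connection failures land here.
        print(f"Ollama not reachable at {base}: {exc}")
    else:
        status = "available" if ok else f"missing (try: ollama pull {model})"
        print(f"{model}: {status}")
```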

Optional backend development

Backend services live in backend/.

cd backend
pip install -r requirements.txt
uvicorn main:app --host 0.0.0.0 --port 8000 --reload

Minimal backend/.env:

PIXELPILOT_MODEL_PROVIDER=gemini
PIXELPILOT_LIVE_PROVIDER=gemini
GEMINI_API_KEY=your_backend_key
MONGODB_URI=your_mongodb_uri
REDIS_URI=redis://localhost:6379
JWT_SECRET=your_jwt_secret
GOOGLE_CLIENT_ID=your_google_client_id
GOOGLE_CLIENT_SECRET=your_google_client_secret
GOOGLE_REDIRECT_URI=http://localhost:8000/auth/google/callback
WEB_URL=http://localhost:5173
LIVE_SESSION_SECONDS_PER_DAY=600
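LIVE_SESSION_SECONDS_PER_DAY=600 suggests a per-user daily allowance of live-session time (600 seconds, or 10 minutes). The sketch below shows one way such a cap could work; it is an assumption, not the backend's code. It keeps usage in an in-memory dict keyed by user and UTC date, whereas a real deployment would more likely keep the counter in Redis so every worker sees it. DailyBudget and its methods are hypothetical names.

```python
from datetime import datetime, timezone
from typing import Dict, Optional, Tuple


class DailyBudget:
    """Track live-session seconds per user per UTC day (hypothetical sketch)."""

    def __init__(self, seconds_per_day: int = 600) -> None:
        self.seconds_per_day = seconds_per_day
        self._used: Dict[Tuple[str, str], float] = {}

    @staticmethod
    def _day(now: Optional[datetime]) -> str:
        # Bucket usage by calendar day in UTC.
        moment = now if now is not None else datetime.now(timezone.utc)
        return moment.astimezone(timezone.utc).strftime("%Y-%m-%d")

    def remaining(self, user_id: str, now: Optional[datetime] = None) -> float:
        used = self._used.get((user_id, self._day(now)), 0.0)
        return max(0.0, self.seconds_per_day - used)

    def consume(self, user_id: str, seconds: float, now: Optional[datetime] = None) -> float:
        """Record usage and return how many of the requested seconds were granted."""
        if seconds < 0:
            raise ValueError("seconds must be non-negative")
        key = (user_id, self._day(now))
        granted = min(seconds, self.remaining(user_id, now))
        self._used[key] = self._used.get(key, 0.0) + granted
        return granted
```

Wiring it to the environment would look like DailyBudget(int(os.environ.get("LIVE_SESSION_SECONDS_PER_DAY", "600"))); granting a partial amount lets a relay end a session cleanly when the allowance runs out.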

Optional web development

The public site and hosted auth pages live in web/.

cd web
npm install
npm run dev

Use web/.env.local:

VITE_BACKEND_URL=http://localhost:8000

Project Structure

  • desktop/: Electron shell, renderer UI, preload bridge, Windows packaging.
  • src/: Python runtime containing:
    • live/: Gemini Live, OpenAI Realtime, and local Ollama live orchestration.
    • tools/: Automation tools, OCR, and app indexing.
    • uac/: SYSTEM-level orchestrator for Secure Desktop / UAC prompts.
    • wakeword/: OpenWakeWord detection logic ("Hey Pixie").
    • runtime/: Bootstrap and service glue for packaged binaries.
  • backend/: FastAPI auth, Google OAuth, OCR services, rate limits, Live relay.
  • web/: landing site, hosted auth pages, public docs.

Useful Commands

# Desktop tests
cd desktop
npm test

# Web production build
cd web
npm run build

# Python diagnostics
python src/main.py doctor

Troubleshooting

  • Direct (login-free) mode does not start: check that PIXELPILOT_MODEL_PROVIDER is set and that the matching provider key, or OLLAMA_BASE_URL for Ollama, is present.
  • Ollama local live mode: pull gemma4:e2b-it-bf16 (for example, ollama pull gemma4:e2b-it-bf16), keep OLLAMA_BASE_URL=http://localhost:11434, and make sure faster-whisper and kokoro-onnx are installed from requirements.txt. Ensure kokoro-v1.0.onnx and voices-v1.0.bin are in the models/ directory. If audio input fails on GPU systems, check that the required CUDA/cuDNN DLLs (e.g., cublas64_12.dll) are available; the app automatically falls back to CPU mode if they are missing. Run python src/main.py doctor to verify.
  • Hosted sign-in issues: check BACKEND_URL, WEB_URL, MongoDB, Redis, and Google OAuth config.
  • Runtime issues: check logs/pixelpilot.log.
  • UAC issues: reinstall from the MSI as Administrator.
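The first troubleshooting item can be checked mechanically: does PIXELPILOT_MODEL_PROVIDER have its matching key, or, for Ollama, an OLLAMA_BASE_URL? The sketch below is a rough stand-in for that check, not the logic inside python src/main.py doctor. Only the gemini and ollama provider values appear in this README; the other provider strings in PROVIDER_KEYS are guesses paired with the key names listed under What PixelPilot Does, the Vercel AI Gateway entry is omitted because its provider value is not documented, and treating PIXELPILOT_LIVE_PROVIDER as optional is also an assumption.

```python
import os
from typing import List, Mapping

# Provider value -> env var that must be non-empty. Only "gemini" and
# "ollama" are confirmed by the README; the other provider strings are assumed.
PROVIDER_KEYS = {
    "gemini": "GEMINI_API_KEY",
    "openai": "OPENAI_API_KEY",          # assumed provider value
    "anthropic": "ANTHROPIC_API_KEY",    # assumed provider value
    "xai": "XAI_API_KEY",                # assumed provider value
    "openrouter": "OPENROUTER_API_KEY",  # assumed provider value
    "ollama": "OLLAMA_BASE_URL",
}


def config_problems(env: Mapping[str, str]) -> List[str]:
    """List reasons direct (login-free) mode would likely not start."""
    problems: List[str] = []
    # The model provider is required; the live provider is checked only if set.
    for var, required in (("PIXELPILOT_MODEL_PROVIDER", True), ("PIXELPILOT_LIVE_PROVIDER", False)):
        provider = env.get(var, "").strip().lower()
        if not provider:
            if required:
                problems.append(f"{var} is not set")
            continue
        needed = PROVIDER_KEYS.get(provider)
        if needed is None:
            problems.append(f"{var}={provider} is not a provider this check knows")
        elif not env.get(needed, "").strip():
            problems.append(f"{var}={provider} but {needed} is empty")
    return problems


if __name__ == "__main__":
    issues = config_problems(os.environ)
    print("\n".join(issues) if issues else "provider configuration looks complete")
```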

About

PixelPilot Windows AI Automation

Languages

  • Python 73.0%
  • TypeScript 23.4%
  • CSS 2.4%
  • JavaScript 1.1%
  • Other 0.1%