Run large AI models across your existing devices. No cloud, no new hardware, no cost.
git clone --recurse-submodules https://github.com/YOUR_USERNAME/YOUR_REPO
cd YOUR_REPO
chmod +x setup.sh start.sh
./setup.sh   # one-time setup (~5 min)
./start.sh   # run every time

Open http://localhost:5500 in your browser.
- Splits AI models across all devices on your WiFi network
- Each device holds a shard of the model in RAM
- Run 20B+ parameter models that wouldn't fit on a single machine
- Tracks CO₂, water, and energy saved vs cloud GPU equivalents
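The sharding step can be sketched in a few lines. This is an illustrative partitioning scheme, not exo's actual code: it assumes layers are assigned in contiguous ranges, weighted by each node's free RAM.

```python
# Hypothetical sketch: split a model's layers across nodes in proportion
# to each node's free RAM (illustrative only; not exo's partitioner).

def shard_layers(num_layers, node_ram_gb):
    """Assign contiguous layer ranges to nodes, weighted by RAM."""
    total = sum(node_ram_gb)
    shards, start = [], 0
    for i, ram in enumerate(node_ram_gb):
        # The last node takes the remainder to avoid rounding gaps.
        if i == len(node_ram_gb) - 1:
            count = num_layers - start
        else:
            count = round(num_layers * ram / total)
        shards.append(range(start, start + count))
        start += count
    return shards

# Three Macs with 16, 8, and 8 GB free: a 32-layer model splits roughly 2:1:1.
print(shard_layers(32, [16, 8, 8]))  # → [range(0, 16), range(16, 24), range(24, 32)]
```

Each node then only needs RAM for its own range of layers, which is why models too large for any single machine still run across the group.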
On any other Mac on the same WiFi:
git clone --recurse-submodules https://github.com/YOUR_USERNAME/YOUR_REPO
cd YOUR_REPO
./setup.sh
./start.sh

It joins automatically. No configuration needed.
- macOS (Apple Silicon or Intel)
- Python 3.13 (installed automatically)
- Rust (installed automatically)
- 8GB+ RAM recommended
- Same WiFi network as other nodes
| Model | Size | Status |
|---|---|---|
| Llama 3.2 1B 4bit | 696MB | Downloaded on setup |
| Qwen3 0.6B 8bit | 666MB | Available |
| Llama 3.2 3B 4bit | 1.7GB | Download via UI |
| Llama 3.1 8B 4bit | 4.3GB | Download via UI |
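A quick way to reason about the table: a model fits if its weight size, plus some working overhead, fits in the group's combined free RAM. The 25% overhead factor below is an assumption for illustration, not a measured figure.

```python
# Weight sizes from the table above, in GB.
MODELS = {
    "Llama 3.2 1B 4bit": 0.696,
    "Qwen3 0.6B 8bit":   0.666,
    "Llama 3.2 3B 4bit": 1.7,
    "Llama 3.1 8B 4bit": 4.3,
}

def largest_fitting_model(free_ram_gb, overhead=1.25):
    """Largest model whose weights (plus assumed overhead) fit in combined RAM."""
    total = sum(free_ram_gb)
    fitting = {m: s for m, s in MODELS.items() if s * overhead <= total}
    return max(fitting, key=fitting.get) if fitting else None

# Two Macs with 4 GB and 2 GB free RAM between them:
print(largest_fitting_model([4.0, 2.0]))  # → Llama 3.1 8B 4bit
```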
[Your Mac]           [Friend's Mac]       [Any Mac on WiFi]
exo + frontend       exo worker           exo worker
     └───────────────────────┴────────────────────┘
              Auto-discovered via mDNS
                         │
               http://localhost:5500
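To make the auto-discovery step concrete, here is a simplified stand-in using plain UDP on loopback. The port number and JSON message format are hypothetical, invented for this sketch; the real system uses multicast DNS, not this protocol.

```python
import json
import socket

PORT = 50007  # hypothetical discovery port, for illustration only

def announce(name, service_port):
    """A worker announces its name and service port."""
    msg = json.dumps({"node": name, "port": service_port}).encode()
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.sendto(msg, ("127.0.0.1", PORT))

# The first node listens and builds a peer table from announcements.
listener = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
listener.bind(("127.0.0.1", PORT))
listener.settimeout(1.0)

announce("friends-mac", 5500)

peers = {}
data, addr = listener.recvfrom(1024)
info = json.loads(data)
peers[info["node"]] = (addr[0], info["port"])
listener.close()
print(peers)  # → {'friends-mac': ('127.0.0.1', 5500)}
```

Real mDNS does the same thing over multicast on the LAN, which is why nodes only need to share a WiFi network to find each other.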
Every token processed locally avoids:
- ~0.18g CO₂ per 1000 tokens (vs AWS GPU)
- ~2.4mL water per 1000 tokens (vs data center cooling)
- ~0.38Wh energy per 1000 tokens
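The per-1000-token figures above make the tracker's arithmetic simple. A minimal sketch, using those same constants (the function name and output shape are illustrative, not the app's actual API):

```python
# Per-1000-token savings figures from the list above (cloud-GPU comparison).
CO2_G_PER_1K = 0.18
WATER_ML_PER_1K = 2.4
ENERGY_WH_PER_1K = 0.38

def savings(tokens):
    """Estimated CO2, water, and energy avoided for a token count."""
    k = tokens / 1000
    return {
        "co2_g": k * CO2_G_PER_1K,
        "water_ml": k * WATER_ML_PER_1K,
        "energy_wh": k * ENERGY_WH_PER_1K,
    }

print(savings(50_000))  # estimated savings for a 50k-token chat session
```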
- exo — distributed inference engine
- FastAPI — CORS proxy
- Vanilla JS — frontend