## Vision
A dedicated AI persona that lives on the Grid and manages all resource allocation across nodes. Not a hardcoded scheduler — an intelligent agent that sees the whole system, makes routing decisions, and learns from outcomes.
Like an air traffic controller: sees every plane (task), every runway (GPU), every flight path (network route), and orchestrates safe, efficient operations.
## What It Enables
- MacBook Air gets quality voices: TTS/STT routed to tower with better model, audio streamed back. Air stays responsive.
- Lower latency live conversations: Governor pre-loads models on the best node BEFORE they're needed (predictive)
- Larger models than any single machine: MoE experts distributed across nodes, governor routes inference to the node with the right expert
- Natural load balancing: 5 personas all need inference simultaneously → governor spreads across 5090 + 3090 + 5050
- Graceful degradation: Node drops off → governor re-routes in-flight tasks to remaining nodes
## The Persona
- **Name:** Governor
- **Type:** system persona (not user-facing)
- **Skills:** resource monitoring, task scheduling, model placement, latency prediction
- **Inputs:** telemetry from all nodes (CPU, memory, GPU, VRAM, latency, queue depth)
- **Outputs:** routing decisions (which node handles which task)
- **Learns:** from outcomes (did the routing decision improve latency? reduce cost?)
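The input/output contract above can be sketched as two data shapes plus a minimal routing function. This is a hedged sketch: the `NodeTelemetry` and `RoutingDecision` types, field names, and the cost formula are illustrative assumptions, not the project's actual API.

```python
from dataclasses import dataclass

@dataclass
class NodeTelemetry:
    node: str            # e.g. "bigmama-5090" (hypothetical node id)
    cpu: float           # utilization, 0..1
    mem: float           # utilization, 0..1
    gpu: float           # utilization, 0..1
    vram_free_gb: float  # free VRAM on this node
    latency_ms: float    # network hop from the coordinator
    queue_depth: int     # tasks already waiting on this node

@dataclass
class RoutingDecision:
    task_id: str
    node: str
    reason: str

def route(task_id: str, needed_vram_gb: float,
          telemetry: list[NodeTelemetry]) -> RoutingDecision:
    """Pick a node that can fit the task, minimizing a latency + queue cost."""
    candidates = [t for t in telemetry if t.vram_free_gb >= needed_vram_gb]
    if not candidates:
        raise RuntimeError("no node has enough free VRAM for this task")
    best = min(candidates, key=lambda t: t.latency_ms + 10 * t.queue_depth)
    return RoutingDecision(
        task_id, best.node,
        f"{best.vram_free_gb:.0f}GB free, {best.latency_ms:.0f}ms hop")
```

A fuller version would also weigh GPU utilization and model residency; this only shows the telemetry-in, decision-out shape.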
## Decision Examples
| Situation | Decision | Reasoning |
|---|---|---|
| Air needs TTS | Route to 5090 | Tower has large TTS model loaded, 12ms network hop |
| Training job submitted | Route to 5090 | Most VRAM, fastest GPU |
| 5090 at 90% VRAM | Shed inference to 3090 | Protect training job, 3090 is idle |
| Coding task for persona | Route to 3090 | Code expert loaded there, 5090 busy training |
| 5050 laptop joins | Assign lightweight inference | Small VRAM, but can handle 3B model |
| Live call starting | Pre-load STT on closest node | Latency-critical, predict demand |
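Several table rows ("code expert loaded there", "5090 busy training") reduce to a placement-aware picker: prefer a node where the model is already resident, otherwise pay the smallest loading cost. A minimal sketch, assuming hypothetical node names and a per-node state dict; the real governor would read this from the model registry and telemetry.

```python
def pick_node(model: str, nodes: dict[str, dict]) -> str:
    """Prefer an idle node that already has the model resident (no load cost);
    among those, take the lowest network latency. Otherwise fall back to the
    idle node with the most free VRAM, which pays the smallest loading penalty."""
    resident = [n for n, s in nodes.items()
                if model in s["loaded_models"] and not s["busy"]]
    if resident:
        return min(resident, key=lambda n: nodes[n]["latency_ms"])
    idle = [n for n, s in nodes.items() if not s["busy"]] or list(nodes)
    return max(idle, key=lambda n: nodes[n]["vram_free_gb"])
```

This reproduces the "coding task" row: with the 5090 busy training and the code expert resident on the 3090, the task lands on the 3090.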
## Not Hardcoded
The governor doesn't have fixed rules. It has:
- Telemetry (real-time data from every node)
- History (what worked before in similar situations)
- Constraints (VRAM limits, network latency, model sizes)
- Goals (minimize latency, maximize throughput, stay within budget)
It learns to balance these through experience. Early versions use simple heuristics. Later versions use the trained model's reasoning.
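The heuristics-to-learning progression can start very small: a cost function whose weights are nudged by observed outcomes. A toy sketch under loud assumptions: the multiplicative update rule, the latency target, and the weight names are all illustrative, not the project's actual training method.

```python
class LearnedRoutingCost:
    """Cost = weighted sum of latency and queue depth; the latency weight is
    nudged after each decision depending on whether the latency goal was met."""

    def __init__(self) -> None:
        self.w_latency = 1.0
        self.w_queue = 10.0

    def cost(self, latency_ms: float, queue_depth: int) -> float:
        return self.w_latency * latency_ms + self.w_queue * queue_depth

    def record_outcome(self, achieved_ms: float, target_ms: float,
                       lr: float = 0.1) -> None:
        # Missed the target -> latency mattered more than assumed: weight it up.
        # Beat the target -> relax the weight so throughput can win close calls.
        err = (achieved_ms - target_ms) / target_ms
        self.w_latency = max(0.1, self.w_latency * (1.0 + lr * err))
```

Early versions would hand-tune these weights; later versions could replace `record_outcome` with the trained model's reasoning over the same telemetry and history.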
## Architecture
Governor Persona
├─ Subscribes to: gpu:*, training:*, inference:*, grid:node:*
├─ Reads: node telemetry, model registry, task queue
├─ Writes: routing decisions → Grid dispatcher
└─ Trains on: decision outcomes (latency achieved, success rate)
The Governor IS a persona. It uses the same Academy, same genome, same tool system. It just has a specialized skill: resource orchestration.
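The subscribe/read/write loop above can be wired as an event handler over a pub/sub bus. A sketch under stated assumptions: the bus API (`subscribe`/`publish`), the `dispatch:route` topic, and the message fields are hypothetical stand-ins for the Grid dispatcher's real interface, and node selection is reduced to lowest network latency.

```python
class Governor:
    """Consumes telemetry and task events; emits routing decisions."""

    TOPICS = ("gpu:*", "training:*", "inference:*", "grid:node:*")

    def __init__(self, bus) -> None:
        self.bus = bus
        self.telemetry: dict[str, dict] = {}  # node id -> latest telemetry msg

    def start(self) -> None:
        for topic in self.TOPICS:
            self.bus.subscribe(topic, self.on_event)

    def on_event(self, topic: str, msg: dict) -> None:
        if topic.startswith("grid:node:"):
            self.telemetry[msg["node"]] = msg      # read: node telemetry
        elif topic.startswith("inference:") and msg.get("type") == "task":
            node = self.pick_node()
            if node is not None:                   # write: routing decision
                self.bus.publish("dispatch:route",
                                 {"task": msg["id"], "node": node})

    def pick_node(self):
        # Simplest heuristic stand-in: lowest network latency wins.
        if not self.telemetry:
            return None
        return min(self.telemetry, key=lambda n: self.telemetry[n]["latency_ms"])
```

Because the Governor is just another persona, `pick_node` is the seam where heuristics get swapped for the trained model's reasoning without changing the bus wiring.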
## Hardware (5 nodes)
- MacBook Pro M1 Pro 16GB — current dev, coordinator
- MacBook Air 8-16GB — minimum target, delegates everything heavy
- BigMama (5090) 32GB VRAM — heavy training, large inference
- Toby's 3090 ~24GB VRAM — parallel training, inference
- Toby's 5050 laptop ~8GB VRAM — mobile inference
Combined: roughly 80GB of GPU-accessible memory (dedicated VRAM plus Apple unified memory), managed by one intelligent persona.
## Related
- GPU governor: expand from 3 subsystems to full consumer management #380 (GPU governor — single-node precursor)
- Distributed inference and LoRA training across towers via reticulum #337 (distributed inference)
- MoE expert paging: load only the needed expert on demand, page rest from HF cache #433 (MoE expert paging — per-node expert placement)
- Genome paging: activateSkill/evictLRU not wired end-to-end #382 (genome paging — skill activation by demand)
- Persona response latency: 2+ minutes from message to reply under normal load #399 (persona latency — governor reduces by smart routing)