Problem
Inference and LoRA fine-tuning are locked to the local machine. Can't use the 5090 tower's GPU from a MacBook at work. Can't distribute training across multiple towers.
Vision
`Commands.execute('ai/inference', { model: 'coder-32b' })` transparently routes to whichever tower has the model loaded and VRAM available. Same for `genome/train`. The user doesn't care WHERE it runs; reticulum handles routing.
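A minimal sketch of what that transparent routing layer could look like (every name below is hypothetical except `Commands.execute`-style call shapes from this issue; this is not the actual reticulum API):

```typescript
// Hypothetical command runner signature: one entry point for local or remote execution.
type Runner = (cmd: string, args: Record<string, unknown>) => Promise<unknown>;

// Build a router that resolves a target tower per call and forwards to it,
// falling back to the local runner when no remote tower qualifies.
function makeRouter(
  local: Runner,
  remote: Map<string, Runner>,
  locate: (args: Record<string, unknown>) => string | null, // which tower has the model + VRAM?
): Runner {
  return async (cmd, args) => {
    const tower = locate(args);
    const run = (tower && remote.get(tower)) || local;
    return run(cmd, args); // caller never sees which machine ran it
  };
}
```

The point of the sketch: the caller keeps one call signature, and the routing decision is a pluggable function, so discovery and placement policy can evolve without touching call sites.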
Architecture
- Discovery: towers announce capabilities (GPU type, VRAM, loaded models) via reticulum
- Routing: inference requests route to best available tower (lowest latency, has model, has VRAM)
- Training: `genome/train` can target a remote tower (`--tower=5090` or auto-select by VRAM)
- Transport: Tailscale (Tailscale mesh network for multi-tower remote inference #323) provides the mesh network, reticulum provides the command routing
- Streaming: inference tokens stream back to caller in real-time (not batch)
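The discovery and routing rules above (has the model, has the VRAM, lowest latency wins) could be sketched like this; the capability record fields are assumptions, not the actual announcement format:

```typescript
// Hypothetical capability record a tower announces via reticulum.
interface TowerCaps {
  name: string;          // e.g. '5090'
  gpu: string;           // GPU type
  vramFreeGB: number;    // VRAM currently available
  loadedModels: string[];
  latencyMs: number;     // measured round-trip to this tower
}

// Pick the best tower: must have the model loaded and enough free VRAM;
// among candidates, prefer the lowest latency.
function pickTower(towers: TowerCaps[], model: string, vramNeededGB: number): TowerCaps | undefined {
  return towers
    .filter(t => t.loadedModels.includes(model) && t.vramFreeGB >= vramNeededGB)
    .sort((a, b) => a.latencyMs - b.latencyMs)[0];
}
```

Filtering before sorting keeps the policy obvious: hard requirements (model, VRAM) eliminate towers, then latency breaks ties.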
Use cases
- MacBook at work → 5090 at home for inference (via Tailscale)
- MacBook starts LoRA training on 5090 (32GB VRAM) while continuing to use local system
- Multiple towers: 5090 handles training, Mac handles UI, phone handles voice
- Load balancing: if 5090 is training, route inference to Mac's Metal GPU
Implementation path
- Tailscale mesh (Tailscale mesh network for multi-tower remote inference #323) — network connectivity
- Tower capability announcement (GPU, models, VRAM available)
- `Commands.execute()` remote routing (already designed in remote command delegation)
- Inference streaming over WebSocket/gRPC
- Training job dispatch + progress events across towers
- Model sync: ensure same GGUF/adapter files available on target tower
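For the streaming step, the caller-side shape could be an async iterator over token messages, so consumers `for await` tokens instead of waiting for a batch. The message format here is an assumption, not a spec:

```typescript
// Hypothetical wire message for one streamed inference token.
interface TokenMsg {
  token: string;
  done: boolean; // true on the final (empty) message
}

// Wrap any stream of token messages (e.g. parsed WebSocket frames) as an
// async generator of plain token strings, ending when `done` arrives.
async function* tokens(source: AsyncIterable<TokenMsg>): AsyncGenerator<string> {
  for await (const msg of source) {
    if (msg.done) return;
    yield msg.token;
  }
}
```

This keeps transport (WebSocket vs gRPC) behind the `AsyncIterable` boundary, so the same consumer code works over either.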