A distributed inference framework for running large language models across multiple nodes using tensor parallelism.
NOTE FROM THE DEVELOPER
Hi! This is a hobby project of mine, and it's still highly experimental. There are many bugs to fix and optimizations to make, so if you want to help out, please do!
SkyNet splits the MLP (feed-forward) layers of transformer models across multiple worker nodes, enabling distributed inference. Each node computes a portion of the neurons, and results are aggregated on the server.
```
        SkyServer (Attention + Coordination)
                        │
        ┌───────────────┼───────────────┐
        ▼               ▼               ▼
   SkyNode 0       SkyNode 1    ...  SkyNode N
 (neurons 0-k)  (neurons k-2k)   (neurons mk-n)
        │               │               │
        └───────────────┼───────────────┘
                        ▼
                   Sum outputs
```
- Server: Runs attention layers locally, coordinates MLP distribution
- SkyNodes: Each holds a slice of MLP weights, computes partial outputs
- Aggregation: Server sums all node outputs to get final MLP result
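The aggregation step works because of a simple identity: if each node holds a column slice of the up-projection and the matching row slice of the down-projection, the sum of the per-node partial outputs equals the full MLP output. A minimal sketch of that identity (a plain ReLU MLP for illustration; Qwen2.5 actually uses a gated SiLU MLP, and the real logic lives in `core/skycore.py`):

```python
import torch

def full_mlp(x, w_up, w_down):
    # Reference: a plain two-layer ReLU MLP computed in one place.
    return torch.relu(x @ w_up) @ w_down

def sharded_mlp(x, w_up, w_down, num_nodes):
    # Each "node" holds a column slice of w_up and the matching row slice
    # of w_down, and computes its partial output independently.
    partials = [
        torch.relu(x @ up) @ down
        for up, down in zip(w_up.chunk(num_nodes, dim=1),
                            w_down.chunk(num_nodes, dim=0))
    ]
    # Server-side aggregation: summing the partials recovers the full output.
    return sum(partials)

torch.manual_seed(0)
hidden, neurons = 896, 4864          # Qwen2.5-0.5B sizes
x = torch.randn(2, hidden)
w_up = torch.randn(hidden, neurons) / hidden ** 0.5
w_down = torch.randn(neurons, hidden) / neurons ** 0.5
assert torch.allclose(full_mlp(x, w_up, w_down),
                      sharded_mlp(x, w_up, w_down, 6), atol=1e-4)
```

Because the nonlinearity is applied per neuron, slicing along the neuron dimension is exact: no node ever needs another node's activations, only the final sum on the server.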
- Model Loading: Server loads the full model (e.g., Qwen2.5-0.5B)
- Node Registration: SkyNodes connect and register with server
- Weight Distribution: Server slices MLP weights and sends to each node
- Inference:
- Server computes attention locally
- Broadcasts MLP input to all nodes
- Each node computes its neuron slice
- Server aggregates results
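The Weight Distribution step above can be sketched as slicing one layer's MLP matrices into per-node shard pairs. The function name and shapes here are illustrative, not the actual `skycore` API:

```python
import torch

def slice_for_nodes(w_up, w_down, num_nodes):
    # Hypothetical server-side helper mirroring the "Weight Distribution"
    # step: one (up, down) shard pair per registered node. The real logic
    # lives in core/skycore.py and may differ.
    return list(zip(w_up.chunk(num_nodes, dim=1),
                    w_down.chunk(num_nodes, dim=0)))

# One layer's MLP matrices at Qwen2.5-0.5B sizes, split across 6 nodes.
shards = slice_for_nodes(torch.randn(896, 4864), torch.randn(4864, 896), 6)
for up, down in shards:
    # Each node's up-projection columns match its down-projection rows,
    # so the node can compute its partial output on its own.
    assert up.shape[1] == down.shape[0]
```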
```shell
python server.py
```

```shell
python skynode.py 0
python skynode.py 1
python skynode.py 2
# ... add more nodes as needed (minimum 6)
```

```shell
python client.py "Hello world" 10
```

| Component | Value |
|---|---|
| Model | Qwen2.5-0.5B |
| Layers | 24 |
| Hidden Size | 896 |
| MLP Neurons | 4,864 per layer |
| Max Nodes | 4,864 |
```
skynet/
├── server.py      # Main server
├── client.py      # Inference client
├── skynode.py     # Distributed compute node
├── core/
│   ├── skycore.py # Core logic (model loading, splitting, inference)
│   └── logger.py  # Logging
└── docs/          # Full documentation
```
- Python 3.10+
- PyTorch
- Transformers (Hugging Face)
- Tensor Parallelism: Splits MLP neurons across nodes
- Dynamic Scaling: Add/remove nodes on the fly
- Automatic Rebalancing: Redistributes weights when nodes change
- Pretrained Weight Distribution: Loads actual model weights to nodes
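Automatic rebalancing boils down to recomputing contiguous neuron ranges whenever the node count changes, then reshipping the affected weight slices. A hypothetical sketch (the shipped logic is in `core/skycore.py` and may differ):

```python
def rebalance(num_neurons, num_nodes):
    # Hypothetical rebalancing: deal out contiguous, near-equal neuron
    # ranges whenever a node joins or leaves.
    base, extra = divmod(num_neurons, num_nodes)
    ranges, start = [], 0
    for i in range(num_nodes):
        size = base + (1 if i < extra else 0)  # spread the remainder evenly
        ranges.append((start, start + size))
        start += size
    return ranges

# Going from 6 nodes to 7 means recomputing the ranges and sending each
# node only its new slice of the pretrained weights.
six = rebalance(4864, 6)
seven = rebalance(4864, 7)
```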