Zerone-Laboratories/SkyNet
SkyNet - Distributed LLM Inference System

A distributed inference framework for running large language models across multiple nodes using tensor parallelism.

NOTE FROM THE DEVELOPER

Hi! This is a hobby project of mine, and it's still at a highly experimental stage. There are plenty of bugs to fix and optimizations to make, so if you want to help out, please do!

What is SkyNet?

SkyNet splits the MLP (feed-forward) layers of transformer models across multiple worker nodes, enabling distributed inference. Each node computes a portion of the neurons, and results are aggregated on the server.
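The split-and-sum idea can be checked numerically. Below is a toy NumPy sketch (illustrative only, not SkyNet's actual code) using a gated SwiGLU-style MLP like Qwen2.5's: each "node" holds a row slice of the gate/up projections and the matching column slice of the down projection, and the per-node partial outputs sum to the full result.

```python
import numpy as np

# Toy sizes; the real model uses hidden=896 and inter=4864.
hidden, inter, n_nodes = 8, 12, 3
rng = np.random.default_rng(0)
x = rng.standard_normal(hidden)
W_gate = rng.standard_normal((inter, hidden))
W_up   = rng.standard_normal((inter, hidden))
W_down = rng.standard_normal((hidden, inter))

silu = lambda z: z / (1.0 + np.exp(-z))  # SiLU activation used by Qwen2.5

# Full (single-node) MLP output: down(silu(gate(x)) * up(x)).
full = W_down @ (silu(W_gate @ x) * (W_up @ x))

# Split the intermediate neurons across the nodes: each node gets a row
# slice of W_gate/W_up and the matching column slice of W_down.
parts = []
for rows in np.array_split(np.arange(inter), n_nodes):
    partial = W_down[:, rows] @ (silu(W_gate[rows] @ x) * (W_up[rows] @ x))
    parts.append(partial)

# Summing the per-node partial outputs reproduces the full MLP result.
assert np.allclose(sum(parts), full)
```

Because the activation is applied per neuron, each node can compute its slice independently; only the final sum needs coordination.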

Architecture Overview

                     SkyServer (Attention + Coordination)
                                    │
                    ┌───────────────┼───────────────┐
                    ▼               ▼               ▼
               SkyNode 0       SkyNode 1   ...  SkyNode N
              (neurons 0-k)  (neurons k-2k)   (neurons mk-n)
                    │               │               │
                    └───────────────┼───────────────┘
                                    ▼
                              Sum outputs
  • Server: Runs attention layers locally, coordinates MLP distribution
  • SkyNodes: Each holds a slice of MLP weights, computes partial outputs
  • Aggregation: Server sums all node outputs to get final MLP result

How It Works

  1. Model Loading: Server loads the full model (e.g., Qwen2.5-0.5B)
  2. Node Registration: SkyNodes connect and register with server
  3. Weight Distribution: Server slices MLP weights and sends to each node
  4. Inference:
    • Server computes attention locally
    • Broadcasts MLP input to all nodes
    • Each node computes its neuron slice
    • Server aggregates results
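The fan-out/fan-in in step 4 can be sketched with local threads standing in for remote SkyNodes (a simplification: a single linear projection instead of the full gated MLP, and no actual networking):

```python
from concurrent.futures import ThreadPoolExecutor
import numpy as np

rng = np.random.default_rng(1)
hidden, inter, n_nodes = 8, 12, 3

# Down-projection weights; each "node" owns one column slice (its neurons).
W = rng.standard_normal((hidden, inter))
slices = np.array_split(np.arange(inter), n_nodes)
activations = rng.standard_normal(inter)  # stand-in for the MLP input broadcast

def compute_slice(cols):
    # Stand-in for a remote SkyNode: partial output over its neuron slice.
    return W[:, cols] @ activations[cols]

# Server broadcasts to all nodes, then sums their partial outputs.
with ThreadPoolExecutor(max_workers=n_nodes) as pool:
    partials = list(pool.map(compute_slice, slices))

mlp_out = sum(partials)
assert np.allclose(mlp_out, W @ activations)
```

In the real system the thread-pool call would be a network round trip to each registered node, but the aggregation step (an elementwise sum of partial vectors) is the same.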

Quick Start

Start Server

python server.py

Start SkyNodes (in separate terminals)

python skynode.py 0
python skynode.py 1
python skynode.py 2
# ... add more nodes as needed (minimum 6)
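Starting six terminals by hand gets tedious; a small shell loop can launch the minimum node count instead. This is a hypothetical helper (echo makes it a dry run; swap in the real command to actually start the nodes):

```shell
# Dry run: print the launch command for each of the six node IDs.
# To actually start the nodes, replace `echo` with the command itself
# and append `&` to run each one in the background.
for i in $(seq 0 5); do
  echo python skynode.py "$i"
done
```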

Run Inference

python client.py "Hello world" 10

Current Configuration

  Component     Value
  Model         Qwen2.5-0.5B
  Layers        24
  Hidden Size   896
  MLP Neurons   4,864 per layer
  Max Nodes     4,864
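The per-node slice sizes follow from the table: with 4,864 MLP neurons per layer and N nodes, each node gets roughly 4864/N neurons (hence the maximum of 4,864 nodes, one neuron each). A hypothetical helper sketching the partitioning (SkyNet's actual logic lives in core/skycore.py and may differ):

```python
def slice_bounds(total_neurons, n_nodes):
    """Divide total_neurons into n_nodes contiguous slices as evenly as possible."""
    base, extra = divmod(total_neurons, n_nodes)
    bounds, start = [], 0
    for i in range(n_nodes):
        # The first `extra` nodes take one additional neuron each.
        end = start + base + (1 if i < extra else 0)
        bounds.append((start, end))
        start = end
    return bounds

# With 6 nodes: the first four slices hold 811 neurons, the last two 810.
print(slice_bounds(4864, 6))
```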

Project Structure

skynet/
├── server.py          # Main server
├── client.py          # Inference client  
├── skynode.py         # Distributed compute node
├── core/
│   ├── skycore.py     # Core logic (model loading, splitting, inference)
│   └── logger.py      # Logging
└── docs/              # Full documentation

Documentation

Requirements

  • Python 3.10+
  • PyTorch
  • Transformers (Hugging Face)

Features

  • Tensor Parallelism: Splits MLP neurons across nodes
  • Dynamic Scaling: Add/remove nodes on the fly
  • Automatic Rebalancing: Redistributes weights when nodes change
  • Pretrained Weight Distribution: Loads actual model weights to nodes
