A distributed inference framework for running large language models across multiple nodes using tensor parallelism.
NOTE FROM THE DEVELOPER
Hi! This is a hobby project of mine, and it's still highly experimental. There are many bugs to fix and optimizations to make, so if you want to help out, please do!
SkyNet splits the MLP (feed-forward) layers of transformer models across multiple worker nodes, enabling distributed inference. Each node computes a portion of the neurons, and results are aggregated on the server.
```
        SkyServer (Attention + Coordination)
                        │
        ┌───────────────┼───────────────┐
        ▼               ▼               ▼
   SkyNode 0       SkyNode 1    ...  SkyNode N
 (neurons 0-k)  (neurons k-2k)   (neurons mk-n)
        │               │               │
        └───────────────┼───────────────┘
                        ▼
                   Sum outputs
```
- Server: Runs attention layers locally, coordinates MLP distribution
- SkyNodes: Each holds a slice of MLP weights, computes partial outputs
- Aggregation: Server sums all node outputs to get final MLP result
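The aggregation step works because of a simple identity: if each node holds a column slice of the up-projection and the matching row slice of the down-projection, the sum of the per-node partial outputs equals the full MLP output. A minimal sketch of that identity (a plain ReLU MLP for illustration; Qwen2.5 actually uses a gated SiLU MLP, and the real logic lives in `core/skycore.py`):

```python
import torch

def full_mlp(x, w_up, w_down):
    # Reference: a plain two-layer ReLU MLP computed in one place.
    return torch.relu(x @ w_up) @ w_down

def sharded_mlp(x, w_up, w_down, num_nodes):
    # Each "node" holds a column slice of w_up and the matching row slice
    # of w_down, and computes its partial output independently.
    partials = [
        torch.relu(x @ up) @ down
        for up, down in zip(w_up.chunk(num_nodes, dim=1),
                            w_down.chunk(num_nodes, dim=0))
    ]
    # Server-side aggregation: summing the partials recovers the full output.
    return sum(partials)

torch.manual_seed(0)
hidden, neurons = 896, 4864          # Qwen2.5-0.5B sizes
x = torch.randn(2, hidden)
w_up = torch.randn(hidden, neurons) / hidden ** 0.5
w_down = torch.randn(neurons, hidden) / neurons ** 0.5
assert torch.allclose(full_mlp(x, w_up, w_down),
                      sharded_mlp(x, w_up, w_down, 6), atol=1e-4)
```

Because the nonlinearity is applied per neuron, slicing along the neuron dimension is exact: no node ever needs another node's activations, only the final sum on the server.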
- Model Loading: Server loads the full model (e.g., Qwen2.5-0.5B)
- Node Registration: SkyNodes connect and register with server
- Weight Distribution: Server slices MLP weights and sends to each node
- Inference:
- Server computes attention locally
- Broadcasts MLP input to all nodes
- Each node computes its neuron slice
- Server aggregates results
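The Weight Distribution step above can be sketched as slicing one layer's MLP matrices into per-node shard pairs. The function name and shapes here are illustrative, not the actual `skycore` API:

```python
import torch

def slice_for_nodes(w_up, w_down, num_nodes):
    # Hypothetical server-side helper mirroring the "Weight Distribution"
    # step: one (up, down) shard pair per registered node. The real logic
    # lives in core/skycore.py and may differ.
    return list(zip(w_up.chunk(num_nodes, dim=1),
                    w_down.chunk(num_nodes, dim=0)))

# One layer's MLP matrices at Qwen2.5-0.5B sizes, split across 6 nodes.
shards = slice_for_nodes(torch.randn(896, 4864), torch.randn(4864, 896), 6)
for up, down in shards:
    # Each node's up-projection columns match its down-projection rows,
    # so the node can compute its partial output on its own.
    assert up.shape[1] == down.shape[0]
```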
```shell
python server.py
```

```shell
python skynode.py 0
python skynode.py 1
python skynode.py 2
# ... add more nodes as needed (minimum 6)
```

```shell
python client.py "Hello world" 10
```

| Component | Value |
|---|---|
| Model | Qwen2.5-0.5B |
| Layers | 24 |
| Hidden Size | 896 |
| MLP Neurons | 4,864 per layer |
| Max Nodes | 4,864 |
```
skynet/
├── server.py      # Main server
├── client.py      # Inference client
├── skynode.py     # Distributed compute node
├── core/
│   ├── skycore.py # Core logic (model loading, splitting, inference)
│   └── logger.py  # Logging
└── docs/          # Full documentation
```
- Python 3.10+
- PyTorch
- Transformers (Hugging Face)
- Tensor Parallelism: Splits MLP neurons across nodes
- Dynamic Scaling: Add/remove nodes on the fly
- Automatic Rebalancing: Redistributes weights when nodes change
- Pretrained Weight Distribution: Loads actual model weights to nodes
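Automatic rebalancing boils down to recomputing contiguous neuron ranges whenever the node count changes, then reshipping the affected weight slices. A hypothetical sketch (the shipped logic is in `core/skycore.py` and may differ):

```python
def rebalance(num_neurons, num_nodes):
    # Hypothetical rebalancing: deal out contiguous, near-equal neuron
    # ranges whenever a node joins or leaves.
    base, extra = divmod(num_neurons, num_nodes)
    ranges, start = [], 0
    for i in range(num_nodes):
        size = base + (1 if i < extra else 0)  # spread the remainder evenly
        ranges.append((start, start + size))
        start += size
    return ranges

# Going from 6 nodes to 7 means recomputing the ranges and sending each
# node only its new slice of the pretrained weights.
six = rebalance(4864, 6)
seven = rebalance(4864, 7)
```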