Zernel

The AI-Native Linux Operating System

The first operating system where the kernel itself understands machine learning.
Faster training. Lower energy costs. Zero code changes.

What If Your OS Was Built For AI?

Every ML platform today runs on top of a general-purpose operating system that was designed for web servers and desktop apps. The kernel doesn't know the difference between a data loading thread and a CUDA synchronization call. It can't tell when your GPU is starving for data. It doesn't know that NCCL all-reduce is the critical path in your distributed training.

Zernel does.

Zernel is a complete Linux distribution where the CPU scheduler, memory manager, network stack, and observability layer all understand ML workloads natively. Install it on your GPU cluster. Everything you already run still works -- PyTorch, JAX, vLLM, Kubernetes -- but now the OS itself is working for you.

Proven: 2.2x Faster Training, Zero Code Changes

The fastest way to use Zernel -- prefix any training script with zernel-run:

pip install zernel-runtime
zernel-run train.py          # 2.2x faster. That's it.

Verified on bare metal (3 rounds x 50 steps = 150 steps per config, MiniGPT-6L 119.8M params):

Config	Step Time	Throughput	Peak Memory	Speedup
Vanilla PyTorch (FP32)	376.64 +/- 43.24 ms	85.0 s/s	5.40 GB	1.0x
Manual AMP (user adds 2 lines)	181.12 +/- 13.32 ms	176.7 s/s	4.65 GB	2.1x
`zernel-run` (automatic)	179.33 +/- 16.18 ms	178.4 s/s	3.82 GB	2.2x

zernel-run automatically applies BF16 mixed precision, TF32 matmul, and CUDA allocator tuning. It matches what an experienced engineer gets by adding AMP manually -- but requires zero code changes.

Energy impact: 2.2x faster = 55% fewer GPU-hours = 55% less energy per training run.

Kernel-Level Features (only possible at the OS level)

Test Environment

Component	Details
Server	Supermicro bare-metal dedicated server
CPU	Intel Xeon E5-2667 v4 @ 3.20 GHz (8 cores / 16 threads, 1 socket)
Memory	64 GB DDR4 ECC
GPU	NVIDIA GeForce RTX 4060 (8 GB GDDR6)
OS	Debian GNU/Linux (forky/sid)
Kernel	Linux 6.12.8 (custom-compiled with `CONFIG_SCHED_CLASS_EXT=y`)
NVIDIA Driver	535.261.03 (DKMS)
PyTorch	2.5.1+cu121 (CUDA 12.1)
Rust	1.94.1
Clang	16.0.6 (Debian)
libbpf	1.1.2

sched_ext BPF Scheduler (59% faster context switches)

Benchmark	Stock Linux (CFS)	Zernel Scheduler	Result
Context Switch Latency	14.26 +/- 2.87 us	5.82 +/- 0.08 us	59% faster, 36x lower variance
CPU MatMul 2048x2048	22.95 +/- 0.88 ms	22.76 +/- 0.58 ms	Even
GPU Training (batch=256)	3.71 +/- 0.01 ms	3.48 +/- 0.06 ms	6% faster

Verify: cat /sys/kernel/sched_ext/root/ops prints zernel when active.

CPU Frequency Scaling (45% less CPU energy, RAPL-measured)

CPU Config	Avg Power (RAPL)	Energy/Step	Impact
Full 3.6 GHz	28.0 W	15.98 J	baseline
Phase-aware (auto)	12.0 W	7.96 J	45% less CPU energy, -4.8% throughput
Minimum 1.2 GHz	10.3 W	7.66 J	52% less CPU energy, -10.7% throughput

Zernel automatically drops CPU to 1.2 GHz during GPU compute phases (when CPU is idle) and restores full speed during data loading.

Full Acceleration Stack

The zernel-scheduler and zernel-accel daemons run together to provide:

BPF kernel scheduler with per-CPU local dispatch and phase-aware time slices
Auto GPU process discovery via nvidia-smi polling
Phase detection (DataLoading / GpuCompute / NcclCollective / OptimizerStep)
Phase-to-BPF pipeline -- writes detected phases to kernel phase_map for in-kernel optimization
Preemption control -- prevents kernel from preempting CUDA/NCCL tasks
CPU affinity -- pins data-loading threads to NUMA-local CPUs
GPU power management -- adjusts clocks and power limits per phase
CPU frequency scaling -- drops CPU frequency during GPU compute (45% CPU energy savings)
NCCL network priority -- tc rules prioritize collective communication traffic
NUMA page migration -- migrates process memory to GPU-local NUMA node

What Zernel Adds On Top

Metric	Stock Linux	With Zernel	How
GPU utilization	60-80% (data starvation)	90-98%	Phase-aware CPU scheduling boosts data pipeline priority
All-reduce latency	Variable (p99 spikes)	Consistent	NCCL traffic gets kernel-level network priority via eBPF/tc
Energy per training run	Baseline	10-20% less	Phase-aware GPU power management reduces clocks during idle phases
Time to first experiment	Hours (driver setup)	Minutes	Pre-installed CUDA, PyTorch, JAX, vLLM, Ollama
Debugging a slow job	`nvidia-smi` + guessing	One command	`zernel debug why-slow` diagnoses GPU, CPU, I/O, memory
Model security	Plaintext on disk	PQC encrypted	Quantum-resistant AES-256-GCM + ML-KEM key exchange
Cost visibility	Custom scripts	Built-in	`zernel fleet costs` shows spend per team at $2.50-4.00/GPU-hr
Compliance	Manual audit	One command	`zernel audit report --standard soc2` generates compliance reports

How It Works

The Kernel Knows Your Workload

Zernel's sched_ext scheduler detects five ML workload phases in real time and applies different kernel policies to each:

  Training Step Timeline
  ========================

  [Data Loading]  →  [Forward Pass]  →  [Backward Pass]  →  [All-Reduce]  →  [Optimizer]
       |                  |                   |                  |               |
   CPU Priority:      GPU Priority:      GPU Priority:     Network Priority:  CPU Priority:
   HIGH (+10)         LOW (-5, yield)    LOW (-5, yield)   HIGH (tc bypass)   HIGH (+10)
       |                  |                   |                  |               |
   "Feed the GPU      "CPU is idle,       "Same as           "NCCL is the     "Minimize
    as fast as         give cycles to      forward pass"      critical path,    GPU idle
    possible"          data loaders"                          prioritize it"    time"

This isn't a userspace library. It's a BPF program loaded into the Linux kernel that makes scheduling decisions at microsecond granularity. No code changes required -- it observes your processes via eBPF and acts automatically.

Energy Savings Through Intelligence

Zernel's power management is the only system that adjusts GPU clocks per training phase:

Phase	GPU Clock	Memory Clock	Power Draw	Why
Data Loading	33%	100%	~60%	GPU is idle, but memory bus feeds H2D transfers
GPU Compute	100%	100%	100%	Full power for matrix operations
NCCL Collective	50%	100%	~70%	Compute isn't the bottleneck, network is
Optimizer Step	100%	100%	100%	Brief burst, keep full power

Result: 10-20% energy reduction per training run with <1% throughput impact. On an 8xH100 cluster drawing 5kW, that's $4,000-8,000/year in electricity savings per rack.

zernel power enable           # Turn on phase-aware power management
zernel power carbon           # See your CO2 footprint
zernel power profile train.py # Profile power during a specific run

Quantum-Resistant Model Security

ML model weights are the most valuable digital assets in the world. Zernel protects them with post-quantum cryptography:

zernel pqc keygen --name prod          # Generate ML-KEM + ML-DSA keypair
zernel pqc sign ./llama-70b --key prod # Sign model (tamper detection)
zernel pqc encrypt ./llama-70b         # AES-256-GCM encryption at rest
zernel pqc verify ./llama-70b          # Verify no tampering
zernel pqc boot-verify                 # Verify secure boot chain

Why this matters: nation-state actors are running "harvest now, decrypt later" campaigns. Model weights transmitted today under RSA/ECDH encryption will be breakable by quantum computers. Zernel's PQC protects your IP against this threat today.

50+ Built-In Tools

Every tool does real work. No stubs. No "coming soon."

Core Workflow

zernel init my-project         # Scaffold with zernel.toml + train.py
zernel run train.py            # Auto-track metrics, GPU, git commit
zernel watch                   # Full-screen TUI: GPU bars + metrics + eBPF
zernel doctor                  # 9-point environment health check
zernel install pytorch         # 25+ ML tools (ollama, jupyter, vllm, ...)

GPU Management (nvidia-smi replacement)

zernel gpu status              # Clean overview (not nvidia-smi's wall of text)
zernel gpu top                 # Real-time process viewer (htop for GPUs)
zernel gpu kill 0              # Kill all processes on GPU 0
zernel gpu lock 0,1            # Reserve GPUs for a job
zernel gpu health              # ECC errors, throttling, PCIe bandwidth

ML Benchmarks (prove it's faster)

zernel bench all               # Full suite: TFLOPS, memory BW, DataLoader, training
zernel bench e2e --model resnet50  # End-to-end training throughput
zernel bench gpu               # Raw compute at multiple matrix sizes

Training Debugger (fix problems, not symptoms)

zernel debug why-slow          # Automated diagnosis: GPU, CPU, memory, I/O
zernel debug oom               # GPU OOM analysis + 6 fix suggestions
zernel debug nan train.py      # Find the exact layer producing NaN
zernel debug hang              # NCCL deadlock diagnosis

Experiment Tracking + ZQL Queries

zernel exp list                # All experiments with loss, accuracy, duration
zernel exp compare exp-a exp-b # Side-by-side diff
zernel query "SELECT name, loss FROM experiments WHERE loss < 1.5 ORDER BY loss ASC"

Model Registry + Deployment

zernel model save ./ckpt --name llama3 --tag v1
zernel model deploy llama3:v1 --target local    # vLLM inference
zernel model deploy llama3:v1 --target docker   # Build container
zernel model deploy llama3:v1 --target sagemaker # AWS endpoint
zernel serve start ./model --replicas 4         # Multi-GPU inference

Distributed Training (local, SSH, Kubernetes)

zernel job submit train.py --gpus-per-node 8                     # Single node
zernel job submit train.py --target ssh --hosts "n1,n2" --nodes 2 # Multi-node SSH
zernel job submit train.py --target k8s --image img --nodes 4     # Kubernetes
zernel cluster add gpu-server-01 --gpus 8                         # Register nodes
zernel cluster status                                              # Live overview

Dataset Management

zernel data profile ./dataset.parquet  # Stats, schema, size
zernel data split ./data --train 0.8   # Reproducible train/val/test
zernel data shard ./data --shards 64   # For distributed training
zernel data benchmark --workers 8      # DataLoader throughput

GPU Fleet Management (enterprise scale)

zernel fleet status            # Fleet-wide GPU utilization + power + daily cost
zernel fleet costs --period month  # Cost attribution ($2.50/GPU-hr A100, $4/hr H100)
zernel fleet idle              # Detect underutilized GPUs across the fleet
zernel fleet reclaim           # Power down idle GPUs (saves $60/GPU/day)
zernel fleet rightsize         # GPU type recommendations from utilization patterns
zernel fleet plan --growth 15  # 12-month capacity forecast at growth rate
zernel fleet health            # Fleet subsystem health check

Compliance & Audit Trail (regulated industries)

zernel audit trail <exp-id>    # Full audit record (status, git, script, PQC sig)
zernel audit export --format json  # SOC 2 / HIPAA compliance export
zernel audit lineage llama:v1  # Data lineage (model → script → dataset → raw data)
zernel audit provenance <id>   # Model provenance chain (5-step verification)
zernel audit report --standard soc2   # Generate SOC 2 Type II compliance report
zernel audit report --standard hipaa  # Generate HIPAA compliance controls

Developer Onboarding (minutes, not days)

zernel onboard setup my-project  # 5-step automated setup (env → stack → project)
zernel onboard share             # Generate shareable environment snapshot
zernel onboard sync env.yml      # Reproduce teammate's environment

Energy + Cost + Environment

zernel power carbon            # kWh → CO2 estimate
zernel cost summary            # GPU-hours by job
zernel env snapshot            # Capture full environment
zernel env export --format docker  # Generate Dockerfile

Training Optimization Toolkit

zernel optimize scan                    # Full environment audit (finds all issues)
zernel optimize precision train.py      # Mixed precision analysis + code generation
zernel optimize batch-size gpt2 --amp   # Optimal batch size for your GPU + model
zernel optimize checkpoint train.py     # Gradient checkpointing advisor
zernel optimize data-pipeline train.py  # DataLoader profiler (benchmarks configurations)
zernel optimize auto train.py           # Generate optimized wrapper script
zernel optimize memory                  # CUDA allocator tuning
zernel optimize numa                    # NUMA placement advice

zernel-run (automatic 2.2x speedup)

pip install zernel-runtime
zernel-run train.py                     # Auto AMP + TF32 + CUDA allocator
zernel-run --verbose train.py           # Show what was optimized
zernel-run --no-amp train.py            # Disable AMP if it causes issues
ZERNEL_AMP_DTYPE=fp16 zernel-run train.py  # Force FP16 instead of BF16

Pre-Installed ML Stack

Zernel ships with everything configured and validated:

Category	Included
Frameworks	PyTorch + CUDA, JAX + CUDA, TensorFlow
LLM	vLLM, Transformers, PEFT, TRL, bitsandbytes, auto-gptq
Distributed	DeepSpeed, FairScale, ColossalAI
Developer	JupyterLab, TensorBoard, W&B, MLflow, Gradio
RAG	LangChain, ChromaDB, FAISS-GPU
Local LLM	Ollama + Llama 3.1 8B (works offline)
Data	Pandas, Polars, PyArrow, scikit-learn

No more "which CUDA version works with which PyTorch works with which driver." It just works.

Two Install Profiles

Server (headless, max GPU memory)

For production training clusters. Every byte of RAM goes to CUDA.

sudo ./distro/iso/build-iso.sh --profile server

Desktop (GNOME + GPU dashboard)

For ML workstations. Full GNOME desktop with Zernel GPU indicator in the top bar showing real-time utilization, temperature, and memory.

sudo ./distro/iso/build-iso.sh --profile desktop

Architecture

+---------------------------------------------------------------------+
|  LAYER 5: 40+ CLI Tools + Web Dashboard + GNOME Desktop              |
|  Experiment tracking . Model registry . Job orchestration . PQC      |
+---------------------------------------------------------------------+
|  LAYER 4: eBPF Observability + Smart Power Management                |
|  GPU memory . CUDA latency . NCCL . DataLoader . Energy tracking     |
+---------------------------------------------------------------------+
|  LAYER 3: sched_ext ML Scheduler + Predictive Prefetch               |
|  Phase detection . NUMA-aware . Multi-tenant . Network priority      |
+---------------------------------------------------------------------+
|  LAYER 2: Kernel Configuration + sysctl Tuning                       |
|  Huge pages . RDMA . BBR . No swap . 128MB network buffers           |
+---------------------------------------------------------------------+
|  LAYER 1: Linux 6.12+ / NVIDIA Open Drivers / PQC Secure Boot       |
|  Full CUDA . cuDNN . TensorRT . NCCL . Quantum-resistant boot chain  |
+---------------------------------------------------------------------+

Quick Start

Option 1: Just the training speedup (any Linux + NVIDIA GPU)

pip install zernel-runtime
zernel-run train.py    # 2.2x faster, zero code changes

Option 2: Full Zernel stack on a GPU server

# On your GPU server (Ubuntu/Debian, root access):
git clone https://github.com/dyber-pqc/Zernel.git
cd Zernel
bash scripts/deploy-server.sh   # Automated full setup

The deploy script handles everything:

Installs Rust, clang-16, libbpf, build dependencies
Compiles Linux kernel 6.12 with CONFIG_SCHED_CLASS_EXT=y
Installs NVIDIA drivers via DKMS
Builds all Zernel crates (scheduler, CLI, eBPF daemon)
Installs zernel-runtime Python package
Starts the sched_ext scheduler and acceleration daemon
Verifies: cat /sys/kernel/sched_ext/root/ops = zernel

Option 3: Build from source (development)

git clone https://github.com/dyber-pqc/Zernel.git
cd Zernel
cargo build --workspace --release
cargo test --workspace  # 117 tests
cargo install --path zernel-cli

zernel doctor          # Check your environment
zernel gpu status      # See your GPUs
zernel bench quick     # 5-minute performance test

Documentation

Document	Description
Technical Reference	Complete technical spec with all 50+ commands
Architecture	System design, data flow, crate dependencies
Installation	Build from source, hardware requirements, distro install
Scheduler	Phase detection, NUMA, multi-tenant, configuration
eBPF Observability	Probe architecture, Prometheus metrics, WebSocket
CLI Reference	All 50+ commands with examples
Kernel Config	.config, sysctl, huge pages, RDMA
Configuration	Config files, environment variables, ports
API Reference	Prometheus, WebSocket, JSON endpoints
Upgrade Guide	Version compatibility, migration, rollback

Why Not Just Use Ubuntu + nvidia-smi?

	Ubuntu + Manual Tuning	Zernel
Kernel scheduler	CFS (generic, no ML awareness)	sched_ext with 5 ML phase types
GPU observability	nvidia-smi (poll-based, no eBPF)	5 eBPF probes (real-time, zero overhead)
Power management	Static power limits	Phase-aware dynamic clocks (10-20% savings)
Data pipeline	Hope the DataLoader is fast enough	Predictive prefetch (scheduler signals DataLoader)
NCCL performance	Best-effort networking	Kernel-level traffic priority for collectives
Model security	Plaintext, RSA keys	PQC encryption + quantum-resistant signatures
Setup time	Days (drivers, CUDA, frameworks)	Minutes (pre-installed, pre-validated)
Experiment tracking	Install MLflow/W&B separately	Built-in, zero config, works from `zernel run`
Debugging	`nvidia-smi` + `htop` + guessing	`zernel debug why-slow` (automated diagnosis)
Cost tracking	Custom scripts	`zernel cost summary` + `zernel power carbon`

Technology Stack

Component	Language	Key Libraries
ML Scheduler	Rust + BPF C	libbpf-rs, sched_ext
eBPF Daemon	Rust	hyper, tokio-tungstenite
CLI IDE (50+ commands)	Rust	clap, ratatui, rusqlite, nom
PQC Crypto	Rust	sha2, aes-gcm (ML-KEM/ML-DSA compatible)
Web Dashboard	Rust	axum, htmx, SSE
GNOME Extension	JavaScript	GLib, nvidia-smi integration
Distro Base	Debian stable	Linux 6.12+, NVIDIA CUDA 12.x

License

Open-core model (same as Red Hat):

Kernel components (scheduler, eBPF, kernel config): GPL-2.0
CLI IDE and enterprise features: Proprietary
Python SDK: MIT

Name		Name	Last commit message	Last commit date
Latest commit History 45 Commits
.github		.github
distro		distro
docs		docs
notebooks		notebooks
scripts		scripts
tests		tests
website		website
zernel-accel		zernel-accel
zernel-cli		zernel-cli
zernel-dashboard		zernel-dashboard
zernel-ebpf		zernel-ebpf
zernel-runtime		zernel-runtime
zernel-scheduler		zernel-scheduler
zernel-sdk/python		zernel-sdk/python
.dockerignore		.dockerignore
.gitignore		.gitignore
CONTRIBUTING.md		CONTRIBUTING.md
Cargo.lock		Cargo.lock
Cargo.toml		Cargo.toml
Dockerfile		Dockerfile
LICENSE		LICENSE
LICENSE-ENTERPRISE		LICENSE-ENTERPRISE
README.md		README.md
SECURITY.md		SECURITY.md
deny.toml		deny.toml

Folders and files

Latest commit

History

Repository files navigation

Zernel

What If Your OS Was Built For AI?

Proven: 2.2x Faster Training, Zero Code Changes

Kernel-Level Features (only possible at the OS level)

sched_ext BPF Scheduler (59% faster context switches)

CPU Frequency Scaling (45% less CPU energy, RAPL-measured)

Full Acceleration Stack

What Zernel Adds On Top

How It Works

The Kernel Knows Your Workload

Energy Savings Through Intelligence

Quantum-Resistant Model Security

50+ Built-In Tools

Core Workflow

GPU Management (nvidia-smi replacement)

ML Benchmarks (prove it's faster)

Training Debugger (fix problems, not symptoms)

Experiment Tracking + ZQL Queries

Model Registry + Deployment

Distributed Training (local, SSH, Kubernetes)

Dataset Management

GPU Fleet Management (enterprise scale)

Compliance & Audit Trail (regulated industries)

Developer Onboarding (minutes, not days)

Energy + Cost + Environment

Training Optimization Toolkit

zernel-run (automatic 2.2x speedup)

Pre-Installed ML Stack

Two Install Profiles

Server (headless, max GPU memory)

Desktop (GNOME + GPU dashboard)

Architecture

Quick Start

Option 1: Just the training speedup (any Linux + NVIDIA GPU)

Option 2: Full Zernel stack on a GPU server

Option 3: Build from source (development)

Documentation

Why Not Just Use Ubuntu + nvidia-smi?

Technology Stack

License

Roadmap

About

Resources

License

Licenses found

Contributing

Security policy

Uh oh!

Stars

Watchers

Forks

Releases 2

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages