```
███████╗ ██████╗  ██████╗ ██████╗  ██████╗ ██╗     ██╗
██╔════╝██╔════╝ ██╔════╝ ██╔══██╗██╔═══██╗██║     ██║
█████╗  ██║  ███╗██║  ███╗██████╔╝██║   ██║██║     ██║
██╔══╝  ██║   ██║██║   ██║██╔══██╗██║   ██║██║     ██║
███████╗╚██████╔╝╚██████╔╝██║  ██║╚██████╔╝███████╗███████╗
╚══════╝ ╚═════╝  ╚═════╝ ╚═╝  ╚═╝ ╚═════╝ ╚══════╝╚══════╝
```
PyTorch implementation of EGGROLL (Evolution Strategies at the Hyperscale, NVIDIA + Oxford).
No backprop. No gradients. Just evolution.
Backpropagation requires differentiable objectives, massive memory for activations, and complex distributed training setups. EGGROLL replaces all of that with evolution — mutate, evaluate, keep what works.
| | Backprop (LoRA/GRPO) | EGGROLL |
|---|---|---|
| Gradients needed | Yes | No |
| Memory (activations) | O(layers) | O(1) |
| Differentiable reward | Required | Any function |
| Works on quantized models | Limited | Native |
| Throughput | Training speed | ~91% of inference speed |
```bash
pip install eggroll-es
```

Or from source:

```bash
git clone https://github.com/ShipItAndPray/eggroll.git
cd eggroll
pip install -e ".[dev]"
```

```bash
# Evolve GPT-2 to minimize perplexity
eggroll tune gpt2 --reward perplexity --generations 50
```
```bash
# Evolve Llama with custom reward, only attention layers
eggroll tune meta-llama/Llama-3.1-8B \
    --reward score.py \
    --target-modules q_proj v_proj \
    --population 128 \
    --rank 8

# Use an inline lambda as reward
eggroll tune gpt2 --reward "lambda text: 1.0 if 'yes' in text else 0.0"
```
```bash
# Show model info + EGGROLL memory estimates
eggroll info meta-llama/Llama-3.1-8B
```

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from eggroll import EggrollTrainer, EggrollConfig, PerplexityReward

model = AutoModelForCausalLM.from_pretrained("gpt2")
tokenizer = AutoTokenizer.from_pretrained("gpt2")

config = EggrollConfig(
    population_size=64,                   # mutations per generation
    rank=8,                               # low-rank perturbation rank
    sigma=0.01,                           # noise magnitude
    lr=0.001,                             # learning rate
    generations=100,                      # evolution steps
    target_modules=["c_attn", "c_proj"],  # only evolve these layers
)

reward = PerplexityReward(tokenizer)
trainer = EggrollTrainer(model, config, reward, tokenizer)
trainer.evolve(dataloader)
trainer.save("./evolved-model")
```

The killer feature — optimize for anything, not just differentiable losses:
```python
from eggroll import (
    EggrollTrainer, EggrollConfig, PerplexityReward,
    TextReward, CustomReward, MultiReward,
)

# Score generated text (non-differentiable!)
def my_scorer(text: str) -> float:
    if "correct answer" in text:
        return 1.0
    return 0.0

reward = TextReward(tokenizer, scorer=my_scorer)

# Or use any function of (model, inputs) -> float
reward = CustomReward(lambda model, inputs: run_tests(model, inputs))

# Combine multiple rewards
reward = MultiReward([
    (PerplexityReward(tokenizer), 0.3),
    (TextReward(tokenizer, code_scorer), 0.7),
])
```

Based on NVIDIA + Oxford's EGGROLL paper:
- Mutate — Generate low-rank perturbations of model weights (A × B.T instead of full-rank noise)
- Evaluate — Run each mutated model on your reward function
- Select — Fitness-weighted combination of best mutations updates the parameters
- Repeat — Each generation gets closer to optimal
```
Generation 0                        Generation N
     θ₀                                  θ*
 ┌──── mutate ────┐
 │ θ + ε₁ → 0.3   │    Rank-r perturbations:
 │ θ + ε₂ → 0.8   │    ε = σ · A · Bᵀ / √r
 │ θ + ε₃ → 0.1   │
 │ θ + ε₄ → 0.9   │    Update:
 └──── select ────┘    θ ← θ + lr · Σ fᵢεᵢ / nσ
          ↓
   θ + weighted avg
```
Why low-rank? Full-rank ES needs a noise matrix as large as each weight matrix: O(mn) memory per population member for an m × n layer. EGGROLL stores only the two factors A and B, O(r(m + n)) with r ≪ min(m, n), enabling up to 100x speedups while the approximation error of the update drops as O(1/r).
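The mutate → evaluate → select loop above can be sketched in plain NumPy. This is an illustrative re-implementation of the update rule for a single m × n weight matrix, not the library's internals; `fitness` stands in for any black-box scoring function, and `eggroll_generation` is a name invented here.

```python
import numpy as np

def eggroll_generation(theta, fitness, pop=64, rank=8, sigma=0.01,
                       lr=0.001, rng=None):
    """One EGGROLL generation for a single m x n weight matrix."""
    rng = rng if rng is not None else np.random.default_rng(0)
    m, n = theta.shape
    eps, scores = [], []
    for _ in range(pop):
        # Mutate: low-rank noise eps = sigma * A @ B.T / sqrt(r)
        A = rng.standard_normal((m, rank))
        B = rng.standard_normal((n, rank))
        e = sigma * (A @ B.T) / np.sqrt(rank)
        eps.append(e)
        # Evaluate: score the perturbed parameters
        scores.append(fitness(theta + e))
    # Select: normalize fitnesses, then fitness-weighted update
    f = np.asarray(scores, dtype=float)
    f = (f - f.mean()) / (f.std() + 1e-8)
    update = sum(fi * ei for fi, ei in zip(f, eps)) / (pop * sigma)
    return theta + lr * update
```

Repeated over generations, the fitness-weighted average behaves like a stochastic gradient ascent step on the (possibly non-differentiable) reward.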
The killer feature. Instead of evaluating mutations one at a time, the vLLM backend converts each EGGROLL perturbation into a LoRA adapter and evaluates the entire population in one batched vLLM call.
```bash
# CLI
eggroll tune meta-llama/Llama-3.1-8B-Instruct \
    --backend vllm \
    --reward score.py \
    --population 128 \
    --target-modules q_proj v_proj
```

```python
# Python
from eggroll.vllm_backend import VllmEggrollTrainer
from eggroll import EggrollConfig

config = EggrollConfig(population_size=128, rank=8, generations=50)

def reward_fn(outputs: list[str], prompts: list[str]) -> list[float]:
    return [1.0 if "correct" in o else 0.0 for o in outputs]

trainer = VllmEggrollTrainer(
    model_id="meta-llama/Llama-3.1-8B-Instruct",
    config=config,
    reward_fn=reward_fn,
)

results = trainer.evolve(prompts=["What is 2+2?", "Explain gravity."])
trainer.save_best_adapter("./evolved-adapter")
```

How it works:
- Each EGGROLL perturbation is factorized via SVD into LoRA A/B matrices
- Adapters are saved as PEFT-compatible directories on disk
- vLLM loads all adapters via `LoRARequest` and evaluates them in parallel
- Fitnesses are collected, parameters are updated, and the cycle repeats
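The first step above can be sketched with a truncated SVD. This is a hedged illustration, not the library's code: `delta_to_lora` is a name invented here, and since EGGROLL's perturbations are already rank-r products the factors could often be reused directly; the SVD route also handles arbitrary weight deltas and matches the (B @ A) composition convention PEFT uses.

```python
import numpy as np

def delta_to_lora(delta_w, rank):
    """Factorize a weight perturbation into LoRA-style factors.

    Returns (lora_A, lora_B) with delta_w ~= lora_B @ lora_A,
    where lora_B is (out, r) and lora_A is (r, in).
    """
    u, s, vt = np.linalg.svd(delta_w, full_matrices=False)
    u, s, vt = u[:, :rank], s[:rank], vt[:rank, :]
    # Split each singular value evenly between the two factors
    lora_B = u * np.sqrt(s)            # (out, r)
    lora_A = np.sqrt(s)[:, None] * vt  # (r, in)
    return lora_A, lora_B
```

If `delta_w` has rank at most r, the factorization is exact; otherwise it is the best rank-r approximation in the Frobenius norm.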
Speed: ~91% of pure inference throughput (from the EGGROLL paper). With vLLM's batching, a population of 128 evaluates nearly as fast as a single inference pass.
Requirements: `pip install "eggroll-es[vllm]"`
| Parameter | Default | Description |
|---|---|---|
| `population_size` | 256 | Mutations per generation (higher = better gradient estimate) |
| `rank` | 8 | Low-rank perturbation rank (higher = more accurate, more memory) |
| `sigma` | 0.01 | Noise magnitude (too high = chaos, too low = no exploration) |
| `lr` | 0.001 | Learning rate for parameter updates |
| `generations` | 100 | Number of evolution steps |
| `antithetic` | True | Mirror perturbations to halve variance |
| `fitness_shaping` | "centered_rank" | `"centered_rank"`, `"normalized"`, or `"raw"` |
| `target_modules` | None | Only evolve layers matching these patterns |
| `elite_k` | 0 | Keep only top-k members (0 = use all) |
| `weight_decay` | 0.0 | L2 regularization |
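Two of the options above are easy to show concretely. The sketch below is our own illustration of what centered-rank fitness shaping and antithetic (mirrored) sampling typically mean in ES implementations; the function names are invented here and the library's exact formulas may differ.

```python
import numpy as np

def centered_rank(fitnesses):
    """Map raw fitnesses to their ranks, scaled into [-0.5, 0.5].

    Only the ordering of population members matters afterwards, which
    makes the update invariant to reward scale and robust to outliers.
    """
    f = np.asarray(fitnesses, dtype=float)
    ranks = np.empty_like(f)
    ranks[np.argsort(f)] = np.arange(len(f))
    return ranks / (len(f) - 1) - 0.5

def antithetic_noise(shape, n_pairs, rng):
    """Mirrored perturbations: each +eps is paired with -eps, so the
    paired fitness differences cancel odd-order error terms and the
    gradient estimator's variance is roughly halved."""
    eps = rng.standard_normal((n_pairs,) + shape)
    return np.concatenate([eps, -eps], axis=0)
```

With `centered_rank`, a population whose rewards are `[10.0, -3.0, 7.0, 0.0]` contributes weights summing to zero, with the best member at +0.5 and the worst at -0.5.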
```
eggroll tune MODEL [OPTIONS]

  MODEL                      HuggingFace model ID or local path
  --reward, -r REWARD        perplexity, path/to/score.py, or lambda
  --generations, -g N        Evolution generations (default: 100)
  --population, -p N         Population size (default: 64)
  --rank N                   Low-rank perturbation rank (default: 8)
  --sigma F                  Noise std dev (default: 0.01)
  --lr F                     Learning rate (default: 0.001)
  --output, -o DIR           Output directory
  --dataset, -d DATASET      HuggingFace dataset (default: wikitext-2)
  --target-modules M [M..]   Only evolve matching layers
  --seed N                   Random seed (default: 42)

eggroll info MODEL           Show model info + memory estimates
```
Create a Python file with either:

```python
# Option 1: Score generated text
def score(text: str) -> float:
    return 1.0 if "correct" in text else 0.0

# Option 2: Full model access
def reward_fn(model, inputs) -> float:
    output = model(**inputs)
    return -output.loss.item()
```

Then: `eggroll tune gpt2 --reward my_reward.py`
| | This library | HyperscaleES (official) | egg.c | eggroll-embedding-trainer |
|---|---|---|---|---|
| Language | PyTorch | JAX | CUDA/C | PyTorch |
| HuggingFace integration | Yes | No | No | No |
| vLLM multi-LoRA backend | Yes | No | No | No |
| CLI | Yes | No | No | No |
| Custom rewards | Any function | Hardcoded | Hardcoded | NDCG only |
| Install | pip install | Manual | Compile | Manual |
| Use case | General fine-tuning | Research | Edge/embedded | Retrieval |
```bash
pip install -e ".[dev]"
pytest tests/ -v
```

Gajane et al., "Evolution Strategies at the Hyperscale" (2025). NVIDIA + University of Oxford + MILA. arxiv.org/abs/2511.16652 | Project Page
MIT