ShipItAndPray/eggroll

 ███████╗ ██████╗  ██████╗ ██████╗  ██████╗ ██╗     ██╗
 ██╔════╝██╔════╝ ██╔════╝ ██╔══██╗██╔═══██╗██║     ██║
 █████╗  ██║  ███╗██║  ███╗██████╔╝██║   ██║██║     ██║
 ██╔══╝  ██║   ██║██║   ██║██╔══██╗██║   ██║██║     ██║
 ███████╗╚██████╔╝╚██████╔╝██║  ██║╚██████╔╝███████╗███████╗
 ╚══════╝ ╚═════╝  ╚═════╝ ╚═╝  ╚═╝ ╚═════╝ ╚══════╝╚══════╝
  

Gradient-Free Fine-Tuning for Any HuggingFace Model


PyTorch implementation of EGGROLL (Evolution Strategies at the Hyperscale, NVIDIA + Oxford).
No backprop. No gradients. Just evolution.


Why EGGROLL?

Backpropagation requires differentiable objectives, massive memory for activations, and complex distributed training setups. EGGROLL replaces all of that with evolution — mutate, evaluate, keep what works.

| | Backprop (LoRA/GRPO) | EGGROLL |
|---|---|---|
| Gradients needed | Yes | No |
| Memory (activations) | O(layers) | O(1) |
| Differentiable reward | Required | Any function |
| Works on quantized models | Limited | Native |
| Throughput | Training speed | ~91% of inference speed |

Install

pip install eggroll-es

Or from source:

git clone https://github.com/ShipItAndPray/eggroll.git
cd eggroll
pip install -e ".[dev]"

Quick Start

CLI — One Command

# Evolve GPT-2 to minimize perplexity
eggroll tune gpt2 --reward perplexity --generations 50

# Evolve Llama with custom reward, only attention layers
eggroll tune meta-llama/Llama-3.1-8B \
  --reward score.py \
  --target-modules q_proj v_proj \
  --population 128 \
  --rank 8

# Use an inline lambda as reward
eggroll tune gpt2 --reward "lambda text: 1.0 if 'yes' in text else 0.0"

# Show model info + EGGROLL memory estimates
eggroll info meta-llama/Llama-3.1-8B

Python API

from transformers import AutoModelForCausalLM, AutoTokenizer
from eggroll import EggrollTrainer, EggrollConfig, PerplexityReward

model = AutoModelForCausalLM.from_pretrained("gpt2")
tokenizer = AutoTokenizer.from_pretrained("gpt2")

config = EggrollConfig(
    population_size=64,   # mutations per generation
    rank=8,               # low-rank perturbation rank
    sigma=0.01,           # noise magnitude
    lr=0.001,             # learning rate
    generations=100,      # evolution steps
    target_modules=["c_attn", "c_proj"],  # only evolve these layers
)

reward = PerplexityReward(tokenizer)
trainer = EggrollTrainer(model, config, reward, tokenizer)
trainer.evolve(dataloader)
trainer.save("./evolved-model")

Custom Reward Functions

The killer feature — optimize for anything, not just differentiable losses:

from eggroll import EggrollTrainer, EggrollConfig, TextReward, CustomReward

# Score generated text (non-differentiable!)
def my_scorer(text: str) -> float:
    if "correct answer" in text:
        return 1.0
    return 0.0

reward = TextReward(tokenizer, scorer=my_scorer)

# Or use any function of (model, inputs) -> float
reward = CustomReward(lambda model, inputs: run_tests(model, inputs))

# Combine multiple rewards
from eggroll import MultiReward
reward = MultiReward([
    (PerplexityReward(tokenizer), 0.3),
    (TextReward(tokenizer, code_scorer), 0.7),
])

How It Works

Based on NVIDIA + Oxford's EGGROLL paper:

  1. Mutate — Generate low-rank perturbations of model weights (A × B.T instead of full-rank noise)
  2. Evaluate — Run each mutated model on your reward function
  3. Select — Fitness-weighted combination of best mutations updates the parameters
  4. Repeat — Each generation gets closer to optimal
Generation 0    →    Generation N
  θ₀                    θ*

  ┌──── mutate ────┐
  │  θ + ε₁  → 0.3 │    Low-rank perturbations:
  │  θ + ε₂  → 0.8 │    ε = σ · A · Bᵀ / √r
  │  θ + ε₃  → 0.1 │
  │  θ + ε₄  → 0.9 │    Update:
  └──── select ────┘    θ ← θ + lr · Σ fᵢεᵢ / nσ
         ↓
    θ + weighted avg

Why low-rank? For an m × n weight matrix, full-rank ES needs O(mn) memory per population member; EGGROLL's factored noise needs only O((m + n)r) with r ≪ min(m, n). The paper reports up to a 100x speedup, with the approximation error dropping as O(1/r).
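The mutate/evaluate/select loop above can be sketched in a few lines of plain NumPy. This is a standalone toy, not the library's internals: the function name, the single-matrix setting, and the simple mean/std fitness shaping are all my own choices.

```python
import numpy as np

def eggroll_step(theta, reward_fn, rng, pop=8, rank=2, sigma=0.1, lr=0.2):
    """One toy EGGROLL generation for a single m x n weight matrix."""
    m, n = theta.shape
    perturbs, fitnesses = [], []
    for _ in range(pop // 2):
        # Step 1 (mutate): low-rank noise eps = sigma * A @ B.T / sqrt(r)
        A = rng.standard_normal((m, rank))
        B = rng.standard_normal((n, rank))
        eps = sigma * (A @ B.T) / np.sqrt(rank)
        for e in (eps, -eps):                       # antithetic (mirrored) pair
            perturbs.append(e)
            fitnesses.append(reward_fn(theta + e))  # Step 2 (evaluate)
    f = np.asarray(fitnesses)
    f = (f - f.mean()) / (f.std() + 1e-8)           # crude fitness shaping
    # Step 3 (select): theta <- theta + lr * sum(f_i * eps_i) / (n * sigma)
    update = sum(fi * e for fi, e in zip(f, perturbs)) / (pop * sigma)
    return theta + lr * update

# Step 4 (repeat): toy objective -- drive theta toward an all-ones matrix.
rng = np.random.default_rng(0)
target = np.ones((4, 4))
theta = np.zeros((4, 4))
for _ in range(300):
    theta = eggroll_step(theta, lambda w: -np.sum((w - target) ** 2), rng)
```

Note that the reward here is just an arbitrary Python function of the perturbed weights; nothing in the loop ever differentiates it.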


vLLM Backend (Massively Parallel)

This is where the big speedup comes from. Instead of evaluating mutations one at a time, the vLLM backend converts each EGGROLL perturbation into a LoRA adapter and evaluates the entire population in one batched vLLM call.

# CLI
eggroll tune meta-llama/Llama-3.1-8B-Instruct \
  --backend vllm \
  --reward score.py \
  --population 128 \
  --target-modules q_proj v_proj

# Python
from eggroll.vllm_backend import VllmEggrollTrainer
from eggroll import EggrollConfig

config = EggrollConfig(population_size=128, rank=8, generations=50)

def reward_fn(outputs: list[str], prompts: list[str]) -> list[float]:
    return [1.0 if "correct" in o else 0.0 for o in outputs]

trainer = VllmEggrollTrainer(
    model_id="meta-llama/Llama-3.1-8B-Instruct",
    config=config,
    reward_fn=reward_fn,
)
results = trainer.evolve(prompts=["What is 2+2?", "Explain gravity."])
trainer.save_best_adapter("./evolved-adapter")

How it works:

  1. Each EGGROLL perturbation is factorized via SVD into LoRA A/B matrices
  2. Saved as PEFT-compatible adapter directories on disk
  3. vLLM loads all adapters via LoRARequest and evaluates them in parallel
  4. Fitnesses collected, parameters updated, repeat
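Step 1 can be sketched with plain NumPy. Because an EGGROLL perturbation σ · A · Bᵀ is rank-r by construction, the rank-r SVD factorization is exact rather than approximate. The helper name here is hypothetical, not eggroll's actual API:

```python
import numpy as np

def perturbation_to_lora(delta, rank):
    """Factor a weight delta into LoRA-style factors so lora_B @ lora_A == delta.

    Hypothetical helper illustrating step 1 above, not the library's function.
    """
    U, S, Vt = np.linalg.svd(delta, full_matrices=False)
    sqrt_s = np.sqrt(S[:rank])
    lora_B = U[:, :rank] * sqrt_s          # (out_features, r)
    lora_A = sqrt_s[:, None] * Vt[:rank]   # (r, in_features)
    return lora_A, lora_B

# A rank-4 perturbation reconstructs exactly (up to float error):
rng = np.random.default_rng(0)
delta = 0.01 * rng.standard_normal((16, 4)) @ rng.standard_normal((4, 16))
lora_A, lora_B = perturbation_to_lora(delta, rank=4)
assert np.allclose(lora_B @ lora_A, delta)
```

The split of the singular values across both factors (√S into each) is one common convention for balancing the magnitudes of the A and B matrices.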

Speed: ~91% of pure inference throughput (from the EGGROLL paper). With vLLM's batching, a population of 128 evaluates nearly as fast as a single inference pass.

Requirements: pip install eggroll-es[vllm]


Configuration

| Parameter | Default | Description |
|---|---|---|
| population_size | 256 | Mutations per generation (higher = better gradient estimate) |
| rank | 8 | Low-rank perturbation rank (higher = more accurate, more memory) |
| sigma | 0.01 | Noise magnitude (too high = chaos, too low = no exploration) |
| lr | 0.001 | Learning rate for parameter updates |
| generations | 100 | Number of evolution steps |
| antithetic | True | Mirror perturbations to halve variance |
| fitness_shaping | "centered_rank" | "centered_rank", "normalized", or "raw" |
| target_modules | None | Only evolve layers matching these patterns |
| elite_k | 0 | Keep only top-k members (0 = use all) |
| weight_decay | 0.0 | L2 regularization |
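The default "centered_rank" shaping makes updates invariant to the reward's scale and robust to outliers: each raw fitness is replaced by its rank, mapped to evenly spaced values in [-0.5, 0.5]. A minimal sketch of that transform (my own implementation, not the library's code):

```python
import numpy as np

def centered_rank(fitnesses):
    """Replace raw fitnesses by their ranks, evenly spaced in [-0.5, 0.5]."""
    f = np.asarray(fitnesses, dtype=float)
    ranks = np.empty(len(f))
    ranks[np.argsort(f)] = np.arange(len(f))  # 0 = worst ... n-1 = best
    return ranks / (len(f) - 1) - 0.5

# An outlier reward (99.0) gets the same weight as any other top rank:
shaped = centered_rank([10.0, -3.0, 0.5, 99.0])
```

Because only the ordering matters, one wildly high or low reward cannot dominate the fitness-weighted update the way it would with "raw" shaping.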

CLI Reference

eggroll tune MODEL [OPTIONS]

  MODEL                      HuggingFace model ID or local path

  --reward, -r REWARD        perplexity, path/to/score.py, or lambda
  --generations, -g N        Evolution generations (default: 100)
  --population, -p N         Population size (default: 64)
  --rank N                   Low-rank perturbation rank (default: 8)
  --sigma F                  Noise std dev (default: 0.01)
  --lr F                     Learning rate (default: 0.001)
  --output, -o DIR           Output directory
  --dataset, -d DATASET      HuggingFace dataset (default: wikitext-2)
  --target-modules M [M..]   Only evolve matching layers
  --seed N                   Random seed (default: 42)

eggroll info MODEL           Show model info + memory estimates

Custom Reward File Format

Create a Python file with either:

# Option 1: Score generated text
def score(text: str) -> float:
    return 1.0 if "correct" in text else 0.0

# Option 2: Full model access
def reward_fn(model, inputs) -> float:
    output = model(**inputs)
    return -output.loss.item()

Then: eggroll tune gpt2 --reward my_reward.py


Comparison to Existing EGGROLL Implementations

| | This library | HyperscaleES (official) | egg.c | eggroll-embedding-trainer |
|---|---|---|---|---|
| Language | PyTorch | JAX | CUDA/C | PyTorch |
| HuggingFace integration | Yes | No | No | No |
| vLLM multi-LoRA backend | Yes | No | No | No |
| CLI | Yes | No | No | No |
| Custom rewards | Any function | Hardcoded | Hardcoded | NDCG only |
| Install | pip install | Manual | Compile | Manual |
| Use case | General fine-tuning | Research | Edge/embedded | Retrieval |

Development

pip install -e ".[dev]"
pytest tests/ -v

Paper

Gajane et al., "Evolution Strategies at the Hyperscale" (2025). NVIDIA, University of Oxford, and MILA. arxiv.org/abs/2511.16652 | Project Page


License

MIT
