```
███████╗ ██████╗  ██████╗ ██████╗  ██████╗ ██╗     ██╗
██╔════╝██╔════╝ ██╔════╝ ██╔══██╗██╔═══██╗██║     ██║
█████╗  ██║  ███╗██║  ███╗██████╔╝██║   ██║██║     ██║
██╔══╝  ██║   ██║██║   ██║██╔══██╗██║   ██║██║     ██║
███████╗╚██████╔╝╚██████╔╝██║  ██║╚██████╔╝███████╗███████╗
╚══════╝ ╚═════╝  ╚═════╝ ╚═╝  ╚═╝ ╚═════╝ ╚══════╝╚══════╝
```
PyTorch implementation of EGGROLL (Evolution Strategies at the Hyperscale, NVIDIA + Oxford).
No backprop. No gradients. Just evolution.
Backpropagation requires differentiable objectives, massive memory for activations, and complex distributed training setups. EGGROLL replaces all of that with evolution — mutate, evaluate, keep what works.
| | Backprop (LoRA/GRPO) | EGGROLL |
|---|---|---|
| Gradients needed | Yes | No |
| Memory (activations) | O(layers) | O(1) |
| Differentiable reward | Required | Any function |
| Works on quantized models | Limited | Native |
| Throughput | Training speed | ~91% of inference speed |
```bash
pip install eggroll-es
```

Or from source:

```bash
git clone https://github.com/ShipItAndPray/eggroll.git
cd eggroll
pip install -e ".[dev]"
```

```bash
# Evolve GPT-2 to minimize perplexity
eggroll tune gpt2 --reward perplexity --generations 50
```
```bash
# Evolve Llama with custom reward, only attention layers
eggroll tune meta-llama/Llama-3.1-8B \
    --reward score.py \
    --target-modules q_proj v_proj \
    --population 128 \
    --rank 8

# Use an inline lambda as reward
eggroll tune gpt2 --reward "lambda text: 1.0 if 'yes' in text else 0.0"
```
```bash
# Show model info + EGGROLL memory estimates
eggroll info meta-llama/Llama-3.1-8B
```

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from eggroll import EggrollTrainer, EggrollConfig, PerplexityReward

model = AutoModelForCausalLM.from_pretrained("gpt2")
tokenizer = AutoTokenizer.from_pretrained("gpt2")

config = EggrollConfig(
    population_size=64,                   # mutations per generation
    rank=8,                               # low-rank perturbation rank
    sigma=0.01,                           # noise magnitude
    lr=0.001,                             # learning rate
    generations=100,                      # evolution steps
    target_modules=["c_attn", "c_proj"],  # only evolve these layers
)

reward = PerplexityReward(tokenizer)
trainer = EggrollTrainer(model, config, reward, tokenizer)
trainer.evolve(dataloader)
trainer.save("./evolved-model")
```

The killer feature — optimize for anything, not just differentiable losses:
```python
from eggroll import (
    EggrollTrainer, EggrollConfig, PerplexityReward,
    TextReward, CustomReward, MultiReward,
)

# Score generated text (non-differentiable!)
def my_scorer(text: str) -> float:
    if "correct answer" in text:
        return 1.0
    return 0.0

reward = TextReward(tokenizer, scorer=my_scorer)

# Or use any function of (model, inputs) -> float
reward = CustomReward(lambda model, inputs: run_tests(model, inputs))

# Combine multiple rewards
reward = MultiReward([
    (PerplexityReward(tokenizer), 0.3),
    (TextReward(tokenizer, code_scorer), 0.7),
])
```

Based on NVIDIA + Oxford's EGGROLL paper:
- Mutate — Generate low-rank perturbations of model weights (A × B.T instead of full-rank noise)
- Evaluate — Run each mutated model on your reward function
- Select — Fitness-weighted combination of best mutations updates the parameters
- Repeat — Each generation gets closer to optimal
```
Generation 0                        Generation N
     θ₀                                  θ*
 ┌──── mutate ────┐
 │ θ + ε₁ → 0.3   │    Rank-r perturbations:
 │ θ + ε₂ → 0.8   │    ε = σ · A · Bᵀ / √r
 │ θ + ε₃ → 0.1   │
 │ θ + ε₄ → 0.9   │    Update:
 └──── select ────┘    θ ← θ + lr · Σ fᵢεᵢ / nσ
          ↓
   θ + weighted avg
```
Why low-rank? Full-rank ES needs a noise matrix as large as each weight matrix: O(mn) memory per population member for an m × n layer. EGGROLL stores only the two factors A and B, O(r(m + n)) with r ≪ min(m, n), enabling up to 100x speedups while the approximation error of the update drops as O(1/r).
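The mutate → evaluate → select loop above can be sketched in plain NumPy. This is an illustrative re-implementation of the update rule for a single m × n weight matrix, not the library's internals; `fitness` stands in for any black-box scoring function, and `eggroll_generation` is a name invented here.

```python
import numpy as np

def eggroll_generation(theta, fitness, pop=64, rank=8, sigma=0.01,
                       lr=0.001, rng=None):
    """One EGGROLL generation for a single m x n weight matrix."""
    rng = rng if rng is not None else np.random.default_rng(0)
    m, n = theta.shape
    eps, scores = [], []
    for _ in range(pop):
        # Mutate: low-rank noise eps = sigma * A @ B.T / sqrt(r)
        A = rng.standard_normal((m, rank))
        B = rng.standard_normal((n, rank))
        e = sigma * (A @ B.T) / np.sqrt(rank)
        eps.append(e)
        # Evaluate: score the perturbed parameters
        scores.append(fitness(theta + e))
    # Select: normalize fitnesses, then fitness-weighted update
    f = np.asarray(scores, dtype=float)
    f = (f - f.mean()) / (f.std() + 1e-8)
    update = sum(fi * ei for fi, ei in zip(f, eps)) / (pop * sigma)
    return theta + lr * update
```

Repeated over generations, the fitness-weighted average behaves like a stochastic gradient ascent step on the (possibly non-differentiable) reward.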
The killer feature. Instead of evaluating mutations one at a time, the vLLM backend converts each EGGROLL perturbation into a LoRA adapter and evaluates the entire population in one batched vLLM call.
```bash
# CLI
eggroll tune meta-llama/Llama-3.1-8B-Instruct \
    --backend vllm \
    --reward score.py \
    --population 128 \
    --target-modules q_proj v_proj
```

```python
# Python
from eggroll.vllm_backend import VllmEggrollTrainer
from eggroll import EggrollConfig

config = EggrollConfig(population_size=128, rank=8, generations=50)

def reward_fn(outputs: list[str], prompts: list[str]) -> list[float]:
    return [1.0 if "correct" in o else 0.0 for o in outputs]

trainer = VllmEggrollTrainer(
    model_id="meta-llama/Llama-3.1-8B-Instruct",
    config=config,
    reward_fn=reward_fn,
)

results = trainer.evolve(prompts=["What is 2+2?", "Explain gravity."])
trainer.save_best_adapter("./evolved-adapter")
```

How it works:
- Each EGGROLL perturbation is factorized via SVD into LoRA A/B matrices
- Adapters are saved as PEFT-compatible directories on disk
- vLLM loads all adapters via `LoRARequest` and evaluates them in parallel
- Fitnesses are collected, parameters are updated, and the cycle repeats
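The first step above can be sketched with a truncated SVD. This is a hedged illustration, not the library's code: `delta_to_lora` is a name invented here, and since EGGROLL's perturbations are already rank-r products the factors could often be reused directly; the SVD route also handles arbitrary weight deltas and matches the (B @ A) composition convention PEFT uses.

```python
import numpy as np

def delta_to_lora(delta_w, rank):
    """Factorize a weight perturbation into LoRA-style factors.

    Returns (lora_A, lora_B) with delta_w ~= lora_B @ lora_A,
    where lora_B is (out, r) and lora_A is (r, in).
    """
    u, s, vt = np.linalg.svd(delta_w, full_matrices=False)
    u, s, vt = u[:, :rank], s[:rank], vt[:rank, :]
    # Split each singular value evenly between the two factors
    lora_B = u * np.sqrt(s)            # (out, r)
    lora_A = np.sqrt(s)[:, None] * vt  # (r, in)
    return lora_A, lora_B
```

If `delta_w` has rank at most r, the factorization is exact; otherwise it is the best rank-r approximation in the Frobenius norm.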
Speed: ~91% of pure inference throughput (from the EGGROLL paper). With vLLM's batching, a population of 128 evaluates nearly as fast as a single inference pass.
Requirements: `pip install "eggroll-es[vllm]"`
| Parameter | Default | Description |
|---|---|---|
| `population_size` | 256 | Mutations per generation (higher = better gradient estimate) |
| `rank` | 8 | Low-rank perturbation rank (higher = more accurate, more memory) |
| `sigma` | 0.01 | Noise magnitude (too high = chaos, too low = no exploration) |
| `lr` | 0.001 | Learning rate for parameter updates |
| `generations` | 100 | Number of evolution steps |
| `antithetic` | True | Mirror perturbations to halve variance |
| `fitness_shaping` | "centered_rank" | `"centered_rank"`, `"normalized"`, or `"raw"` |
| `target_modules` | None | Only evolve layers matching these patterns |
| `elite_k` | 0 | Keep only top-k members (0 = use all) |
| `weight_decay` | 0.0 | L2 regularization |
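Two of the options above are easy to show concretely. The sketch below is our own illustration of what centered-rank fitness shaping and antithetic (mirrored) sampling typically mean in ES implementations; the function names are invented here and the library's exact formulas may differ.

```python
import numpy as np

def centered_rank(fitnesses):
    """Map raw fitnesses to their ranks, scaled into [-0.5, 0.5].

    Only the ordering of population members matters afterwards, which
    makes the update invariant to reward scale and robust to outliers.
    """
    f = np.asarray(fitnesses, dtype=float)
    ranks = np.empty_like(f)
    ranks[np.argsort(f)] = np.arange(len(f))
    return ranks / (len(f) - 1) - 0.5

def antithetic_noise(shape, n_pairs, rng):
    """Mirrored perturbations: each +eps is paired with -eps, so the
    paired fitness differences cancel odd-order error terms and the
    gradient estimator's variance is roughly halved."""
    eps = rng.standard_normal((n_pairs,) + shape)
    return np.concatenate([eps, -eps], axis=0)
```

With `centered_rank`, a population whose rewards are `[10.0, -3.0, 7.0, 0.0]` contributes weights summing to zero, with the best member at +0.5 and the worst at -0.5.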
```
eggroll tune MODEL [OPTIONS]

  MODEL                      HuggingFace model ID or local path
  --reward, -r REWARD        perplexity, path/to/score.py, or lambda
  --generations, -g N        Evolution generations (default: 100)
  --population, -p N         Population size (default: 64)
  --rank N                   Low-rank perturbation rank (default: 8)
  --sigma F                  Noise std dev (default: 0.01)
  --lr F                     Learning rate (default: 0.001)
  --output, -o DIR           Output directory
  --dataset, -d DATASET      HuggingFace dataset (default: wikitext-2)
  --target-modules M [M..]   Only evolve matching layers
  --seed N                   Random seed (default: 42)

eggroll info MODEL           Show model info + memory estimates
```
Create a Python file with either:

```python
# Option 1: Score generated text
def score(text: str) -> float:
    return 1.0 if "correct" in text else 0.0

# Option 2: Full model access
def reward_fn(model, inputs) -> float:
    output = model(**inputs)
    return -output.loss.item()
```

Then: `eggroll tune gpt2 --reward my_reward.py`
| | This library | HyperscaleES (official) | egg.c | eggroll-embedding-trainer |
|---|---|---|---|---|
| Language | PyTorch | JAX | CUDA/C | PyTorch |
| HuggingFace integration | Yes | No | No | No |
| vLLM multi-LoRA backend | Yes | No | No | No |
| CLI | Yes | No | No | No |
| Custom rewards | Any function | Hardcoded | Hardcoded | NDCG only |
| Install | pip install | Manual | Compile | Manual |
| Use case | General fine-tuning | Research | Edge/embedded | Retrieval |
```bash
pip install -e ".[dev]"
pytest tests/ -v
```

Gajane et al., "Evolution Strategies at the Hyperscale" (2025). NVIDIA + University of Oxford + MILA. arxiv.org/abs/2511.16652 | Project Page
MIT