From f8fe184236edd1dd79e3e4062cfa7859a3387bbf Mon Sep 17 00:00:00 2001 From: dariocazzani Date: Wed, 7 Jan 2026 11:25:42 -0500 Subject: [PATCH] =?UTF-8?q?=F0=9F=93=9D=20Add=20comprehensive=20README=20w?= =?UTF-8?q?ith=20algorithm=20explanation?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit - Explain diffusion evolution concept with accessible foggy room analogy - Include quick start guide with code examples and configuration options - Document benchmark suite usage for parameter tuning - Add technical deep dive with step-by-step algorithm breakdown and equations - Reference the source paper (arXiv:2410.02543) --- README.md | 221 ++++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 221 insertions(+) diff --git a/README.md b/README.md index e69de29..ca92227 100644 --- a/README.md +++ b/README.md @@ -0,0 +1,221 @@ +# Devol + +**Diffusion Evolution** - What if evolution worked like image generation? + +## What Is This? + +Traditional evolutionary algorithms create new solutions by *copying and mutating* successful ones. Devol does something different: it starts with pure noise and *denoises* toward good solutions, guided by fitness. + +The core idea: instead of asking "what should the children of good solutions look like?", we ask "given this random noise, what good solution could it have come from?" + +This reframing gives us an algorithm that naturally transitions from broad exploration to precise optimization - without any special tuning. + +**The intuition**: Imagine you're in a foggy room full of people, each standing at a different elevation. You can only see your immediate neighbors through the fog. To find the highest point, you don't just copy the person next to you - you look at everyone nearby, weight them by height, and move toward the weighted average. As the fog clears (denoising), your steps become smaller and more precise. + +## Quick Start + +```python +import numpy as np +from devol import DiffusionEvolution, DiffusionConfig + +# Define what you're optimizing +def sphere(x: np.ndarray) -> float: + """Simple sphere function - maximum at origin.""" + return -np.sum(x ** 2) + +# Configure the algorithm +config = DiffusionConfig( + population_size=128, + num_steps=100, + param_dim=10, + sigma_m=0.5, +) + +# Run evolution +algo = DiffusionEvolution(config, sphere) +algo.run(initial_population=None) + +# Get results +best_solution, best_fitness = algo.get_best_individual() +print(f"Best fitness: {best_fitness:.6f}") +``` + +### Configuration Options + +| Parameter | Description | Default | +|-----------|-------------|---------| +| `population_size` | Number of candidate solutions | 512 | +| `num_steps` | Denoising iterations | 50 | +| `param_dim` | Dimensionality of search space | (required) | +| `sigma_m` | Mutation scale [0, 1] | 1.0 | +| `schedule.type` | `linear`, `cosine`, or `ddpm` | `cosine` | +| `fitness.mapping` | How fitness converts to weights | `direct` | +| `fitness.temperature` | Sharpness of fitness weighting | 1.0 | + +### Using YAML Configuration + +```python +from devol import DiffusionEvolution +from devol.config import DiffusionConfig +from pydantic_yaml import parse_yaml_file_as + +config = parse_yaml_file_as(DiffusionConfig, "config.yaml") +algo = DiffusionEvolution(config, your_fitness_function) +``` + +Example `config.yaml`: +```yaml +population_size: 256 +num_steps: 200 +param_dim: 32 +sigma_m: 0.5 +schedule: + type: cosine + epsilon: 0.0001 +fitness: + mapping: exponential + temperature: 2.0 + normalize: min_max +``` + +## Benchmarks + +The benchmark suite helps you understand how different configurations perform on challenging optimization landscapes. + +### Why Benchmark? + +Different problems favor different settings: +- **Multimodal landscapes** (many local optima): May need higher `sigma_m` and more steps +- **High-dimensional spaces**: May need larger populations +- **Smooth landscapes**: Can often use fewer steps with aggressive schedules + +The benchmarks use the **Rastrigin function** - a notoriously difficult test case with a global optimum surrounded by a regular grid of local optima. If your configuration works on Rastrigin, it has a fighting chance on real problems. + +### Running Benchmarks + +```bash +uv run -m benchmark.main +``` + +This runs a grid search across: +- **Schedule types**: linear, cosine, ddpm +- **Population sizes**: 64, 128, 256 +- **Steps**: 64, 128, 256, 512 +- **Dimensions**: 8, 16, 32, 64 +- **Mutation scales**: 0.2, 0.5, 0.8, 1.0 + +Results are saved to `benchmark_results/` with visualizations showing: +- Best fitness achieved per configuration +- How different schedules compare +- The effect of population size vs. steps tradeoffs + +### Custom Benchmarks + +```python +from benchmark import GridSearchRunner + +def your_objective(x): + # Your fitness function here + return score + +runner = GridSearchRunner( + fitness_fn=your_objective, + schedule_types=["cosine", "ddpm"], + population_sizes=[128, 256], + num_steps_list=[100, 200], + param_dims=[16], + sigma_m_values=[0.5, 0.8], + seeds=[42, 123], +) + +results = runner.run(verbose=True) +``` + +The runner uses multiprocessing to parallelize experiments across CPU cores. + +--- + +## How It Actually Works + +If you're curious about the mechanics, here's the full story. + +### The Diffusion Perspective + +The key insight: if you add enough random noise to any population of solutions, they all become indistinguishable - just random static. The magic is in *reversing* that process. + +- **Forward process**: Take good solutions and gradually add noise until they're unrecognizable +- **Reverse process**: Start with pure noise and gradually remove it, guided by fitness + +The reverse process is where evolution happens. At each step, we ask: "Given this noisy solution, what did the *clean* solution probably look like?" And we answer using two signals: + +1. **Fitness**: Better solutions should be more likely origins +2. **Proximity**: Solutions that are closer in parameter space are more relevant + +This is captured in a beautifully simple equation. To estimate what a noisy point `xₜ` originally was, we compute a weighted average: + +``` +x̂₀ = Σ (fitness_weight × proximity_weight × candidate) / normalization +``` + +Each candidate solution contributes based on both how good it is *and* how close it is to the noisy observation. This creates a kind of "gravitational pull" toward high-fitness regions, but filtered through local structure. + +### Why Proximity Matters + +The proximity weighting is the secret sauce. In traditional evolution, a mutation in New York affects a solution in Tokyo with the same probability. In diffusion evolution, the influence is local - solutions only "see" their neighbors. + +This means: +- **Early iterations** (high noise): Large-scale structure emerges, populations cluster toward promising regions +- **Late iterations** (low noise): Fine-grained optimization, solutions converge precisely to peaks + +The algorithm naturally transitions from exploration to exploitation without any explicit scheduling. + +### The Algorithm Step by Step + +Here's what happens at each denoising step: + +**Step 1: Estimate the clean solution** + +For each noisy solution `xₜ`, we estimate what it was before noise was added: + +``` +x̂₀ = (1/Z) Σ g[f(x)] × N(xₜ; √αₜ·x, 1-αₜ) × x +``` + +where: +- `g[f(x)]` is a fitness-based weight (fitter solutions contribute more) +- `N(...)` is a Gaussian that weights by proximity +- `Z` normalizes everything + +**Step 2: Compute the predicted noise** + +``` +ε̂ = (xₜ - √αₜ · x̂₀) / √(1-αₜ) +``` + +This is the noise we think was added to get from `x̂₀` to `xₜ`. + +**Step 3: Take the evolution step** + +``` +xₜ₋₁ = √αₜ₋₁ · x̂₀ + direction_term · ε̂ + σₜ · noise +``` + +We move toward our estimate of the clean solution, partially preserving the predicted noise direction, and add fresh stochasticity controlled by `σₜ`. + +### The Noise Schedule + +The parameter `αₜ` controls how much "signal" remains at step `t`: +- `αₜ = 1`: Pure signal, no noise +- `αₜ = 0`: Pure noise, no signal + +The schedule (linear, cosine, or DDPM) determines how quickly we transition. Cosine schedules spend more time in the middle range where interesting structure emerges. + +## References + +This implementation is based on: + +> **Diffusion Models are Evolutionary Algorithms** +> arXiv:2410.02543 +> https://arxiv.org/abs/2410.02543 + +The paper establishes the theoretical connection between diffusion models and evolutionary computation, showing that the iterative denoising process can be interpreted as fitness-guided evolution with proximity-aware selection.