# OpenEvolve

An open-source evolutionary coding agent that began as a faithful implementation of AlphaEvolve and has evolved far beyond it, enabling automated scientific and algorithmic discovery.

![OpenEvolve Logo](openevolve-logo.png)

## Overview

OpenEvolve is an evolutionary coding agent that uses Large Language Models to automatically optimize and discover algorithms through iterative improvement. Starting from the AlphaEvolve research, it incorporates advanced features for reproducibility, multi-language support, sophisticated evaluation pipelines, and integration with cutting-edge LLM optimization techniques. It serves as both a research platform for evolutionary AI and a practical tool for automated code optimization.

### Key Features

OpenEvolve implements a comprehensive evolutionary coding system with:

- **Evolutionary Coding Agent**: LLM-guided evolution of entire code files (not just functions)
- **Distributed Controller Loop**: Asynchronous pipeline coordinating LLMs, evaluators, and databases
- **Program Database**: Storage and sampling of evolved programs with evaluation metrics
- **Prompt Sampling**: Context-rich prompts with past programs, scores, and problem descriptions
- **LLM Ensemble**: Multiple language models working together for code generation
- **Multi-objective Optimization**: Simultaneous optimization of multiple evaluation metrics
- **Checkpoint System**: Automatic saving and resuming of evolution state

#### 🔬 **Scientific Reproducibility**
- **Comprehensive Seeding**: Full deterministic reproduction with hash-based component isolation
- **Default Reproducibility**: Seed=42 by default for immediate reproducible results
- **Granular Control**: Per-component seeding for LLMs, database, and evaluation pipeline
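
For illustration, hash-based component isolation boils down to deriving each component's seed from the master seed, as in this minimal Python sketch (the helper is illustrative, not OpenEvolve's internal API):

```python
import hashlib

def component_seed(master_seed: int, component: str) -> int:
    # Hash the master seed together with the component name: each
    # component gets an independent random stream, and adding a new
    # component never perturbs the seeds of existing ones.
    digest = hashlib.sha256(f"{master_seed}:{component}".encode()).digest()
    return int.from_bytes(digest[:8], "big")

llm_seed = component_seed(42, "llm")
database_seed = component_seed(42, "database")
evaluator_seed = component_seed(42, "evaluator")
```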

#### 🤖 **Advanced LLM Integration**
- **Ensemble Sophistication**: Weighted model combinations with intelligent fallback strategies
- **Test-Time Compute**: Integration with [optillm](https://github.com/codelion/optillm) for Mixture of Agents (MoA) and enhanced reasoning
- **Universal API Support**: Works with any OpenAI-compatible endpoint (Anthropic, Google, local models)
- **Plugin Ecosystem**: Support for optillm plugins (readurls, executecode, z3_solver, etc.)
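
Since optillm exposes an OpenAI-compatible proxy, any standard client can use it; in the sketch below the endpoint and API key are deployment-specific placeholders, and the model-name prefix composes the MoA and readurls plugins:

```python
from openai import OpenAI

# Placeholders: point these at your own optillm deployment.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="optillm")

# optillm selects techniques from the model-name prefix:
# "moa&readurls-..." composes Mixture of Agents with URL fetching.
response = client.chat.completions.create(
    model="moa&readurls-gemini-2.0-flash",
    messages=[{"role": "user", "content": "Optimize this function: ..."}],
)
print(response.choices[0].message.content)
```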

#### 🧬 **Evolution Algorithm Innovations**
- **MAP-Elites Implementation**: Quality-diversity algorithm for balanced exploration/exploitation
- **Island-Based Evolution**: Multiple populations with periodic migration for diversity maintenance
- **Inspiration vs Performance**: Sophisticated prompt engineering separating top performers from diverse inspirations
- **Multi-Strategy Selection**: Elite, diverse, and exploratory program sampling strategies
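
The heart of MAP-Elites fits in a few lines; the feature choices below (score and code length) are illustrative stand-ins for OpenEvolve's configurable feature dimensions:

```python
import random

def cell(program: dict) -> tuple:
    # Discretize two feature dimensions onto a 10x10 grid.
    return (min(int(program["score"] * 10), 9),
            min(len(program["code"]) // 100, 9))

def insert(archive: dict, program: dict) -> None:
    # Keep only the best program per cell: quality within each niche,
    # diversity across niches.
    key = cell(program)
    if key not in archive or program["score"] > archive[key]["score"]:
        archive[key] = program

def sample_parent(archive: dict) -> dict:
    # Uniform choice over occupied cells, so novel-but-weak regions are
    # sampled as often as the current global best.
    return archive[random.choice(list(archive))]
```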

#### 📊 **Evaluation & Feedback Systems**
- **Artifacts Side-Channel**: Capture build errors, profiling data, and execution feedback for LLM improvement
- **Cascade Evaluation**: Multi-stage testing with progressive complexity for efficient resource usage
- **LLM-Based Feedback**: Automated code quality assessment and reasoning capture
- **Comprehensive Error Handling**: Graceful recovery from evaluation failures with detailed diagnostics
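
A user-supplied evaluator for such a cascade might look like the sketch below; the return contract shown (a metrics dict with an `artifacts` entry) is an illustrative assumption rather than the exact OpenEvolve API:

```python
import subprocess
import sys

def score_output(stdout: str) -> float:
    # Placeholder benchmark: score by the last number the program prints.
    try:
        return float(stdout.strip().splitlines()[-1])
    except (ValueError, IndexError):
        return 0.0

def evaluate(program_path: str) -> dict:
    # Stage 1 (cheap): does the program run at all? On failure, return
    # stderr as an artifact so the next prompt shows the actual error.
    proc = subprocess.run([sys.executable, program_path],
                          capture_output=True, text=True, timeout=30)
    if proc.returncode != 0:
        return {"score": 0.0, "artifacts": {"stderr": proc.stderr}}
    # Stage 2 (expensive): full scoring, only for survivors of stage 1.
    return {"score": score_output(proc.stdout),
            "artifacts": {"stdout": proc.stdout}}
```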

#### 🌐 **Multi-Language & Platform Support**
- **Language Agnostic**: Python, Rust, R, Metal shaders, and more
- **Platform Optimization**: Apple Silicon GPU kernels, CUDA optimization, CPU-specific tuning
- **Framework Integration**: MLX, PyTorch, scientific computing libraries

#### 🔧 **Developer Experience & Tooling**
- **Real-Time Visualization**: Interactive web-based evolution tree viewer with performance analytics
- **Advanced CLI**: Rich command-line interface with checkpoint management and configuration override
- **Comprehensive Examples**: 12+ diverse examples spanning optimization, ML, systems programming, and scientific computing
- **Error Recovery**: Robust checkpoint loading with automatic fix for common serialization issues

#### 🚀 **Performance & Scalability**
- **Threaded Parallelism**: High-throughput asynchronous evaluation pipeline
- **Resource Management**: Memory limits, timeouts, and resource monitoring
- **Efficient Storage**: Optimized database with artifact management and cleanup policies
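
In spirit, parallel evaluation with per-candidate limits looks like this sketch (worker counts and timeouts are made-up defaults), where a hung candidate costs one worker slot instead of the whole run:

```python
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout

def evaluate_all(programs, evaluate, max_workers=8, timeout_s=60):
    # Evaluate candidates in parallel, timing out stragglers individually.
    results = [None] * len(programs)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(evaluate, p): i for i, p in enumerate(programs)}
        for future, i in futures.items():
            try:
                results[i] = future.result(timeout=timeout_s)
            except FutureTimeout:
                results[i] = {"score": 0.0, "error": "timeout"}  # fail soft
    return results
```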

## How It Works

OpenEvolve orchestrates a sophisticated evolutionary pipeline:

![OpenEvolve Architecture](openevolve-architecture.png)

### Core Evolution Loop

1. **Enhanced Prompt Sampler**: Creates rich prompts containing:
   - Top-performing programs (for optimization guidance)
   - Diverse inspiration programs (for creative exploration)
   - Execution artifacts and error feedback
   - Dynamic documentation fetching (via optillm plugins)

2. **Intelligent LLM Ensemble**:
   - Weighted model combinations for quality/speed tradeoffs
   - Test-time compute techniques (MoA, chain-of-thought, reflection)
   - Deterministic selection with comprehensive seeding

3. **Advanced Evaluator Pool**:
   - Multi-stage cascade evaluation
   - Artifact collection for detailed feedback
   - LLM-based code quality assessment
   - Parallel execution with resource limits

4. **Sophisticated Program Database**:
   - MAP-Elites algorithm for quality-diversity balance
   - Island-based populations with migration
   - Feature map clustering and archive management
   - Comprehensive metadata and lineage tracking
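
Put together, the cycle can be shown as a self-contained toy, with random numeric mutation standing in for the LLM ensemble and a trivial objective standing in for real evaluation; this is a conceptual sketch of sample → generate → evaluate → store, not OpenEvolve's implementation:

```python
import random

random.seed(42)  # reproducible by default, mirroring OpenEvolve's convention

def evaluate(x: float) -> float:
    # Toy objective standing in for a real evaluator: peak at x = 3.
    return -(x - 3.0) ** 2

def mutate(x: float) -> float:
    # Stand-in for the LLM ensemble: propose a modified candidate.
    return x + random.gauss(0.0, 0.5)

# "Program database": a population of (candidate, score) pairs.
population = [(0.0, evaluate(0.0))]

for _ in range(1000):
    sample = random.sample(population, min(3, len(population)))
    parent, _ = max(sample, key=lambda p: p[1])   # tournament sampling
    child = mutate(parent)                        # generation
    population.append((child, evaluate(child)))   # evaluation + storage
    population.sort(key=lambda p: p[1], reverse=True)
    population = population[:50]                  # bounded archive

print("best candidate:", population[0])
```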

## Getting Started


## Configuration

OpenEvolve is highly configurable via a YAML file; the example below showcases the advanced options:

```yaml
# Example configuration showcasing advanced features
max_iterations: 1000
random_seed: 42                # Full reproducibility by default

llm:
  # Advanced ensemble configuration
  models:
    - name: "gemini-2.0-flash-lite"
      weight: 0.7
    - name: "moa&readurls-gemini-2.0-flash"  # optillm test-time compute
      weight: 0.3
  temperature: 0.7

database:
  # MAP-Elites configuration
  population_size: 500
  num_islands: 5               # Island-based evolution
  migration_interval: 20
  feature_dimensions: ["score", "complexity"]  # Quality-diversity features

evaluator:
  # Advanced evaluation features
  enable_artifacts: true       # Capture execution feedback
  cascade_evaluation: true     # Multi-stage testing
  use_llm_feedback: true       # AI-based code quality assessment

prompt:
  # Sophisticated prompt engineering
  num_top_programs: 3          # Performance examples
  num_diverse_programs: 2      # Creative inspiration
  include_artifacts: true      # Execution feedback
```

Sample configuration files are available in the `configs/` directory:
- `default_config.yaml`: Comprehensive configuration with all available options
- `island_config_example.yaml`: Advanced island-based evolution setup

See the [Configuration Guide](configs/default_config.yaml) for a full list of options.
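
A configuration like the one above can then drive a run from Python; the snippet below assumes the package's `OpenEvolve` entry point and uses repository paths as examples, so adjust both to your setup:

```python
import asyncio
from openevolve import OpenEvolve  # assumed top-level export

async def main():
    evolve = OpenEvolve(
        initial_program_path="examples/function_minimization/initial_program.py",
        evaluation_file="examples/function_minimization/evaluator.py",
        config_path="configs/default_config.yaml",
    )
    best = await evolve.run(iterations=1000)
    print(best.metrics)  # assumed result shape

asyncio.run(main())
```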

## Examples

See the `examples/` directory for complete examples of using OpenEvolve on various problems:

### Mathematical Optimization

#### [Function Minimization](examples/function_minimization/)
A comprehensive example demonstrating evolution from random search to sophisticated simulated annealing.

#### [Circle Packing](examples/circle_packing/)
Our implementation of the circle packing problem. For the n=26 case, we achieve state-of-the-art results matching published benchmarks.

Below is the optimal packing found by OpenEvolve after 800 iterations:

![circle-packing-result](https://github.com/user-attachments/assets/00100f9e-2ac3-445b-9266-0398b7174193)

### Advanced AI & LLM Integration

#### [Web Scraper with optillm](examples/web_scraper_optillm/)
Demonstrates integration with [optillm](https://github.com/codelion/optillm) for test-time compute optimization, including:
- **readurls plugin**: Automatic documentation fetching
- **Mixture of Agents (MoA)**: Multi-response synthesis for improved accuracy
- **Local model optimization**: Enhanced reasoning with smaller models

#### [LLM Prompt Optimization](examples/llm_prompt_optimazation/)
Evolving prompts themselves for better LLM performance, demonstrating self-improving AI systems.

### Systems & Performance Optimization

#### [MLX Metal Kernel Optimization](examples/mlx_metal_kernel_opt/)
Automated discovery of custom GPU kernels for Apple Silicon, achieving:
- **2-3x speedup** over baseline attention implementations
- **Hardware-aware optimizations** for unified memory architecture
- **Metal shader evolution** with numerical correctness validation

#### [Rust Adaptive Sort](examples/rust_adaptive_sort/)
Evolution of sorting algorithms that adapt to data patterns, showcasing OpenEvolve's language-agnostic capabilities.

### Scientific Computing & Discovery

#### [Symbolic Regression](examples/symbolic_regression/)
A comprehensive example demonstrating automated discovery of mathematical expressions from scientific datasets using the LLM-SRBench benchmark.

#### [R Robust Regression](examples/r_robust_regression/)
Developing robust regression methods resistant to outliers using R language support.

#### [Signal Processing](examples/signal_processing/)
Automated design of digital filters with superior performance characteristics.

### Web & Integration Examples

#### [Online Judge Programming](examples/online_judge_programming/)
Automated competitive programming solution generation with external evaluation systems.

#### [LM-Eval Integration](examples/lm_eval/)
Working with standard ML evaluation harnesses for automated benchmark improvement.


## Preparing Your Own Problems

## Citation

If you use OpenEvolve in your research, please cite:

```
@software{openevolve,
  title = {OpenEvolve: an open-source evolutionary coding agent},
  author = {Asankhaya Sharma},
  year = {2025},
  publisher = {GitHub},
  url = {https://github.com/codelion/openevolve}
}
```