61 changes: 61 additions & 0 deletions README.md
@@ -91,6 +91,44 @@ OpenEvolve orchestrates a sophisticated evolutionary pipeline:
- Feature map clustering and archive management
- Comprehensive metadata and lineage tracking

### Island-Based Evolution with Worker Pinning

OpenEvolve implements an island-based evolutionary architecture that maintains multiple isolated populations to prevent premature convergence and preserve genetic diversity.

#### How Islands Work

- **Multiple Isolated Populations**: Each island maintains its own population of programs that evolve independently
- **Periodic Migration**: Top-performing programs periodically migrate between adjacent islands (ring topology) to share beneficial mutations
- **True Population Isolation**: Worker processes are deterministically pinned to specific islands to ensure no cross-contamination during parallel evolution

#### Worker-to-Island Pinning

To ensure true island isolation during parallel execution, OpenEvolve implements automatic worker-to-island pinning:

```python
# Workers are distributed across islands using modulo arithmetic
num_islands = 3
for worker_id in range(6):
    island_id = worker_id % num_islands

# Example with 3 islands and 6 workers:
# Worker 0, 3 → Island 0
# Worker 1, 4 → Island 1
# Worker 2, 5 → Island 2
```

**Benefits of Worker Pinning**:
- **Genetic Isolation**: Prevents accidental population mixing between islands during parallel sampling
- **Consistent Evolution**: Each island maintains its distinct evolutionary trajectory
- **Balanced Load**: Workers are evenly distributed across islands automatically
- **Migration Integrity**: Controlled migration happens only at designated intervals, not due to race conditions

**Automatic Distribution**: The system handles all edge cases automatically:
- **More workers than islands**: Multiple workers per island with balanced distribution
- **Fewer workers than islands**: Some islands may not have dedicated workers but still participate in migration
- **Single island**: All workers sample from the same population (degrades to standard evolution)

This architecture ensures that each island develops unique evolutionary pressures and solutions, while periodic migration allows successful innovations to spread across the population without destroying diversity.
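For illustration, here is a minimal sketch of what ring-topology migration could look like; the names and structure (`islands`, `top_k`, score/program pairs) are hypothetical and not OpenEvolve's actual internals:

```python
# Hypothetical sketch of ring-topology migration (not OpenEvolve's actual code).
# `islands` is a list of populations; each program is a (score, program) pair.
def migrate(islands, top_k=2):
    num_islands = len(islands)
    # Select each island's top performers before any copies land
    migrants = [
        sorted(island, key=lambda p: p[0], reverse=True)[:top_k]
        for island in islands
    ]
    # Copy each island's migrants to the neighboring island in the ring
    for i, best in enumerate(migrants):
        islands[(i + 1) % num_islands].extend(best)
```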

## Getting Started

### Installation
@@ -377,6 +415,29 @@ database:
correctness: 15 # 15 bins for correctness (from YOUR evaluator)
```

**CRITICAL: Return Raw Values, Not Bin Indices**: For custom feature dimensions, your evaluator must return **raw continuous values**, not pre-computed bin indices. OpenEvolve handles all scaling and binning internally.

```python
# ✅ CORRECT: Return raw values
return {
"combined_score": 0.85,
"prompt_length": 1247, # Actual character count
"execution_time": 0.234 # Raw time in seconds
}

# ❌ WRONG: Don't return bin indices
return {
"combined_score": 0.85,
"prompt_length": 7, # Pre-computed bin index
"execution_time": 3 # Pre-computed bin index
}
```

OpenEvolve automatically handles:
- Min-max scaling to [0,1] range
- Binning into the specified number of bins
- Adaptive scaling as the value range expands during evolution
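A minimal sketch of that scale-then-bin step (illustrative only; the function and parameter names are hypothetical, not OpenEvolve's actual implementation):

```python
# Illustrative sketch of min-max scaling followed by binning
# (not OpenEvolve's actual implementation).
def to_bin(raw_value: float, seen_min: float, seen_max: float, num_bins: int) -> int:
    if seen_max <= seen_min:
        return 0  # degenerate range: everything falls into bin 0
    scaled = (raw_value - seen_min) / (seen_max - seen_min)  # scale to [0, 1]
    return min(int(scaled * num_bins), num_bins - 1)  # clamp 1.0 into the top bin

# e.g. a raw prompt_length of 1247 with observed range [0, 2000] and 10 bins
print(to_bin(1247, 0, 2000, 10))  # -> 6
```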

**Important**: OpenEvolve will raise an error if a specified feature is not found in the evaluator's metrics. This ensures your configuration is correct. The error message will show available metrics to help you fix the configuration.

See the [Configuration Guide](configs/default_config.yaml) for a full list of options.
50 changes: 50 additions & 0 deletions examples/README.md
@@ -133,6 +133,56 @@ log_level: "INFO"
❌ **Wrong:** Multiple EVOLVE-BLOCK sections
✅ **Correct:** Exactly one EVOLVE-BLOCK section

## MAP-Elites Feature Dimensions Best Practices

When using custom feature dimensions, your evaluator must return **raw continuous values**, not pre-computed bin indices:

### ✅ Correct: Return Raw Values
```python
def evaluate(program_path: str) -> Dict:
# Calculate actual measurements
prompt_length = len(generated_prompt) # Actual character count
execution_time = measure_runtime() # Time in seconds
memory_usage = get_peak_memory() # Bytes used

return {
"combined_score": accuracy_score,
"prompt_length": prompt_length, # Raw count, not bin index
"execution_time": execution_time, # Raw seconds, not bin index
"memory_usage": memory_usage # Raw bytes, not bin index
}
```

### ❌ Wrong: Return Bin Indices
```python
def evaluate(program_path: str) -> Dict:
prompt_length = len(generated_prompt)

# DON'T DO THIS - pre-computing bins
if prompt_length < 100:
length_bin = 0
elif prompt_length < 500:
length_bin = 1
# ... more binning logic

return {
"combined_score": accuracy_score,
"prompt_length": length_bin, # ❌ This is a bin index, not raw value
}
```

### Why This Matters
- OpenEvolve uses min-max scaling internally
- Bin indices get incorrectly scaled as if they were raw values
- Grid positions become unstable as new programs change the min/max range
- This violates MAP-Elites principles and leads to poor evolution
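A small demonstration of the problem (illustrative numbers):

```python
# Illustrative: why pre-computed bin indices break under min-max scaling.
# Suppose evaluators returned bin indices 0, 1, and 7 for "prompt_length".
bin_indices = [0, 1, 7]

def min_max_scale(values):
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

# OpenEvolve rescales whatever it receives as if it were a raw measurement,
# so bin 1 lands at ~0.14 of the grid -- not in grid cell 1 -- and shifts
# again whenever a new program widens the observed index range.
print(min_max_scale(bin_indices))  # [0.0, 0.142..., 1.0]
```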

### Examples of Good Feature Dimensions
- **Counts**: Token count, line count, character count
- **Performance**: Execution time, memory usage, throughput
- **Quality**: Accuracy, precision, recall, F1 score
- **Complexity**: Cyclomatic complexity, nesting depth, function count

## Running Your Example

239 changes: 239 additions & 0 deletions examples/algotune/GEMINI_FLASH_2.5_EXPERIMENT_REPORT.md
@@ -0,0 +1,239 @@
# OpenEvolve AlgoTune Benchmark Report: Gemini Flash 2.5 Experiment

## Executive Summary

This report documents the comprehensive evaluation of Google's Gemini Flash 2.5 model using OpenEvolve to optimize code across 8 AlgoTune benchmark tasks. The experiment ran for 114.6 minutes with a 100% success rate, discovering significant algorithmic improvements in 2 out of 8 tasks, including a remarkable 189.94x speedup for 2D convolution operations.

## Experiment Configuration

### Model Settings
- **Model**: Google Gemini Flash 2.5 (`google/gemini-2.5-flash`)
- **Temperature**: 0.4 (optimal based on prior tuning)
- **Max Tokens**: 16,000
- **Evolution Strategy**: Diff-based evolution
- **API Provider**: OpenRouter

### Evolution Parameters
- **Iterations per task**: 100
- **Checkpoint interval**: Every 10 iterations
- **Population size**: 1,000 programs
- **Number of islands**: 4 (for diversity)
- **Migration interval**: Every 20 generations

### Evaluation Settings
- **Cascade evaluation**: Enabled with 3 stages
- **Stage 2 timeout**: 200 seconds
- **Number of trials**: 5 test cases per evaluation
- **Timing runs**: 3 per trial, plus 1 warmup run
- **Total executions per evaluation**: 16 (5 trials × 3 timing runs + 1 warmup)

## Critical Issue and Resolution

### The Data Size Problem
Initially, all tasks were timing out during Stage 2 evaluation despite individual runs taking only ~60 seconds. Investigation revealed:

- **Root cause**: Each evaluation actually performs 16 executions (5 trials × 3 timing runs + warmup)
- **Original calculation**: 60 seconds × 16 = 960 seconds > 200-second timeout
- **Solution**: Reduced data_size parameters by factor of ~16
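In numbers (using the figures reported above):

```python
# Back-of-the-envelope check using the figures above
trials, timing_runs, warmup = 5, 3, 1
executions = trials * timing_runs + warmup      # = 16 executions per evaluation
per_run_seconds = 60
total_seconds = executions * per_run_seconds    # = 960 s, well over the 200 s timeout
```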

### Adjusted Data Sizes
| Task | Original | Adjusted | Reduction Factor |
|------|----------|----------|-----------------|
| affine_transform_2d | 2000 | 100 | 20x |
| convolve2d_full_fill | 20 | 5 | 4x |
| eigenvectors_complex | 400 | 25 | 16x |
| fft_cmplx_scipy_fftpack | 1500 | 95 | 15.8x |
| fft_convolution | 2000 | 125 | 16x |
| lu_factorization | 400 | 25 | 16x |
| polynomial_real | 8000 | 500 | 16x |
| psd_cone_projection | 600 | 35 | 17.1x |

## Results Overview

### Performance Summary
| Task | Speedup | Combined Score | Runtime (s) | Status |
|------|---------|----------------|-------------|---------|
| convolve2d_full_fill | **189.94x** 🚀 | 0.955 | 643.2 | ✅ |
| psd_cone_projection | **2.37x** 🔥 | 0.975 | 543.5 | ✅ |
| eigenvectors_complex | 1.074x | 0.974 | 1213.2 | ✅ |
| lu_factorization | 1.062x | 0.987 | 727.9 | ✅ |
| affine_transform_2d | 1.053x | 0.939 | 577.5 | ✅ |
| polynomial_real | 1.036x | 0.801 | 2181.3 | ✅ |
| fft_cmplx_scipy_fftpack | 1.017x | 0.984 | 386.5 | ✅ |
| fft_convolution | 1.014x | 0.987 | 605.6 | ✅ |

### Key Metrics
- **Total runtime**: 114.6 minutes
- **Success rate**: 100% (8/8 tasks)
- **Tasks with significant optimization**: 2/8 (25%)
- **Tasks with minor improvements**: 6/8 (75%)
- **Average time per task**: 14.3 minutes

## Detailed Analysis of Optimizations

### 1. convolve2d_full_fill - 189.94x Speedup (Major Success)

**Original Implementation:**
```python
def solve(self, problem):
a, b = problem
result = signal.convolve2d(a, b, mode=self.mode, boundary=self.boundary)
return result
```

**Evolved Implementation:**
```python
def solve(self, problem):
a_in, b_in = problem
# Ensure inputs are float64 and C-contiguous for optimal performance with FFT
a = a_in if a_in.flags['C_CONTIGUOUS'] and a_in.dtype == np.float64 else np.ascontiguousarray(a_in, dtype=np.float64)
b = b_in if b_in.flags['C_CONTIGUOUS'] and b_in.dtype == np.float64 else np.ascontiguousarray(b_in, dtype=np.float64)
result = signal.fftconvolve(a, b, mode=self.mode)
return result
```

**Key Optimizations:**
- **Algorithmic change**: Switched from `convolve2d` (O(n⁴)) to `fftconvolve` (O(n² log n))
- **Memory optimization**: Ensured C-contiguous memory layout for FFT efficiency
- **Type optimization**: Explicit float64 dtype for numerical stability
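A quick way to confirm that the algorithmic change preserves results (an illustrative check, not part of the experiment):

```python
import numpy as np
from scipy import signal

# Illustrative check that fftconvolve matches convolve2d numerically
rng = np.random.default_rng(0)
a = rng.standard_normal((64, 64))
b = rng.standard_normal((8, 8))

direct = signal.convolve2d(a, b, mode="full", boundary="fill")
fft = signal.fftconvolve(a, b, mode="full")
assert np.allclose(direct, fft)
```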

### 2. psd_cone_projection - 2.37x Speedup (Moderate Success)

**Original Implementation:**
```python
def solve(self, problem):
A = problem["matrix"]
# Standard eigendecomposition
eigvals, eigvecs = np.linalg.eig(A)
eigvals = np.maximum(eigvals, 0)
X = eigvecs @ np.diag(eigvals) @ eigvecs.T
return {"projection": X}
```

**Evolved Implementation:**
```python
def solve(self, problem):
A = problem["matrix"]
# Use eigh for symmetric matrices for better performance and numerical stability
eigvals, eigvecs = np.linalg.eigh(A)
# Clip negative eigenvalues to zero
eigvals = np.maximum(eigvals, 0)
# Optimized matrix multiplication: multiply eigvecs with eigvals first
X = (eigvecs * eigvals) @ eigvecs.T
return {"projection": X}
```

**Key Optimizations:**
- **Specialized function**: Used `eigh` instead of `eig` for symmetric matrices
- **Optimized multiplication**: Changed from `eigvecs @ np.diag(eigvals) @ eigvecs.T` to `(eigvecs * eigvals) @ eigvecs.T`
- **Better numerical stability**: `eigh` guarantees real eigenvalues for symmetric matrices
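The multiplication rewrite relies on NumPy broadcasting: multiplying `eigvecs` by the 1-D `eigvals` array scales each column by its eigenvalue, which is exactly the diagonal-matrix product without materializing `np.diag(eigvals)`. An illustrative equivalence check:

```python
import numpy as np

# Quick equivalence check for the multiplication rewrite (illustrative)
rng = np.random.default_rng(0)
A = rng.standard_normal((5, 5))
A = (A + A.T) / 2                      # make A symmetric
eigvals, eigvecs = np.linalg.eigh(A)
eigvals = np.maximum(eigvals, 0)       # clip negative eigenvalues

X_diag = eigvecs @ np.diag(eigvals) @ eigvecs.T   # original form
X_fast = (eigvecs * eigvals) @ eigvecs.T          # broadcasting form
assert np.allclose(X_diag, X_fast)
```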

### 3. Minor Optimizations (1.01x - 1.07x Speedup)

**affine_transform_2d (1.053x):**
```python
# Original
image = problem["image"]
matrix = problem["matrix"]

# Evolved
image = np.asarray(problem["image"], dtype=float)
matrix = np.asarray(problem["matrix"], dtype=float)
```
- Added explicit type conversion to avoid runtime type checking

**Other tasks** showed no visible code changes, suggesting:
- Speedups likely due to measurement variance
- Minor internal optimizations not visible in source
- Statistical noise in timing measurements

## What Worked Well

### 1. Evolution Discovery Capabilities
- Successfully discovered FFT-based convolution optimization (189x speedup)
- Found specialized functions for symmetric matrices (2.37x speedup)
- Identified memory layout optimizations

### 2. Configuration Optimizations
- Diff-based evolution worked better than full rewrites for Gemini
- Temperature 0.4 provided good balance between exploration and exploitation
- Island-based evolution maintained diversity

### 3. System Robustness
- 100% task completion rate after data size adjustment
- No crashes or critical failures
- Checkpoint system allowed progress tracking

## What Didn't Work

### 1. Limited Optimization Discovery
- 6 out of 8 tasks showed minimal improvements (<7%)
- Most baseline implementations were already near-optimal
- Evolution struggled to find improvements for already-optimized code

### 2. Initial Configuration Issues
- Original data_size values caused timeouts
- Required manual intervention to adjust parameters
- Cascade evaluation timing wasn't initially accounted for

### 3. Minor Perturbations vs Real Optimizations
- Many "improvements" were just measurement noise
- Small type conversions counted as optimizations
- Difficult to distinguish real improvements from variance

## Lessons Learned

### 1. Evaluation Complexity
- Must account for total execution count (trials × runs × warmup)
- Cascade evaluation adds significant overhead
- Timeout settings need careful calibration

### 2. Baseline Quality Matters
- Well-optimized baselines leave little room for improvement
- AlgoTune baselines already use efficient libraries (scipy, numpy)
- Major improvements only possible with algorithmic changes

### 3. Evolution Effectiveness
- Works best when alternative algorithms exist (convolve2d → fftconvolve)
- Can find specialized functions (eig → eigh)
- Struggles with micro-optimizations

## Recommendations for Future Experiments

### 1. Task Selection
- Include tasks with known suboptimal baseline implementations
- Add problems where multiple algorithmic approaches exist
- Consider more complex optimization scenarios

### 2. Configuration Tuning
- Pre-calculate total execution time for data sizing
- Consider reducing trials/runs for faster iteration
- Adjust timeout based on actual execution patterns

### 3. Model Comparison Setup
For comparing with other models (e.g., Claude, GPT-4):
- Use identical configuration parameters
- Run on same hardware for fair comparison
- Track both speedup and code quality metrics
- Document any model-specific adjustments needed

## Conclusion

The Gemini Flash 2.5 experiment demonstrated OpenEvolve's capability to discover significant algorithmic improvements when they exist. The system achieved a 189.94x speedup on 2D convolution by automatically discovering FFT-based methods and a 2.37x speedup on PSD projection through specialized matrix operations.

However, the experiment also revealed that for well-optimized baseline implementations, evolution produces minimal improvements. The 25% success rate for finding meaningful optimizations suggests that careful task selection is crucial for demonstrating evolutionary code optimization effectiveness.

### Next Steps
1. Run identical benchmark with alternative LLM models
2. Compare optimization discovery rates across models
3. Analyze code quality and correctness across different models
4. Document model-specific strengths and weaknesses

---

**Experiment Details:**
- Date: August 14, 2025
- Duration: 114.6 minutes
- Hardware: macOS (Darwin 24.5.0)
- OpenEvolve Version: Current main branch
- API Provider: OpenRouter