From 46dfe2d0b12585ede9f625778467e02ba8d16bc8 Mon Sep 17 00:00:00 2001 From: Asankhaya Sharma Date: Tue, 8 Jul 2025 07:34:56 +0800 Subject: [PATCH 1/5] Update contribution and setup docs for LLM configuration Improved instructions in CONTRIBUTING.md and README.md for setting up the development environment, running tests, and configuring LLM providers. Added details on using mock API keys for testing, clarified environment variable requirements, and provided guidance for integrating with alternative LLM providers and optillm. --- CONTRIBUTING.md | 31 ++++++++++++++++++++++++++++--- README.md | 34 +++++++++++++++++++++++++++++++--- 2 files changed, 59 insertions(+), 6 deletions(-) diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md index d2620160b..2492b74a2 100644 --- a/CONTRIBUTING.md +++ b/CONTRIBUTING.md @@ -6,8 +6,15 @@ Thank you for your interest in contributing to OpenEvolve! This document provide 1. Fork the repository 2. Clone your fork: `git clone https://github.com/codelion/openevolve.git` -3. Install the package in development mode: `pip install -e .` -4. Run the tests to ensure everything is working: `python -m unittest discover tests` +3. Install the package in development mode: `pip install -e ".[dev]"` +4. Set up environment for testing: + ```bash + # Unit tests don't require a real API key, but the environment variable must be set + export OPENAI_API_KEY=test-key-for-unit-tests + ``` +5. Run the tests to ensure everything is working: `python -m unittest discover tests` + +**Note**: The unit tests do not make actual API calls to OpenAI or any LLM provider. However, the `OPENAI_API_KEY` environment variable must be set to any non-empty value for the tests to run. You can use a placeholder value like `test-key-for-unit-tests`. ## Development Environment @@ -17,14 +24,32 @@ We recommend using a virtual environment for development: python -m venv env source env/bin/activate # On Windows: env\Scripts\activate pip install -e ".[dev]" + +# For running tests (no actual API calls are made) +export OPENAI_API_KEY=test-key-for-unit-tests + +# For testing with real LLMs during development +# export OPENAI_API_KEY=your-actual-api-key ``` +### LLM Configuration for Development + +When developing features that interact with LLMs: + +1. **Local Development**: Use a mock API key for unit tests +2. **Integration Testing**: Use your actual API key and configure `api_base` if using alternative providers +3. **Cost Management**: Consider using cheaper models or [optillm](https://github.com/codelion/optillm) for rate limiting during development + ## Pull Request Process 1. Create a new branch for your feature or bugfix: `git checkout -b feat-your-feature-name` 2. Make your changes 3. Add tests for your changes -4. Run the tests to make sure everything passes: `python -m unittest discover tests` +4. Run the tests to make sure everything passes: + ```bash + export OPENAI_API_KEY=test-key-for-unit-tests + python -m unittest discover tests + ``` 5. Commit your changes: `git commit -m "Add your descriptive commit message"` 6. Push to your fork: `git push origin feature/your-feature-name` 7. Submit a pull request to the main repository diff --git a/README.md b/README.md index ab7f46ff5..106e54ab1 100644 --- a/README.md +++ b/README.md @@ -42,13 +42,41 @@ pip install -e . ### Quick Start -We use the OpenAI SDK, so you can use any LLM or provider that supports an OpenAI compatible API. Just set the `OPENAI_API_KEY` environment variable -and update the `api_base` in config.yaml if you are using a provider other than OpenAI. For local models, you can use -an inference server like [optillm](https://github.com/codelion/optillm). +#### Setting up LLM Access + +OpenEvolve uses the OpenAI SDK, which means it works with any LLM provider that supports an OpenAI-compatible API: + +1. **Set the API Key**: Export the `OPENAI_API_KEY` environment variable: + ```bash + export OPENAI_API_KEY=your-api-key-here + ``` + +2. **Using Alternative LLM Providers**: + - For providers other than OpenAI (e.g., Anthropic, Cohere, local models), update the `api_base` in your config.yaml: + ```yaml + llm: + api_base: "https://your-provider-endpoint.com/v1" + ``` + +3. **Maximum Flexibility with optillm**: + - For advanced routing, rate limiting, or using multiple providers, we recommend [optillm](https://github.com/codelion/optillm) + - optillm acts as a proxy that can route requests to different LLMs based on your rules + - Simply point `api_base` to your optillm instance: + ```yaml + llm: + api_base: "http://localhost:8000/v1" + ``` + +This setup ensures OpenEvolve can work with any LLM provider - OpenAI, Anthropic, Google, Cohere, local models via Ollama/vLLM, or any OpenAI-compatible endpoint. ```python +import os from openevolve import OpenEvolve +# Ensure API key is set +if not os.environ.get("OPENAI_API_KEY"): + raise ValueError("Please set OPENAI_API_KEY environment variable") + # Initialize the system evolve = OpenEvolve( initial_program_path="path/to/initial_program.py", From 42acdc394b12f5173ef5d47db79a56401385f395 Mon Sep 17 00:00:00 2001 From: Asankhaya Sharma Date: Tue, 8 Jul 2025 14:02:49 +0800 Subject: [PATCH 2/5] Improve logging and shutdown handling in core modules Added logic to prevent duplicate log messages for LLM ensemble, OpenAI LLM, and prompt sampler initializations. Enhanced signal handling in the controller to allow graceful shutdown on first Ctrl+C and immediate exit on second. Evolution now checks for shutdown requests and exits cleanly if detected. Added a comprehensive examples/README.md to guide users in creating and configuring OpenEvolve examples. --- examples/README.md | 330 +++++++++++++++++++++++++++++++++++ openevolve/controller.py | 13 ++ openevolve/llm/ensemble.py | 15 +- openevolve/llm/openai.py | 8 +- openevolve/prompt/sampler.py | 5 +- 5 files changed, 363 insertions(+), 8 deletions(-) create mode 100644 examples/README.md diff --git a/examples/README.md b/examples/README.md new file mode 100644 index 000000000..058dc3064 --- /dev/null +++ b/examples/README.md @@ -0,0 +1,330 @@ +# OpenEvolve Examples + +This directory contains a collection of examples demonstrating how to use OpenEvolve for various tasks including optimization, algorithm discovery, and code evolution. Each example showcases different aspects of OpenEvolve's capabilities and provides templates for creating your own evolutionary coding projects. + +## Quick Start Template + +To create your own OpenEvolve example, you need three essential components: + +### 1. Initial Program (`initial_program.py`) + +Your initial program must contain exactly **one** `EVOLVE-BLOCK`: + +```python +# EVOLVE-BLOCK-START +def your_function(): + # Your initial implementation here + # This is the only section OpenEvolve will modify + pass +# EVOLVE-BLOCK-END + +# Helper functions and other code outside the evolve block +def helper_function(): + # This code won't be modified by OpenEvolve + pass +``` + +**Critical Requirements:** +- ✅ **Exactly one EVOLVE-BLOCK** (not multiple blocks) +- ✅ Use `# EVOLVE-BLOCK-START` and `# EVOLVE-BLOCK-END` markers +- ✅ Put only the code you want evolved inside the block +- ✅ Helper functions and imports go outside the block + +### 2. Evaluator (`evaluator.py`) + +Your evaluator must return a **dictionary** with specific metric names: + +```python +def evaluate(program_path: str) -> Dict: + """ + Evaluate the program and return metrics as a dictionary. + + CRITICAL: Must return a dictionary, not an EvaluationResult object. + """ + try: + # Import and run your program + # Calculate metrics + + return { + 'combined_score': 0.8, # PRIMARY METRIC for evolution (required) + 'accuracy': 0.9, # Your custom metrics + 'speed': 0.7, + 'robustness': 0.6, + # Add any other metrics you want to track + } + except Exception as e: + return { + 'combined_score': 0.0, # Always return combined_score, even on error + 'error': str(e) + } +``` + +**Critical Requirements:** +- ✅ **Return a dictionary**, not `EvaluationResult` object +- ✅ **Must include `'combined_score'`** - this is the primary metric OpenEvolve uses +- ✅ Higher `combined_score` values should indicate better programs +- ✅ Handle exceptions and return `combined_score: 0.0` on failure + +### 3. Configuration (`config.yaml`) + +Essential configuration structure: + +```yaml +# Evolution settings +max_iterations: 100 +checkpoint_interval: 10 +parallel_evaluations: 1 + +# LLM configuration +llm: + api_base: "https://api.openai.com/v1" # Or your LLM provider + models: + - name: "gpt-4" + weight: 1.0 + temperature: 0.7 + max_tokens: 4000 + timeout: 120 + +# Database configuration (MAP-Elites algorithm) +database: + population_size: 50 + num_islands: 3 + migration_interval: 10 + feature_dimensions: # MUST be a list, not an integer + - "score" + - "complexity" + +# Evaluation settings +evaluator: + timeout: 60 + max_retries: 3 + +# Prompt configuration +prompt: + system_message: | + You are an expert programmer. Your goal is to improve the code + in the EVOLVE-BLOCK to achieve better performance on the task. + + Focus on algorithmic improvements and code optimization. + num_top_programs: 3 + num_diverse_programs: 2 + +# Logging +log_level: "INFO" +``` + +**Critical Requirements:** +- ✅ **`feature_dimensions` must be a list** (e.g., `["score", "complexity"]`), not an integer +- ✅ Set appropriate timeouts for your use case +- ✅ Configure LLM settings for your provider +- ✅ Use meaningful `system_message` to guide evolution + +## Common Configuration Mistakes + +❌ **Wrong:** `feature_dimensions: 2` +✅ **Correct:** `feature_dimensions: ["score", "complexity"]` + +❌ **Wrong:** Returning `EvaluationResult` object +✅ **Correct:** Returning `{'combined_score': 0.8, ...}` dictionary + +❌ **Wrong:** Using `'total_score'` metric name +✅ **Correct:** Using `'combined_score'` metric name + +❌ **Wrong:** Multiple EVOLVE-BLOCK sections +✅ **Correct:** Exactly one EVOLVE-BLOCK section + +## Running Your Example + +```bash +# Basic run +python openevolve-run.py path/to/initial_program.py path/to/evaluator.py --config path/to/config.yaml --iterations 100 + +# Resume from checkpoint +python openevolve-run.py path/to/initial_program.py path/to/evaluator.py \ + --config path/to/config.yaml \ + --checkpoint path/to/checkpoint_directory \ + --iterations 50 + +# View results +python scripts/visualizer.py --path path/to/openevolve_output/checkpoints/checkpoint_100/ +``` + +## Advanced Configuration Options + +### LLM Ensemble (Multiple Models) +```yaml +llm: + models: + - name: "gpt-4" + weight: 0.7 + - name: "claude-3-sonnet" + weight: 0.3 +``` + +### Island Evolution (Population Diversity) +```yaml +database: + num_islands: 5 # More islands = more diversity + migration_interval: 15 # How often islands exchange programs + population_size: 100 # Larger population = more exploration +``` + +### Cascade Evaluation (Multi-Stage Testing) +```yaml +evaluator: + cascade_stages: + - stage1_timeout: 30 # Quick validation + - stage2_timeout: 120 # Full evaluation +``` + +## Example Directory + +### 🧮 Mathematical Optimization + +#### [Function Minimization](function_minimization/) +**Task:** Find global minimum of complex non-convex function +**Achievement:** Evolved from random search to sophisticated simulated annealing +**Key Lesson:** Shows automatic discovery of optimization algorithms +```bash +cd examples/function_minimization +python ../../openevolve-run.py initial_program.py evaluator.py --config config.yaml +``` + +#### [Circle Packing](circle_packing/) +**Task:** Pack 26 circles in unit square to maximize sum of radii +**Achievement:** Matched AlphaEvolve paper results (2.634/2.635) +**Key Lesson:** Demonstrates evolution from geometric heuristics to mathematical optimization +```bash +cd examples/circle_packing +python ../../openevolve-run.py initial_program.py evaluator.py --config config_phase_1.yaml +``` + +### 🔧 Algorithm Discovery + +#### [Signal Processing](signal_processing/) +**Task:** Design digital filters for audio processing +**Achievement:** Discovered novel filter designs with superior characteristics +**Key Lesson:** Shows evolution of domain-specific algorithms +```bash +cd examples/signal_processing +python ../../openevolve-run.py initial_program.py evaluator.py --config config.yaml +``` + +#### [Rust Adaptive Sort](rust_adaptive_sort/) +**Task:** Create sorting algorithm that adapts to data patterns +**Achievement:** Evolved sorting strategies beyond traditional algorithms +**Key Lesson:** Multi-language support (Rust) and algorithm adaptation +```bash +cd examples/rust_adaptive_sort +python ../../openevolve-run.py initial_program.rs evaluator.py --config config.yaml +``` + +### 🚀 Performance Optimization + +#### [MLX Metal Kernel Optimization](mlx_metal_kernel_opt/) +**Task:** Optimize attention mechanisms for Apple Silicon +**Achievement:** 2-3x speedup over baseline implementation +**Key Lesson:** Hardware-specific optimization and performance tuning +```bash +cd examples/mlx_metal_kernel_opt +python ../../openevolve-run.py initial_program.py evaluator.py --config config.yaml +``` + +### 🌐 Web and Data Processing + +#### [Web Scraper with optillm](web_scraper_optillm/) +**Task:** Extract API documentation from HTML pages +**Achievement:** Demonstrates optillm integration with readurls and MoA +**Key Lesson:** Shows integration with LLM proxy systems and test-time compute +```bash +cd examples/web_scraper_optillm +python ../../openevolve-run.py initial_program.py evaluator.py --config config.yaml +``` + +### 💻 Programming Challenges + +#### [Online Judge Programming](online_judge_programming/) +**Task:** Solve competitive programming problems +**Achievement:** Automated solution generation and submission +**Key Lesson:** Integration with external evaluation systems +```bash +cd examples/online_judge_programming +python ../../openevolve-run.py initial_program.py evaluator.py --config config.yaml +``` + +### 📊 Machine Learning and AI + +#### [LLM Prompt Optimization](llm_prompt_optimazation/) +**Task:** Evolve prompts for better LLM performance +**Achievement:** Discovered effective prompt engineering techniques +**Key Lesson:** Self-improving AI systems and prompt evolution +```bash +cd examples/llm_prompt_optimazation +python ../../openevolve-run.py initial_prompt.txt evaluator.py --config config.yaml +``` + +#### [LM-Eval Integration](lm_eval/) +**Task:** Integrate with language model evaluation harness +**Achievement:** Automated benchmark improvement +**Key Lesson:** Integration with standard ML evaluation frameworks + +#### [Symbolic Regression](symbolic_regression/) +**Task:** Discover mathematical expressions from data +**Achievement:** Automated discovery of scientific equations +**Key Lesson:** Scientific discovery and mathematical modeling + +### 🔬 Scientific Computing + +#### [R Robust Regression](r_robust_regression/) +**Task:** Develop robust statistical regression methods +**Achievement:** Novel statistical algorithms resistant to outliers +**Key Lesson:** Multi-language support (R) and statistical algorithm evolution +```bash +cd examples/r_robust_regression +python ../../openevolve-run.py initial_program.r evaluator.py --config config.yaml +``` + +### 🎯 Advanced Features + +#### [Circle Packing with Artifacts](circle_packing_with_artifacts/) +**Task:** Circle packing with detailed execution feedback +**Achievement:** Advanced debugging and artifact collection +**Key Lesson:** Using OpenEvolve's artifact system for detailed analysis +```bash +cd examples/circle_packing_with_artifacts +python ../../openevolve-run.py initial_program.py evaluator.py --config config_phase_1.yaml +``` + +## Best Practices + +### 🎯 Design Effective Evaluators +- Use meaningful metrics that reflect your goals +- Include both quality and efficiency measures +- Handle edge cases and errors gracefully +- Provide informative feedback for debugging + +### 🔧 Configuration Tuning +- Start with smaller populations and fewer iterations for testing +- Increase `num_islands` for more diverse exploration +- Adjust `temperature` based on how creative you want the LLM to be +- Set appropriate timeouts for your compute environment + +### 📈 Evolution Strategy +- Use multiple phases with different configurations +- Begin with exploration, then focus on exploitation +- Consider cascade evaluation for expensive tests +- Monitor progress and adjust configuration as needed + +### 🐛 Debugging +- Check logs in `openevolve_output/logs/` +- Examine failed programs in checkpoint directories +- Use artifacts to understand program behavior +- Test your evaluator independently before evolution + +## Getting Help + +- 📖 See individual example READMEs for detailed walkthroughs +- 🔍 Check the main [OpenEvolve documentation](../README.md) +- 💬 Open issues on the [GitHub repository](https://github.com/codelion/openevolve) + +Each example is self-contained and includes all necessary files to get started. Pick an example similar to your use case and adapt it to your specific problem! \ No newline at end of file diff --git a/openevolve/controller.py b/openevolve/controller.py index bf4ea683d..f9cefe956 100644 --- a/openevolve/controller.py +++ b/openevolve/controller.py @@ -265,6 +265,14 @@ async def run( def signal_handler(signum, frame): logger.info(f"Received signal {signum}, initiating graceful shutdown...") self.parallel_controller.request_shutdown() + + # Set up a secondary handler for immediate exit if user presses Ctrl+C again + def force_exit_handler(signum, frame): + logger.info("Force exit requested - terminating immediately") + import sys + sys.exit(0) + + signal.signal(signal.SIGINT, force_exit_handler) signal.signal(signal.SIGINT, signal_handler) signal.signal(signal.SIGTERM, signal_handler) @@ -432,6 +440,11 @@ async def _run_evolution_with_checkpoints( checkpoint_callback=self._save_checkpoint ) + # Check if shutdown was requested + if self.parallel_controller.shutdown_flag.is_set(): + logger.info("Evolution stopped due to shutdown request") + return + # Save final checkpoint if needed final_iteration = start_iteration + max_iterations - 1 if final_iteration > 0 and final_iteration % self.config.checkpoint_interval == 0: diff --git a/openevolve/llm/ensemble.py b/openevolve/llm/ensemble.py index b036138eb..fa11e28f8 100644 --- a/openevolve/llm/ensemble.py +++ b/openevolve/llm/ensemble.py @@ -35,13 +35,16 @@ def __init__(self, models_cfg: List[LLMModelConfig]): self.random_state.seed(models_cfg[0].random_seed) logger.debug(f"LLMEnsemble: Set random seed to {models_cfg[0].random_seed} for deterministic model selection") - logger.info( - f"Initialized LLM ensemble with models: " - + ", ".join( - f"{model.name} (weight: {weight:.2f})" - for model, weight in zip(models_cfg, self.weights) + # Only log if we have multiple models or this is the first ensemble + if len(models_cfg) > 1 or not hasattr(logger, '_ensemble_logged'): + logger.info( + f"Initialized LLM ensemble with models: " + + ", ".join( + f"{model.name} (weight: {weight:.2f})" + for model, weight in zip(models_cfg, self.weights) + ) ) - ) + logger._ensemble_logged = True async def generate(self, prompt: str, **kwargs) -> str: """Generate text using a randomly selected model based on weights""" diff --git a/openevolve/llm/openai.py b/openevolve/llm/openai.py index 463cb0eaa..705d0d4ac 100644 --- a/openevolve/llm/openai.py +++ b/openevolve/llm/openai.py @@ -40,7 +40,13 @@ def __init__( base_url=self.api_base, ) - logger.info(f"Initialized OpenAI LLM with model: {self.model}") + # Only log unique models to reduce duplication + if not hasattr(logger, '_initialized_models'): + logger._initialized_models = set() + + if self.model not in logger._initialized_models: + logger.info(f"Initialized OpenAI LLM with model: {self.model}") + logger._initialized_models.add(self.model) async def generate(self, prompt: str, **kwargs) -> str: """Generate text from a prompt""" diff --git a/openevolve/prompt/sampler.py b/openevolve/prompt/sampler.py index 186c9fe13..44b8acfb3 100644 --- a/openevolve/prompt/sampler.py +++ b/openevolve/prompt/sampler.py @@ -28,7 +28,10 @@ def __init__(self, config: PromptConfig): self.system_template_override = None self.user_template_override = None - logger.info("Initialized prompt sampler") + # Only log once to reduce duplication + if not hasattr(logger, '_prompt_sampler_logged'): + logger.info("Initialized prompt sampler") + logger._prompt_sampler_logged = True def set_templates( self, system_template: Optional[str] = None, user_template: Optional[str] = None From 1ed886796d527dabf3ca3835390091d792c554d1 Mon Sep 17 00:00:00 2001 From: Asankhaya Sharma Date: Tue, 8 Jul 2025 14:03:00 +0800 Subject: [PATCH 3/5] Add web scraper evolution example using optillm Introduces a new example in examples/web_scraper_optillm demonstrating web scraper evolution with optillm and OpenEvolve. Includes a detailed README, configuration for optillm with readurls and Mixture of Agents, an evaluator for robust function extraction, an initial BeautifulSoup-based scraper, and required dependencies. --- examples/web_scraper_optillm/README.md | 247 ++++++++++++ examples/web_scraper_optillm/config.yaml | 77 ++++ examples/web_scraper_optillm/evaluator.py | 366 ++++++++++++++++++ .../web_scraper_optillm/initial_program.py | 149 +++++++ examples/web_scraper_optillm/requirements.txt | 4 + 5 files changed, 843 insertions(+) create mode 100644 examples/web_scraper_optillm/README.md create mode 100644 examples/web_scraper_optillm/config.yaml create mode 100644 examples/web_scraper_optillm/evaluator.py create mode 100644 examples/web_scraper_optillm/initial_program.py create mode 100644 examples/web_scraper_optillm/requirements.txt diff --git a/examples/web_scraper_optillm/README.md b/examples/web_scraper_optillm/README.md new file mode 100644 index 000000000..514d176ef --- /dev/null +++ b/examples/web_scraper_optillm/README.md @@ -0,0 +1,247 @@ +# Web Scraper Evolution with optillm + +This example demonstrates how to use [optillm](https://github.com/codelion/optillm) with OpenEvolve to leverage test-time compute techniques for improved code evolution accuracy. We'll evolve a web scraper that extracts structured data from documentation pages, showcasing two key optillm features: + +1. **readurls plugin**: Automatically fetches webpage content when URLs are mentioned in prompts +2. **Inference optimization**: Uses techniques like Mixture of Agents (MoA) to improve response accuracy + +## Why optillm? + +Traditional LLM usage in code evolution has limitations: +- LLMs may not have knowledge of the latest library documentation +- Single LLM calls can produce inconsistent or incorrect code +- No ability to dynamically fetch relevant documentation during evolution + +optillm solves these problems by: +- **Dynamic Documentation Fetching**: The readurls plugin automatically fetches and includes webpage content when URLs are detected in prompts +- **Test-Time Compute**: Techniques like MoA generate multiple responses and synthesize the best solution +- **Flexible Routing**: Can route requests to different models based on requirements + +## Problem Description + +We're evolving a web scraper that extracts API documentation from Python library documentation pages. The scraper needs to: +1. Parse HTML documentation pages +2. Extract function signatures, descriptions, and parameters +3. Structure the data in a consistent format +4. Handle various documentation formats + +This is an ideal problem for optillm because: +- The LLM benefits from seeing actual documentation HTML structure +- Accuracy is crucial for correct parsing +- Different documentation sites have different formats + +## Architecture + +``` +┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐ +│ OpenEvolve │────▶│ optillm │────▶│ Local LLM │ +│ │ │ (proxy:8000) │ │ (Qwen-0.5B) │ +└─────────────────┘ └─────────────────┘ └─────────────────┘ + │ + ├── readurls plugin + │ (fetches web content) + │ + └── MoA optimization + (improves accuracy) +``` + +## Setup Instructions + +### 1. Install and Configure optillm + +```bash +# Clone optillm +git clone https://github.com/codelion/optillm.git +cd optillm + +# Install dependencies +pip install -r requirements.txt + +# Start optillm proxy with local inference server (in a separate terminal) +export OPTILLM_API_KEY=optillm +python optillm.py --port 8000 +``` + +optillm will now be running on `http://localhost:8000` with its built-in local inference server. + +**Note for Non-Mac Users**: This example uses `Qwen/Qwen3-0.6B-MLX-bf16` which is optimized for Apple Silicon (M1/M2/M3 chips). If you're not using a Mac, you should: + +1. **For NVIDIA GPUs**: Use a CUDA-compatible model like: + - `Qwen/Qwen2.5-32B-Instruct` (best quality, high VRAM) + - `Qwen/Qwen2.5-14B-Instruct` (good balance) + - `meta-llama/Llama-3.1-8B-Instruct` (efficient option) + - `Qwen/Qwen2.5-7B-Instruct` (lower VRAM) + +2. **For CPU-only**: Use a smaller model like: + - `Qwen/Qwen2.5-7B-Instruct` (7B parameters) + - `meta-llama/Llama-3.2-3B-Instruct` (3B parameters) + - `Qwen/Qwen2.5-3B-Instruct` (3B parameters) + +3. **Update the config**: Replace the model names in `config.yaml` with your chosen model: + ```yaml + models: + - name: "readurls-your-chosen-model" + weight: 0.6 + - name: "moa&readurls-your-chosen-model" + weight: 0.4 + ``` + +### 2. Install Web Scraping Dependencies + +```bash +# Install required Python packages for the example +pip install -r examples/web_scraper_optillm/requirements.txt +``` + +### 3. Run the Evolution + +```bash +# From the openevolve root directory +export OPENAI_API_KEY=optillm +python openevolve-run.py examples/web_scraper_optillm/initial_program.py \ + examples/web_scraper_optillm/evaluator.py \ + --config examples/web_scraper_optillm/config.yaml \ + --iterations 100 +``` + +The configuration demonstrates both optillm capabilities: +- **Primary model (90%)**: `readurls-Qwen/Qwen3-0.6B-MLX-bf16` - fetches URLs mentioned in prompts +- **Secondary model (10%)**: `moa&readurls-Qwen/Qwen3-0.6B-MLX-bf16` - uses Mixture of Agents for improved accuracy + +## How It Works + +### 1. readurls Plugin + +When the evolution prompt contains URLs (e.g., "Parse the documentation at https://docs.python.org/3/library/json.html"), the readurls plugin: +1. Detects the URL in the prompt +2. Fetches the webpage content +3. Extracts text and table data +4. Appends it to the prompt as context + +This ensures the LLM has access to the latest documentation structure when generating code. + +### 2. Mixture of Agents (MoA) + +The MoA technique improves accuracy by: +1. Generating 3 different solutions to the problem +2. Having each "agent" critique all solutions +3. Synthesizing a final, improved solution based on the critiques + +This is particularly valuable for complex parsing logic where multiple approaches might be valid. + +### 3. Evolution Process + +1. **Initial Program**: A basic BeautifulSoup scraper that extracts simple text +2. **Evaluator**: Tests the scraper against real documentation pages, checking: + - Correct extraction of function names + - Accurate parameter parsing + - Proper handling of edge cases +3. **Evolution**: The LLM improves the scraper by: + - Fetching actual documentation HTML (via readurls) + - Generating multiple parsing strategies (via MoA) + - Learning from evaluation feedback + +## Example Evolution Trajectory + +**Generation 1** (Basic scraper): +```python +# Simple text extraction +soup = BeautifulSoup(html, 'html.parser') +text = soup.get_text() +``` + +**Generation 10** (With readurls context): +```python +# Targets specific documentation structures +functions = soup.find_all('dl', class_='function') +for func in functions: + name = func.find('dt').get('id') + desc = func.find('dd').text +``` + +**Generation 50** (With MoA refinement): +```python +# Robust parsing with error handling +def extract_function_docs(soup): + # Multiple strategies for different doc formats + strategies = [ + lambda: soup.select('dl.function dt'), + lambda: soup.select('.sig-name'), + lambda: soup.find_all('code', class_='descname') + ] + + for strategy in strategies: + try: + results = strategy() + if results: + return parse_results(results) + except: + continue +``` + +## Monitoring Progress + +Watch the evolution progress and see how optillm enhances the process: + +```bash +# View optillm logs (in the terminal running optillm) +# You'll see: +# - URLs being fetched by readurls +# - Multiple completions generated by MoA +# - Final synthesized responses + +# View OpenEvolve logs +tail -f examples/web_scraper_optillm/openevolve_output/evolution.log +``` + +## Results + +After evolution, you should see: +1. **Improved Accuracy**: The scraper correctly handles various documentation formats +2. **Better Error Handling**: Robust parsing that doesn't break on edge cases +3. **Optimized Performance**: Efficient extraction strategies + +Compare the checkpoints to see the evolution: +```bash +# Initial vs evolved program +diff examples/web_scraper_optillm/openevolve_output/checkpoints/checkpoint_10/best_program.py \ + examples/web_scraper_optillm/openevolve_output/checkpoints/checkpoint_100/best_program.py +``` + +## Key Insights + +1. **Documentation Access Matters**: The readurls plugin significantly improves the LLM's ability to generate correct parsing code by providing actual HTML structure + +2. **Test-Time Compute Works**: MoA's multiple generation and critique approach produces more robust solutions than single-shot generation + +3. **Powerful Local Models**: Large models like Qwen-32B with 4-bit quantization provide excellent results while being memory efficient when enhanced with optillm techniques + +## Customization + +You can experiment with different optillm features by modifying `config.yaml`: + +1. **Different Plugins**: Try the `executecode` plugin for runtime validation +2. **Other Techniques**: Experiment with `cot_reflection`, `rstar`, or `bon` +3. **Model Combinations**: Adjust weights or try different technique combinations + +Example custom configuration: +```yaml +llm: + models: + - name: "cot_reflection&readurls-Qwen/Qwen3-0.6B-MLX-bf16" + weight: 0.7 + - name: "moa&executecode-Qwen/Qwen3-0.6B-MLX-bf16" + weight: 0.3 +``` + +## Troubleshooting + +1. **optillm not responding**: Ensure it's running on port 8000 with `OPTILLM_API_KEY=optillm` +2. **Model not found**: Make sure optillm's local inference server is working (check optillm logs) +3. **Slow evolution**: MoA generates multiple completions, so it's slower but more accurate + +## Further Reading + +- [optillm Documentation](https://github.com/codelion/optillm) +- [OpenEvolve Configuration Guide](../../configs/default_config.yaml) +- [Mixture of Agents Paper](https://arxiv.org/abs/2406.04692) \ No newline at end of file diff --git a/examples/web_scraper_optillm/config.yaml b/examples/web_scraper_optillm/config.yaml new file mode 100644 index 000000000..f01a6d5c1 --- /dev/null +++ b/examples/web_scraper_optillm/config.yaml @@ -0,0 +1,77 @@ +# optillm configuration demonstrating readurls plugin and Mixture of Agents (MoA) +# This config shows both capabilities in a single configuration + +# Evolution settings +max_iterations: 100 +checkpoint_interval: 10 +parallel_evaluations: 1 + +# LLM configuration - using optillm proxy with different techniques +llm: + # Point to optillm proxy instead of direct LLM + api_base: "http://localhost:8000/v1" + + # Demonstrate both optillm capabilities in one config + models: + # Primary model: readurls plugin for URL fetching + - name: "readurls-Qwen/Qwen3-1.7B-MLX-bf16" + weight: 0.9 + + # Secondary model: MoA + readurls for improved accuracy + - name: "moa&readurls-Qwen/Qwen3-1.7B-MLX-bf16" + weight: 0.1 + + # Generation settings optimized for both techniques + temperature: 0.6 + max_tokens: 16000 # Higher for MoA's multiple generations and critiques + top_p: 0.95 + + # Request parameters optimized for local models + timeout: 600 # Extended timeout for local model generation (10 minutes) + retries: 3 + retry_delay: 5 + +# Database configuration +database: + population_size: 50 + num_islands: 3 + migration_interval: 10 + feature_dimensions: + - "score" + - "complexity" + +# Evaluation settings +evaluator: + timeout: 300 # Extended timeout for local model evaluation (5 minutes) + max_retries: 3 + +# Prompt configuration +prompt: + # Enhanced system message that leverages both readurls and MoA + system_message: | + You are an expert Python developer tasked with evolving a web scraper for API documentation. + + Your goal is to improve the scraper's ability to extract function signatures, parameters, and descriptions + from HTML documentation pages. The scraper should be robust and handle various documentation formats. + + Key considerations: + 1. Parse HTML efficiently using BeautifulSoup + 2. Extract function names, signatures, and descriptions accurately + 3. Handle different documentation structures (Python docs, library docs, etc.) + 4. Provide meaningful error handling + 5. Return structured data in the expected format + + When analyzing documentation structures, refer to actual documentation pages like: + - https://docs.python.org/3/library/json.html + - https://requests.readthedocs.io/en/latest/api/ + - https://www.crummy.com/software/BeautifulSoup/bs4/doc/ + + Focus on improving the EVOLVE-BLOCK sections to make the scraper more accurate and robust. + Consider multiple parsing strategies and implement the most effective approach. + + # Include more examples for better context + num_top_programs: 3 + num_diverse_programs: 2 + +# General settings +log_level: "INFO" \ No newline at end of file diff --git a/examples/web_scraper_optillm/evaluator.py b/examples/web_scraper_optillm/evaluator.py new file mode 100644 index 000000000..14098b31c --- /dev/null +++ b/examples/web_scraper_optillm/evaluator.py @@ -0,0 +1,366 @@ +""" +Evaluator for web scraper evolution. + +This evaluator tests the scraper against real documentation pages, +providing feedback on accuracy and robustness. It includes URLs +that will be fetched by optillm's readurls plugin during evolution. +""" + +import sys +import os +import traceback +from typing import Dict, List, Any + +# Add the program directory to the path +sys.path.insert(0, os.path.dirname(os.path.abspath(__file__))) + + + +def evaluate(program_path: str) -> Dict: + """ + Evaluate the web scraper program. + + Args: + program_path: Path to the program to evaluate + + Returns: + Dictionary with metrics and artifacts for OpenEvolve compatibility + """ + try: + # Import the program + sys.path.insert(0, os.path.dirname(program_path)) + program_name = os.path.basename(program_path).replace('.py', '') + program = __import__(program_name) + + # Test data: HTML content from various documentation sources + test_cases = get_test_cases() + + # Evaluate each test case + metrics = { + 'accuracy': 0.0, + 'completeness': 0.0, + 'robustness': 0.0, + 'parsing_errors': 0.0, + 'total_score': 0.0 + } + + artifacts = {} + + total_correct = 0 + total_expected = 0 + parsing_errors = 0 + + for i, test_case in enumerate(test_cases): + try: + # Run the scraper + docs = program.scrape_api_docs(test_case['html']) + + # Evaluate accuracy + correct, expected = evaluate_extraction(docs, test_case['expected']) + total_correct += correct + total_expected += expected + + # Test parameter extraction + for doc in docs: + if 'parameters' not in doc: + doc['parameters'] = program.extract_parameters(doc.get('signature', '')) + + # Test formatting + formatted = program.format_documentation(docs) + + # Store results for debugging + artifacts[f'test_case_{i}'] = { + 'expected_count': expected, + 'found_count': correct, + 'extracted_functions': [doc.get('name', 'unknown') for doc in docs], + 'formatted_length': len(formatted) + } + + except Exception as e: + parsing_errors += 1 + artifacts[f'test_case_{i}_error'] = str(e) + + # Calculate metrics + if total_expected > 0: + metrics['accuracy'] = total_correct / total_expected + + metrics['completeness'] = min(1.0, total_correct / 20) # Expect ~20 functions total + metrics['robustness'] = max(0.0, 1.0 - (parsing_errors / len(test_cases))) + metrics['parsing_errors'] = parsing_errors / len(test_cases) + + # Overall score - use 'combined_score' as primary metric for evolution + metrics['combined_score'] = ( + metrics['accuracy'] * 0.4 + + metrics['completeness'] * 0.3 + + metrics['robustness'] * 0.3 + ) + + # Add detailed feedback for the LLM + artifacts['evaluation_feedback'] = generate_feedback(metrics, artifacts) + + # Return dictionary format for OpenEvolve compatibility + return metrics + + except Exception as e: + return { + 'accuracy': 0.0, + 'completeness': 0.0, + 'robustness': 0.0, + 'parsing_errors': 1.0, + 'combined_score': 0.0, + 'error': str(e), + 'traceback': traceback.format_exc(), + 'stage': 'program_import' + } + + +def get_test_cases() -> List[Dict[str, Any]]: + """ + Get test cases with HTML content and expected results. + + These test cases include URLs that will be fetched by optillm's + readurls plugin during evolution, providing the LLM with actual + documentation structure. + + Returns: + List of test cases with HTML content and expected results + """ + return [ + { + 'name': 'json_module_docs', + 'html': ''' + + +
+

json — JSON encoder and decoder

+

Source: https://docs.python.org/3/library/json.html

+ +
+
+ dumps + ( + obj, + indent=None + ) +
+
+

Serialize obj to a JSON formatted string.

+
+
+ +
+
+ loads + ( + s + ) +
+
+

Deserialize s to a Python object.

+
+
+
+ + + ''', + 'expected': [ + {'name': 'dumps', 'params': ['obj', 'indent']}, + {'name': 'loads', 'params': ['s']} + ] + }, + { + 'name': 'requests_docs', + 'html': ''' + + +
+

Requests Documentation

+

Refer to https://requests.readthedocs.io/en/latest/api/ for full API

+ +
+

requests.get(url, params=None, **kwargs)

+

Sends a GET request.

+
+ +
+

requests.post(url, data=None, json=None, **kwargs)

+

Sends a POST request.

+
+
+ + + ''', + 'expected': [ + {'name': 'requests.get', 'params': ['url', 'params']}, + {'name': 'requests.post', 'params': ['url', 'data', 'json']} + ] + }, + { + 'name': 'beautifulsoup_docs', + 'html': ''' + + +
+

BeautifulSoup Documentation

+

Documentation at https://www.crummy.com/software/BeautifulSoup/bs4/doc/

+ + + BeautifulSoup(markup, parser) + +

Parse a string using a specified parser.

+ + + find(name, attrs=None) + +

Find the first matching tag.

+ + + find_all(name, attrs=None, limit=None) + +

Find all matching tags.

+
+ + + ''', + 'expected': [ + {'name': 'BeautifulSoup', 'params': ['markup', 'parser']}, + {'name': 'find', 'params': ['name', 'attrs']}, + {'name': 'find_all', 'params': ['name', 'attrs', 'limit']} + ] + }, + { + 'name': 'edge_case_malformed', + 'html': ''' + + +
+

Unusual Documentation Format

+

This tests robustness - check https://example.com/weird-api-docs

+ +
+                    function_name(arg1, arg2=default_value)
+                    Another description here
+                    
+ + + + + + +
another_func()Does something
+
+ + + ''', + 'expected': [ + {'name': 'function_name', 'params': ['arg1', 'arg2']}, + {'name': 'another_func', 'params': []} + ] + } + ] + + +def evaluate_extraction(docs: List[Dict[str, Any]], expected: List[Dict[str, Any]]) -> tuple[int, int]: + """ + Evaluate the accuracy of extracted documentation. + + Args: + docs: Extracted documentation + expected: Expected results + + Returns: + Tuple of (correct_count, expected_count) + """ + correct = 0 + expected_count = len(expected) + + for exp in expected: + # Check if we found this function + found = False + for doc in docs: + doc_name = doc.get('name', '').lower() + exp_name = exp['name'].lower() + + if exp_name in doc_name or doc_name in exp_name: + found = True + # Check parameter extraction + doc_params = doc.get('parameters', []) + exp_params = exp.get('params', []) + + if len(doc_params) >= len(exp_params): + correct += 1 + else: + correct += 0.5 # Partial credit + break + + if not found and docs: # Only penalize if we extracted something + pass # No additional penalty + + return correct, expected_count + + +def generate_feedback(metrics: Dict[str, float], artifacts: Dict[str, Any]) -> str: + """ + Generate detailed feedback for the LLM to improve the scraper. + + This feedback will be included in the evolution prompt to guide + the LLM toward better solutions. + + Args: + metrics: Evaluation metrics + artifacts: Evaluation artifacts + + Returns: + Detailed feedback string + """ + feedback = [] + + feedback.append("## Evaluation Feedback") + feedback.append(f"Overall Score: {metrics['combined_score']:.2f}/1.0") + feedback.append("") + + # Accuracy feedback + if metrics['accuracy'] < 0.5: + feedback.append("⚠️ **Low Accuracy**: The scraper is missing many expected functions.") + feedback.append("Consider improving the HTML parsing logic to handle different documentation formats.") + feedback.append("Look for patterns like
,
, and tags.") + elif metrics['accuracy'] < 0.8: + feedback.append("✅ **Good Accuracy**: Most functions are found, but some are missed.") + feedback.append("Fine-tune the extraction logic for edge cases.") + else: + feedback.append("🎉 **Excellent Accuracy**: Function extraction is working well!") + + feedback.append("") + + # Completeness feedback + if metrics['completeness'] < 0.5: + feedback.append("⚠️ **Low Completeness**: Not extracting enough functions overall.") + feedback.append("Increase the limit or improve the search scope.") + + # Robustness feedback + if metrics['robustness'] < 0.8: + feedback.append("⚠️ **Low Robustness**: The scraper fails on some HTML formats.") + feedback.append("Add try-catch blocks and handle different documentation structures.") + feedback.append("Consider multiple parsing strategies and fallback methods.") + + # Specific improvements + feedback.append("") + feedback.append("## Specific Improvements:") + + # Analyze test case results + for key, value in artifacts.items(): + if key.startswith('test_case_') and isinstance(value, dict): + if 'error' in key: + feedback.append(f"- Fix error in {key}: {value}") + elif value.get('found_count', 0) < value.get('expected_count', 0): + feedback.append(f"- Improve extraction for {key}: found {value.get('found_count', 0)}/{value.get('expected_count', 0)} functions") + + # Documentation URL hints (these will be fetched by readurls plugin) + feedback.append("") + feedback.append("## Documentation References:") + feedback.append("For improving parsing, refer to these documentation structures:") + feedback.append("- Python docs: https://docs.python.org/3/library/json.html") + feedback.append("- Requests docs: https://requests.readthedocs.io/en/latest/api/") + feedback.append("- BeautifulSoup docs: https://www.crummy.com/software/BeautifulSoup/bs4/doc/") + + return '\n'.join(feedback) \ No newline at end of file diff --git a/examples/web_scraper_optillm/initial_program.py b/examples/web_scraper_optillm/initial_program.py new file mode 100644 index 000000000..f0c97f5a2 --- /dev/null +++ b/examples/web_scraper_optillm/initial_program.py @@ -0,0 +1,149 @@ +""" +Web scraper for extracting API documentation from HTML pages. + +This initial implementation provides basic HTML parsing functionality +that will be evolved to handle complex documentation structures. + +The LLM will have access to actual documentation pages through optillm's +readurls plugin, allowing it to understand the specific HTML structure +and improve the parsing logic accordingly. +""" + +from bs4 import BeautifulSoup +from typing import Dict, List, Optional +import re + + +# EVOLVE-BLOCK-START +def scrape_api_docs(html_content: str) -> List[Dict[str, any]]: + """ + Extract API documentation from HTML content. + + Args: + html_content: Raw HTML content of a documentation page + + Returns: + List of dictionaries containing function documentation + """ + soup = BeautifulSoup(html_content, 'html.parser') + functions = [] + + # Try multiple approaches to find functions + # 1. Look for code blocks + code_blocks = soup.find_all('code') + for block in code_blocks: + text = block.get_text(strip=True) + if '(' in text and ')' in text: + functions.append({ + 'name': text.split('(')[0].strip(), + 'signature': text, + 'description': 'No description found', + 'parameters': extract_parameters(text) + }) + + # 2. Look for function signatures in headers (h3) + h3_blocks = soup.find_all('h3') + for block in h3_blocks: + text = block.get_text(strip=True) + if '(' in text and ')' in text: + functions.append({ + 'name': text.split('(')[0].strip(), + 'signature': text, + 'description': 'No description found', + 'parameters': extract_parameters(text) + }) + + # 3. Look for dt elements with sig class + dt_blocks = soup.find_all('dt', class_='sig') + for block in dt_blocks: + sig_name = block.find(class_='sig-name') + if sig_name: + name = sig_name.get_text(strip=True) + functions.append({ + 'name': name, + 'signature': block.get_text(strip=True), + 'description': 'No description found', + 'parameters': extract_parameters(block.get_text(strip=True)) + }) + + return functions[:20] # Return more functions + + +def extract_parameters(signature: str) -> List[Dict[str, str]]: + """ + Extract parameter information from a function signature. + + Args: + signature: Function signature string + + Returns: + List of parameter dictionaries + """ + params = [] + # Very basic parameter extraction + match = re.search(r'\((.*?)\)', signature) + if match: + param_string = match.group(1) + if param_string: + param_parts = param_string.split(',') + for part in param_parts: + part = part.strip() + if part: + params.append({ + 'name': part.split('=')[0].strip(), + 'type': 'unknown', + 'default': None, + 'description': '' + }) + + return params + + +def format_documentation(api_docs: List[Dict[str, any]]) -> str: + """ + Format extracted documentation into a readable string. + + Args: + api_docs: List of API documentation dictionaries + + Returns: + Formatted documentation string + """ + output = [] + for doc in api_docs: + output.append(f"Function: {doc['name']}") + output.append(f"Signature: {doc['signature']}") + output.append(f"Description: {doc['description']}") + + if doc.get('parameters'): + output.append("Parameters:") + for param in doc['parameters']: + output.append(f" - {param['name']}: {param.get('description', 'No description')}") + + output.append("") # Empty line between functions + + return '\n'.join(output) +# EVOLVE-BLOCK-END + + +# Example usage and test +if __name__ == "__main__": + # Sample HTML for testing basic functionality + sample_html = """ + + +
+ json.dumps(obj, indent=2) +

Serialize obj to a JSON formatted string.

+
+
+ json.loads(s) +

Deserialize s to a Python object.

+
+ + + """ + + docs = scrape_api_docs(sample_html) + print(format_documentation(docs)) + print(f"\nExtracted {len(docs)} functions") \ No newline at end of file diff --git a/examples/web_scraper_optillm/requirements.txt b/examples/web_scraper_optillm/requirements.txt new file mode 100644 index 000000000..17ed96e5a --- /dev/null +++ b/examples/web_scraper_optillm/requirements.txt @@ -0,0 +1,4 @@ +# Web scraping dependencies +beautifulsoup4>=4.12.0 +requests>=2.31.0 +lxml>=4.9.0 \ No newline at end of file From 6024935f075f3b66274b343e05658a2636ae014d Mon Sep 17 00:00:00 2001 From: Asankhaya Sharma Date: Tue, 8 Jul 2025 15:45:03 +0800 Subject: [PATCH 4/5] Update README.md --- examples/web_scraper_optillm/README.md | 161 ++++++++++++++++--------- 1 file changed, 105 insertions(+), 56 deletions(-) diff --git a/examples/web_scraper_optillm/README.md b/examples/web_scraper_optillm/README.md index 514d176ef..492b52b75 100644 --- a/examples/web_scraper_optillm/README.md +++ b/examples/web_scraper_optillm/README.md @@ -64,7 +64,7 @@ python optillm.py --port 8000 optillm will now be running on `http://localhost:8000` with its built-in local inference server. -**Note for Non-Mac Users**: This example uses `Qwen/Qwen3-0.6B-MLX-bf16` which is optimized for Apple Silicon (M1/M2/M3 chips). If you're not using a Mac, you should: +**Note for Non-Mac Users**: This example uses `Qwen/Qwen3-1.7B-MLX-bf16` which is optimized for Apple Silicon (M1/M2/M3 chips). If you're not using a Mac, you should: 1. **For NVIDIA GPUs**: Use a CUDA-compatible model like: - `Qwen/Qwen2.5-32B-Instruct` (best quality, high VRAM) @@ -81,9 +81,9 @@ optillm will now be running on `http://localhost:8000` with its built-in local i ```yaml models: - name: "readurls-your-chosen-model" - weight: 0.6 + weight: 0.9 - name: "moa&readurls-your-chosen-model" - weight: 0.4 + weight: 0.1 ``` ### 2. Install Web Scraping Dependencies @@ -105,8 +105,8 @@ python openevolve-run.py examples/web_scraper_optillm/initial_program.py \ ``` The configuration demonstrates both optillm capabilities: -- **Primary model (90%)**: `readurls-Qwen/Qwen3-0.6B-MLX-bf16` - fetches URLs mentioned in prompts -- **Secondary model (10%)**: `moa&readurls-Qwen/Qwen3-0.6B-MLX-bf16` - uses Mixture of Agents for improved accuracy +- **Primary model (90%)**: `readurls-Qwen/Qwen3-1.7B-MLX-bf16` - fetches URLs mentioned in prompts +- **Secondary model (10%)**: `moa&readurls-Qwen/Qwen3-1.7B-MLX-bf16` - uses Mixture of Agents for improved accuracy ## How It Works @@ -141,44 +141,54 @@ This is particularly valuable for complex parsing logic where multiple approache - Generating multiple parsing strategies (via MoA) - Learning from evaluation feedback -## Example Evolution Trajectory +## Actual Evolution Results -**Generation 1** (Basic scraper): -```python -# Simple text extraction -soup = BeautifulSoup(html, 'html.parser') -text = soup.get_text() -``` +Based on our evolution run, here's what we achieved: + +### Performance Metrics +- **Initial Score**: 0.6864 (72.2% accuracy, 32.5% completeness) +- **Final Score**: 0.7458 (83.3% accuracy, 37.5% completeness) +- **Improvement**: +8.6% overall performance (+11.1% accuracy) +- **Time to Best**: Found optimal solution by iteration 3 (within 10 minutes) + +### Key Evolution Improvements -**Generation 10** (With readurls context): +**Initial Program** (Basic approach): ```python -# Targets specific documentation structures -functions = soup.find_all('dl', class_='function') -for func in functions: - name = func.find('dt').get('id') - desc = func.find('dd').text +# Simple code block parsing +code_blocks = soup.find_all('code') +for block in code_blocks: + text = block.get_text(strip=True) + if '(' in text and ')' in text: + # Extract function info ``` -**Generation 50** (With MoA refinement): +**Evolved Program** (Sophisticated multi-strategy parsing): ```python -# Robust parsing with error handling -def extract_function_docs(soup): - # Multiple strategies for different doc formats - strategies = [ - lambda: soup.select('dl.function dt'), - lambda: soup.select('.sig-name'), - lambda: soup.find_all('code', class_='descname') - ] - - for strategy in strategies: - try: - results = strategy() - if results: - return parse_results(results) - except: - continue +# 1. Code blocks +code_blocks = soup.find_all('code') +# 2. Headers (h3) +h3_blocks = soup.find_all('h3') +# 3. Documentation signatures +dt_blocks = soup.find_all('dt', class_='sig') +# 4. Table-based documentation (NEW!) +table_blocks = soup.find_all('table') +for block in table_blocks: + rows = block.find_all('tr') + for row in rows: + cells = row.find_all('td') + if len(cells) >= 2: + signature = cells[0].get_text(strip=True) + description = cells[1].get_text(strip=True) + # Extract structured function data ``` +### What optillm Contributed + +1. **Early Discovery**: Found best solution by iteration 3, suggesting enhanced reasoning helped quickly identify effective parsing strategies +2. **Table Parsing Innovation**: The evolved program added sophisticated table parsing logic that wasn't in the initial version +3. **Robust Architecture**: Multiple fallback strategies ensure the scraper works across different documentation formats + ## Monitoring Progress Watch the evolution progress and see how optillm enhances the process: @@ -194,46 +204,85 @@ Watch the evolution progress and see how optillm enhances the process: tail -f examples/web_scraper_optillm/openevolve_output/evolution.log ``` -## Results +## Results Analysis + +After 100 iterations of evolution, here's what we achieved: + +### Quantitative Results +- **Accuracy**: 72.2% → 83.3% (+11.1% improvement) +- **Completeness**: 32.5% → 37.5% (+5% improvement) +- **Robustness**: 100% (maintained - no parsing errors) +- **Combined Score**: 0.6864 → 0.7458 (+8.6% improvement) -After evolution, you should see: -1. **Improved Accuracy**: The scraper correctly handles various documentation formats -2. **Better Error Handling**: Robust parsing that doesn't break on edge cases -3. **Optimized Performance**: Efficient extraction strategies +### Qualitative Improvements +1. **Multi-Strategy Parsing**: Added table-based extraction for broader documentation format support +2. **Robust Function Detection**: Improved pattern matching for function signatures +3. **Better Parameter Extraction**: Enhanced parameter parsing from various HTML structures +4. **Error Resilience**: Maintained 100% robustness with no parsing failures -Compare the checkpoints to see the evolution: +### Evolution Pattern +- **Early Success**: Best solution found by iteration 3 (within 10 minutes) +- **Plateau Effect**: Algorithm maintained optimal score from iteration 3-90 +- **Island Migration**: MAP-Elites explored alternatives but local optimum was strong + +Compare the evolution: ```bash -# Initial vs evolved program -diff examples/web_scraper_optillm/openevolve_output/checkpoints/checkpoint_10/best_program.py \ - examples/web_scraper_optillm/openevolve_output/checkpoints/checkpoint_100/best_program.py +# View the final evolved program +cat examples/web_scraper_optillm/openevolve_output/best/best_program.py + +# Compare initial vs final +diff examples/web_scraper_optillm/initial_program.py \ + examples/web_scraper_optillm/openevolve_output/best/best_program.py ``` -## Key Insights +## Key Insights from This Run + +1. **optillm Enhanced Early Discovery**: The best solution was found by iteration 3, suggesting optillm's test-time compute (MoA) and documentation access (readurls) helped quickly identify effective parsing strategies. + +2. **Smaller Models Can Excel**: The 1.7B Qwen model with optillm achieved significant improvements (+8.6%), proving that test-time compute can make smaller models highly effective. -1. **Documentation Access Matters**: The readurls plugin significantly improves the LLM's ability to generate correct parsing code by providing actual HTML structure +3. **Local Optimization Works**: Fast inference times (<100ms after initial) show that local models with optillm provide both efficiency and quality. -2. **Test-Time Compute Works**: MoA's multiple generation and critique approach produces more robust solutions than single-shot generation +4. **Pattern: Quick Discovery, Then Plateau**: Evolution found a strong local optimum quickly. This suggests the current test cases were well-solved by the table parsing innovation. -3. **Powerful Local Models**: Large models like Qwen-32B with 4-bit quantization provide excellent results while being memory efficient when enhanced with optillm techniques +5. **optillm Plugin Value**: The evolved program's sophisticated multi-strategy approach (especially table parsing) likely benefited from optillm's enhanced reasoning capabilities. -## Customization +## Available optillm Plugins and Techniques -You can experiment with different optillm features by modifying `config.yaml`: +optillm offers many plugins and optimization techniques. Here are the most useful for code evolution: -1. **Different Plugins**: Try the `executecode` plugin for runtime validation -2. **Other Techniques**: Experiment with `cot_reflection`, `rstar`, or `bon` -3. **Model Combinations**: Adjust weights or try different technique combinations +### Core Plugins +- **`readurls`**: Automatically fetches web content when URLs are detected in prompts +- **`executecode`**: Runs code and includes output in the response (great for validation) + +### Optimization Techniques +- **`moa`** (Mixture of Agents): Generates multiple responses, critiques them, and synthesizes the best +- **`cot_reflection`**: Uses chain-of-thought reasoning with self-reflection +- **`rstar`**: Advanced reasoning technique for complex problems +- **`bon`** (Best of N): Generates N responses and selects the best one +- **`z3_solver`**: Uses Z3 theorem prover for logical reasoning +- **`rto`** (Round Trip Optimization): Optimizes responses through iterative refinement + +### Combining Techniques +You can chain multiple techniques using `&`: -Example custom configuration: ```yaml llm: models: - - name: "cot_reflection&readurls-Qwen/Qwen3-0.6B-MLX-bf16" + # Use chain-of-thought + readurls for primary model + - name: "cot_reflection&readurls-Qwen/Qwen3-1.7B-MLX-bf16" weight: 0.7 - - name: "moa&executecode-Qwen/Qwen3-0.6B-MLX-bf16" + # Use MoA + code execution for secondary validation + - name: "moa&executecode-Qwen/Qwen3-1.7B-MLX-bf16" weight: 0.3 ``` +### Recommended Combinations for Code Evolution +1. **For Documentation-Heavy Tasks**: `cot_reflection&readurls` +2. **For Complex Logic**: `moa&executecode` +3. **For Mathematical Problems**: `cot_reflection&z3_solver` +4. **For Validation-Critical Code**: `bon&executecode` + ## Troubleshooting 1. **optillm not responding**: Ensure it's running on port 8000 with `OPTILLM_API_KEY=optillm` From bb9f9df4709bdceb00cdf448e3d6fdb0f456ba24 Mon Sep 17 00:00:00 2001 From: Asankhaya Sharma Date: Tue, 8 Jul 2025 15:50:59 +0800 Subject: [PATCH 5/5] Bump version to 0.0.13 Update project version from 0.0.12 to 0.0.13 in both pyproject.toml and setup.py for a new release. --- pyproject.toml | 2 +- setup.py | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/pyproject.toml b/pyproject.toml index b5de7e6b5..d73b29612 100644 --- a/pyproject.toml +++ b/pyproject.toml @@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta" [project] name = "openevolve" -version = "0.0.12" +version = "0.0.13" description = "Open-source implementation of AlphaEvolve" readme = "README.md" requires-python = ">=3.9" diff --git a/setup.py b/setup.py index 2d13b91f2..d3a277533 100644 --- a/setup.py +++ b/setup.py @@ -2,7 +2,7 @@ setup( name="openevolve", - version="0.0.12", + version="0.0.13", packages=find_packages(), include_package_data=True, )