184 changes: 184 additions & 0 deletions examples/llm_prompt_optimazation/README.md
@@ -0,0 +1,184 @@
# Evolving Better Prompts with OpenEvolve 🧠✨

This example shows how to use **OpenEvolve** to automatically optimize prompts for **Large Language Models (LLMs)**. Whether you're working on classification, summarization, generation, or code tasks, OpenEvolve helps you find high-performing prompts using **evolutionary search**. This example uses synthetic data for a sentiment analysis task, but you can adapt it to your own datasets and tasks.

---

## 🎯 What Is Prompt Optimization?

Prompt engineering is key to getting reliable outputs from LLMs—but finding the right prompt manually can be slow and inconsistent.

OpenEvolve automates this by:

* Generating and evolving prompt variations
* Testing them against your task and metrics
* Selecting the best prompts across generations

You start with a simple prompt and let OpenEvolve evolve it into something smarter and more effective.
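
Conceptually, the search works like the sketch below, where `score_fn` runs your evaluator and `mutate_fn` asks an LLM for a variation. Both are hypothetical stand-ins, and OpenEvolve's real loop adds islands, archives, and checkpointing on top of this idea:

```python
# A minimal sketch of the evolutionary idea, not OpenEvolve's actual internals:
# score candidates, keep the strongest, and let an LLM propose mutations.
def evolve_prompts(initial_prompt, score_fn, mutate_fn, generations=10, population=8):
    prompts = [initial_prompt]
    for _ in range(generations):
        # Rank every candidate by the task metric (higher is better)
        ranked = sorted(prompts, key=score_fn, reverse=True)
        elites = ranked[: max(1, population // 4)]
        # Refill the population with LLM-generated variations of the elites
        prompts = elites + [mutate_fn(p) for p in elites for _ in range(3)]
    return max(prompts, key=score_fn)
```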

---

## 🚀 Getting Started

### 1. Install Dependencies

```bash
cd examples/llm_prompt_optimazation
pip install -r requirements.txt
```

### 2. Add Your Models

1. Update your `config.yaml`:

```yaml
llm:
  primary_model: "llm_name"
  api_base: "llm_server_url"
  api_key: "your_api_key_here"
```

2. Update the task model settings in `evaluator.py`:

```python
TASK_MODEL_NAME = "task_llm_name"
TASK_MODEL_URL = "task_llm_server_url"
TASK_MODEL_API_KEY = "your_api_key_here"
SAMPLE_SIZE = 25 # Number of samples to use for evaluation
MAX_RETRIES = 3 # Number of retries for LLM calls

```
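
For illustration, these constants might feed a retry wrapper like the sketch below. This is a hypothetical reconstruction that assumes an OpenAI-compatible endpoint; see the actual `evaluator.py` for the real call:

```python
# Hypothetical sketch only; assumes the constants above and the `openai` client.
from openai import OpenAI

client = OpenAI(base_url=TASK_MODEL_URL, api_key=TASK_MODEL_API_KEY)

def call_task_model(prompt: str) -> str:
    for attempt in range(MAX_RETRIES):
        try:
            response = client.chat.completions.create(
                model=TASK_MODEL_NAME,
                messages=[{"role": "user", "content": prompt}],
            )
            return response.choices[0].message.content
        except Exception:
            if attempt == MAX_RETRIES - 1:
                raise  # give up after MAX_RETRIES failed attempts
```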

### 3. Run OpenEvolve

```bash
sh run.sh
```

---

## 🔧 How to Adapt This Template

### 1. Replace the Dataset

Edit `data.json` to match your use case:

```json
[
  {
    "id": 1,
    "input": "Your input here",
    "expected_output": "Target output"
  }
]
```
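
Before running, it can help to sanity-check that every record carries the fields the evaluator expects. A quick hypothetical check, not part of the shipped example:

```python
import json

with open("data.json") as f:
    records = json.load(f)

for record in records:
    # Every record needs the three keys used during evaluation
    assert {"id", "input", "expected_output"} <= record.keys(), record
```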

### 2. Customize the Evaluator

In `evaluator.py`, define how to evaluate a prompt:

* Load your data
* Call the LLM using the prompt
* Measure output quality (accuracy, score, etc.), as in the sketch below
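
A minimal sketch, assuming the constants from step 2, the `openai` client, and exact-match accuracy as the metric (the bundled `evaluator.py` is more elaborate):

```python
import json

from openai import OpenAI

client = OpenAI(base_url=TASK_MODEL_URL, api_key=TASK_MODEL_API_KEY)

def evaluate_prompt(prompt_template: str) -> float:
    # Load the dataset and evaluate on a fixed-size sample
    with open("data.json") as f:
        samples = json.load(f)[:SAMPLE_SIZE]
    correct = 0
    for sample in samples:
        reply = client.chat.completions.create(
            model=TASK_MODEL_NAME,
            messages=[{
                "role": "user",
                "content": prompt_template.format(input_text=sample["input"]),
            }],
        ).choices[0].message.content
        correct += reply.strip() == sample["expected_output"]
    return correct / len(samples)  # accuracy in [0, 1]
```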

### 3. Write Your Initial Prompt

Create a basic starting prompt in `initial_prompt.txt`:

```
# EVOLVE-BLOCK-START
Your task prompt using {input_text} as a placeholder.
# EVOLVE-BLOCK-END
```

This is the part OpenEvolve will improve over time.
It also helps to name your task in the header of `initial_prompt.txt` so the model understands the context.
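
At evaluation time the `{input_text}` placeholder is filled with Python's `str.format`, matching the `prompt.format(input_text=some_text)` interface described in `config.yaml`. A rough sketch, assuming the marker lines are stripped before the prompt reaches the model:

```python
lines = open("initial_prompt.txt").read().splitlines()
# Drop the EVOLVE-BLOCK markers; they delimit the region OpenEvolve may rewrite
template = "\n".join(l for l in lines if not l.startswith("# EVOLVE-BLOCK"))
prompt = template.format(input_text="The movie was a delightful surprise!")
```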

---

## ⚙️ Key Config Options (`config.yaml`)

```yaml
max_iterations: 15

llm:
  primary_model: "gpt-4o"     # or your preferred model
  secondary_model: "gpt-3.5"  # optional, adds diversity
  temperature: 0.9
  max_tokens: 2048

database:
  population_size: 40
  elite_selection_ratio: 0.25

evaluator:
  timeout: 45
  parallel_evaluations: 3
  use_llm_feedback: true
```

---

## 📈 Example Output

OpenEvolve evolves prompts like this:

**Initial Prompt:**

```
Please analyze the sentiment of the following sentence and provide a sentiment score:

"{input_text}"

Rate the sentiment on a scale from 0.0 to 10.0.

Score:
```

**Evolved Prompt:**

```
Please analyze the sentiment of the following sentence and provide a sentiment score using the following guidelines:
- 0.0-2.9: Strongly negative sentiment (e.g., expresses anger, sadness, or despair)
- 3.0-6.9: Neutral or mixed sentiment (e.g., factual statements, ambiguous content)
- 7.0-10.0: Strongly positive sentiment (e.g., expresses joy, satisfaction, or hope)

"{input_text}"

Rate the sentiment on a scale from 0.0 to 10.0:
- 0.0-2.9: Strongly negative (e.g., "This product is terrible")
- 3.0-6.9: Neutral/mixed (e.g., "The sky is blue today")
- 7.0-10.0: Strongly positive (e.g., "This is amazing!")

Provide only the numeric score (e.g., "8.5") without any additional text:

Score:
```

**Result**: Improved accuracy and output consistency.
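
Even with the instruction to return only a number, models occasionally add stray text, so replies are best parsed defensively. A hypothetical helper, not part of the shipped code:

```python
import re

def parse_score(reply: str) -> float | None:
    # Pull the first number out of the reply and clamp it to the 0-10 scale
    match = re.search(r"\d+(?:\.\d+)?", reply)
    if match is None:
        return None
    return min(max(float(match.group()), 0.0), 10.0)

parse_score("Score: 8.5")  # -> 8.5
```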

---

## 🔍 Where to Use This

OpenEvolve can be adapted to many tasks:

* **Text Classification**: Spam detection, intent recognition
* **Content Generation**: Social media posts, product descriptions
* **Question Answering & Summarization**
* **Code Tasks**: Review, generation, completion
* **Structured Output**: JSON, table filling, data extraction

---

## ✅ Best Practices

* Start with a basic but relevant prompt
* Use good-quality data and clear evaluation metrics
* Run multiple evolutions for better results
* Validate on held-out data before deployment (see the split sketch below)
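
A minimal held-out split, assuming the `data.json` format above: evolve on one slice, then score the winning prompt once on the other.

```python
import json
import random

with open("data.json") as f:
    data = json.load(f)

random.seed(0)  # reproducible split
random.shuffle(data)
cut = int(0.8 * len(data))
evolve_set, holdout_set = data[:cut], data[cut:]  # 80% evolve / 20% validate
```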

---

**Ready to discover better prompts?**
Use this template to evolve prompts for any LLM task—automatically.
19 changes: 19 additions & 0 deletions examples/llm_prompt_optimazation/best_program.txt
@@ -0,0 +1,19 @@
"""Sentiment analysis prompt example for OpenEvolve"""

# EVOLVE-BLOCK-START
Please analyze the sentiment of the following sentence and provide a sentiment score using the following guidelines:
- 0.0-2.9: Strongly negative sentiment (e.g., expresses anger, sadness, or despair)
- 3.0-6.9: Neutral or mixed sentiment (e.g., factual statements, ambiguous content)
- 7.0-10.0: Strongly positive sentiment (e.g., expresses joy, satisfaction, or hope)

"{input_text}"

Rate the sentiment on a scale from 0.0 to 10.0:
- 0.0-2.9: Strongly negative (e.g., "This product is terrible")
- 3.0-6.9: Neutral/mixed (e.g., "The sky is blue today")
- 7.0-10.0: Strongly positive (e.g., "This is amazing!")

Provide only the numeric score (e.g., "8.5") without any additional text:

Score:
# EVOLVE-BLOCK-END
58 changes: 58 additions & 0 deletions examples/llm_prompt_optimazation/config.yaml
@@ -0,0 +1,58 @@
# Configuration for prompt optimization
max_iterations: 30
checkpoint_interval: 10
log_level: "INFO"

# LLM configuration
llm:
  primary_model: "qwen3-32b-fp8"
  api_base: "http://localhost:1234/v1"
  api_key: "your_api_key_here"
  temperature: 0.9
  top_p: 0.95
  max_tokens: 2048

# Prompt configuration
prompt:
  system_message: |
    You are an expert prompt engineer. Your task is to revise an existing prompt designed for large language models (LLMs), without being explicitly told what the task is.

    Your improvements should:

    * Infer the intended task and expected output format based on the structure and language of the original prompt.
    * Clarify vague instructions, eliminate ambiguity, and improve overall interpretability for the LLM.
    * Strengthen alignment between the prompt and the desired task outcome, ensuring more consistent and accurate responses.
    * Improve robustness against edge cases or unclear input phrasing.
    * If helpful, include formatting instructions, boundary conditions, or illustrative examples that reinforce the LLM's expected behavior.
    * Avoid adding unnecessary verbosity or assumptions not grounded in the original prompt.

    You will receive a prompt that uses the following structure:

    ```python
    prompt.format(input_text=some_text)
    ```

    The revised prompt should maintain the same input interface but be more effective, reliable, and production-ready for LLM use.

    Return only the improved prompt text. Do not include explanations or additional comments. Your output should be a clean, high-quality replacement that enhances clarity, consistency, and LLM performance.

  num_top_programs: 8
  use_template_stochasticity: true

# Database configuration
database:
  population_size: 40
  archive_size: 20
  num_islands: 3
  elite_selection_ratio: 0.25
  exploitation_ratio: 0.65

# Evaluator configuration
evaluator:
  timeout: 45
  use_llm_feedback: true

# Evolution settings
diff_based_evolution: true
allow_full_rewrites: true
diversity_threshold: 0.1