# Prompt Evolution with SCOPE

Welcome to this tutorial on **automatic prompt optimization**!

## What You'll Learn

In this notebook, you'll discover how AI systems can automatically improve their own prompts through observation and learning. We'll use a simple information extraction task to demonstrate:

- 📚 How SCOPE (Self-evolving Context Optimization via Prompt Evolution) works
- 🔄 How prompts evolve automatically over time
- 📊 How to measure improvement through iterative learning
- 🎯 Real-world application with LangChain

## Context

Most AI applications use **static prompts**: once written, they never change or improve. But what if your AI could learn from experience and automatically optimize its own instructions? That's exactly what SCOPE enables.

Think of it like a student who:
1. Completes a task
2. Reviews what went well and what didn't
3. Updates their approach for next time
4. Gets better with each attempt

SCOPE does this automatically, with no manual prompt engineering required!

## Setup

First, let's install the required packages and set up our environment.

In [None]:
%%capture --no-stderr
%pip install --quiet -U langchain_openai langchain_core scope-optimizer

In [None]:
import os, getpass

def _set_env(var: str):
    if not os.environ.get(var):
        os.environ[var] = getpass.getpass(f"{var}: ")

_set_env("OPENAI_API_KEY")

## The Concept: Before and After

Let's understand what we're building:

### Without SCOPE (Traditional Approach)
```
Static Prompt → LLM → Output
     ↓
  Never changes!
```

### With SCOPE (Evolving Approach)
```
Initial Prompt → LLM → Output
      ↓                  ↓
      ↓          SCOPE Observes
      ↓                  ↓
      ↓          Learns Patterns
      ↓                  ↓
Improved Prompt ← Updates Rules
```

The magic happens in the learning loop - SCOPE observes what works and what doesn't, then automatically generates improvement rules!
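
To make the loop concrete, here is a toy, self-contained sketch of the observe/learn/update cycle. It does not call SCOPE or an LLM: `fake_llm`, `observe`, and the rule text are hypothetical stand-ins invented for illustration, whereas SCOPE itself uses a model to synthesize rules.

```python
from typing import List, Optional, Tuple


def fake_llm(prompt: str, text: str) -> str:
    """Hypothetical stand-in for an LLM: answers tersely only if told to."""
    email = next((w for w in text.split() if "@" in w), "Not found")
    if "key: value" in prompt:
        return f"email: {email}"
    return f"Sure! The email address is {email}"


def observe(output: str) -> Optional[str]:
    """Hypothetical observer: suggest a rule when the output is not 'key: value'."""
    if not output.startswith("email:"):
        return "Format every answer as 'key: value' with no extra words."
    return None


base_prompt = "Extract the requested information."
rules: List[str] = []                      # learned rules accumulate here
history: List[Tuple[str, Optional[str]]] = []

for step in range(3):
    # Rebuild the prompt from the base plus everything learned so far
    current = base_prompt + "".join(f"\n- {r}" for r in rules)
    output = fake_llm(current, "Contact John Doe at john.doe@example.com for support")
    rule = observe(output)
    if rule and rule not in rules:
        rules.append(rule)
    history.append((output, rule))
    print(step, output, "| learned:", rule)
```

The first pass produces a chatty answer and triggers a rule; later passes use the updated prompt and trigger nothing, which is the same pattern the rest of this notebook looks for with real models.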

## Our Simple Task: Information Extraction

We'll use information extraction as our example because:
- ✅ It's easy to understand
- ✅ Results are measurable
- ✅ Improvements are visible

We'll ask the AI to extract information like emails, names, and phone numbers from text. As it completes tasks, SCOPE will learn how to do this better.
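
For a sense of what "measurable" can mean here, the sketch below builds a simple regex baseline for the email case. It is not part of SCOPE or LangChain; `EMAIL_RE` and `baseline_emails` are hypothetical names, and the pattern is a deliberately loose approximation of an email address rather than a full validator.

```python
import re
from typing import List

# Loose email pattern: local part, "@", domain, and a top-level domain of 2+ letters
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def baseline_emails(text: str) -> List[str]:
    """Return every email-like substring, in order of appearance."""
    return EMAIL_RE.findall(text)


print(baseline_emails("Team: alice@test.com, Bob <bob@example.org>, charlie@mail.net"))
print(baseline_emails("Call us at 555-0123"))
```

A baseline like this gives you reference answers to compare the model's output against, which is handy when judging whether an evolved prompt really extracts more accurately.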

## Step 1: The Base Prompt (Before Learning)

Let's start with a simple, basic prompt. This is what the AI begins with - no optimization yet.

In [None]:
BASE_EXTRACTION_PROMPT = """You are an information extraction specialist.
Your task is to extract requested information from text accurately.

## Core Instructions:
- Extract only the requested information
- Be accurate and precise
- If information is missing, state "Not found"
- Provide clean, structured output
"""

print("📋 Base Prompt:")
print(BASE_EXTRACTION_PROMPT)

This prompt is good, but generic. It doesn't have specific strategies for handling edge cases, formatting output consistently, or dealing with ambiguous data. **This is where SCOPE comes in!**

## Step 2: Define Extraction Tasks

Let's create a variety of extraction tasks. Each one will teach SCOPE something different:

In [None]:
EXTRACTION_TASKS = [
    # Simple extraction
    {
        "instruction": "Extract the email address",
        "text": "Contact John Doe at john.doe@example.com for support"
    },
    
    # Multiple fields
    {
        "instruction": "Parse and extract: name, age, and city",
        "text": "Name: Jane Smith, Age: 28, City: Boston"
    },
    
    # Missing data (edge case)
    {
        "instruction": "Extract the phone number",
        "text": "You can email us at support@company.com"
    },
    
    # Malformed data
    {
        "instruction": "Extract name, age, city, and phone",
        "text": "name:John|age:|city:NYC|phone:555-0123"
    },
    
    # Multiple items
    {
        "instruction": "Extract all email addresses",
        "text": "Team: alice@test.com, Bob <bob@example.org>, charlie@mail.net"
    },
]

print(f"✅ Created {len(EXTRACTION_TASKS)} diverse extraction tasks")
print("\nThese tasks cover: simple extraction, multiple fields, missing data, malformed data, and multiple items")

## Step 3: Set Up LangChain

Now let's create our LangChain chat model. We'll use GPT-4o for high-quality extractions.

In [None]:
from langchain_openai import ChatOpenAI
from langchain_core.messages import SystemMessage, HumanMessage

# Initialize the chat model
llm = ChatOpenAI(model="gpt-4o", temperature=0)

print("✅ LangChain ChatOpenAI initialized")
print("   Model: gpt-4o")
print("   Temperature: 0 (minimizes randomness)")

## Step 4: Initialize SCOPE

Here's where the magic begins! SCOPE will observe each task completion and learn improvement patterns.

In [None]:
from scope import SCOPEOptimizer
from scope.models import create_openai_model

# Create SCOPE's model (for analyzing and learning)
scope_model = create_openai_model(
    model="gpt-4o",
    api_key=os.environ["OPENAI_API_KEY"]
)

# Initialize SCOPE optimizer
optimizer = SCOPEOptimizer(
    synthesizer_model=scope_model,
    exp_path="./scope_data",  # Where to save learned rules
    enable_quality_analysis=True,  # Analyze quality after each task
    quality_analysis_frequency=1,  # Check every task
    synthesis_mode="efficiency",  # Fast learning mode
    store_history=True  # Keep learning history
)

print("✅ SCOPE Optimizer initialized")
print("   📊 Quality analysis: Enabled")
print("   💾 Learning history: Stored")
print("   ⚡ Mode: Efficiency (fast learning)")

### What do these parameters mean?

- **enable_quality_analysis**: After each task, SCOPE analyzes whether the output could be better
- **quality_analysis_frequency**: How often to check (1 = every task)
- **synthesis_mode**: "efficiency" learns quickly, "comprehensive" is more thorough
- **store_history**: Keeps a record of all learning events
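
One way to experiment with these settings is to keep them in plain dictionaries and swap between them. The sketch below uses only the parameter names and values mentioned in this notebook; the dictionary names, the second `exp_path`, and the reading of a frequency of 3 as "every third task" are assumptions, so check the library's documentation before relying on them.

```python
# Settings used in this notebook, gathered into one place
FAST_CONFIG = {
    "exp_path": "./scope_data",
    "enable_quality_analysis": True,
    "quality_analysis_frequency": 1,   # analyze every task
    "synthesis_mode": "efficiency",
    "store_history": True,
}

# A slower, more thorough variant (values are illustrative assumptions)
THOROUGH_CONFIG = {
    **FAST_CONFIG,
    "exp_path": "./scope_data_thorough",   # hypothetical path to keep runs separate
    "quality_analysis_frequency": 3,       # assumed to mean every third task
    "synthesis_mode": "comprehensive",
}

for name, cfg in [("fast", FAST_CONFIG), ("thorough", THOROUGH_CONFIG)]:
    print(name, cfg["synthesis_mode"], cfg["quality_analysis_frequency"])
```

Either dictionary could then be unpacked into the constructor from the previous cell, for example `SCOPEOptimizer(synthesizer_model=scope_model, **THOROUGH_CONFIG)`.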

## Step 5: First Run - Observe Learning

Let's run through our tasks and watch SCOPE learn in real-time!

In [None]:
import asyncio

async def extract_with_scope(instruction, text, task_id):
    """Extract information and let SCOPE observe."""
    
    # Get current prompt (starts with base, evolves over time)
    strategic_rules = optimizer.get_strategic_rules_for_agent("info_extractor")
    current_prompt = BASE_EXTRACTION_PROMPT
    if strategic_rules:
        current_prompt += f"\n\n## Strategic Guidelines (Learned):\n{strategic_rules}"
    
    # Create messages
    messages = [
        SystemMessage(content=current_prompt),
        HumanMessage(content=f"{instruction}\n\nText: {text}")
    ]
    
    # Get response from LLM (async call, so the event loop is not blocked)
    response = await llm.ainvoke(messages)
    output = response.content
    
    # Let SCOPE observe and learn
    result = await optimizer.on_step_complete(
        agent_name="info_extractor",
        agent_role="Information Extraction Specialist",
        task=f"{instruction} | Text: {text}",
        model_output=output,
        observations=f"Extracted from: '{text[:50]}...'",
        error=None,
        current_system_prompt=current_prompt,
        task_id=task_id
    )
    
    return output, result

# Run the tasks
print("🚀 Starting extraction tasks...\n")
print("=" * 70)

learning_events = []

for i, task in enumerate(EXTRACTION_TASKS, 1):
    print(f"\n📝 Task {i}/{len(EXTRACTION_TASKS)}")
    print(f"Instruction: {task['instruction']}")
    print(f"Text: {task['text']}")
    
    # Run extraction
    output, learning_result = await extract_with_scope(
        task['instruction'],
        task['text'],
        f"task_{i}"
    )
    
    print(f"\n✓ Output: {output}")
    
    # Check if SCOPE learned something
    if learning_result:
        guideline, guideline_type = learning_result
        learning_events.append({"task": i, "type": guideline_type, "rule": guideline})
        print(f"\n📚 SCOPE LEARNED ({guideline_type.upper()}):")
        print(f"   {guideline[:120]}...")
    
    print("\n" + "-" * 70)

print(f"\n✅ Completed {len(EXTRACTION_TASKS)} tasks")
print(f"📚 SCOPE learning events: {len(learning_events)}")

## Step 6: Review What SCOPE Learned

Let's summarize the learning events and print the evolved prompt:

In [None]:
print("📊 Learning Summary\n")
print("=" * 70)

if learning_events:
    print(f"\nTotal learning events: {len(learning_events)}\n")
    
    for event in learning_events:
        print(f"Task {event['task']} - {event['type'].upper()}:")
        print(f"  {event['rule'][:100]}...")
        print()
else:
    print("No learning events recorded.")

# Get the complete evolved prompt
strategic_rules = optimizer.get_strategic_rules_for_agent("info_extractor")
evolved_prompt = BASE_EXTRACTION_PROMPT
if strategic_rules:
    evolved_prompt += f"\n\n## Strategic Guidelines (Learned):\n{strategic_rules}"

print("\n" + "=" * 70)
print("EVOLVED PROMPT (After Learning)")
print("=" * 70)
print(evolved_prompt)
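
The prompt-assembly logic appears both inside `extract_with_scope` and in the cell above, so it could be pulled into a small helper. This is a sketch: `compose_prompt` is a hypothetical name, and it assumes the rules arrive as a single string that may be empty or `None`, which is how the notebook treats them.

```python
from typing import Optional


def compose_prompt(base_prompt: str, strategic_rules: Optional[str]) -> str:
    """Append learned rules to the base prompt, or return the base unchanged."""
    if not strategic_rules:
        return base_prompt
    return f"{base_prompt}\n\n## Strategic Guidelines (Learned):\n{strategic_rules}"


base = "You are an information extraction specialist."
print(compose_prompt(base, ""))
print("---")
print(compose_prompt(base, "- Return 'Not found' for empty fields."))
```

Keeping the assembly in one function means the prompt sent to the model and the prompt reported to SCOPE cannot drift apart.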

## Step 7: Compare Before and After

Now let's run the same tasks again with the evolved prompt and compare:

In [None]:
print("🔄 Running tasks again with evolved prompt...\n")
print("=" * 70)

second_run_learning = []

for i, task in enumerate(EXTRACTION_TASKS, 1):
    print(f"\n📝 Task {i}/{len(EXTRACTION_TASKS)}")
    
    # Run extraction
    output, learning_result = await extract_with_scope(
        task['instruction'],
        task['text'],
        f"task_{i}_round2"
    )
    
    print(f"✓ Output: {output}")
    
    if learning_result:
        second_run_learning.append(learning_result)
        print("📚 New learning event")
    else:
        print("✓ No new learning event for this task")

print("\n" + "=" * 70)
print("COMPARISON")
print("=" * 70)
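print(f"\n1st Run: {len(learning_events)} learning events")
print(f"2nd Run: {len(second_run_learning)} learning events")

# Only claim an improvement when the second run actually produced fewer events
delta = len(learning_events) - len(second_run_learning)
if delta > 0:
    print(f"\nImprovement: {delta} fewer learning events needed!")
    print("\n💡 This suggests the evolved prompt is better optimized.")
else:
    print("\nNo reduction in learning events; SCOPE is still finding things to improve.")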

## Understanding the Results

What just happened?

### First Run (Baseline)
- Started with a basic prompt
- Completed all tasks
- SCOPE observed multiple opportunities for improvement
- Generated strategic rules

### Second Run (Optimized)
- Used the evolved prompt with learned rules
- Completed same tasks
- SCOPE found fewer (or no) issues
- Prompt is already better!

### Key Insight
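**Fewer learning events = better prompts!**

When SCOPE finds fewer things to improve on a repeat run, it suggests the evolved prompt is already handling those tasks well. Treat the count as a proxy rather than proof: to confirm the gain, compare the outputs of both runs against the values you expect.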
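
A more direct check is to score each run's outputs against expected values. The sketch below is an assumption-laden illustration: `contains_all`, `accuracy`, and the `EXPECTED` list are hypothetical, the expected values are one reasonable reading of the five tasks, and `sample_outputs` is made-up data standing in for real model output.

```python
from typing import List


def contains_all(output: str, expected: List[str]) -> bool:
    """True when every expected value appears in the output (case-insensitive)."""
    lowered = output.lower()
    return all(value.lower() in lowered for value in expected)


def accuracy(outputs: List[str], expected: List[List[str]]) -> float:
    """Fraction of tasks whose output contains all of its expected values."""
    hits = sum(contains_all(o, e) for o, e in zip(outputs, expected))
    return hits / len(expected)


# Hypothetical expected values for the five tasks, in order
EXPECTED = [
    ["john.doe@example.com"],
    ["Jane Smith", "28", "Boston"],
    ["Not found"],
    ["John", "NYC", "555-0123"],
    ["alice@test.com", "bob@example.org", "charlie@mail.net"],
]

# Made-up outputs, only to show the call; substitute the outputs from each run
sample_outputs = [
    "Email: john.doe@example.com",
    "Name: Jane Smith\nAge: 28\nCity: Boston",
    "Phone: not found",
    "Name: John, City: NYC, Phone: 555-0123",
    "alice@test.com, bob@example.org",
]
print(f"accuracy: {accuracy(sample_outputs, EXPECTED):.2f}")
```

Collecting the outputs of the first and second runs into two lists and scoring both would show whether the evolved prompt changes accuracy, not just the number of learning events.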

## Key Takeaways

1. **Automatic Optimization**: SCOPE improves prompts without manual engineering
2. **Observable Learning**: You can see what SCOPE learns in real-time
3. **Measurable Results**: Fewer learning events on a repeat run signal a better prompt
4. **Persistent Memory**: Rules are saved and reused across runs
5. **LangChain Integration**: Works seamlessly with existing LangChain code

## Try It Yourself!

Experiment with:
- Different extraction tasks
- More iterations (run 10-15 tasks)
- Other domains (classification, summarization, etc.)
- Different models

The more varied tasks SCOPE observes, the more it has to learn from!
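
As a starting point for another domain, a task list only needs the same two keys used by `EXTRACTION_TASKS`. The classification examples and the `to_user_message` helper below are hypothetical; the helper simply mirrors how this notebook formats the user message.

```python
# Hypothetical classification tasks, using the same keys as EXTRACTION_TASKS
CLASSIFICATION_TASKS = [
    {
        "instruction": "Classify the sentiment as positive, negative, or neutral",
        "text": "The checkout flow was quick and painless.",
    },
    {
        "instruction": "Classify the sentiment as positive, negative, or neutral",
        "text": "The package arrived two weeks late and the box was crushed.",
    },
    {
        "instruction": "Classify the ticket as billing, technical, or other",
        "text": "I was charged twice for my subscription this month.",
    },
]


def to_user_message(task: dict) -> str:
    """Format a task the same way the notebook builds its user message."""
    return f"{task['instruction']}\n\nText: {task['text']}"


for task in CLASSIFICATION_TASKS:
    print(to_user_message(task))
    print("-" * 40)
```

You would also want a base prompt and an agent name suited to the new domain, so that rules learned for extraction are not mixed with rules for classification.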

## Next Steps

Want to learn more? Check out:
- **Complex Scenarios**: See SCOPE optimize multi-agent research workflows
- **Iterative Learning**: Run the same task 10-20 times and track improvement
- **Custom Integration**: Add SCOPE to your own LangChain applications
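
For the iterative-learning idea, a small helper can turn per-round results into a quick text chart. This sketch assumes you collect each round's learning results in a list, as done with `learning_events` and `second_run_learning` above; `summarize_rounds` is a hypothetical name and the `rounds` data is placeholder input.

```python
from typing import List


def summarize_rounds(events_per_round: List[list]) -> List[int]:
    """Print one bar per round and return the number of events in each."""
    counts = [len(events) for events in events_per_round]
    for round_no, count in enumerate(counts, 1):
        print(f"Round {round_no:>2}: {'#' * count} ({count})")
    return counts


# Placeholder data shaped like the lists collected in this notebook
rounds = [["rule a", "rule b", "rule c"], ["rule d"], []]
counts = summarize_rounds(rounds)
```

A curve that flattens toward zero suggests SCOPE has little left to add for these tasks, while a curve that stays high may mean the tasks are too varied for the current rules.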