# DSPy Prompt Optimization with Azure AI Search
## Information Extraction and Summarization with MLflow 3

This notebook demonstrates:
1. Azure AI Search integration with DSPy
2. Structured information extraction
3. Google-like summarization
4. Prompt optimization with MIPROv2
5. MLflow 3 tracking and GenAI evaluation


In [None]:
# Install required packages
# Uncomment to install dependencies (the quotes stop the shell from treating ">" as a redirect)
# !uv pip install -U dspy-ai "mlflow>=3.1.0" azure-search-documents azure-identity openai pydantic


In [None]:
import os
import sys
import dspy
import mlflow
import pandas as pd
from typing import List, Dict, Any

# Add src to path for module imports
sys.path.insert(0, os.path.abspath('../src'))

from azure_search_dspy import AzureSearchRM, InformationExtractionAgent, evaluate_quality, evaluate_agent

# Set MLflow experiment
mlflow.set_experiment("/Users/your_user/dspy_azure_search_extraction")
print("✓ Imports complete")


## 1. Configuration

Set up your Azure AI Search and Azure OpenAI credentials.
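
Each setting in the cell below falls back to a placeholder such as `your-key`, so a misconfigured environment only surfaces later as an authentication error. A minimal sketch of a fail-fast check follows, assuming the same environment variable names. The helper `missing_settings` is hypothetical and not part of the `src/` package.

```python
import os

# Environment variables read by the configuration cell.
REQUIRED_VARS = [
    "AZURE_SEARCH_ENDPOINT",
    "AZURE_SEARCH_KEY",
    "AZURE_SEARCH_INDEX",
    "AZURE_OPENAI_ENDPOINT",
    "AZURE_OPENAI_KEY",
]


def missing_settings(env=None, required=REQUIRED_VARS):
    """Return the names of required variables that are unset or blank."""
    env = os.environ if env is None else env
    return [name for name in required if not str(env.get(name, "")).strip()]


missing = missing_settings()
if missing:
    print("Missing settings: " + ", ".join(missing))
```

Running this before the configuration cell makes it obvious which values would otherwise be replaced by placeholders.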


In [None]:
# Azure AI Search Configuration
AZURE_SEARCH_ENDPOINT = os.getenv("AZURE_SEARCH_ENDPOINT", "https://your-search-service.search.windows.net")
AZURE_SEARCH_KEY = os.getenv("AZURE_SEARCH_KEY", "your-key")
AZURE_SEARCH_INDEX = os.getenv("AZURE_SEARCH_INDEX", "your-index-name")

# Azure OpenAI Configuration (for LLM)
AZURE_OPENAI_ENDPOINT = os.getenv("AZURE_OPENAI_ENDPOINT", "https://your-openai.openai.azure.com/")
AZURE_OPENAI_KEY = os.getenv("AZURE_OPENAI_KEY", "your-key")
AZURE_OPENAI_DEPLOYMENT = os.getenv("AZURE_OPENAI_DEPLOYMENT", "gpt-4")
AZURE_OPENAI_API_VERSION = "2024-02-15-preview"

print(f"Azure Search Index: {AZURE_SEARCH_INDEX}")
print(f"Azure OpenAI Deployment: {AZURE_OPENAI_DEPLOYMENT}")


## 2. Initialize Azure Search Retriever and LLM
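
`AzureSearchRM` lives in `src/azure_search_dspy` and is not shown in this notebook. As a rough illustration of what such a retriever does, the sketch below wraps any client that exposes `search(search_text=..., top=...)`, which is the call shape of `SearchClient.search` in `azure-search-documents`. The class name `SimpleSearchRetriever` and the helper `make_client` are hypothetical, and semantic ranking is left out.

```python
from typing import Any, Dict, List


class SimpleSearchRetriever:
    """Hypothetical stand-in for AzureSearchRM; not the module's actual code."""

    def __init__(self, client: Any, k: int = 5,
                 content_field: str = "content", title_field: str = "title"):
        self.client = client
        self.k = k
        self.content_field = content_field
        self.title_field = title_field

    def __call__(self, query: str) -> List[Dict[str, str]]:
        # Each hit behaves like a dict keyed by index field name.
        hits = self.client.search(search_text=query, top=self.k)
        return [
            {
                "title": str(hit.get(self.title_field, "")),
                "text": str(hit.get(self.content_field, "")),
            }
            for hit in hits
        ]


def make_client(endpoint: str, key: str, index_name: str) -> Any:
    """Build a real SearchClient; imported lazily so the sketch runs without the SDK."""
    from azure.core.credentials import AzureKeyCredential
    from azure.search.documents import SearchClient

    return SearchClient(endpoint=endpoint, index_name=index_name,
                        credential=AzureKeyCredential(key))
```

Taking the client as a constructor argument keeps the retriever easy to unit-test with a fake client.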


In [None]:
# Initialize Azure Search retriever
azure_rm = AzureSearchRM(
    search_endpoint=AZURE_SEARCH_ENDPOINT,
    search_key=AZURE_SEARCH_KEY,
    index_name=AZURE_SEARCH_INDEX,
    k=5,
    content_field="content",  # Adjust to your index schema
    title_field="title",      # Adjust to your index schema
    use_semantic_search=True
)

# Configure the Azure OpenAI LLM for DSPy.
# dspy.LM routes through LiteLLM, which addresses Azure deployments as "azure/<deployment>".
lm = dspy.LM(
    f"azure/{AZURE_OPENAI_DEPLOYMENT}",
    api_base=AZURE_OPENAI_ENDPOINT,
    api_key=AZURE_OPENAI_KEY,
    api_version=AZURE_OPENAI_API_VERSION,
    model_type="chat",
    max_tokens=2000,
    temperature=0.1
)

dspy.configure(lm=lm)
print("✓ Azure Search retriever and LLM initialized")


## 3. Create Information Extraction Agent


In [None]:
# Initialize the information extraction agent
agent = InformationExtractionAgent(retriever=azure_rm)
print("✓ Agent created")


## 4. Test the Agent

Let's test the agent with a sample query before optimization.


In [None]:
test_query = "What are the key features of Azure AI Search?"

with mlflow.start_run(run_name="baseline_test") as run:
    result = agent(test_query)
    
    print("=" * 80)
    print(f"Query: {result.query}")
    print(f"Rewritten Query: {result.rewritten_query}")
    print("\n" + "=" * 80)
    print("SUMMARY:")
    print(result.summary)
    print("\n" + "=" * 80)
    print("KEY POINTS:")
    print(result.key_points)
    print("\n" + "=" * 80)
    print("ENTITIES:")
    print(result.entities_json[:500])  # Truncate for display
    print("\n" + "=" * 80)
    print("SOURCES:")
    for i, source in enumerate(result.sources, 1):
        print(f"{i}. {source}")
    
    # Log to MLflow
    mlflow.log_param("query", test_query)
    mlflow.log_text(result.summary, "summary.txt")
    mlflow.log_text(result.key_points, "key_points.txt")


## 5. Create Evaluation Dataset

Create a curated dataset for evaluation. In production, this should be a comprehensive golden dataset.
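
Each example carries `expected_topics` rather than a reference answer, so a metric has to score topical coverage. The real `evaluate_quality` is defined in `src/` and may work differently. The function below is only a sketch of one plausible approach, using the `(example, pred, trace=None)` metric signature that DSPy optimizers call. The name `topic_coverage` and the choice to search `summary` and `key_points` are assumptions.

```python
from types import SimpleNamespace


def topic_coverage(example, pred, trace=None):
    """Fraction of expected topics mentioned in the summary or key points."""
    topics = [t.lower() for t in getattr(example, "expected_topics", [])]
    if not topics:
        return 0.0
    text = " ".join(
        str(getattr(pred, field, "") or "") for field in ("summary", "key_points")
    ).lower()
    return sum(topic in text for topic in topics) / len(topics)


# Stand-ins for a dspy.Example and an agent prediction.
example = SimpleNamespace(expected_topics=["vector", "embeddings", "similarity"])
pred = SimpleNamespace(
    summary="Vector search ranks documents by embeddings.",
    key_points="- Uses nearest-neighbor lookup",
)
print(topic_coverage(example, pred))
```

Substring matching is crude: it misses paraphrases and rewards keyword stuffing, which is one reason a larger golden dataset with reference answers is preferable in production.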


In [None]:
# Create evaluation dataset
eval_dataset = [
    dspy.Example(
        query="What is Azure AI Search?",
        expected_topics=["search service", "cognitive search", "AI-powered"],
    ).with_inputs("query"),
    dspy.Example(
        query="How does semantic search work?",
        expected_topics=["semantic", "ranking", "understanding"],
    ).with_inputs("query"),
    dspy.Example(
        query="What are vector search capabilities?",
        expected_topics=["vector", "embeddings", "similarity"],
    ).with_inputs("query"),
    dspy.Example(
        query="How can I enrich my documents during indexing?",
        expected_topics=["enrichment", "skills", "cognitive"],
    ).with_inputs("query"),
    dspy.Example(
        query="What pricing models are available?",
        expected_topics=["pricing", "tiers", "cost"],
    ).with_inputs("query"),
]

# Split into train and test sets
train_size = int(len(eval_dataset) * 0.6)
trainset = eval_dataset[:train_size]
testset = eval_dataset[train_size:]

print(f"Training examples: {len(trainset)}")
print(f"Test examples: {len(testset)}")


## 6. Baseline Evaluation
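
`evaluate_agent` is imported from `src/`. The cells in this notebook rely on it returning a dictionary with `average_score`, `scores`, `predictions`, and `num_examples`. A minimal sketch of a loop with that return shape follows. The body is an assumption, including the explicit `metric` parameter and the choice to score a failed call as 0 and record `None` as its prediction.

```python
from types import SimpleNamespace


def evaluate_agent_sketch(agent, dataset, label, metric):
    """Run the agent over a dataset and collect per-example scores."""
    scores, predictions = [], []
    for example in dataset:
        try:
            pred = agent(example.query)
            score = float(metric(example, pred))
        except Exception as exc:  # keep going so one failure does not stop the run
            print(f"[{label}] failed on {example.query!r}: {exc}")
            pred, score = None, 0.0
        predictions.append(pred)
        scores.append(score)
    return {
        "average_score": sum(scores) / len(scores) if scores else 0.0,
        "scores": scores,
        "predictions": predictions,
        "num_examples": len(dataset),
    }


# Usage with a stub agent and a constant metric.
stub_agent = lambda q: SimpleNamespace(summary=q.upper())
data = [SimpleNamespace(query="a"), SimpleNamespace(query="b")]
print(evaluate_agent_sketch(stub_agent, data, "demo", lambda ex, p: 1.0)["average_score"])
```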


In [None]:
print("Running baseline evaluation...")
with mlflow.start_run(run_name="baseline_evaluation") as run:
    baseline_results = evaluate_agent(agent, testset, "baseline")
    
    print(f"\nBaseline Score: {baseline_results['average_score']:.2%}")
    print(f"Individual scores: {[f'{s:.2%}' for s in baseline_results['scores']]}")
    
    # Log to MLflow
    mlflow.log_metric("baseline_score", baseline_results['average_score'])
    mlflow.log_metric("num_test_examples", baseline_results['num_examples'])


## 7. DSPy Prompt Optimization with MIPROv2

Now let's optimize the prompts using DSPy's MIPROv2 optimizer. It proposes candidate instructions and few-shot demonstrations, then keeps the combination that scores best on our evaluation metric.
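
Conceptually, a prompt optimizer proposes several candidate instructions, scores each one against the training examples with the metric, and keeps the best. The toy loop below illustrates only that selection step. It is not how MIPROv2 is implemented: MIPROv2 also bootstraps few-shot demonstrations and searches over combinations across multiple trials. `pick_best_instruction` and `run_program` are hypothetical names.

```python
from typing import Callable, Sequence, Tuple


def pick_best_instruction(
    candidates: Sequence[str],
    trainset: Sequence[dict],
    run_program: Callable[[str, dict], str],
    metric: Callable[[dict, str], float],
) -> Tuple[str, float]:
    """Score every candidate instruction on the training set; return the best."""
    if not candidates or not trainset:
        raise ValueError("need at least one candidate and one training example")
    best, best_score = candidates[0], float("-inf")
    for instruction in candidates:
        scores = [
            metric(example, run_program(instruction, example)) for example in trainset
        ]
        average = sum(scores) / len(scores)
        if average > best_score:
            best, best_score = instruction, average
    return best, best_score
```

With only a handful of training examples, as in this notebook, the winning candidate can easily be an artifact of noise, so the held-out test score matters more than the training score.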


In [None]:
from dspy.teleprompt import MIPROv2

print("Starting prompt optimization with MIPROv2...")
print("This may take several minutes...\n")

with mlflow.start_run(run_name="mipro_optimization") as run:
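    # Initialize optimizer.
    # auto=None selects manual mode, in which num_candidates (here) and
    # num_trials (in compile) are set explicitly instead of by a preset.
    optimizer = MIPROv2(
        metric=evaluate_quality,
        prompt_model=lm,
        task_model=lm,
        auto=None,
        num_candidates=5,  # Number of prompt candidates to try
        init_temperature=0.2,
        num_threads=2
    )
    
    # Optimize the agent
    optimized_agent = optimizer.compile(
        student=agent,
        trainset=trainset,
        num_trials=10,    # Small trial budget; raise it for larger datasets
        minibatch=False,  # The training set is smaller than the default minibatch size
        requires_permission_to_run=False,
    )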
    
    mlflow.log_param("optimizer", "MIPROv2")
    mlflow.log_param("num_candidates", 5)
    mlflow.log_param("train_size", len(trainset))
    
    print("\n✓ Optimization complete!")


## 8. Evaluate Optimized Agent


In [None]:
print("Evaluating optimized agent...")
with mlflow.start_run(run_name="optimized_evaluation") as run:
    optimized_results = evaluate_agent(optimized_agent, testset, "optimized")
    
    print(f"\n{'='*80}")
    print("RESULTS COMPARISON")
    print(f"{'='*80}")
    print(f"Optimized Score: {optimized_results['average_score']:.2%}")
    print(f"Baseline Score:  {baseline_results['average_score']:.2%}")
    improvement = optimized_results['average_score'] - baseline_results['average_score']
    print(f"Improvement:     {improvement:+.2%}")
    print(f"{'='*80}")
    
    # Log to MLflow
    mlflow.log_metric("optimized_score", optimized_results['average_score'])
    mlflow.log_metric("improvement", improvement)
    mlflow.log_metric("num_test_examples", optimized_results['num_examples'])


## 9. MLflow GenAI Evaluation

Use MLflow 3's built-in GenAI evaluation metrics for additional insights.


In [None]:
def prepare_mlflow_eval_data(dataset, predictions):
    """Prepare data for MLflow evaluation"""
    eval_data = []
    for example, pred in zip(dataset, predictions):
        if pred is not None:
            eval_data.append({
                "request": example.query,
                "response": pred.summary,
                "retrieved_context": [{"content": pred.context[:1000]}],  # Truncate
                "key_points": pred.key_points,
                "entities": pred.entities_json[:500],  # Truncate
                "sources": str(pred.sources)
            })
    return pd.DataFrame(eval_data)

# Prepare evaluation data
baseline_eval_df = prepare_mlflow_eval_data(testset, baseline_results['predictions'])
optimized_eval_df = prepare_mlflow_eval_data(testset, optimized_results['predictions'])

print("Running MLflow GenAI evaluation...\n")

# The dataset has no reference answers, so only answer_relevance is computed:
# it judges each response against its question. answer_correctness needs a
# ground-truth column passed as `targets`, which this dataset does not provide.
# col_mapping tells the metric that its `inputs` argument is the "request" column.

# Evaluate baseline
print("Baseline GenAI Metrics:")
with mlflow.start_run(run_name="mlflow_baseline_genai_eval") as run:
    baseline_mlflow = mlflow.evaluate(
        data=baseline_eval_df,
        model_type="question-answering",
        predictions="response",
        extra_metrics=[mlflow.metrics.genai.answer_relevance()],
        evaluator_config={"col_mapping": {"inputs": "request"}},
    )
    print(baseline_mlflow.metrics)

# Evaluate optimized
print("\nOptimized GenAI Metrics:")
with mlflow.start_run(run_name="mlflow_optimized_genai_eval") as run:
    optimized_mlflow = mlflow.evaluate(
        data=optimized_eval_df,
        model_type="question-answering",
        predictions="response",
        extra_metrics=[mlflow.metrics.genai.answer_relevance()],
        evaluator_config={"col_mapping": {"inputs": "request"}},
    )
    print(optimized_mlflow.metrics)


## 10. Save Optimized Agent


In [None]:
with mlflow.start_run(run_name="save_optimized_model") as run:
    # Save agent state
    optimized_agent.save("optimized_extraction_agent.json")
    
    # Log as artifact
    mlflow.log_artifact("optimized_extraction_agent.json")
    
    # Log configuration
    config = {
        "model": "information_extraction_agent",
        "optimizer": "MIPROv2",
        "baseline_score": baseline_results['average_score'],
        "optimized_score": optimized_results['average_score'],
        "improvement": improvement,
        "azure_search_index": AZURE_SEARCH_INDEX,
        "llm_model": AZURE_OPENAI_DEPLOYMENT
    }
    
    mlflow.log_dict(config, "model_config.json")
    
    print("\n✓ Optimized agent saved!")
    print(f"MLflow Run ID: {run.info.run_id}")
    print(f"Artifact URI: {run.info.artifact_uri}")


## 11. Interactive Testing

Use the optimized agent interactively with your own queries.


In [None]:
import json

def interactive_query(query_text: str, use_optimized: bool = True):
    """Run an interactive query and display results"""
    selected_agent = optimized_agent if use_optimized else agent
    agent_name = "Optimized" if use_optimized else "Baseline"
    
    print(f"\n{'='*80}")
    print(f"Using {agent_name} Agent")
    print(f"{'='*80}")
    print(f"Query: {query_text}\n")
    
    result = selected_agent(query_text)
    
    print("📝 SUMMARY:")
    print(result.summary)
    
    print("\n🔑 KEY POINTS:")
    print(result.key_points)
    
    print("\n🏷️ EXTRACTED ENTITIES:")
    try:
        entities = json.loads(result.entities_json)
        print(json.dumps(entities, indent=2)[:500])  # Truncate
    except (json.JSONDecodeError, TypeError):  # not valid JSON; show the raw string
        print(result.entities_json[:500])
    
    print("\n📚 SOURCES:")
    for i, source in enumerate(result.sources, 1):
        print(f"  {i}. {source}")
    
    return result

# Try it with your own query!
# result = interactive_query("What are the security features of Azure AI Search?", use_optimized=True)


## Summary

This notebook demonstrated:

1. ✅ **Azure AI Search Integration**: Custom DSPy retrieval module for Azure AI Search with semantic search
2. ✅ **Structured Information Extraction**: Extracting entities, summaries, and key points using DSPy signatures
3. ✅ **Google-like Summarization**: Concise, informative summaries with sources
4. ✅ **DSPy Prompt Optimization**: Using MIPROv2 to automatically optimize prompts
5. ✅ **MLflow 3 Integration**: Experiment tracking and GenAI evaluation metrics

### Next Steps:

- **Expand Dataset**: Add more diverse queries to your evaluation dataset
- **Custom Metrics**: Fine-tune evaluation metrics for your specific domain
- **Deploy**: Package the optimized agent for production deployment
- **Continuous Improvement**: Monitor production queries and retrain periodically
- **Advanced Features**: Add caching, async processing, and batch inference
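
For the caching item above, one low-effort option is to memoize answers by normalized query text. The sketch below keeps a small least-recently-used cache in an `OrderedDict` from the standard library. `make_cached_agent` is a hypothetical helper, and the approach assumes that returning an earlier result for a repeated query is acceptable even if the index has changed since.

```python
from collections import OrderedDict
from typing import Any, Callable


def make_cached_agent(agent: Callable[[str], Any], maxsize: int = 256):
    """Wrap an agent so repeated queries reuse the earlier result."""
    cache: "OrderedDict[str, Any]" = OrderedDict()

    def ask(query: str) -> Any:
        # Collapse whitespace and case so trivially different queries share an entry.
        key = " ".join(query.lower().split())
        if key in cache:
            cache.move_to_end(key)  # mark as most recently used
            return cache[key]
        result = agent(query)
        cache[key] = result
        if len(cache) > maxsize:
            cache.popitem(last=False)  # evict the least recently used entry
        return result

    return ask
```

Wrapping the agent rather than the retriever caches the full result, including the LLM calls, which is where most of the latency and cost sit.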

### Key Benefits of this Approach:

- **Modular Design**: Clean separation of concerns with reusable components in `src/`
- **Testable**: Easy to write unit tests for each component
- **Optimizable**: Automatic prompt improvement with DSPy optimizers
- **Observable**: Full MLflow tracking for reproducibility
- **Scalable**: Ready for Databricks deployment with minimal changes
