# Introduction to Azure AI Evaluation
    
This notebook provides a hands-on introduction to evaluating AI models and agents using Azure AI Evaluation.

## Prerequisites
- Azure subscription with access to Azure AI Foundry
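- Python environment with the packages imported below installed, including `azure-identity` and `azure-ai-evaluation`
- `AZURE_SUBSCRIPTION_ID` and `AZURE_RESOURCE_GROUP` environment variables set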
- Basic understanding of AI models and agents

## Learning Objectives
- Understand Azure AI Evaluation capabilities
- Learn to set up evaluation metrics
- Practice basic evaluation scenarios
- Analyze evaluation results


In [None]:
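# Import required libraries
import os
from azure.identity import DefaultAzureCredential
# NOTE: the import paths and constructor arguments for the two clients below
# differ between Azure SDK releases; confirm them against your installed packages.
from azure.ai.resources import AIProjectClient
from azure.ai.evaluation import EvaluationClient

# Read the subscription and resource group, failing early with a KeyError if unset
subscription_id = os.environ["AZURE_SUBSCRIPTION_ID"]
resource_group = os.environ["AZURE_RESOURCE_GROUP"]

# Initialize credentials and clients
credential = DefaultAzureCredential()
project_client = AIProjectClient(
    subscription_id=subscription_id,
    resource_group=resource_group,
    credential=credential,
)

evaluation_client = EvaluationClient(credential=credential)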


## Basic Evaluation Setup

We start with a simple scenario: evaluating a customer service agent on three weighted metrics (relevance, exact-match accuracy, and latency) against a small set of test cases.

In [None]:
# Define evaluation metrics
evaluation_metrics = {
    "response_relevance": {
        "type": "relevance",
        "weight": 0.4
    },
    "response_accuracy": {
        "type": "exact_match",
        "weight": 0.3
    },
    "response_time": {
        "type": "latency",
        "weight": 0.3
    }
}

# Create test cases
test_cases = [
    {
        "input": "How do I reset my password?",
        "expected_output": "To reset your password, click the 'Forgot Password' link, enter your email, and follow the instructions sent to your inbox."
    },
    {
        "input": "What are the product features?",
        "expected_output": "Our product includes cloud storage, synchronization capabilities, sharing features, and administrative controls."
    }
]
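
The weights in the metrics configuration suggest that the overall score is a weighted average of the per-metric scores. The notebook does not show how the service combines them, so the sketch below is only an assumption about that calculation. The function name `weighted_overall_score` and the example scores are hypothetical.

```python
def weighted_overall_score(metric_scores, metrics_config):
    """Combine per-metric scores (each assumed to lie in [0, 1]) into a weighted average."""
    total_weight = sum(cfg["weight"] for cfg in metrics_config.values())
    if total_weight <= 0:
        raise ValueError("Metric weights must sum to a positive number")
    weighted_sum = sum(
        metric_scores[name] * cfg["weight"] for name, cfg in metrics_config.items()
    )
    # Dividing by the total keeps the result in [0, 1] even if the weights do not sum to 1
    return weighted_sum / total_weight


# Same shape as the metrics configuration used in this notebook
example_config = {
    "response_relevance": {"type": "relevance", "weight": 0.4},
    "response_accuracy": {"type": "exact_match", "weight": 0.3},
    "response_time": {"type": "latency", "weight": 0.3},
}

# Hypothetical per-metric scores, for illustration only
example_scores = {
    "response_relevance": 0.9,
    "response_accuracy": 0.5,
    "response_time": 1.0,
}

overall = weighted_overall_score(example_scores, example_config)
print(f"Overall score: {overall:.2f}")
```

With these made-up scores the weighted average is 0.4 × 0.9 + 0.3 × 0.5 + 0.3 × 1.0 = 0.81.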

## Running Evaluations

The cell below creates an evaluation from the metrics and test cases defined above, runs it, and prints the overall score followed by each metric's score.

In [None]:
async def run_evaluation():
    try:
        # Create evaluation run
        evaluation = await evaluation_client.create_evaluation(
            name="customer-service-basic-eval",
            metrics=evaluation_metrics,
            test_cases=test_cases
        )
        
        # Run evaluation
        results = await evaluation.run()
        
        # Print results
        print("Evaluation Results:")
        print(f"Overall Score: {results.overall_score}")
        print("\nMetric Scores:")
        for metric, score in results.metric_scores.items():
            print(f"{metric}: {score}")
            
        return results
    except Exception as e:
        print(f"Evaluation error: {str(e)}")
        return None

# Run the evaluation (top-level await works in notebooks) and keep the results
results = await run_evaluation()
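
Before calling the service, it can help to dry-run the test cases locally. The sketch below is not part of Azure AI Evaluation. It assumes the agent is any Python callable that maps a question to an answer, and it scores only exact match (after normalizing case and whitespace) and wall-clock latency. `evaluate_locally`, `normalize`, and `echo_agent` are hypothetical names, and the sample cases are shortened for illustration.

```python
import time


def normalize(text):
    """Lowercase and collapse whitespace so trivial differences are not counted as mismatches."""
    return " ".join(text.lower().split())


def evaluate_locally(agent, cases):
    """Run each test case through `agent`, recording exact match and latency in milliseconds."""
    rows = []
    for case in cases:
        start = time.perf_counter()
        actual = agent(case["input"])
        latency_ms = (time.perf_counter() - start) * 1000
        rows.append({
            "input": case["input"],
            "exact_match": normalize(actual) == normalize(case["expected_output"]),
            "latency_ms": latency_ms,
        })
    return rows


def echo_agent(question):
    """Hypothetical stand-in for a real agent: answers from a small canned table."""
    canned = {"How do I reset my password?": "Click the 'Forgot Password' link."}
    return canned.get(question, "I don't know.")


sample_cases = [
    {"input": "How do I reset my password?",
     "expected_output": "click the  'forgot password' link."},
    {"input": "What are the product features?",
     "expected_output": "Cloud storage and sharing."},
]

local_rows = evaluate_locally(echo_agent, sample_cases)
for row in local_rows:
    print(f"{row['input']} | exact_match={row['exact_match']} | {row['latency_ms']:.3f}ms")
```

Exact match is a strict metric: a correct answer with different wording scores zero, which is why the configuration above pairs it with a relevance metric.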

## Analyzing Results

The helper below counts a test as successful when its score exceeds 0.8, then reports the success rate and the average response time.

In [None]:
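def analyze_results(results, success_threshold=0.8):
    '''Summarize per-test scores and response times from an evaluation run.'''
    if not results:
        print("No results to analyze")
        return

    test_scores = list(results.test_scores)
    response_times = list(results.response_times)
    if not test_scores or not response_times:
        # Guard against division by zero on an empty run
        print("Results contain no test scores or response times")
        return

    # Calculate performance metrics from the results themselves
    performance_summary = {
        "total_tests": len(test_scores),
        "successful_tests": sum(1 for score in test_scores if score > success_threshold),
        "average_response_time": sum(response_times) / len(response_times)
    }
    success_rate = performance_summary["successful_tests"] / performance_summary["total_tests"] * 100

    # Print analysis (response times are assumed to be in milliseconds)
    print("Performance Summary:")
    print(f"Total Tests: {performance_summary['total_tests']}")
    print(f"Successful Tests: {performance_summary['successful_tests']}")
    print(f"Success Rate: {success_rate:.2f}%")
    print(f"Average Response Time: {performance_summary['average_response_time']:.2f}ms")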

# Analyze the results (note: this call starts a new evaluation run)
analyze_results(await run_evaluation())
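
An average response time can hide slow outliers. As a rough sketch, the standard-library `statistics` module can report the median and an approximate 95th percentile alongside the mean. `summarize_latencies` is a hypothetical helper, and the sample latencies are made up for illustration.

```python
import statistics


def summarize_latencies(latencies_ms):
    """Return mean, median, approximate p95, and max for a list of latencies in milliseconds."""
    if not latencies_ms:
        return None
    summary = {
        "mean": statistics.fmean(latencies_ms),
        "median": statistics.median(latencies_ms),
        "max": max(latencies_ms),
    }
    # statistics.quantiles needs at least two data points on older Python versions
    if len(latencies_ms) >= 2:
        # n=20 yields 19 cut points; the last one is the 95th percentile
        summary["p95"] = statistics.quantiles(latencies_ms, n=20, method="inclusive")[-1]
    return summary


# Hypothetical latencies: three fast responses and one slow outlier
sample_latencies = [120.0, 135.0, 150.0, 900.0]
latency_summary = summarize_latencies(sample_latencies)
print(f"mean={latency_summary['mean']:.2f}ms median={latency_summary['median']:.2f}ms")
```

Here the single 900 ms response pulls the mean to 326.25 ms while the median stays at 142.50 ms, so the median is the better description of a typical request.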

## Error Handling and Best Practices

Important considerations when working with Azure AI Evaluation:

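1. Validate your metrics configuration before every run (for example, check that the weights sum to 1)
2. Size the test set to the decision it supports: a few cases suffice for a smoke test, but comparing models needs many more
3. Monitor how long evaluations take and how often they fail
4. Handle timeouts and errors gracefully so one failed call does not abort the whole run
5. Store and version your evaluation results so that runs can be compared later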

In [None]:
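def validate_evaluation_config(metrics, test_cases):
    '''Validate evaluation configuration.'''
    # Raise ValueError rather than assert: assertions are skipped under `python -O`
    try:
        # Check every metric has a weight and the weights sum to 1
        for name, metric in metrics.items():
            if "weight" not in metric:
                raise ValueError(f"Metric '{name}' missing weight")
        total_weight = sum(metric["weight"] for metric in metrics.values())
        if abs(total_weight - 1.0) >= 0.001:
            raise ValueError(f"Metric weights must sum to 1, got {total_weight}")

        # Validate test cases
        if not test_cases:
            raise ValueError("At least one test case is required")
        for index, test_case in enumerate(test_cases):
            if "input" not in test_case:
                raise ValueError(f"Test case {index} missing input")
            if "expected_output" not in test_case:
                raise ValueError(f"Test case {index} missing expected output")

        print("✓ Evaluation configuration is valid")
        return True
    except ValueError as e:
        print(f"Configuration error: {e}")
        return False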

# Validate our configuration
validate_evaluation_config(evaluation_metrics, test_cases)
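
Handling timeouts gracefully (item 4 in the list above) can be sketched with `asyncio.wait_for` and a small retry loop. This is a generic pattern rather than anything specific to Azure AI Evaluation. `run_with_timeout` and its parameters are hypothetical names, and the default timeout and backoff values are arbitrary.

```python
import asyncio


async def run_with_timeout(make_coro, timeout_s=30.0, retries=2, backoff_s=1.0):
    """Await make_coro() with a deadline, retrying with exponential backoff on timeout.

    `make_coro` is a zero-argument callable that returns a fresh coroutine,
    because a coroutine object cannot be awaited twice.
    """
    for attempt in range(retries + 1):
        try:
            return await asyncio.wait_for(make_coro(), timeout=timeout_s)
        except asyncio.TimeoutError:
            if attempt == retries:
                # Out of retries: let the caller decide what to do
                raise
            # Wait 1x, 2x, 4x, ... the base backoff before trying again
            await asyncio.sleep(backoff_s * (2 ** attempt))
```

In this notebook the call might look like `results = await run_with_timeout(run_evaluation, timeout_s=60)`. Retrying is only safe when repeating the call has no unwanted side effects, such as creating duplicate evaluation runs.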

## Next Steps

Now that you understand the basics of Azure AI Evaluation, you can:
1. Create more comprehensive evaluation metrics
2. Design larger test case sets
3. Implement continuous evaluation
4. Set up automated evaluation pipelines
5. Track evaluation results over time
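
As one possible starting point for tracking results over time, the sketch below appends each run's overall score to a JSON Lines file and flags a regression when the latest score drops noticeably below the previous one. The file layout, the helper names, the 0.05 tolerance, and the two example scores are all assumptions for illustration.

```python
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def append_run(history_path, run_name, overall_score):
    """Append one evaluation run to a JSON Lines history file."""
    record = {
        "run_name": run_name,
        "overall_score": overall_score,
        "recorded_at": datetime.now(timezone.utc).isoformat(),
    }
    with Path(history_path).open("a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")
    return record


def load_history(history_path):
    """Read all recorded runs, oldest first; returns an empty list if the file is missing."""
    path = Path(history_path)
    if not path.exists():
        return []
    with path.open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


def has_regressed(history, tolerance=0.05):
    """True when the latest score is more than `tolerance` below the previous one."""
    if len(history) < 2:
        return False
    return history[-1]["overall_score"] < history[-2]["overall_score"] - tolerance


# Demo with hypothetical scores, written to a temporary directory
with tempfile.TemporaryDirectory() as tmp:
    history_file = Path(tmp) / "eval_history.jsonl"
    append_run(history_file, "customer-service-basic-eval", 0.86)
    append_run(history_file, "customer-service-basic-eval", 0.78)
    history = load_history(history_file)
    regressed = has_regressed(history)

print(f"Runs recorded: {len(history)}, regression detected: {regressed}")
```

In an automated pipeline, a check like `has_regressed` could fail the build so that a drop in quality is caught before release.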