## 📊 Available Evaluators in Azure AI Foundry

Azure AI Foundry provides a comprehensive set of built-in evaluators for different aspects of AI model quality:

### **AI Quality (AI Assisted)**
- **Groundedness** - Measures how well responses are grounded in provided context
- **Relevance** - Evaluates how relevant responses are to the input query  
- **Coherence** - Assesses logical flow and consistency in responses
- **Fluency** - Measures language quality and readability
- **GPT Similarity** - Compares responses to reference answers

### **AI Quality (NLP Metrics)**
- **F1 Score** - Measures precision and recall balance
- **ROUGE Score** - Evaluates text summarization quality
- **BLEU Score** - Measures translation and generation quality
- **GLEU Score** - Google-BLEU, a BLEU variant designed for sentence-level scoring
- **METEOR Score** - Considers synonyms and stemming
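
To make the F1 Score entry concrete, here is a minimal sketch of token-overlap F1 in plain Python. It assumes whitespace tokenization and lowercasing only. The built-in evaluator may normalize text differently, so treat `token_f1` as a hypothetical illustration of the metric rather than the SDK's implementation.

```python
from collections import Counter


def token_f1(response: str, ground_truth: str) -> float:
    """Token-overlap F1 between a response and a reference answer."""
    pred_tokens = response.lower().split()
    ref_tokens = ground_truth.lower().split()
    if not pred_tokens or not ref_tokens:
        return 0.0

    # Count tokens shared by both texts, respecting multiplicity.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0

    precision = overlap / len(pred_tokens)  # share of response tokens that match
    recall = overlap / len(ref_tokens)      # share of reference tokens recovered
    return 2 * precision * recall / (precision + recall)


print(token_f1("Paris is the capital", "Paris"))  # 0.4
```

A long response that contains the right answer still scores low here because precision drops, which is why F1 is usually read alongside an AI-assisted metric such as Relevance.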

### **Risk and Safety**
- **Violence** - Detects violent content
- **Sexual** - Identifies sexual content
- **Self-harm** - Detects self-harm related content
- **Hate/Unfairness** - Identifies hateful or unfair content
- **Protected Material** - Detects copyrighted content
- **Indirect Attack** - Identifies indirect prompt injection attempts

📚 **For complete details on all available evaluators, their parameters, and usage examples, visit:**  
**[Azure AI Foundry Evaluators Documentation](https://learn.microsoft.com/en-us/azure/ai-foundry/concepts/observability)**

---

# 🏋️‍♀️ Azure AI Foundry Evaluations - Clean Version 🏋️‍♂️

This notebook demonstrates how to evaluate AI models using Azure AI Foundry with both **local** and **cloud** evaluations.

## What This Notebook Does:
1. **Setup & Data Creation** - Creates synthetic health & fitness Q&A data
2. **Local Evaluation** - Runs F1Score and Relevance evaluators locally  
3. **Cloud Evaluation** - Uploads the dataset and submits an evaluation run to an Azure AI Foundry project

## Key Features:
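- ✅ **Local Evaluations** - F1Score and AI-assisted Relevance evaluators
- ✅ **Cloud Integration** - Uploads the dataset and submits evaluation runs to Azure AI Foundry
- ✅ **Azure Authentication** - Uses DefaultAzureCredential (for example, an `az login` session)
- ✅ **Error Handling** - Falls back to non-AI evaluators when Azure OpenAI is not configured, and prints troubleshooting hints for common failures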

In [None]:
# Setup and Data Creation
import json
import os
import time
from pathlib import Path
from dotenv import load_dotenv

# Create synthetic health & fitness evaluation data
synthetic_eval_data = [
    {
        "query": "How can I start a beginner workout routine at home?",
        "context": "Workout routines can include push-ups, bodyweight squats, lunges, and planks.",
        "response": "You can just go for 10 push-ups total.",
        "ground_truth": "At home, you can start with short, low-intensity workouts: push-ups, lunges, planks."
    },
    {
        "query": "Are diet sodas healthy for daily consumption?",
        "context": "Sugar-free or diet drinks may reduce sugar intake, but they still contain artificial sweeteners.",
        "response": "Yes, diet sodas are 100% healthy.",
        "ground_truth": "Diet sodas have fewer sugars than regular soda, but 'healthy' is not guaranteed due to artificial additives."
    },
    {
        "query": "What's the capital of France?",
        "context": "France is in Europe. Paris is the capital.",
        "response": "London.",
        "ground_truth": "Paris."
    }
]

# Write data to JSONL file
eval_data_path = Path("./health_fitness_eval_data.jsonl")
with eval_data_path.open("w", encoding="utf-8") as f:
    for row in synthetic_eval_data:
        f.write(json.dumps(row) + "\n")

print(f"✅ Evaluation data created: {eval_data_path.resolve()}")
print(f"📊 Total samples: {len(synthetic_eval_data)}")

# Load environment variables
current_dir = Path(os.getcwd())
root_dir = current_dir.parent.parent if current_dir.name == "3-quality_attributes" else current_dir.parent.parent.parent
load_dotenv(root_dir / '.env')
print("✅ Environment variables loaded")
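
Before running any evaluator, it can help to confirm that every JSONL row carries the columns the evaluators expect. The sketch below is one way to do that. `find_invalid_rows` and `REQUIRED_COLUMNS` are hypothetical names introduced for illustration, and the required set simply mirrors the four keys used in the synthetic data above.

```python
import json
from typing import Iterable

# Columns used by the synthetic dataset in this notebook.
REQUIRED_COLUMNS = frozenset({"query", "context", "response", "ground_truth"})


def find_invalid_rows(
    lines: Iterable[str], required: frozenset = REQUIRED_COLUMNS
) -> list[tuple[int, list[str]]]:
    """Return (line_number, missing_columns) for each row lacking a required column."""
    problems = []
    for line_number, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # tolerate blank lines
        row = json.loads(line)
        missing = sorted(required - set(row))
        if missing:
            problems.append((line_number, missing))
    return problems


sample_lines = [
    '{"query": "q", "context": "c", "response": "r", "ground_truth": "g"}',
    '{"query": "q", "response": "r"}',
]
print(find_invalid_rows(sample_lines))
# [(2, ['context', 'ground_truth'])]
```

With the file written above, the same check could be run as `find_invalid_rows(eval_data_path.open(encoding="utf-8"))`.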

## 🔍 Local Evaluation

Run evaluations locally using the F1Score evaluator (token overlap between response and ground truth) and, when Azure OpenAI is configured, the AI-assisted Relevance evaluator.
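
The `column_mapping` entries in the next cell bind evaluator arguments to dataset columns through `${data.<column>}` placeholders. The following is a minimal sketch of that idea in plain Python. `resolve_column_mapping` is a hypothetical helper written for illustration and is not how the SDK resolves mappings internally.

```python
import re

# Matches placeholders of the form ${data.<column>} and captures the column name.
_PLACEHOLDER = re.compile(r"^\$\{data\.([^}]+)\}$")


def resolve_column_mapping(row: dict, column_mapping: dict) -> dict:
    """Build evaluator keyword arguments from one data row."""
    inputs = {}
    for argument, placeholder in column_mapping.items():
        match = _PLACEHOLDER.match(placeholder)
        if match is None:
            raise ValueError(f"Unsupported mapping: {placeholder!r}")
        inputs[argument] = row[match.group(1)]
    return inputs


row = {
    "query": "What's the capital of France?",
    "response": "London.",
    "ground_truth": "Paris.",
}
mapping = {"response": "${data.response}", "ground_truth": "${data.ground_truth}"}
print(resolve_column_mapping(row, mapping))
# {'response': 'London.', 'ground_truth': 'Paris.'}
```

The keys of the mapping are the evaluator's argument names, and the values say which dataset column feeds each argument.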

In [None]:
# Local Evaluation with Azure AI Foundry
from azure.ai.evaluation import evaluate, F1ScoreEvaluator, RelevanceEvaluator
import logging

# Reduce logging noise
logging.getLogger('promptflow').setLevel(logging.ERROR)
logging.getLogger('azure.ai.evaluation').setLevel(logging.WARNING)

print("🔍 Running Local Evaluation...")

# Configure evaluators
evaluators = {
    "f1_score": F1ScoreEvaluator()
}

evaluator_config = {
    "f1_score": {
        "column_mapping": {
            "response": "${data.response}",
            "ground_truth": "${data.ground_truth}"
        }
    }
}

# Add AI-assisted evaluator if Azure OpenAI is configured
model_config = {
    "azure_endpoint": os.environ.get("AZURE_OPENAI_ENDPOINT", ""),
    "api_key": os.environ.get("AZURE_OPENAI_API_KEY", ""),
    "azure_deployment": os.environ.get("AZURE_OPENAI_DEPLOYMENT", "gpt-4"),
    "api_version": os.environ.get("AOAI_API_VERSION", "2024-02-15-preview"),
}

if model_config["azure_endpoint"] and model_config["api_key"]:
    print("🤖 Adding AI-assisted Relevance evaluator...")
    evaluators["relevance"] = RelevanceEvaluator(model_config=model_config)
    evaluator_config["relevance"] = {
        "column_mapping": {
            "query": "${data.query}",
            "response": "${data.response}"
        }
    }
else:
    print("⚠️ Azure OpenAI not configured - using F1Score only")

# Run local evaluation
try:
    local_result = evaluate(
        data=str(eval_data_path),
        evaluators=evaluators,
        evaluator_config=evaluator_config
    )
    
    print("✅ Local evaluation completed!")
    
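    # Display results (format numeric metrics, print anything else as-is)
    metrics = local_result['metrics']
    for metric_name, value in metrics.items():
        if isinstance(value, (int, float)):
            print(f"📊 {metric_name}: {value:.4f}")
        else:
            print(f"📊 {metric_name}: {value}")
        
    # Save results locally; default=str guards against values json cannot serialize
    with open("local_evaluation_results.json", "w", encoding="utf-8") as f:
        json.dump(local_result, f, indent=2, default=str)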
    
    print("💾 Results saved to: local_evaluation_results.json")
    
except Exception as e:
    print(f"❌ Local evaluation failed: {e}")
    local_result = None

## ☁️ Cloud Evaluation

Upload the evaluation dataset to an Azure AI Foundry project and submit a cloud evaluation run, so results can be tracked and shared in the portal.
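
The cloud cell rewrites an Azure OpenAI endpoint (`https://<account>.openai.azure.com`) into the Azure AI services form (`https://<account>.services.ai.azure.com`) by splitting strings. A slightly more defensive sketch using `urllib.parse` might look like this. `to_ai_services_endpoint` is a hypothetical helper name, and it assumes the account name is the leading part of the hostname.

```python
from urllib.parse import urlparse

OPENAI_SUFFIX = ".openai.azure.com"


def to_ai_services_endpoint(endpoint: str) -> str:
    """Convert an Azure OpenAI endpoint to the Azure AI services format.

    Endpoints that do not use the Azure OpenAI hostname are returned unchanged.
    """
    host = urlparse(endpoint).hostname or ""
    if not host.endswith(OPENAI_SUFFIX):
        return endpoint
    account_name = host[: -len(OPENAI_SUFFIX)]
    return f"https://{account_name}.services.ai.azure.com"


print(to_ai_services_endpoint("https://my-account.openai.azure.com/"))
# https://my-account.services.ai.azure.com
```

Parsing the hostname avoids surprises when the endpoint carries a path or a trailing slash.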

In [None]:
# Cloud Evaluation - Fixed using Official Microsoft Documentation
from azure.identity import DefaultAzureCredential
import os
import json
import time

print("☁️ Setting up Cloud Evaluation following official documentation...")

# Step 1: Install and import required packages
try:
    from azure.ai.projects import AIProjectClient
    from azure.ai.projects.models import (
        EvaluatorConfiguration,
        EvaluatorIds,
        Evaluation,
        InputDataset
    )
    print("✅ Azure AI Projects SDK found")
except ImportError:
    print("❌ Installing azure-ai-projects...")
    import subprocess
    import sys
    subprocess.check_call([sys.executable, "-m", "pip", "install", "azure-ai-projects>=1.0.0b4"])
    
    from azure.ai.projects import AIProjectClient
    from azure.ai.projects.models import (
        EvaluatorConfiguration,
        EvaluatorIds,
        Evaluation,
        InputDataset
    )
    print("✅ Packages installed successfully")

# Step 2: Configuration using official environment variable names
PROJECT_ENDPOINT = os.environ.get("AZURE_AI_PROJECT_ENDPOINT")
MODEL_ENDPOINT = os.environ.get("AZURE_OPENAI_ENDPOINT") 
MODEL_API_KEY = os.environ.get("AZURE_OPENAI_API_KEY")
MODEL_DEPLOYMENT_NAME = os.environ.get("AZURE_OPENAI_DEPLOYMENT")

# For cloud evaluation, convert Azure OpenAI endpoint to Azure AI services format
# Documentation requires: https://<account>.services.ai.azure.com format
if MODEL_ENDPOINT and "openai.azure.com" in MODEL_ENDPOINT:
    # Extract account name from https://account-name.openai.azure.com/
    account_name = MODEL_ENDPOINT.split("://")[1].split(".openai.azure.com")[0]
    MODEL_ENDPOINT = f"https://{account_name}.services.ai.azure.com"
    print("🔄 Converted endpoint to Azure AI services format")

print(f"🏢 Project Endpoint: {PROJECT_ENDPOINT}")
print(f"🤖 Model Deployment: {MODEL_DEPLOYMENT_NAME}")
print(f"🔗 Model Endpoint: {MODEL_ENDPOINT}")

if not PROJECT_ENDPOINT:
    print("⚠️ Missing AZURE_AI_PROJECT_ENDPOINT in .env file")
    cloud_result = None
else:
    try:
        # Step 3: Authentication using DefaultAzureCredential
        print("🔐 Setting up authentication...")
        credential = DefaultAzureCredential()
        
        # Step 4: Create AI Project Client
        print("🏭 Creating AIProjectClient...")
        project_client = AIProjectClient(
            endpoint=PROJECT_ENDPOINT,
            credential=credential,
        )
        print("✅ AIProjectClient created successfully!")
        
        # Step 5: Upload dataset
        print("\n📤 Uploading evaluation data...")
        dataset_name = f"health-fitness-eval-{int(time.time())}"
        dataset_version = "1.0"
        
        upload_response = project_client.datasets.upload_file(
            name=dataset_name,
            version=dataset_version,
            file_path=str(eval_data_path),
        )
        data_id = upload_response.id
        print(f"✅ Dataset uploaded with ID: {data_id}")
        
        # Step 6: Configure evaluators using correct format from documentation
        print("\n⚙️ Configuring evaluators...")
        evaluators = {}
        
        # BLEU Score - Works without AI model (mathematical evaluator)
        evaluators["bleu_score"] = EvaluatorConfiguration(
            id=EvaluatorIds.BLEU_SCORE.value,
            data_mapping={
                "response": "${data.response}",
                "ground_truth": "${data.ground_truth}",
            },
        )
        print("✅ Added BLEU Score evaluator")
        
        # AI-assisted evaluators (if Azure OpenAI is configured)
        if MODEL_ENDPOINT and MODEL_API_KEY and MODEL_DEPLOYMENT_NAME:
            # Relevance evaluator
            evaluators["relevance"] = EvaluatorConfiguration(
                id=EvaluatorIds.RELEVANCE.value,
                init_params={"deployment_name": MODEL_DEPLOYMENT_NAME},
                data_mapping={
                    "query": "${data.query}",
                    "response": "${data.response}",
                },
            )
            print("✅ Added Relevance evaluator")
            
            # Coherence evaluator  
            evaluators["coherence"] = EvaluatorConfiguration(
                id=EvaluatorIds.COHERENCE.value,
                init_params={"deployment_name": MODEL_DEPLOYMENT_NAME},
                data_mapping={
                    "query": "${data.query}",
                    "response": "${data.response}",
                },
            )
            print("✅ Added Coherence evaluator")
        else:
            print("⚠️ Azure OpenAI not configured - using BLEU Score only")
        
        # Step 7: Create and submit evaluation
        print("\n🚀 Submitting cloud evaluation...")
        eval_display_name = f"Health-Fitness-Eval-{int(time.time())}"
        
        evaluation = Evaluation(
            display_name=eval_display_name,
            description="Health & Fitness Q&A Evaluation - Fixed Implementation",
            data=InputDataset(id=data_id),
            evaluators=evaluators,
        )
        
        # Add headers for AI evaluators as per documentation
        headers = {}
        if MODEL_ENDPOINT and MODEL_API_KEY:
            headers = {
                "model-endpoint": MODEL_ENDPOINT,
                "api-key": MODEL_API_KEY,
            }
            print("✅ Added model configuration headers")
        
        evaluation_response = project_client.evaluations.create(
            evaluation,
            headers=headers if headers else None,
        )
        
        print("🎉 CLOUD EVALUATION SUBMITTED!")
        print(f"   📋 Name: {evaluation_response.name}")
        print(f"   ⏳ Status: {evaluation_response.status}")
        
        # Brief status check
        print("\n⏳ Checking initial status...")
        time.sleep(2)
        
        try:
            status = project_client.evaluations.get(evaluation_response.name)
            print(f"📊 Current Status: {status.status}")
        except Exception:
            print("⚠️ Could not retrieve status update")
        
        print("🔗 View results: https://ai.azure.com/")
        
        # Save results
        cloud_result = {
            "evaluation_name": evaluation_response.name,
            "status": evaluation_response.status,
            "dataset_id": data_id,
            "evaluators": list(evaluators.keys()),
            "project_endpoint": PROJECT_ENDPOINT,
            "timestamp": int(time.time())
        }
        
        with open("cloud_evaluation_results.json", "w") as f:
            json.dump(cloud_result, f, indent=2)
        print("💾 Results saved to: cloud_evaluation_results.json")
        
        print("\n✅ SUCCESS: Cloud evaluation is running!")
        print("💡 Check the Azure AI Foundry portal for detailed results")
        
    except Exception as e:
        print(f"❌ Cloud evaluation failed: {e}")
        print(f"📋 Error type: {type(e).__name__}")
        
        # Enhanced error handling
        error_str = str(e).lower()
        if "401" in error_str or "unauthorized" in error_str:
            print("\n🔐 AUTHENTICATION ISSUE:")
            print("   - Run 'az login' in terminal")
            print("   - Ensure you're logged in to correct tenant")
        elif "403" in error_str or "forbidden" in error_str:
            print("\n🚫 PERMISSION ISSUE:")
            if "storage" in error_str:
                print("   - Storage account needs proper permissions")
                print("   - Add 'Storage Blob Data Contributor' role to the project's managed identity")
            else:
                print("   - Check project permissions")
                print("   - Verify you have access to the Azure AI project")
        elif "404" in error_str or "not found" in error_str:
            print("\n🔍 RESOURCE NOT FOUND:")
            print("   - Verify project endpoint is correct")
            print("   - Check if project exists in Azure AI Foundry")
        else:
            print(f"\n💡 TROUBLESHOOTING:")
            print(f"   - Full error: {str(e)[:200]}...")
            print("   - Check network connectivity")
            print("   - Verify all environment variables")
        
        cloud_result = None
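
The cell above checks the evaluation status only once, two seconds after submission. A polling loop with a timeout could be sketched as follows. `wait_for_terminal_status` is a hypothetical helper, and the terminal status names in `TERMINAL_STATUSES` are assumptions that should be checked against what your SDK version actually reports.

```python
import time
from typing import Callable

# Assumed terminal status names (compared case-insensitively); verify for your SDK version.
TERMINAL_STATUSES = {"completed", "failed", "canceled"}


def wait_for_terminal_status(
    get_status: Callable[[], str],
    timeout_s: float = 600.0,
    interval_s: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call get_status until it returns a terminal status or the timeout elapses."""
    deadline = time.monotonic() + timeout_s
    while True:
        status = get_status()
        if str(status).lower() in TERMINAL_STATUSES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Evaluation still '{status}' after {timeout_s} seconds")
        sleep(interval_s)


# Simulated run with fake statuses and a no-op sleep.
fake_statuses = iter(["NotStarted", "Running", "Completed"])
print(wait_for_terminal_status(lambda: next(fake_statuses), sleep=lambda _: None))
# Completed
```

In this notebook the status getter could be `lambda: project_client.evaluations.get(evaluation_response.name).status`, reusing the call from the cell above.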