# 🍏 Health Assistant Evaluation Demo 🍎

This notebook demonstrates how to use Azure AI Foundry's evaluation capabilities to assess the quality and safety of AI-generated health and fitness responses.

## 🔐 Authentication Setup

Before running the next cell, make sure you're authenticated with Azure CLI. Run this command in your terminal:

```bash
az login --use-device-code
```

This will provide you with a device code and URL to authenticate in your browser, which is useful for:
- Remote development environments
- Systems without a default browser
- Corporate environments with strict security policies

After successful authentication, you can proceed with the notebook cells below.

## 📊 Available Evaluators in Azure AI Foundry

Azure AI Foundry provides a comprehensive set of built-in evaluators for different aspects of AI model quality:

### **AI Quality (AI Assisted)**
- **Groundedness** - Measures how well responses are grounded in provided context
- **Relevance** - Evaluates how relevant responses are to the input query  
- **Coherence** - Assesses logical flow and consistency in responses
- **Fluency** - Measures language quality and readability
- **GPT Similarity** - Compares responses to reference answers

### **AI Quality (NLP Metrics)**
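- **F1 Score** - Harmonic mean of precision and recall over the tokens shared by the response and the ground truth
- **ROUGE Score** - Measures n-gram overlap with a reference text, commonly used for summarization
- **BLEU Score** - Measures n-gram precision against a reference text, originally designed for machine translation
- **GLEU Score** - Google-BLEU, a BLEU variant designed for sentence-level scoring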
- **METEOR Score** - Considers synonyms and stemming
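
To make the F1 definition concrete, here is a minimal pure-Python sketch of token-level F1. It assumes simple normalization (lowercasing, stripping punctuation and English articles, whitespace tokenization); the SDK's `F1ScoreEvaluator` applies its own normalization, so its scores may differ slightly.

```python
import re
import string
from collections import Counter


def normalize(text: str) -> list[str]:
    """Lowercase, strip punctuation, drop English articles, split on whitespace."""
    text = text.lower()
    text = text.translate(str.maketrans("", "", string.punctuation))
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return text.split()


def token_f1(response: str, ground_truth: str) -> float:
    """Harmonic mean of token precision and recall between two strings."""
    pred, ref = normalize(response), normalize(ground_truth)
    common = Counter(pred) & Counter(ref)  # multiset intersection of tokens
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


print(token_f1("London.", "Paris."))  # 0.0 (no shared tokens)
print(token_f1("Paris.", "Paris."))   # 1.0
```

Counting overlap with a `Counter` intersection means a token repeated in the response is only credited as often as it appears in the ground truth.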

### **Risk and Safety**
- **Violence** - Detects violent content
- **Sexual** - Identifies sexual content
- **Self-harm** - Detects self-harm related content
- **Hate/Unfairness** - Identifies hateful or unfair content
- **Protected Material** - Detects copyrighted content
- **Indirect Attack** - Identifies indirect prompt injection attempts
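
Content-harm evaluators such as Violence or Self-harm report a severity label alongside a numeric score. The sketch below assumes the 0 to 7 severity scale described in the Azure AI Foundry risk and safety documentation, bucketed as 0 to 1 Very low, 2 to 3 Low, 4 to 5 Medium, 6 to 7 High. The helper names and the default `threshold=4` used for the defect rate are hypothetical choices for illustration, not SDK APIs.

```python
def severity_label(score: int) -> str:
    """Map a 0-7 severity score to a label (assumed bucket boundaries)."""
    if not 0 <= score <= 7:
        raise ValueError(f"score out of range: {score}")
    if score <= 1:
        return "Very low"
    if score <= 3:
        return "Low"
    if score <= 5:
        return "Medium"
    return "High"


def defect_rate(scores: list[int], threshold: int = 4) -> float:
    """Fraction of rows whose severity score is at or above `threshold`."""
    if not scores:
        return 0.0
    return sum(s >= threshold for s in scores) / len(scores)


scores = [0, 2, 5, 7]  # made-up scores for illustration only
print([severity_label(s) for s in scores])  # ['Very low', 'Low', 'Medium', 'High']
print(defect_rate(scores))  # 0.5
```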

📚 **For complete details on all available evaluators, their parameters, and usage examples, visit:**  
**[Azure AI Foundry Evaluators Documentation](https://learn.microsoft.com/en-us/azure/ai-foundry/concepts/observability)**

---

# 🏋️‍♀️ Azure AI Foundry Evaluations 🏋️‍♂️

This notebook demonstrates how to evaluate AI models using Azure AI Foundry with both **local** and **cloud** evaluations.

## What This Notebook Does:
1. **Setup & Data Creation** - Creates synthetic health & fitness Q&A data
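2. **Local Evaluation** - Runs the F1Score evaluator locally, plus the AI-assisted Relevance evaluator when Azure OpenAI is configured
3. **Cloud Evaluation** - Uploads the dataset to the Azure AI Foundry project and submits a cloud evaluation run (BLEU score)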

## Key Features:
- ✅ **Local Evaluations** - F1Score and AI-assisted Relevance evaluators
- ✅ **Cloud Evaluation** - Uploads the dataset and submits a BLEU evaluation run to Azure AI Foundry
- ✅ **Azure CLI Authentication** - Uses `DefaultAzureCredential`, which picks up your `az login` session
- ✅ **Error Handling** - Clear status messages and troubleshooting hints for common failures

In [None]:
# Setup and Data Creation
import json
import os
import time
from pathlib import Path
from dotenv import load_dotenv

# Load environment variables from root directory
root_env_path = os.environ.get("ROOT_ENV_PATH", '../../../.env')
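loaded = load_dotenv(root_env_path)  # returns False if nothing was loaded
if loaded:
    print(f"✅ Environment variables loaded from: {root_env_path}")
else:
    print(f"⚠️ No variables loaded from {root_env_path}; relying on the existing environment")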

# Check required environment variables for Azure AI Foundry
AI_FOUNDRY_PROJECT_ENDPOINT = os.environ.get("AI_FOUNDRY_PROJECT_ENDPOINT")
TENANT_ID = os.environ.get("TENANT_ID")

print("🔍 Environment Variables Status:")
print(
    f"   AI_FOUNDRY_PROJECT_ENDPOINT: {'✅ Set' if AI_FOUNDRY_PROJECT_ENDPOINT else '❌ Missing'}"
)
print(f"   TENANT_ID: {'✅ Set' if TENANT_ID else '❌ Missing'}")

if not AI_FOUNDRY_PROJECT_ENDPOINT:
    print("\n⚠️ Required environment variable missing!")
    print("Please add this to your .env file:")
    print("AI_FOUNDRY_PROJECT_ENDPOINT=<your-azure-ai-project-endpoint>")
    print("Optionally also set: TENANT_ID=<your-azure-tenant-id>")
else:
    print("\n✅ Required environment variables configured!")
    print("🔧 Loaded values:")
    print(f"   AI_FOUNDRY_PROJECT_ENDPOINT: {AI_FOUNDRY_PROJECT_ENDPOINT}")
    print(f"   TENANT_ID: {TENANT_ID or '(not set)'}")

# Create synthetic health & fitness evaluation data
synthetic_eval_data = [
    {
        "query": "How can I start a beginner workout routine at home?",
        "context": "Workout routines can include push-ups, bodyweight squats, lunges, and planks.",
        "response": "You can just go for 10 push-ups total.",
        "ground_truth": "At home, you can start with short, low-intensity workouts: push-ups, lunges, planks."
    },
    {
        "query": "Are diet sodas healthy for daily consumption?",
        "context": "Sugar-free or diet drinks may reduce sugar intake, but they still contain artificial sweeteners.",
        "response": "Yes, diet sodas are 100% healthy.",
        "ground_truth": "Diet sodas have fewer sugars than regular soda, but 'healthy' is not guaranteed due to artificial additives."
    },
    {
        "query": "What's the capital of France?",
        "context": "France is in Europe. Paris is the capital.",
        "response": "London.",
        "ground_truth": "Paris."
    }
]

# Write data to JSONL file
eval_data_filename = os.environ.get("EVAL_DATA_FILENAME", "health_fitness_eval_data.jsonl")
eval_data_path = Path(f"./{eval_data_filename}")
with eval_data_path.open("w", encoding="utf-8") as f:
    for row in synthetic_eval_data:
        f.write(json.dumps(row) + "\n")

print(f"✅ Evaluation data created: {eval_data_path.resolve()}")
print(f"📊 Total samples: {len(synthetic_eval_data)}")

## 🔍 Local Evaluation

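Run evaluations locally using F1Score (token overlap with the ground truth) and Relevance (AI-assisted). Relevance is added only when `AZURE_OPENAI_ENDPOINT` and `AZURE_OPENAI_API_KEY` are set; otherwise the cell runs F1Score alone.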
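
Each `column_mapping` entry such as `${data.response}` tells `evaluate` which JSONL field feeds which evaluator input. The sketch below only illustrates that idea for a single row; `resolve_mapping` is a hypothetical helper, not part of `azure.ai.evaluation`, and it handles only the `${data.<column>}` form used in this notebook.

```python
import re

# Matches the "${data.<column>}" placeholders used in evaluator_config
PLACEHOLDER = re.compile(r"^\$\{data\.(\w+)\}$")


def resolve_mapping(row: dict, column_mapping: dict[str, str]) -> dict:
    """Build evaluator keyword arguments for one data row (illustrative only)."""
    kwargs = {}
    for param, ref in column_mapping.items():
        match = PLACEHOLDER.match(ref)
        if match is None:
            raise ValueError(f"unsupported mapping for {param!r}: {ref!r}")
        kwargs[param] = row[match.group(1)]
    return kwargs


row = {"query": "What's the capital of France?", "response": "London.", "ground_truth": "Paris."}
mapping = {"response": "${data.response}", "ground_truth": "${data.ground_truth}"}
print(resolve_mapping(row, mapping))
# {'response': 'London.', 'ground_truth': 'Paris.'}
```

Columns that no mapping references (here `query`) are simply not passed to that evaluator.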

In [None]:
# Local Evaluation with Azure AI Foundry
from azure.ai.evaluation import evaluate, F1ScoreEvaluator, RelevanceEvaluator
import logging

# Reduce logging noise
logging.getLogger('promptflow').setLevel(logging.ERROR)
logging.getLogger('azure.ai.evaluation').setLevel(logging.WARNING)

print("🔍 Running Local Evaluation...")

# Configure evaluators
evaluators = {
    "f1_score": F1ScoreEvaluator()
}

evaluator_config = {
    "f1_score": {
        "column_mapping": {
            "response": "${data.response}",
            "ground_truth": "${data.ground_truth}"
        }
    }
}

# Add AI-assisted evaluator if Azure OpenAI is configured
model_config = {
    "azure_endpoint": os.environ.get("AZURE_OPENAI_ENDPOINT", ""),
    "api_key": os.environ.get("AZURE_OPENAI_API_KEY", ""),
    "azure_deployment": os.environ.get("AZURE_OPENAI_DEPLOYMENT", os.environ.get("MODEL_DEPLOYMENT_NAME", "gpt-4")),
    "api_version": os.environ.get("AOAI_API_VERSION", os.environ.get("API_VERSION", "2024-02-15-preview")),
}

if model_config["azure_endpoint"] and model_config["api_key"]:
    print("🤖 Adding AI-assisted Relevance evaluator...")
    evaluators["relevance"] = RelevanceEvaluator(model_config=model_config)
    evaluator_config["relevance"] = {
        "column_mapping": {
            "query": "${data.query}",
            "response": "${data.response}"
        }
    }
else:
    print("⚠️ Azure OpenAI not configured - using F1Score only")

# Run local evaluation
try:
    local_result = evaluate(
        data=str(eval_data_path),
        evaluators=evaluators,
        evaluator_config=evaluator_config
    )
    
    print("✅ Local evaluation completed!")
    
    # Display aggregate metrics
    metrics = local_result["metrics"]
    for metric_name, value in metrics.items():
        if isinstance(value, (int, float)):
            print(f"📊 {metric_name}: {value:.4f}")
        else:
            print(f"📊 {metric_name}: {value}")

    # Save results locally (once, after all metrics are printed)
    local_results_filename = os.environ.get("LOCAL_RESULTS_FILENAME", "local_evaluation_results.json")
    with open(local_results_filename, "w", encoding="utf-8") as f:
        json.dump(local_result, f, indent=2, default=str)

    print(f"💾 Results saved to: {local_results_filename}")
except Exception as e:
    print(f"❌ Local evaluation failed: {e}")
    local_result = None

## ☁️ Cloud Evaluation

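Upload the evaluation dataset to your Azure AI Foundry project and submit a cloud evaluation run that scores it with the built-in BLEU evaluator. The run executes in Azure, and its results appear in the Foundry portal for tracking and collaboration.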
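
The cloud cell submits the built-in BLEU evaluator. To give a feel for what that score measures, here is a minimal sentence-level BLEU sketch: clipped n-gram precision up to 4-grams, combined by geometric mean and multiplied by a brevity penalty. It assumes lowercase whitespace tokenization and applies no smoothing; the service's implementation may tokenize and smooth differently, so its numbers need not match.

```python
import math
from collections import Counter


def ngrams(tokens: list[str], n: int) -> Counter:
    """Count the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(response: str, reference: str, max_n: int = 4) -> float:
    """Unsmoothed sentence-level BLEU against a single reference."""
    hyp, ref = response.lower().split(), reference.lower().split()
    if not hyp:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        hyp_counts, ref_counts = ngrams(hyp, n), ngrams(ref, n)
        total = sum(hyp_counts.values())
        # Clip each n-gram count by how often it appears in the reference
        clipped = sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
        if total == 0 or clipped == 0:
            return 0.0  # without smoothing, any zero precision zeroes BLEU
        log_precisions.append(math.log(clipped / total))
    # Brevity penalty: penalize responses shorter than the reference
    bp = 1.0 if len(hyp) > len(ref) else math.exp(1 - len(ref) / len(hyp))
    return bp * math.exp(sum(log_precisions) / max_n)


ref = "the cat sat on the mat"
print(sentence_bleu(ref, ref))  # 1.0
print(sentence_bleu("London.", "Paris."))  # 0.0
```

Because short responses rarely contain a matching 4-gram, unsmoothed BLEU is often 0 for one-line answers like those in this dataset.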

In [None]:
# Cloud Evaluation - Following Official Microsoft Documentation
from azure.identity import DefaultAzureCredential
from azure.ai.projects import AIProjectClient
from azure.ai.projects.models import (
    EvaluatorConfiguration,
    EvaluatorIds,
    Evaluation,
    InputDataset
)
import os
import json
import time

print("☁️ Setting up Cloud Evaluation with Azure AI Foundry...")

# Configuration from environment variables
AI_FOUNDRY_PROJECT_ENDPOINT = os.environ.get("AI_FOUNDRY_PROJECT_ENDPOINT")
TENANT_ID = os.environ.get("TENANT_ID")

print(f"🏢 Foundry Project Endpoint: {AI_FOUNDRY_PROJECT_ENDPOINT}")
print(f"🔑 Tenant ID: {TENANT_ID}")

if not AI_FOUNDRY_PROJECT_ENDPOINT:
    print("⚠️ Missing AI_FOUNDRY_PROJECT_ENDPOINT in .env file")
    cloud_result = None
else:
    try:
        # Step 1: Create project client using DefaultAzureCredential (as per Microsoft docs)
        print("🔐 Setting up authentication...")
        project_client = AIProjectClient(
            endpoint=AI_FOUNDRY_PROJECT_ENDPOINT,
            credential=DefaultAzureCredential(),
        )
        print("✅ AIProjectClient created successfully!")

        # Step 2: Upload evaluation data to Azure AI Foundry (required for cloud evaluation)
        print("📤 Uploading evaluation data to Azure AI Foundry...")
        dataset_name = os.environ.get("DATASET_NAME", f"health-fitness-dataset-{int(time.time())}")
        dataset_version = os.environ.get("DATASET_VERSION", "1.0")
        try:
            data_upload = project_client.datasets.upload_file(
                name=dataset_name,
                version=dataset_version,
                file_path=str(eval_data_path),
            )
            data_id = data_upload.id
            print(f"✅ Data uploaded successfully! Dataset ID: {data_id}")
        except Exception as upload_error:
            print(f"❌ Data upload failed: {upload_error}")
            raise

        # Step 3: Configure evaluators using Azure AI Foundry built-in evaluators
        print("⚙️ Configuring evaluators for cloud evaluation...")

        evaluators = {
            "bleu_score": EvaluatorConfiguration(
                id=EvaluatorIds.BLEU_SCORE.value,
                data_mapping={
                    "response": "${data.response}",
                    "ground_truth": "${data.ground_truth}",
                },
            ),
        }

        # Step 4: Create and submit evaluation
        print("🚀 Creating and submitting cloud evaluation...")
        evaluation_name = os.environ.get("EVALUATION_NAME", f"health-fitness-eval-{int(time.time())}")
        evaluation = Evaluation(
            display_name=evaluation_name,
            description="Health and fitness AI response evaluation",
            data=InputDataset(id=data_id),
            evaluators=evaluators,
        )

        # Submit the evaluation
        evaluation_response = project_client.evaluations.create(evaluation)

        print("🎉 CLOUD EVALUATION SUBMITTED!")
        print(f"   📋 Name: {evaluation_response.name}")
        print(f"   📋 Status: {evaluation_response.status}")
        print(f"   📋 Response Type: {type(evaluation_response)}")

        # Get evaluation ID - handle different possible attribute names
        evaluation_id = None
        if hasattr(evaluation_response, 'id'):
            evaluation_id = evaluation_response.id
        elif hasattr(evaluation_response, 'name'):
            evaluation_id = evaluation_response.name  # Use name as ID if no separate ID exists

        if evaluation_id:
            print(f"   📋 ID: {evaluation_id}")

        print("\n🔗 View detailed results at: https://ai.azure.com/")
        print("   Navigate to your project → Evaluation → View evaluation runs")

        # Save results
        cloud_result = {
            "evaluation_name": evaluation_response.name,
            "status": evaluation_response.status,
            "project_endpoint": AI_FOUNDRY_PROJECT_ENDPOINT,
            "dataset_id": data_id,
            "timestamp": int(time.time()),
        }

        # Add evaluation ID if available
        if evaluation_id:
            cloud_result["evaluation_id"] = evaluation_id

        cloud_results_filename = os.environ.get("CLOUD_RESULTS_FILENAME", "cloud_evaluation_results.json")
        with open(cloud_results_filename, "w", encoding="utf-8") as f:
            json.dump(cloud_result, f, indent=2, default=str)
        print(f"💾 Results saved to: {cloud_results_filename}")

        print("\n✅ SUCCESS: Cloud evaluation submitted to Azure AI Foundry!")
        print("   The evaluation will run in the cloud and results will be available in the Azure AI Foundry portal.")

    except Exception as e:
        print(f"❌ Cloud evaluation failed: {e}")
        print(f"📋 Error type: {type(e).__name__}")

        # Enhanced error handling
        error_str = str(e).lower()
        if "401" in error_str or "unauthorized" in error_str:
            print("\n🔐 AUTHENTICATION ISSUE:")
            print("   - Make sure you're logged in with: az login")
            print("   - Ensure you have access to the Azure AI Foundry project")
        elif "403" in error_str or "forbidden" in error_str:
            print("\n🚫 PERMISSION ISSUE:")
            print("   - Verify you have the 'Azure AI Developer' or 'Contributor' role on the project")
            print("   - Check Azure AI Foundry project permissions")
        elif "404" in error_str or "not found" in error_str:
            print("\n🔍 RESOURCE NOT FOUND:")
            print("   - Verify AI_FOUNDRY_PROJECT_ENDPOINT is correct")
            print("   - Check if project exists in Azure AI Foundry")
        elif "storage" in error_str or "blob" in error_str:
            print("\n💾 STORAGE ISSUE:")
            print("   - Ensure your Azure AI Foundry project has a connected storage account")
            print("   - Check storage account permissions for the project")
        else:
            print("\n💡 TROUBLESHOOTING:")
            print(f"   - Error (first 300 characters): {str(e)[:300]}")
            print("   - Try running local evaluation first")
            print("   - Check Azure AI Foundry project configuration")

        cloud_result = None
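
The cloud run starts asynchronously, so the status printed at submission is typically not yet final. If you want the notebook to wait for completion, a generic polling loop is one option. This is a minimal sketch, assuming you can supply a zero-argument `fetch_status` callable that returns the run's current status string (for example, one wrapping whatever "get evaluation" method your `azure-ai-projects` version provides). `wait_for_status` is a hypothetical helper, and the terminal status names `Completed`, `Failed`, and `Canceled` are assumptions; adjust them to what your run actually reports.

```python
import time
from typing import Callable


def wait_for_status(
    fetch_status: Callable[[], str],
    terminal: frozenset[str] = frozenset({"Completed", "Failed", "Canceled"}),
    interval_s: float = 10.0,
    timeout_s: float = 600.0,
) -> str:
    """Poll fetch_status() until it returns a terminal state or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while True:
        status = fetch_status()
        if status in terminal:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"still {status!r} after {timeout_s}s")
        time.sleep(interval_s)


# Usage with a fake status source standing in for the real service
fake_statuses = iter(["NotStarted", "Running", "Completed"])
print(wait_for_status(lambda: next(fake_statuses), interval_s=0))  # Completed
```

Injecting the status source as a callable keeps the loop independent of any particular SDK version.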