# 🍏 Health Assistant Evaluation Demo 🍎

This notebook demonstrates how to use Azure AI Foundry's evaluation capabilities to assess the quality and safety of AI-generated health and fitness responses.

## 🔐 Authentication Setup

Before running the next cell, make sure you're authenticated with Azure CLI. Run this command in your terminal:

```bash
az login --use-device-code
```

This will provide you with a device code and URL to authenticate in your browser, which is useful for:
- Remote development environments
- Systems without a default browser
- Corporate environments with strict security policies

After successful authentication, you can proceed with the notebook cells below.
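
Before moving on, it can help to confirm that the CLI session actually exists. The following is a minimal sketch, assuming the `az` executable is on your `PATH`; the helper name `az_cli_logged_in` is hypothetical and not part of this notebook. It relies on `az account show` exiting with a non-zero status when no account is signed in.

```python
import shutil
import subprocess


def az_cli_logged_in(timeout_seconds: float = 30.0) -> bool:
    """Return True if the Azure CLI is installed and reports an active login."""
    az_path = shutil.which("az")
    if az_path is None:
        # Azure CLI is not on PATH, so there is no CLI session to reuse.
        return False
    try:
        # `az account show` exits non-zero when no account is signed in.
        completed = subprocess.run(
            [az_path, "account", "show", "--output", "none"],
            capture_output=True,
            timeout=timeout_seconds,
            check=False,
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    return completed.returncode == 0
```

If `az_cli_logged_in()` returns `False`, run `az login --use-device-code` again before continuing.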

## 📊 Available Evaluators in Azure AI Foundry

Azure AI Foundry provides a comprehensive set of built-in evaluators for different aspects of AI model quality:

### **AI Quality (AI Assisted)**
- **Groundedness** - Measures how well responses are grounded in provided context
- **Relevance** - Evaluates how relevant responses are to the input query  
- **Coherence** - Assesses logical flow and consistency in responses
- **Fluency** - Measures language quality and readability
- **GPT Similarity** - Compares responses to reference answers

### **AI Quality (NLP Metrics)**
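- **F1 Score** - Harmonic mean of precision and recall over the tokens shared by the response and the ground truth
- **ROUGE Score** - Recall-oriented n-gram overlap with a reference, commonly used for summarization
- **BLEU Score** - Precision-oriented n-gram overlap with a reference, commonly used for translation and generation
- **GLEU Score** - Google-BLEU, a BLEU variant designed for sentence-level scoring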
- **METEOR Score** - Considers synonyms and stemming
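
To make the F1 Score entry above concrete, here is a simplified sketch of a token-overlap F1. It is an illustration only: the function name `token_f1` is hypothetical, tokens are split on whitespace after lowercasing, and the SDK's `F1ScoreEvaluator` applies its own text normalization, so its scores may differ.

```python
from collections import Counter


def token_f1(response: str, ground_truth: str) -> float:
    """Token-overlap F1 between a response and a reference answer."""
    response_tokens = response.lower().split()
    truth_tokens = ground_truth.lower().split()
    if not response_tokens or not truth_tokens:
        return 0.0
    # Multiset intersection: each shared token counts at most as often
    # as it appears in both texts.
    shared = sum((Counter(response_tokens) & Counter(truth_tokens)).values())
    if shared == 0:
        return 0.0
    precision = shared / len(response_tokens)
    recall = shared / len(truth_tokens)
    return 2 * precision * recall / (precision + recall)
```

With this definition, a response that shares no tokens with the ground truth (for example `"London."` against `"Paris."`) scores 0, and an exact match scores 1.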

### **Risk and Safety**
- **Violence** - Detects violent content
- **Sexual** - Identifies sexual content
- **Self-harm** - Detects self-harm related content
- **Hate/Unfairness** - Identifies hateful or unfair content
- **Protected Material** - Detects copyrighted content
- **Indirect Attack** - Identifies indirect prompt injection attempts

📚 **For complete details on all available evaluators, their parameters, and usage examples, visit:**  
**[Azure AI Foundry Evaluators Documentation](https://learn.microsoft.com/en-us/azure/ai-foundry/concepts/observability)**

---

# 🏋️‍♀️ Azure AI Foundry Evaluations 🏋️‍♂️

This notebook demonstrates how to evaluate AI models using Azure AI Foundry with both **local** and **cloud** evaluations.

## What This Notebook Does:
1. **Setup & Data Creation** - Creates synthetic health & fitness Q&A data
2. **Local Evaluation** - Runs F1Score and Relevance evaluators locally  
3. **Cloud Evaluation** - Uploads results to Azure AI Foundry project

## Key Features:
- ✅ **Local Evaluations** - F1Score and AI-assisted Relevance evaluators
- ✅ **Cloud Integration** - Upload results to Azure AI Foundry
- ✅ **Azure CLI Authentication** - Sign in once with `az login --use-device-code`
- ✅ **Error Handling** - Fallbacks and clear status reporting when configuration is missing or an evaluation fails

In [None]:
# Setup and Data Creation
import json
import os
import time
from pathlib import Path
from dotenv import load_dotenv

# Load environment variables from root directory
root_env_path = os.environ.get("ROOT_ENV_PATH", '../../../.env')
load_dotenv(root_env_path)
print(f"✅ Environment variables loaded from: {root_env_path}")

# Check required environment variables for Azure AI Foundry
AI_FOUNDRY_PROJECT_ENDPOINT = os.environ.get("AI_FOUNDRY_PROJECT_ENDPOINT")
TENANT_ID = os.environ.get("TENANT_ID")

print("🔍 Environment Variables Status:")
print(
    f"   AI_FOUNDRY_PROJECT_ENDPOINT: {'✅ Set' if AI_FOUNDRY_PROJECT_ENDPOINT else '❌ Missing'}"
)
print(f"   TENANT_ID: {'✅ Set' if TENANT_ID else '❌ Missing'}")

# Only the project endpoint is checked here; TENANT_ID is reported above but not enforced
if not AI_FOUNDRY_PROJECT_ENDPOINT:
    print("\n⚠️ Required environment variable AI_FOUNDRY_PROJECT_ENDPOINT is missing!")
    print("Please add these to your .env file:")
    print("AI_FOUNDRY_PROJECT_ENDPOINT=<your-azure-ai-project-endpoint>")
    print("TENANT_ID=<your-azure-tenant-id>")
else:
    print("\n✅ Required environment variables configured.")
    print("🔧 Loaded values:")
    print(f"   AI_FOUNDRY_PROJECT_ENDPOINT: {AI_FOUNDRY_PROJECT_ENDPOINT}")
    print(f"   TENANT_ID: {TENANT_ID}")

# Create synthetic health & fitness evaluation data
synthetic_eval_data = [
    {
        "query": "How can I start a beginner workout routine at home?",
        "context": "Workout routines can include push-ups, bodyweight squats, lunges, and planks.",
        "response": "You can just go for 10 push-ups total.",
        "ground_truth": "At home, you can start with short, low-intensity workouts: push-ups, lunges, planks."
    },
    {
        "query": "Are diet sodas healthy for daily consumption?",
        "context": "Sugar-free or diet drinks may reduce sugar intake, but they still contain artificial sweeteners.",
        "response": "Yes, diet sodas are 100% healthy.",
        "ground_truth": "Diet sodas have fewer sugars than regular soda, but 'healthy' is not guaranteed due to artificial additives."
    },
    {
        "query": "What's the capital of France?",
        "context": "France is in Europe. Paris is the capital.",
        "response": "London.",
        "ground_truth": "Paris."
    }
]

# Write data to JSONL file
eval_data_filename = os.environ.get("EVAL_DATA_FILENAME", "health_fitness_eval_data.jsonl")
eval_data_path = Path(f"./{eval_data_filename}")
with eval_data_path.open("w", encoding="utf-8") as f:
    for row in synthetic_eval_data:
        f.write(json.dumps(row) + "\n")

print(f"✅ Evaluation data created: {eval_data_path.resolve()}")
print(f"📊 Total samples: {len(synthetic_eval_data)}")
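
Each line of the JSONL file written above is one JSON object with the `query`, `context`, `response`, and `ground_truth` fields. A quick sanity check before running evaluators can catch malformed rows early. This is a minimal sketch using only the standard library; the helper name `find_invalid_rows` and the choice to treat empty strings as missing are assumptions, not part of the notebook.

```python
import json
from pathlib import Path

REQUIRED_FIELDS = ("query", "context", "response", "ground_truth")


def find_invalid_rows(jsonl_path, required_fields=REQUIRED_FIELDS):
    """Return (line_number, problem) pairs for rows that fail basic checks."""
    problems = []
    with Path(jsonl_path).open(encoding="utf-8") as f:
        for line_number, line in enumerate(f, start=1):
            if not line.strip():
                continue  # Ignore blank lines.
            try:
                row = json.loads(line)
            except json.JSONDecodeError as exc:
                problems.append((line_number, f"invalid JSON: {exc.msg}"))
                continue
            if not isinstance(row, dict):
                problems.append((line_number, "row is not a JSON object"))
                continue
            # A field counts as missing if it is absent or empty.
            missing = [name for name in required_fields if not row.get(name)]
            if missing:
                problems.append((line_number, "missing or empty: " + ", ".join(missing)))
    return problems
```

Calling `find_invalid_rows(eval_data_path)` on the file created above should return an empty list.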

## 🔍 Local Evaluation

Run evaluations locally using F1Score (basic text similarity) and Relevance (AI-assisted) evaluators.

In [None]:
# Local Evaluation with Azure AI Foundry
from azure.ai.evaluation import evaluate, F1ScoreEvaluator, RelevanceEvaluator
import logging

# Reduce logging noise
logging.getLogger('promptflow').setLevel(logging.ERROR)
logging.getLogger('azure.ai.evaluation').setLevel(logging.WARNING)

print("🔍 Running Local Evaluation...")

# Configure evaluators
evaluators = {
    "f1_score": F1ScoreEvaluator()
}

evaluator_config = {
    "f1_score": {
        "column_mapping": {
            "response": "${data.response}",
            "ground_truth": "${data.ground_truth}"
        }
    }
}

# Add AI-assisted evaluator if Azure OpenAI is configured
model_config = {
    "azure_endpoint": os.environ.get("AZURE_OPENAI_ENDPOINT", ""),
    "api_key": os.environ.get("AZURE_OPENAI_API_KEY", ""),
    "azure_deployment": os.environ.get("AZURE_OPENAI_DEPLOYMENT", os.environ.get("MODEL_DEPLOYMENT_NAME", "gpt-4")),
    "api_version": os.environ.get("AOAI_API_VERSION", os.environ.get("API_VERSION", "2024-02-15-preview")),
}

if model_config["azure_endpoint"] and model_config["api_key"]:
    print("🤖 Adding AI-assisted Relevance evaluator...")
    evaluators["relevance"] = RelevanceEvaluator(model_config=model_config)
    evaluator_config["relevance"] = {
        "column_mapping": {
            "query": "${data.query}",
            "response": "${data.response}"
        }
    }
else:
    print("⚠️ Azure OpenAI not configured - using F1Score only")

# Run local evaluation
try:
    local_result = evaluate(
        data=str(eval_data_path),
        evaluators=evaluators,
        evaluator_config=evaluator_config
    )

    print("✅ Local evaluation completed!")

    # Display aggregate metrics (guard against non-numeric values)
    metrics = local_result['metrics']
    for metric_name, value in metrics.items():
        if isinstance(value, (int, float)):
            print(f"📊 {metric_name}: {value:.4f}")
        else:
            print(f"📊 {metric_name}: {value}")

    # Save results locally once, after all metrics have been printed
    local_results_filename = os.environ.get("LOCAL_RESULTS_FILENAME", "local_evaluation_results.json")
    with open(local_results_filename, "w", encoding="utf-8") as f:
        # default=str keeps the dump from failing on non-JSON-serializable values
        json.dump(local_result, f, indent=2, default=str)

    print(f"💾 Results saved to: {local_results_filename}")
except Exception as e:
    print(f"❌ Local evaluation failed: {e}")
    local_result = None
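
Besides the aggregate `metrics`, the result returned by `evaluate` also carries per-row output under `rows`, which is useful for finding the weakest responses. The sketch below is hedged: the helper `lowest_scoring_rows` is hypothetical, and the column name `outputs.f1_score.f1_score` is an assumption about how the SDK labels per-row scores, so check the keys in your own `local_result["rows"]` before relying on it.

```python
import math


def lowest_scoring_rows(rows, score_key, limit=3):
    """Return up to `limit` rows with the smallest numeric value under `score_key`."""

    def is_score(value):
        # Skip missing values, booleans, and NaN so sorting stays well defined.
        return (
            isinstance(value, (int, float))
            and not isinstance(value, bool)
            and not math.isnan(value)
        )

    scored = [row for row in rows if is_score(row.get(score_key))]
    return sorted(scored, key=lambda row: row[score_key])[:limit]


# Hypothetical rows, shaped the way per-row output is assumed to look.
example_rows = [
    {"inputs.query": "q1", "outputs.f1_score.f1_score": 0.40},
    {"inputs.query": "q2", "outputs.f1_score.f1_score": 0.00},
    {"inputs.query": "q3", "outputs.f1_score.f1_score": None},
]
worst = lowest_scoring_rows(example_rows, "outputs.f1_score.f1_score", limit=1)
```

In the notebook you would pass `local_result["rows"]` instead of `example_rows`, after confirming that `local_result` is not `None`.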

## ☁️ Cloud Evaluation

Upload evaluation results to Azure AI Foundry project for tracking and collaboration.

In [None]:
# Cloud Evaluation - Using azure-ai-evaluation SDK directly
from azure.identity import DefaultAzureCredential
from azure.ai.evaluation import evaluate, BleuScoreEvaluator, F1ScoreEvaluator
import os
import json
import time

print("☁️ Setting up Cloud Evaluation with Azure AI Foundry...")

# Configuration from environment variables
AI_FOUNDRY_PROJECT_ENDPOINT = os.environ.get("AI_FOUNDRY_PROJECT_ENDPOINT")
AZURE_SUBSCRIPTION_ID = os.environ.get("AZURE_SUBSCRIPTION_ID")

print(f"🏢 Foundry Project Endpoint: {AI_FOUNDRY_PROJECT_ENDPOINT}")
print(f"🔑 Subscription ID: {AZURE_SUBSCRIPTION_ID}")

if not AI_FOUNDRY_PROJECT_ENDPOINT:
    print("⚠️ Missing AI_FOUNDRY_PROJECT_ENDPOINT in .env file")
    cloud_result = None
else:
    try:
        # Configure evaluators
        print("⚙️ Configuring evaluators...")
        evaluators = {
            "bleu_score": BleuScoreEvaluator(),
            "f1_score": F1ScoreEvaluator(),
        }
        
        evaluator_config = {
            "bleu_score": {
                "column_mapping": {
                    "response": "${data.response}",
                    "ground_truth": "${data.ground_truth}",
                }
            },
            "f1_score": {
                "column_mapping": {
                    "response": "${data.response}",
                    "ground_truth": "${data.ground_truth}",
                }
            },
        }
        
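        # Run evaluation; passing azure_ai_project logs the run to the Foundry project.
        # Assumption: passing the project endpoint as a string requires a recent
        # azure-ai-evaluation release; older releases expect a dict with
        # subscription_id, resource_group_name, and project_name.
        print("🚀 Running evaluation...")
        
        result = evaluate(
            data=str(eval_data_path),
            evaluators=evaluators,
            evaluator_config=evaluator_config,
            azure_ai_project=AI_FOUNDRY_PROJECT_ENDPOINT,
        )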
        
        print("🎉 EVALUATION COMPLETED!")
        
        # Display metrics
        if 'metrics' in result:
            for metric_name, value in result['metrics'].items():
                if isinstance(value, (int, float)):
                    print(f"   📊 {metric_name}: {value:.4f}")
                else:
                    print(f"   📊 {metric_name}: {value}")
        
        # Save results
        cloud_result = {
            "metrics": result.get('metrics', {}),
            "rows": result.get('rows', []),
            "project_endpoint": AI_FOUNDRY_PROJECT_ENDPOINT,
            "timestamp": int(time.time()),
        }
        
        results_filename = "cloud_evaluation_results.json"
        with open(results_filename, "w") as f:
            json.dump(cloud_result, f, indent=2, default=str)
        print(f"💾 Results saved to: {results_filename}")

        print("\n✅ SUCCESS: Evaluation completed!")

    except Exception as e:
        print(f"❌ Evaluation failed: {e}")
        import traceback
        traceback.print_exc()
        cloud_result = None
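
The cell above saves a JSON file with `metrics`, `rows`, `project_endpoint`, and a Unix `timestamp`. To review a saved run later without re-running the evaluation, something like the following sketch could be used. The helper name `summarize_saved_results` is hypothetical, and it assumes only the file layout written above.

```python
import json
from datetime import datetime, timezone


def summarize_saved_results(results_path):
    """Load a saved results file and return a short, human-readable summary."""
    with open(results_path, encoding="utf-8") as f:
        saved = json.load(f)
    lines = []
    timestamp = saved.get("timestamp")
    if isinstance(timestamp, (int, float)):
        # The notebook stores the run time as seconds since the Unix epoch.
        run_time = datetime.fromtimestamp(timestamp, tz=timezone.utc)
        lines.append(f"Run time (UTC): {run_time.isoformat()}")
    lines.append(f"Rows evaluated: {len(saved.get('rows', []))}")
    for name, value in sorted(saved.get("metrics", {}).items()):
        shown = f"{value:.4f}" if isinstance(value, (int, float)) else str(value)
        lines.append(f"{name}: {shown}")
    return "\n".join(lines)
```

For example, `print(summarize_saved_results("cloud_evaluation_results.json"))` lists the run time, the number of rows, and each metric on its own line.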