# Lab 02: Explore Built-in Quality Evaluators

By the end of this lab, you will know:

1. What AI-assisted evaluation workflows are, and how to run them
1. Which built-in quality evaluators are available in Azure AI Foundry
1. How to run a quality evaluator with a test prompt (to understand usage)
1. How to run a composite quality evaluator (with multiple evaluators)


---

## 1. Generation Quality Evaluation

Generation quality metrics are used to assess the overall quality of the content produced by generative AI applications. 

Each evaluator outputs a score and an explanation for that score (except for SimilarityEvaluator, which currently outputs a score only). [Browse the documentation](https://learn.microsoft.com/en-us/azure/ai-foundry/concepts/evaluation-metrics-built-in?tabs=warning#generation-quality-metrics) for details on how each metric works.

In this lab, we explore the "AI-assisted quality evaluator" box in the workflow shown below.
- The evaluator expects a dataset that contains the responses from a chat model (ready for evaluation).
- It runs the evaluations on each row in that dataset and pushes the results to a local file and/or the portal.
- To understand each evaluator, each code cell below calls it **with a single test prompt** that represents one row in this dataset.

![Quality](./00-assets/quality-evaluation-diagram.png)
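
The row-by-row flow described above can be sketched in plain Python. This is only an illustration of the idea, not how the Azure AI Evaluation SDK implements it: the `evaluate_rows` helper, the stand-in `response_length` evaluator, and the two sample rows are all hypothetical and introduced here for demonstration.

```python
import json

def response_length(*, response, **kwargs):
    """Stand-in evaluator (hypothetical): returns a dict of metrics for one row."""
    return {"response_length": len(response)}

def evaluate_rows(rows, evaluators):
    """Run every evaluator on every row and collect the per-row results."""
    results = []
    for row in rows:
        row_result = dict(row)  # keep the inputs next to the scores
        for name, evaluator in evaluators.items():
            # Each evaluator receives the row's columns as keyword arguments
            for metric, value in evaluator(**row).items():
                row_result[f"{name}.{metric}"] = value
        results.append(row_result)
    return results

# Two made-up rows standing in for a dataset of chat-model responses
rows = [
    {"query": "What is the capital of Japan?", "response": "The capital of Japan is Tokyo."},
    {"query": "What is the value of 2 + 2?", "response": "2 + 2 = 4"},
]

results = evaluate_rows(rows, {"length": response_length})
for row_result in results:
    print(json.dumps(row_result))
```

Passing several entries in the `evaluators` dictionary runs them all against the same row, which is the basic shape of a composite evaluation.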

---

## 2. Understand Built-in Quality Evaluators

The Azure AI Foundry platform provides a comprehensive set of built-in quality evaluators that can be used to assess the performance of generative AI models. To stay on schedule, we cover only the **Generation** and **Custom** (code-based) examples in this notebook. We encourage you to visit the [documentation](https://learn.microsoft.com/en-us/azure/ai-foundry/concepts/evaluation-metrics-built-in?tabs=warning#generation-quality-metrics) and add more examples, such as "Agents" and "Custom (prompt-based)", to your copy of the notebook as a homework exercise.


![Quality Evaluators](./00-assets/quality-evaluators.png)

---

## 3. Explore Generation Evaluators

In [None]:
## Setup Required Dependencies

# --------- Azure AI Project
import os
from pprint import pprint

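# The Azure AI Foundry connection string contains all the parameters we need
# as four semicolon-separated fields
connection_string = os.environ.get("AZURE_AI_CONNECTION_STRING")
if not connection_string:
    raise ValueError("Set the AZURE_AI_CONNECTION_STRING environment variable first.")
region_id, subscription_id, resource_group_name, project_name = connection_string.split(";")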

# Use extracted values to create the azure_ai_project
azure_ai_project = {
    "subscription_id": subscription_id,
    "resource_group_name": resource_group_name,
    "project_name": project_name,
}
pprint(azure_ai_project)

# ---------- Model Config
model_config = {
    "azure_endpoint": os.environ.get("AZURE_OPENAI_ENDPOINT"),
    "api_key": os.environ.get("AZURE_OPENAI_API_KEY"),
    "azure_deployment": os.environ.get("AZURE_OPENAI_EVAL_DEPLOYMENT"),
}

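# Print the config with the API key masked so it does not leak into the notebook output
pprint({**model_config, "api_key": "***"})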


---

### 3.1 Groundedness Evaluator

1. See the [API](https://learn.microsoft.com/en-us/python/api/azure-ai-evaluation/azure.ai.evaluation.retrievalevaluator?view=azure-python-preview)
1. Read the [Documentation](https://learn.microsoft.com/en-us/azure/ai-foundry/concepts/evaluation-metrics-built-in?tabs=warning#ai-assisted-retrieval)

In [None]:
from azure.ai.evaluation import GroundednessEvaluator
qEvaluator = GroundednessEvaluator(model_config)

# Test 1: Provide a valid answer
print("........ Evaluate with test response 1")
conversation = {
    "messages": [
        {
            "role": "user", 
            "content": "What is the value of 2 + 2?"
        },
        {
            "role": "assistant", 
            "content": "2 + 2 = 4",
            "context": "From 'math_doc.md': Information about additions: 1 + 2 = 3, 2 + 2 = 4"
        }
    ]
}

result = qEvaluator(conversation=conversation)
pprint(result)


---

### 3.2 Relevance Evaluator

1. See the [API](https://learn.microsoft.com/en-us/python/api/azure-ai-evaluation/azure.ai.evaluation.retrievalevaluator?view=azure-python-preview)
1. Read the [Documentation](https://learn.microsoft.com/en-us/azure/ai-foundry/concepts/evaluation-metrics-built-in?tabs=warning#ai-assisted-retrieval)

In [None]:
from azure.ai.evaluation import RelevanceEvaluator
relevance_evaluator = RelevanceEvaluator(model_config)

result = relevance_evaluator(
    query="What is the capital of Japan?",
    response="The capital of Japan is Tokyo."
)

pprint(result)

---

### 3.3 Coherence Evaluator

1. See the [API](https://learn.microsoft.com/en-us/python/api/azure-ai-evaluation/azure.ai.evaluation.retrievalevaluator?view=azure-python-preview)
1. Read the [Documentation](https://learn.microsoft.com/en-us/azure/ai-foundry/concepts/evaluation-metrics-built-in?tabs=warning#ai-assisted-retrieval)

In [None]:
from azure.ai.evaluation import CoherenceEvaluator
coherence_evaluator = CoherenceEvaluator(model_config)

result = coherence_evaluator(
    query="What is the capital of Japan?",
    response="The capital of Japan is Tokyo."
)

pprint(result)

---

### 3.4 Fluency Evaluator

1. See the [API](https://learn.microsoft.com/en-us/python/api/azure-ai-evaluation/azure.ai.evaluation.retrievalevaluator?view=azure-python-preview)
1. Read the [Documentation](https://learn.microsoft.com/en-us/azure/ai-foundry/concepts/evaluation-metrics-built-in?tabs=warning#ai-assisted-retrieval)

---

### 3.5 Similarity Evaluator

1. See the [API](https://learn.microsoft.com/en-us/python/api/azure-ai-evaluation/azure.ai.evaluation.retrievalevaluator?view=azure-python-preview)
1. Read the [Documentation](https://learn.microsoft.com/en-us/azure/ai-foundry/concepts/evaluation-metrics-built-in?tabs=warning#ai-assisted-retrieval)

---

### 3.6 F1 Evaluator

1. See the [API](https://learn.microsoft.com/en-us/python/api/azure-ai-evaluation/azure.ai.evaluation.retrievalevaluator?view=azure-python-preview)
1. Read the [Documentation](https://learn.microsoft.com/en-us/azure/ai-foundry/concepts/evaluation-metrics-built-in?tabs=warning#ai-assisted-retrieval)
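
To build intuition for what an F1 score measures here, the sketch below computes a token-overlap F1 between a response and a ground truth in plain Python. It is a simplified illustration and not the SDK's implementation: the `token_f1` function and its tokenization (lowercased word characters only) are assumptions made for this example.

```python
import re
from collections import Counter

def token_f1(response: str, ground_truth: str) -> float:
    """Token-overlap F1 between a response and a reference answer."""
    # Assumed tokenization: lowercase, keep runs of word characters only
    response_tokens = re.findall(r"\w+", response.lower())
    truth_tokens = re.findall(r"\w+", ground_truth.lower())
    if not response_tokens or not truth_tokens:
        return 0.0
    # Multiset intersection: a shared token counts only as often as it appears in both
    shared = sum((Counter(response_tokens) & Counter(truth_tokens)).values())
    if shared == 0:
        return 0.0
    precision = shared / len(response_tokens)  # share of response tokens that are in the reference
    recall = shared / len(truth_tokens)        # share of reference tokens that are in the response
    return 2 * precision * recall / (precision + recall)

ground_truth = "The capital of Japan is Tokyo."
same_words = token_f1("Tokyo is the capital of Japan.", ground_truth)
short_answer = token_f1("Tokyo", ground_truth)
print(same_words, short_answer)
```

Because the score depends only on which tokens are shared, a reordered sentence with the same words scores as a perfect match, while a one-word answer has perfect precision but low recall.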

---

### 3.7 BLEU/GLEU Evaluators

1. See the [API](https://learn.microsoft.com/en-us/python/api/azure-ai-evaluation/azure.ai.evaluation.retrievalevaluator?view=azure-python-preview)
1. Read the [Documentation](https://learn.microsoft.com/en-us/azure/ai-foundry/concepts/evaluation-metrics-built-in?tabs=warning#ai-assisted-retrieval)

In [None]:
from azure.ai.evaluation import GleuScoreEvaluator, BleuScoreEvaluator

# NLP bleu score evaluator
bleu_score_evaluator = BleuScoreEvaluator()
result = bleu_score_evaluator(
    response="Tokyo is the capital of Japan.",
    ground_truth="The capital of Japan is Tokyo."
)
pprint(result)

# NLP gleu score evaluator
gleu_score_evaluator = GleuScoreEvaluator()
result = gleu_score_evaluator(
    response="Tokyo is the capital of Japan.",
    ground_truth="The capital of Japan is Tokyo."
)
pprint(result)
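
BLEU-style metrics are built on n-gram overlap between the response and the ground truth. The sketch below shows only one ingredient, clipped n-gram precision, for the same sentence pair used above. It deliberately omits the brevity penalty and the averaging across n-gram sizes, so it will not reproduce the evaluator scores. The `ngrams` and `clipped_precision` helpers and the tokenization are assumptions made for this illustration, not part of the SDK.

```python
import re
from collections import Counter

def ngrams(tokens, n):
    """Return all contiguous n-token tuples from a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def clipped_precision(response: str, ground_truth: str, n: int) -> float:
    """Fraction of response n-grams that also occur in the ground truth (counts clipped)."""
    # Assumed tokenization: lowercase, keep runs of word characters only
    response_tokens = re.findall(r"\w+", response.lower())
    truth_tokens = re.findall(r"\w+", ground_truth.lower())
    response_counts = Counter(ngrams(response_tokens, n))
    truth_counts = Counter(ngrams(truth_tokens, n))
    total = sum(response_counts.values())
    if total == 0:
        return 0.0
    # Clipping: an n-gram is credited at most as often as it appears in the ground truth
    matched = sum((response_counts & truth_counts).values())
    return matched / total

response = "Tokyo is the capital of Japan."
ground_truth = "The capital of Japan is Tokyo."
unigram_precision = clipped_precision(response, ground_truth, 1)
bigram_precision = clipped_precision(response, ground_truth, 2)
print(unigram_precision, bigram_precision)
```

Every word of the response appears in the ground truth, but only three of its five bigrams do, which is why reordering lowers n-gram-based scores even when the vocabulary matches exactly.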

---

## 4. Explore Custom Evaluators

In [None]:
# Custom evaluator as a function to calculate response length
def response_length(response, **kwargs):
    return len(response)

# Custom class based evaluator to check for blocked words
class BlocklistEvaluator:
    def __init__(self, blocklist):
        self._blocklist = blocklist

    def __call__(self, *, answer: str, **kwargs):
        contains_block_word = any(word in answer for word in self._blocklist)
        return {"score": contains_block_word}

blocklist_evaluator = BlocklistEvaluator(blocklist=["bad", "worst", "terrible"])

# Test custom evaluator 1
result = response_length("The capital of Japan is Tokyo.")
print(result)

# Test custom evaluator 2
result = blocklist_evaluator(answer="The capital of Japan is Tokyo.")
print(result)

# Test custom evaluator 3
result = blocklist_evaluator(answer="This is a bad idea.")
print(result)
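
One detail worth noticing in the class above: `word in answer` is a substring check, so "bad" would also match inside "badge", and "BAD" would not match at all. The sketch below is one possible variant that matches whole words case-insensitively. The `WholeWordBlocklistEvaluator` name is hypothetical and introduced only for this illustration.

```python
import re

class WholeWordBlocklistEvaluator:
    """Hypothetical variant of BlocklistEvaluator that matches whole words only."""

    def __init__(self, blocklist):
        if blocklist:
            # \b anchors keep "bad" from matching inside "badge"
            alternatives = "|".join(re.escape(word) for word in blocklist)
            self._pattern = re.compile(rf"\b(?:{alternatives})\b", re.IGNORECASE)
        else:
            # An empty blocklist never flags anything
            self._pattern = None

    def __call__(self, *, answer: str, **kwargs):
        if self._pattern is None:
            return {"score": False}
        return {"score": bool(self._pattern.search(answer))}

whole_word_evaluator = WholeWordBlocklistEvaluator(blocklist=["bad", "worst", "terrible"])

# "badge" contains "bad" as a substring but not as a whole word
print(whole_word_evaluator(answer="Remember to wear your badge."))

# Matching ignores case
print(whole_word_evaluator(answer="This is a BAD idea."))
```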

---

### 4.1 Code-Based Evaluator


---

## 🎉 | Congratulations!

You have successfully completed the second lab in this module and gained hands-on experience with a core subset of the built-in quality evaluators. You also got a sense of how to create and run a custom evaluator.