# Response Completeness Evaluator

### Getting Started

This sample demonstrates how to use the Response Completeness Evaluator to score an agent's response against a provided ground truth. The evaluator is useful whenever you have a ground truth answer and want to assess how completely the agent's final response covers it.

## Time

You should expect to spend about 20 minutes running this notebook. 

## Before you begin
For quality evaluation, you need to deploy a `gpt` model that supports JSON mode. We recommend `gpt-4o` or `gpt-4o-mini` for their strong reasoning capabilities.

### Prerequisite
```bash
pip install azure-ai-projects azure-identity azure-ai-evaluation
```
Set these environment variables with your own values:
1) **AZURE_AI_PROJECT** - The project connection string, as found in the overview page of your Azure AI Foundry project.
2) **MODEL_DEPLOYMENT_NAME** - The deployment name of the model for this AI-assisted evaluator, as found under the "Name" column in the "Models + endpoints" tab in your Azure AI Foundry project.
3) **AZURE_OPENAI_ENDPOINT** - The Azure OpenAI endpoint used for evaluation.
4) **AZURE_OPENAI_API_KEY** - The Azure OpenAI API key used for evaluation.
5) **AZURE_OPENAI_API_VERSION** - The Azure OpenAI API version used for evaluation.
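Because the cells below read these values with `os.environ[...]`, a missing variable only surfaces as a `KeyError` when the model configuration is built. A small pre-flight check can report every missing name at once. This is a minimal sketch; `missing_env_vars` is a hypothetical helper, not part of the SDK.

```python
import os

# Variables listed above. AZURE_AI_PROJECT is not read by the evaluator
# cells in this notebook, but it is checked here for completeness.
REQUIRED_VARS = [
    "AZURE_AI_PROJECT",
    "MODEL_DEPLOYMENT_NAME",
    "AZURE_OPENAI_ENDPOINT",
    "AZURE_OPENAI_API_KEY",
    "AZURE_OPENAI_API_VERSION",
]


def missing_env_vars(names=REQUIRED_VARS, env=None):
    """Return the names that are unset or empty in `env` (defaults to os.environ)."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]


missing = missing_env_vars()
if missing:
    print("Missing environment variables:", ", ".join(missing))
```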


The Response Completeness evaluator assesses how completely an agent's response covers the information in the provided ground truth. It scores responses on the following five-point scale:

<pre>
Score 1: Fully incomplete: The response misses all necessary and relevant information compared to the ground truth.
Score 2: Barely complete: The response contains only a small percentage of the necessary information.
Score 3: Moderately complete: The response includes about half of the necessary information.
Score 4: Mostly complete: The response contains most of the necessary information, with only minor omissions.
Score 5: Fully complete: The response perfectly matches all necessary and relevant information from the ground truth.
</pre>
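The outputs later in this notebook report a threshold of 3 alongside each score, with scores of 4 and 5 marked `pass` and a score of 1 marked `fail`. The sketch below maps a score to its rubric label and a pass/fail outcome. It assumes a score passes when it is greater than or equal to the threshold; the recorded runs are consistent with that rule but do not show how a score of exactly 3 is treated, and `interpret` is a hypothetical helper.

```python
# Labels copied from the scoring rubric above.
RUBRIC = {
    1: "Fully incomplete",
    2: "Barely complete",
    3: "Moderately complete",
    4: "Mostly complete",
    5: "Fully complete",
}


def interpret(score: int, threshold: int = 3) -> tuple:
    """Return (rubric label, 'pass' or 'fail'), ASSUMING score >= threshold passes."""
    if score not in RUBRIC:
        raise ValueError(f"score must be between 1 and 5, got {score}")
    outcome = "pass" if score >= threshold else "fail"
    return RUBRIC[score], outcome


print(interpret(4))  # ('Mostly complete', 'pass')
```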

The evaluation requires the following inputs:

- Response: The response to be evaluated. (string)
- Ground Truth: The correct and complete information against which the response is compared. (string)

The evaluator compares these two inputs and assigns a completeness score according to the definitions above. Each result also includes a pass/fail outcome relative to a threshold (3 in the runs below) and a short reason for the score.
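When you have many response/ground-truth pairs, one option is to store them as JSON Lines, one object per line with the two input fields named above. Such a file can likely be passed as the `data` argument of the SDK's `evaluate()` function; check the SDK reference before relying on that. This sketch only writes and validates the file; the file name `completeness_data.jsonl` and the helper `write_jsonl` are hypothetical.

```python
import json

# One record per line, using the two inputs the evaluator needs.
rows = [
    {
        "response": "The order with ID 124 is delayed and should now arrive by March 20, 2025.",
        "ground_truth": "The order with ID 124 is delayed and should now arrive by March 20, 2025.",
    },
]


def write_jsonl(path, records):
    """Write one JSON object per line, rejecting records with a missing or empty input."""
    with open(path, "w", encoding="utf-8") as f:
        for record in records:
            for key in ("response", "ground_truth"):
                value = record.get(key)
                if not isinstance(value, str) or not value:
                    raise ValueError(f"missing or empty '{key}' in {record!r}")
            f.write(json.dumps(record) + "\n")


write_jsonl("completeness_data.jsonl", rows)
```

Validating for an empty `ground_truth` up front matters here: as the conversation example at the end of this notebook shows, an empty ground truth yields a score of 1 regardless of the response.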

### Initialize Completeness Evaluator


```python
from azure.ai.evaluation import ResponseCompletenessEvaluator, AzureOpenAIModelConfiguration
import os

# Model configuration built from the environment variables listed above
model_config = AzureOpenAIModelConfiguration(
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version=os.environ["AZURE_OPENAI_API_VERSION"],
    azure_deployment=os.environ["MODEL_DEPLOYMENT_NAME"],
)
```

```python
# Create the evaluator; it uses the configured model to grade responses
response_completeness_evaluator = ResponseCompletenessEvaluator(model_config=model_config)
```

### Samples

#### Evaluating for a ground_truth and response

```python
# The response paraphrases the ground truth and covers most of it
result = response_completeness_evaluator(
    response="Itinery: Day 1 check out the downtown district of the city on train; for Day 2, we can rest in hotel.",
    ground_truth="Itinery: Day 1 take a train to visit the downtown area for city sightseeing; Day 2 rests in hotel.",
)
result
```

```
{'response_completeness': 4,
 'response_completeness_result': 'pass',
 'response_completeness_threshold': 3,
 'response_completeness_reason': 'The response contains most of the relevant information, but misses the explicit purpose of "city sightseeing" for Day 1. Otherwise, it is accurate and complete for Day 2.',
 'response_completeness_prompt_tokens': 1395,
 'response_completeness_completion_tokens': 176,
 'response_completeness_total_tokens': 1571,
 'response_completeness_finish_reason': 'stop',
 'response_completeness_model': 'gpt-4.1-2025-04-14',
 'response_completeness_sample_input': '[{"role": "user", "content": "{\\"response\\": \\"Itinery: Day 1 check out the downtown district of the city on train; for Day 2, we can rest in hotel.\\", \\"ground_truth\\": \\"Itinery: Day 1 take a train to visit the downtown area for city sightseeing; Day 2 rests in hotel.\\"}"}]',
 'response_completeness_sample_output': '[{"role": "assistant", "content": "<S0>Let\'s think step by step: The grou
```
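The result is a flat dictionary whose keys share the `response_completeness` prefix. For logging, it can help to pull out just the score, outcome, threshold, and reason. This is a sketch; `summarize` is a hypothetical helper, and the example dictionary copies four of the fields printed above.

```python
def summarize(result: dict, name: str = "response_completeness") -> str:
    """Build a one-line summary from the score, outcome, threshold, and reason keys."""
    score = result[name]
    outcome = result[f"{name}_result"]
    threshold = result[f"{name}_threshold"]
    reason = result[f"{name}_reason"]
    return f"score={score} ({outcome}, threshold={threshold}): {reason}"


# Four of the fields from the output above
example = {
    "response_completeness": 4,
    "response_completeness_result": "pass",
    "response_completeness_threshold": 3,
    "response_completeness_reason": 'The response contains most of the relevant information, but misses the explicit purpose of "city sightseeing" for Day 1. Otherwise, it is accurate and complete for Day 2.',
}
print(summarize(example))
```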

```python
# The response contains the full ground truth plus extra details about order 123
result = response_completeness_evaluator(
    response="The order with ID 123 has been shipped and is expected to be delivered on March 15, 2025. However, the order with ID 124 is delayed and should now arrive by March 20, 2025.",
    ground_truth="The order with ID 124 is delayed and should now arrive by March 20, 2025.",
)
result
```

```
{'response_completeness': 5,
 'response_completeness_result': 'pass',
 'response_completeness_threshold': 3,
 'response_completeness_reason': 'The response fully contains the ground truth information without missing any details.',
 'response_completeness_prompt_tokens': 1410,
 'response_completeness_completion_tokens': 120,
 'response_completeness_total_tokens': 1530,
 'response_completeness_finish_reason': 'stop',
 'response_completeness_model': 'gpt-4.1-2025-04-14',
 'response_completeness_sample_input': '[{"role": "user", "content": "{\\"response\\": \\"The order with ID 123 has been shipped and is expected to be delivered on March 15, 2025. However, the order with ID 124 is delayed and should now arrive by March 20, 2025.\\", \\"ground_truth\\": \\"The order with ID 124 is delayed and should now arrive by March 20, 2025.\\"}"}]',
 'response_completeness_sample_output': '[{"role": "assistant", "content": "<S0>Let\'s think step by step: The ground truth contains one statement: \\"The
```
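This run shows that the evaluator scores coverage of the ground truth: the extra sentence about order 123 did not lower the score. To compare several responses against their ground truths, a plain loop over `(response, ground_truth)` pairs is enough. This is a sketch; `score_pairs` is a hypothetical helper, and it makes one model call per pair when used with the real evaluator.

```python
from typing import Callable, Iterable


def score_pairs(evaluator: Callable[..., dict], pairs: Iterable[tuple]) -> list:
    """Call the evaluator on each (response, ground_truth) pair and collect the scores."""
    scores = []
    for response, ground_truth in pairs:
        result = evaluator(response=response, ground_truth=ground_truth)
        scores.append(result["response_completeness"])
    return scores


# Usage in this notebook (resp_a, gt_a, ... are placeholders for your own data):
# score_pairs(response_completeness_evaluator, [(resp_a, gt_a), (resp_b, gt_b)])
```

Because the helper only relies on keyword arguments and the `response_completeness` key, any callable with that shape, such as a stub in a unit test, can stand in for the real evaluator.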

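#### Evaluating using Conversation format

The evaluator can also be called with a `conversation`. In the example below, the ground truth is attached to the assistant message under `context`. Note, however, that in the recorded output the `ground_truth` field of `response_completeness_sample_input` is empty and the reason states that there is no information in the ground truth. The ground truth therefore did not reach the evaluator's prompt, so the score of 1 reflects the missing ground truth rather than the quality of the response.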

```python
# Conversation format: the user query and the agent response,
# with the expected ground truth in the assistant message's context
conversation = {
    "messages": [
        {
            "role": "user",
            "content": "What's the weather forecast for Seattle this weekend?"
        },
        {
            "role": "assistant",
            "content": "The weather in Seattle this weekend will be partly cloudy with temperatures around 15°C on Saturday.",
            "context": {
                "ground_truth": "The weather in Seattle this weekend will be partly cloudy with temperatures around 15°C on Saturday and 17°C on Sunday, with a 20% chance of rain on Sunday afternoon."
            }
        }
    ]
}

result = response_completeness_evaluator(conversation=conversation)
result
```

```
{'response_completeness': 1,
 'response_completeness_result': 'fail',
 'response_completeness_threshold': 3,
 'response_completeness_reason': 'There is no information in the ground truth, so the response cannot be evaluated for completeness and is considered fully incomplete.',
 'response_completeness_prompt_tokens': 1362,
 'response_completeness_completion_tokens': 109,
 'response_completeness_total_tokens': 1471,
 'response_completeness_finish_reason': 'stop',
 'response_completeness_model': 'gpt-4.1-2025-04-14',
 'response_completeness_sample_input': '[{"role": "user", "content": "{\\"response\\": \\"The weather in Seattle this weekend will be partly cloudy with temperatures around 15\\\\u00b0C on Saturday.\\", \\"ground_truth\\": \\"\\"}"}]',
 'response_completeness_sample_output': '[{"role": "assistant", "content": "<S0>Let\'s think step by step: The ground truth is completely empty, meaning there is no information provided to compare against the response. According to the definit
```
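In this run, `response_completeness_sample_input` shows `"ground_truth": ""`, so the evaluator never saw the ground truth stored in the message `context`. The sketch below offers two hypothetical helpers: `ground_truth_in_prompt` decodes the sample input (a JSON list whose first message content is itself a JSON object, as printed above) to check what ground truth the evaluator actually received, and `extract_pair` pulls the response and ground truth out of the conversation so you can fall back to the explicit `response=`/`ground_truth=` call used in the earlier cells. Both assume the exact layouts shown in this notebook.

```python
import json


def extract_pair(conversation: dict) -> tuple:
    """Return (response, ground_truth) from the last assistant message.

    ASSUMES the layout used above: the ground truth is stored under
    message["context"]["ground_truth"].
    """
    for message in reversed(conversation["messages"]):
        if message.get("role") == "assistant":
            ground_truth = message.get("context", {}).get("ground_truth", "")
            return message["content"], ground_truth
    raise ValueError("conversation has no assistant message")


def ground_truth_in_prompt(result: dict) -> str:
    """Decode the sample input and return the ground truth the evaluator received."""
    messages = json.loads(result["response_completeness_sample_input"])
    payload = json.loads(messages[0]["content"])
    return payload.get("ground_truth", "")


# Fallback: pass the two inputs explicitly instead of the conversation.
# response, ground_truth = extract_pair(conversation)
# result = response_completeness_evaluator(response=response, ground_truth=ground_truth)
```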