### Response Completeness Evaluator

### Getting Started

This sample demonstrates how to use the Response Completeness evaluator to score an agent's response against a provided ground truth. Use it when you have a reference answer and want to measure how much of the necessary information the agent's final response covers.

## Time

You should expect to spend about 20 minutes running this notebook. 

## Before you begin
For quality evaluation, you need to deploy a `gpt` model that supports JSON mode. We recommend `gpt-4o` or `gpt-4.1`.

### Prerequisite
```bash
pip install azure-ai-projects azure-identity openai
```
Set these environment variables with your own values:
1) **AZURE_AI_PROJECT_ENDPOINT** - Your Azure AI project endpoint in format: `https://<account_name>.services.ai.azure.com/api/projects/<project_name>`
2) **AZURE_AI_MODEL_DEPLOYMENT_NAME** - The deployment name of the model for this AI-assisted evaluator (e.g., gpt-4o-mini)
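Before running the cells below, it can help to fail fast if either variable is missing or malformed. The sketch below reads both variables and checks that the endpoint matches the format documented above. The helper name `check_env` and the regular expression are illustrative assumptions, not part of any SDK.

```python
import os
import re

# Illustrative regex for the documented endpoint shape:
# https://<account_name>.services.ai.azure.com/api/projects/<project_name>
# It only checks the overall shape, not whether the project exists.
ENDPOINT_PATTERN = re.compile(
    r"^https://(?P<account>[^./]+)\.services\.ai\.azure\.com/api/projects/(?P<project>[^/]+)/?$"
)

REQUIRED_VARS = ("AZURE_AI_PROJECT_ENDPOINT", "AZURE_AI_MODEL_DEPLOYMENT_NAME")


def check_env(env=os.environ):
    """Hypothetical helper: return (account, project, deployment) or raise ValueError."""
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise ValueError(f"Missing environment variables: {', '.join(missing)}")

    match = ENDPOINT_PATTERN.match(env["AZURE_AI_PROJECT_ENDPOINT"])
    if match is None:
        raise ValueError("AZURE_AI_PROJECT_ENDPOINT does not match the expected format")

    return match.group("account"), match.group("project"), env["AZURE_AI_MODEL_DEPLOYMENT_NAME"]


# Usage with a sample mapping instead of the real environment
sample_env = {
    "AZURE_AI_PROJECT_ENDPOINT": "https://myaccount.services.ai.azure.com/api/projects/myproject",
    "AZURE_AI_MODEL_DEPLOYMENT_NAME": "gpt-4o",
}
print(check_env(sample_env))  # ('myaccount', 'myproject', 'gpt-4o')
```

In the notebook itself you would call `check_env()` with no arguments so it reads `os.environ`.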


The Response Completeness evaluator assesses how completely an agent's response covers the information in the provided ground truth. The evaluation uses the following scoring system:

<pre>
Score 1: Fully incomplete: The response misses all necessary and relevant information compared to the ground truth.
Score 2: Barely complete: The response contains only a small percentage of the necessary information.
Score 3: Moderately complete: The response includes about half of the necessary information.
Score 4: Mostly complete: The response contains most of the necessary information, with only minor omissions.
Score 5: Fully complete: The response perfectly matches all necessary and relevant information from the ground truth.
</pre>
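When reading results, it can be convenient to turn a numeric score back into its rubric label. The sketch below encodes the five levels above as a dictionary. The pass threshold of 3 and the helper name `describe_score` are illustrative assumptions, not values stated in this notebook.

```python
# Rubric labels copied from the scoring table above
COMPLETENESS_LABELS = {
    1: "Fully incomplete",
    2: "Barely complete",
    3: "Moderately complete",
    4: "Mostly complete",
    5: "Fully complete",
}

# Assumed pass/fail cut-off, for illustration only
PASS_THRESHOLD = 3


def describe_score(score: int) -> str:
    """Return a short human-readable summary for a 1-5 completeness score."""
    if score not in COMPLETENESS_LABELS:
        raise ValueError(f"Score must be an integer from 1 to 5, got {score!r}")
    verdict = "pass" if score >= PASS_THRESHOLD else "fail"
    return f"{score}/5 {COMPLETENESS_LABELS[score]} ({verdict})"


print(describe_score(1))  # 1/5 Fully incomplete (fail)
print(describe_score(5))  # 5/5 Fully complete (pass)
```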

The evaluation requires the following inputs:

- Response: The response to be evaluated. (string)
- Ground Truth: The correct and complete information against which the response is compared. (string)

The evaluator compares the response against the ground truth and assigns a completeness score from 1 to 5 according to the scale above. No query is required; completeness is judged only by how much of the ground-truth information the response contains.
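Because both inputs are required strings, a malformed test case is easy to catch before any model call. The sketch below is a hypothetical local check, not part of the evaluator; it mirrors the `required` list and string types declared in the `data_source_config` schema in the next cell.

```python
REQUIRED_FIELDS = ("ground_truth", "response")


def validate_item(item: dict) -> list[str]:
    """Return a list of problems found in one test case (empty list means valid)."""
    problems = []
    for field in REQUIRED_FIELDS:
        if field not in item:
            problems.append(f"missing '{field}'")
        elif not isinstance(item[field], str):
            problems.append(f"'{field}' must be a string, got {type(item[field]).__name__}")
    return problems


print(validate_item({"response": "The capital of Japan", "ground_truth": "The capital of Japan is Tokyo."}))
# []
print(validate_item({"response": 42}))
# ["missing 'ground_truth'", "'response' must be a string, got int"]
```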

### Initialize Completeness Evaluator


```python
import os
from pprint import pprint

from openai.types.evals.create_eval_jsonl_run_data_source_param import SourceFileContentContent

# run_evaluator comes from agent_utils, a local helper module that is not installed by
# the pip command above; it must be importable from the notebook's working directory
from agent_utils import run_evaluator

# Deployment name of the judge model used by this AI-assisted evaluator
deployment_name = os.environ["AZURE_AI_MODEL_DEPLOYMENT_NAME"]

# Data source configuration (defines the schema for evaluation inputs)
data_source_config = {
    "type": "custom",
    "item_schema": {
        "type": "object",
        "properties": {
            "ground_truth": {
                "type": "string"
            },
            "response": {
                "type": "string"
            }
        },
        "required": ["ground_truth", "response"]
    },
    "include_sample_schema": True
}

# Data mapping (maps evaluation inputs to evaluator parameters)
data_mapping = {
    "ground_truth": "{{item.ground_truth}}",
    "response": "{{item.response}}"
}

# Initialization parameters for the evaluator
initialization_parameters = {
    "deployment_name": deployment_name
}

# Collect every test case here; all of them are evaluated together in a single run
evaluation_contents = []
```
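The `{{item.<field>}}` placeholders in `data_mapping` tell the service which field of each test case feeds each evaluator parameter. The sketch below is a simplified local stand-in that assumes plain `{{item.<field>}}` substitution; the service-side template engine may support more syntax, and `render_mapping` is a hypothetical name.

```python
import re

# Matches placeholders such as {{item.response}}; assumes only the simple item.<field> form
PLACEHOLDER = re.compile(r"\{\{\s*item\.(\w+)\s*\}\}")


def render_mapping(mapping: dict, item: dict) -> dict:
    """Substitute each {{item.<field>}} placeholder with the matching value from item."""
    return {
        param: PLACEHOLDER.sub(lambda m: str(item[m.group(1)]), template)
        for param, template in mapping.items()
    }


example_mapping = {
    "ground_truth": "{{item.ground_truth}}",
    "response": "{{item.response}}",
}
example_item = {
    "response": "The capital of Japan",
    "ground_truth": "The capital of Japan is Tokyo.",
}
print(render_mapping(example_mapping, example_item))
# {'ground_truth': 'The capital of Japan is Tokyo.', 'response': 'The capital of Japan'}
```

In this sketch a placeholder that names a field missing from the item raises `KeyError`, which is one reason the schema marks both fields as required.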

### Samples

#### Response and Ground Truth as Strings (str)

```python
# Test Case 1: Incomplete response (should score low)
response1 = "The capital of Japan"
ground_truth1 = "The capital of Japan is Tokyo."

# Append to evaluation_contents
evaluation_contents.append(
    SourceFileContentContent(
        item={
            "response": response1,
            "ground_truth": ground_truth1
        }
    )
)
```

```python
# Test Case 2: Complete response (should score high)
response2 = "The capital of Japan is Tokyo."
ground_truth2 = "The capital of Japan is Tokyo."

# Append to evaluation_contents
evaluation_contents.append(
    SourceFileContentContent(
        item={
            "response": response2,
            "ground_truth": ground_truth2
        }
    )
)
```
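Adding each case by hand gets repetitive as the list grows. A minimal loop-based alternative follows. It assumes that `SourceFileContentContent` is a `TypedDict` (as the `_param` modules in the `openai` package typically are), so calling it yields a plain `{"item": {...}}` dictionary, and it builds those dictionaries directly. `build_contents` and `test_pairs` are illustrative names.

```python
def build_contents(pairs):
    """Turn (response, ground_truth) pairs into evaluation content entries.

    Each entry has the shape SourceFileContentContent(item=...) is assumed to
    produce when SourceFileContentContent is a TypedDict.
    """
    return [
        {"item": {"response": response, "ground_truth": ground_truth}}
        for response, ground_truth in pairs
    ]


test_pairs = [
    ("The capital of Japan", "The capital of Japan is Tokyo."),            # incomplete
    ("The capital of Japan is Tokyo.", "The capital of Japan is Tokyo."),  # complete
]
contents = build_contents(test_pairs)
print(len(contents))                     # 2
print(contents[0]["item"]["response"])   # The capital of Japan
```

In the notebook, `evaluation_contents.extend(build_contents(test_pairs))` would add these cases to the existing list.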

### Run Evaluation on All Test Cases

Now that we've defined all test cases, let's run the evaluation once on all of them.

```python
results = run_evaluator(
    evaluator_name="response_completeness",
    evaluation_contents=evaluation_contents,
    data_source_config=data_source_config,
    initialization_parameters=initialization_parameters,
    data_mapping=data_mapping
)
```

### Display Results

View the evaluation results for each test case.

```python
pprint(results)
```