# Relevance Evaluator

## Objective
This sample demonstrates how to use the Relevance evaluator to assess the relevance of AI-generated responses. The evaluator supports the following input format:
- Simple query and response pairs

## Time

You should expect to spend about 15 minutes running this notebook.

## Before you begin
For quality evaluation, you need to deploy a `gpt` model that supports JSON mode. We recommend `gpt-4.1` and do not recommend `gpt-4.1-nano`.

### Prerequisites
```bash
pip install azure-ai-projects azure-identity openai
```
Set these environment variables with your own values:
1) **AZURE_AI_PROJECT_ENDPOINT** - Your Azure AI project endpoint in format: `https://<account_name>.services.ai.azure.com/api/projects/<project_name>`
2) **AZURE_AI_MODEL_DEPLOYMENT_NAME** - The deployment name of the model used by this AI-assisted evaluator (e.g., `gpt-4.1`)
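The initialization cell reads `AZURE_AI_MODEL_DEPLOYMENT_NAME` with `os.environ[...]`, which raises a bare `KeyError` if the variable is unset. An optional pre-flight check can report missing values more clearly. This is a minimal sketch. `missing_env_vars` and `looks_like_project_endpoint` are hypothetical helpers and are not part of the Azure SDKs. The endpoint pattern only mirrors the format shown above.

```python
import os
import re
from typing import Iterable, List, Mapping, Optional

# Variables listed in the prerequisites above.
REQUIRED_ENV_VARS = ("AZURE_AI_PROJECT_ENDPOINT", "AZURE_AI_MODEL_DEPLOYMENT_NAME")

# Mirrors the documented format:
# https://<account_name>.services.ai.azure.com/api/projects/<project_name>
_ENDPOINT_PATTERN = re.compile(r"https://[^/]+\.services\.ai\.azure\.com/api/projects/[^/]+/?")


def missing_env_vars(
    names: Iterable[str] = REQUIRED_ENV_VARS,
    env: Optional[Mapping[str, str]] = None,
) -> List[str]:
    """Return the variables that are unset or empty (checks os.environ by default)."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]


def looks_like_project_endpoint(url: str) -> bool:
    """Hypothetical sanity check that an endpoint matches the documented format."""
    return _ENDPOINT_PATTERN.fullmatch(url) is not None


missing = missing_env_vars()
if missing:
    print("Missing environment variables:", ", ".join(missing))
```

This check only looks at the shape of the values. It cannot confirm that the project or deployment actually exists.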


## What is Relevance?

The Relevance evaluator assesses the ability of AI responses to capture the key points and address the user's query appropriately. It measures how well the response relates to and answers the specific question or request.

**Scoring:** Relevance scores range from 1 to 5, with:
- **1**: Completely irrelevant to the query
- **2**: Partially relevant but misses key points
- **3**: Moderately relevant with some key points addressed
- **4**: Highly relevant with most key points covered
- **5**: Perfectly relevant, fully addressing the query
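When you post-process scores in code, a small lookup can map each score on the 1 to 5 scale back to its rubric description. This is a sketch. `describe_relevance` and `is_relevant` are hypothetical helpers. The pass/fail threshold of 3 is an illustrative assumption and does not come from this sample.

```python
# Rubric from the list above, keyed by score.
RELEVANCE_RUBRIC = {
    1: "Completely irrelevant to the query",
    2: "Partially relevant but misses key points",
    3: "Moderately relevant with some key points addressed",
    4: "Highly relevant with most key points covered",
    5: "Perfectly relevant, fully addressing the query",
}


def describe_relevance(score: int) -> str:
    """Map a 1-5 relevance score to its rubric description."""
    if score not in RELEVANCE_RUBRIC:
        raise ValueError(f"Relevance scores range from 1 to 5, got {score!r}")
    return RELEVANCE_RUBRIC[score]


def is_relevant(score: int, threshold: int = 3) -> bool:
    """Hypothetical pass/fail rule: scores at or above `threshold` count as relevant."""
    describe_relevance(score)  # raises ValueError for out-of-range scores
    return score >= threshold


print(describe_relevance(4), is_relevant(4))
```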

A high relevance score indicates that the AI system understood the input and produced a contextually appropriate output.

## Relevance Evaluator Input Requirements

The Relevance evaluator supports the following input format:

1. **Simple Query-Response Evaluation:**
   - `query`: The user's question or request (str)
   - `response`: The AI's response to evaluate (str)
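You can check items locally before submitting them. The sketch below mirrors the `required` and `anyOf` rules of the `item_schema` defined in the initialization cell: `query` and `response` must both be present, and each must be a string or a list of objects. `validate_item` is a hypothetical helper and does not replace the schema itself.

```python
from typing import Any, Dict, List

REQUIRED_FIELDS = ("query", "response")


def validate_item(item: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the item matches the expected shape."""
    problems = []
    for field in REQUIRED_FIELDS:
        if field not in item:
            problems.append(f"missing required field '{field}'")
            continue
        value = item[field]
        # Accept a plain string, or a list whose elements are all dicts (JSON objects).
        is_str = isinstance(value, str)
        is_obj_list = isinstance(value, list) and all(isinstance(v, dict) for v in value)
        if not (is_str or is_obj_list):
            problems.append(f"'{field}' must be a str or a list of objects, got {type(value).__name__}")
    return problems


print(validate_item({"query": "What are the symptoms of flu?", "response": "Fever and cough."}))
```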


### Initialize Relevance Evaluator

```python
import os
from pprint import pprint

from openai.types.evals.create_eval_jsonl_run_data_source_param import SourceFileContentContent

# Helper module expected alongside this notebook; it is not installed by the pip command above
from agent_utils import run_evaluator

# Get environment variables
deployment_name = os.environ["AZURE_AI_MODEL_DEPLOYMENT_NAME"]

# Data source configuration (defines the schema for evaluation inputs):
# "query" and "response" may each be a string or a list of objects
data_source_config = {
    "type": "custom",
    "item_schema": {
        "type": "object",
        "properties": {
            "query": {"anyOf": [{"type": "string"}, {"type": "array", "items": {"type": "object"}}]},
            "response": {"anyOf": [{"type": "string"}, {"type": "array", "items": {"type": "object"}}]},
        },
        "required": ["query", "response"],
    },
    "include_sample_schema": True,
}

# Data mapping (maps evaluation inputs to evaluator parameters)
data_mapping = {
    "query": "{{item.query}}",
    "response": "{{item.response}}",
}

# Initialization parameters for the evaluator
initialization_parameters = {
    "deployment_name": deployment_name,
}

# Initialize the evaluation_contents list; all test cases are appended here
evaluation_contents = []
```
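Each `data_mapping` value is a template whose `{{item.<field>}}` placeholder is filled from the matching field of each evaluation item. The sketch below is a simplified, hypothetical re-implementation of that substitution that makes the mapping concrete. It is not the evaluation service's code, and it only handles the `item.` prefix.

```python
import re
from typing import Any, Dict

# Matches placeholders such as {{item.query}}; whitespace inside the braces is tolerated.
_PLACEHOLDER = re.compile(r"\{\{\s*item\.(\w+)\s*\}\}")


def resolve_mapping(mapping: Dict[str, str], item: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical: replace {{item.<field>}} placeholders with values from `item`."""
    resolved = {}
    for param, template in mapping.items():
        match = _PLACEHOLDER.fullmatch(template.strip())
        if match:
            # The whole value is one placeholder: keep the original type (str or list).
            resolved[param] = item[match.group(1)]
        else:
            # Placeholders embedded in other text: substitute them as strings.
            resolved[param] = _PLACEHOLDER.sub(lambda m: str(item[m.group(1)]), template)
    return resolved


example_mapping = {"query": "{{item.query}}", "response": "{{item.response}}"}
example_item = {"query": "How is the weather in Seattle?", "response": "It is rainy."}
print(resolve_mapping(example_mapping, example_item))
```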

## Sample Evaluations

### Query and Response as Strings (str)

```python
# Test Case 1: Highly relevant weather response (should score high)
query1 = "How is the weather in Seattle?"
response1 = "The current weather in Seattle is rainy with a temperature of 14°C. It's typical Pacific Northwest weather for this time of year with overcast skies and light precipitation."

# Append to evaluation_contents
evaluation_contents.append(
    SourceFileContentContent(
        item={
            "query": query1,
            "response": response1
        }
    )
)
```

### Example of Irrelevant Response

```python
# Test Case 2: Completely irrelevant response (should score low)
query2 = "What are the symptoms of flu?"
response2 = "The stock market had significant fluctuations today, with technology stocks leading the decline."

# Append to evaluation_contents
evaluation_contents.append(
    SourceFileContentContent(
        item={
            "query": query2,
            "response": response2
        }
    )
)
```
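With more test cases, you can build the list from `(query, response)` pairs instead of appending each case by hand. This is a minimal sketch. `build_contents` is a hypothetical helper. Its `make_content` parameter defaults to `dict`, so the sketch runs without the `openai` package. In the notebook you would pass `make_content=SourceFileContentContent`, which is called with the same `item=` keyword as in the cells above.

```python
from typing import Any, Callable, Iterable, List, Tuple


def build_contents(
    pairs: Iterable[Tuple[str, str]],
    make_content: Callable[..., Any] = dict,
) -> List[Any]:
    """Wrap each (query, response) pair as an evaluation content with an `item` payload."""
    return [make_content(item={"query": q, "response": r}) for q, r in pairs]


# Illustrative pairs modeled on the two test cases above
sample_pairs = [
    ("How is the weather in Seattle?", "It is rainy and overcast in Seattle today."),
    ("What are the symptoms of flu?", "The stock market had significant fluctuations today."),
]
contents = build_contents(sample_pairs)
print(len(contents))
```

Use this in place of the per-case `append` calls, not in addition to them, or each test case will be evaluated twice.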

### Run Evaluation on All Test Cases

Now that we've defined all test cases, let's run the evaluation once on all of them.

```python
results = run_evaluator(
    evaluator_name="relevance",
    evaluation_contents=evaluation_contents,
    data_source_config=data_source_config,
    initialization_parameters=initialization_parameters,
    data_mapping=data_mapping
)
```

### Display Results

View the evaluation results for each test case.

```python
pprint(results)
```