# Relevance Evaluator

## Objective
This sample demonstrates how to use the Relevance evaluator to assess the relevance of AI-generated responses. The evaluator supports multiple input formats including:
- Simple query and response pairs
- Multi-turn conversations

## Time

You should expect to spend about 15 minutes running this notebook.

## Before you begin
For quality evaluation, you need to deploy a `gpt` model that supports JSON mode. We recommend `gpt-4o` or `gpt-4o-mini` for their strong reasoning capabilities.

### Prerequisite
```bash
pip install azure-ai-evaluation
```
Set these environment variables with your own values:
1) **MODEL_DEPLOYMENT_NAME** - The deployment name of the model for this AI-assisted evaluator
2) **AZURE_OPENAI_ENDPOINT** - Azure OpenAI Endpoint to be used for evaluation
3) **AZURE_OPENAI_API_KEY** - Azure OpenAI Key to be used for evaluation
4) **AZURE_OPENAI_API_VERSION** - Azure OpenAI API version to be used for evaluation
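
Before running the cells below, it can help to confirm that all four variables are set. The following is a minimal sketch of such a check. The variable names come from the list above, while `REQUIRED_VARS` and `missing_env_vars` are hypothetical names introduced only for this illustration and are not part of `azure-ai-evaluation`.

```python
import os

# Variable names are taken from the list above.
# REQUIRED_VARS and missing_env_vars are hypothetical helpers, not SDK APIs.
REQUIRED_VARS = (
    "MODEL_DEPLOYMENT_NAME",
    "AZURE_OPENAI_ENDPOINT",
    "AZURE_OPENAI_API_KEY",
    "AZURE_OPENAI_API_VERSION",
)


def missing_env_vars(env=None):
    """Return the required variable names that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


missing = missing_env_vars()
if missing:
    print("Missing environment variables:", ", ".join(missing))
```

Checking up front gives one readable message instead of a `KeyError` partway through the configuration cell.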


## What is Relevance?

The Relevance evaluator assesses the ability of AI responses to capture the key points and address the user's query appropriately. It measures how well the response relates to and answers the specific question or request.

**Scoring:** Relevance scores range from 1 to 5, with:
- **1**: Completely irrelevant to the query
- **2**: Partially relevant but misses key points
- **3**: Moderately relevant with some key points addressed
- **4**: Highly relevant with most key points covered
- **5**: Perfectly relevant, fully addressing the query

A high relevance score indicates that the AI system understood the input and produced a contextually appropriate output.
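
When reporting results, the numeric score can be translated back into the rubric wording. The sketch below assumes the score is a whole number from 1 to 5, possibly returned as a float such as `4.0`. `RELEVANCE_LABELS` and `describe_relevance` are hypothetical names for this illustration, and the labels paraphrase the scoring list above.

```python
# Labels paraphrase the 1-5 rubric above.
# RELEVANCE_LABELS and describe_relevance are hypothetical, not SDK APIs.
RELEVANCE_LABELS = {
    1: "Completely irrelevant",
    2: "Partially relevant",
    3: "Moderately relevant",
    4: "Highly relevant",
    5: "Perfectly relevant",
}


def describe_relevance(score):
    """Map a whole-number relevance score (int or float) to its rubric label."""
    value = float(score)
    if not value.is_integer() or not 1 <= value <= 5:
        raise ValueError(f"Expected a whole-number score from 1 to 5, got {score!r}")
    return RELEVANCE_LABELS[int(value)]


print(describe_relevance(4.0))
```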

## Relevance Evaluator Input Requirements

The Relevance evaluator supports the following input formats:

1. **Simple Query-Response Evaluation:**
   - `query`: The user's question or request (str)
   - `response`: The AI's response to evaluate (str)

2. **Conversation Evaluation:**
   - `conversation`: Multi-turn conversation with a `messages` list and optional context (`Conversation`; the samples below pass it as a dict with a `messages` key)
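
For the conversation format, the samples in this notebook pass a dict with a `messages` list of alternating `user` and `assistant` entries. As a rough sketch, such a dict can be assembled from question and answer pairs. `build_conversation` is a hypothetical helper for this illustration, not part of the SDK, and it leaves out the optional context.

```python
def build_conversation(turns):
    """Build a {"messages": [...]} dict from (user_text, assistant_text) pairs.

    Hypothetical helper: it mirrors the dict shape used by the conversation
    samples in this notebook and omits the optional context.
    """
    messages = []
    for user_text, assistant_text in turns:
        messages.append({"role": "user", "content": user_text})
        messages.append({"role": "assistant", "content": assistant_text})
    return {"messages": messages}


conversation = build_conversation(
    [
        ("How is the weather in Seattle?", "It is rainy in Seattle today."),
        ("Should I bring an umbrella?", "Yes, light rain is expected all day."),
    ]
)
print(len(conversation["messages"]))
```

The resulting dict can then be passed as the `conversation` argument, in the same way as the hand-written conversations in the samples.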


### Initialize Relevance Evaluator

In [None]:
import os
from azure.ai.evaluation import RelevanceEvaluator, AzureOpenAIModelConfiguration
from pprint import pprint

# Configure the model
model_config = AzureOpenAIModelConfiguration(
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version=os.environ["AZURE_OPENAI_API_VERSION"],
    azure_deployment=os.environ["MODEL_DEPLOYMENT_NAME"],
)

# Initialize the evaluator
relevance_evaluator = RelevanceEvaluator(model_config=model_config)

## Sample Evaluations

### 1. Highly Relevant Response

In [None]:
# Example of a highly relevant response - using weather query similar to ToolCallAccuracyEvaluator
query = "How is the weather in Seattle?"
response = "The current weather in Seattle is rainy with a temperature of 14°C. It's typical Pacific Northwest weather for this time of year with overcast skies and light precipitation."

result = relevance_evaluator(query=query, response=response)
print("=== Highly Relevant Response ===")
pprint(result)

### 2. Partially Relevant Response

In [None]:
# Example of a partially relevant response
query = "How do I bake a chocolate cake?"
response = "Chocolate is made from cocoa beans and was first discovered by the Mayans. There are many types of chocolate including dark, milk, and white chocolate."

result = relevance_evaluator(query=query, response=response)
print("=== Partially Relevant Response ===")
pprint(result)

### 3. Irrelevant Response

In [None]:
# Example of an irrelevant response
query = "What are the symptoms of flu?"
response = "The stock market had significant fluctuations today, with technology stocks leading the decline."

result = relevance_evaluator(query=query, response=response)
print("=== Irrelevant Response ===")
pprint(result)

### 4. Conversation Evaluation
The evaluator also supports evaluating multi-turn conversations using the `conversation` parameter.

In [None]:
# Example of conversation evaluation
conversation = {
    "messages": [
        {
            "role": "user",
            "content": "I'm planning a trip to Japan. Can you help me with some recommendations?"
        },
        {
            "role": "assistant",
            "content": "I'd be happy to help you plan your trip to Japan! Here are some popular destinations: Tokyo for modern city life and culture, Kyoto for traditional temples and gardens, Osaka for amazing food, and Mount Fuji for scenic beauty. What type of experiences are you most interested in?"
        },
        {
            "role": "user",
            "content": "I'm particularly interested in traditional culture and temples."
        },
        {
            "role": "assistant",
            "content": "Perfect! For traditional culture and temples, I highly recommend Kyoto. Visit Fushimi Inari Shrine with its thousands of red torii gates, Kinkaku-ji (Golden Pavilion), and Gion district where you might spot geishas. Nara is also wonderful with its deer park and Todai-ji Temple housing a giant Buddha statue."
        }
    ]
}

result = relevance_evaluator(conversation=conversation)
print("=== Conversation Evaluation ===")
pprint(result)

### 5. Technical Query Evaluation

In [None]:
# Example of technical query evaluation
query = "How do I implement a binary search algorithm in Python?"
response = """Here's a Python implementation of binary search:

def binary_search(arr, target):
    left, right = 0, len(arr) - 1
    
    while left <= right:
        mid = (left + right) // 2
        
        if arr[mid] == target:
            return mid
        elif arr[mid] < target:
            left = mid + 1
        else:
            right = mid - 1
    
    return -1

This algorithm works on sorted arrays and has O(log n) time complexity."""

result = relevance_evaluator(query=query, response=response)
print("=== Technical Query Evaluation ===")
pprint(result)

### 6. Customer Service Scenario

In [None]:
# Example of customer service conversation
conversation = {
    "messages": [
        {
            "role": "user",
            "content": "I received a damaged product in my order #12345. What can you do to help?"
        },
        {
            "role": "assistant",
            "content": "I'm sorry to hear about the damaged product in order #12345. I can help you with this issue. Let me check your order details and arrange either a replacement or refund. Could you please describe the damage and let me know your preference for resolution?"
        },
        {
            "role": "user",
            "content": "The screen of the tablet has a crack. I'd prefer a replacement."
        },
        {
            "role": "assistant",
            "content": "I understand you received a tablet with a cracked screen. I'll immediately process a replacement for you. You'll receive a new tablet within 2-3 business days, and we'll include a prepaid return label for the damaged unit. Is your shipping address still the same as in the original order?"
        }
    ]
}

result = relevance_evaluator(conversation=conversation)
print("=== Customer Service Scenario ===")
pprint(result)