# Intent Resolution Evaluator

## Objective
This sample demonstrates how to use the intent resolution evaluator on agent data. The supported input formats include:
- simple data such as strings;
- user-agent conversations in the form of a list of agent messages.

## Time

You should expect to spend about 20 minutes running this notebook. 

## Before you begin
For quality evaluation, you need to deploy a `gpt` model that supports JSON mode. We recommend `gpt-4o` or `gpt-4o-mini` for their strong reasoning capabilities.

### Prerequisite
```bash
pip install azure-ai-projects azure-identity azure-ai-evaluation
```
Set these environment variables with your own values:
1) **PROJECT_CONNECTION_STRING** - The project connection string, as found on the overview page of your Azure AI Foundry project.
2) **MODEL_DEPLOYMENT_NAME** - The deployment name of the model for this AI-assisted evaluator, as found under the "Name" column in the "Models + endpoints" tab of your Azure AI Foundry project.
3) **AZURE_OPENAI_ENDPOINT** - The Azure OpenAI endpoint to be used for evaluation.
4) **AZURE_OPENAI_API_KEY** - The Azure OpenAI key to be used for evaluation.
5) **AZURE_OPENAI_API_VERSION** - The Azure OpenAI API version to be used for evaluation.
6) **AZURE_SUBSCRIPTION_ID** - The Azure subscription ID of the Azure AI project.
7) **PROJECT_NAME** - The Azure AI project name.
8) **RESOURCE_GROUP_NAME** - The Azure AI project resource group name.

Note that the code cells below read the deployment name from `AZURE_OPENAI_DEPLOYMENT` and the reporting project details from `REPORT_AZURE_SUBSCRIPTION_ID`, `REPORT_PROJECT_NAME`, and `REPORT_RESOURCE_GROUP_NAME`, so set those names as well or adjust the cells to match your environment.
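
Before running the cells, it can help to confirm that the variables are actually set. The following is a minimal sketch using only the standard library; `missing_env_vars` and `REQUIRED_VARS` are hypothetical names introduced here, not part of any Azure SDK, and the list simply mirrors the names given in this section.

```python
import os

# Names mirrored from the prerequisite list; adjust them to match
# the variables your own cells actually read.
REQUIRED_VARS = [
    "PROJECT_CONNECTION_STRING",
    "MODEL_DEPLOYMENT_NAME",
    "AZURE_OPENAI_ENDPOINT",
    "AZURE_OPENAI_API_KEY",
    "AZURE_OPENAI_API_VERSION",
    "AZURE_SUBSCRIPTION_ID",
    "PROJECT_NAME",
    "RESOURCE_GROUP_NAME",
]


def missing_env_vars(names, environ=None):
    """Return the names that are unset or empty in the given mapping."""
    env = os.environ if environ is None else environ
    return [name for name in names if not env.get(name)]


missing = missing_env_vars(REQUIRED_VARS)
if missing:
    print("Missing environment variables:", ", ".join(missing))
```

Passing a plain dict as `environ` makes the helper easy to check without touching the real environment.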


The Intent Resolution evaluator measures how well an agent has identified and resolved the user intent.
The scoring is on a 1-5 integer scale and is as follows:

  - Score 1: Response completely unrelated to user intent
  - Score 2: Response minimally relates to user intent
  - Score 3: Response partially addresses the user intent but lacks complete details
  - Score 4: Response addresses the user intent with moderate accuracy but has minor inaccuracies or omissions
  - Score 5: Response directly addresses the user intent and fully resolves it
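
The sample outputs further down report an `intent_resolution_threshold` of 3 alongside a `pass` result for a score of 5. A minimal sketch of how such a label could be derived from the score is shown here. It assumes that a score at or above the threshold counts as a pass; `label_score` is a hypothetical helper, not a function from `azure-ai-evaluation`.

```python
def label_score(score: float, threshold: int = 3) -> str:
    """Map a 1-5 intent resolution score to 'pass' or 'fail'.

    Assumption: scores greater than or equal to the threshold pass.
    """
    if not 1 <= score <= 5:
        raise ValueError(f"score must be between 1 and 5, got {score}")
    return "pass" if score >= threshold else "fail"


print(label_score(5.0))  # a score above the threshold
print(label_score(2.0))  # a score below the threshold
```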

The evaluation requires the following inputs:

  - Query    : The user query. Either a string with a user request or a list of messages with previous requests from the user and responses from the assistant, potentially including a system message.
  - Response : The response to be evaluated. Either a string or a message with the response from the agent to the last user query.

There is a third optional parameter:
  - ToolDefinitions : The list of tool definitions the agent can call. This may be useful for the evaluator to better assess if the right tool was called to resolve a given intent.

### Initialize Intent Resolution Evaluator


In [1]:
%pip install azure-ai-projects azure-identity azure-ai-evaluation python-dotenv

Note: you may need to restart the kernel to use updated packages.


In [2]:
from dotenv import load_dotenv
load_dotenv()

True

In [3]:
import os
azure_ai_project = os.environ.get("AZURE_AI_PROJECT")

In [4]:
import os
from azure.ai.evaluation import AzureOpenAIModelConfiguration
from azure.ai.evaluation import IntentResolutionEvaluator
from azure.identity import DefaultAzureCredential, get_bearer_token_provider
from pprint import pprint

token_provider = get_bearer_token_provider(DefaultAzureCredential(), "https://cognitiveservices.azure.com/.default")

model_config = AzureOpenAIModelConfiguration(
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version=os.environ["AZURE_OPENAI_API_VERSION"],
    azure_deployment=os.environ["AZURE_OPENAI_DEPLOYMENT"],
)

intent_resolution_evaluator = IntentResolutionEvaluator(model_config)

Class IntentResolutionEvaluator: This is an experimental class, and may change at any time. Please see https://aka.ms/azuremlexperimental for more information.


### Samples

#### Evaluating query and response as string

In [5]:
# Success example. Intent is identified and understood and the response correctly resolves user intent
result = intent_resolution_evaluator(
    query="What are the opening hours of the Eiffel Tower?",
    response="Opening hours of the Eiffel Tower are 9:00 AM to 11:00 PM.",
)
pprint(result)

{'additional_details': {'actual_user_intent': 'find out the opening hours of '
                                              'the Eiffel Tower',
                        'agent_perceived_intent': 'provide the opening hours '
                                                  'of the Eiffel Tower',
                        'conversation_has_intent': True,
                        'correct_intent_detected': True,
                        'intent_resolved': True},
 'intent_resolution': 5.0,
 'intent_resolution_reason': 'The response provides the opening hours of the '
                             'Eiffel Tower, which directly addresses the '
                             "user's query with accurate and complete "
                             'information. No additional details or tools were '
                             'required to resolve the query.',
 'intent_resolution_result': 'pass',
 'intent_resolution_threshold': 3}


In [6]:
# Failure example. Even though intent is correctly identified, the response does not resolve the user intent
result = intent_resolution_evaluator(
    query="What is the opening hours of the Eiffel Tower?",
    response="Please check the official website for the up-to-date information on Eiffel Tower opening hours.",
)
pprint(result)

{'additional_details': {'actual_user_intent': 'find out the opening hours of '
                                              'the Eiffel Tower',
                        'agent_perceived_intent': 'provide information about '
                                                  'Eiffel Tower opening hours',
                        'conversation_has_intent': True,
                        'correct_intent_detected': True,
                        'intent_resolved': False},
 'intent_resolution': 2.0,
 'intent_resolution_reason': 'The response suggests checking the official '
                             'website for up-to-date information, which is a '
                             'generic and minimally helpful answer. While it '
                             'acknowledges the topic of the query (Eiffel '
                             'Tower opening hours), it does not provide any '
                             'specific or actionable information, leaving the '
                             'query 

#### Evaluating query and response as list of messages

In [7]:
query = [
    {"role": "system", "content": "You are a friendly and helpful customer service agent."},
    {
        "createdAt": "2025-03-14T06:14:20Z",
        "role": "user",
        "content": [
            {
                "type": "text",
                "text": "Hi, I need help with the last 2 orders on my account #888. Could you please update me on their status?",
            }
        ],
    },
]

response = [
    {
        "createdAt": "2025-03-14T06:14:30Z",
        "run_id": "0",
        "role": "assistant",
        "content": [{"type": "text", "text": "Hello! Let me quickly look up your account details."}],
    },
    {
        "createdAt": "2025-03-14T06:14:35Z",
        "run_id": "0",
        "role": "assistant",
        "content": [
            {
                "type": "tool_call",
                "tool_call_id": "tool_call_20250310_001",
                "name": "get_orders",
                "arguments": {"account_number": "888"},
            }
        ],
    },
    {
        "createdAt": "2025-03-14T06:14:40Z",
        "run_id": "0",
        "tool_call_id": "tool_call_20250310_001",
        "role": "tool",
        "content": [{"type": "tool_result", "tool_result": '[{ "order_id": "123" }, { "order_id": "124" }]'}],
    },
    {
        "createdAt": "2025-03-14T06:14:45Z",
        "run_id": "0",
        "role": "assistant",
        "content": [
            {
                "type": "text",
                "text": "Thanks for your patience. I see two orders on your account. Let me fetch the details for both.",
            }
        ],
    },
    {
        "createdAt": "2025-03-14T06:14:50Z",
        "run_id": "0",
        "role": "assistant",
        "content": [
            {
                "type": "tool_call",
                "tool_call_id": "tool_call_20250310_002",
                "name": "get_order",
                "arguments": {"order_id": "123"},
            },
            {
                "type": "tool_call",
                "tool_call_id": "tool_call_20250310_003",
                "name": "get_order",
                "arguments": {"order_id": "124"},
            },
        ],
    },
    {
        "createdAt": "2025-03-14T06:14:55Z",
        "run_id": "0",
        "tool_call_id": "tool_call_20250310_002",
        "role": "tool",
        "content": [
            {
                "type": "tool_result",
                "tool_result": '{ "order": { "id": "123", "status": "shipped", "delivery_date": "2025-03-15" } }',
            }
        ],
    },
    {
        "createdAt": "2025-03-14T06:15:00Z",
        "run_id": "0",
        "tool_call_id": "tool_call_20250310_003",
        "role": "tool",
        "content": [
            {
                "type": "tool_result",
                "tool_result": '{ "order": { "id": "124", "status": "delayed", "expected_delivery": "2025-03-20" } }',
            }
        ],
    },
    {
        "createdAt": "2025-03-14T06:15:05Z",
        "run_id": "0",
        "role": "assistant",
        "content": [
            {
                "type": "text",
                "text": "The order with ID 123 has been shipped and is expected to be delivered on March 15, 2025. However, the order with ID 124 is delayed and should now arrive by March 20, 2025. Is there anything else I can help you with?",
            }
        ],
    },
]

# Note: tool definitions are not strictly required, and some of the tools below are not used in the example above, which is fine.
# If context length is a concern, you can remove the unused tool definitions, or the tool definitions altogether, as the impact on the intent resolution evaluation is usually minimal.
tool_definitions = [
    {
        "name": "get_orders",
        "description": "Get the list of orders for a given account number.",
        "parameters": {
            "type": "object",
            "properties": {
                "account_number": {"type": "string", "description": "The account number to get the orders for."}
            },
        },
    },
    {
        "name": "get_order",
        "description": "Get the details of a specific order.",
        "parameters": {
            "type": "object",
            "properties": {"order_id": {"type": "string", "description": "The order ID to get the details for."}},
        },
    },
    {
        "name": "initiate_return",
        "description": "Initiate the return process for an order.",
        "parameters": {
            "type": "object",
            "properties": {"order_id": {"type": "string", "description": "The order ID for the return process."}},
        },
    },
    {
        "name": "update_shipping_address",
        "description": "Update the shipping address for a given account.",
        "parameters": {
            "type": "object",
            "properties": {
                "account_number": {"type": "string", "description": "The account number to update."},
                "new_address": {"type": "string", "description": "The new shipping address."},
            },
        },
    },
]

result = intent_resolution_evaluator(
    query=query,
    response=response,
    tool_definitions=tool_definitions,
)
pprint(result)

{'additional_details': {'actual_user_intent': 'provide an update on the status '
                                              'of the last two orders for '
                                              'account #888',
                        'agent_perceived_intent': 'provide an update on the '
                                                  'status of the last two '
                                                  'orders for account #888',
                        'conversation_has_intent': True,
                        'correct_intent_detected': True,
                        'intent_resolved': True},
 'intent_resolution': 5.0,
 'intent_resolution_reason': "The assistant accurately understood the user's "
                             'intent to get an update on the status of the '
                             'last two orders associated with their account. '
                             "It utilized the appropriate tools ('get_orders' "
                             "and 'get_order

## Batch evaluate and visualize results on Azure AI Foundry
Use batch evaluation to run the evaluator asynchronously over a dataset.

Optionally, you can open the AI Foundry URL printed by the evaluation cell below for rich data visualization in Azure AI Foundry. There you can inspect the evaluation scores and reasoning to quickly identify issues with your agent and fix them. Make sure to authenticate to Azure using `az login` in your terminal before running this cell.
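
The batch run reads its input from a JSONL file with one record per line. The sketch below writes such a file with Python's standard library. The keys `query`, `response`, and `tool_definitions` are inferred from the `inputs.*` columns in the results table further down; the helper `write_jsonl`, the file name `example_evaluation_data.jsonl`, and the record contents are illustrative placeholders rather than the sample's actual data.

```python
import json

# Illustrative record reusing the string example from earlier in this notebook.
# "tool_definitions" is optional and omitted here.
records = [
    {
        "query": "What are the opening hours of the Eiffel Tower?",
        "response": "Opening hours of the Eiffel Tower are 9:00 AM to 11:00 PM.",
    },
]


def write_jsonl(path, rows):
    """Write each row as one JSON object per line."""
    with open(path, "w", encoding="utf-8") as f:
        for row in rows:
            f.write(json.dumps(row) + "\n")


write_jsonl("example_evaluation_data.jsonl", records)
```

Records that hold message lists instead of strings can be written the same way, since `json.dumps` serializes nested lists and dicts.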


In [8]:
from azure.ai.evaluation import evaluate

# This sample file contains the evaluation data in JSONL format, where each line is a run from the agent.
# It was saved using an agent thread and converter.
file_name = "evaluation_data.jsonl"

azure_ai_project = "https://cvi-aie-wf-swedencentra-resource.services.ai.azure.com/api/projects/cvi-aie-wf-swedencentral"

response = evaluate(
    data=file_name,
    evaluation_name="Intent Resolution Evaluation",
    evaluators={
        "intent_resolution": intent_resolution_evaluator,
    },
    azure_ai_project={
        "subscription_id": os.environ["REPORT_AZURE_SUBSCRIPTION_ID"],
        "project_name": os.environ["REPORT_PROJECT_NAME"],
        "resource_group_name": os.environ["REPORT_RESOURCE_GROUP_NAME"],
    },
)
pprint(f'AI Foundry URL: {response.get("studio_url")}')

[2025-06-04 14:22:39 -0700][promptflow._sdk._orchestrator.run_submitter][INFO] - Submitting run azure_ai_evaluation_evaluators_intent_resolution_20250604_142239_252661, log path: /Users/cv/.promptflow/.runs/azure_ai_evaluation_evaluators_intent_resolution_20250604_142239_252661/logs.txt


2025-06-04 14:22:39 -0700   80056 execution.bulk     INFO     Current thread is not main thread, skip signal handler registration in BatchEngine.
2025-06-04 14:22:41 -0700   80056 execution.bulk     INFO     Finished 1 / 3 lines.
2025-06-04 14:22:41 -0700   80056 execution.bulk     INFO     Average execution time for completed lines: 2.24 seconds. Estimated time for incomplete lines: 4.48 seconds.
2025-06-04 14:22:42 -0700   80056 execution.bulk     INFO     Finished 3 / 3 lines.
2025-06-04 14:22:42 -0700   80056 execution.bulk     INFO     Average execution time for completed lines: 0.88 seconds. Estimated time for incomplete lines: 0.0 seconds.

Run name: "azure_ai_evaluation_evaluators_intent_resolution_20250604_142239_252661"
Run status: "Completed"
Start time: "2025-06-04 14:22:39.262350-07:00"
Duration: "0:00:03.150239"
Output path: "/Users/cv/.promptflow/.runs/azure_ai_evaluation_evaluators_intent_resolution_20250604_142239_252661"


{
    "intent_resolution": {
        "status"

In [9]:
import pandas as pd
pd.DataFrame(response["rows"])

| Unnamed: 0 | inputs.query | inputs.response | inputs.tool_definitions | outputs.intent_resolution.intent_resolution | outputs.intent_resolution.intent_resolution_result | outputs.intent_resolution.intent_resolution_threshold | outputs.intent_resolution.intent_resolution_reason | outputs.intent_resolution.additional_details | line_number |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | [{'role': 'system', 'content': 'You are a help... | [{'createdAt': '2025-04-04T20:48:07Z', 'run_id... | [{'name': 'convert_temperature', 'description'... | 5 | pass | 3 | The assistant accurately understood the user's... | {'conversation_has_intent': True, 'agent_perce... | 0 |
| 1 | [{'role': 'system', 'content': 'You are a help... | [{'createdAt': '2025-04-05T09:15:03Z', 'run_id... | [{'name': 'convert_temperature', 'description'... | 5 | pass | 3 | The assistant accurately understood the user's... | {'conversation_has_intent': True, 'agent_perce... | 1 |
| 2 | [{'role': 'system', 'content': 'You are a help... | [{'createdAt': '2025-04-06T14:22:12Z', 'run_id... | [{'name': 'convert_temperature', 'description'... | 5 | pass | 3 | The assistant accurately understood the user's... | {'conversation_has_intent': True, 'agent_perce... | 2 |
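
For a quick aggregate over the rows, one option is to compute the share of `pass` results. This is a minimal sketch that assumes each row is a dict keyed by the column names shown above; `pass_rate` is a hypothetical helper, and `sample_rows` stands in for `response["rows"]` with made-up values.

```python
RESULT_KEY = "outputs.intent_resolution.intent_resolution_result"


def pass_rate(rows, key=RESULT_KEY):
    """Return the fraction of rows whose result label is 'pass'."""
    labels = [row[key] for row in rows if key in row]
    if not labels:
        return 0.0
    return sum(label == "pass" for label in labels) / len(labels)


# Stand-in rows; in the notebook you would pass response["rows"] instead.
sample_rows = [
    {RESULT_KEY: "pass"},
    {RESULT_KEY: "pass"},
    {RESULT_KEY: "fail"},
]
print(f"{pass_rate(sample_rows):.2f}")  # prints 0.67
```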
