# Task Navigation Efficiency Evaluator

## Objective
This sample demonstrates how to use the Task Navigation Efficiency Evaluator on agent data. The supported input formats include:
- simple data such as strings and `dict` objects describing task responses;
- user-agent conversations in the form of a list of agent messages.

## Time

You should expect to spend about 20 minutes running this notebook. 

## Before you begin
This is a deterministic evaluator that compares navigation paths.

### Prerequisite
```bash
pip install azure-ai-evaluation
```
No environment variables are required for this evaluator.

The Task Navigation Efficiency Evaluator measures how efficiently an agent navigates through a sequence of actions compared to an optimal task completion path.

The evaluator reports a binary match result as its primary output, plus precision, recall, and F1 scores as additional detail:

**Primary Result:**
- **Binary Match Result**: Pass/Fail based on the selected matching mode

**Available Matching Modes:**
- **Exact Match**: Agent's tool calls must exactly match the ground truth (default)
- **In-Order Match**: All ground truth steps must appear in correct order (allows extra steps)
- **Any-Order Match**: All ground truth steps must appear with sufficient frequency, in any order; extra steps are allowed (most lenient)
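
The three modes above can be illustrated on plain lists of tool names. This is a minimal sketch of the described semantics, not the evaluator's actual implementation: the function names `exact_match`, `in_order_match`, and `any_order_match` are hypothetical, and "sufficient frequency" is assumed to mean "at least as many occurrences as in the ground truth".

```python
from collections import Counter
from typing import List


def exact_match(agent_steps: List[str], ground_truth: List[str]) -> bool:
    """Pass only when the two sequences are identical."""
    return agent_steps == ground_truth


def in_order_match(agent_steps: List[str], ground_truth: List[str]) -> bool:
    """Pass when ground_truth is a subsequence of agent_steps (extra steps allowed)."""
    remaining = iter(agent_steps)
    # `step in remaining` consumes the iterator up to the first match,
    # so later ground-truth steps can only match later agent steps.
    return all(step in remaining for step in ground_truth)


def any_order_match(agent_steps: List[str], ground_truth: List[str]) -> bool:
    """Pass when every ground-truth step occurs at least as often as required (assumption)."""
    agent_counts = Counter(agent_steps)
    return all(agent_counts[step] >= needed for step, needed in Counter(ground_truth).items())


truth = ["search", "analyze", "report"]
with_extra = ["search", "validate", "analyze", "report"]
wrong_order = ["report", "search", "analyze"]

print(exact_match(with_extra, truth))       # False: the extra step breaks equality
print(in_order_match(with_extra, truth))    # True: extra step, order preserved
print(in_order_match(wrong_order, truth))   # False: "report" comes too early
print(any_order_match(wrong_order, truth))  # True: all steps present
```

Each mode accepts a superset of the paths accepted by the mode before it, which is why any-order matching is the most lenient.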

**Properties Bag Additional Metrics (0.0 - 1.0):**
- **Precision**: The fraction of the agent's steps that were necessary (relevant to the ground truth)
- **Recall**: The fraction of the required steps that the agent executed
- **F1 Score**: The harmonic mean of precision and recall
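
To make these metrics concrete, the sketch below computes them for two lists of tool names. It is an illustration under an assumption, not the evaluator's code: overlap is counted as a multiset intersection, so a repeated agent step is credited only as often as it appears in the ground truth. The helper name `precision_recall_f1` is hypothetical.

```python
from collections import Counter
from typing import Dict, List


def precision_recall_f1(agent_steps: List[str], ground_truth: List[str]) -> Dict[str, float]:
    """Compute precision, recall, and F1 over tool names (illustrative only)."""
    # Multiset intersection: each ground-truth step can be matched at most
    # as many times as it appears in the ground truth.
    overlap = sum((Counter(agent_steps) & Counter(ground_truth)).values())
    precision = overlap / len(agent_steps) if agent_steps else 0.0
    recall = overlap / len(ground_truth) if ground_truth else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


scores = precision_recall_f1(
    ["search", "validate", "analyze", "report"],  # agent path with one extra step
    ["search", "analyze", "report"],              # expected path
)
print(scores)
```

Here three of the agent's four steps are relevant (precision 0.75) and all three required steps were executed (recall 1.0), giving an F1 of roughly 0.857.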

The evaluation requires the following inputs:
- **Response**: The agent's response containing tool calls, given as a list of messages or as a string
- **Ground Truth**: A list of expected tool/action step names as strings, or a tuple of (step names, parameters dict) when parameters must also match
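
For orientation, the message-list format used throughout the samples can be reduced to an ordered list of tool names as sketched below. The helper `extract_tool_names` is hypothetical and only mirrors the message shape shown in this notebook (`role`, `content`, `type`, `name`); the evaluator's internal parsing may differ.

```python
from typing import Any, Dict, List


def extract_tool_names(messages: List[Dict[str, Any]]) -> List[str]:
    """Collect tool-call names from assistant messages, in order of appearance."""
    names: List[str] = []
    for message in messages:
        if message.get("role") != "assistant":
            continue  # only the agent's own messages carry tool calls
        content = message.get("content")
        if not isinstance(content, list):
            continue  # plain-text content has no structured tool calls
        for item in content:
            if isinstance(item, dict) and item.get("type") == "tool_call":
                name = item.get("name")
                if name is not None:
                    names.append(name)
    return names


example_messages = [
    {"role": "user", "content": "Please refund order 12345."},
    {
        "role": "assistant",
        "content": [{"type": "tool_call", "tool_call_id": "call_1", "name": "lookup_order", "arguments": {"order_id": "12345"}}],
    },
    {
        "role": "assistant",
        "content": [{"type": "tool_call", "tool_call_id": "call_2", "name": "process_refund", "arguments": {"order_id": "12345"}}],
    },
]

print(extract_tool_names(example_messages))  # ['lookup_order', 'process_refund']
```

The resulting list has the same shape as the simple ground truth format, which is what makes a step-by-step comparison possible.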

### Initialize Task Navigation Efficiency Evaluator


In [None]:
from azure.ai.evaluation._evaluators._task_navigation_efficiency import _TaskNavigationEfficiencyEvaluator, _TaskNavigationEfficiencyMatchingMode
from pprint import pprint

# Initialize with exact match mode
task_navigation_efficiency_evaluator = _TaskNavigationEfficiencyEvaluator(
    matching_mode=_TaskNavigationEfficiencyMatchingMode.EXACT_MATCH
)

# Other examples:
# For in-order matching (allows extra steps but requires correct order)
# task_navigation_efficiency_evaluator = _TaskNavigationEfficiencyEvaluator(matching_mode=_TaskNavigationEfficiencyMatchingMode.IN_ORDER_MATCH)

# For any-order matching (most lenient - allows extra steps and different order)  
# task_navigation_efficiency_evaluator = _TaskNavigationEfficiencyEvaluator(matching_mode=_TaskNavigationEfficiencyMatchingMode.ANY_ORDER_MATCH)

# Or use defaults (exact match mode)
# task_navigation_efficiency_evaluator = _TaskNavigationEfficiencyEvaluator()

### Samples

#### Sample 1: Perfect Path (Exact Match)

In [None]:
# Agent follows the exact optimal path
response = [
    {
        "role": "assistant",
        "content": [{"type": "tool_call", "tool_call_id": "call_1", "name": "search", "arguments": {}}],
    },
    {
        "role": "assistant", 
        "content": [{"type": "tool_call", "tool_call_id": "call_2", "name": "analyze", "arguments": {}}],
    },
    {
        "role": "assistant",
        "content": [{"type": "tool_call", "tool_call_id": "call_3", "name": "report", "arguments": {}}],
    },
]

ground_truth = ["search", "analyze", "report"]

result = task_navigation_efficiency_evaluator(response=response, ground_truth=ground_truth)
print("Perfect Path Results:")
pprint(result)

#### Sample 2: Efficient Path with Extra Steps

In [None]:
# Agent performs all required steps but adds one unnecessary extra step
response = [
    {
        "role": "assistant",
        "content": [{"type": "tool_call", "tool_call_id": "call_1", "name": "search", "arguments": {}}],
    },
    {
        "role": "assistant",
        "content": [{"type": "tool_call", "tool_call_id": "call_2", "name": "validate", "arguments": {}}],
    },
    {
        "role": "assistant", 
        "content": [{"type": "tool_call", "tool_call_id": "call_3", "name": "analyze", "arguments": {}}],
    },
    {
        "role": "assistant",
        "content": [{"type": "tool_call", "tool_call_id": "call_4", "name": "report", "arguments": {}}],
    },
]

ground_truth = ["search", "analyze", "report"]

result = task_navigation_efficiency_evaluator(response=response, ground_truth=ground_truth)
print("\nPath with Extra Steps Results:")
pprint(result)

#### Sample 3: Inefficient Path (Wrong Order)

In [None]:
# Agent performs all required steps but in the wrong order
response = [
    {
        "role": "assistant",
        "content": [{"type": "tool_call", "tool_call_id": "call_1", "name": "report", "arguments": {}}],
    },
    {
        "role": "assistant",
        "content": [{"type": "tool_call", "tool_call_id": "call_2", "name": "search", "arguments": {}}],
    },
    {
        "role": "assistant", 
        "content": [{"type": "tool_call", "tool_call_id": "call_3", "name": "analyze", "arguments": {}}],
    },
]

ground_truth = ["search", "analyze", "report"]

# Using in-order matching mode to demonstrate the difference
in_order_task_navigation_efficiency_evaluator = _TaskNavigationEfficiencyEvaluator(matching_mode=_TaskNavigationEfficiencyMatchingMode.IN_ORDER_MATCH)

result = in_order_task_navigation_efficiency_evaluator(response=response, ground_truth=ground_truth)
print("\nWrong Order Results:")
pprint(result)

#### Sample 4: Incomplete Path (Missing Steps)

In [None]:
# Agent performs only some of the required steps (incomplete)
response = [
    {
        "role": "assistant",
        "content": [{"type": "tool_call", "tool_call_id": "call_1", "name": "search", "arguments": {}}],
    },
    {
        "role": "assistant",
        "content": [{"type": "tool_call", "tool_call_id": "call_2", "name": "analyze", "arguments": {}}],
    },
]

ground_truth = ["search", "analyze", "report"]

result = task_navigation_efficiency_evaluator(response=response, ground_truth=ground_truth)
print("\nMissing Steps Results:")
pprint(result)

#### Sample 5: Real-World Customer Service Scenario

In [None]:
# Real-world example: Customer service agent handling a refund request
response = [
    {
        "role": "assistant",
        "content": [{"type": "tool_call", "tool_call_id": "call_1", "name": "lookup_order", "arguments": {"order_id": "12345"}}],
    },
    {
        "role": "assistant",
        "content": [{"type": "tool_call", "tool_call_id": "call_2", "name": "check_inventory", "arguments": {"product_id": "ABC123"}}],
    },
    {
        "role": "assistant", 
        "content": [{"type": "tool_call", "tool_call_id": "call_3", "name": "calculate_refund", "arguments": {"order_id": "12345"}}],
    },
    {
        "role": "assistant",
        "content": [{"type": "tool_call", "tool_call_id": "call_4", "name": "process_refund", "arguments": {"order_id": "12345", "amount": "29.99"}}],
    },
]

ground_truth = ["lookup_order", "calculate_refund", "process_refund"]

result = task_navigation_efficiency_evaluator(response=response, ground_truth=ground_truth)
print("\nCustomer Service Results:")
pprint(result)

#### Sample 6: Complex Path with Duplicates

In [None]:
# Agent repeats some steps and includes extra ones
response = [
    {
        "role": "assistant",
        "content": [{"type": "tool_call", "tool_call_id": "call_1", "name": "search", "arguments": {}}],
    },
    {
        "role": "assistant",
        "content": [{"type": "tool_call", "tool_call_id": "call_2", "name": "search", "arguments": {}}],  # duplicate
    },
    {
        "role": "assistant", 
        "content": [{"type": "tool_call", "tool_call_id": "call_3", "name": "validate", "arguments": {}}],  # extra step
    },
    {
        "role": "assistant",
        "content": [{"type": "tool_call", "tool_call_id": "call_4", "name": "analyze", "arguments": {}}],
    },
    {
        "role": "assistant",
        "content": [{"type": "tool_call", "tool_call_id": "call_5", "name": "report", "arguments": {}}],
    },
]

ground_truth = ["search", "analyze", "report"]

result = task_navigation_efficiency_evaluator(response=response, ground_truth=ground_truth)
print("\nComplex Path with Duplicates Results:")
pprint(result)

#### Sample 7: Tuple Format with Parameters

In [None]:
# The evaluator also accepts ground truth in tuple format, so that tool parameters are matched exactly
response_with_params = [
    {
        "role": "assistant",
        "content": [{"type": "tool_call", "tool_call_id": "call_1", "name": "search", "arguments": {"query": "test"}}],
    },
]

# Ground truth using tuple format: (tool_names, parameters_dict)
# Parameters must match exactly for tools to be considered matching
ground_truth_with_params = (["search"], {"search": {"query": "test"}})

result = task_navigation_efficiency_evaluator(response=response_with_params, ground_truth=ground_truth_with_params)
print("\nTuple Format with Parameters Results:")
pprint(result)

#### Sample 8: String Response Input Type

In [None]:
# Demonstrate string response input type
# The string response should contain structured tool call information that can be parsed
string_response = "I'll help you with that. Let me search for information, then analyze the results, and finally provide a report."
ground_truth = ["search", "analyze", "report"]

result = task_navigation_efficiency_evaluator(response=string_response, ground_truth=ground_truth)
print("\nString Response Results:")
pprint(result)