# Path Efficiency Evaluator

### Getting Started

This sample demonstrates how to use the Path Efficiency Evaluator.
Before running the sample, install the required packages:
```bash
pip install azure-ai-projects azure-identity azure-ai-evaluation
```
Note: The Path Efficiency Evaluator is rule-based, so it does not require an Azure OpenAI configuration.

The Path Efficiency Evaluator measures how efficiently an agent's sequence of tool calls follows an expected (ground truth) path.

The evaluator provides multiple metrics:

**Numeric Scores (0.0 - 1.0):**
- **Precision**: The fraction of the agent's steps that were necessary (present in the ground truth)
- **Recall**: The fraction of the required steps that the agent executed
- **F1 Score**: Harmonic mean of precision and recall
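
As a rough illustration of the numeric scores, the sketch below computes precision, recall, and F1 over plain lists of step names using multiset counts. The function name `precision_recall_f1` is hypothetical, and the counting rule is an assumption based on the descriptions above; the evaluator's internal implementation may differ.

```python
from collections import Counter


def precision_recall_f1(agent_steps, ground_truth_steps):
    """Hypothetical helper: multiset-based precision, recall, and F1."""
    if not agent_steps or not ground_truth_steps:
        return 0.0, 0.0, 0.0

    agent_counts = Counter(agent_steps)
    truth_counts = Counter(ground_truth_steps)

    # Steps that appear in both lists, counted with multiplicity
    true_positives = sum((agent_counts & truth_counts).values())

    precision = true_positives / len(agent_steps)
    recall = true_positives / len(ground_truth_steps)
    total = precision + recall
    f1 = 2 * precision * recall / total if total > 0 else 0.0
    return precision, recall, f1


# One extra step ("validate") lowers precision but not recall
precision, recall, f1 = precision_recall_f1(
    ["search", "validate", "analyze", "report"],
    ["search", "analyze", "report"],
)
print(f"{precision:.3f} {recall:.3f} {f1:.3f}")  # prints "0.750 1.000 0.857"
```

Under this assumption, extra or duplicated steps reduce precision, while missing steps reduce recall.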

**Binary Match Results (pass/fail):**
- **Exact Match**: Whether the agent's steps exactly match the ground truth
- **In-Order Match**: Whether all ground truth steps appear in correct order (allows extra steps)
- **Any-Order Match**: Whether every ground truth step appears at least as many times as the ground truth requires (ignores order, allows extra steps)
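
The three binary checks can be sketched over plain lists of step names as follows. The function names are hypothetical, and the logic is an assumption derived from the descriptions above rather than the evaluator's actual code.

```python
from collections import Counter


def exact_match(agent_steps, truth_steps):
    # Same steps, same order, nothing extra
    return list(agent_steps) == list(truth_steps)


def in_order_match(agent_steps, truth_steps):
    # Subsequence check: each `in` consumes the iterator up to the match,
    # so later ground truth steps must appear after earlier ones
    remaining = iter(agent_steps)
    return all(step in remaining for step in truth_steps)


def any_order_match(agent_steps, truth_steps):
    # Each ground truth step must occur at least as often as required
    agent_counts = Counter(agent_steps)
    truth_counts = Counter(truth_steps)
    return all(agent_counts[step] >= n for step, n in truth_counts.items())


truth = ["search", "analyze", "report"]
wrong_order = ["report", "search", "analyze"]

print(exact_match(wrong_order, truth))      # prints "False"
print(in_order_match(wrong_order, truth))   # prints "False"
print(any_order_match(wrong_order, truth))  # prints "True"
```

In this sketch, an exact match implies an in-order match, which in turn implies an any-order match.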

The evaluation requires the following inputs:
- **Response**: The agent's response containing tool calls as a list of messages
- **Ground Truth**: List of expected tool/action steps as strings, or a tuple of (tool names, parameters dict) when parameters must also match (see Sample 8)
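
To make the response format concrete, the sketch below pulls the ordered tool names out of a list of messages shaped like the ones used in the samples. `extract_tool_names` is a hypothetical helper written for illustration; it is not part of the SDK, and the evaluator may parse responses differently.

```python
def extract_tool_names(response):
    """Hypothetical helper: collect tool call names in the order they occur."""
    names = []
    for message in response:
        # Only assistant messages carry tool calls in this format
        if message.get("role") != "assistant":
            continue
        content = message.get("content")
        if not isinstance(content, list):
            continue  # plain-text content has no tool calls
        for item in content:
            if isinstance(item, dict) and item.get("type") == "tool_call":
                name = item.get("name")
                if name:
                    names.append(name)
    return names


example_response = [
    {"role": "user", "content": "Find and analyze the data."},
    {
        "role": "assistant",
        "content": [{"type": "tool_call", "tool_call_id": "call_1", "name": "search", "arguments": {}}],
    },
    {
        "role": "assistant",
        "content": [{"type": "tool_call", "tool_call_id": "call_2", "name": "analyze", "arguments": {}}],
    },
]

tool_names = extract_tool_names(example_response)
print(tool_names)  # prints "['search', 'analyze']"
```

The resulting list is what gets compared against the ground truth steps.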

### Initialize Path Efficiency Evaluator

In [None]:
from azure.ai.evaluation._evaluators._path_efficiency import PathEfficiencyEvaluator
from pprint import pprint

# Initialize with custom thresholds
path_efficiency_evaluator = PathEfficiencyEvaluator(
    precision_threshold=0.7,
    recall_threshold=0.8, 
    f1_score_threshold=0.75
)

# Or use default thresholds (0.5 for all metrics)
# path_efficiency_evaluator = PathEfficiencyEvaluator()

### Samples

#### Sample 1: Perfect Path (Exact Match)

In [None]:
# Agent follows the exact optimal path
response = [
    {
        "role": "assistant",
        "content": [{"type": "tool_call", "tool_call_id": "call_1", "name": "search", "arguments": {}}],
    },
    {
        "role": "assistant", 
        "content": [{"type": "tool_call", "tool_call_id": "call_2", "name": "analyze", "arguments": {}}],
    },
    {
        "role": "assistant",
        "content": [{"type": "tool_call", "tool_call_id": "call_3", "name": "report", "arguments": {}}],
    },
]

ground_truth = ["search", "analyze", "report"]

result = path_efficiency_evaluator(response=response, ground_truth=ground_truth)
print("Perfect Path Results:")
pprint(result)

#### Sample 2: Efficient Path with Extra Steps

In [None]:
# Agent includes extra steps but maintains correct order
response = [
    {
        "role": "assistant",
        "content": [{"type": "tool_call", "tool_call_id": "call_1", "name": "search", "arguments": {}}],
    },
    {
        "role": "assistant",
        "content": [{"type": "tool_call", "tool_call_id": "call_2", "name": "validate", "arguments": {}}],
    },
    {
        "role": "assistant", 
        "content": [{"type": "tool_call", "tool_call_id": "call_3", "name": "analyze", "arguments": {}}],
    },
    {
        "role": "assistant",
        "content": [{"type": "tool_call", "tool_call_id": "call_4", "name": "report", "arguments": {}}],
    },
]

ground_truth = ["search", "analyze", "report"]

result = path_efficiency_evaluator(response=response, ground_truth=ground_truth)
print("\nPath with Extra Steps Results:")
pprint(result)

#### Sample 3: Inefficient Path (Wrong Order)

In [None]:
# Agent performs all required steps but in the wrong order
response = [
    {
        "role": "assistant",
        "content": [{"type": "tool_call", "tool_call_id": "call_1", "name": "report", "arguments": {}}],
    },
    {
        "role": "assistant",
        "content": [{"type": "tool_call", "tool_call_id": "call_2", "name": "search", "arguments": {}}],
    },
    {
        "role": "assistant", 
        "content": [{"type": "tool_call", "tool_call_id": "call_3", "name": "analyze", "arguments": {}}],
    },
]

ground_truth = ["search", "analyze", "report"]

result = path_efficiency_evaluator(response=response, ground_truth=ground_truth)
print("\nWrong Order Results:")
pprint(result)

#### Sample 4: Incomplete Path (Missing Steps)

In [None]:
# Agent misses some required steps
response = [
    {
        "role": "assistant",
        "content": [{"type": "tool_call", "tool_call_id": "call_1", "name": "search", "arguments": {}}],
    },
    {
        "role": "assistant",
        "content": [{"type": "tool_call", "tool_call_id": "call_2", "name": "analyze", "arguments": {}}],
    },
]

ground_truth = ["search", "analyze", "report"]

result = path_efficiency_evaluator(response=response, ground_truth=ground_truth)
print("\nMissing Steps Results:")
pprint(result)

#### Sample 5: Real-World Customer Service Scenario

In [None]:
# Customer service agent handling an order inquiry
# (check_inventory is an extra step that is not in the ground truth)
response = [
    {
        "role": "assistant",
        "content": [{"type": "tool_call", "tool_call_id": "call_1", "name": "lookup_order", "arguments": {"order_id": "12345"}}],
    },
    {
        "role": "assistant",
        "content": [{"type": "tool_call", "tool_call_id": "call_2", "name": "check_inventory", "arguments": {"product_id": "ABC123"}}],
    },
    {
        "role": "assistant", 
        "content": [{"type": "tool_call", "tool_call_id": "call_3", "name": "calculate_refund", "arguments": {"order_id": "12345"}}],
    },
    {
        "role": "assistant",
        "content": [{"type": "tool_call", "tool_call_id": "call_4", "name": "process_refund", "arguments": {"order_id": "12345", "amount": "29.99"}}],
    },
]

ground_truth = ["lookup_order", "calculate_refund", "process_refund"]

result = path_efficiency_evaluator(response=response, ground_truth=ground_truth)
print("\nCustomer Service Results:")
pprint(result)

#### Sample 6: Complex Path with Duplicates

In [None]:
# Agent repeats some steps and includes extra ones
response = [
    {
        "role": "assistant",
        "content": [{"type": "tool_call", "tool_call_id": "call_1", "name": "search", "arguments": {}}],
    },
    {
        "role": "assistant",
        "content": [{"type": "tool_call", "tool_call_id": "call_2", "name": "search", "arguments": {}}],  # duplicate
    },
    {
        "role": "assistant", 
        "content": [{"type": "tool_call", "tool_call_id": "call_3", "name": "validate", "arguments": {}}],  # extra step
    },
    {
        "role": "assistant",
        "content": [{"type": "tool_call", "tool_call_id": "call_4", "name": "analyze", "arguments": {}}],
    },
    {
        "role": "assistant",
        "content": [{"type": "tool_call", "tool_call_id": "call_5", "name": "report", "arguments": {}}],
    },
]

ground_truth = ["search", "analyze", "report"]

result = path_efficiency_evaluator(response=response, ground_truth=ground_truth)
print("\nComplex Path with Duplicates Results:")
pprint(result)

#### Sample 7: Edge Cases and Error Scenarios

In [None]:
# Test with empty agent response
try:
    response = []
    ground_truth = ["search", "analyze", "report"]
    
    result = path_efficiency_evaluator(response=response, ground_truth=ground_truth)
    print("\nEmpty Response Results:")
    pprint(result)
except Exception as e:
    print(f"Error with empty response: {e}")

# Test with empty ground truth (expected to raise an error)
try:
    response = [
        {
            "role": "assistant",
            "content": [{"type": "tool_call", "tool_call_id": "call_1", "name": "search", "arguments": {}}],
        }
    ]
    ground_truth = []
    
    result = path_efficiency_evaluator(response=response, ground_truth=ground_truth)
    print("\nEmpty Ground Truth Results:")
    pprint(result)
except Exception as e:
    print(f"Error with empty ground truth: {e}")

#### Sample 8: Tuple Format with Parameters

In [None]:
# PathEfficiencyEvaluator also supports tuple format with parameters for exact parameter matching
response_with_params = [
    {
        "role": "assistant",
        "content": [{"type": "tool_call", "tool_call_id": "call_1", "name": "search", "arguments": {"query": "test"}}],
    },
]

# Ground truth using tuple format: (tool_names, parameters_dict)
# Parameters must match exactly for tools to be considered matching
ground_truth_with_params = (["search"], {"search": {"query": "test"}})

result = path_efficiency_evaluator(response=response_with_params, ground_truth=ground_truth_with_params)
print("\nTuple Format with Parameters Results:")
pprint(result)

### Evaluation Analysis Helper Function

In [None]:
def analyze_path_efficiency(response, ground_truth, scenario_name):
    """Run the evaluator on one scenario, print its scores and match results, and return the result."""
    result = path_efficiency_evaluator(response=response, ground_truth=ground_truth)
    
    print(f"\n{'='*50}")
    print(f"Analysis for: {scenario_name}")
    print(f"{'='*50}")
    
    print(f"Ground Truth Steps: {ground_truth}")
    print(f"{'='*50}")
    
    # Numeric scores
    print("Numeric Scores:")
    print(f"  Precision: {result['path_efficiency_precision_score']:.3f}")
    print(f"  Recall: {result['path_efficiency_recall_score']:.3f}")
    print(f"  F1 Score: {result['path_efficiency_f1_score']:.3f}")

    # Binary matches
    print("\nBinary Match Results:")
    print(f"  Exact Match: {result['path_efficiency_exact_match_result']}")
    print(f"  In-Order Match: {result['path_efficiency_in_order_match_result']}")
    print(f"  Any-Order Match: {result['path_efficiency_any_order_match_result']}")

    return result

### Example Usage of Helper Function

In [None]:
# Example: Using the helper function to analyze different scenarios

# Scenario 1: Perfect efficiency
perfect_response = [
    {"role": "assistant", "content": [{"type": "tool_call", "tool_call_id": "call_1", "name": "authenticate", "arguments": {}}]},
    {"role": "assistant", "content": [{"type": "tool_call", "tool_call_id": "call_2", "name": "fetch_data", "arguments": {}}]},
    {"role": "assistant", "content": [{"type": "tool_call", "tool_call_id": "call_3", "name": "process_result", "arguments": {}}]},
]
perfect_ground_truth = ["authenticate", "fetch_data", "process_result"]

analyze_path_efficiency(perfect_response, perfect_ground_truth, "Perfect Efficiency Example")

# Scenario 2: Inefficient with extra steps
inefficient_response = [
    {"role": "assistant", "content": [{"type": "tool_call", "tool_call_id": "call_1", "name": "authenticate", "arguments": {}}]},
    {"role": "assistant", "content": [{"type": "tool_call", "tool_call_id": "call_2", "name": "log_attempt", "arguments": {}}]},  # extra
    {"role": "assistant", "content": [{"type": "tool_call", "tool_call_id": "call_3", "name": "fetch_data", "arguments": {}}]},
    {"role": "assistant", "content": [{"type": "tool_call", "tool_call_id": "call_4", "name": "validate_data", "arguments": {}}]},  # extra
    {"role": "assistant", "content": [{"type": "tool_call", "tool_call_id": "call_5", "name": "process_result", "arguments": {}}]},
]
inefficient_ground_truth = ["authenticate", "fetch_data", "process_result"]

analyze_path_efficiency(inefficient_response, inefficient_ground_truth, "Inefficient Path with Extra Steps")