# Tool Call Accuracy Evaluator

The Tool Call Accuracy evaluator assesses how accurately an AI uses tools by examining:
- Relevance to the conversation
- Parameter correctness according to tool definitions
- Parameter value extraction from the conversation
- Potential usefulness of the tool call
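
The parameter-correctness criterion can be partly approximated with a plain structural check before any evaluation run. The sketch below is not how the evaluator works internally (it takes a model configuration and returns a model-written reason). `check_call_against_definition` is a hypothetical helper, not part of `azure.ai.evaluation`, and it assumes the tool call and tool definition shapes used in the samples in this notebook.

```python
def check_call_against_definition(tool_call, tool_definitions):
    """Return a list of structural problems found in one tool call (empty if none)."""
    function = tool_call["tool_call"]["function"]
    name = function["name"]
    arguments = function.get("arguments", {})

    # Find the definition whose name matches the called function.
    definition = next((d for d in tool_definitions if d["name"] == name), None)
    if definition is None:
        return [f"unknown tool: {name}"]

    # Flag arguments that the definition does not declare.
    declared = definition.get("parameters", {}).get("properties", {})
    return [f"undeclared parameter: {arg}" for arg in arguments if arg not in declared]


# Illustrative inputs in the same shape as the samples in this notebook.
weather_definition = {
    "name": "fetch_weather",
    "description": "Fetches the weather information for the specified location.",
    "parameters": {
        "type": "object",
        "properties": {"location": {"type": "string"}},
    },
}
good_call = {
    "type": "tool_call",
    "tool_call": {
        "type": "function",
        "function": {"name": "fetch_weather", "arguments": {"location": "Seattle"}},
    },
}

print(check_call_against_definition(good_call, [weather_definition]))  # → []
```

A check like this only catches unknown tools and undeclared parameter names. Whether the values were correctly taken from the conversation still needs the evaluator's judgment.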

The evaluator uses a binary scoring system (0 or 1):
- Score 0: The tool call is irrelevant or contains information not in the conversation/definition
- Score 1: The tool call is relevant with properly extracted parameters from the conversation

This evaluation measures whether tool calls meaningfully contribute to addressing the query while following the tool definitions and using information present in the conversation history.
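
Because each tool call gets its own binary verdict, it can be handy to roll the verdicts up into a single pass rate. The sketch below assumes only the `per_tool_call_details` list and `tool_call_accurate` flag that appear in the sample outputs later in this notebook. `summarize_tool_call_results` is a hypothetical helper, and `sample_result` is a hand-written stand-in, not real evaluator output.

```python
def summarize_tool_call_results(result):
    """Count accurate tool calls in an evaluator result dictionary."""
    details = result.get("per_tool_call_details", [])
    flags = [bool(d.get("tool_call_accurate")) for d in details]
    accurate = sum(flags)
    return {
        "total": len(flags),
        "accurate": accurate,
        # None signals that there were no tool calls to score.
        "pass_rate": accurate / len(flags) if flags else None,
    }


# Hand-written placeholder in the shape of the evaluator output (assumption).
sample_result = {
    "per_tool_call_details": [
        {"tool_call_accurate": False, "tool_call_accurate_reason": "placeholder"},
        {"tool_call_accurate": True, "tool_call_accurate_reason": "placeholder"},
    ]
}

print(summarize_tool_call_results(sample_result))
# → {'total': 2, 'accurate': 1, 'pass_rate': 0.5}
```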

Tool Call Accuracy takes the following inputs:
- Query - A single query or a list of messages (the conversation history with the agent). The latter helps determine whether the agent used information from the history to make the right tool calls.
- Tool Calls - (Optional) Tool call(s) made by the agent to answer the query. If not provided, the evaluator looks for tool calls in the response.
- Response - (Optional) Response from the agent (or any GenAI app). This can be a single text response or a list of messages generated as part of the agent's response. If tool calls are not provided, the evaluator looks for them in the response.
- Tool Definitions - Definition(s) of the tool(s) used by the agent to answer the query.


### Initialize Tool Call Accuracy Evaluator


In [1]:
import os
from pprint import pprint

from azure.ai.evaluation import AzureOpenAIModelConfiguration, ToolCallAccuracyEvaluator

model_config = AzureOpenAIModelConfiguration(
    azure_endpoint="https://ai-anksingevaluatehub493534796867.openai.azure.com",
    # Read the key from the environment instead of hardcoding it in the notebook.
    # AZURE_OPENAI_API_KEY is an assumed variable name; use whatever your setup defines.
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version="2025-01-01-preview",
    azure_deployment="gpt-4o-mini",
)

tool_call_accuracy = ToolCallAccuracyEvaluator(model_config=model_config)

### Samples

#### Evaluating Single Tool Call

In [6]:
query = "How is the weather in Seattle ?"
tool_call = {
    "type": "tool_call",
    "tool_call": {
        "id": "call_eYtq7fMyHxDWIgeG2s26h0lJ",
        "type": "function",
        "function": {
            "name": "fetch_weather",
            "arguments": {
                "location": "Seattle"
            }
        }
    }
}

tool_definition = {
    "id": "fetch_weather",
    "name": "fetch_weather",
    "description": "Fetches the weather information for the specified location.",
    "parameters": {
        "type": "object",
        "properties": {
            "location": {
                "type": "string",
                "description": "The location to fetch weather for."
            }
        }
    }
}

In [7]:
response = tool_call_accuracy(query=query, tool_calls=tool_call, tool_definitions=tool_definition)
pprint(response)

{'per_tool_call_details': [{'tool_call_accurate': True,
                            'tool_call_accurate_reason': 'The TOOL CALL is '
                                                         'directly relevant to '
                                                         "the user's inquiry "
                                                         'about the weather in '
                                                         'Seattle, uses '
                                                         'appropriate '
                                                         'parameters that '
                                                         'match the TOOL '
                                                         'DEFINITION, and the '
                                                         'parameter values are '
                                                         'correctly inferred '
                                                         'from the '
                    

#### Multiple Tool Calls used by Agent to respond

In [None]:
query = "How is the weather in Seattle ?"
tool_calls = [
    {
        "type": "tool_call",
        "tool_call": {
            "id": "call_eYtq7fMyHxDWIgeG2s26h0lJ",
            "type": "function",
            "function": {
                "name": "fetch_weather",
                "arguments": {
                    "location": "New York"
                }
            }
        }
    },
    {
        "type": "tool_call",
        "tool_call": {
            "id": "call_eYtq7fMyHxDWIgeG2s26h0lJ",
            "type": "function",
            "function": {
                "name": "fetch_weather",
                "arguments": {
                    "location": "Seattle"
                }
            }
        }
    }
]

tool_definition = {
    "id": "fetch_weather",
    "name": "fetch_weather",
    "description": "Fetches the weather information for the specified location.",
    "parameters": {
        "type": "object",
        "properties": {
            "location": {
                "type": "string",
                "description": "The location to fetch weather for."
            }
        }
    }
}

In [5]:
response = tool_call_accuracy(query=query, tool_calls=tool_calls, tool_definitions=tool_definition)
pprint(response)

{'per_tool_call_details': [{'tool_call_accurate': False,
                            'tool_call_accurate_reason': 'The TOOL CALL is '
                                                         'irrelevant to the '
                                                         "user's query about "
                                                         "Seattle's weather, "
                                                         'as it attempts to '
                                                         'fetch weather '
                                                         'information for New '
                                                         'York instead. The '
                                                         'parameters used do '
                                                         'not align with the '
                                                         "user's needs, "
                                                         'leading to a score '
         

#### Tool Calls passed as part of `Response` (common in agent scenarios)
- When `tool_calls` is not provided, the Tool Call Accuracy Evaluator extracts the tool calls from `response`.
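
To make the extraction step concrete, here is a minimal sketch of pulling tool calls out of a message list. It is an illustration under assumptions, not the library's actual implementation: `extract_tool_calls` is a hypothetical name, and the message shape (a `role` plus a `content` list of typed items) simply mirrors the sample response in this section.

```python
def extract_tool_calls(messages):
    """Collect every content item of type "tool_call" from assistant messages."""
    tool_calls = []
    for message in messages:
        if message.get("role") != "assistant":
            continue  # tool results and other roles carry no tool calls
        content = message.get("content", [])
        if isinstance(content, str):
            continue  # plain-text responses contain no structured items
        for item in content:
            if item.get("type") == "tool_call":
                tool_calls.append(item)
    return tool_calls


# Trimmed-down messages in the same shape as the sample response (illustrative).
example_messages = [
    {
        "role": "assistant",
        "content": [
            {
                "type": "tool_call",
                "tool_call": {
                    "type": "function",
                    "function": {"name": "fetch_weather", "arguments": {"location": "Seattle"}},
                },
            }
        ],
    },
    {
        "role": "tool",
        "content": [{"type": "tool_result", "tool_result": {"weather": "n/a"}}],
    },
    {"role": "assistant", "content": [{"type": "text", "text": "Done."}]},
]

names = [call["tool_call"]["function"]["name"] for call in extract_tool_calls(example_messages)]
print(names)  # → ['fetch_weather']
```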

In [8]:
query = "Can you send me an email with weather information for Seattle?"
response = [
    {
        "createdAt":"2025-03-19T03:27:23Z",
        "run_id":"run_er847tIPYcYO94i2C1vkfOeZ",
        "role":"assistant",
        "content":[
            {
                "type":"tool_call",
                "tool_call":{
                    "id":"call_WfANEHcwQ5E6nsDXWMzsoCXM",
                    "type":"function",
                    "function":{
                        "name":"fetch_weather",
                        "arguments":{
                            "location":"Seattle"
                        }
                    }
                }
            }
        ]
    },
    {
        "createdAt":"2025-03-19T03:27:25Z",
        "run_id":"run_er847tIPYcYO94i2C1vkfOeZ",
        "tool_call_id":"call_WfANEHcwQ5E6nsDXWMzsoCXM",
        "role":"tool",
        "content":[
            {
                "type":"tool_result",
                "tool_result":{
                    "weather":"Weather data not available for this location."
                }
            }
        ]
    },
    {
        "createdAt":"2025-03-19T03:27:26Z",
        "run_id":"run_er847tIPYcYO94i2C1vkfOeZ",
        "role":"assistant",
        "content":[
            {
                "type":"tool_call",
                "tool_call":{
                    "id":"call_Q0XvtDRW43Fvgy012GzjjCek",
                    "type":"function",
                    "function":{
                        "name":"send_email",
                        "arguments":{
                            "recipient":"you@example.com",
                            "subject":"Weather Information for Seattle",
                            "body":"Weather data not available for this location."
                        }
                    }
                }
            }
        ]
    },
    {
        "createdAt":"2025-03-19T03:27:28Z",
        "run_id":"run_er847tIPYcYO94i2C1vkfOeZ",
        "tool_call_id":"call_Q0XvtDRW43Fvgy012GzjjCek",
        "role":"tool",
        "content":[
            {
                "type":"tool_result",
                "tool_result":{
                    "message":"Email successfully sent to you@example.com."
                }
            }
        ]
    },
    {
        "createdAt":"2025-03-19T03:27:29Z",
        "run_id":"run_er847tIPYcYO94i2C1vkfOeZ",
        "role":"assistant",
        "content":[
            {
                "type":"text",
                "text":"I have sent you an email with the weather information for Seattle. The weather data was not available for this location."
            }
        ]
    }]

tool_definitions = [
    {
        "name": "fetch_weather",
        "description": "Fetches the weather information for the specified location.",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {
                    "type": "string",
                    "description": "The location to fetch weather for."
                }
            }
        }
    },
    {
        "name": "send_email",
        "description": "Sends an email with the specified subject and body to the recipient.",
        "parameters": {
            "type": "object",
            "properties": {
                "recipient": {
                    "type": "string",
                    "description": "Email address of the recipient."
                },
                "subject": {
                    "type": "string",
                    "description": "Subject of the email."
                },
                "body": {
                    "type": "string",
                    "description": "Body content of the email."
                }
            }
        }
    }
]

In [10]:
response = tool_call_accuracy(query=query, response=response, tool_definitions=tool_definitions)
pprint(response)

{'per_tool_call_details': [{'tool_call_accurate': True,
                            'tool_call_accurate_reason': 'The TOOL CALL is '
                                                         'directly relevant to '
                                                         "the user's request "
                                                         'for weather '
                                                         'information for '
                                                         'Seattle, includes '
                                                         'appropriate '
                                                         'parameters that '
                                                         'match the TOOL '
                                                         'DEFINITION, and uses '
                                                         'correct parameter '
                                                         'values inferred from '
                  