## Test Tool Calling from AI Agent

### Setting Up dotenv & the LLM

In [1]:
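from dotenv import load_dotenv
from langchain_ollama import ChatOllama

# Load the first .env found (in a notebook, searched from the working
# directory upward), overriding variables that are already set
load_dotenv(override=True)
# Then load ../.env; without override=True it only fills in missing variables
load_dotenv('./../.env')

# Local Qwen3 model served by Ollama on its default port
llm = ChatOllama(
    base_url="http://localhost:11434",
    model="qwen3:latest",
    temperature=0.5,
    max_tokens=300
)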

### Connect to DeepEval

In [2]:
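import os
import deepeval

# Read the Confident AI key from the environment (loaded from .env above)
# instead of hard-coding a secret in the notebook. The variable name
# CONFIDENT_API_KEY is an assumption; use whatever name your .env defines.
deepeval.login_with_confident_api_key(os.environ["CONFIDENT_API_KEY"])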

### Creating the Tools for the AI Agent

In [3]:
from langchain_core.tools import tool

# The docstring of each @tool function becomes the tool description the LLM sees

@tool
def add_numbers(a: int, b: int) -> int:
    """Use this tool to add two numbers and return the result."""
    return a + b

@tool
def subtract_numbers(a: int, b: int) -> int:
    """Use this tool to subtract two numbers and return the result."""
    return a - b

@tool
def multiply_numbers(a: float, b: float) -> float:
    """Use this tool to multiply two numbers and return the result."""
    return a * b


In [4]:
list_of_tools = [add_numbers, subtract_numbers, multiply_numbers]
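The structured-chat agent set up next asks the LLM to answer with a JSON blob of the form `{"action": ..., "action_input": {...}}` (visible in the verbose logs in this notebook) and then calls the tool with that name. The sketch below is a minimal, framework-free illustration of that dispatch step, assuming plain Python functions (typed as `float` for simplicity) in place of the LangChain `@tool` objects; `TOOLS` and `dispatch_action` are hypothetical names, not LangChain APIs.

```python
import json
from typing import Any, Callable, Dict

# Plain-Python stand-ins for the three LangChain tools defined above
def add_numbers(a: float, b: float) -> float:
    return a + b

def subtract_numbers(a: float, b: float) -> float:
    return a - b

def multiply_numbers(a: float, b: float) -> float:
    return a * b

# Hypothetical registry mapping tool names to callables
TOOLS: Dict[str, Callable[..., Any]] = {
    "add_numbers": add_numbers,
    "subtract_numbers": subtract_numbers,
    "multiply_numbers": multiply_numbers,
}

def dispatch_action(blob: str) -> Any:
    """Parse a structured-chat style action blob and run the named tool.

    Returns the tool's result, or the action_input unchanged when the
    action is "Final Answer" (the agent's signal to stop).
    """
    action = json.loads(blob)
    name = action["action"]
    args = action["action_input"]
    if name == "Final Answer":
        return args
    if name not in TOOLS:
        raise KeyError(f"Unknown tool: {name}")
    # Keyword arguments come straight from the JSON object
    return TOOLS[name](**args)

print(dispatch_action('{"action": "subtract_numbers", "action_input": {"a": 10, "b": 4}}'))
```

A real agent also has to recover from malformed blobs and unknown tool names; in this sketch those cases simply raise `json.JSONDecodeError` or `KeyError`.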

### Setting Up the AI Agent to Invoke Custom Tools

In [5]:
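from langchain.agents import initialize_agent, AgentType

# Note: initialize_agent is deprecated in recent LangChain releases and
# emits a LangChainDeprecationWarning, but it still runs here.
agent = initialize_agent(
    llm=llm,
    tools=list_of_tools,
    # Structured-chat ReAct agent: supports tools with multiple arguments
    agent=AgentType.STRUCTURED_CHAT_ZERO_SHOT_REACT_DESCRIPTION,
    verbose=True,
    # Include the (AgentAction, observation) pairs in the response
    return_intermediate_steps=True
)

response = agent.invoke("I had 35 dollars. I spent 20 & got a 10 percent cashback, how much do I have left?")

print(response['intermediate_steps'])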

[1m> Entering new AgentExecutor chain...[0m
[32;1m[1;3m<think>
Okay, let's see. The user started with $35. They spent $20, so first I need to subtract 20 from 35. That would be 35 - 20 = 15. Then, they got a 10% cashback on the $20 spent. Wait, cashback is usually a percentage of the amount spent. So 10% of 20 is 2. So adding that back to the remaining 15, the total would be 15 + 2 = 17. Let me check if I need to use any tools here. The tools available are add_numbers, subtract_numbers, multiply_numbers. So subtract 20 from 35 first, then calculate 10% of 20, which is 2, then add that back. So the steps are subtract_numbers(35,20) gives 15, then multiply_numbers(20, 0.10) gives 2, then add_numbers(15,2) gives 17. So the final answer should be 17.
</think>

Thought: I need to calculate the remaining money after spending and adding the cashback. First, subtract the spent amount from the initial amount, then add the cashback which is 10% of the spent amount.

Action:
```
{
  "action
```

### Preparing the Golden Dataset for DeepEval Test Cases

In [6]:
from deepeval.test_case import ToolCall

golden_dataset = [

    {
        "question": "What is 10 - 4?",
        "expected_answer": "The difference between 10 and 4 is 6.",
        "tool_called": [
            ToolCall(name="subtract_numbers", input_parameters={"a": 10, "b": 4})
        ]
    },
    {
        "question": "What is 2.5 * 4?",
        "expected_answer": "The product of 2.5 and 4 is 10.0.",
        "tool_called": [
            ToolCall(name="multiply_numbers", input_parameters={"a": 2.5, "b": 4})
        ]
    },
    {
        "question": "My salary was $9500 & I got a 15 percent raise yesterday, how much do I earn now?",
        "expected_answer": "You now earn $10925 after a 15 percent raise on your previous salary of $9500.",
        "tool_called": [
            ToolCall(name="multiply_numbers", input_parameters={"a": 9500, "b": 1.15})
        ]
    },
    {
        "question": "I had 35 dollars. I spent 20 & got a 10 percent cashback, how much do I have left?",
        "expected_answer": "You have a total of 17 dollars left after spending 20 dollars and receiving a 10 percent cashback of 2 dollars.",
        "tool_called": [
            ToolCall(name="subtract_numbers", input_parameters={"a": 35, "b": 20}),
            ToolCall(name="multiply_numbers", input_parameters={"a": 20, "b": 0.1}),
            ToolCall(name="add_numbers", input_parameters={"a": 15, "b": 2})
        ]
    }
]

golden_dataset

[{'question': 'What is 10 - 4?',
  'expected_answer': 'The difference between 10 and 4 is 6.',
  'tool_called': [ToolCall(
       name="subtract_numbers",
       input_parameters={
           "a": 10,
           "b": 4
       }
   )]},
 {'question': 'What is 2.5 * 4?',
  'expected_answer': 'The product of 2.5 and 4 is 10.0.',
  'tool_called': [ToolCall(
       name="multiply_numbers",
       input_parameters={
           "a": 2.5,
           "b": 4
       }
   )]},
 {'question': 'My salary was $9500 & I got a 15 percent raise yesterday, how much do I earn now?',
  'expected_answer': 'You now earn $10925 after a 15 percent raise on your previous salary of $9500.',
  'tool_called': [ToolCall(
       name="multiply_numbers",
       input_parameters={
           "a": 9500,
           "b": 1.15
       }
   )]},
 {'question': 'I had 35 dollars. I spent 20 & got a 10 percent cashback, how much do I have left?',
  'expected_answer': 'You have a total of 17 dollars left after spending 20 dollar
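Because each golden pairs an expected answer with the exact tool calls that should produce it, it is worth checking that those calls really yield the numbers quoted in `expected_answer` (for the cashback question: 35 - 20 = 15, 20 * 0.1 = 2, 15 + 2 = 17). A minimal sketch, assuming plain `(name, inputs)` tuples in place of DeepEval's `ToolCall` objects; `OPS`, `EXPECTED_CHAINS` and `replay` are hypothetical helpers.

```python
import math
import operator

# Plain functions mirroring the three tools
OPS = {
    "add_numbers": operator.add,
    "subtract_numbers": operator.sub,
    "multiply_numbers": operator.mul,
}

# Hypothetical copy of the golden tool chains, each paired with the
# final number stated in the corresponding expected answer
EXPECTED_CHAINS = [
    ([("subtract_numbers", {"a": 10, "b": 4})], 6),
    ([("multiply_numbers", {"a": 2.5, "b": 4})], 10.0),
    ([("multiply_numbers", {"a": 9500, "b": 1.15})], 10925),
    ([("subtract_numbers", {"a": 35, "b": 20}),
      ("multiply_numbers", {"a": 20, "b": 0.1}),
      ("add_numbers", {"a": 15, "b": 2})], 17),
]

def replay(chain):
    """Run each call in order and return the result of the last one."""
    result = None
    for name, params in chain:
        result = OPS[name](params["a"], params["b"])
    return result

for chain, expected in EXPECTED_CHAINS:
    got = replay(chain)
    # isclose tolerates binary floating-point rounding (e.g. 9500 * 1.15)
    assert math.isclose(got, expected), (chain, got, expected)
print("all expected answers are consistent with their tool chains")
```

In the cashback golden, the intermediate values 15 and 2 are hard-coded as the inputs to `add_numbers`, so this replay confirms those literals rather than how they were derived from the earlier calls.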

### Uploading the Golden Dataset to DeepEval

In [7]:
from deepeval.dataset import EvaluationDataset, Golden

goldens = []

for data in golden_dataset:
    golden = Golden(
        input=data['question'],
        expected_output=data['expected_answer'],
        expected_tools=data['tool_called']
    )
    goldens.append(golden)
    
eval_dataset = EvaluationDataset(goldens=goldens)

eval_dataset

EvaluationDataset(test_cases=[], goldens=[Golden(input='What is 10 - 4?', actual_output=None, expected_output='The difference between 10 and 4 is 6.', context=None, retrieval_context=None, additional_metadata=None, comments=None, tools_called=None, expected_tools=[ToolCall(
    name="subtract_numbers",
    input_parameters={
        "a": 10,
        "b": 4
    }
)], source_file=None, name=None, custom_column_key_values=None), Golden(input='What is 2.5 * 4?', actual_output=None, expected_output='The product of 2.5 and 4 is 10.0.', context=None, retrieval_context=None, additional_metadata=None, comments=None, tools_called=None, expected_tools=[ToolCall(
    name="multiply_numbers",
    input_parameters={
        "a": 2.5,
        "b": 4
    }
)], source_file=None, name=None, custom_column_key_values=None), Golden(input='My salary was $9500 & I got a 15 percent raise yesterday, how much do I earn now?', actual_output=None, expected_output='You now earn $10925 after a 15 percent raise on y

In [8]:
eval_dataset.push("TestToolCalling Dataset")

Aborted.


### Creating LLMTestCase Objects and Invoking the AI Agent

In [11]:
def query_agent(question):
    """Run the agent on a question and return (tool calls, raw response)."""
    agent_response = agent.invoke(question)
    # List of (AgentAction, observation) pairs recorded by the AgentExecutor
    intermediate_steps = agent_response['intermediate_steps']

    tools_called_list = []

    for agent_action, result in intermediate_steps:
        tools_called_list.append(
            ToolCall(
                name=agent_action.tool,
                input_parameters=agent_action.tool_input,
                output=result
            )
        )

    return tools_called_list, agent_response
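With `return_intermediate_steps=True`, the agent's response carries `intermediate_steps` as a list of `(AgentAction, observation)` pairs, which `query_agent` flattens into `ToolCall` records. The sketch below reproduces that flattening without LangChain or DeepEval installed, assuming two minimal dataclasses, `FakeAgentAction` and `FakeToolCall`, as hypothetical stand-ins that carry only the fields used here.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Tuple

@dataclass
class FakeAgentAction:
    # Hypothetical stand-in for LangChain's AgentAction
    tool: str
    tool_input: Dict[str, Any]
    log: str = ""

@dataclass
class FakeToolCall:
    # Hypothetical stand-in for deepeval.test_case.ToolCall
    name: str
    input_parameters: Dict[str, Any]
    output: Any = None

def to_tool_calls(steps: List[Tuple[FakeAgentAction, Any]]) -> List[FakeToolCall]:
    """Turn (action, observation) pairs into tool-call records, preserving order."""
    return [
        FakeToolCall(name=action.tool, input_parameters=action.tool_input, output=observation)
        for action, observation in steps
    ]

# Same shape as the step logged for "What is 10 - 4?" in this notebook
steps = [(FakeAgentAction(tool="subtract_numbers", tool_input={"a": 10, "b": 4}), 6)]
print(to_tool_calls(steps))
```

Keeping the observation as `output` lets a later check compare not only which tool was called but also what it returned.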

In [12]:
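from deepeval.test_case import LLMTestCase

for golden in goldens:
    question = golden.input
    tools_called_list, response = query_agent(question)

    test_case = LLMTestCase(
        input=question,
        # actual_output should be the agent's final answer string,
        # not the whole AgentExecutor response dict
        actual_output=response["output"],
        # The goldens store the reference calls in expected_tools
        # (golden.tools_called is None)
        expected_tools=golden.expected_tools,
        expected_output=golden.expected_output,
        tools_called=tools_called_list
    )

    print(tools_called_list)
    eval_dataset.add_test_case(test_case)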

eval_dataset.test_cases

[1m> Entering new AgentExecutor chain...[0m
[32;1m[1;3m<think>
Okay, the user is asking "What is 10 - 4?" So I need to subtract 4 from 10. Let me check which tool to use. The available tools are add_numbers, subtract_numbers, and multiply_numbers. Since the question is about subtraction, the right tool is subtract_numbers. The parameters for subtract_numbers are a and b, both integers. So I should set a to 10 and b to 4. Let me make sure I'm using the correct syntax for the JSON blob. The action should be "subtract_numbers" and the action_input should be {"a": 10, "b": 4}. That should give the result of 6. Then I can present the final answer as 6.
</think>

Action:
```
{
  "action": "subtract_numbers",
  "action_input": {
    "a": 10,
    "b": 4
  }
}
```
Observation: 6
Thought: <think>
</think>

Action:
```
{
  "action": "Final Answer",
  "action_input": "The result of 10 - 4 is 6."
}
```

[1m> Finished chain.[0m
[ToolCall(
    name="subtr

[LLMTestCase(input='What is 10 - 4?', actual_output={'input': 'What is 10 - 4?', 'output': 'The result of 10 - 4 is 6.', 'intermediate_steps': [(AgentAction(tool='subtract_numbers', tool_input={'a': 10, 'b': 4}, log='<think>\nOkay, the user is asking "What is 10 - 4?" So I need to subtract 4 from 10. Let me check which tool to use. The available tools are add_numbers, subtract_numbers, and multiply_numbers. Since the question is about subtraction, the right tool is subtract_numbers. The parameters for subtract_numbers are a and b, both integers. So I should set a to 10 and b to 4. Let me make sure I\'m using the correct syntax for the JSON blob. The action should be "subtract_numbers" and the action_input should be {"a": 10, "b": 4}. That should give the result of 6. Then I can present the final answer as 6.\n</think>\n\nAction:\n```\n{\n  "action": "subtract_numbers",\n  "action_input": {\n    "a": 10,\n    "b": 4\n  }\n}\n```\n'), 6)]}, expected_output='The difference between 10 and 
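With `tools_called` and `expected_tools` set on every test case, the dataset is ready for a tool-use metric such as DeepEval's `ToolCorrectnessMetric`. To make the idea concrete, here is a simplified, framework-free illustration that assumes the score is the fraction of expected tool names the agent actually called. It is not DeepEval's implementation, which (depending on version) can also take call ordering and input parameters into account; `tool_name_recall` is a hypothetical helper.

```python
from typing import List

def tool_name_recall(expected: List[str], called: List[str]) -> float:
    """Fraction of expected tool names that appear among the called tools.

    Simplified illustration only: ignores order, arguments and outputs,
    and matches each expected name against at most one call.
    """
    if not expected:
        return 1.0
    remaining = list(called)
    hits = 0
    for name in expected:
        if name in remaining:
            remaining.remove(name)  # consume the match so duplicates count once
            hits += 1
    return hits / len(expected)

expected = ["subtract_numbers", "multiply_numbers", "add_numbers"]
print(tool_name_recall(expected, ["subtract_numbers", "multiply_numbers", "add_numbers"]))  # 1.0
print(tool_name_recall(expected, ["subtract_numbers", "add_numbers"]))
```

An agent that skips the cashback multiplication and adds a guessed value directly would score 2/3 here, even if its final answer happened to be correct, which is exactly the kind of gap a tool-calling test is meant to expose.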