## Test Tool Calling from AI Agent

### Setting up dotenv & the LLM

In [19]:
from dotenv import load_dotenv
from langchain_ollama import ChatOllama

load_dotenv(override=True)
load_dotenv('./../.env')

llm = ChatOllama(
    base_url="http://localhost:11434",
    model="qwen3:latest",
    temperature=0.5,
    max_tokens=300
)

### Connect to DeepEval

In [20]:
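import os
import deepeval

# Read the Confident AI key from the environment (e.g. the .env file loaded above)
# rather than hard-coding a secret in the notebook
deepeval.login_with_confident_api_key(os.environ["CONFIDENT_API_KEY"])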

### Creating the Tools for AI Agent

In [21]:
from langchain_core.tools import tool

@tool
def add_numbers(a: int, b: int) -> int:
    """Add two numbers and return the result"""
    return a + b

@tool
def subtract_numbers(a: int, b: int) -> int:
    """Subtract two numbers and return the result"""
    return a - b

@tool
def multiply_numbers(a: float, b: float) -> float:
    """Multiply two numbers and return the result"""
    return a * b


In [22]:
list_of_tools = [add_numbers, subtract_numbers, multiply_numbers]
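LangChain's `@tool` decorator builds each tool from the function itself: by default the function name becomes the tool name, the docstring becomes the description, and the type-hinted arguments become the input schema. That is the text the LLM reads when deciding which tool to call, so clear docstrings and accurate type hints matter. The standard-library sketch below only illustrates which information is available; `describe_tool` and `multiply_numbers_plain` are hypothetical names, and the printed dictionary is not LangChain's real schema format.

```python
import inspect
from typing import get_type_hints


def describe_tool(func):
    """Collect the metadata a tool decorator can read from a plain function.

    Hypothetical helper for illustration; it does not reproduce LangChain's
    actual tool schema format.
    """
    hints = get_type_hints(func)
    args = {
        name: hints[name].__name__ if name in hints else None
        for name in inspect.signature(func).parameters
    }
    return {
        "name": func.__name__,                # default tool name in @tool
        "description": inspect.getdoc(func),  # default tool description in @tool
        "args": args,                         # argument names and their types
        "returns": hints["return"].__name__ if "return" in hints else None,
    }


# Undecorated copy of multiply_numbers, renamed so it does not overwrite the tool
def multiply_numbers_plain(a: float, b: float) -> float:
    "Multiply two numbers and return the result"
    return a * b


print(describe_tool(multiply_numbers_plain))
```

One consequence worth noting: `add_numbers` and `subtract_numbers` advertise `int` arguments, while questions like the cashback one produce fractional intermediate values (`20 * 0.1`), so declaring all three tools with `float` may be the safer choice.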

### Setting up the AI Agent to Invoke Custom Tools

In [24]:
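# initialize_agent is deprecated in recent LangChain releases; it still works but may emit a deprecation warning
from langchain.agents import initialize_agent, AgentType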

agent = initialize_agent(
    llm=llm,
    tools=list_of_tools,
    agent=AgentType.STRUCTURED_CHAT_ZERO_SHOT_REACT_DESCRIPTION,
    verbose=True,
    return_intermediate_steps=True
)

response = agent.invoke("I had 35 dollars. I spent 20 & got a 10 percent cashback, how much do I have left?")

print(response)



> Entering new AgentExecutor chain...
<think>
Okay, let's see. The user started with $35. They spent $20, so first I need to subtract 20 from 35. That would be 35 - 20 = 15. Then, they got a 10% cashback on the $20 spent. Wait, cashback usually refers to getting a percentage back on the amount spent. So 10% of 20 is 2. So adding that back to the remaining 15, the total would be 15 + 2 = 17. Let me check if the tools available can handle this. The subtract_numbers tool can handle the initial subtraction. Then, to calculate the cashback, maybe multiply 20 by 10% (which is 0.10). But the multiply_numbers tool takes two numbers. So 20 * 0.10 = 2. Then add that to the remaining amount. So first subtract 20 from 35, then calculate 10% of 20, and add that result to the remaining 15. So the steps are: 35 - 20 = 15, then 20 * 0.10 = 2, then 15 + 2 = 17. So the final answer should be 17.
</think>

Thought: I need to calculate the remaining money after spending and adding t
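The model's reasoning above settles on one reading of "10 percent cashback": 10% of the $20 spent comes back, so the expected tool sequence is subtract, multiply, add. The plain-Python sketch below replays that chain without calling the agent; it assumes that cashback interpretation, and `replay_cashback` is a hypothetical helper. Each step is returned as a `(tool_name, tool_input, output)` triple, the same shape `query_agent` later prints from `intermediate_steps`.

```python
def replay_cashback(start, spent, cashback_rate):
    """Replay the expected tool chain in plain Python.

    Assumes cashback is a percentage of the amount spent, the reading the
    model settles on above. Returns (tool_name, tool_input, output) triples.
    """
    steps = []

    after_spending = start - spent  # what subtract_numbers should compute
    steps.append(("subtract_numbers", {"a": start, "b": spent}, after_spending))

    cashback = spent * cashback_rate  # what multiply_numbers should compute
    steps.append(("multiply_numbers", {"a": spent, "b": cashback_rate}, cashback))

    total = after_spending + cashback  # what add_numbers should compute
    steps.append(("add_numbers", {"a": after_spending, "b": cashback}, total))

    return steps


for step in replay_cashback(35, 20, 0.1):
    print(step)
# ('subtract_numbers', {'a': 35, 'b': 20}, 15)
# ('multiply_numbers', {'a': 20, 'b': 0.1}, 2.0)
# ('add_numbers', {'a': 15, 'b': 2.0}, 17.0)
```

Note that the last step receives `2.0` rather than `2`; Python treats `2 == 2.0` as true, but any check that compares tool inputs as strings would not.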

### Preparing the Golden Dataset for DeepEval Test Cases

In [25]:
from deepeval.test_case import ToolCall

golden_dataset = [

    {
        "question": "What is 10 - 4?",
        "expected_answer": "The difference between 10 and 4 is 6.",
        "tool_called": [
            ToolCall(name="subtract_numbers", input_parameters={"a": 10, "b": 4})
        ]
    },
    {
        "question": "What is 2.5 * 4?",
        "expected_answer": "The product of 2.5 and 4 is 10.0.",
        "tool_called": [
            ToolCall(name="multiply_numbers", input_parameters={"a": 2.5, "b": 4})
        ]
    },
    {
        "question": "My salary was $9500 & I got a 15 percent raise yesterday, how much do I earn now?",
        "expected_answer": "You now earn $10925 after a 15 percent raise on your previous salary of $9500.",
        "tool_called": [
            ToolCall(name="multiply_numbers", input_parameters={"a": 9500, "b": 1.15})
        ]
    },
    {
        "question": "I had 35 dollars. I spent 20 & got a 10 percent cashback, how much do I have left?",
        "expected_answer": "You have a total of 17 dollars left after spending 20 dollars and receiving a 10 percent cashback of 2 dollars.",
        "tool_called": [
            ToolCall(name="subtract_numbers", input_parameters={"a": 35, "b": 20}),
            ToolCall(name="multiply_numbers", input_parameters={"a": 20, "b": 0.1}),
            ToolCall(name="add_numbers", input_parameters={"a": 15, "b": 2})
        ]
    }
    
]

golden_dataset

[{'question': 'What is 10 - 4?',
  'expected_answer': 'The difference between 10 and 4 is 6.',
  'tool_called': [ToolCall(
       name="subtract_numbers",
       input_parameters={
           "a": 10,
           "b": 4
       }
   )]},
 {'question': 'What is 2.5 * 4?',
  'expected_answer': 'The product of 2.5 and 4 is 10.0.',
  'tool_called': [ToolCall(
       name="multiply_numbers",
       input_parameters={
           "a": 2.5,
           "b": 4
       }
   )]},
 {'question': 'My salary was $9500 & I got a 15 percent raise yesterday, how much do I earn now?',
  'expected_answer': 'You now earn $10925 after a 15 percent raise on your previous salary of $9500.',
  'tool_called': [ToolCall(
       name="multiply_numbers",
       input_parameters={
           "a": 9500,
           "b": 1.15
       }
   )]},
 {'question': 'I had 35 dollars. I spent 20 & got a 10 percent cashback, how much do I have left?',
  'expected_answer': 'You have a total of 17 dollars left after spending 20 dollar
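Because every golden pairs a numeric answer with the tool calls that should produce it, the two can be cross-checked before upload. The sketch below replays the expected calls with plain-Python versions of the tools and does a loose substring check that the final result appears in the expected answer. `SketchToolCall` is a hypothetical stand-in for DeepEval's `ToolCall` so the code runs without DeepEval, the other names are likewise illustrative, and only two of the four goldens are copied in.

```python
from dataclasses import dataclass, field


@dataclass
class SketchToolCall:
    """Hypothetical stand-in for deepeval's ToolCall (name + input_parameters only)."""
    name: str
    input_parameters: dict = field(default_factory=dict)


# Plain-Python equivalents of the three tools, keyed by tool name
PLAIN_TOOLS = {
    "add_numbers": lambda a, b: a + b,
    "subtract_numbers": lambda a, b: a - b,
    "multiply_numbers": lambda a, b: a * b,
}


def format_number(value):
    """Render a result the way the answers write it, e.g. 10925.0 -> '10925'."""
    value = round(value, 2)  # drop float noise such as 10924.999999999998
    return str(int(value)) if float(value).is_integer() else str(value)


def golden_is_consistent(expected_answer, tool_calls):
    """Replay the expected calls and check the last result appears in the answer."""
    result = None
    for call in tool_calls:
        result = PLAIN_TOOLS[call.name](**call.input_parameters)
    return result is not None and format_number(result) in expected_answer


sample_goldens = [
    (
        "You now earn $10925 after a 15 percent raise on your previous salary of $9500.",
        [SketchToolCall("multiply_numbers", {"a": 9500, "b": 1.15})],
    ),
    (
        "You have a total of 17 dollars left after spending 20 dollars "
        "and receiving a 10 percent cashback of 2 dollars.",
        [
            SketchToolCall("subtract_numbers", {"a": 35, "b": 20}),
            SketchToolCall("multiply_numbers", {"a": 20, "b": 0.1}),
            SketchToolCall("add_numbers", {"a": 15, "b": 2}),
        ],
    ),
]

for answer, calls in sample_goldens:
    print(golden_is_consistent(answer, calls))
```

The substring match is deliberately crude; it catches typos such as an expected answer that disagrees with its own tool calls, not subtler wording problems.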

### Uploading the Golden Dataset to DeepEval

In [26]:
from deepeval.dataset import EvaluationDataset, Golden

goldens = []

for data in golden_dataset:
    golden = Golden(
        input=data["question"],
        expected_output=data["expected_answer"]
    )
    goldens.append(golden)
    
eval_dataset = EvaluationDataset(goldens=goldens)

eval_dataset

EvaluationDataset(test_cases=[], goldens=[Golden(input='What is 10 - 4?', actual_output=None, expected_output='The difference between 10 and 4 is 6.', context=None, retrieval_context=None, additional_metadata=None, comments=None, tools_called=None, expected_tools=None, source_file=None, name=None, custom_column_key_values=None), Golden(input='What is 2.5 * 4?', actual_output=None, expected_output='The product of 2.5 and 4 is 10.0.', context=None, retrieval_context=None, additional_metadata=None, comments=None, tools_called=None, expected_tools=None, source_file=None, name=None, custom_column_key_values=None), Golden(input='My salary was $9500 & I got a 15 percent raise yesterday, how much do I earn now?', actual_output=None, expected_output='You now earn $10925 after a 15 percent raise on your previous salary of $9500.', context=None, retrieval_context=None, additional_metadata=None, comments=None, tools_called=None, expected_tools=None, source_file=None, name=None, custom_column_key_v

In [27]:
eval_dataset.push("TestToolCalling Dataset")
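A tool-calling test ultimately compares the tools the agent invoked (the `agent_action.tool` names in `intermediate_steps`) against the expected calls stored with each golden. The sketch below shows two simple ways to score that comparison. It is a hypothetical, simplified scorer that ignores arguments, not the scoring logic of DeepEval's `ToolCorrectnessMetric`; the function names and the sample `called` list are illustrative.

```python
from typing import Sequence


def tool_name_recall(expected: Sequence[str], called: Sequence[str]) -> float:
    """Fraction of expected tool names the agent called at least once.

    Simplified, hypothetical scorer: it ignores call order and arguments.
    """
    if not expected:
        return 1.0
    called_set = set(called)
    hits = sum(1 for name in expected if name in called_set)
    return hits / len(expected)


def exact_sequence_match(expected: Sequence[str], called: Sequence[str]) -> bool:
    """Stricter check: same tools, same order, same number of calls."""
    return list(expected) == list(called)


expected = ["subtract_numbers", "multiply_numbers", "add_numbers"]
# Illustrative run that stopped after the first two tool calls
called = ["subtract_numbers", "multiply_numbers"]

print(tool_name_recall(expected, called))      # 0.6666666666666666
print(exact_sequence_match(expected, called))  # False
```

The order-insensitive score gives partial credit when a step is skipped, while the strict check fails the whole run. For the cashback question, where subtracting and computing the cashback can happen in either order, the looser score may be the more useful signal.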

### Creating LLMTestCase and Invoking the AI Agent

In [None]:
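def query_agent(question):
    """Invoke the agent and collect every tool call it made."""
    response = agent.invoke(question)
    intermediate_steps = response['intermediate_steps']

    tool_calls = []
    for agent_action, result in intermediate_steps:

        tool = agent_action.tool              # name of the tool the agent picked
        tool_input = agent_action.tool_input  # arguments it passed to the tool
        tool_output = result                  # value the tool returned

        print(tool)
        print(tool_input)
        print(tool_output)

        tool_calls.append((tool, tool_input, tool_output))

    # Return after the loop so multi-step questions keep every call, not only the first
    return tool_calls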
        
query_agent(question="I had 35 dollars. I spent 20 & got a 10 percent cashback, how much do I have left?")




> Entering new AgentExecutor chain...
<think>
Okay, let's see. The user started with $35. They spent $20, so first I need to subtract 20 from 35. That would leave them with $15. But then there's a 10% cashback. Wait, cashback on the spent amount or the remaining? Usually, cashback is on the amount spent. So 10% of $20 is $2. So adding that back to the remaining $15 would give $17. Let me check the steps again. Subtract 20 from 35, then add 10% of 20. So 35 - 20 = 15, then 20 * 0.10 = 2. 15 + 2 = 17. So the final amount should be $17.
</think>

Thought: I need to calculate the remaining money after spending and adding the cashback. First, subtract the spent amount from the initial amount, then add the cashback which is 10% of the spent amount.

Action:
```
{
  "action": "subtract_numbers",
  "action_input": {
    "a": 35,
    "b": 20
  }
}
```

Observation: 15
Thought:<think>
Okay, so the user had $35 and spent $20. After subtrac