# Tool Use Deep Dive

Tool calling is the ability of an AI model to interact with external tools, application programming interfaces (APIs), or other systems to extend what it can do. A crucial aspect of AI Agents is their ability to take actions, and they take those actions through Tools.

### What are AI Tools?

A Tool is a function given to the LLM. This function should fulfill a clear objective. Examples:

- **Web Search**: Allows the agent to fetch up-to-date information from the internet.
- **Image Generation**: Creates images based on text descriptions.
- **Retrieval**: Retrieves information from an external source.
- **API Interface**: Interacts with an external API (GitHub, YouTube, Spotify, etc.).

LLMs predict the completion of a prompt based on their training data, so they have no knowledge of events after their training cutoff. To get up-to-date information, we need to ground the model in external data sources. This can be done using tools.

In [None]:
from dotenv import load_dotenv
load_dotenv()

In [None]:
import os


API_KEY = os.getenv("GEMINI_API_KEY")
if not API_KEY:
    raise ValueError("Please set GEMINI_API_KEY environment variable.")

In [None]:
from google import genai
from google.genai import types


client = genai.Client(api_key=API_KEY)

In [None]:
response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="Who won the nobel peace prize in 2025?"
)

print(response.text)

In [None]:
grounding_tool = types.Tool(
    google_search=types.GoogleSearch()
)

config = types.GenerateContentConfig(
    tools=[grounding_tool]
)

response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="Who won the nobel peace prize in 2025?",
    config=config,
)

print(response.text)

### How to write a good tool?

A Tool should contain:

- A textual description of what the function does.
- A Callable (something to perform an action).
- Arguments with typings.
- (Optional) Outputs with typings.

In [None]:
def my_to_do_list(name: str) -> str:
    """Read a person's to-do list from the file `<name>-To-do-list.txt` and return it.
    
    Args:
        name: The name of the person whose to-do list to read.
    
    Returns:
        The to-do list as a string.
    """
    with open(f"{name}-To-do-list.txt", "r") as f:
        result = f.read()
    return result


In [None]:
to_do_list_declaration = {
    "name": "my_to_do_list",
    "description": "Reads the to-do list of a person and returns the list.",
    "parameters": {
        "type": "object",
        "properties": {
            "name": {
                "type": "string",
                "description": "The name of the person, which is also the file prefix.",
            },
        },
        "required": ["name"],
    },
}

In [None]:
to_do_list_tool = types.Tool(
    function_declarations=[
        to_do_list_declaration
    ]
)

In [None]:
config = types.GenerateContentConfig(
    tools=[to_do_list_tool]
)

contents = [
    types.Content(
        role="user", parts=[types.Part(text="What is Shahad's to-do list?")]
    )
]

response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents=contents,
    config=config,
)

tool_call = response.candidates[0].content.parts[0].function_call
print(tool_call)

In [None]:
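# The model may answer in plain text instead of requesting a tool call.
if tool_call is None or tool_call.name != "my_to_do_list":
    raise RuntimeError("The model did not request my_to_do_list.")
result = my_to_do_list(**tool_call.args)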

In [None]:
function_response_part = types.Part.from_function_response(
    name=tool_call.name,
    response={"result": result},
)

contents.append(response.candidates[0].content)
contents.append(types.Content(role="user", parts=[function_response_part]))

In [None]:
final_response = client.models.generate_content(
    model="gemini-2.5-flash",
    config=config,
    contents=contents,
)

print(final_response.text)
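
The `if tool_call.name == ...` check above works for a single tool but does not scale to several. One possible generalization is a small registry that maps tool names to local functions. The sketch below is only an illustration: `TOOL_REGISTRY`, `register_tool`, `dispatch`, and the stand-in tool `add_numbers` are hypothetical names, not part of the Gemini SDK, and the block makes no network calls.

```python
from typing import Any, Callable, Dict

# Hypothetical registry: maps the tool name the model emits to a local callable.
TOOL_REGISTRY: Dict[str, Callable[..., Any]] = {}


def register_tool(func: Callable[..., Any]) -> Callable[..., Any]:
    """Register a function under its own name so it can be looked up later."""
    TOOL_REGISTRY[func.__name__] = func
    return func


@register_tool
def add_numbers(a: float, b: float) -> float:
    """Stand-in tool used only for this illustration."""
    return a + b


def dispatch(name: str, args: Dict[str, Any]) -> Dict[str, Any]:
    """Run the named tool and wrap the outcome in a dict to send back to the model."""
    func = TOOL_REGISTRY.get(name)
    if func is None:
        return {"error": f"Unknown tool: {name}"}
    try:
        return {"result": func(**args)}
    except Exception as exc:  # report failures instead of crashing the loop
        return {"error": f"{type(exc).__name__}: {exc}"}


print(dispatch("add_numbers", {"a": 2, "b": 3}))
print(dispatch("missing_tool", {}))
```

In the notebook flow, something like `dispatch(tool_call.name, dict(tool_call.args))` could plausibly produce the dictionary passed as `response=` to `types.Part.from_function_response`, so that the model also sees tool errors rather than the notebook raising an exception.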

# Actions

Actions are the concrete steps an AI agent takes to interact with its environment.

Whether it’s browsing the web for information or controlling a physical device, each action is a deliberate operation executed by the agent.

For example, an agent assisting with customer service might retrieve customer data, offer support articles, or transfer issues to a human representative.

There are multiple types of Agents that take actions differently:

- **JSON Agent**: The Action to take is specified in JSON format.
- **Code Agent**: The Agent writes a code block that is interpreted externally.
- **Function-calling Agent**: A subcategory of the JSON Agent that has been fine-tuned to generate a new message for each action.


### The Stop and Parse Approach

One key method for implementing actions is the stop and parse approach. This method ensures that the agent’s output is structured and predictable:

**Generation in a Structured Format**:
The agent outputs its intended action in a clear, predetermined format (JSON or code).

**Halting Further Generation**:
Once the text defining the action has been emitted, the LLM stops generating additional tokens. This prevents extra or erroneous output.

**Parsing the Output**:
An external parser reads the formatted action, determines which Tool to call, and extracts the required parameters.

For example, an agent needing to check the weather might output:

```
Thought: I need to check the current weather for New York.
Action:
{
  "action": "get_weather",
  "action_input": {"location": "New York"}
}
```
The framework can then easily parse the name of the function to call and the arguments to apply.

This clear, machine-readable format minimizes errors and enables external tools to accurately process the agent’s command.
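
The parsing step can be sketched in a few lines of Python. This is a minimal illustration, not how any particular framework implements it: `parse_action`, the `TOOLS` table, and the placeholder `get_weather` function are assumptions made for the example, and the halting step (usually handled with a stop sequence in the generation settings) is not shown.

```python
import json
import re
from typing import Any, Dict, Tuple


def get_weather(location: str) -> str:
    """Hypothetical stand-in tool; a real one would call a weather API."""
    return f"(placeholder) weather report for {location}"


# Hypothetical table of the tools the agent is allowed to call.
TOOLS = {"get_weather": get_weather}


def parse_action(llm_output: str) -> Tuple[str, Dict[str, Any]]:
    """Extract the tool name and arguments from the JSON blob after 'Action:'."""
    match = re.search(r"Action\s*:", llm_output)
    if match is None:
        raise ValueError("No 'Action:' marker found in the model output.")
    blob = llm_output[match.end():].lstrip()
    # raw_decode parses the first JSON value and ignores any trailing text.
    action, _ = json.JSONDecoder().raw_decode(blob)
    return action["action"], action.get("action_input", {})


llm_output = """Thought: I need to check the current weather for New York.
Action:
{
  "action": "get_weather",
  "action_input": {"location": "New York"}
}"""

name, args = parse_action(llm_output)
observation = TOOLS[name](**args)
print(name, args)
print(observation)
```

The returned `observation` is what a framework would typically append to the conversation before asking the model to continue.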

Note: Function-calling agents operate similarly by structuring each action so that a designated function is invoked with the correct arguments. We’ll dive deeper into those types of Agents in a future Unit.



### Code Agents


An alternative approach is using Code Agents. Instead of outputting a simple JSON object, a Code Agent generates an executable code block, typically in a high-level language like Python.

This approach offers several advantages:

- **Expressiveness**: Code can naturally represent complex logic, including loops, conditionals, and nested functions, providing greater flexibility than JSON.
- **Modularity and Reusability**: Generated code can include functions and modules that are reusable across different actions or tasks.
- **Enhanced Debuggability**: With a well-defined programming syntax, code errors are often easier to detect and correct.
- **Direct Integration**: Code Agents can integrate directly with external libraries and APIs, enabling more complex operations such as data processing or real-time decision making.
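
To make the idea concrete, here is a minimal sketch of how a framework might run a code action. Everything in it is an assumption for illustration: `generated_code` stands in for text produced by the model, `get_weather` is a placeholder tool, and `run_generated_code` is a hypothetical helper. Restricting the available builtins as shown is not a real sandbox; production systems need proper isolation before executing model-written code.

```python
import contextlib
import io
from typing import Any, Dict


def get_weather(location: str) -> str:
    """Hypothetical stand-in tool exposed to the generated code."""
    return f"(placeholder) weather report for {location}"


# Pretend this string came from the model. It loops over several inputs,
# which a single JSON action cannot express directly.
generated_code = """
for city in ["New York", "Paris"]:
    print(get_weather(city))
"""


def run_generated_code(code: str, tools: Dict[str, Any]) -> str:
    """Execute code with only the given tools in scope and return what it printed."""
    # Only `print` is exposed as a builtin. This limits accidents; it is NOT a sandbox.
    namespace: Dict[str, Any] = {"__builtins__": {"print": print}, **tools}
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        exec(code, namespace)
    return buffer.getvalue()


observation = run_generated_code(generated_code, {"get_weather": get_weather})
print(observation)
```

The captured output plays the same role as the tool result in the JSON case: it is fed back to the model as the observation for the next step.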