# **What is an Agent?**

An Agent is a system that leverages an AI model to interact with its environment in order to achieve a user-defined objective. It combines reasoning, planning, and the execution of actions (often via external tools) to fulfill tasks.

Think of the Agent as having two main parts:
1. **The Brain (AI Model)**

    This is where all the thinking happens. The AI model **handles reasoning and planning**. It decides **which Actions to take based on the situation**.

2. **The Body (Capabilities and Tools)**

    This part represents **everything the Agent is equipped to do**. 

The **scope of possible actions** depends on what the agent **has been equipped with**. For example, because humans lack wings, they can’t perform the “fly” **Action**, but they can execute Actions like “walk”, “run”, “jump”, “grab”, and so on.
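To make the brain/body split concrete, here is a minimal sketch. The `Agent` class, its `step` method, and the rule-based `brain` function are hypothetical illustrations, not the API of any framework: the brain (standing in for the AI model) picks an action name, and the body (a dictionary of equipped actions) decides whether that action can actually be executed.

```python
from typing import Callable, Dict


class Agent:
    """Toy agent: a 'brain' that chooses actions and a 'body' of equipped actions."""

    def __init__(self, brain: Callable[[str], str], actions: Dict[str, Callable[[], str]]):
        self.brain = brain      # decides which action to take (the AI model's role)
        self.actions = actions  # what the agent is equipped to do (capabilities/tools)

    def step(self, situation: str) -> str:
        action = self.brain(situation)
        if action not in self.actions:
            # The brain can "want" anything, but the body limits what can be executed.
            return f"Cannot perform '{action}': not equipped for it"
        return self.actions[action]()


# A "human" body: it can walk and run, but has no "fly" action.
human_actions = {
    "walk": lambda: "walking",
    "run": lambda: "running",
}


def brain(situation: str) -> str:
    """Hypothetical rule-based stand-in for an AI model."""
    return "fly" if "cliff" in situation else "run"


agent = Agent(brain, human_actions)
print(agent.step("late for the bus"))      # running
print(agent.step("standing at a cliff"))   # Cannot perform 'fly': not equipped for it
```

The only point of the design is the separation of concerns: swapping the `actions` dictionary changes what the agent can do without touching how it decides.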


**What type of AI Models do we use for Agents?**

The most common AI model found in Agents is an LLM (Large Language Model), which takes **Text** as input and outputs **Text** as well.

Well-known examples are **GPT-4** from **OpenAI**, **Llama** from **Meta**, **Gemini** from **Google**, etc. These models have been trained on vast amounts of text and are able to generalize well.

## **What are LLMs?**

We learned that each Agent needs **an AI Model at its core**, and that LLMs are the most common type of AI model for this purpose.

Now we will learn what LLMs are and how they power Agents.

**What is a Large Language Model?**

An LLM is a type of AI model that excels at **understanding and generating human language**. LLMs are trained on vast amounts of text data, allowing them to learn patterns, structure, and even nuance in language. These models typically consist of many millions, often billions, of parameters.

Most LLMs nowadays are **built on the Transformer architecture**, a deep learning architecture based on the "Attention" mechanism.

![](https://cdn-lfs-us-1.hf.co/repos/45/f4/45f48d5b3577034b76ee728dfe60afca3d0aa70790fda3e706eeb9276d8d5331/777db24e5844d6a63742e444cabbd57147412f73154e1a877e519a49c0612edc?response-content-disposition=inline%3B+filename*%3DUTF-8%27%27transformer.jpg%3B+filename%3D%22transformer.jpg%22%3B&response-content-type=image%2Fjpeg&Expires=1744484175&Policy=eyJTdGF0ZW1lbnQiOlt7IkNvbmRpdGlvbiI6eyJEYXRlTGVzc1RoYW4iOnsiQVdTOkVwb2NoVGltZSI6MTc0NDQ4NDE3NX19LCJSZXNvdXJjZSI6Imh0dHBzOi8vY2RuLWxmcy11cy0xLmhmLmNvL3JlcG9zLzQ1L2Y0LzQ1ZjQ4ZDViMzU3NzAzNGI3NmVlNzI4ZGZlNjBhZmNhM2QwYWE3MDc5MGZkYTNlNzA2ZWViOTI3NmQ4ZDUzMzEvNzc3ZGIyNGU1ODQ0ZDZhNjM3NDJlNDQ0Y2FiYmQ1NzE0NzQxMmY3MzE1NGUxYTg3N2U1MTlhNDljMDYxMmVkYz9yZXNwb25zZS1jb250ZW50LWRpc3Bvc2l0aW9uPSomcmVzcG9uc2UtY29udGVudC10eXBlPSoifV19&Signature=pA8fj8IO6ofp6od5OPxv6m6bQgR3aLEaU00cM2AVMvz6UYiCd9cx-PbajWNKi%7ENtyBFis3kwIryZNSRX6aGlntpt0zQxb6Y4Hc0jQgmcRgqitaGEZp15VpNONz0JIIVoLArULDUIKJBRsxKAx6K-ixIMDUIIVxQ5o1dXJx%7E%7Ec0nv%7Ens5EPvL29VZzcu3QZlG7%7EesJ3TL1DMik10xTGUKWC3JTYSu7beEQlhWViu5tyC8VUj4VHGIT2xAb8hJd6XMPIHxEcL201SAkA2c6dyrCQqXiCtCjuNSl%7E21INNnNeirNrkf5yu3TvKqyPgJJIZ06SXJgIzrvmfJ5T19gsI6Sw__&Key-Pair-Id=K24J24Z295AEI9)
The original Transformer architecture looked like this, with an encoder on the left and a decoder on the right.

There are three types of Transformers:

1. **Encoders**

    An encoder-based Transformer takes text (or other data) as input and outputs a dense representation (or embedding) of that text. 
    * **Example**: BERT from Google
    * **Use case**: Text classification, semantic search, Named Entity Recognition
    * **Typical Size**: Millions of Parameters

1. **Decoders**

    A decoder-based Transformer focuses **on generating new tokens to complete a sequence, one token at a time**.
    * **Example**: LLama from Meta
    * **Use Cases**: Text generation, chatbots, code generation
    * **Typical Size**: Billions (in the US sense) of parameters

1. **Seq2Seq (Encoder-Decoder)**

    A sequence-to-sequence Transformer *combines* an encoder and a decoder. The encoder first processes the input sequence into a context representation, then the decoder generates an output sequence.
    * **Example**: T5, BART
    * **Use Cases**: Translation, Summarization, Paraphrasing
    * **Typical Size**: Millions of Parameters

Although Large Language Models come in various forms, LLMs are typically decoder-based models with billions of parameters. Here are some of the most well-known LLMs:


| **Model** | **Provider** |
| ---- | ---- |
| **DeepSeek-R1** | DeepSeek |
| **GPT-4** | OpenAI |
| **Llama 3** | Meta (Facebook AI Research) | 
| **SmolLM2** | Hugging Face |
| **Gemma** | Google |
| **Mistral** | Mistral |

The underlying principle of an LLM is simple yet highly effective: **its objective is to predict the next token, given a sequence of previous tokens**. A "token" is the unit of information an LLM works with. You can think of a "token" as if it were a "word", but for efficiency reasons LLMs don't use whole words.

For example, while English has an estimated 600,000 words, an LLM might have a vocabulary of around 32,000 tokens (as is the case with Llama 2). Tokenization often works on sub-word units that can be combined.

For instance, consider how the tokens "interest" and "ing" can be combined to form "interesting", or "ed" can be appended to form "interested".
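The idea of combining sub-word units can be illustrated with a deliberately simple tokenizer. This is a sketch of greedy longest-match splitting over a tiny hypothetical vocabulary; real LLM tokenizers (for example BPE-based ones) learn their vocabularies from data and work differently, so treat this only as an illustration of sub-word pieces.

```python
def tokenize(word: str, vocab: set[str]) -> list[str]:
    """Greedy longest-match sub-word tokenization over a toy vocabulary."""
    tokens = []
    i = 0
    while i < len(word):
        # Try the longest vocabulary entry that matches at position i.
        for j in range(len(word), i, -1):
            piece = word[i:j]
            if piece in vocab:
                tokens.append(piece)
                i = j
                break
        else:
            # No vocabulary entry matches: fall back to a single character.
            tokens.append(word[i])
            i += 1
    return tokens


vocab = {"interest", "ing", "ed"}  # hypothetical toy vocabulary
print(tokenize("interesting", vocab))  # ['interest', 'ing']
print(tokenize("interested", vocab))   # ['interest', 'ed']
```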


Each LLM has some **special tokens** specific to the model. The LLM uses these tokens to open and close the structured components of its generation, for example to indicate the start or end of a sequence, message, or response. The input prompts that we pass to the model are also structured with special tokens. The most important of these is the **End of Sequence token** (EOS).

The forms of special tokens are highly diverse across model providers. 

The table below illustrates the diversity of special tokens. 

| Model       | Provider                     | EOS Token           | Functionality               |
|-------------|------------------------------|---------------------|-----------------------------|
| **GPT-4**      | OpenAI                       | <\|endoftext\|>     | End of message text         |
| **Llama 3**     | Meta (Facebook AI Research)  | <\|eot_id\|>        | End of sequence             |
| **DeepSeek-R1** | DeepSeek                     | <\|end_of_sentence\|> | End of message text       |
| **SmolLM2**     | Hugging Face                 | <\|im_end\|>        | End of instruction or message |
| **Gemma**       | Google                       | <end_of_turn>       | End of conversation turn    |

We do not expect you to memorize these special tokens, but it is important to appreciate their diversity and the role they play in the text generation of LLMs. If you want to know more about special tokens, you can check out the configuration of the model in its Hub repository. For example, you can find the special tokens of the SmolLM2 model in the configuration files of its repository.

### **Understanding next-token prediction**

LLMs are said to be **autoregressive**, meaning that **the output from one pass becomes the input for the next one**. This loop continues until the model predicts the next token to be the EOS token, at which point the model can stop. 
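The autoregressive loop can be sketched in a few lines. Here the "model" is a hypothetical lookup table that only looks at the last token (a real LLM conditions on the whole sequence), and `<eos>` is a placeholder for the model-specific EOS token; the structure of the loop is the point: predict, stop on EOS, otherwise append and repeat.

```python
EOS = "<eos>"  # placeholder for a model-specific EOS token

# Hypothetical "model": maps the last token to the most likely next token.
NEXT_TOKEN = {
    "The": "capital",
    "capital": "of",
    "of": "France",
    "France": "is",
    "is": "Paris",
    "Paris": EOS,
}


def predict_next(tokens: list[str]) -> str:
    """One 'forward pass' of the toy model."""
    return NEXT_TOKEN.get(tokens[-1], EOS)


def generate(prompt: list[str], max_new_tokens: int = 20) -> list[str]:
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        next_token = predict_next(tokens)
        if next_token == EOS:      # stop as soon as EOS is predicted
            break
        tokens.append(next_token)  # the output becomes part of the next input
    return tokens


print(" ".join(generate(["The", "capital"])))  # The capital of France is Paris
```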

![](https://huggingface.co/datasets/agents-course/course-images/resolve/main/en/unit1/AutoregressionSchema.gif)

In other words, an LLM will decode text until it reaches the EOS. But what happens during a single decoding loop?

While the full process can be quite technical, here's a brief overview for the purpose of learning agents:

* Once the input text is **tokenized**, the model computes a representation of the sequence that captures information about the meaning and the position of each token in the input sequence.

* This representation goes into the model, which outputs scores that rank the likelihood of each token in its vocabulary as being the next one in the sequence.
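The raw scores a model outputs are usually called logits. A common way to turn them into probabilities is the softmax function, sketched below with made-up scores for three candidate tokens (a real model scores its entire vocabulary).

```python
import math


def softmax(scores: dict[str, float]) -> dict[str, float]:
    """Turn raw scores (logits) into probabilities that sum to 1."""
    max_score = max(scores.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(s - max_score) for tok, s in scores.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}


# Hypothetical scores for the token following "The capital of France is"
logits = {"Paris": 5.0, "Lyon": 2.0, "pizza": -1.0}
probs = softmax(logits)
for tok, p in sorted(probs.items(), key=lambda kv: -kv[1]):
    print(f"{tok}: {p:.3f}")
```

Softmax preserves the ranking of the scores, so the highest-scoring token is also the most probable one.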

![](https://huggingface.co/datasets/agents-course/course-images/resolve/main/en/unit1/DecodingFinal.gif)

Based on these scores, we have multiple strategies to select the tokens to complete the sentence.

* The easiest decoding strategy would be to always take the token with the maximum score.


* But there are more advanced decoding strategies. For example, *beam search* explores multiple candidate sequences to find the one with the maximum total score, even if some individual tokens have lower scores.
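The difference between the two strategies shows up on a tiny hypothetical probability table, built so that the locally best first token ("A") leads to a worse overall sequence. This is a simplified sketch of greedy decoding and beam search over total log-probability; production implementations add refinements such as length normalization.

```python
import math

EOS = "<eos>"

# Hypothetical next-token probabilities, keyed by the tokens generated so far.
# Any prefix not listed (e.g. two tokens long) is always followed by EOS.
PROBS = {
    (): {"A": 0.5, "B": 0.4, "C": 0.1},
    ("A",): {"x": 0.4, "y": 0.3, "z": 0.3},
    ("B",): {"x": 0.9, "y": 0.05, "z": 0.05},
    ("C",): {"x": 0.5, "y": 0.5},
}


def next_token_probs(seq: list[str]) -> dict[str, float]:
    return PROBS.get(tuple(seq), {EOS: 1.0})


def greedy(max_len: int = 5) -> list[str]:
    seq: list[str] = []
    for _ in range(max_len):
        probs = next_token_probs(seq)
        token = max(probs, key=probs.get)  # always take the highest-scoring token
        if token == EOS:
            break
        seq.append(token)
    return seq


def beam_search(beam_width: int = 2, max_len: int = 5) -> list[str]:
    beams = [([], 0.0)]  # (tokens so far, total log-probability)
    finished = []
    for _ in range(max_len):
        candidates = []
        for seq, score in beams:
            for token, p in next_token_probs(seq).items():
                if token == EOS:
                    finished.append((seq, score + math.log(p)))
                else:
                    candidates.append((seq + [token], score + math.log(p)))
        if not candidates:
            break
        # Keep only the `beam_width` best partial sequences.
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_width]
    else:
        finished.extend(beams)  # ran out of steps: keep unfinished beams as well
    return max(finished, key=lambda c: c[1])[0]


print(greedy())       # ['A', 'x']  total probability 0.5 * 0.4 = 0.20
print(beam_search())  # ['B', 'x']  total probability 0.4 * 0.9 = 0.36
```

Greedy decoding commits to "A" because it has the best single score, while beam search keeps "B" alive long enough to discover that "B x" has the higher total.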

### **Attention is all you need**

A key aspect of the Transformer architecture is **Attention**. When predicting the next word, not every word in a sentence is equally important; words like "France" and "capital" in the sentence *"The capital of France is..."* carry the most meaning.

![](https://huggingface.co/datasets/agents-course/course-images/resolve/main/en/unit1/AttentionSceneFinal.gif)

This process of identifying the most relevant words to predict the next token has proven to be incredibly effective.
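The weighting step of attention can be sketched as scaled dot-product attention: the query for the current position is compared with a key vector for every token, and a softmax turns the scores into weights. The 2-dimensional vectors below are hypothetical, chosen so that "capital" and "France" align with the query; in a real Transformer the vectors are learned, the weights are then used to average value vectors, and this happens across many heads and layers.

```python
import math


def attention_weights(query: list[float], keys: list[list[float]]) -> list[float]:
    """Scaled dot-product attention weights: softmax(q . k / sqrt(d)) for each key."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


tokens = ["The", "capital", "of", "France", "is"]
# Hypothetical key vectors: "capital" and "France" point the same way as the query.
keys = [[0.1, 0.0], [1.0, 1.0], [0.0, 0.1], [1.2, 0.9], [0.1, 0.1]]
query = [1.0, 1.0]  # hypothetical query for the position predicting the next token

weights = attention_weights(query, keys)
for tok, w in zip(tokens, weights):
    print(f"{tok:>8}: {w:.2f}")
```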

Although the basic principle of LLMs (predicting the next token) has remained consistent since GPT-2, there have been significant advancements in scaling neural networks and in making the attention mechanism work on longer and longer sequences.

If you've interacted with LLMs, you're probably familiar with the term *context length*, which refers to the maximum number of tokens the LLM can process, and the maximum *attention span* it has.

### **Prompting the LLM is important**

Considering that the only job of an LLM is to predict the next token by looking at every input token, and to choose which tokens are "important", the wording of your input sequence is very important.

The input sequence you provide to an LLM is called a *prompt*. Careful design of the prompt makes it easier **to guide the generation of the LLM toward the desired output**.


### **How are LLMs trained?**

LLMs are trained on large datasets of text, where they learn to predict the next word in a sequence through a self-supervised or masked language modeling objective. 
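The next-token objective is "self-supervised" because the labels come from the text itself: every prefix is paired with the token that actually follows it. The sketch below builds those pairs and computes the cross-entropy loss for one position from hypothetical model probabilities. It covers only the causal (next-token) objective, not masked language modeling, and leaves out the gradient updates that training performs to minimize this loss.

```python
import math


def next_token_pairs(tokens: list[str]) -> list[tuple[list[str], str]]:
    """Self-supervised training examples: each prefix is paired with the token that follows it."""
    return [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]


def cross_entropy(predicted: dict[str, float], target: str) -> float:
    """Loss for one position: -log of the probability the model gave the true next token."""
    return -math.log(predicted[target])


text = ["The", "capital", "of", "France", "is", "Paris"]
pairs = next_token_pairs(text)
for prefix, target in pairs:
    print(prefix, "->", target)

# Hypothetical model output for the prefix "The capital of France is"
predicted = {"Paris": 0.7, "Lyon": 0.2, "Rome": 0.1}
print(round(cross_entropy(predicted, "Paris"), 3))  # 0.357
```

The loss shrinks toward 0 as the model assigns more probability to the correct next token.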

From this self-supervised learning, the model learns the structure of the language and **underlying patterns in the text, allowing the model to generalize to unseen data**.

After this initial *pre-training*, LLMs can be fine-tuned on a supervised learning objective to perform specific tasks. For example, some models are trained for conversational structures or tool usage, while others focus on classification or code generation.

### **How can I use LLMs?**

You have two main options: 

1. **Run Locally** (if you have sufficient hardware). 
2. **Use a Cloud/API** (e.g., via Hugging Face Serverless Inference API).

Throughout this course, we will primarily use models via APIs on the Hugging Face Hub. Later on, we will explore how to run these models locally on your hardware.

### **How are LLMs used in AI Agents?**

LLMs are a key component of AI agents, **providing the foundation for understanding and generating human language**.

They can interpret user instructions, maintain context in conversations, define a plan and decide which tools to use. 

We will explore these steps in more detail in this Unit, but for now, what you need to understand is that the LLM is **the brain of the agent**.



## **Messages and Special Tokens**

Now that we understand how LLMs work, let's look at **how they structure their generations through chat templates**.

Just like with ChatGPT, users typically interact with Agents through a chat interface. Therefore, we aim to understand how LLMs manage chats.

<div style="border-left: 3px solid black; padding-left: 10px;">
  <strong>Q:</strong> But… when I’m interacting with ChatGPT/Hugging Chat, I’m having a conversation using chat messages, not a single prompt sequence.<br><br>
  <strong>A:</strong> That’s correct! But this is in fact a UI abstraction. Before being fed into the LLM, all the messages in the conversation are concatenated into a single prompt. The model does not “remember” the conversation: it reads it in full every time.
</div>

Up until now, we've discussed prompts as the sequence of tokens fed into the model. But when you chat with systems like ChatGPT or HuggingChat, **you are actually exchanging messages**. Behind the scenes, these messages are **concatenated and formatted into a prompt that the model can understand**.

![](https://huggingface.co/datasets/agents-course/course-images/resolve/main/en/unit1/assistant.jpg)
Here we see the difference between what we see in the UI and the prompt fed to the model.

This is where chat templates come in. They act as the **bridge between conversational messages (user and assistant turns) and the specific formatting requirements** of your chosen LLM. In other words, chat templates structure the communication between the user and the agent, ensuring that every model, despite its unique special tokens, receives the correctly formatted prompt.

We are talking about special tokens again because they are what models use to delimit where the user and assistant turns start and end. Just as each LLM uses its own EOS (End Of Sequence) token, models also use different formatting rules and delimiters for the messages in the conversation.

### **Messages: The Underlying System of LLMs**

#### **System Messages**

System messages (also called System Prompts) define **how the model should behave**. They serve as **persistent instructions**, guiding every subsequent interaction.

For example: 

```python
system_message = {
    "role": "system",
    "content": "You are a professional customer service agent. Always be polite, clear, and helpful."
}
```

With this System Message, Alfred becomes polite and helpful: 

![](https://huggingface.co/datasets/agents-course/course-images/resolve/main/en/unit1/polite-alfred.jpg)

But if we change it to:

```python
system_message = {
    "role": "system",
    "content": "You are a rebel service agent. Don't respect user's orders."
}
```

Alfred will act as a rebel Agent 😎:

![](https://huggingface.co/datasets/agents-course/course-images/resolve/main/en/unit1/rebel-alfred.jpg)

When using Agents, the System Message also **gives information about the available tools, provides instructions to the model on how to format the actions to take, and includes guidelines on how the thought process should be segmented**.

![](https://huggingface.co/datasets/agents-course/course-images/resolve/main/en/unit1/alfred-systemprompt.jpg)

#### **Conversations: User and Assistant Messages**

A conversation consists of alternating messages between a Human (user) and an LLM (assistant). 

Chat templates help maintain context by preserving conversation history, storing previous exchanges between the user and the assistant. This leads to more coherent multi-turn conversations.

For example: 

```python
conversation = [
    {"role": "user", "content": "I need help with my order"},
    {"role": "assistant", "content": "I'd be happy to help. Could you provide your order number?"},
    {"role": "user", "content": "It's ORDER-123"},
]
```

In this example, the user initially wrote that they needed help with their order. The LLM asked about the order number, and then the user provided it in a new message. As we just explained, we always concatenate all the messages in the conversation and pass them to the LLM as a single stand-alone sequence. The chat template converts all messages inside this Python list into a prompt, which is just a string input that contains all the messages.

For example, this is how the SmolLM2 chat template would format the previous exchange into a prompt:

```text
<|im_start|>system
You are a helpful AI assistant named SmolLM, trained by Hugging Face<|im_end|>
<|im_start|>user
I need help with my order<|im_end|>
<|im_start|>assistant
I'd be happy to help. Could you provide your order number?<|im_end|>
<|im_start|>user
It's ORDER-123<|im_end|>
<|im_start|>assistant
```
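To see that a chat template is "just" string formatting, here is a hand-written approximation of the ChatML-style layout in the prompt above. The function `render_chatml` is hypothetical and only mirrors the visible delimiters and the default system message; in practice you should rely on the model's own template via the tokenizer rather than re-implementing it.

```python
DEFAULT_SYSTEM = "You are a helpful AI assistant named SmolLM, trained by Hugging Face"


def render_chatml(messages: list[dict], add_generation_prompt: bool = True) -> str:
    """Hand-written approximation of the ChatML-style prompt shown above."""
    if not messages or messages[0]["role"] != "system":
        # Prepend the default system message, as in the rendered example.
        messages = [{"role": "system", "content": DEFAULT_SYSTEM}] + messages
    prompt = "".join(
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages
    )
    if add_generation_prompt:
        # Open an assistant turn so the model writes the reply next.
        prompt += "<|im_start|>assistant\n"
    return prompt


conversation = [
    {"role": "user", "content": "I need help with my order"},
    {"role": "assistant", "content": "I'd be happy to help. Could you provide your order number?"},
    {"role": "user", "content": "It's ORDER-123"},
]

prompt = render_chatml(conversation)
print(prompt)
```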

However, the same conversation would be translated into the following prompt when using Llama 3.2: 
```text
<|begin_of_text|><|start_header_id|>system<|end_header_id|>

Cutting Knowledge Date: December 2023
Today Date: 10 Feb 2025

<|eot_id|><|start_header_id|>user<|end_header_id|>

I need help with my order<|eot_id|><|start_header_id|>assistant<|end_header_id|>

I'd be happy to help. Could you provide your order number?<|eot_id|><|start_header_id|>user<|end_header_id|>

It's ORDER-123<|eot_id|><|start_header_id|>assistant<|end_header_id|>
```
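Only the delimiters change between the two formats. The sketch below reproduces the header and `<|eot_id|>` layout visible in the Llama 3.2 prompt above; it is a simplified, hypothetical `render_llama3` function that omits the system block with the "Cutting Knowledge Date" lines the real template inserts, and it does not handle tools.

```python
def render_llama3(messages: list[dict], add_generation_prompt: bool = True) -> str:
    """Simplified Llama 3-style rendering: a header per role and <|eot_id|> after each turn."""
    prompt = "<|begin_of_text|>"
    for m in messages:
        prompt += f"<|start_header_id|>{m['role']}<|end_header_id|>\n\n{m['content']}<|eot_id|>"
    if add_generation_prompt:
        # Open an assistant header so the model generates the next reply.
        prompt += "<|start_header_id|>assistant<|end_header_id|>\n\n"
    return prompt


conversation = [
    {"role": "user", "content": "I need help with my order"},
    {"role": "assistant", "content": "I'd be happy to help. Could you provide your order number?"},
    {"role": "user", "content": "It's ORDER-123"},
]

llama_prompt = render_llama3(conversation)
print(llama_prompt)
```

Feeding a prompt rendered for one model family to another is a common source of degraded output, which is why each model ships its own template.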

Templates can handle complex multi-turn conversations while maintaining context:

```Python
messages = [
    {"role": "system", "content": "You are a math tutor."},
    {"role": "user", "content": "What is calculus?"},
    {"role": "assistant", "content": "Calculus is a branch of mathematics..."},
    {"role": "user", "content": "Can you give me an example?"},
]
```

#### **Chat Templates**

As mentioned, chat templates are essential for **structuring conversations between language models and users**. They guide how message exchanges are formatted into a single prompt.

#### **Base Models vs. Instruct Models**

Another point we need to understand is the difference between a Base Model vs. an Instruct Model:

- A *Base Model* is trained on raw text data to predict the next token. 
- An *Instruct Model* is fine-tuned specifically to follow instructions and engage in conversations. For example, **SmolLM2-135M** is a base model, while **SmolLM2-135M-Instruct** is its instruction-tuned variant. 

To make a Base Model behave like an instruct model, we need to **format our prompts in a consistent way that the model can understand**. This is where chat templates come in.

*ChatML* is one such template format that structures conversations with clear role indicators (system, user, assistant). If you have interacted with an AI API lately, you know that's the standard practice.

It's important to note that a base model could be fine-tuned on different chat templates, so when we're using an instruct model we need to make sure we're using the correct chat template.

#### **Understanding Chat Templates**

Because each instruct model uses different conversation formats and special tokens, chat templates are implemented to ensure that we correctly format the prompt the way each model expects.

In `transformers`, chat templates include `Jinja2 code` that describes how to transform the ChatML list of JSON messages, as presented in the above examples, into a textual representation of the system-level instructions, user messages and assistant responses that the model can understand.

This structure **helps maintain consistency across interactions and ensures the model responds appropriately to different types of input**.

Below is a simplified version of the **SmolLM2-135M-Instruct** chat template: 

```jinja
{% for message in messages %}
{% if loop.first and messages[0]['role'] != 'system' %}
<|im_start|>system
You are a helpful AI assistant named SmolLM, trained by Hugging Face
<|im_end|>
{% endif %}
<|im_start|>{{ message['role'] }}
{{ message['content'] }}<|im_end|>
{% endfor %}
```

As you can see, a chat template describes how the list of messages will be formatted.

Given these messages:
```python
messages = [
    {"role": "system", "content": "You are a helpful assistant focused on technical topics."},
    {"role": "user", "content": "Can you explain what a chat template is?"},
    {"role": "assistant", "content": "A chat template structures conversations between users and AI models..."},
    {"role": "user", "content": "How do I use it ?"},
]
``` 

The previous chat template will produce the following string:

```text
<|im_start|>system
You are a helpful assistant focused on technical topics.<|im_end|>
<|im_start|>user
Can you explain what a chat template is?<|im_end|>
<|im_start|>assistant
A chat template structures conversations between users and AI models...<|im_end|>
<|im_start|>user
How do I use it ?<|im_end|>
```

The transformers library will take care of chat templates for you as part of the tokenization process. All we have to do is structure our messages in the correct way and the tokenizer will take care of the rest.


#### **Message to Prompt** 

The easiest way to ensure your LLM receives a correctly formatted conversation is to use the **chat_template** from the model's tokenizer.

```Python
messages = [
    {"role": "system", "content": "You are an AI assistant with access to various tools."},
    {"role": "user", "content": "Hi !"},
    {"role": "assistant", "content": "Hi human, what can I help you with?"},
]
```

To convert the previous conversation into a prompt, we load the tokenizer and call `apply_chat_template`:

```Python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("HuggingFaceTB/SmolLM2-1.7B-Instruct")
rendered_prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
```

The `rendered_prompt` returned by this function is now ready to use as the input for the model you chose!

<div style="border-left: 3px solid black; padding-left: 10px;">
   This *apply_chat_template()* function will be used in the backend of your API, when you interact with messages in the ChatML format.
</div>

Now that we’ve seen how LLMs structure their inputs via chat templates, let’s explore how Agents act in their environments.

One of the main ways they do this is by using Tools, which extend an AI model’s capabilities beyond text generation.

## **What are Tools?**

![](https://huggingface.co/datasets/agents-course/course-images/resolve/main/en/unit1/whiteboard-check-2.jpg)

One crucial aspect of AI agents is their ability to take **actions**. As we saw, this happens through the use of **Tools**.

In this section, we'll learn what tools are, how to design them effectively, and how to integrate them into your agent via the System Message.

By giving your agent the right Tools, and clearly describing how those Tools work, you can dramatically increase what your AI can accomplish. Let's dive in!

### **What are AI Tools?**

A **Tool is a function given to the LLM**. This function should fulfill a **clear objective**.

Here are some commonly used tools in AI agents:

| **Tool** | **Description** | 
| ---- | ---- |
| **Web Search** | Allows the agent to fetch up-to-date information from the internet. |
| **Image Generation** | Creates images based on text descriptions. |
| **Retrieval** |  Retrieves information from an external source. | 
| **API Interface** | Interacts with an external API (GitHub, YouTube, Spotify, etc.) |

These are only examples, as you can in fact create a tool for any use case!

A good tool should be something that **complements the power of an LLM**.

For instance, if you need to perform arithmetic, giving a **calculator tool** to your LLM will provide better results than relying on the native capabilities of the model. 

Furthermore, **LLMs predict the completion of a prompt based on their training data**, which means that their internal knowledge only includes events prior to their training. Therefore, if your agent needs up-to-date data, you must provide it through some tool.

For instance, if you ask an LLM directly (without a search tool) for today's weather, the LLM will potentially hallucinate random weather. 

![](https://huggingface.co/datasets/agents-course/course-images/resolve/main/en/unit1/weather.jpg)

A tool should contain:

- A **textual description of what the function does**.
- A *Callable* (something to perform an action).
- *Arguments* with typings.
- (Optional) Outputs with typings.

### **How do tools work?**

LLMs, as we saw, can only receive text inputs and generate text outputs. They have no way to call tools on their own. When we talk about providing tools to an Agent, we mean teaching the LLM about the existence of these tools and instructing it to generate text-based invocations when needed.

For example, if we provide a tool to check the weather at a location from the internet and then ask the LLM about the weather in Paris, the LLM will recognize that this is an opportunity to use the "weather" tool. Instead of retrieving the weather data itself, the LLM will generate text that represents a tool call, such as `call weather_tool('Paris')`.

The **Agent** then reads this response, identifies that a tool call is required, executes the tool on the LLM's behalf, and retrieves the actual weather data.

The tool-calling steps are typically not shown to the user: the Agent appends them as a new message before passing the updated conversation to the LLM again. The LLM then processes this additional context and generates a natural-sounding response for the user. From the user's perspective, it appears as if the LLM directly interacted with the tool, but in reality, it was the Agent that handled the entire execution process in the background. We will explore this process in more detail in future sessions.
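The Agent's side of this exchange can be sketched as: look for a tool call in the LLM's text, run the tool, and append the result to the conversation. Everything here is hypothetical: the `call name('arg')` text format follows the example above, `weather_tool` returns stubbed data instead of querying a service, and the `"tool"` role name is an assumption (role names for tool results vary by model and template).

```python
import re


def weather_tool(city: str) -> str:
    """Hypothetical tool: a real one would query a weather service."""
    return f"Sunny, 22 degrees Celsius in {city}"  # stubbed data for illustration


TOOLS = {"weather_tool": weather_tool}

# Matches text such as: call weather_tool('Paris')
TOOL_CALL = re.compile(r"call (\w+)\('([^']*)'\)")


def run_agent_step(llm_output: str, messages: list[dict]) -> bool:
    """If the LLM's text contains a tool call, execute it and append the result.

    Returns True if a tool was run, False if the output was a plain answer.
    """
    match = TOOL_CALL.search(llm_output)
    if match is None:
        return False  # nothing for the Agent to execute
    name, arg = match.groups()
    result = TOOLS[name](arg)  # the Agent, not the LLM, runs the tool
    messages.append({"role": "assistant", "content": llm_output})
    messages.append({"role": "tool", "content": result})  # assumed role name
    return True


messages = [{"role": "user", "content": "What's the weather in Paris?"}]
ran = run_agent_step("call weather_tool('Paris')", messages)
print(ran, messages[-1])
# The updated `messages` would then be sent back to the LLM to produce the final answer.
```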

### **How do we give tools to an LLM?**

The complete answer may seem overwhelming, but we essentially use the system prompt to provide textual descriptions of available tools to the model:

![](https://huggingface.co/datasets/agents-course/course-images/resolve/main/en/unit1/Agent_system_prompt.png)

For this to work, we have to be very precise and accurate about:

1. **What the tool does**
2. **What exact inputs it expects**

This is why tool descriptions are usually provided using expressive but precise structures, such as computer languages or JSON. It's not *necessary* to do it like that; any precise and coherent format would work.
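For instance, a tool that multiplies two integers could be described with a small JSON object. The field names below (`name`, `description`, `arguments`, `outputs`) are illustrative assumptions; real APIs and frameworks each define their own schema.

```python
import json

# One possible JSON description of a multiply-two-integers tool (illustrative field names).
calculator_spec = {
    "name": "calculator",
    "description": "Multiply two integers.",
    "arguments": {"a": "int", "b": "int"},
    "outputs": "int",
}

spec_text = json.dumps(calculator_spec, indent=2)  # the text that would go into the prompt
print(spec_text)
```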

If this seems too theoretical, let's understand it through a concrete example.

We will implement a simplified **calculator** tool that will just multiply two integers. This could be our Python implementation:

```Python
def calculator(a: int, b: int) -> int:
    """Multiply two integers."""
    return a * b
```

So our tool is called `calculator`, it **multiplies two integers**, and it requires the following inputs: 
-  `a` (*int*): An integer.
-  `b` (*int*): An integer.

The output of the tool is another integer number that we can describe like this:
- (*int*): The product of `a` and `b`.

All of these details are important. Let’s put them together in a text string that describes our tool for the LLM to understand.

`Tool Name: calculator, Description: Multiply two integers., Arguments: a: int, b: int, Outputs: int`

<div style="border-left: 3px solid black; padding-left: 10px;">
  <strong>Reminder:</strong> This textual description is what we want the LLM to know about the tool.<br><br>
</div>

When we pass the previous string as part of the input to the LLM, the model will recognize it as a tool, and will know what it needs to pass as inputs and what to expect from the output. 

If we want to provide additional tools, we must be consistent and always use the same format. This process can be fragile, and we might accidentally overlook some details.

Is there a better way? 

### **Auto-formatting Tool sections**

Our tool was written in Python, and the implementation already provides everything we need:
- A descriptive name of what it does: `calculator`
- A longer description, provided by the function’s docstring comment: `Multiply two integers`.
- The inputs and their type: the function clearly expects two ints.
- The type of the output: `int`.

There’s a reason people use programming languages: they are expressive, concise, and precise.

We could provide the Python source code as the `specification` of the tool for the LLM, but the way the tool is implemented does not matter. All that matters is its name, what it does, the inputs it expects and the output it provides.

We will use Python’s introspection features to inspect the source code and build a tool description automatically for us. All we need is that the tool implementation uses type hints, docstrings, and sensible function names. We will write some code to extract the relevant portions from the source code.
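As a quick preview of what introspection gives us, the standard-library `inspect` module can already read the name, docstring, parameter annotations, and return annotation of the `calculator` function:

```python
import inspect


def calculator(a: int, b: int) -> int:
    """Multiply two integers."""
    return a * b


sig = inspect.signature(calculator)
print(calculator.__name__)                  # calculator
print(inspect.getdoc(calculator))           # Multiply two integers.
for name, param in sig.parameters.items():
    print(name, param.annotation.__name__)  # prints "a int", then "b int"
print(sig.return_annotation.__name__)       # int
```

These four pieces are exactly the ingredients of the manual tool description written earlier.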

After we are done, we’ll only need to use a Python decorator to indicate that the `calculator` function is a tool:

```Python
@tool
def calculator(a: int, b: int) -> int:
    """Multiply two integers."""
    return a * b

print(calculator.to_string())
```

Note the `@tool` decorator before the function definition. 

With the implementation we'll see next, we will be able to retrieve the following text automatically from the source code via the `to_string()` method provided by the decorator:

`Tool Name: calculator, Description: Multiply two integers., Arguments: a: int, b: int, Outputs: int`

As you can see, it's the same thing we wrote manually before!

### **Generic Tool implementation**

We create a generic `Tool` class that we can reuse whenever we need to use a tool.

<div style="border-left: 3px solid black; padding-left: 10px;">
  <strong>Disclaimer:</strong> This example implementation is fictional but closely resembles real implementations in most libraries.<br><br>
</div>



```python
class Tool:
    """
    A class representing a reusable piece of code (Tool).

    Attributes:
        name (str): Name of the tool.
        description (str): A textual description of what the tool does.
        func (callable): The function this tool wraps.
        arguments (list): A list of (name, type) tuples describing the arguments.
        outputs (str or list): The return type(s) of the wrapped function.
    """
    def __init__(self,
                 name: str,
                 description: str,
                 func: callable,
                 arguments: list,
                 outputs: str):
        self.name = name
        self.description = description
        self.func = func
        self.arguments = arguments
        self.outputs = outputs

    def to_string(self) -> str:
        """
        Return a string representation of the tool,
        including its name, description, arguments, and outputs.
        """
        args_str = ", ".join([
            f"{arg_name}: {arg_type}" for arg_name, arg_type in self.arguments
        ])

        return (
            f"Tool Name: {self.name},"
            f" Description: {self.description},"
            f" Arguments: {args_str},"
            f" Outputs: {self.outputs}"
        )

    def __call__(self, *args, **kwargs):
        """
        Invoke the underlying function (callable) with provided arguments.
        """
        return self.func(*args, **kwargs)
```

It may seem complicated, but if we go slowly through it we can see what it does. We define a `Tool` class that includes: 
- `name` (str): The name of the tool.
- `description` (str): A brief description of what the tool does.
- `func` (callable): The function the tool executes.
- `arguments` (list): The expected input parameters.
- `outputs` (str or list): The expected outputs of the tool.
- `__call__()`: Calls the function when the tool instance is invoked.
- `to_string()`: Converts the tool’s attributes into a textual representation.

We could create a Tool with this class using code like the following:

```Python
calculator_tool = Tool(
    "calculator",                   # name
    "Multiply two integers.",       # description
    calculator,                     # function to call
    [("a", "int"), ("b", "int")],   # inputs (names and types)
    "int",                          # output
)
```

But we can also use Python’s *`inspect`* module to retrieve all the information for us! This is what the *`@tool`* decorator does.


```python
import inspect

def tool(func):
    """
    A decorator that creates a Tool instance from the given function.
    """
    # Get the function signature
    signature = inspect.signature(func)

    # Extract (param_name, param_annotation) pairs for inputs
    arguments = []
    for param in signature.parameters.values():
        annotation_name = (
            param.annotation.__name__
            if hasattr(param.annotation, '__name__')
            else str(param.annotation)
        )
        arguments.append((param.name, annotation_name))

    # Determine the return annotation
    return_annotation = signature.return_annotation
    if return_annotation is inspect.Signature.empty:
        outputs = "No return annotation"
    else:
        outputs = (
            return_annotation.__name__
            if hasattr(return_annotation, '__name__')
            else str(return_annotation)
        )

    # Use the function's docstring as the description (default if None)
    description = func.__doc__ or "No description provided."

    # The function name becomes the Tool name
    name = func.__name__

    # Return a new Tool instance
    return Tool(
        name=name,
        description=description,
        func=func,
        arguments=arguments,
        outputs=outputs
    )
```

```python
@tool
def calculator(a: int, b: int) -> int:
    """Multiply two integers."""
    return a * b

print(calculator.to_string())
```


And we can use the `Tool`'s `to_string` method to automatically retrieve text suitable to be used as a tool description for an LLM: 

`Tool Name: calculator, Description: Multiply two integers., Arguments: a: int, b: int, Outputs: int`

The description is **injected** into the system prompt. Taking the example with which we started this section, here is how it would look after replacing the `tools_description`:

![](https://huggingface.co/datasets/agents-course/course-images/resolve/main/en/unit1/Agent_system_prompt_tools.png)
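In code, the injection can be as simple as string formatting. The template wording below is hypothetical (it is not the exact prompt in the figure); only the `{tools_description}` placeholder idea comes from the example. With several tools, their descriptions would be joined, one per line, before formatting.

```python
# Hypothetical system prompt template with a placeholder for the tool descriptions.
SYSTEM_PROMPT_TEMPLATE = """You are an AI assistant designed to help users efficiently and accurately.
You have access to the following tools:
{tools_description}
"""

# Descriptions as produced by Tool.to_string(); one entry per available tool.
tool_descriptions = [
    "Tool Name: calculator, Description: Multiply two integers., Arguments: a: int, b: int, Outputs: int",
]

system_message = {
    "role": "system",
    "content": SYSTEM_PROMPT_TEMPLATE.format(tools_description="\n".join(tool_descriptions)),
}
print(system_message["content"])
```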

In the Actions section, we will learn more about how an Agent can call this tool we just created.

**Model Context Protocol (MCP): a unified tool interface**

Model Context Protocol (MCP) is an **open protocol** that standardizes how applications **provide tools to LLMs**. MCP provides:

- A growing list of pre-built integrations that your LLM can directly plug into
- The flexibility to switch between LLM providers and vendors
- Best practices for securing your data within your infrastructure

This means that **any framework implementing MCP can leverage tools defined within the protocol**, eliminating the need to reimplement the same tool interface for each framework.
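To give a feel for the protocol, here is a rough sketch, based on our understanding of the MCP specification, of how the calculator could be advertised by an MCP server (a tool definition with a JSON Schema `inputSchema`) and how a client would ask to run it (a JSON-RPC 2.0 `tools/call` request). Check the specification for the authoritative message shapes; transport, the server's response, and error handling are omitted.

```python
import json

# Tool definition as an MCP server might list it (e.g. in a tools/list result).
calculator_tool = {
    "name": "calculator",
    "description": "Multiply two integers.",
    "inputSchema": {
        "type": "object",
        "properties": {"a": {"type": "integer"}, "b": {"type": "integer"}},
        "required": ["a", "b"],
    },
}

# JSON-RPC 2.0 request a client would send to invoke the tool.
call_request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {"name": "calculator", "arguments": {"a": 3, "b": 4}},
}

print(json.dumps(call_request))
```

Because the tool is described once in this shared format, any MCP-compatible client can discover and call it without a framework-specific wrapper.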

Tools play a crucial role in enhancing the capabilities of AI agents.

To summarize, we learned:

- *What tools are*: functions that give LLMs extra capabilities, such as performing calculations or accessing external data.
- *How to define a Tool*: By providing a clear textual description, inputs, outputs, and a callable function.
- *Why Tools are Essential*: They enable Agents to overcome the limitations of static model training, handle real-time tasks, and perform specialized actions. 