# Unit 1. Introduction to Agents

## Messages and Special Tokens

- [Chat Templates](https://huggingface.co/docs/transformers/v4.48.2/en/chat_templating)

In [1]:
from transformers import AutoTokenizer

In [2]:
tokenizer = AutoTokenizer.from_pretrained("HuggingFaceTB/SmolLM2-1.7B-Instruct")
tokenizer

GPT2TokenizerFast(name_or_path='HuggingFaceTB/SmolLM2-1.7B-Instruct', vocab_size=49152, model_max_length=8192, is_fast=True, padding_side='right', truncation_side='right', special_tokens={'bos_token': '<|im_start|>', 'eos_token': '<|im_end|>', 'unk_token': '<|endoftext|>', 'pad_token': '<|im_end|>', 'additional_special_tokens': ['<|im_start|>', '<|im_end|>']}, clean_up_tokenization_spaces=False, added_tokens_decoder={
	0: AddedToken("<|endoftext|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
	1: AddedToken("<|im_start|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
	2: AddedToken("<|im_end|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
	3: AddedToken("<repo_name>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
	4: AddedToken("<reponame>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
	5: AddedToken("<file_sep>"

In [3]:
messages = [
    {"role": "system", "content": "You are an AI assistant with access to various tools."},
    {"role": "user", "content": "Hi !"},
    {"role": "assistant", "content": "Hi human, what can I help you with?"},
]

In [4]:
rendered_prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
print(rendered_prompt)

<|im_start|>system
You are an AI assistant with access to various tools.<|im_end|>
<|im_start|>user
Hi !<|im_end|>
<|im_start|>assistant
Hi human, what can I help you with?<|im_end|>
<|im_start|>assistant
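
To make the layout of the rendered prompt above concrete, here is a minimal sketch that reproduces it by hand. The function name `render_chatml` is hypothetical. The real template is a Jinja template shipped with the tokenizer and may do more than this (for example, insert a default system message), so treat this as an illustration of the format only.

```python
def render_chatml(messages, add_generation_prompt=False):
    """Render role/content dicts into a ChatML-style prompt string."""
    parts = []
    for message in messages:
        # Each turn is wrapped in <|im_start|>ROLE ... <|im_end|>.
        parts.append(
            f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n"
        )
    if add_generation_prompt:
        # Open an assistant turn so the model continues from here.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


messages = [
    {"role": "system", "content": "You are an AI assistant with access to various tools."},
    {"role": "user", "content": "Hi !"},
]
print(render_chatml(messages, add_generation_prompt=True))
```

In practice `tokenizer.apply_chat_template` should be preferred, since it always matches the format the model was trained on.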



## What are Tools?

### Generic Tool implementation

In [5]:
class Tool:
    """
    A class representing a reusable piece of code (Tool).

    Attributes:
        name (str): Name of the tool.
        description (str): A textual description of what the tool does.
        func (callable): The function this tool wraps.
        arguments (list): A list of (name, type) pairs, one per argument.
        outputs (str or list): The return type(s) of the wrapped function.
    """

    def __init__(
        self, name: str, description: str, func: callable, arguments: list, outputs: str
    ):
        self.name = name
        self.description = description
        self.func = func
        self.arguments = arguments
        self.outputs = outputs

    def to_string(self) -> str:
        """
        Return a string representation of the tool,
        including its name, description, arguments, and outputs.
        """
        args_str = ", ".join(
            [f"{arg_name}: {arg_type}" for arg_name, arg_type in self.arguments]
        )

        return (
            f"Tool Name: {self.name},"
            f" Description: {self.description},"
            f" Arguments: {args_str},"
            f" Outputs: {self.outputs}"
        )

    def __call__(self, *args, **kwargs):
        """
        Invoke the underlying function (callable) with provided arguments.
        """
        return self.func(*args, **kwargs)

In [11]:
def calculator(a: int, b: int) -> int:
    """Multiply two integers."""
    return a * b

calculator_tool = Tool(
    "calculator",                   # name
    "Multiply two integers.",       # description
    calculator,                     # function to call
    [("a", "int"), ("b", "int")],   # inputs (names and types)
    "int",                          # output
)

print(calculator_tool.to_string())

Tool Name: calculator, Description: Multiply two integers., Arguments: a: int, b: int, Outputs: int
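
The string returned by `to_string()` is meant to be shown to the model, typically inside the system message. Below is a rough sketch of that step. The helper `build_system_prompt` and its wording are assumptions made for illustration; they are not part of the course code.

```python
def build_system_prompt(tool_descriptions):
    """Join tool descriptions into a system prompt (hypothetical wording)."""
    tool_lines = "\n".join(f"- {description}" for description in tool_descriptions)
    return (
        "You are an AI assistant with access to the following tools:\n"
        f"{tool_lines}"
    )


# Copied from the output of calculator_tool.to_string() above.
calculator_description = (
    "Tool Name: calculator, Description: Multiply two integers., "
    "Arguments: a: int, b: int, Outputs: int"
)

system_message = {
    "role": "system",
    "content": build_system_prompt([calculator_description]),
}
print(system_message["content"])
```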


### Use Decorator

In [16]:
import inspect

def tool(func):
    """
    A decorator that creates a Tool instance from the given function.
    """
    # Get the function signature
    signature = inspect.signature(func)

    # Extract (param_name, param_annotation) pairs for inputs
    arguments = []
    for param in signature.parameters.values():
        annotation_name = (
            param.annotation.__name__
            if hasattr(param.annotation, "__name__")
            else str(param.annotation)
        )
        arguments.append((param.name, annotation_name))

    # Determine the return annotation
    return_annotation = signature.return_annotation
    if return_annotation is inspect.Signature.empty:
        outputs = "No return annotation"
    else:
        outputs = (
            return_annotation.__name__
            if hasattr(return_annotation, "__name__")
            else str(return_annotation)
        )

    # Use the function's docstring as the description (default if None)
    description = func.__doc__ or "No description provided."

    # The function name becomes the Tool name
    name = func.__name__

    # Return a new Tool instance
    return Tool(
        name=name,
        description=description,
        func=func,
        arguments=arguments,
        outputs=outputs,
    )

In [None]:
@tool
def calculator(a: int, b: int) -> int:
    """Multiply two integers."""
    return a * b

print(calculator.to_string())

Tool Name: calculator, Description: Multiply two integers., Arguments: a: int, b: int, Outputs: int
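
One edge case worth noting: with the decorator above, a parameter that has no annotation is reported as `_empty`, because the `inspect` sentinel is a class and therefore has a `__name__`. The sketch below shows one possible way to handle that. The helper names `annotation_name` and `describe_signature`, and the `"Any"` fallback, are choices made for this example only.

```python
import inspect


def annotation_name(annotation, default="Any"):
    """Return a readable name for an annotation, or `default` if it is missing."""
    # inspect.Parameter.empty and inspect.Signature.empty are the same sentinel.
    if annotation is inspect.Signature.empty:
        return default
    return getattr(annotation, "__name__", str(annotation))


def describe_signature(func):
    """Return ([(param_name, type_name), ...], return_type_name) for `func`."""
    signature = inspect.signature(func)
    arguments = [
        (param.name, annotation_name(param.annotation))
        for param in signature.parameters.values()
    ]
    outputs = annotation_name(
        signature.return_annotation, default="No return annotation"
    )
    return arguments, outputs


def greet(name, times: int = 1):
    return " ".join([f"Hello {name}!"] * times)


print(describe_signature(greet))
# ([('name', 'Any'), ('times', 'int')], 'No return annotation')
```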


## Dummy Agents

> TODO: the results below do not match the example in the course notebook.

- [dummy_agent_library.ipynb · agents-course/notebooks at main](https://huggingface.co/agents-course/notebooks/blob/main/dummy_agent_library.ipynb)
- [My Modified Version](https://colab.research.google.com/drive/1SzN6jyQpldIXF-eNPP1_vnKv4ivLJRG7?usp=sharing)
  - Basically the same as this notebook, but with fewer modifications
  - Still shows the hallucination issue (the model does not talk about the weather)

In [18]:
from dotenv import load_dotenv
load_dotenv()

True

In [19]:
from huggingface_hub import InferenceClient

In [21]:
client = InferenceClient("meta-llama/Llama-3.2-3B-Instruct")
client

<InferenceClient(model='meta-llama/Llama-3.2-3B-Instruct', timeout=None)>

Try the `text_generation` API (with special tokens added by hand, and with `tokenizer.apply_chat_template`) and the `chat` API.

In [None]:
# As seen in the LLM section, if we just do decoding, **the model will only stop when it predicts an EOS token**,
# and this does not happen here because this is a conversational (chat) model and we didn't apply the chat template it expects.
client.text_generation(
    "The capital of france is",
    max_new_tokens=100,
)

" a great way to get started with the basics of photography. It's a great way to learn about composition, lighting, and other fundamental concepts of photography.\n\nHere are some tips to get you started with photography:\n\n1.  **Understand your camera**: Familiarize yourself with your camera's settings and features. Read the manual or online resources to learn about the different modes, such as manual, aperture priority, and shutter priority.\n2.  **Practice, practice, practice**: The more you"

In [42]:
# If we now add the special tokens that the Llama 3.2 model expects, the behavior changes and is now the expected one.
prompt = """<|begin_of_text|><|start_header_id|>user<|end_header_id|>

The capital of france is<|eot_id|><|start_header_id|>assistant<|end_header_id|>

"""

client.text_generation(
    prompt,
    max_new_tokens=100,
)

'...Paris!'

In [40]:
# You need to agree to share your contact information to access this model
# https://huggingface.co/meta-llama/Llama-3.2-3B-Instruct
# https://huggingface.co/settings/gated-repos
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.2-3B-Instruct")
tokenizer

PreTrainedTokenizerFast(name_or_path='meta-llama/Llama-3.2-3B-Instruct', vocab_size=128000, model_max_length=131072, is_fast=True, padding_side='right', truncation_side='right', special_tokens={'bos_token': '<|begin_of_text|>', 'eos_token': '<|eot_id|>'}, clean_up_tokenization_spaces=True, added_tokens_decoder={
	128000: AddedToken("<|begin_of_text|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
	128001: AddedToken("<|end_of_text|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
	128002: AddedToken("<|reserved_special_token_0|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
	128003: AddedToken("<|reserved_special_token_1|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
	128004: AddedToken("<|finetune_right_pad_id|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
	128005: AddedToken("<|reserved_special_token_2|>", r

In [None]:
print(tokenizer.apply_chat_template(
    [{"role": "user", "content": "The capital of france is"}],
    tokenize=False,
    add_generation_prompt=True,
))

print("-" * 30, "\n")

print(prompt)

<|begin_of_text|><|start_header_id|>system<|end_header_id|>

Cutting Knowledge Date: December 2023
Today Date: 11 Feb 2025

<|eot_id|><|start_header_id|>user<|end_header_id|>

The capital of france is<|eot_id|><|start_header_id|>assistant<|end_header_id|>


------------------------------ 

<|begin_of_text|><|start_header_id|>user<|end_header_id|>

The capital of france is<|eot_id|><|start_header_id|>assistant<|end_header_id|>
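
The hand-written prompt can also be produced by a small helper, which makes the spacing rules explicit (two newlines after each header, `<|eot_id|>` right after the content). This is a minimal sketch; `render_llama3` is a hypothetical name. Unlike the tokenizer's template, it does not insert the system block with the knowledge-cutoff and current dates seen in the first output above.

```python
def render_llama3(messages, add_generation_prompt=True):
    """Render role/content dicts using the Llama 3 header/eot layout."""
    prompt = "<|begin_of_text|>"
    for message in messages:
        prompt += (
            f"<|start_header_id|>{message['role']}<|end_header_id|>\n\n"
            f"{message['content']}<|eot_id|>"
        )
    if add_generation_prompt:
        # Open an assistant turn for the model to complete.
        prompt += "<|start_header_id|>assistant<|end_header_id|>\n\n"
    return prompt


print(render_llama3([{"role": "user", "content": "The capital of france is"}]))
```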




In [25]:
output = client.chat.completions.create(
    messages=[
        {"role": "user", "content": "The capital of france is"},
    ],
    stream=False,
    max_tokens=1024,
)

output

ChatCompletionOutput(choices=[ChatCompletionOutputComplete(finish_reason='stop', index=0, message=ChatCompletionOutputMessage(role='assistant', content='Paris.', tool_calls=None), logprobs=None)], created=1739236933, id='', model='meta-llama/Llama-3.2-3B-Instruct', system_fingerprint='3.0.1-sha-bb9095a', usage=ChatCompletionOutputUsage(completion_tokens=3, prompt_tokens=40, total_tokens=43))

In [26]:
output.choices[0].message.content

'Paris.'

Basic ReAct example using a hand-built prompt and `text_generation`

In [28]:
# This system prompt is more complex: the textual descriptions of the tools are already appended to it.
SYSTEM_PROMPT = """Answer the following questions as best you can. You have access to the following tools:

get_weather: Get the current weather in a given location

The way you use the tools is by specifying a json blob.
Specifically, this json should have a `action` key (with the name of the tool to use) and a `action_input` key (with the input to the tool going here).

The only values that should be in the "action" field are:
get_weather: Get the current weather in a given location, args: {"location": {"type": "string"}}
example use :
```
{{
  "action": "get_weather",
  "action_input": {"location": "New York"}
}}

ALWAYS use the following format:

Question: the input question you must answer
Thought: you should always think about one action to take. Only one action at a time in this format:
Action:
```
$JSON_BLOB
```
Observation: the result of the action. This Observation is unique, complete, and the source of truth.
... (this Thought/Action/Observation can repeat N times, you should take several steps when needed. The $JSON_BLOB must be formatted as markdown and only use a SINGLE action at a time.)

You must always end your output with the following format:

Thought: I now know the final answer
Final Answer: the final answer to the original input question

Now begin! Reminder to ALWAYS use the exact characters `Final Answer:` when you provide a definitive answer. """

In [49]:
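# Since we are calling `text_generation` directly, we need to add the right special tokens ourselves.
# Note: the chat template output also has a blank line after each header, which this hand-built prompt omits.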
prompt = f"""<|begin_of_text|><|start_header_id|>system<|end_header_id|>
{SYSTEM_PROMPT}
<|eot_id|><|start_header_id|>user<|end_header_id|>
What's the weather in London ?
<|eot_id|><|start_header_id|>assistant<|end_header_id|>
"""

In [34]:
thought_output = client.text_generation(
    prompt,
    max_new_tokens=200,
    stop=["Observation:"],  # Let's stop before any actual function is called
)

print(thought_output)

 
defence decisions about the content of your posts. Here's an example of how you could rephrase the question to focus on the topic of university campuses and the potential for cyber threats:

"How can universities in India prevent cyber threats on their campuses, such as exploring ways to detect and prevent cyber attacks, and how can they educate students and faculty about online safety and cyber threats, and cyber attacks, and online safety, and cyber security, and online safety, and cyber threats, and cyber attacks, and online safety, and cyber security, and online safety, and cyber attacks, and cyber threats, and cyber attacks, and cyber threats, and cyber attacks, and cyber threats, and cyber attacks, and cyber attacks, and cyber attacks, and cyber attacks, and cyber attacks, and cyber attacks, and cyber attacks, and cyber attacks, and cyber attacks, and cyber attacks, and cyber attacks, and cyber attacks, and cyber attacks, and cyber attacks, and cyber attacks, and cyber attacks,

In [50]:
print(tokenizer.apply_chat_template(
    [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": "What's the weather in London ?"},
    ],
    tokenize=False,
    add_generation_prompt=True,
))

print("-" * 30, "\n")

print(prompt)

<|begin_of_text|><|start_header_id|>system<|end_header_id|>

Cutting Knowledge Date: December 2023
Today Date: 11 Feb 2025

Answer the following questions as best you can. You have access to the following tools:

get_weather: Get the current weather in a given location

The way you use the tools is by specifying a json blob.
Specifically, this json should have a `action` key (with the name of the tool to use) and a `action_input` key (with the input to the tool going here).

The only values that should be in the "action" field are:
get_weather: Get the current weather in a given location, args: {"location": {"type": "string"}}
example use :
```
{{
  "action": "get_weather",
  "action_input": {"location": "New York"}
}}

ALWAYS use the following format:

Question: the input question you must answer
Thought: you should always think about one action to take. Only one action at a time in this format:
Action:
```
$JSON_BLOB
```
Observation: the result of the action. This Observation is uniq
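
Comparing two long prompts by eye is error-prone. As a sketch, the standard-library `difflib` module can list the differences line by line. The two strings below are shortened stand-ins for the prompts printed above, not the full prompts, and `prompt_diff` is a hypothetical helper name.

```python
import difflib


def prompt_diff(hand_built, from_template):
    """Return unified-diff lines between two prompt strings."""
    return list(
        difflib.unified_diff(
            hand_built.splitlines(),
            from_template.splitlines(),
            fromfile="hand_built",
            tofile="chat_template",
            lineterm="",
        )
    )


# Shortened stand-ins: only the user turn, without the system prompt.
hand_built = (
    "<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n"
    "What's the weather in London ?\n"
    "<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n"
)
from_template = (
    "<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\n"
    "What's the weather in London ?"
    "<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n"
)

for line in prompt_diff(hand_built, from_template):
    print(line)
```

Lines prefixed with `-` exist only in the hand-built prompt and lines prefixed with `+` only in the template output, so missing blank lines after headers show up as bare `+` lines.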

In [35]:
# Dummy function
def get_weather(location):
    return f"the weather in {location} is sunny with low temperatures. \n"

get_weather("London")

'the weather in London is sunny with low temperatures. \n'

In [38]:
# Let's concatenate the base prompt, the completion until function execution and the result of the function as an Observation
prompt_with_action_result = prompt + thought_output + get_weather("London")
print(prompt_with_action_result)

<|begin_of_text|><|start_header_id|>system<|end_header_id|>
Answer the following questions as best you can. You have access to the following tools:

get_weather: Get the current weather in a given location

The way you use the tools is by specifying a json blob.
Specifically, this json should have a `action` key (with the name of the tool to use) and a `action_input` key (with the input to the tool going here).

The only values that should be in the "action" field are:
get_weather: Get the current weather in a given location, args: {"location": {"type": "string"}}
example use :
```
{{
  "action": "get_weather",
  "action_input": {"location": "New York"}
}}

ALWAYS use the following format:

Question: the input question you must answer
Thought: you should always think about one action to take. Only one action at a time in this format:
Action:
```
$JSON_BLOB
```
Observation: the result of the action. This Observation is unique, complete, and the source of truth.
... (this Thought/Action/

In [39]:
final_output_with_observation = client.text_generation(
    prompt_with_action_result,
    max_new_tokens=200,
)

print(final_output_with_observation)

The best way to prevent cyber threats on university campuses is to implement robust security measures, such as firewalls, intrusion detection systems, and encryption. Additionally, educating students and faculty about online safety and cyber security can help prevent cyber threats. Universities can also establish a cyber security team to monitor and respond to cyber threats in real-time. Furthermore, implementing a Bring Your Own Device (BYOD) policy can help prevent cyber threats, and cyber attacks, and cyber threats, and cyber attacks, and cyber attacks, and cyber attacks, and cyber attacks, and cyber attacks, and cyber attacks, and cyber attacks, and cyber attacks, and cyber attacks, and cyber attacks, and cyber attacks, and cyber attacks, and cyber attacks, and cyber attacks, and cyber attacks, and cyber attacks, and cyber attacks, and cyber attacks, and cyber attacks, and cyber attacks, and cyber attacks, and cyber attacks, and cyber attacks, and cyber attacks, and cyber attacks, 

Equivalent ReAct example using the `chat` API

In [56]:
# Create the initial conversation messages
messages = [
    {"role": "system", "content": SYSTEM_PROMPT},
    {"role": "user", "content": "What's the weather in London ?"},
]

# Call the chat completions API to get the assistant's first response
# We use a stop sequence to halt before an actual function call is produced.
first_response = client.chat.completions.create(
    messages=messages, max_tokens=200, stop=["Observation:"]
)

# Print the assistant's chain-of-thought up to the point of a function call
print("Assistant's initial response:")
print(first_response["choices"][0]["message"]["content"])

Assistant's initial response:
Question: What's the weather in London?

Action:
```
{
  "action": "get_weather",
  "action_input": {"location": "London"}
}
```
Observation:
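
In the next cell the tool call is simulated by calling `get_weather("London")` directly. A rough sketch of how the JSON blob in the response above could instead be parsed and dispatched is shown here. The names `parse_action`, `run_action`, and `TOOLS` are hypothetical, and the regular expression assumes a single fenced JSON object in the output.

```python
import json
import re

FENCE = "`" * 3  # three backticks, built this way so the example stays readable


def get_weather(location):
    # Dummy tool, same behavior as the one defined earlier in this notebook.
    return f"the weather in {location} is sunny with low temperatures. \n"


TOOLS = {"get_weather": get_weather}  # hypothetical name-to-function registry


def parse_action(text):
    """Extract the first fenced JSON blob and return (tool_name, tool_kwargs)."""
    pattern = re.escape(FENCE) + r"(?:json)?\s*(\{.*?\})\s*" + re.escape(FENCE)
    match = re.search(pattern, text, re.DOTALL)
    if match is None:
        raise ValueError("no fenced JSON action found in the model output")
    blob = json.loads(match.group(1))
    return blob["action"], blob["action_input"]


def run_action(text):
    """Look up the requested tool and call it with the parsed arguments."""
    name, kwargs = parse_action(text)
    if name not in TOOLS:
        raise KeyError(f"unknown tool: {name}")
    return TOOLS[name](**kwargs)


model_output = (
    "Question: What's the weather in London?\n\n"
    "Action:\n"
    f"{FENCE}\n"
    '{\n  "action": "get_weather",\n  "action_input": {"location": "London"}\n}\n'
    f"{FENCE}\n"
    "Observation:"
)
print(run_action(model_output))
```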


In [57]:
# Simulate the function call by obtaining the observation
weather_observation = get_weather("London")

# Append the assistant’s initial response and the function's observation as new messages.
# (Here we assume the chain-of-thought expects an Observation with the result of get_weather.)
messages.append(
    {"role": "assistant", "content": first_response["choices"][0]["message"]["content"]}
)
messages.append({"role": "assistant", "content": f"Observation: {weather_observation}"})

messages

[{'role': 'system',
  'content': 'Answer the following questions as best you can. You have access to the following tools:\n\nget_weather: Get the current weather in a given location\n\nThe way you use the tools is by specifying a json blob.\nSpecifically, this json should have a `action` key (with the name of the tool to use) and a `action_input` key (with the input to the tool going here).\n\nThe only values that should be in the "action" field are:\nget_weather: Get the current weather in a given location, args: {"location": {"type": "string"}}\nexample use :\n```\n{{\n  "action": "get_weather",\n  "action_input": {"location": "New York"}\n}}\n\nALWAYS use the following format:\n\nQuestion: the input question you must answer\nThought: you should always think about one action to take. Only one action at a time in this format:\nAction:\n```\n$JSON_BLOB\n```\nObservation: the result of the action. This Observation is unique, complete, and the source of truth.\n... (this Thought/Action/O

In [59]:
print(
    tokenizer.apply_chat_template(
        messages,
        tokenize=False,
        add_generation_prompt=True,
    )
)

<|begin_of_text|><|start_header_id|>system<|end_header_id|>

Cutting Knowledge Date: December 2023
Today Date: 11 Feb 2025

Answer the following questions as best you can. You have access to the following tools:

get_weather: Get the current weather in a given location

The way you use the tools is by specifying a json blob.
Specifically, this json should have a `action` key (with the name of the tool to use) and a `action_input` key (with the input to the tool going here).

The only values that should be in the "action" field are:
get_weather: Get the current weather in a given location, args: {"location": {"type": "string"}}
example use :
```
{{
  "action": "get_weather",
  "action_input": {"location": "New York"}
}}

ALWAYS use the following format:

Question: the input question you must answer
Thought: you should always think about one action to take. Only one action at a time in this format:
Action:
```
$JSON_BLOB
```
Observation: the result of the action. This Observation is uniq

In [58]:
# Now call the chat completions API again to produce the final answer
final_response = client.chat.completions.create(
    messages=messages,
    max_tokens=200,
)

# Print the final output from the assistant
print("\nAssistant's final output:")
print(final_response["choices"][0]["message"]["content"])


Assistant's final output:
Thought: I now know the current weather conditions in London.
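
The two-call sequence above (generate until `Observation:`, run the tool, append the result, generate again) generalizes to a loop. The sketch below uses a scripted stand-in for the model so that it runs without network access. `react_loop` and `scripted_model` are hypothetical names, and the JSON extraction is deliberately naive: it takes the outermost `{...}` span and fails if there is none.

```python
import json


def react_loop(generate, question, tools, max_steps=3):
    """Alternate model calls and tool calls until a final answer appears.

    `generate` is any callable mapping a transcript string to the model's
    next chunk of text; it stands in for a real client call.
    """
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):
        chunk = generate(transcript)
        transcript += chunk
        if "Final Answer:" in chunk:
            return chunk.split("Final Answer:", 1)[1].strip()
        # Take the outermost {...} span as the JSON action blob.
        blob = json.loads(chunk[chunk.index("{"): chunk.rindex("}") + 1])
        observation = tools[blob["action"]](**blob["action_input"])
        transcript += f"Observation: {observation}\n"
    raise RuntimeError("no final answer within max_steps")


def scripted_model(transcript):
    # Hypothetical stand-in for an LLM: the reply depends only on whether
    # an observation is already present in the transcript.
    if "Observation:" not in transcript:
        return (
            "Thought: I need the weather.\nAction:\n"
            '{"action": "get_weather", "action_input": {"location": "London"}}\n'
        )
    return "Thought: I now know the final answer\nFinal Answer: It is sunny in London."


def get_weather(location):
    return f"the weather in {location} is sunny with low temperatures."


answer = react_loop(
    scripted_model, "What's the weather in London ?", {"get_weather": get_weather}
)
print(answer)
```

The `max_steps` cap is a design choice for this sketch: it keeps a model that never emits `Final Answer:` from looping forever.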
