[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/christianmoiola/agentic_lecture/multi_agent_system.ipynb)

## 1 What is an AI Agent?

An **AI Agent** is a system capable of autonomously performing tasks on behalf of a user or another system.  
It achieves this by **planning**, **reasoning**, and **interacting** with the external environment through various tools and components.

Different types of AI agents exist, such as simple reflex agents or model-based agents.

Today, however, the term “AI agent” is often used to refer specifically to LLM-based agents, which use a Large Language Model as the central controller that coordinates other components such as memory and tools.

## 2 Core Components of an AI Agent

### 1. Model
- Acts as the **brain** of the agent.  
- Responsible for reasoning, understanding, and generating responses.  
- Makes decisions based on its knowledge and on the context it receives (instructions, memory, prior tool outputs).

### 2. Tools
- External functions, APIs, or plugins that allow the agent to perform specific actions.  
- Extend the agent’s capabilities (e.g., web search, code execution, image generation, data analysis).

### 3. Instructions
- Define the rules, goals, and behavioral constraints of the agent.  
- Guide the agent’s tone, scope of allowed actions, and ethical boundaries.

### 4. Memory
- Stores the **state of the conversation**, including past messages, tool outputs, and contextual information.  
- Enables continuity across multiple user interactions.

## 3 How a Basic AI Agent Works

When a user provides an input or question, the AI Agent processes it as follows:

1. The **user query** is combined with the **system instructions** and any relevant **context or memory**.
2. The **LLM** (foundation model) interprets the request and decides which **tool** (or set of tools) to use.
3. The model generates a **JSON output** specifying:
   - The **tool name** to be executed.
   - The **parameters** required for execution.
4. The specified tool is executed, and the resulting output is returned to the user.
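
The four steps above can be sketched in plain Python. This is a minimal illustration, not the implementation of any framework: the LLM is replaced by a hypothetical `fake_llm` that always returns the same JSON tool call, and the tools `get_weather` and `add` are made-up examples.

```python
import json
from typing import Callable

def get_weather(city: str) -> str:
    """Hypothetical tool: returns a canned weather report."""
    return f"It is sunny in {city}."

def add(a: float, b: float) -> float:
    """Hypothetical tool: adds two numbers."""
    return a + b

TOOLS: dict[str, Callable[..., object]] = {"get_weather": get_weather, "add": add}
SYSTEM_PROMPT = "Tools: get_weather(city), add(a, b). Answer with one JSON tool call."

def fake_llm(prompt: str) -> str:
    """Stand-in for a real LLM: always requests the weather in Trento."""
    return '{"tool": "get_weather", "parameters": {"city": "Trento"}}'

def basic_agent(user_query: str, memory: list[str]) -> object:
    # 1. Combine system instructions, memory/context and the user query.
    prompt = "\n".join([SYSTEM_PROMPT, *memory, f"User: {user_query}"])
    # 2-3. The model picks a tool and emits JSON with its name and parameters.
    call = json.loads(fake_llm(prompt))
    # 4. Execute that single tool call and return its output directly.
    return TOOLS[call["tool"]](**call["parameters"])

print(basic_agent("What is the weather like in Trento?", memory=[]))
# It is sunny in Trento.
```

There is no loop: the output of the one tool call is the answer, which is why a task that needs a second tool call based on the first result cannot be solved this way.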

### Limitations of the Basic Agent
A limitation of this approach is that it operates in a **single-shot** manner — it selects and executes tools based only on the initial input.  
This means it cannot handle **multi-step reasoning** tasks where the output of one tool is required as input for another.

To overcome this limitation, a more advanced architecture was introduced: the **ReAct Framework**.

## 4 The ReAct Framework

The **ReAct Framework** (Reason + Act) extends the capabilities of AI Agents by integrating **iterative reasoning** and **tool use** in a looped process.

- **Reasoning (Reason)**:  
  The model first reflects on the problem and determines the steps needed to solve it.  
  This encourages **step-by-step logical planning** before any action is taken.

- **Action (Act)**:  
  Based on its reasoning, the model selects and executes the appropriate tools.  
  The results of these actions are then fed back into the reasoning loop.

This iterative process continues until the agent produces a **final answer**.

## 5 How a ReAct AI Agent Works

[![HMD-2.png](https://i.postimg.cc/qMVsdKD4/HMD-2.png)](https://postimg.cc/LhDYtJMb)

Below is the general workflow of a ReAct-based AI Agent:

1. **Initialization**
   - The system prompt (instructions), user query, and any prior observations (results from earlier tool executions) are concatenated into a single input prompt.

2. **Reasoning and Action Generation**
   - The LLM receives this prompt and outputs two components sequentially:
     1. **Thought**: A reasoning step describing how the model plans to approach the problem.
     2. **Action**: A JSON-formatted instruction specifying which tool to call and with what parameters.

3. **Tool Execution**
   - The system parses the generated action, executes the corresponding tool, and stores the tool’s result as a new **observation**.

4. **Looping Process**
   - The updated prompt (including the new observation) is passed back to the LLM.  
   - This loop continues until the model produces an action of type **`final_answer`**, signaling that the reasoning process is complete.
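
The loop can be sketched in plain Python as below. This is a minimal illustration, not the `smolagents` implementation: the LLM is replaced by a hypothetical scripted `fake_llm` that returns a fixed sequence of thoughts and JSON actions, the tools `multiply` and `add` are made up, and `max_steps` is an added safety cap that is an assumption, not part of the steps listed above.

```python
import json
from typing import Callable

# Hypothetical tools, for illustration only.
TOOLS: dict[str, Callable[..., object]] = {
    "multiply": lambda a, b: a * b,
    "add": lambda a, b: a + b,
}

# Scripted "LLM" outputs: (thought, action) for each iteration.
# The second step uses 42, i.e. the value the "model" read from the first observation.
SCRIPT = [
    ("I first need 6 * 7.", {"tool": "multiply", "parameters": {"a": 6, "b": 7}}),
    ("Now I add 8 to 42.", {"tool": "add", "parameters": {"a": 42, "b": 8}}),
    ("I now have the answer.", {"tool": "final_answer", "parameters": {"answer": 50}}),
]

def fake_llm(prompt: str) -> tuple[str, str]:
    """Stand-in for an LLM: picks the next scripted step from the observations seen so far."""
    thought, action = SCRIPT[prompt.count("Observation:")]
    return thought, json.dumps(action)

def react_agent(system_prompt: str, query: str, max_steps: int = 10) -> object:
    observations: list[str] = []
    for _ in range(max_steps):
        # Initialization: instructions + query + all previous observations.
        prompt = "\n".join([system_prompt, f"Task: {query}", *observations])
        # Reasoning and action generation: a Thought plus a JSON Action.
        thought, action_json = fake_llm(prompt)
        action = json.loads(action_json)
        # A final_answer action ends the loop.
        if action["tool"] == "final_answer":
            return action["parameters"]["answer"]
        # Tool execution: the result is stored as a new observation.
        result = TOOLS[action["tool"]](**action["parameters"])
        observations.append(f"Thought: {thought}\nAction: {action_json}\nObservation: {result}")
    raise RuntimeError("No final answer within max_steps")

print(react_agent("You are a helpful agent.", "Compute 6 * 7 + 8."))  # 50
```

Unlike the single-shot agent, each tool result flows back into the next prompt, so the second action can depend on the first observation.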

## 6 Example: Population Comparison Task

**System prompt:**  
> "You are an agent that can browse the web and call tools. Use reasoning and then act."

**User query:**  
> "Find the current population of Venice and compare it to that of Trento."


### Step-by-Step Execution

#### 1. Initialization
   - The system stores the user’s query in the agent's memory (in `smolagents`, as a `TaskStep`).

#### 2. First Iteration
   - **Thought:** “To compare populations, I first need the population of Venice, then that of Trento, and finally compute their ratio.”  
   - **Action:**  
     ```json
     {"tool": "search_engine", "query": "population of Venice Italy 2025"}
     ```
   - **Observation:** “Venice population: 254,000 (2024 estimate).”

#### 3. Second Iteration
   - **Thought:** “Now I need the population of Trento.”  
   - **Action:**  
     ```json
     {"tool": "search_engine", "query": "population of Trento Italy 2025"}
     ```
   - **Observation:** “Trento population: 118,000 (2023 estimate).”

#### 4. Third Iteration
   - **Thought:** “I have both values; now compute the ratio.”  
   - **Action:**  
     ```json
     {"tool": "calculator", "expression": "118000 / 254000"}
     ```
   - **Observation:** “Result: 0.46.”

#### 5. Final Iteration
   - **Thought:** “I now have the answer.”  
   - **Action:**  
     ```json
     {"tool": "final_answer", "output": "The population of Trento is approximately 46% of that of Venice."}
     ```
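
The trace does not specify how the `calculator` tool works. One hedged way to implement it, assuming it only needs basic arithmetic, is to parse the expression with Python's `ast` module and evaluate only whitelisted operators, instead of calling `eval` on model-generated text. The function below is a hypothetical sketch, not a tool from any library.

```python
import ast
import operator

# Allowed arithmetic operators; anything else is rejected.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.USub: operator.neg,
}

def calculator(expression: str) -> float:
    """Evaluate a plain arithmetic expression such as '118000 / 254000'."""
    def _eval(node: ast.AST) -> float:
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](_eval(node.operand))
        raise ValueError(f"Unsupported expression: {expression!r}")
    return _eval(ast.parse(expression, mode="eval").body)

ratio = calculator("118000 / 254000")
print(f"Result: {ratio:.2f}")  # Result: 0.46
```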

## 7 Introduction to Smolagents

`smolagents` is one of the simplest and most lightweight frameworks for building *ReAct-style AI Agents*.  
It supports multiple Large Language Models (LLMs), including models from:

- Hugging Face Hub  
- OpenAI  
- Anthropic  
- And others

The library implements the **ReAct framework** and allows you to:

- Define custom tools
- Use pre-built tools
- Build multi-agent systems where agents collaborate on tasks

`smolagents` provides two main agent types:

- **ToolCallingAgent**: uses JSON/text-structured tool calls  
- **CodeAgent**: writes Python code as its actions and executes it in a sandbox

### 7.1 Installation

In [None]:
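# Install smolagents together with the dependencies of its default tools (e.g. web search)
%pip install "smolagents[toolkit]"

# Log in to the Hugging Face Hub: InferenceClientModel uses this token to call hosted models
from huggingface_hub import login
login()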


### 7.2 ToolCallingAgent

The `ToolCallingAgent` provides a direct implementation of the ReAct framework.
It generates:

1. A **reasoning step** (“Thought”)
2. An **action step**, expressed as a JSON tool call

The framework then executes the requested tool and feeds its result back to the LLM as an observation, repeating the cycle until the model calls the `final_answer` tool.

Below is an example using a predefined search tool:

In [None]:
from smolagents import ToolCallingAgent, InferenceClientModel, DuckDuckGoSearchTool

model = InferenceClientModel(
    model_id="meta-llama/Llama-3.3-70B-Instruct",
    tool_choice="auto"
)

agent = ToolCallingAgent(
    tools=[DuckDuckGoSearchTool()],
    model=model
)

agent.run("Who is the CEO of Hugging Face?")
agent.run("What is the capital of France?")

In this example, the agent only has access to a single tool for web search,
so every action requiring external information will rely on that tool.

### 7.3 Creating a Custom Tool

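We can create new tools using the `@tool` decorator, which builds the tool's name, description, and input schema from the function definition.
It is important that:

* The **function name** is descriptive
* The **parameters** have type hints and clearly represent the intended inputs
* The **docstring** explains exactly what the tool does and describes every parameter in an `Args:` section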

The LLM relies heavily on this descriptive information to decide when the tool is relevant.

Example:

In [None]:
from smolagents import ToolCallingAgent, InferenceClientModel, tool, DuckDuckGoSearchTool

@tool
def turn_on_lights(room: str) -> str:
    """
    Turns on the lights in the specified room.

    Args:
        room: One of "living room", "kitchen", "bedroom", or "bathroom".
    """
    if room not in ["living room", "kitchen", "bedroom", "bathroom"]:
        raise ValueError("Invalid room name.")
    return f"The lights in the {room} have been turned on."

Using it with a ToolCallingAgent:

In [None]:
model = InferenceClientModel(
    model_id="meta-llama/Llama-3.3-70B-Instruct",
    tool_choice="auto"
)

agent = ToolCallingAgent(
    tools=[turn_on_lights, DuckDuckGoSearchTool()],
    model=model
)

agent.run("Based on the weather in New York, should I turn on the lights in the living room?")
agent.run("What is 458x123?")

#### Limitation of ToolCallingAgent

If the agent needs functionality that has not been implemented as a tool,
then:

* it **cannot perform that operation in any other way** (for example, by running code), and
* it usually **does not tell the user that a suitable tool is missing**.

Instead, it answers using only the language model's internal capabilities.

For example, without a calculator tool, `458×123` is computed directly by the LLM, with no guarantee that the result is correct (the exact value is 56,334).

To support more flexible and expressive actions, we can use the **CodeAgent**.

### 7.4 CodeAgent

Unlike the `ToolCallingAgent`, which produces JSON tool calls,
the **CodeAgent writes Python code** as its action.
This code may:

* Call custom tools
* Perform arbitrary computations
* Use Python’s built-in capabilities

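Each code action is run by a **restricted Python executor** (by default, a local interpreter that only allows imports from a small allow-list, which can be extended with `additional_authorized_imports`; remote sandboxes such as E2B or Docker are also supported), and anything printed (`print(...)`) is treated as an **observation** and fed back into the ReAct loop.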
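
A rough sketch of what running one code action involves: execute the model's code with the tools in scope, capture what it prints as the observation, and stop when it calls `final_answer`. This is a hypothetical stand-in that uses plain `exec`, which provides **no** isolation at all; it does not reproduce the `smolagents` executor, and the names `run_code_action` and `FinalAnswer` are made up for illustration.

```python
import contextlib
import io

def turn_on_lights(room: str) -> str:
    """Copy of the tool defined in this notebook, so this block runs on its own."""
    if room not in ["living room", "kitchen", "bedroom", "bathroom"]:
        raise ValueError("Invalid room name.")
    return f"The lights in the {room} have been turned on."

class FinalAnswer(Exception):
    """Hypothetical signal used to stop the code action once an answer is given."""
    def __init__(self, value: object) -> None:
        super().__init__(value)
        self.value = value

def final_answer(value: object) -> None:
    raise FinalAnswer(value)

def run_code_action(code: str, tools: dict) -> tuple[str, object]:
    """Run one code action; return (captured print output, final answer or None)."""
    namespace = {**tools, "final_answer": final_answer}
    buffer = io.StringIO()
    answer = None
    with contextlib.redirect_stdout(buffer):
        try:
            exec(code, namespace)  # plain exec: NO sandboxing, illustration only
        except FinalAnswer as signal:
            answer = signal.value
    return buffer.getvalue(), answer

code_action = 'result = turn_on_lights("kitchen")\nprint(result)\nfinal_answer(result)'
observation, answer = run_code_action(code_action, {"turn_on_lights": turn_on_lights})
print("Observation:", observation.strip())
print("Final answer:", answer)
```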

#### Example: ToolCallingAgent vs CodeAgent Actions

In [None]:
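# Example ToolCallingAgent action: a structured (JSON-like) tool call
tool_calling_agent_actions = {
  "tool_call": {
    "name": "turn_on_lights",
    "arguments": {
      "room": "kitchen"
    }
  }
}

# Example CodeAgent action: Python source written by the agent.
# `final_answer` only exists inside the agent's executor, so the action is
# shown as a string here instead of being run directly in the notebook.
code_agent_action = """
result = turn_on_lights("kitchen")
print(result)
final_answer(result)
"""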

Because the CodeAgent can write code, it can compute operations (like multiplication)
even without a calculator tool.

#### CodeAgent Example

In [None]:
from smolagents import CodeAgent, InferenceClientModel, DuckDuckGoSearchTool

model = InferenceClientModel(model_id="Qwen/Qwen2.5-Coder-32B-Instruct")

agent = CodeAgent(
    tools=[turn_on_lights, DuckDuckGoSearchTool()],
    model=model,
    additional_authorized_imports=[]
)

agent.run("Turn on the lights in the kitchen.")
agent.run("What is 34x76?")
agent.run("What is the current date and time?")
agent.run("Search for the latest news about space exploration.")

With this setup a single agent is already quite capable,
but as more tools are added its scope and prompt keep growing, which can make it harder for the model to pick the right tool at each step.

This motivates the use of **multi-agent systems**.

### 7.5 Multi-Agent Systems in Smolagents

A multi-agent system consists of **multiple AI agents**,
each with:

* A dedicated role
* A specific set of tools
* A specialized responsibility

This design increases:

* **Modularity**
* **Scalability**
* **Robustness**

Instead of giving one massive agent too many capabilities,
tasks are distributed across specialized sub-agents.

### 7.6 Multi-Agent Architecture

A common pattern is:

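* An **orchestrator agent** (also called a manager agent) that receives the task and decides which sub-agent should handle each part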
* Several **specialized sub-agents**, such as:

  * Web search agent
  * Code execution agent
  * Memory agent
  * User-interaction agent

The orchestrator delegates each subtask to the sub-agent whose role fits it and combines their results into the final answer.
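
The delegation pattern can be sketched without any framework. In this hypothetical example the sub-agents are plain Python callables with canned responses, and the orchestrator's plan is hard-coded, whereas a real orchestrator would have its LLM choose a sub-agent from each one's `name` and `description`. `SubAgent` and `orchestrate` are illustrative names, not library APIs.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class SubAgent:
    name: str
    description: str
    run: Callable[[str], str]  # takes a task, returns a report

# Hypothetical specialized sub-agents with canned behaviour.
SUB_AGENTS = {
    agent.name: agent
    for agent in [
        SubAgent("web_search_agent", "Runs web searches. Provide a query.",
                 lambda task: f"[search results for: {task}]"),
        SubAgent("code_agent", "Executes Python code for computations.",
                 lambda task: f"[computed: {task}]"),
    ]
}

def orchestrate(plan: list[tuple[str, str]]) -> list[str]:
    """Delegate each (agent_name, task) pair of a plan to the matching sub-agent."""
    reports = []
    for agent_name, task in plan:
        agent = SUB_AGENTS.get(agent_name)
        if agent is None:
            reports.append(f"Error: no sub-agent named {agent_name!r}")
            continue
        reports.append(agent.run(task))
    return reports

# Hard-coded plan standing in for the orchestrator LLM's decisions.
for report in orchestrate([("web_search_agent", "CEO of Hugging Face"),
                           ("code_agent", "458 * 123")]):
    print(report)
```

Keeping each sub-agent's tool list small is what gives the modularity described above: the orchestrator only needs to know names and descriptions, not every tool.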

#### Example

In [None]:
from smolagents import CodeAgent, InferenceClientModel, WebSearchTool, UserInputTool

model = InferenceClientModel(model_id="Qwen/Qwen2.5-Coder-32B-Instruct")

web_agent = CodeAgent(
   tools=[WebSearchTool()],
   model=model,
   name="web_search_agent",
   description="Runs web searches for you. Provide a query."
)

user_agent = CodeAgent(
   tools=[UserInputTool()],
   model=model,
   name="user_agent",
   description="Interacts with the user to understand their requests."
)

memory_agent = CodeAgent(
   tools=[],
   model=model,
   name="memory_agent",
   description="Stores and retrieves information as needed."
)

code_agent = CodeAgent(
   tools=[],
   model=model,
   name="code_agent",
   description="Executes Python code to perform computational tasks."
)

manager_agent = CodeAgent(
   tools=[],
   model=model,
   managed_agents=[web_agent, user_agent, memory_agent, code_agent]
)

manager_agent.run("Who is the CEO of Hugging Face?")
manager_agent.run("What is the capital of France?")
manager_agent.run("Remember that my favorite color is blue.")

### 7.7 Problems

1. **AI agents do not truly plan:** They only output a “thought” message before generating an action or code, but this does not necessarily mean they are actually planning or reasoning.

2. **AI agents attempt tasks they cannot perform:** Instead of reporting that a capability is missing, they try to execute actions they are not equipped to handle.

3. **Lack of fallback policies:** When the model generates malformed JSON for a tool call, the step fails with an error, and there is no dedicated fallback or recovery mechanism.

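4. **Potential infinite loops:** As seen in some code executions and tool-calling sequences, the ReAct loop of “thought, action, observation” can keep going without ever producing a final answer; in practice it is stopped only by a step limit (in `smolagents`, the agent's `max_steps` argument).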

5. **Difficulty handling complex tasks:** How can these systems manage complex tasks? Can they genuinely reason?
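
Problems 3 and 4 can be partially mitigated with two simple guards. The sketch below is hypothetical (`parse_action` and `run_steps` are made-up names, and the model outputs are scripted strings in the JSON action format of the population example): a JSON parse failure is turned into an observation the model could react to, and a hard cap on the number of steps bounds the loop. Neither guard makes the agent reason better; they only keep failures from crashing the run or letting it go on forever.

```python
import json
from typing import Optional

def parse_action(raw: str) -> tuple[Optional[dict], Optional[str]]:
    """Parse a JSON action; return (action, None) or (None, error message)."""
    try:
        action = json.loads(raw)
    except json.JSONDecodeError as err:
        return None, f"Invalid JSON ({err.msg}). Reply with one valid JSON object."
    if not isinstance(action, dict) or "tool" not in action:
        return None, 'The action must be a JSON object with a "tool" field.'
    return action, None

def run_steps(model_outputs: list[str], max_steps: int = 5) -> str:
    """Process scripted model outputs with two guards: error feedback and a step cap."""
    observations: list[str] = []  # in a real agent, these go into the next prompt
    for raw in model_outputs[:max_steps]:
        action, error = parse_action(raw)
        if error is not None:
            # Fallback: turn the failure into an observation instead of crashing.
            observations.append(f"Observation: {error}")
            continue
        if action["tool"] == "final_answer":
            return str(action.get("output"))
        observations.append(f"Observation: (result of {action['tool']})")
    # Guard against endless loops: give up after max_steps.
    return "Stopped: step limit reached without a final answer."

outputs = [
    "{'tool': 'calculator', 'expression': '118000 / 254000'}",  # single quotes: invalid JSON
    '{"tool": "final_answer", "output": "0.46"}',
]
print(run_steps(outputs))  # 0.46
```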