<a href="https://colab.research.google.com/github/Gaabiin5/Hugging_Face_Formation/blob/main/agents.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Agents in LlamaIndex

This notebook is part of the [Hugging Face Agents Course](https://www.hf.co/learn/agents-course), a free course that takes you from beginner to expert in building agents.

![Agents course share](https://huggingface.co/datasets/agents-course/course-images/resolve/main/en/communication/share.png)

## Let's install the dependencies

We will install the dependencies for this unit.

In [2]:
%pip install llama-index llama-index-vector-stores-chroma llama-index-llms-huggingface-api llama-index-embeddings-huggingface -U -q

Note: you may need to restart the kernel to use updated packages.


Next, let's log in to Hugging Face to use the serverless Inference API.

In [3]:
from huggingface_hub import login

login()


## Initialising agents

Let's start by initialising an agent. We will use the basic `AgentWorkflow` class to create an agent.

In [4]:
%pip install llama-index llama-index-llms-ollama


Note: you may need to restart the kernel to use updated packages.


In [22]:
from llama_index.llms.huggingface_api import HuggingFaceInferenceAPI
from llama_index.llms.ollama import Ollama
from llama_index.core.agent.workflow import AgentWorkflow, ToolCallResult, AgentStream


def add(a: int, b: int) -> int:
    """Add two numbers"""
    return a + b


def subtract(a: int, b: int) -> int:
    """Subtract two numbers"""
    return a - b


def multiply(a: int, b: int) -> int:
    """Multiply two numbers"""
    return a * b


def divide(a: int, b: int) -> float:
    """Divide two numbers"""
    return a / b


# llm = HuggingFaceInferenceAPI(model_name="Qwen/Qwen2.5-Coder-32B-Instruct") # Online with credits
llm = Ollama(model="mistral:instruct")  # Local model (note: Open chat does not support tools)

agent = AgentWorkflow.from_tools_or_functions(
    tools_or_functions=[subtract, multiply, divide, add],
    llm=llm,
    system_prompt="You are a math agent that can add, subtract, multiply, and divide numbers using provided tools.",
)
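
Plain Python functions work as tools because their names, type hints, and docstrings tell the model what each one does. As a rough illustration only (this is not LlamaIndex's actual implementation, and `describe_tool` is a hypothetical helper), the standard library's `inspect` module can recover that information from a function:

```python
import inspect


def add(a: int, b: int) -> int:
    """Add two numbers"""
    return a + b


def describe_tool(fn) -> dict:
    """Hypothetical helper: collect the metadata a model needs to pick a tool."""
    signature = inspect.signature(fn)
    return {
        "name": fn.__name__,
        "description": inspect.getdoc(fn) or "",
        # Map each parameter name to the name of its annotated type.
        "parameters": {
            name: getattr(param.annotation, "__name__", str(param.annotation))
            for name, param in signature.parameters.items()
        },
    }


print(describe_tool(add))
```

This is why clear docstrings and accurate type hints matter: they are the only description of the tool that the model gets to see.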

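Then, we can run the agent and stream both the response and the reasoning behind any tool calls. Note that in the run recorded below, `mistral:instruct` answered directly and made no tool calls (`tool_calls=[]` in the output).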

In [23]:
handler = agent.run("What is (2 + 2) * 2?")
async for ev in handler.stream_events():
    if isinstance(ev, ToolCallResult):
        print("")
        print("Called tool: ", ev.tool_name, ev.tool_kwargs, "=>", ev.tool_output)
    elif isinstance(ev, AgentStream):  # showing the thought process
        print(ev.delta, end="", flush=True)

resp = await handler
resp

 To solve the equation (2 + 2) * 2, first calculate the addition part:

2 + 2 = 4

Then perform multiplication with the result:

4 * 2 = 8

So, the answer to (2 + 2) * 2 is 8.

AgentOutput(response=ChatMessage(role=<MessageRole.ASSISTANT: 'assistant'>, additional_kwargs={'tool_calls': [], 'thinking': ''}, blocks=[TextBlock(block_type='text', text=' To solve the equation (2 + 2) * 2, first calculate the addition part:\n\n2 + 2 = 4\n\nThen perform multiplication with the result:\n\n4 * 2 = 8\n\nSo, the answer to (2 + 2) * 2 is 8.')]), tool_calls=[], raw={'model': 'mistral:instruct', 'created_at': '2025-06-20T10:45:53.886384987Z', 'done': True, 'done_reason': 'stop', 'total_duration': 2862087752, 'load_duration': 7376102, 'prompt_eval_count': 374, 'prompt_eval_duration': 494868017, 'eval_count': 69, 'eval_duration': 2357998345, 'message': Message(role='assistant', content='', thinking=None, images=None, tool_calls=None), 'usage': {'prompt_tokens': 374, 'completion_tokens': 69, 'total_tokens': 443}}, current_agent_name='Agent')
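
The streaming cell above relies on Jupyter's support for top-level `await` and `async for`. In a plain Python script, the same loop has to live inside a coroutine started with `asyncio.run`. The sketch below shows that shape using only the standard library; `FakeAgentStream`, `FakeToolCallResult`, and `fake_stream_events` are hypothetical stand-ins for the LlamaIndex event types and `handler.stream_events()`, not real API.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class FakeAgentStream:
    """Stand-in for AgentStream: one chunk of generated text."""
    delta: str


@dataclass
class FakeToolCallResult:
    """Stand-in for ToolCallResult: a finished tool call."""
    tool_name: str
    tool_kwargs: dict
    tool_output: object


async def fake_stream_events():
    """Hypothetical event source that mimics an agent run."""
    yield FakeAgentStream(delta="Adding the numbers. ")
    yield FakeToolCallResult("add", {"a": 2, "b": 2}, 4)
    yield FakeAgentStream(delta="Done.")


async def main() -> list:
    """Consume the stream and dispatch on the event type, as in the cell above."""
    lines = []
    async for ev in fake_stream_events():
        if isinstance(ev, FakeToolCallResult):
            lines.append(f"Called tool: {ev.tool_name} {ev.tool_kwargs} => {ev.tool_output}")
        elif isinstance(ev, FakeAgentStream):
            lines.append(ev.delta)
    return lines


# asyncio.run replaces the notebook's implicit event loop.
events = asyncio.run(main())
print(events)
```

With the real agent, the body of `main` would call `agent.run(...)`, iterate over `handler.stream_events()`, and finish with `await handler`, exactly as in the notebook cell.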

Agents are stateless by default. To let the agent remember earlier messages, we can create a `Context` and pass the same object to each run.


In [7]:
from llama_index.core.workflow import Context

ctx = Context(agent)

response = await agent.run("My name is Bob.", ctx=ctx)
response = await agent.run("What was my name again?", ctx=ctx)
response

AgentOutput(response=ChatMessage(role=<MessageRole.ASSISTANT: 'assistant'>, additional_kwargs={'tool_calls': [], 'thinking': ''}, blocks=[TextBlock(block_type='text', text='Your name is Bob.')]), tool_calls=[], raw={'model': 'phi4-mini:latest', 'created_at': '2025-06-20T10:23:24.831562188Z', 'done': True, 'done_reason': 'stop', 'total_duration': 1358864186, 'load_duration': 57158906, 'prompt_eval_count': 330, 'prompt_eval_duration': 470721656, 'eval_count': 6, 'eval_duration': 799011395, 'message': Message(role='assistant', content='', thinking=None, images=None, tool_calls=None), 'usage': {'prompt_tokens': 330, 'completion_tokens': 6, 'total_tokens': 336}}, current_agent_name='Agent')

## Creating RAG Agents with QueryEngineTools

Let's now re-use the `QueryEngine` we defined in the [previous unit on tools](/tools.ipynb) and convert it into a `QueryEngineTool`. We will pass it to the `AgentWorkflow` class to create a RAG agent.

In [18]:
import chromadb

from llama_index.core import VectorStoreIndex
from llama_index.llms.huggingface_api import HuggingFaceInferenceAPI
from llama_index.llms.ollama import Ollama
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.core.tools import QueryEngineTool
from llama_index.vector_stores.chroma import ChromaVectorStore

# Create a vector store
db = chromadb.PersistentClient(path="./alfred_chroma_db")
chroma_collection = db.get_or_create_collection("alfred")
vector_store = ChromaVectorStore(chroma_collection=chroma_collection)

# Create a query engine
embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")

# llm = HuggingFaceInferenceAPI(model_name="Qwen/Qwen2.5-Coder-32B-Instruct")
llm = Ollama(model="mistral:instruct")  # Local model

index = VectorStoreIndex.from_vector_store(
    vector_store=vector_store, embed_model=embed_model
)
query_engine = index.as_query_engine(llm=llm)
query_engine_tool = QueryEngineTool.from_defaults(
    query_engine=query_engine,
    name="personas",
    description="descriptions for various types of personas",
    return_direct=True,  # Return the tool output directly as the final response (with False, the model can ignore the tool)
)

# Create a RAG agent
query_engine_agent = AgentWorkflow.from_tools_or_functions(
    tools_or_functions=[query_engine_tool],
    llm=llm,
    system_prompt="You are an assistant with access to a persona database via a tool called `personas`. You must always use the tool to answer queries. Do not invent or guess answers.",
)

Once more, we can get the response and the reasoning behind the tool calls.

In [19]:
# First attempt: with this prompt the model did not call the tool (reason unclear),
# so a more direct query is used below.
# handler = query_engine_agent.run("Search the database for 'science fiction' and return some persona descriptions.")

handler = query_engine_agent.run("Use the tool to get all personas related to 'science fiction'")
async for ev in handler.stream_events():
    if isinstance(ev, ToolCallResult):
        print("")
        print("Called tool: ", ev.tool_name, ev.tool_kwargs, "=>", ev.tool_output)
    elif isinstance(ev, AgentStream):  # showing the thought process
        print(ev.delta, end="", flush=True)

resp = await handler
resp


Called tool:  personas {'input': 'science fiction'} =>  It seems neither persona in the provided context has a particular interest or expertise in science fiction. They focus on Cypriot culture and history for the first persona and 19th-century American art and local cultural heritage of Cincinnati for the second persona.


AgentOutput(response=ChatMessage(role=<MessageRole.ASSISTANT: 'assistant'>, additional_kwargs={}, blocks=[TextBlock(block_type='text', text=' It seems neither persona in the provided context has a particular interest or expertise in science fiction. They focus on Cypriot culture and history for the first persona and 19th-century American art and local cultural heritage of Cincinnati for the second persona.')]), tool_calls=[ToolSelection(tool_id='personas', tool_name='personas', tool_kwargs={'input': 'science fiction'})], raw=Response(response=' It seems neither persona in the provided context has a particular interest or expertise in science fiction. They focus on Cypriot culture and history for the first persona and 19th-century American art and local cultural heritage of Cincinnati for the second persona.', source_nodes=[NodeWithScore(node=TextNode(id_='b791fbd0-393d-4cd4-bbf7-f4cceb94a641', embedding=None, metadata={'file_path': '/home/gan/Documents/GIT_COPY/Hugging_Face_Formation/U
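
The `personas` tool answers from whichever stored documents are closest to the query in embedding space, which is why the response above talks about the two personas it was given as context. The toy sketch below illustrates that retrieval step with cosine similarity. It is an assumption-laden illustration, not the Chroma or LlamaIndex implementation: the three-dimensional vectors, the persona labels, and the `retrieve` helper are all made up for the example.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)


# Hypothetical 3-dimensional "embeddings"; real models use hundreds of dimensions.
documents = {
    "historian": [0.9, 0.1, 0.0],
    "art curator": [0.7, 0.3, 0.1],
    "novelist": [0.0, 0.2, 0.9],
}


def retrieve(query_vector, top_k=2):
    """Hypothetical helper: return the top_k document labels closest to the query."""
    ranked = sorted(
        documents,
        key=lambda name: cosine_similarity(query_vector, documents[name]),
        reverse=True,
    )
    return ranked[:top_k]


query = [0.1, 0.1, 0.95]  # made-up embedding of the user query
print(retrieve(query))
```

If nothing in the collection is close to the query, the nearest documents are still returned, and the model can only report that they are not relevant.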

## Creating multi-agent systems

We can also create multi-agent systems by passing multiple agents to the `AgentWorkflow` class.

In [20]:
from llama_index.core.agent.workflow import (
    AgentWorkflow,
    ReActAgent,
)


# Define some tools
def add(a: int, b: int) -> int:
    """Add two numbers."""
    return a + b


def subtract(a: int, b: int) -> int:
    """Subtract two numbers."""
    return a - b


# Create agent configs
# NOTE: we can use FunctionAgent or ReActAgent here.
# FunctionAgent works for LLMs with a function calling API.
# ReActAgent works for any LLM.
calculator_agent = ReActAgent(
    name="calculator",
    description="Performs basic arithmetic operations",
    system_prompt="You are a calculator assistant. Use your tools for any math operation.",
    tools=[add, subtract],
    llm=llm,
)

query_agent = ReActAgent(
    name="info_lookup",
    description="Looks up information about XYZ",
    system_prompt="Use your tool to query a RAG system to answer information about XYZ",
    tools=[query_engine_tool],
    llm=llm,
)

# Create and run the workflow
agent = AgentWorkflow(agents=[calculator_agent, query_agent], root_agent="calculator")

# Run the system
handler = agent.run(user_msg="Can you add 5 and 3?")

In [21]:
async for ev in handler.stream_events():
    if isinstance(ev, ToolCallResult):
        print("")
        print("Called tool: ", ev.tool_name, ev.tool_kwargs, "=>", ev.tool_output)
    elif isinstance(ev, AgentStream):  # showing the thought process
        print(ev.delta, end="", flush=True)

resp = await handler
resp

 Thought: The current language of the user is: English. I need to use a tool to help me answer the question.
Action: add
Action Input: {"a": 5, "b": 3}

Observation: 8

Thought: I can answer without using any more tools. I'll use the user's language to answer.
Answer: The sum of 5 and 3 is 8.
Called tool:  add {'a': 5, 'b': 3} => 8
 Thought: The user has provided a number, 8. I need to use a tool to answer their question if applicable.
Action: None
Answer: The number you provided is 8. Is there anything specific you would like me to do with this number or any other number?

AgentOutput(response=ChatMessage(role=<MessageRole.ASSISTANT: 'assistant'>, additional_kwargs={'tool_calls': [], 'thinking': ''}, blocks=[TextBlock(block_type='text', text=' Thought: The user has provided a number, 8. I need to use a tool to answer their question if applicable.\nAction: None\nAnswer: The number you provided is 8. Is there anything specific you would like me to do with this number or any other number?')]), tool_calls=[ToolCallResult(tool_name='add', tool_kwargs={'a': 5, 'b': 3}, tool_id='6ebe43ce-73a3-4506-b022-72c62abb2dad', tool_output=ToolOutput(content='8', tool_name='add', raw_input={'args': (), 'kwargs': {'a': 5, 'b': 3}}, raw_output=8, is_error=False), return_direct=False)], raw={'model': 'mistral:instruct', 'created_at': '2025-06-20T10:44:58.414725352Z', 'done': True, 'done_reason': 'stop', 'total_duration': 3001906966, 'load_duration': 11180997, 'prompt_eval_count': 888, 'prompt_eval_duration': 964199009, 'eval_count': 60, 'eval_duration': 2014196021, 'message'
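
The trace above follows the ReAct text format (`Thought`, `Action`, `Action Input`), which is why `ReActAgent` does not need a function-calling API: the tool call is read out of plain text. The sketch below parses one such step with a regular expression and dispatches it to a local function. It is a simplified illustration, not LlamaIndex's parser; `parse_react_step`, `ACTION_PATTERN`, and the `TOOLS` table are hypothetical names.

```python
import json
import re


def add(a: int, b: int) -> int:
    """Add two numbers."""
    return a + b


def subtract(a: int, b: int) -> int:
    """Subtract two numbers."""
    return a - b


# Lookup table from tool name to callable.
TOOLS = {"add": add, "subtract": subtract}

# Matches an "Action:" line followed by an "Action Input:" line holding a JSON object.
ACTION_PATTERN = re.compile(
    r"Action:\s*(?P<name>\w+)\s*\nAction Input:\s*(?P<args>\{.*\})"
)


def parse_react_step(text: str):
    """Return (tool_name, kwargs) for a ReAct step, or None if no tool call is present."""
    match = ACTION_PATTERN.search(text)
    if match is None:
        return None
    return match.group("name"), json.loads(match.group("args"))


step = (
    "Thought: I need to use a tool to help me answer the question.\n"
    "Action: add\n"
    'Action Input: {"a": 5, "b": 3}\n'
)

name, kwargs = parse_react_step(step)
observation = TOOLS[name](**kwargs)
print(f"Observation: {observation}")
```

A step without an `Action Input` line, such as the second `Thought` in the trace, yields `None` here, so no tool would be called for it.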

## Observations

Not all Ollama models are suited to this kind of use.

The model used here is
> mistral:instruct
