# Agents in LlamaIndex

## Let's install the dependencies

We will install the dependencies for this unit.

In [1]:
!pip install llama-index llama-index-vector-stores-chroma llama-index-llms-huggingface-api llama-index-embeddings-huggingface -U -q


Next, let's log in to Hugging Face so we can use the serverless Inference API.

In [2]:
from huggingface_hub import login

login()


## Initialising agents

Let's start by initialising an agent. We will use the basic `AgentWorkflow` class to create an agent.

In [5]:
from llama_index.llms.huggingface_api import HuggingFaceInferenceAPI
from llama_index.core.agent.workflow import AgentWorkflow, ToolCallResult, AgentStream


def add(a: int, b: int) -> int:
    """Add two numbers"""
    return a + b


def subtract(a: int, b: int) -> int:
    """Subtract two numbers"""
    return a - b


def multiply(a: int, b: int) -> int:
    """Multiply two numbers"""
    return a * b


def divide(a: int, b: int) -> float:
    """Divide two numbers"""
    return a / b


# llm = HuggingFaceInferenceAPI(model_name="Qwen/Qwen2.5-Coder-32B-Instruct")
llm = HuggingFaceInferenceAPI(model_name="HuggingFaceTB/SmolLM3-3B")


agent = AgentWorkflow.from_tools_or_functions(
    tools_or_functions=[subtract, multiply, divide, add],
    llm=llm,
    system_prompt="You are a math agent that can add, subtract, multiply, and divide numbers using provided tools.",
)
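The tools above are plain Python functions; their names, type hints, and docstrings are what the LLM gets to see when deciding which tool to call. The sketch below illustrates that idea with the standard library only. `describe_tool` is a hypothetical helper written for this illustration, not a LlamaIndex function, and the real schema that LlamaIndex builds may look different.

```python
import inspect
from typing import Callable


def describe_tool(fn: Callable) -> dict:
    """Collect the name, docstring, and parameter types of a plain function."""
    signature = inspect.signature(fn)
    params = {
        # Use the type's short name when available (e.g. "int").
        name: getattr(p.annotation, "__name__", str(p.annotation))
        for name, p in signature.parameters.items()
    }
    return {
        "name": fn.__name__,
        "description": inspect.getdoc(fn) or "",
        "parameters": params,
    }


def add(a: int, b: int) -> int:
    """Add two numbers"""
    return a + b


print(describe_tool(add))
```

A clear docstring and precise type hints therefore matter: they are the only description of the tool that the model receives.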

Then, we can run the agent and get the response and reasoning behind the tool calls.

In [12]:
handler = agent.run("What is (2 + 2) * 2?")
async for ev in handler.stream_events():
    if isinstance(ev, ToolCallResult):
        print("")
        print("Called tool: ", ev.tool_name, ev.tool_kwargs, "=>", ev.tool_output)
    elif isinstance(ev, AgentStream):  # showing the thought process
        print(ev.delta, end="", flush=True)

resp = await handler
resp

<think>
Okay, let's see. The user is asking for the result of (2 + 2) multiplied by 2. First, I need to break this down step by step. According to the order of operations, I should handle the addition first before the multiplication. So, 2 plus 2 equals 4. Then, take that result and multiply it by 2. 4 times 2 is 8. I don't need any tools for this since it's a straightforward arithmetic calculation. I can just compute it manually. The answer should be 8. Let me make sure I didn't miss any steps. Yeah, addition first, then multiplication. Yep, that's right. So I can confidently say the result is 8 without using any tools.
</think>

Thought: I can answer without using any more tools. I'll use the user's language to answer  
Answer: 8

ERROR:asyncio:Task exception was never retrieved
future: <Task finished name='Task-67' coro=<<async_generator_athrow without __name__>()> exception=KeyError(<aiohttp.client.ClientSession object at 0x7dc442d4ab90>)>
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/dist-packages/huggingface_hub/inference/_common.py", line 379, in _async_yield_from
    yield byte_payload.strip()
GeneratorExit

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.11/asyncio/tasks.py", line 277, in __step
    result = coro.send(None)
             ^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/huggingface_hub/inference/_common.py", line 382, in _async_yield_from
    await client.close()
  File "/usr/local/lib/python3.11/dist-packages/huggingface_hub/inference/_generated/_async_client.py", line 3389, in close_session
    for response in self._sessions[session]:
                    ~~~~~~~~~~~~~~^^^

AgentOutput(response=ChatMessage(role=<MessageRole.ASSISTANT: 'assistant'>, additional_kwargs={}, blocks=[TextBlock(block_type='text', text='8')]), structured_response=None, current_agent_name='Agent', raw=ChatCompletionStreamOutput(choices=[ChatCompletionStreamOutputChoice(delta=ChatCompletionStreamOutputDelta(role=None, content='8', tool_call_id=None, tool_calls=None), index=0, finish_reason=None, logprobs=None)], created=1753958690, id='chatcmpl-Root=1-688b4922-4f4a65a849788b770a930c6e', model='HuggingFaceTB/SmolLM3-3B', system_fingerprint=None, usage=None, object='chat.completion.chunk'), tool_calls=[], retry_messages=[])

Agents do not remember earlier runs on their own. To carry conversation state across calls, we can create a `Context` and pass it to each `run`.

In [15]:
from llama_index.core.workflow import Context

ctx = Context(agent)

response = await agent.run("My name is Bob.", ctx=ctx)
response = await agent.run("What was my name again?", ctx=ctx)
response

ERROR:asyncio:Task exception was never retrieved
future: <Task finished name='Task-281' coro=<<async_generator_athrow without __name__>()> exception=KeyError(<aiohttp.client.ClientSession object at 0x7dc442880e90>)>
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/dist-packages/huggingface_hub/inference/_common.py", line 379, in _async_yield_from
    yield byte_payload.strip()
GeneratorExit

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.11/asyncio/tasks.py", line 277, in __step
    result = coro.send(None)
             ^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/huggingface_hub/inference/_common.py", line 382, in _async_yield_from
    await client.close()
  File "/usr/local/lib/python3.11/dist-packages/huggingface_hub/inference/_generated/_async_client.py", line 3389, in close_session
    for response in self._sessions[session]:
                    ~~~~~~~~~~~~~~^^

WorkflowRuntimeError: Error in step 'parse_agent_output': Max iterations of 20 reached! Either something went wrong, or you can increase the max iterations with `.run(.., max_iterations=...)`
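Because the run above stopped at the iteration limit, the effect of sharing a context is not visible in the output. The toy sketch below shows the intended idea with the standard library only: a context object that is reused across calls keeps earlier messages, while a call without it starts from scratch. `ToyContext` and `toy_run` are hypothetical names for this illustration and do not mirror how LlamaIndex's `Context` is implemented.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ToyContext:
    """Hypothetical stand-in for a shared context: it only stores messages."""

    history: List[str] = field(default_factory=list)


def toy_run(message: str, ctx: Optional[ToyContext] = None) -> List[str]:
    """Return everything the toy 'agent' can see while handling `message`."""
    ctx = ctx if ctx is not None else ToyContext()  # no ctx: start from scratch
    ctx.history.append(message)
    return list(ctx.history)


shared = ToyContext()
toy_run("My name is Bob.", ctx=shared)
visible = toy_run("What was my name again?", ctx=shared)
print(visible)  # both messages are visible through the shared context
print(toy_run("What was my name again?"))  # only the latest message
```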

## Creating RAG Agents with QueryEngineTools

Let's now re-use the `QueryEngine` we defined in the [previous unit on tools](/tools.ipynb) and convert it into a `QueryEngineTool`. We will pass it to the `AgentWorkflow` class to create a RAG agent.

In [16]:
from datasets import load_dataset
from pathlib import Path
from llama_index.core import SimpleDirectoryReader

dataset = load_dataset(path="dvilasuero/finepersonas-v0.1-tiny", split="train")

Path("data").mkdir(parents=True, exist_ok=True)

for i, persona in enumerate(dataset):
    with open(Path("data") / f"persona_{i}.txt", "w") as f:
        f.write(persona["persona"])


reader = SimpleDirectoryReader(input_dir="data")
documents = reader.load_data()

In [17]:
import chromadb

from llama_index.core import StorageContext, VectorStoreIndex
from llama_index.llms.huggingface_api import HuggingFaceInferenceAPI
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.core.tools import QueryEngineTool
from llama_index.vector_stores.chroma import ChromaVectorStore

# Create a vector store
db = chromadb.PersistentClient(path="./alfred_chroma_db")
chroma_collection = db.get_or_create_collection("alfred")
vector_store = ChromaVectorStore(chroma_collection=chroma_collection)

# Create a query engine
embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")
# llm = HuggingFaceInferenceAPI(model_name="Qwen/Qwen2.5-Coder-32B-Instruct")
llm = HuggingFaceInferenceAPI(model_name="HuggingFaceTB/SmolLM3-3B")

# To reuse an already populated collection instead of re-embedding:
# index = VectorStoreIndex.from_vector_store(
#     vector_store=vector_store, embed_model=embed_model
# )
# `from_documents` receives the vector store through a StorageContext;
# a bare `vector_store=` keyword is not one of its documented parameters.
storage_context = StorageContext.from_defaults(vector_store=vector_store)
index = VectorStoreIndex.from_documents(
    documents,
    storage_context=storage_context,
    embed_model=embed_model,
)

query_engine = index.as_query_engine(llm=llm)
query_engine_tool = QueryEngineTool.from_defaults(
    query_engine=query_engine,
    name="personas",
    description="descriptions for various types of personas",
    return_direct=False,
)

# Create a RAG agent
query_engine_agent = AgentWorkflow.from_tools_or_functions(
    tools_or_functions=[query_engine_tool],
    llm=llm,
    system_prompt="You are a helpful assistant that has access to a database containing persona descriptions.",
)

Once more, we can stream the response and the reasoning behind the tool calls.

In [18]:
handler = query_engine_agent.run(
    "Search the database for 'science fiction' and return some persona descriptions."
)
async for ev in handler.stream_events():
    if isinstance(ev, ToolCallResult):
        print("")
        print("Called tool: ", ev.tool_name, ev.tool_kwargs, "=>", ev.tool_output)
    elif isinstance(ev, AgentStream):  # showing the thought process
        print(ev.delta, end="", flush=True)

resp = await handler
resp

<think>
Okay, the user wants me to search a database for 'science fiction' and return some persona descriptions. Let me start by understanding what they need. They mentioned using a tool called personas, which I assume is a tool that provides descriptions for different types of personas. 

First, I need to check if the tool is available. The tool name is personas, and the description says it provides descriptions for various types of personas. The required input is a title, which in this case is 'science fiction'. So I should use the personas tool with the input 'science fiction'.

Next, I need to make sure I'm using the correct tool and input. The user might be referring to a specific database, but since they didn't specify, I'll proceed with the assumption that the personas tool is designed to handle such queries. 

I should also consider if there are any additional parameters needed. The tool's description mentions a properties object with input and type, so the input should be a st

ERROR:asyncio:Task exception was never retrieved
future: <Task finished name='Task-462' coro=<<async_generator_athrow without __name__>()> exception=KeyError(<aiohttp.client.ClientSession object at 0x7dc2b7dcbd10>)>
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/dist-packages/huggingface_hub/inference/_common.py", line 379, in _async_yield_from
    yield byte_payload.strip()
GeneratorExit

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.11/asyncio/tasks.py", line 277, in __step
    result = coro.send(None)
             ^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/huggingface_hub/inference/_common.py", line 382, in _async_yield_from
    await client.close()
  File "/usr/local/lib/python3.11/dist-packages/huggingface_hub/inference/_generated/_async_client.py", line 3389, in close_session
    for response in self._sessions[session]:
                    ~~~~~~~~~~~~~~^^

AgentOutput(response=ChatMessage(role=<MessageRole.ASSISTANT: 'assistant'>, additional_kwargs={}, blocks=[TextBlock(block_type='text', text='<think>\nOkay, the user wants me to search a database for \'science fiction\' and return some persona descriptions. Let me start by understanding what they need. They mentioned using a tool called personas, which I assume is a tool that provides descriptions for different types of personas. \n\nFirst, I need to check if the tool is available. The tool name is personas, and the description says it provides descriptions for various types of personas. The required input is a title, which in this case is \'science fiction\'. So I should use the personas tool with the input \'science fiction\'.\n\nNext, I need to make sure I\'m using the correct tool and input. The user might be referring to a specific database, but since they didn\'t specify, I\'ll proceed with the assumption that the personas tool is designed to handle such queries. \n\nI should also
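In the output above, the response text itself contains the model's `<think>` block. A minimal sketch for separating that reasoning from the final answer follows, assuming the model wraps its reasoning in `<think>` tags as SmolLM3 does here. `split_reasoning` is a hypothetical helper, not part of LlamaIndex, and it also tolerates a block that was cut off before the closing tag.

```python
import re
from typing import Tuple

# Match a <think> block up to its closing tag, or to the end of the text
# when the closing tag is missing (for example, a truncated generation).
THINK_BLOCK = re.compile(r"<think>(.*?)(?:</think>|\Z)", re.DOTALL)


def split_reasoning(text: str) -> Tuple[str, str]:
    """Return (reasoning, answer) for text that may contain a <think> block."""
    reasoning = "\n".join(m.strip() for m in THINK_BLOCK.findall(text))
    answer = THINK_BLOCK.sub("", text).strip()
    return reasoning, answer


reasoning, answer = split_reasoning("<think>\n4 times 2 is 8.\n</think>\n\nAnswer: 8")
print(answer)
```

With the real agent, the input would be the text of the response message, for example `str(resp)`.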

## Creating multi-agent systems

We can also create multi-agent systems by passing multiple agents to the `AgentWorkflow` class.

In [19]:
from llama_index.core.agent.workflow import (
    AgentWorkflow,
    ReActAgent,
)


# Define some tools
def add(a: int, b: int) -> int:
    """Add two numbers."""
    return a + b


def subtract(a: int, b: int) -> int:
    """Subtract two numbers."""
    return a - b


# Create agent configs
# NOTE: we can use FunctionAgent or ReActAgent here.
# FunctionAgent works for LLMs with a function calling API.
# ReActAgent works for any LLM.
calculator_agent = ReActAgent(
    name="calculator",
    description="Performs basic arithmetic operations",
    system_prompt="You are a calculator assistant. Use your tools for any math operation.",
    tools=[add, subtract],
    llm=llm,
)

query_agent = ReActAgent(
    name="info_lookup",
    description="Looks up information about XYZ",
    system_prompt="Use your tool to query a RAG system to answer information about XYZ",
    tools=[query_engine_tool],
    llm=llm,
)

# Create and run the workflow
agent = AgentWorkflow(agents=[calculator_agent, query_agent], root_agent="calculator")

# Run the system
handler = agent.run(user_msg="Can you add 5 and 3?")

In [20]:
async for ev in handler.stream_events():
    if isinstance(ev, ToolCallResult):
        print("")
        print("Called tool: ", ev.tool_name, ev.tool_kwargs, "=>", ev.tool_output)
    elif isinstance(ev, AgentStream):  # showing the thought process
        print(ev.delta, end="", flush=True)

resp = await handler
resp

<think>
Okay, let's see. The user is asking to add 5 and 3. I need to figure out which tool to use here. The available tools are add, subtract, and handoff. Since the question is about adding two numbers, the add tool should be the right choice.

First, I should check if the user's language is specified. The question is in English, so I don't need to handle a different language. Next, I need to call the add tool with the arguments a=5 and b=3. The action input should be a JSON object with those values. 

Wait, the tool's description says the add function takes two integers. The user provided 5 and 3, which are both integers, so that's good. I don't need to use handoff here because the question is straightforward and doesn't require looking up information or passing to another agent. 

So the steps are: use the add tool, pass 5 and 3 as arguments, and then based on the tool's response, I can provide the answer. Since the add tool is simple, the response should be 8. I can then present t

AgentOutput(response=ChatMessage(role=<MessageRole.ASSISTANT: 'assistant'>, additional_kwargs={}, blocks=[TextBlock(block_type='text', text="<think>\nOkay, let's see. The user is asking to add 5 and 3. I need to figure out which tool to use here. The available tools are add, subtract, and handoff. Since the question is about adding two numbers, the add tool should be the right choice.\n\nFirst, I should check if the user's language is specified. The question is in English, so I don't need to handle a different language. Next, I need to call the add tool with the arguments a=5 and b=3. The action input should be a JSON object with those values. \n\nWait, the tool's description says the add function takes two integers. The user provided 5 and 3, which are both integers, so that's good. I don't need to use handoff here because the question is straightforward and doesn't require looking up information or passing to another agent. \n\nSo the steps are: use the add tool, pass 5 and 3 as argu
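
The reasoning above lists a `handoff` tool next to `add` and `subtract`: in a multi-agent workflow, the active agent can pass control to another agent. The toy sketch below illustrates that routing idea with the standard library only. Every name in it is hypothetical, and the keyword check stands in for the decision that the LLM makes in the real `AgentWorkflow`.

```python
from typing import Callable, Dict, Tuple

# Each toy agent returns ("answer", text) or ("handoff", name_of_other_agent).
Agent = Callable[[str], Tuple[str, str]]


def calculator(msg: str) -> Tuple[str, str]:
    """Answer simple additions; hand everything else to another agent."""
    numbers = [int(tok) for tok in msg.replace("?", " ").split() if tok.isdigit()]
    if "add" in msg.lower() and len(numbers) == 2:
        return "answer", str(numbers[0] + numbers[1])
    return "handoff", "info_lookup"  # not arithmetic: pass control on


def info_lookup(msg: str) -> Tuple[str, str]:
    """Placeholder for an agent that would query the RAG tool."""
    return "answer", f"(would query the RAG tool for: {msg})"


def run_workflow(
    agents: Dict[str, Agent], root_agent: str, msg: str, max_handoffs: int = 5
) -> str:
    """Start at the root agent and follow handoffs until an agent answers."""
    current = root_agent
    for _ in range(max_handoffs):
        kind, value = agents[current](msg)
        if kind == "answer":
            return value
        current = value  # follow the handoff
    raise RuntimeError("too many handoffs")


agents = {"calculator": calculator, "info_lookup": info_lookup}
print(run_workflow(agents, "calculator", "Can you add 5 and 3?"))
```

The `root_agent` plays the same role as in the workflow above: it receives the user message first and only hands off when it cannot answer itself.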