# Introduction

In this notebook we will explore the concept of agents in the LlamaIndex framework. Creating an agent-based pipeline means integrating our RAG-based application with data sources and with various tools. Keep in mind that developing these tools requires an understanding of how users are likely to engage with the application, so that potential usage patterns can be anticipated.

The LlamaIndex framework offers numerous possibilities for combining agents and tools to enhance the abilities of LLMs. In this lesson we will demonstrate how these agents make decisions and integrate various resources to formulate a response.

Before we start coding, we have to prepare the environment and settings.

In [1]:
from llama_index.core import Settings
from langchain_ollama import OllamaEmbeddings, OllamaLLM

Settings.embed_model = OllamaEmbeddings(model="llama3.1:8b")
Settings.llm = OllamaLLM(model="llama3.1:8b")


# Creation of a RAG agent with custom tools

It is good practice to tag and track data sources from the start. This means giving each document a tag and recording its source or origin. Doing so improves the chatbot's efficiency. To do this, we will introduce `routers`, which direct a question to the information source most relevant to it.

A key step in building a data-driven application with the LlamaIndex RAG system is selecting the appropriate dataset. The quality and relevance of the data are fundamental. It is good practice to start your RAG pipeline design with a small dataset, such as web articles. This way, you can quickly test, debug and understand your RAG system.

In this notebook we will use a dataset about Nikola Tesla's life, work and legacy. We employ two text documents: the first contains bold predictions about the future that Tesla made during his lifetime, and the second contains biographical details about his life. We will store both documents in local files.

The following cell creates a folder and downloads both files into it. The commands use Windows path separators; on Linux or macOS, replace `data\1k` with `data/1k` and use `mkdir -p`.

In [65]:
!mkdir data\1k
!curl -o data\1k\tesla.txt https://raw.githubusercontent.com/idontcalculate/data-repo/main/machine_to_end_war.txt
!curl -o data\1k\web.txt https://raw.githubusercontent.com/idontcalculate/data-repo/main/prodigal_chapter10.txt

A subdirectory or file data\1k already exists.
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed

  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
100 13902  100 13902    0     0  21352      0 --:--:-- --:--:-- --:--:-- 21420
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed

  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
100 23243  100 23243    0     0  41686      0 --:--:-- --:--:-- --:--:-- 41803
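The shell commands above depend on the operating system. As a portable alternative, here is a minimal sketch that performs the same download with the Python standard library. The function name `download_dataset` and its `fetch` parameter are illustrative choices, not part of the original notebook; the URLs and file names are the ones used in the `curl` commands.

```python
from pathlib import Path
from urllib.request import urlretrieve

# Same source URLs as the curl commands above, keyed by local file name.
BASE_URL = "https://raw.githubusercontent.com/idontcalculate/data-repo/main/"
FILES = {
    "tesla.txt": BASE_URL + "machine_to_end_war.txt",
    "web.txt": BASE_URL + "prodigal_chapter10.txt",
}


def download_dataset(target_dir="data/1k", fetch=urlretrieve):
    """Download every file in FILES into target_dir and return the local paths.

    `fetch` is any callable taking (url, destination). It defaults to
    urllib's urlretrieve and can be swapped out, for example in tests.
    """
    target = Path(target_dir)
    # parents=True / exist_ok=True: no error if the folder already exists.
    target.mkdir(parents=True, exist_ok=True)
    paths = []
    for name, url in FILES.items():
        destination = target / name
        fetch(url, str(destination))
        paths.append(destination)
    return paths
```

Calling `download_dataset()` in a notebook cell should leave `data/1k/tesla.txt` and `data/1k/web.txt` on disk, regardless of platform.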


Now we read the first document and build a vector index from it. The index is persisted to a local directory, so that later runs can reload it instead of recomputing the embeddings.

In [3]:
from llama_index.core import load_index_from_storage
from llama_index.core import SimpleDirectoryReader
from llama_index.core import StorageContext, VectorStoreIndex



# tesla.txt holds Tesla's predictions about the future
tesla_docs = SimpleDirectoryReader(input_files=["data/1k/tesla.txt"]).load_data()

try:
    # Try to load the index if it is already calculated
    storage_context = StorageContext.from_defaults(persist_dir="storage/tesla")
    tesla_index = load_index_from_storage(storage_context=storage_context)
    print("Loaded the pre-computed index.")
except Exception:
    # Otherwise, generate the indexes
    tesla_index = VectorStoreIndex.from_documents(tesla_docs)
    tesla_index.storage_context.persist(persist_dir="storage/tesla")
    print("Generated the index.")



Loaded the pre-computed index.


The next step is to index the second document, the biographical text, and persist it locally in the same way.

In [4]:
from llama_index.core import load_index_from_storage

webtext_docs = SimpleDirectoryReader(input_files=["data/1k/web.txt"]).load_data()

try:
    # Try to load the index if it is already calculated
    storage_context = StorageContext.from_defaults(persist_dir="storage/webtext")
    webtext_index = load_index_from_storage(storage_context=storage_context)
    print("Loaded the pre-computed index.")
except Exception:
    # Otherwise, generate the indexes
    webtext_index = VectorStoreIndex.from_documents(webtext_docs)
    webtext_index.storage_context.persist(persist_dir="storage/webtext")
    print("Generated the index.")



Loaded the pre-computed index.
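Both cells above follow the same load-or-build pattern. The sketch below factors that pattern into a small helper. It is written against plain callables so that it stays independent of any library; the names `load_or_build`, `load` and `build` are hypothetical and not part of LlamaIndex.

```python
from pathlib import Path


def load_or_build(persist_dir, load, build):
    """Return (index, status): reuse a persisted index if possible, else build it.

    `load(persist_dir)` and `build(persist_dir)` are caller-supplied callables
    (hypothetical names). `build` is expected to persist what it creates.
    """
    if Path(persist_dir).is_dir():
        try:
            return load(persist_dir), "loaded"
        except Exception as error:
            # A corrupt or incomplete directory falls through to a rebuild.
            print(f"Could not load {persist_dir}: {error}")
    return build(persist_dir), "generated"
```

In this notebook, `load` would wrap the `StorageContext.from_defaults` and `load_index_from_storage` calls, and `build` would wrap `VectorStoreIndex.from_documents` followed by `persist`. Note that a persisted index is reused as-is, so the storage directory has to be deleted whenever the source document changes.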


### RAG tools

After generating the vector store indexes, the next step is to create the query engines. Each engine retrieves the three most similar chunks for a query (`similarity_top_k=3`).

In [5]:
tesla_engine = tesla_index.as_query_engine(similarity_top_k=3)
webtext_engine = webtext_index.as_query_engine(similarity_top_k=3)

The `tesla_engine` variable handles queries about Tesla's predictions for the future, and the `webtext_engine` variable handles biographical data, focusing on factual content.

Now that the query engines are constructed, the tools can be configured. We combine the `QueryEngineTool` class, which wraps a query engine as a tool, with the `ToolMetadata` class, which assigns a name and a description to the tool. These descriptions help the agent determine the most suitable data source for the user's query. We will create a list of two tools, each representing one of our data sources.

In [6]:
from llama_index.core.tools import QueryEngineTool, ToolMetadata

query_engine_tool_tesla = QueryEngineTool(
    query_engine=tesla_engine,
    metadata=ToolMetadata(
        name="tesla_1k",
        description=(
            "Provides information about Tesla's statements that refer to future times and predictions. "
            "Use a detailed plain text question as input to the tool."
        )
    )
)
query_engine_tool_webtext = QueryEngineTool(
    query_engine=webtext_engine,
    metadata=ToolMetadata(
        name="webtext_1k",
        description=(
            "Provides information about Tesla's life and biographical data. "
            "Use a detailed plain text question as input to the tool."
        )
    )
)

query_engine_tools = [query_engine_tool_tesla, query_engine_tool_webtext]
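To build some intuition for why the descriptions matter, the toy router below picks a tool by counting the words a question shares with each description. This is only an illustration under simplifying assumptions: the real agent lets the LLM read the descriptions and decide, and the names `route`, `tokenize` and `STOP_WORDS` are made up for this sketch.

```python
import re

# Tool names and descriptions mirror the ones defined above.
TOOL_DESCRIPTIONS = {
    "tesla_1k": "Provides information about Tesla's statements that refer to future times and predictions.",
    "webtext_1k": "Provides information about Tesla's life and biographical data.",
}

# Words too common to help tell the tools apart (an arbitrary, hand-picked list).
STOP_WORDS = {
    "a", "about", "and", "did", "that", "the", "to", "what",
    "provides", "information", "tesla", "s",
}


def tokenize(text):
    """Lowercase the text and return its set of words, minus the stop words."""
    return set(re.findall(r"[a-z]+", text.lower())) - STOP_WORDS


def route(question, descriptions=TOOL_DESCRIPTIONS):
    """Return the tool name whose description shares the most words with the question.

    On a tie, the first tool in the dictionary wins.
    """
    question_words = tokenize(question)
    scores = {
        name: len(question_words & tokenize(description))
        for name, description in descriptions.items()
    }
    return max(scores, key=scores.get)


print(route("What predictions did Tesla make about future wars?"))
print(route("Tell me about Tesla's life and family"))
```

Even in this crude form, a vague or overlapping description makes the two tools indistinguishable, which is why each description should state clearly what its data source covers.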

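Finally, we create the agent. Because the agent is built with LangChain's `create_tool_calling_agent`, the LlamaIndex tools are first converted to LangChain tools.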

In [7]:
from langchain_core.prompts.chat import ChatPromptTemplate
from langchain.agents import AgentExecutor, create_tool_calling_agent
from langchain_ollama import ChatOllama


# First, we convert the LlamaIndex tools to LangChain tools
query_engine_tools_llamaindex_to_langchain = [t.to_langchain_tool() for t in query_engine_tools]

# After that, we create a system context for the agent
system_context = (
    "You are an expert about Nikola Tesla. "
    "You will answer questions about his life and future predictions."
)

# We create the prompt for the chat
prompt = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            system_context,
        ),
        ("placeholder", "{chat_history}"),
        ("human", "{input}"),
        ("placeholder", "{agent_scratchpad}"),
    ]
)

llm_chat = ChatOllama(model="llama3.1:8b")

# Construct the tool-calling agent
agent = create_tool_calling_agent(
    llm=llm_chat,
    tools=query_engine_tools_llamaindex_to_langchain,
    prompt=prompt,
)
# Create an agent executor by passing in the agent and tools
agent_executor = AgentExecutor(
    agent=agent,
    tools=query_engine_tools_llamaindex_to_langchain,
    verbose=True,
    return_intermediate_steps=True,
    handle_parsing_errors=True,
    max_iterations=10,
)
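Conceptually, the executor runs a loop: the model either requests a tool or gives a final answer, each tool result is appended to a scratchpad, and the loop stops at the iteration limit. The sketch below imitates that loop in plain Python. It is a simplified illustration, not LangChain's implementation; `run_agent`, `scripted_model` and the tuple-based decision format are assumptions made for this example.

```python
def run_agent(decide, tools, question, max_iterations=10):
    """Run a simplified tool-calling loop.

    `decide(question, scratchpad)` stands in for the LLM (hypothetical
    interface): it returns ("call", tool_name, tool_input) to request a tool,
    or ("final", answer) to stop.
    """
    scratchpad = []  # (tool_name, tool_input, observation) tuples so far
    for _ in range(max_iterations):
        decision = decide(question, scratchpad)
        if decision[0] == "final":
            return {"output": decision[1], "intermediate_steps": scratchpad}
        _, tool_name, tool_input = decision
        observation = tools[tool_name](tool_input)
        scratchpad.append((tool_name, tool_input, observation))
    return {"output": "Stopped: iteration limit reached.", "intermediate_steps": scratchpad}


def scripted_model(question, scratchpad):
    """Toy stand-in for the LLM: call one tool, then answer with its result."""
    if not scratchpad:
        return ("call", "webtext_1k", question)
    return ("final", scratchpad[-1][2])


# Stub tool that echoes its input instead of querying an index.
tools = {"webtext_1k": lambda text: f"[webtext_1k result for: {text}]"}
result = run_agent(scripted_model, tools, "Where was Tesla born?")
print(result["output"])
```

The `max_iterations` argument plays the same role as in the `AgentExecutor` above: it bounds the number of tool calls when the model never settles on a final answer.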

Once we've created the agent executor, we can pass a query to the agent.

In [8]:
question = "What influenced Nikola Tesla to become an inventor?"

response = agent_executor.invoke({"input": question})
print("\nFinal Response:", response['output'])



> Entering new AgentExecutor chain...

Invoking: `webtext_1k` with `{'input': 'What influenced Nikola Tesla to become an inventor?'}`


Based on the provided text, it does not explicitly state what influenced Nikola Tesla to become an inventor. However, it can be inferred that his curiosity and fascination with electricity and vibrations likely played a significant role in shaping his interest in inventing.

Tesla's observation of objects responding differently to vibrations and his desire to explore this phenomenon further suggest a natural inclination towards scientific inquiry and experimentation. His focus on understanding the properties of sustained powerful vibrations and attempting to demonstrate its effects on a large scale indicate a passion for discovery and innovation, which are hallmarks of an inventor.

Therefore, while the text does not provide direct evidence of what influenced Tesla's interest in inventing, his actions and thought processes depicted in the narrative suggest that his curiosity about electricity and vibrations likely played a significant role in shaping his inventive endeavor

We can observe how the agent invoked the `webtext_1k` tool to retrieve information, analyzed what it returned and gave a final response.
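Since the executor was created with `return_intermediate_steps=True`, the response dictionary also carries the tool calls under the `intermediate_steps` key, as a list of `(action, observation)` pairs. The helper below is a small sketch for printing them compactly; `summarize_steps` is a hypothetical name, and it relies only on the `tool` and `tool_input` attributes of each action.

```python
def summarize_steps(response):
    """Return one line per tool call recorded in an AgentExecutor response."""
    lines = []
    for action, observation in response.get("intermediate_steps", []):
        # Flatten and truncate the observation so each call fits on one line.
        preview = str(observation).replace("\n", " ")[:80]
        lines.append(f"{action.tool}({action.tool_input!r}) -> {preview}")
    return lines
```

Running `for line in summarize_steps(response): print(line)` after the query above would list which tool was called, with which input, and the start of what it returned.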

### Custom tools

Finally, we can create some custom tools for mathematical operations, which LLMs tend to get wrong. To do that, we define a custom function tailored to each task.

In [43]:
from langchain_core.pydantic_v1 import BaseModel, Field
from langchain_core.tools import tool


class MultiplyInput(BaseModel):
    a: int = Field(..., description="First integer for the multiply operation")
    b: int = Field(..., description="Second integer for the multiply operation")


# The docstring becomes the tool description shown to the agent
@tool(args_schema=MultiplyInput)
def multiply(a: int, b: int) -> int:
    """Multiply two integers."""
    return a * b


class AddInput(BaseModel):
    a: int = Field(..., description="First integer for the add operation")
    b: int = Field(..., description="Second integer for the add operation")


@tool(args_schema=AddInput)
def add(a: int, b: int) -> int:
    """Add two integers."""
    return a + b

The `@tool` decorator has already turned these functions into tools. We collect them in a list that can be passed to the agent.

In [44]:
all_tools = [multiply, add]
tool_names = ["multiply", "add"]

With the tools in place, we build a second agent in the same way as before: a system context, a chat prompt template, and a tool-calling agent wrapped in an `AgentExecutor`. This time the agent only receives the mathematical tools.

In [45]:
system_context = (
    "You are an assistant for mathematical operations. "
    "You will answer mathematical questions."
)

# We create the prompt for the chat
prompt = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            system_context,
        ),
        ("placeholder", "{chat_history}"),
        ("human", "{input}"),
        ("placeholder", "{agent_scratchpad}"),
    ]
)

llm_chat = ChatOllama(model="llama3.1:8b")

# Construct the tool-calling agent
agent = create_tool_calling_agent(llm=llm_chat, tools=all_tools, prompt=prompt)
# Create an agent executor by passing in the agent and tools
agent_executor = AgentExecutor(
    agent=agent,
    tools=all_tools,
    verbose=True,
    return_intermediate_steps=True,
    handle_parsing_errors=True,
    max_iterations=10,
)

In [53]:
question_add = "What is the result of adding 3 to 5?"
response_add = agent_executor.invoke({"input": question_add})
print("\nFinal Response:", response_add['output'])

question_mult = "What is the result of multiplying 3 by 5?"
response_mult = agent_executor.invoke({"input": question_mult})
print("\nFinal Response:", response_mult['output'])



> Entering new AgentExecutor chain...

Invoking: `add` with `{'a': 3, 'b': 5}`


8

> Finished chain.

Final Response: 8


> Entering new AgentExecutor chain...

Invoking: `multiply` with `{'a': 3, 'b': 5}`


15

> Finished chain.

Final Response: 15


As we can see, we have created an agent that is able to solve mathematical questions. Note, however, that this new agent has no access to the RAG tools that we implemented before.

**Can we implement a multi-agent system instead? We will show the alternatives for this problem in the next section.**