# Llama 3.1 RAG Agent with LlamaIndex

<a target="_blank" href="https://colab.research.google.com/github/ytang07/ai_agents_cookbooks/blob/main/llamaindex/llama31_8b_rag_agent.ipynb">
  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>

This notebook walks you through building a LlamaIndex ReActAgent using Llama 3.1 70B. We use [OctoAI](https://octo.ai) as our embeddings and LLM provider.

## Install Dependencies

In [12]:
# ! pip install -qU llama-index llama-index-llms-openai llama-index-readers-file octoai llama-index-llms-octoai llama-index-embeddings-octoai llama-index-embeddings-openai llama-index-llms-openai-like

# ! pip freeze | grep llama-index-core
# ! pip freeze | grep embeddings-openai

## Setup API Keys
To run the rest of the notebook you will need access to an OctoAI API key. You can sign up for an account [here](https://octoai.cloud/). If you need further guidance you can check OctoAI's [documentation page](https://octo.ai/docs/getting-started/how-to-create-octoai-access-token).

In [29]:
from os import environ
from getpass import getpass
from dotenv import load_dotenv

# Load variables from a local .env file, if one exists
load_dotenv()

# Prompt for the key only if it is not already set; never hardcode it in the notebook
if "OCTOAI_API_KEY" not in environ:
    environ["OCTOAI_API_KEY"] = getpass("Enter your OctoAI API key: ")

OCTOAI_API_KEY = environ["OCTOAI_API_KEY"]

In [30]:
# ! pip install python-dotenv

## Import Libraries and Set Up LlamaIndex

In [31]:
from llama_index.core import (
    SimpleDirectoryReader,
    VectorStoreIndex,
    StorageContext,
    load_index_from_storage,
)
from llama_index.core.tools import QueryEngineTool, ToolMetadata
from llama_index.embeddings.octoai import OctoAIEmbedding
from llama_index.core import Settings as LlamaGlobalSettings
from llama_index.core.agent import ReActAgent
from llama_index.llms.openai_like import OpenAILike

# Set the default model to use for embeddings
LlamaGlobalSettings.embed_model = OctoAIEmbedding()

# Create an llm object to use for the QueryEngine and the ReActAgent
llm = OpenAILike(
    model="meta-llama-3.1-70b-instruct",
    api_base="https://text.octoai.run/v1",
    api_key=environ["OCTOAI_API_KEY"],
    context_window=40000,
    is_function_calling_model=True,
    is_chat_model=True,
)


## Load Documents

In [32]:
try:
    storage_context = StorageContext.from_defaults(
        persist_dir="./storage2/msim"
    )
    msim_index = load_index_from_storage(storage_context)

    index_loaded = True
except Exception:  # no persisted index yet, or it could not be loaded
    index_loaded = False

This is where we create the vector index by computing an embedding vector for each chunk of the documents. The index is persisted to disk, so you only need to run this step once.
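To make the idea of "an embedding vector per chunk" concrete, here is a small library-free sketch. It is not how LlamaIndex or OctoAI compute embeddings: the bag-of-words `embed` function is a toy stand-in for a real embedding model, and the helper names, chunk size, and sample text are all made up for illustration.

```python
import math
from collections import Counter

def chunk_text(text, chunk_size=8):
    """Split text into chunks of at most chunk_size words."""
    words = text.split()
    return [" ".join(words[i:i + chunk_size]) for i in range(0, len(words), chunk_size)]

def embed(text):
    """Toy embedding: word counts (a stand-in for a real embedding model)."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def build_index(text, chunk_size=8):
    """Embed every chunk once and keep (chunk, vector) pairs."""
    return [(chunk, embed(chunk)) for chunk in chunk_text(text, chunk_size)]

def search(index, query, top_k=1):
    """Return the top_k chunks most similar to the query."""
    query_vector = embed(query)
    ranked = sorted(index, key=lambda pair: cosine(query_vector, pair[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:top_k]]

# Made-up sample text, not taken from the MSIM documents
sample = (
    "the program offers several tracks for incoming students "
    "tuition is billed at the start of each quarter"
)
toy_index = build_index(sample)
print(search(toy_index, "which tracks exist"))
```

Building the vectors is the expensive part, which is why the notebook persists the real index and reloads it on later runs instead of recomputing it.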

In [33]:
import os

if not index_loaded:
    # Load every file in the pdfs/ directory
    msim_docs = SimpleDirectoryReader(
        input_files=[os.path.join("pdfs", x) for x in os.listdir("pdfs")],
        filename_as_id=True,
    ).load_data()

    # Build the index (this computes the embeddings)
    msim_index = VectorStoreIndex.from_documents(msim_docs, show_progress=True)

    # Persist the index so later runs can load it from disk
    msim_index.storage_context.persist(persist_dir="./storage2/msim")



Parsing nodes:   0%|          | 0/204 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/204 [00:00<?, ?it/s]

In [34]:
from llama_index.core.query_engine import CitationQueryEngine

Now create the query engines.

In [35]:
msim_engine = CitationQueryEngine.from_args(
    msim_index,
    similarity_top_k=3,
    citation_chunk_size=1024,
    llm=llm,
)

We can now wrap the query engine as a tool for the agent to use.

Each query engine needs its own tool. Here a single engine covers all of the MSIM documents, so we define one tool.

In [36]:
query_engine_tools = [
    QueryEngineTool(
        query_engine=msim_engine,
        metadata=ToolMetadata(
            name="msim",
            description=(
                "Provides information about the MSIM program. "
                "Use a detailed plain text question as input to the tool."
            ),
        ),
    )
]
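The tool's `name` is how the agent refers to it in its reasoning steps, and the `description` is what the LLM reads when deciding whether to call it. As a rough, library-free illustration of that dispatch, the sketch below parses a ReAct-style step of the form `Action: <tool name>` / `Action Input: <arguments>` and calls the matching function. The `parse_action` helper and the `fake_msim_tool` stand-in are hypothetical and are not LlamaIndex's actual parser.

```python
import ast
import re

def parse_action(step_text):
    """Extract (tool_name, arguments) from a ReAct-style step, or None if absent."""
    name_match = re.search(r"^Action:\s*(\S+)", step_text, re.MULTILINE)
    input_match = re.search(r"^Action Input:\s*(\{.*\})", step_text, re.MULTILINE)
    if not name_match or not input_match:
        return None
    # literal_eval safely turns the dict-looking text into a real dict
    return name_match.group(1), ast.literal_eval(input_match.group(1))

def fake_msim_tool(input):
    """Stand-in for a query engine tool; a real tool would query the index."""
    return f"(would query the MSIM index for: {input})"

# Registry keyed by tool name, mirroring ToolMetadata(name="msim", ...)
tools = {"msim": fake_msim_tool}

step = """Thought: I need to use a tool to help me answer the question.
Action: msim
Action Input: {'input': 'MSIM tracks'}"""

tool_name, tool_args = parse_action(step)
print(tools[tool_name](**tool_args))
```

Because the LLM picks the tool from its name and description alone, a clear description matters more than it might seem.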

## Creating the Agent
We now have all the elements needed to create a LlamaIndex ReActAgent.

In [37]:
agent = ReActAgent.from_tools(
    query_engine_tools,
    llm=llm,
    max_iterations=10,  # from_tools names this limit max_iterations
    verbose=True
)

Now we can interact with the agent and ask a question.

In [38]:
response = agent.chat("What are the different tracks of MSIM? Cite the file of the source")
print(response.response)

> Running step dde75dbb-fc67-477e-9c58-8e11eda7b143. Step input: What are the different tracks of MSIM? Cite the file of the source
Thought: The current language of the user is: English. I need to use a tool to help me answer the question.
Action: msim
Action Input: {'input': 'MSIM tracks'}
Observation: The MSIM program is offered on three tracks: Early-Career, Early-Career Accelerated, and Mid-Career [3].
> Running step c549df63-c192-4eb9-bb5b-b151293a92a2. Step input: None
Thought: I can answer without using any more tools. I'll use the user's language to answer
Answer: The different tracks of MSIM are Early-Career, Early-Career Accelerated, and Mid-Career.
The different tracks of MSIM are Early-Career, Early-Career Accelerated, and Mid-Career.

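The agent's final answer above does not name a source file, even though the prompt asks for one. Here is a minimal sketch of one way to recover the file names. It assumes that the response object exposes a `source_nodes` list and that each item carries a `node.metadata` dictionary with a `file_name` key, which is the key `SimpleDirectoryReader` sets by default. The `cited_files` helper is hypothetical, and the stand-in nodes use made-up file names so the block runs without an index.

```python
from types import SimpleNamespace

def cited_files(source_nodes):
    """Return the unique source file names, in retrieval order."""
    files = []
    for item in source_nodes:
        name = item.node.metadata.get("file_name", "<unknown>")
        if name not in files:
            files.append(name)
    return files

def fake_source(file_name):
    """Build a stand-in shaped like an item of response.source_nodes (assumption)."""
    return SimpleNamespace(node=SimpleNamespace(metadata={"file_name": file_name}))

# Made-up file names for illustration only
fake_nodes = [
    fake_source("handbook.pdf"),
    fake_source("handbook.pdf"),
    fake_source("faq.pdf"),
]
print(cited_files(fake_nodes))
```

In a real run, the call would be `cited_files(response.source_nodes)` right after `agent.chat(...)`.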

In [41]:
from os import environ, listdir, path
from getpass import getpass
from dotenv import load_dotenv
from llama_index.core import (
    SimpleDirectoryReader,
    VectorStoreIndex,
    StorageContext,
    load_index_from_storage,
)
from llama_index.core.tools import QueryEngineTool, ToolMetadata
from llama_index.embeddings.octoai import OctoAIEmbedding
from llama_index.core import Settings as LlamaGlobalSettings
from llama_index.core.agent import ReActAgent
from llama_index.llms.openai_like import OpenAILike
from llama_index.core.query_engine import CitationQueryEngine

# Load environment variables
load_dotenv()

# Prompt for the OctoAI API key instead of hardcoding it in the notebook
environ["OCTOAI_API_KEY"] = getpass("Enter your OctoAI API key: ")

# Set the default model to use for embeddings
LlamaGlobalSettings.embed_model = OctoAIEmbedding()

# Create an LLM object to use for the QueryEngine and the ReActAgent
llm = OpenAILike(
    model="meta-llama-3.1-70b-instruct",
    api_base="https://text.octoai.run/v1",
    api_key=environ["OCTOAI_API_KEY"],
    context_window=40000,
    is_function_calling_model=True,
    is_chat_model=True,
)

# Set up storage and loading of the index
try:
    storage_context = StorageContext.from_defaults(persist_dir="./storage/msim")
    msim_index = load_index_from_storage(storage_context)
    print("Index loaded successfully.")
    index_loaded = True
except Exception as e:
    print(f"Failed to load index: {e}")
    index_loaded = False

# Load documents from the 'pdfs' directory if the index isn't loaded
if not index_loaded:
    try:
        pdf_dir = "pdfs/"
        
        # Check if the directory exists and contains files
        if not path.exists(pdf_dir):
            raise FileNotFoundError(f"Directory does not exist: {pdf_dir}")
        
        pdf_files = [f for f in listdir(pdf_dir) if path.isfile(path.join(pdf_dir, f))]
        
        if not pdf_files:
            raise FileNotFoundError(f"No PDF files found in directory: {pdf_dir}")
        
        # Load documents using SimpleDirectoryReader
        msim_docs = SimpleDirectoryReader(input_files=[path.join(pdf_dir, x) for x in pdf_files], filename_as_id=True).load_data()
        
        # Debugging print: check that the documents loaded (SimpleDirectoryReader stores the name under "file_name")
        print(f"Loaded documents: {[doc.metadata.get('file_name') for doc in msim_docs]}")

        # Build the index from the loaded documents
        msim_index = VectorStoreIndex.from_documents(msim_docs, show_progress=True)
        print("Index built successfully.")
        
        # Persist the index for future use
        msim_index.storage_context.persist(persist_dir="./storage/msim")
        print("Index persisted successfully.")
    except Exception as e:
        print(f"Error loading PDF files or building the index: {e}")

# Create the query engine. CitationQueryEngine.from_args does not document a
# citation_formatter argument, so none is passed here; source file names can be
# read from the "file_name" metadata of response.source_nodes instead.
try:
    msim_engine = CitationQueryEngine.from_args(
        msim_index,
        similarity_top_k=3,
        citation_chunk_size=1024,
        llm=llm,
    )

    query_engine_tools = [
        QueryEngineTool(
            query_engine=msim_engine,
            metadata=ToolMetadata(
                name="msim",
                description=(
                    "Provides information about the MSIM program. "
                    "Use a detailed plain text question as input to the tool."
                ),
            ),
        )
    ]

    # Set up the agent with the tools
    new_agent = ReActAgent.from_tools(
        query_engine_tools,
        llm=llm,
        max_iterations=10,  # from_tools names this limit max_iterations
        verbose=True
    )
    print("Query engine and agent set up successfully.")
except Exception as e:
    print(f"Error setting up query engine or agent: {e}")


Enter your OctoAI API key:  ········


Index loaded successfully.
Query engine and agent set up successfully.


In [42]:
response = new_agent.chat("What are the different tracks of MSIM? Cite the file of the source")
print(response.response)

> Running step 15ae44c1-b5dc-49fb-8b47-f61a15cbbaca. Step input: What are the different tracks of MSIM? Cite the file of the source
Thought: The current language of the user is: English. I need to use a tool to help me answer the question.
Action: msim
Action Input: {'input': 'MSIM tracks'}
Observation: The MSIM program is offered on three tracks: Early-Career, Early-Career Accelerated, and Mid-Career [3].
> Running step c1329057-dc1f-42a9-b252-94d67ffdd340. Step input: None
Thought: I can answer without using any more tools. I'll use the user's language to answer
Answer: The different tracks of MSIM are Early-Career, Early-Career Accelerated, and Mid-Career.
The different tracks of MSIM are Early-Career, Early-Career Accelerated, and Mid-Career.


In [45]:
from os import environ, listdir, path
from getpass import getpass
from dotenv import load_dotenv
from llama_index.core import (
    SimpleDirectoryReader,
    VectorStoreIndex,
    StorageContext,
    load_index_from_storage,
)
from llama_index.core.tools import QueryEngineTool, ToolMetadata
from llama_index.embeddings.octoai import OctoAIEmbedding
from llama_index.core import Settings as LlamaGlobalSettings
from llama_index.core.agent import ReActAgent
from llama_index.llms.openai_like import OpenAILike
from llama_index.core.query_engine import CitationQueryEngine

# Load environment variables
load_dotenv()

# Prompt for the OctoAI API key only if it is not already set; never hardcode it
if "OCTOAI_API_KEY" not in environ:
    environ["OCTOAI_API_KEY"] = getpass("Enter your OctoAI API key: ")

# Set the default model to use for embeddings
LlamaGlobalSettings.embed_model = OctoAIEmbedding()

# Create an LLM object to use for the QueryEngine and the ReActAgent
llm = OpenAILike(
    model="meta-llama-3.1-70b-instruct",
    api_base="https://text.octoai.run/v1",
    api_key=environ["OCTOAI_API_KEY"],
    context_window=40000,
    is_function_calling_model=True,
    is_chat_model=True,
)

# Set up storage and loading of the index
try:
    storage_context = StorageContext.from_defaults(persist_dir="./storage/msim")
    msim_index = load_index_from_storage(storage_context)
    print("Index loaded successfully.")
    index_loaded = True
except Exception as e:
    print(f"Failed to load index: {e}")
    index_loaded = False

# Load documents from the 'pdfs' directory if the index isn't loaded
if not index_loaded:
    try:
        pdf_dir = "pdfs/"
        
        # Check if the directory exists and contains files
        if not path.exists(pdf_dir):
            raise FileNotFoundError(f"Directory does not exist: {pdf_dir}")
        
        pdf_files = [f for f in listdir(pdf_dir) if path.isfile(path.join(pdf_dir, f))]
        
        if not pdf_files:
            raise FileNotFoundError(f"No PDF files found in directory: {pdf_dir}")
        
        # Load documents using SimpleDirectoryReader
        msim_docs = SimpleDirectoryReader(input_files=[path.join(pdf_dir, x) for x in pdf_files], filename_as_id=True).load_data()
        
        # Debugging print: check that the documents loaded (SimpleDirectoryReader stores the name under "file_name")
        print(f"Loaded documents: {[doc.metadata.get('file_name') for doc in msim_docs]}")

        # Build the index from the loaded documents
        msim_index = VectorStoreIndex.from_documents(msim_docs, show_progress=True)
        print("Index built successfully.")
        
        # Persist the index for future use
        msim_index.storage_context.persist(persist_dir="./storage/msim")
        print("Index persisted successfully.")
    except Exception as e:
        print(f"Error loading PDF files or building the index: {e}")

# Collect the source file names for a response.
# Call as format_citations(response.source_nodes) after agent.chat(...).
def format_citations(source_nodes):
    citations = []
    for source in source_nodes:
        # SimpleDirectoryReader stores each file's name under "file_name"
        file_name = source.node.metadata.get("file_name")
        if file_name and file_name not in citations:
            citations.append(file_name)
    return citations

try:
    # from_args does not document a citation_formatter argument, so none is passed
    msim_engine = CitationQueryEngine.from_args(
        msim_index,
        similarity_top_k=3,
        citation_chunk_size=1024,
        llm=llm,
    )

    query_engine_tools = [
        QueryEngineTool(
            query_engine=msim_engine,
            metadata=ToolMetadata(
                name="msim",
                description=(
                    "Provides information about the MSIM program. "
                    "Use a detailed plain text question as input to the tool."
                ),
            ),
        )
    ]

    # Set up the agent with the tools
    news_agent = ReActAgent.from_tools(
        query_engine_tools,
        llm=llm,
        max_iterations=10,  # from_tools names this limit max_iterations
        verbose=True
    )
    print("Query engine and agent set up successfully.")
except Exception as e:
    print(f"Error setting up query engine or agent: {e}")


Index loaded successfully.
Query engine and agent set up successfully.


In [46]:
response = news_agent.chat("What are the different tracks of MSIM? Cite the file of the source")
print(response.response)

> Running step bfda174d-6242-4779-880f-a0119a083274. Step input: What are the different tracks of MSIM? Cite the file of the source
Thought: The current language of the user is: English. I need to use a tool to help me answer the question.
Action: msim
Action Input: {'input': 'MSIM tracks'}
Observation: The MSIM program is offered on three tracks: Early-Career, Early-Career Accelerated, and Mid-Career [3].
> Running step 29a2c860-908e-46a7-a590-578c4392c1dc. Step input: None
Thought: I can answer without using any more tools. I'll use the user's language to answer
Answer: The different tracks of MSIM are Early-Career, Early-Career Accelerated, and Mid-Career.
The different tracks of MSIM are Early-Career, Early-Career Accelerated, and Mid-Career.
