# Llama 3.1 RAG Agent with LlamaIndex

<a target="_blank" href="https://colab.research.google.com/github/ytang07/ai_agents_cookbooks/blob/main/llamaindex/llama31_8b_rag_agent.ipynb">
  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>

This notebook walks you through building a LlamaIndex ReActAgent using Llama 3.1 70B. We will use [OctoAI](https://octo.ai) as our embeddings and LLM provider.

## Install Dependencies

In [2]:
! pip install -qU llama-index llama-index-llms-openai llama-index-readers-file octoai llama-index-llms-octoai llama-index-embeddings-octoai llama-index-embeddings-openai llama-index-llms-openai-like python-dotenv

# pip freeze takes no package argument, so filter its output to check the installed versions
! pip freeze | grep -E "llama-index-core|llama-index-embeddings-openai"

## Setup API Keys
To run the rest of the notebook you will need an OctoAI API key. You can sign up for an account [here](https://octoai.cloud/). For further guidance, see OctoAI's [documentation page](https://octo.ai/docs/getting-started/how-to-create-octoai-access-token).

In [3]:
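from os import environ
from getpass import getpass
from dotenv import load_dotenv  # provided by the python-dotenv package

# Load variables from a local .env file first, if one exists
load_dotenv()

# Prompt for the key only if it was not already set
if not environ.get("OCTOAI_API_KEY"):
    environ["OCTOAI_API_KEY"] = getpass("Input your OCTOAI API key: ")

OCTOAI_API_KEY = environ["OCTOAI_API_KEY"]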

## Import libraries and setup LlamaIndex

In [4]:
from llama_index.core import (
    SimpleDirectoryReader,
    VectorStoreIndex,
    StorageContext,
    load_index_from_storage,
)
from llama_index.core.tools import QueryEngineTool, ToolMetadata
from llama_index.embeddings.octoai import OctoAIEmbedding
from llama_index.core import Settings as LlamaGlobalSettings
from llama_index.core.agent import ReActAgent
from llama_index.llms.openai_like import OpenAILike

# Set the default model to use for embeddings
LlamaGlobalSettings.embed_model = OctoAIEmbedding()

# Create an llm object to use for the QueryEngine and the ReActAgent
llm = OpenAILike(
    model="meta-llama-3.1-70b-instruct",
    api_base="https://text.octoai.run/v1",
    api_key=environ["OCTOAI_API_KEY"],
    context_window=40000,
    is_function_calling_model=True,
    is_chat_model=True,
)




## Load Documents

In [5]:
try:
    # Try to load previously persisted indexes from disk
    storage_context = StorageContext.from_defaults(
        persist_dir="./storage/disease"
    )
    disease_index = load_index_from_storage(storage_context)

    storage_context = StorageContext.from_defaults(
        persist_dir="./storage/treatments"
    )
    treatments_index = load_index_from_storage(storage_context)

    index_loaded = True
except Exception:
    # No usable persisted index was found, so it is built in the next cell
    index_loaded = False

This is where we create the vector indexes by computing an embedding vector for each chunk. You only need to run this once; later runs load the persisted indexes from `./storage`.

In [6]:
if not index_loaded:
    # load data
    disease_docs = SimpleDirectoryReader(
        input_files=["./Disease_symptom_and_patient_profile_dataset_flat.csv"]
    ).load_data()

    treatments_docs = SimpleDirectoryReader(
        input_files=["./Disease_treatments_flat.csv"]
    ).load_data()

    # build index
    disease_index = VectorStoreIndex.from_documents(disease_docs, show_progress=True)
    treatments_index = VectorStoreIndex.from_documents(treatments_docs, show_progress=True)

    # persist index
    disease_index.storage_context.persist(persist_dir="./storage/disease")
    treatments_index.storage_context.persist(persist_dir="./storage/treatments")


Parsing nodes:   0%|          | 0/1 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/14 [00:00<?, ?it/s]

Parsing nodes:   0%|          | 0/1 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/3 [00:00<?, ?it/s]
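
The load-or-build flow used for the indexes can be looked at in isolation. The sketch below is not LlamaIndex code: `load_or_build` is a hypothetical helper that caches a JSON file, and it only illustrates the idea of trying the persisted copy first and falling back to the expensive build step.

```python
import json
import tempfile
from pathlib import Path


def load_or_build(cache_path, build_fn):
    """Return (data, loaded_from_cache). Hypothetical helper for illustration."""
    try:
        # Fast path: reuse the result persisted by an earlier run
        with open(cache_path, encoding="utf-8") as f:
            return json.load(f), True
    except (FileNotFoundError, json.JSONDecodeError):
        # Slow path: build the data, then persist it for next time
        data = build_fn()
        cache_path.parent.mkdir(parents=True, exist_ok=True)
        with open(cache_path, "w", encoding="utf-8") as f:
            json.dump(data, f)
        return data, False


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "storage" / "index.json"
    _, first_loaded = load_or_build(path, lambda: {"chunks": 3})
    _, second_loaded = load_or_build(path, lambda: {"chunks": 3})
    print(first_loaded, second_loaded)
```

The first call builds and persists the data; the second call finds the file and skips the build.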

Now create the query engines.

In [7]:
disease_engine = disease_index.as_query_engine(similarity_top_k=3, llm=llm)
treatments_engine = treatments_index.as_query_engine(similarity_top_k=3, llm=llm)
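
To make `similarity_top_k=3` more concrete, here is a minimal sketch of top-k retrieval by cosine similarity. It is an illustration under stated assumptions, not how LlamaIndex or OctoAI compute embeddings: `embed` is a toy word-count embedding, and the chunks are made-up strings rather than rows from the CSV files.

```python
import math
from collections import Counter


def embed(text):
    """Toy embedding: lowercase word counts (stand-in for a real embedding model)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query, chunks, k=3):
    """Return the k chunks most similar to the query, best match first."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]


# Hypothetical chunks, invented for this example only
chunks = [
    "influenza fever cough fatigue",
    "asthma cough difficulty breathing",
    "eczema skin rash itching",
    "migraine headache nausea",
]
results = top_k("fever and cough", chunks, k=2)
print(results)
```

A query engine then passes the retrieved chunks to the LLM as context for answering the question.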


We can now define the query engines as tools that will be used by the agent.

Because each document has its own query engine, we define one tool per engine.

In [8]:
query_engine_tools = [
    QueryEngineTool(
        query_engine=disease_engine,
        metadata=ToolMetadata(
            name="disease_csv",
            description=(
                "Provides symptom and disease data from many patients. "
                "Use a detailed plain text question as input to the tool."
            ),
        ),
    ),
    QueryEngineTool(
        query_engine=treatments_engine,
        metadata=ToolMetadata(
            name="treatments_csv",
            description=(
                "Provides diseases and matching treatment recommendations. "
                "Use a detailed plain text question as input to the tool."
            ),
        ),
    )
]
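
The tool descriptions are built from adjacent string literals, which Python joins with no separator. Since the description is the text the agent reads when choosing a tool, a trailing space at the end of each fragment keeps the sentences apart. A small sketch of the difference (the variable names are illustrative only):

```python
# Adjacent string literals are concatenated exactly as written
without_space = (
    "Provides symptom and disease data from many patients."
    "Use a detailed plain text question as input to the tool."
)
with_space = (
    "Provides symptom and disease data from many patients. "
    "Use a detailed plain text question as input to the tool."
)

print(without_space)
print(with_space)
```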

## Creating the Agent
Now we have all the elements needed to create a LlamaIndex ReActAgent.

In [9]:
agent = ReActAgent.from_tools(
    query_engine_tools,
    llm=llm,
    verbose=True,
    max_iterations=10,  # cap on reasoning/tool-call steps per chat turn
)

Now we can interact with the agent and ask a question.

In [11]:
prompt1 = input("What symptoms would you like diagnosed? ")


response = agent.chat(prompt1)
print(str(response))

> Running step d169f815-1275-4026-a6a4-a3a9382f3217. Step input: i have fever and cough no fatigue and low blood pressure
Thought: The current language of the user is: English. I need to use a tool to help me narrow down the possible diseases based on the symptoms provided.
Action: disease_csv
Action Input: {'input': 'fever and cough and no fatigue and low blood pressure'}
Observation: None
> Running step a3b02b53-e825-484a-a1e3-f53f16fd5e0b. Step input: None
Thought: The disease_csv tool did not return any results. I will try rephrasing the input to see if I can get any results.
Action: disease_csv
Action Input: {'input': 'fever, cough, no fatigue, low blood pressure'}
Observation: Anxiety Disorders
> Running step 2ebfba44-eaee-4c4f-a081-75bb121936cd. Step input: None
Thought: The disease_csv tool returned Anxiety Disorders as a possible match, but I'm not sure if this is the correct diagnosis. I would like

ValueError: Reached max iterations.
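
The `ValueError: Reached max iterations.` at the end of the trace means the agent used up its step budget before settling on a final answer. The sketch below is a toy thought/action/observation loop, not LlamaIndex's implementation: `run_react_loop`, both policies, and the `lookup` tool are hypothetical stand-ins that show how an iteration cap leads to this error.

```python
def run_react_loop(question, policy, tools, max_iterations=10):
    """Toy ReAct-style loop: call tools until the policy returns an answer."""
    observation = None
    for _ in range(max_iterations):
        step = policy(question, observation)
        if step["type"] == "answer":
            return step["content"]
        # Otherwise the step is an action: run the chosen tool
        observation = tools[step["tool"]](step["input"])
    raise ValueError("Reached max iterations.")


def lookup_policy(question, observation):
    """Stub policy: call the tool once, then answer from its result."""
    if observation is None:
        return {"type": "action", "tool": "lookup", "input": question}
    return {"type": "answer", "content": f"Possible match: {observation}"}


def undecided_policy(question, observation):
    """Stub policy that keeps calling the tool and never answers."""
    return {"type": "action", "tool": "lookup", "input": question}


# Hypothetical tool that always returns the same placeholder string
tools = {"lookup": lambda text: "example condition"}

answer = run_react_loop("fever and cough", lookup_policy, tools)
print(answer)

try:
    run_react_loop("fever and cough", undecided_policy, tools, max_iterations=3)
except ValueError as err:
    print(err)
```

In the notebook, wrapping `agent.chat(prompt1)` in a similar `try`/`except ValueError` would let you handle this case, for example by asking the user for more specific symptoms.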