<a href="https://colab.research.google.com/github/StrategicalIT/PiedPiperAI/blob/MLBHigherEdTraining/Lab09.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# LAB 9: Full RAG solution with LlamaIndex
In this lab we are going to build a full RAG (retrieval-augmented generation) solution using the LlamaIndex framework, with models from the NVIDIA NIM API catalog (both the LLM and the embedding model). For this, you will need an API key from NVIDIA.

## Install dependencies

The first step is to install the necessary libraries. This installs the core llama-index package, which pulls in many dependencies.

In [None]:
!pip install llama-index

If you look carefully at the previous output, you will notice that the only LLM interface that has been installed is OpenAI. We are going to use NVIDIA NIMs for both the LLM and the embedding model, so we need to install their corresponding modules.

You can see which LLM modules are available in LlamaIndex at [https://docs.llamaindex.ai/en/stable/module_guides/models/llms/modules/](https://docs.llamaindex.ai/en/stable/module_guides/models/llms/modules/)

In [None]:
!pip install llama-index-llms-nvidia llama-index-embeddings-nvidia

Now we can import the components we need for this lab.

In [None]:
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
from llama_index.core import Settings
from llama_index.llms.nvidia import NVIDIA
from llama_index.embeddings.nvidia import NVIDIAEmbedding
import urllib3
urllib3.disable_warnings()

## Instantiate the LLM

We need to set up the NVIDIA API key.

In [None]:
import os
#apikey = os.environ["NVIDIA_API_KEY"]
#change from OS variable import to using Google Colab secret
from google.colab import userdata
apikey = userdata.get('apikey')
os.environ["NVIDIA_API_KEY"] = apikey
#print(apikey)
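
If you also want to run this notebook outside Colab, the key lookup can be wrapped so that it falls back to the environment variable. This is a minimal sketch, not part of the original lab: `load_nvidia_api_key` is a hypothetical helper, and it assumes the Colab secret is named `apikey` as in the cell above. It does not handle Colab-specific errors, such as a secret that does not exist.

```python
import os


def load_nvidia_api_key(secret_name="apikey", env_var="NVIDIA_API_KEY"):
    """Hypothetical helper: read the key from Colab secrets, else from the environment."""
    try:
        # google.colab is only importable inside a Colab runtime.
        from google.colab import userdata
        key = userdata.get(secret_name)
    except ImportError:
        # Not running in Colab: fall back to the environment variable.
        key = os.environ.get(env_var)
    if not key:
        raise RuntimeError(
            f"No API key found in Colab secret '{secret_name}' or in ${env_var}"
        )
    return key
```

With this helper, the cell above would reduce to `os.environ["NVIDIA_API_KEY"] = load_nvidia_api_key()`, and a missing key fails immediately instead of at the first API call.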

LlamaIndex provides a "Settings" object that stores the most commonly used resources in a LlamaIndex workflow, e.g. llm and embed_model. It acts as global storage for all default settings: if an attribute is not provided anywhere else in the code, the "Settings" object is consulted. As you will see, several default values are assumed when nothing is specified, which keeps the code cleaner.

The following line is sufficient to instantiate an LLM from the NVIDIA NIM API if the right defaults apply.

In [None]:
Settings.llm = NVIDIA()

When the parameter ```base_url``` is not explicitly defined and we are using ```NVIDIA()```, it assumes ```base_url = "https://integrate.api.nvidia.com/v1"```.

If the parameter ```api_key``` is omitted, it will try to read the variable ```NVIDIA_API_KEY``` from the environment.

Also, as in previous NIM examples, if ```model``` is not present it will assume ```meta/llama3-8b-instruct```.

For example, suppose you want to connect to a NIM that is running locally in your datacenter, hard-code the key explicitly in your code, and use a Mistral model. The Settings would then look like this:
```
Settings.llm = NVIDIA(
    base_url="http://nim-host-address:8000/v1",
    api_key="nvapi-123456789abcdefg",
    model="mistralai/mistral-7b-instruct-v0.2"
)
```
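
The fallback rules described above can be sketched in plain Python. This is only an illustration of the behaviour as this lab describes it, not the LlamaIndex source code: `resolve_nvidia_settings` is a hypothetical helper, and the default values are the ones quoted in the text.

```python
import os

# Defaults as quoted in this lab (illustrative, not read from the library).
DEFAULT_BASE_URL = "https://integrate.api.nvidia.com/v1"
DEFAULT_MODEL = "meta/llama3-8b-instruct"


def resolve_nvidia_settings(base_url=None, api_key=None, model=None, env=None):
    """Hypothetical helper: apply the fallback rules to any omitted parameter."""
    env = os.environ if env is None else env
    return {
        "base_url": base_url or DEFAULT_BASE_URL,         # hosted API catalog by default
        "api_key": api_key or env.get("NVIDIA_API_KEY"),  # environment variable by default
        "model": model or DEFAULT_MODEL,                  # default model by default
    }


# Nothing specified: every value comes from a default or from the environment.
resolved = resolve_nvidia_settings(env={"NVIDIA_API_KEY": "nvapi-example"})
print(resolved["base_url"])
print(resolved["model"])
```

Any argument passed explicitly wins over its default, which is why the local NIM example above needs all three parameters.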


We can verify which model we are pointing to.

In [None]:
print("... Using: ", Settings.llm.model)

## Load the documents

Let's use ```SimpleDirectoryReader``` to load the documents from our data directory (```PiedPiperAIData``` in this lab). The data directory could also be a mounted directory pointing to PowerScale.

SimpleDirectoryReader will ingest Markdown, PDFs, Word documents, PowerPoint decks, images, audio and video from the specified directory.

In [None]:
import os

folder_name = "PiedPiperAIData"

# Create the folder if it doesn't exist
os.makedirs(folder_name, exist_ok=True)
!wget -O PiedPiperAIData/r760xa.pdf https://www.delltechnologies.com/asset/en-us/products/servers/technical-support/poweredge-r760xa-spec-sheet.pdf
print(f"Folder '{folder_name}' created successfully!")

Now upload your documents into the folder created in the previous step. In the Colab UI, click the folder icon in the left-hand menu, then upload the files.

Wait for the documents to **fully upload** before running the next code cell!
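
One way to check the folder before loading is to list each file with its size. The sketch below uses only the standard library. `summarize_folder` and `empty_files` are hypothetical helper names, and treating a zero-byte file as a sign of a failed download or interrupted upload is an assumption, not a guarantee.

```python
from pathlib import Path


def summarize_folder(folder):
    """Return (file name, size in bytes) pairs for regular files, sorted by name."""
    return sorted(
        (p.name, p.stat().st_size) for p in Path(folder).iterdir() if p.is_file()
    )


def empty_files(folder):
    """Return the names of zero-byte files, which usually deserve a second look."""
    return [name for name, size in summarize_folder(folder) if size == 0]


folder = Path("PiedPiperAIData")
if folder.is_dir():  # only report when the lab folder exists
    for name, size in summarize_folder(folder):
        print(f"{name}: {size} bytes")
    print("Empty files:", empty_files(folder))
```

If a file you just uploaded is missing or shows 0 bytes, wait or upload it again before running the next cell.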

In [None]:
folder_name = os.path.join("/content/", folder_name)
print(folder_name)
documents = SimpleDirectoryReader(folder_name).load_data()
#print(documents)

## Instantiate the embedding model

We are going to use the "Settings" object again, this time for the embedding model. Instead of requesting a specific embedding model, we leave ```model``` blank so that the default one for the NVIDIA module is selected. Also, since we didn't specify ```api_key```, it will attempt to find ```NVIDIA_API_KEY``` in the environment.

In [None]:
Settings.embed_model = NVIDIAEmbedding(truncate="END")
print("... Using embedding model: ", Settings.embed_model.model_name)

## Create an Index

The Index is a key construct in the LlamaIndex framework. To build it we need "Documents", an "Embedding model" and a "Storage Context". The "Storage Context" is optional: when it is not specified, LlamaIndex uses a simple in-memory vector store that's great for quick experimentation.

Notice also that we are not specifying the ```embed_model``` parameter because it is already defined in the ```Settings``` object.
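
To give an intuition for what an in-memory vector store does, here is a toy version in plain Python. It is a conceptual sketch only: `ToyVectorStore` is a hypothetical class, the three-dimensional vectors are made up for illustration (real embeddings come from the embedding model and have many more dimensions), and LlamaIndex's own store may be implemented differently.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class ToyVectorStore:
    """Keeps (text, vector) pairs in a list and ranks them by similarity."""

    def __init__(self):
        self._items = []

    def add(self, text, vector):
        self._items.append((text, vector))

    def top_k(self, query_vector, k=2):
        scored = [(cosine_similarity(query_vector, v), t) for t, v in self._items]
        scored.sort(key=lambda pair: pair[0], reverse=True)  # most similar first
        return scored[:k]


store = ToyVectorStore()
store.add("chunk about PCIe slots", [1.0, 0.0, 0.0])      # made-up embedding
store.add("chunk about power supplies", [0.0, 1.0, 0.0])  # made-up embedding
store.add("chunk about GPUs", [0.7, 0.0, 0.7])            # made-up embedding

results = store.top_k([1.0, 0.0, 0.1], k=2)  # made-up query embedding
for score, text in results:
    print(f"{score:.3f}  {text}")
```

Because everything lives in a Python list, the contents are lost when the runtime restarts, which is the trade-off of an in-memory store.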

In [None]:
index = VectorStoreIndex.from_documents(documents)
print(f"Number of chunks in the index: {len(index.docstore.docs)}")
# Assuming 'folder_name' still holds the path to your directory
file_count = len([f for f in os.listdir(folder_name) if os.path.isfile(os.path.join(folder_name, f))])
print(f"Number of physical files in the folder: {file_count}")

## Build the query engine

The final step is to use the "LLM" and the "Index" to build the "Query Engine". Thanks again to the ```Settings``` object, we don't need to specify ```llm=llm``` or ```embed_model=embed_model```.
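
Conceptually, a RAG query engine retrieves the chunks most relevant to the question and hands them to the LLM as context. The sketch below shows only that second half, assembling a prompt from already-retrieved chunks. `build_rag_prompt` and its template wording are hypothetical and are not the prompt that LlamaIndex actually uses.

```python
def build_rag_prompt(question, retrieved_chunks):
    """Hypothetical template: number the retrieved chunks and append the question."""
    context = "\n\n".join(
        f"[{i}] {chunk}" for i, chunk in enumerate(retrieved_chunks, start=1)
    )
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


# Placeholder chunks; in the lab they would come from the index.
prompt = build_rag_prompt(
    "How many pcie slots are there in a Dell R760xa?",
    ["placeholder text of the first retrieved chunk",
     "placeholder text of the second retrieved chunk"],
)
print(prompt)
```

The query engine built in the next cell hides these steps behind a single call.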

In [None]:
query_engine = index.as_query_engine()
print(f"Type of query_engine: {type(query_engine)}")

## Query the RAG solution

Everything is ready to start querying our RAG solution. We use the ```.query()``` method of the ```query_engine```.

In [None]:
response = query_engine.query("How many pcie slots are there in a Dell R760xa?")
print(response)

## End of Lab 9