# LAB 9: Full RAG solution with LlamaIndex
In this lab we build a full RAG solution using the LlamaIndex framework with models from the NVIDIA NIM API catalog (both an LLM and an embedding model). For this, you will need an API key from NVIDIA.

<font color="red"><b>IMPORTANT</b></font>: Do not use the Dell corporate VPN. It causes a validation error for the self-signed certificate.

## Install dependencies

The first step is to install the necessary libraries. This installs the core llama-index package, which pulls in many dependencies.

In [1]:
!pip install llama-index


If you look carefully at the previous output, you will notice that the only LLM interface installed is OpenAI. We are going to use NVIDIA NIMs for both the LLM and the embedding model, so we need to install their corresponding modules, along with the Chroma vector store integration.

You can see which LLM modules are available in LlamaIndex at [https://docs.llamaindex.ai/en/stable/module_guides/models/llms/modules/](https://docs.llamaindex.ai/en/stable/module_guides/models/llms/modules/)

In [2]:
!pip install llama-index-llms-nvidia llama-index-embeddings-nvidia llama-index-vector-stores-chroma


Now we can import the components we need for this lab.

In [3]:
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader, StorageContext
from llama_index.core import Settings
from llama_index.llms.nvidia import NVIDIA
from llama_index.embeddings.nvidia import NVIDIAEmbedding
from llama_index.vector_stores.chroma import ChromaVectorStore
import chromadb
import os
import urllib3
urllib3.disable_warnings()

## Read environment variables

A best practice for managing sensitive information such as the API key is to provide it to the application as an environment variable and have the application read it from the environment. You can run the following command in a terminal to create the environment variable.

    ```export NVIDIA_API_KEY=nvapi-12345abcdefgh```

The next cell also reads ```LLM_URL```, ```EMBED_URL```, ```DB_IP``` and ```DB_PORT```, so export those in the same way.

Important: Make sure the environment variables have been exported before starting the Jupyter server. If you haven't done so, shut down the Jupyter server, create the environment variables and start it again.

Let's read the variables from the environment and store them in variables we can use throughout our code.

In [5]:
apikey = os.environ["NVIDIA_API_KEY"]
llmurl = os.environ["LLM_URL"]
embedurl = os.environ["EMBED_URL"]
dbip = os.environ["DB_IP"]
dbport = os.environ["DB_PORT"]
#print(llmurl, embedurl, apikey, dbip, dbport)
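
Reading ```os.environ[...]``` directly raises a bare ```KeyError``` on the first missing variable. As a minimal sketch, a small helper can report every missing variable at once. The name ```require_env``` and the demo values below are illustrative assumptions, not part of the lab.

```python
import os


def require_env(names, environ=os.environ):
    """Return {name: value} for the requested variables.

    Raises RuntimeError listing every variable that is missing or empty.
    (Hypothetical helper, not part of LlamaIndex or the lab code.)
    """
    missing = [name for name in names if not environ.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: environ[name] for name in names}


# Demonstration with a stand-in mapping so the sketch runs anywhere;
# in the notebook you would omit `environ` to read the real environment.
demo_env = {"NVIDIA_API_KEY": "nvapi-example", "LLM_URL": "http://localhost:8000/v1"}
config = require_env(["NVIDIA_API_KEY", "LLM_URL"], environ=demo_env)
print(sorted(config))
```

In the notebook, the call would list all five variable names used in the cell above.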

## Instantiate the LLM

LlamaIndex provides a "Settings" object that stores the most commonly used resources in a LlamaIndex workflow, for example ```llm``` and ```embed_model```. It acts as a global store for default settings: if an attribute is not provided anywhere else in the code, the "Settings" object is consulted. As you will see, several default values are assumed when not specified, which keeps the code cleaner.

The following line will be sufficient to instantiate an LLM from the Nvidia NIM API if the right defaults apply.

In [7]:
Settings.llm = NVIDIA(base_url = llmurl, api_key = apikey)

When the parameter ```base_url``` is not explicitly defined and we are using ```NVIDIA()```, it assumes ```base_url = "https://integrate.api.nvidia.com/v1"```.

If the parameter ```api_key``` is omitted, it will try to read the variable ```NVIDIA_API_KEY``` from the environment.

Also, like in previous NIM examples, if ```model``` is not present, it will assume ```meta/llama3-8b-instruct```.

So, for example, let's say you want to connect to a NIM that is running locally in your datacenter, hard-code the key explicitly in your code and use a Mistral model. Then the Settings would look like this:
```
Settings.llm = NVIDIA(
    base_url="http://nim-host-address:8000/v1",
    api_key = "nvapi-123456789abcdefg",
    model="mistralai/mistral-7b-instruct-v0.2")
```
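
The fallback order described above (explicit argument first, then the environment or a built-in default) can be illustrated with a small stand-alone sketch. This is not the library's implementation: ```resolve_llm_settings``` is a hypothetical helper, and it only mirrors the defaults named in this lab.

```python
import os

# Defaults as described in this lab for the NVIDIA() connector.
DEFAULT_BASE_URL = "https://integrate.api.nvidia.com/v1"
DEFAULT_MODEL = "meta/llama3-8b-instruct"


def resolve_llm_settings(base_url=None, api_key=None, model=None, environ=os.environ):
    """Hypothetical helper: explicit arguments win, otherwise use the fallbacks."""
    return {
        "base_url": base_url or DEFAULT_BASE_URL,
        "api_key": api_key or environ.get("NVIDIA_API_KEY"),
        "model": model or DEFAULT_MODEL,
    }


# Stand-in environment so the sketch runs without a real key.
demo_env = {"NVIDIA_API_KEY": "nvapi-from-environment"}

# Nothing specified: every value comes from a fallback.
hosted = resolve_llm_settings(environ=demo_env)

# Everything specified: the fallbacks are ignored.
local = resolve_llm_settings(
    base_url="http://nim-host-address:8000/v1",
    api_key="nvapi-123456789abcdefg",
    model="mistralai/mistral-7b-instruct-v0.2",
    environ=demo_env,
)

print(hosted["base_url"], hosted["model"])
print(local["base_url"], local["model"])
```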


We can verify which model we are pointing to:

In [8]:
print("... Using: ", Settings.llm.model)

... Using:  meta/llama3-8b-instruct


## Instantiate the embedding model

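We use the "Settings" object again, this time for the embedding model. Instead of requesting a specific embedding model, we leave ```model``` unset so that the NVIDIA module selects its default one. In the cell below we pass ```base_url``` and ```api_key``` explicitly; if ```api_key``` were omitted, the module would attempt to find ```NVIDIA_API_KEY``` in the environment. The ```truncate="END"``` argument asks the service to truncate inputs that exceed the model's maximum length from the end instead of rejecting them.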

In [9]:
Settings.embed_model = NVIDIAEmbedding(base_url = embedurl, api_key = apikey, truncate="END")
print("... Using: ", Settings.embed_model.model)

... Using:  nvidia/nv-embedqa-e5-v5


## ChromaDB section

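We connect to a remote ChromaDB server over HTTP with ```chromadb.HttpClient```, taking the host and port from the ```DB_IP``` and ```DB_PORT``` environment variables (for example, host ```172.24.164.68``` and port ```8001```). We then get or create a collection, wrap it in a ```ChromaVectorStore``` and pass that to a ```StorageContext```.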

In [10]:
remote_db = chromadb.HttpClient(host=dbip, port=dbport)
chroma_collection = remote_db.get_or_create_collection("quickstart")
vector_store = ChromaVectorStore(chroma_collection=chroma_collection)  ### The vector store is linked to a specific collection
storage_context = StorageContext.from_defaults(vector_store=vector_store)



## Load the documents

Let's use ```SimpleDirectoryReader``` to load the documents in the "data" directory. The data directory could also be a mounted directory pointing to PowerScale.

```SimpleDirectoryReader``` can ingest Markdown, PDFs, Word documents, PowerPoint decks, images, audio and video from the specified directory.

In [11]:
documents = SimpleDirectoryReader("data").load_data()

## Create an Index

The Index is a key construct in the LlamaIndex framework. To build it we need "Documents", an "Embedding model" and a "Storage Context". The "Storage Context" is optional: when it is not specified, LlamaIndex uses a simple in-memory vector store that's great for quick experimentation.

Notice also that we are not specifying the ```embed_model``` parameter, because it is already defined in the ```Settings``` object.

In [12]:
index = VectorStoreIndex.from_documents(documents, storage_context=storage_context)
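
To build intuition for what a vector index does, here is a conceptual sketch in plain Python: each text chunk is embedded, the vectors are stored, and a query is answered by ranking chunks by cosine similarity. It is an illustration under loud assumptions, not how LlamaIndex, the NVIDIA embedding model or Chroma work internally: ```toy_embed``` is a hypothetical bag-of-words stand-in for a real embedding model, and ```ToyVectorIndex``` is a made-up class.

```python
import math
from collections import Counter


def toy_embed(text):
    """Hypothetical stand-in for an embedding model: bag-of-words counts."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ToyVectorIndex:
    """Stores (chunk, vector) pairs and retrieves the most similar chunks."""

    def __init__(self, chunks):
        self.entries = [(chunk, toy_embed(chunk)) for chunk in chunks]

    def retrieve(self, query, top_k=1):
        query_vector = toy_embed(query)
        ranked = sorted(
            self.entries,
            key=lambda entry: cosine(query_vector, entry[1]),
            reverse=True,
        )
        return [chunk for chunk, _ in ranked[:top_k]]


chunks = [
    "chroma stores embedding vectors in collections",
    "the reader loads files from the data directory",
]
toy_index = ToyVectorIndex(chunks)
print(toy_index.retrieve("where are embedding vectors stored"))
```

A real embedding model captures meaning and not just shared words, which is why the lab uses a NIM embedding model and a dedicated vector database instead.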

## Build the query engine

The final step is to use the "LLM" and the "Index" to build the "Query Engine". Thanks again to the ```Settings``` object, we don't need to specify ```llm=llm``` or ```embed_model=embed_model```.

In [13]:
query_engine = index.as_query_engine()
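
Conceptually, a RAG query engine retrieves the chunks most relevant to a question and places them in the prompt sent to the LLM. The sketch below shows only that prompt-assembly step. The function ```build_rag_prompt``` and its template are hypothetical and are not LlamaIndex's actual default prompt; the chunks are placeholders.

```python
def build_rag_prompt(question, retrieved_chunks):
    """Hypothetical template: put retrieved context ahead of the question."""
    context = "\n---\n".join(retrieved_chunks)
    return (
        "Context information is below.\n"
        f"{context}\n"
        "Using only this context, answer the question.\n"
        f"Question: {question}\n"
        "Answer:"
    )


# Placeholder chunks stand in for whatever the retriever returns.
example_prompt = build_rag_prompt(
    "What did the author do growing up?",
    ["(first retrieved chunk)", "(second retrieved chunk)"],
)
print(example_prompt)
```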

## Query the RAG solution

Everything is ready to query our RAG solution. We use the ```.query()``` method of the ```query_engine```.

In [14]:
response = query_engine.query("What did the author do growing up?")
print(response)

The author grew up poor, which caused them to make their startup, Viaweb, even more inexpensive than they realized.


## End of Lab 9