# LangChain RAG with Local Models

This is based on Pixegami's tutorial. ([original repo](https://github.com/pixegami/rag-tutorial-v2/))

## Download data and folder setup

On the **Docker host side**, run the following to set up the `jetson-containers`' `/data` directory.

```
cd jetson-containers
mkdir -p data/documents/L4T-README
cp /media/jetson/L4T-README/*.txt data/documents/L4T-README/
```

This in turn creates the mounted volume `/data/documents/L4T-README` inside the container.<br> 
Your directory structure should look like this:

```
└── ./data/documents/L4T-README
    ├── INDEX.txt
    ├── README-usb-dev-mode.txt
    ├── README-vnc.txt
    └── README-wifi.txt
```

You can check this by running the bash command in the following cell.

In [1]:
!ls -Rl /data/documents/L4T-README

/data/documents/L4T-README:
total 24
-rw-rw-r-- 1 1000 1000  1104 May  6 22:42 INDEX.txt
-rw-rw-r-- 1 1000 1000 11126 May  6 22:42 README-usb-dev-mode.txt
-rw-rw-r-- 1 1000 1000  3590 May  6 22:42 README-vnc.txt
-rw-rw-r-- 1 1000 1000  1940 May  6 22:42 README-wifi.txt


## Loading The Data

In [2]:
from langchain_community.document_loaders import DirectoryLoader

DATA_PATH = '/data/documents/L4T-README'

def load_documents():
    document_loader = DirectoryLoader(DATA_PATH, glob="*.txt")
    return document_loader.load()

In [3]:
documents = load_documents()
print(len(documents))
print(documents[0])

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.
[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data]     /root/nltk_data...
[nltk_data]   Unzipping taggers/averaged_perceptron_tagger.zip.


4
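To see roughly what a loaded document carries, here is a stdlib-only sketch that reads plain-text files into dictionaries with `page_content` and `metadata` keys, mirroring the two fields of a LangChain `Document`. The helper name `load_text_files` and the temporary demo files are hypothetical, and this is not how `DirectoryLoader` parses files internally; it is only meant to make the shape of the data concrete.

```python
import tempfile
from pathlib import Path


def load_text_files(data_path: str, pattern: str = "*.txt") -> list[dict]:
    """Read every file matching `pattern` directly under `data_path`."""
    docs = []
    for path in sorted(Path(data_path).glob(pattern)):
        docs.append({
            # Raw file text, analogous to Document.page_content
            "page_content": path.read_text(encoding="utf-8", errors="replace"),
            # Where the text came from, analogous to Document.metadata
            "metadata": {"source": str(path)},
        })
    return docs


# Demo on a throwaway directory so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "INDEX.txt").write_text("index", encoding="utf-8")
    (Path(tmp) / "README-vnc.txt").write_text("vnc notes", encoding="utf-8")
    (Path(tmp) / "ignored.md").write_text("not a txt file", encoding="utf-8")
    demo_docs = load_text_files(tmp)

print(len(demo_docs))
print([Path(d["metadata"]["source"]).name for d in demo_docs])
```

Only the two `.txt` files are picked up, which is the same filtering effect the `glob="*.txt"` argument has in the loader cell.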


## Split The Documents 

In [4]:
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain.schema.document import Document

def split_documents(documents: list[Document]):
    text_splitter = RecursiveCharacterTextSplitter(
        chunk_size=800,
        chunk_overlap=50,
        length_function=len,
        is_separator_regex=False,
    )
    return text_splitter.split_documents(documents)

In [5]:
chunks = split_documents(documents)
print(len(chunks))
print(chunks[0])

27
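The effect of `chunk_size` and `chunk_overlap` can be illustrated with a plain sliding window over characters. This is a simplified sketch: `RecursiveCharacterTextSplitter` additionally tries to break on separators such as paragraphs and line breaks, so its chunk boundaries will differ. The function name `chunk_text` and the sample string are hypothetical.

```python
def chunk_text(text: str, chunk_size: int = 800, chunk_overlap: int = 50) -> list[str]:
    """Cut `text` into windows of `chunk_size` characters that share `chunk_overlap` characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the window already reached the end of the text
    return chunks


sample = "".join(str(i % 10) for i in range(2000))  # 2000 characters of digits
demo_chunks = chunk_text(sample)
print([len(c) for c in demo_chunks])
# The tail of one chunk equals the head of the next one.
print(demo_chunks[0][-50:] == demo_chunks[1][:50])
```

The overlap means a sentence that straddles a chunk boundary still appears intact in at least one chunk, at the cost of storing some text twice.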


## Embedding Models

There are a couple of options for the embedding model.

### Running **Local** Embedding Model

#### Option 1: Open embedding model running locally

Besides the LLM, you can run the embedding model locally on Jetson as well.

Example embedding models available on `ollama` are listed on an [Ollama blog article](https://ollama.com/blog/embedding-models).

| Model | Parameter Size | Link |
| ----- | --------------:| ---- |
|`mxbai-embed-large`|334M|[link](https://ollama.com/library/mxbai-embed-large)|
|`snowflake-arctic-embed`|334M|[link](https://ollama.com/library/snowflake-arctic-embed)|
|`nomic-embed-text`|137M|[link](https://ollama.com/library/nomic-embed-text)|
|`all-minilm`|23M|[link](https://ollama.com/library/all-minilm)|

Here, we try `mxbai-embed-large`, which proved to generate good enough embeddings for our sample documents.

First, check whether you have already downloaded the embedding model.

In [6]:
!ollama list

NAME                    	ID          	SIZE  	MODIFIED       
mxbai-embed-large:latest	468836162de7	669 MB	45 minutes ago	
llama3:70b              	be39eb53a197	39 GB 	16 hours ago  	
llama3:latest           	a6990ed6be41	4.7 GB	16 hours ago  	
nomic-embed-text:latest 	0a109f422b47	274 MB	23 hours ago  	
llama2:latest           	78e26419b446	3.8 GB	10 days ago   	
mistral:latest          	61e88e884507	4.1 GB	2 weeks ago   	
llama2:70b              	e7f6c06ffef4	38 GB 	2 weeks ago   	
llama2:13b              	d475bf4c50bc	7.4 GB	2 weeks ago   	


If not, use the `ollama pull` command to pull the model.

In [None]:
!ollama pull mxbai-embed-large

In [7]:
from langchain_community.embeddings.ollama import OllamaEmbeddings

def get_embedding_model():
    embedding_model = OllamaEmbeddings(model="mxbai-embed-large")
    return embedding_model

### Using Cloud Embedding Model

Some people believe cloud-hosted embedding models perform more accurately.

#### Option 2: OpenAI Embedding model

[API Reference: `OpenAIEmbeddings`](https://api.python.langchain.com/en/latest/embeddings/langchain_openai.embeddings.base.OpenAIEmbeddings.html)

> **Remember to put YOUR own OpenAI API key in the following cell.** 

In [None]:
OPENAI_API_KEY = ""

In [None]:
from langchain_openai import OpenAIEmbeddings

def get_embedding_model():
    embedding_model = OpenAIEmbeddings(
        model = "text-embedding-3-large",
        openai_api_key = OPENAI_API_KEY
    )
    return embedding_model

#### Option 3: AWS Bedrock Embedding model

[API Reference: `langchain_community.embeddings.bedrock`](https://api.python.langchain.com/en/latest/llms/langchain_community.llms.bedrock.Bedrock.html)

> **Note: you need to set up an AWS profile separately.**

In [None]:
from langchain_community.embeddings.bedrock import BedrockEmbeddings

def get_embedding_model():
    embedding_model = BedrockEmbeddings(
        credentials_profile_name="default", region_name="us-east-1"
    )
    return embedding_model

## Creating The Vector Store

We are going to create the vector store with embeddings and save it in a directory as files.<br>
Here, the directory is defined to be "**chromadb**".

In [8]:
CHROMA_PATH = "chromadb"

Remove the directory if it has previously been created and populated.

> If you are re-running with a different embedding model, removing the persisted directory may not be enough.<br>The workaround is to restart the Python kernel.


In [9]:
%%bash -s "$CHROMA_PATH"
rm -rf $1
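If you prefer to stay in Python rather than use a `%%bash` cell, the same cleanup can be sketched with `shutil.rmtree`. The helper name `reset_directory` is hypothetical, and the demo runs against a temporary stand-in directory rather than the real `CHROMA_PATH`.

```python
import shutil
import tempfile
from pathlib import Path


def reset_directory(path: str) -> bool:
    """Delete `path` recursively if it exists; return True if something was removed."""
    target = Path(path)
    if target.is_dir():
        shutil.rmtree(target)
        return True
    return False


# Demo against a temporary stand-in for the "chromadb" directory.
with tempfile.TemporaryDirectory() as tmp:
    demo_store = Path(tmp) / "chromadb"
    demo_store.mkdir()
    (demo_store / "chroma.sqlite3").write_text("placeholder")
    first_call = reset_directory(str(demo_store))   # directory existed
    second_call = reset_directory(str(demo_store))  # already gone

print(first_call, second_call)
```

In the notebook this would be called as `reset_directory(CHROMA_PATH)`. As with `rm -rf`, the deletion is permanent, so double-check the path first.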

### Vector store to be created with embedding model specified

In [10]:
from langchain_community.vectorstores import Chroma

def add_to_chroma(chunks: list[Document]):
    vectorstore = Chroma(
        persist_directory=CHROMA_PATH, 
        embedding_function=get_embedding_model()
    )
    vectorstore.add_documents(chunks)
    vectorstore.persist()

In [11]:
add_to_chroma(chunks)

  warn_deprecated(


Let's check what files are saved and the size of each file.

In [12]:
%%bash -s "$CHROMA_PATH"
du -ah ./$1

4.0K	./chromadb/451ecaa3-56ba-4076-afb8-793a024dce9e/length.bin
4.0K	./chromadb/451ecaa3-56ba-4076-afb8-793a024dce9e/header.bin
4.1M	./chromadb/451ecaa3-56ba-4076-afb8-793a024dce9e/data_level0.bin
0	./chromadb/451ecaa3-56ba-4076-afb8-793a024dce9e/link_lists.bin
4.1M	./chromadb/451ecaa3-56ba-4076-afb8-793a024dce9e
404K	./chromadb/chroma.sqlite3
4.5M	./chromadb


## Running RAG Query Locally

The cell below defines the template for the prompt that is eventually sent to our LLM.

In [13]:
PROMPT_TEMPLATE = """
Answer the question based only on the following context:
{context}

---
Answer the question based on the above context: {question}
"""

The actual question is supplied below.

In [14]:
query_text="What IPv4 address Jetson device gets assigned when connected to a PC with a USB cable? \
    And what file to edit in order to change the IP address to be assigned to Jetson itself in USB device mode? \
    Please state which section you find the answer for each question."

### Load vector store from persisted files with embedding model specified

In [15]:
vectorstore = Chroma(
    persist_directory=CHROMA_PATH, 
    embedding_function=get_embedding_model()
)

### Search the vector store for retrieving relevant context

The top 5 most relevant chunks are retrieved.

In [16]:
results = vectorstore.similarity_search_with_score(query_text, k=5)
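Conceptually, a top-k similarity search ranks stored vectors by how close they are to the query vector. The sketch below does this by brute force on toy 3-dimensional vectors, assuming a distance-style score where a lower value means a closer match. The metric (squared Euclidean distance), the helper names, and the toy data are all assumptions for illustration; a real vector store uses embeddings with hundreds of dimensions and an approximate index rather than a full scan.

```python
def squared_l2(a: list[float], b: list[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def top_k(query_vec: list[float], indexed: list[tuple[str, list[float]]], k: int = 5):
    """Return the k (text, distance) pairs closest to `query_vec`, nearest first."""
    scored = [(text, squared_l2(query_vec, vec)) for text, vec in indexed]
    scored.sort(key=lambda pair: pair[1])  # lower distance = more similar
    return scored[:k]


# Toy "embeddings": each chunk is represented by a hand-made 3-d vector.
toy_index = [
    ("usb device mode", [0.9, 0.1, 0.0]),
    ("vnc setup", [0.1, 0.9, 0.0]),
    ("wifi setup", [0.0, 0.2, 0.9]),
]
toy_query = [1.0, 0.0, 0.0]

demo_results = top_k(toy_query, toy_index, k=2)
for text, score in demo_results:
    print(f"{score:.2f}  {text}")
```

The returned pairs have the same `(item, score)` shape that the notebook unpacks as `doc, _score` in the next cell.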

In [17]:
from langchain.prompts import ChatPromptTemplate

context_text = "\n\n---\n\n".join([doc.page_content for doc, _score in results])
prompt_template = ChatPromptTemplate.from_template(PROMPT_TEMPLATE)
prompt = prompt_template.format(context=context_text, question=query_text)

The final `prompt` is generated with the context filled in.

If you decide to print out the generated prompt, run the following cell.

In [18]:
print(prompt)

Human: 
Answer the question based only on the following context:
Linux for Tegra assigns a static IPv4 address of 192.168.55.1 to Jetson, and runs a DHCP server to automatically assign an IPv4 address of 192.168.55.100 to your host machine. This provides point-to-point connectivity. If a Jetson device experiences very high CPU or disk IO load, this DHCP server may fail to respond in a timely manner to requests from the host machine. This may cause IPv4 connections to drop. If this problem occurs, configure your host machine to use a static IPv4 address of 192.168.55.100 with netmask 255.255.255.0 and no gateway or DNS servers.

---

If you connect multiple Jetson devices to the same host machine, each Jetson device uses the same IPv4 address. This prevents IPv4-based communication with all but one Jetson device, since your host operating system determines which Jetson device it communicates with. To solve this, edit the Jetson-based script that sets up the network and assign a unique n
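The prompt assembly shown above can be reproduced with plain string formatting, which makes it easy to see exactly where the retrieved chunks and the question end up. This is a sketch, not a replacement for `ChatPromptTemplate`: it omits the `Human: ` prefix visible in the printed prompt, and the helper name `build_prompt` and the sample chunks are hypothetical.

```python
PROMPT_TEMPLATE = """
Answer the question based only on the following context:
{context}

---
Answer the question based on the above context: {question}
"""


def build_prompt(chunks: list[str], question: str) -> str:
    """Join retrieved chunk texts with a separator and fill the template."""
    context_text = "\n\n---\n\n".join(chunks)
    return PROMPT_TEMPLATE.format(context=context_text, question=question)


demo_prompt = build_prompt(
    ["First retrieved chunk.", "Second retrieved chunk."],
    "What does the first chunk say?",
)
print(demo_prompt)
```

Printing the prompt like this before calling the LLM is a quick way to confirm that retrieval returned the passages you expected.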

### Define LLM

Define the local LLM using `Ollama` to be invoked with the prompt.

In [19]:
from langchain_community.llms.ollama import Ollama

model = Ollama(model="llama3")

If you have not downloaded the `llama3` model and the above cell failed, run the following cell, then come back and execute the above cell again.

In [None]:
!ollama pull llama3

#### Alternative: Using OpenAI LLM

If you want to try an OpenAI LLM, you can run the following cell. 

In [None]:
from langchain_openai import OpenAI

model = OpenAI(
    model="gpt-3.5-turbo-instruct",
    openai_api_key = OPENAI_API_KEY
)

#### Running the LLM 

The following cell runs the Llama3 model on Ollama with the prompt.

> If you open a Terminal on the side (you can dock a pane on the side in JupyterLab) and run `jtop`, you can check the GPU and other resource usage.

In [20]:
response_text = model.invoke(prompt)

Let's check the LLM output.

In [21]:
from IPython.display import display, Markdown, Latex
display(Markdown(response_text))

Based on the provided context:

**What IPv4 address Jetson device gets assigned when connected to a PC with a USB cable?**

According to Section "Linux for Tegra", Linux for Tegra assigns a static IPv4 address of 192.168.55.1 to Jetson, and runs a DHCP server to automatically assign an IPv4 address of 192.168.55.100 to the host machine.

**What file to edit in order to change the IP address to be assigned to Jetson itself in USB device mode?**

According to Section "Changing the IPv4 Address", you need to edit `/opt/nvidia/l4t-usb-device-mode/nv-l4t-usb-device-mode-config.sh` on Jetson to change the IPv4 network parameters.
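The retrieve, prompt, and generate steps from this section can be bundled into one helper. In the sketch below the retriever and the LLM are passed in as plain callables so the code runs without Ollama or a vector store. The names `query_rag`, `fake_search`, and `fake_llm` are hypothetical, and the stand-ins return canned data.

```python
from typing import Callable

PROMPT_TEMPLATE = """
Answer the question based only on the following context:
{context}

---
Answer the question based on the above context: {question}
"""

SearchFn = Callable[[str, int], list[tuple[str, float]]]
LlmFn = Callable[[str], str]


def query_rag(question: str, search: SearchFn, llm: LlmFn, k: int = 5) -> str:
    """Retrieve k chunks, build the prompt, and return the LLM's answer."""
    results = search(question, k)
    context_text = "\n\n---\n\n".join(text for text, _score in results)
    prompt = PROMPT_TEMPLATE.format(context=context_text, question=question)
    return llm(prompt)


def fake_search(question: str, k: int) -> list[tuple[str, float]]:
    """Stand-in retriever returning canned (text, score) pairs."""
    corpus = [
        ("Chunk about USB device mode.", 0.1),
        ("Chunk about VNC.", 0.8),
    ]
    return corpus[:k]


seen_prompts: list[str] = []


def fake_llm(prompt: str) -> str:
    """Stand-in LLM that records the prompt it receives."""
    seen_prompts.append(prompt)
    return "stub answer"


demo_answer = query_rag("Which chunk mentions USB?", fake_search, fake_llm, k=1)
print(demo_answer)
```

With the notebook's objects, `search` could wrap `vectorstore.similarity_search_with_score` (mapping each `doc` to `doc.page_content`) and `llm` could be `model.invoke`.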