# RAG using OpenAI LLM

Source: [Vipra Singh](<https://medium.com/@vipra_singh/building-llm-applications-introduction-part-1-1c90294b155b#4d28>)

Libraries:
1. `sentence-transformers`: For embedding
2. `PyTorch`: For running the embedding model on CUDA (GPU)
3. `OpenAI`: For the LLM
4. `LangChain`: For chaining prompts and the LLM
5. `LlamaIndex`: For indexing
6. `FAISS`: For vector storage and retrieval

In [30]:
import os
from dotenv import load_dotenv

load_dotenv()

openai_api_key = os.getenv("OPENAI_API_KEY")
file_name = os.getenv("FILE_NAME")
file_url = os.getenv("FILE_URL")

## Loading the document



### Downloading the source document

In [31]:
import requests

# Download the file only if it is not already on disk
if not os.path.exists(file_name):
    response = requests.get(file_url, stream=True, timeout=30)
    response.raise_for_status()  # Fail on an HTTP error instead of saving an error page

    with open(file_name, mode='wb') as file:
        for chunk in response.iter_content(chunk_size=256):  # Bytes
            file.write(chunk)
    print("The file has been downloaded successfully.")
else:
    print("File already exists.")

File already exists.


### Loading the document into memory

In [37]:
# https://python.langchain.com/v0.2/docs/how_to/document_loader_directory/#auto-detect-file-encodings-with-textloader
# https://docs.kanaries.net/topics/LangChain/langchain-document-loader
from langchain_community.document_loaders import TextLoader

loader = TextLoader(file_path=file_name)
document = loader.load()
print(f"Loaded {len(document)} documents: ")
for file in document:
    print(f"file_name: {file.metadata['source']}")

Loaded 1 documents: 
file_name: document.txt


`loader.load()` returns a list of `Document` objects. The text of each one is available through its `page_content` attribute.

## Splitting and Chunking

You may want to split a long document into smaller chunks that can fit into your model's context window.

In [33]:
# https://medium.com/the-modern-scientist/building-generative-ai-applications-using-langchain-and-openai-apis-ee3212400630
# https://python.langchain.com/v0.2/docs/concepts/#text-splitters
# https://python.langchain.com/v0.2/docs/how_to/recursive_text_splitter/
from langchain_text_splitters import RecursiveCharacterTextSplitter
texts = document
print(f"Document has {len(texts)} chunks.")

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=256,
    chunk_overlap=32,
    length_function=len,
    is_separator_regex=False
)

texts = text_splitter.split_documents(document)
print(f"Document is split into {len(texts)} chunks.")

Document has 1 chunks.
Document is split into 14 chunks.
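
To make the roles of `chunk_size` and `chunk_overlap` concrete, here is a minimal sketch of fixed-size chunking with overlap in plain Python. It is a simplification and not what `RecursiveCharacterTextSplitter` does internally: the real splitter prefers to break on separators such as paragraph and line breaks, so its chunks are often shorter than `chunk_size`. The function name `chunk_text` and the sample string are hypothetical.

```python
def chunk_text(text: str, chunk_size: int = 256, chunk_overlap: int = 32) -> list[str]:
    """Cut text into character windows of chunk_size that overlap by chunk_overlap."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must satisfy 0 <= chunk_overlap < chunk_size")

    step = chunk_size - chunk_overlap  # How far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # This window already reached the end of the text
    return chunks


# Hypothetical 600-character sample, only for illustration
sample = "".join(str(i % 10) for i in range(600))
chunks = chunk_text(sample)

print(len(chunks))
print([len(c) for c in chunks])
print(chunks[0][-32:] == chunks[1][:32])  # Neighbouring chunks share the overlap
```

The overlap means the end of one chunk is repeated at the start of the next, so a sentence cut at a chunk boundary still appears whole in at least one chunk.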


## Embedding Models

Embedding models create a vector representation of a piece of text.
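
As a rough illustration of why a vector representation is useful, texts with similar meaning should end up with vectors that point in similar directions, which is commonly measured with cosine similarity. The sketch below uses hypothetical 3-dimensional vectors made up for this example; real `all-MiniLM-L6-v2` embeddings have 384 dimensions, and the helper `cosine_similarity` is not part of any library used in this notebook.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Hypothetical toy "embeddings", invented for illustration only
cat = [0.9, 0.1, 0.0]
kitten = [0.8, 0.2, 0.1]
invoice = [0.0, 0.1, 0.9]

print(f"cat vs kitten:  {cosine_similarity(cat, kitten):.3f}")
print(f"cat vs invoice: {cosine_similarity(cat, invoice):.3f}")
```

The related pair scores close to 1 and the unrelated pair close to 0. Retrieval relies on the same idea at scale: embed the query, then look for the stored chunks whose vectors are closest to it.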

### Loading embedding model

In [44]:
model_id = 'sentence-transformers/all-MiniLM-L6-v2'

In [61]:
# Extract the raw text from each chunk (a list of Document objects) for embedding
str_sentences = [text.page_content for text in texts]

### Embedding the chunks

There are two methods I've seen on the Internet:
1. Using `sentence-transformers` (SBERT) directly, without the LangChain integration.
2. Using `HuggingFaceEmbeddings` from the LangChain integration.

#### Using `sentence-transformers` directly

In [45]:
# Using SBERT Sentence-transformer
# https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2

from sentence_transformers import SentenceTransformer

model = SentenceTransformer(model_id)

In [74]:
%%time
embeddings = model.encode(str_sentences)
print(embeddings)

**Output**:
<small>

```python
[[ 0.00377308  0.00456841  0.04387417 ...  0.09883708  0.02295836
  -0.03164019]
 [ 0.00901005  0.08310206  0.02108724 ...  0.01976032  0.03389375
   0.00992528]
 [-0.00360388  0.02749235  0.16526043 ...  0.12410361 -0.01022669
  -0.01867055]
 ...
 [-0.00293806  0.02712424  0.07696037 ...  0.10188963  0.05918914
   0.01326817]
 [-0.01563196  0.10205315  0.04504438 ...  0.04045928 -0.05388908
  -0.0288553 ]
 [ 0.01877778 -0.00323905  0.02495503 ...  0.13343076  0.02986323
  -0.00972282]]
```

CPU times: total: 78.1 ms

Wall time: 69.8 ms
</small>

#### Using the `HuggingFaceEmbeddings` LangChain integration

In [75]:
%%time
# https://api.python.langchain.com/en/latest/embeddings/langchain_community.embeddings.huggingface.HuggingFaceEmbeddings.html
# https://python.langchain.com/v0.2/docs/how_to/embed_text/
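import torch
from langchain_huggingface import HuggingFaceEmbeddings

model_name = model_id
# Use the GPU when available; fall back to CPU so the cell still runs without CUDA
model_kwargs = {'device': 'cuda' if torch.cuda.is_available() else 'cpu'}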
encode_kwargs = {'normalize_embeddings': False}

hf = HuggingFaceEmbeddings(
    model_name=model_name,
    model_kwargs=model_kwargs,
    encode_kwargs=encode_kwargs
)

embeddings = hf.embed_documents(str_sentences)
print(embeddings)

[[0.0037730755284428596, 0.0045684087090194225, 0.04387417435646057, 0.05250142887234688, 0.004436813294887543, 0.004214747808873653, ... (output truncated)

#### Comparing the 2 embeddings

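In this run the `HuggingFaceEmbeddings` call was a few seconds slower, but the two approaches produce the same vectors: `HuggingFaceEmbeddings` wraps `sentence-transformers` internally, and the first values of both outputs agree (`0.00377308` vs `0.0037730755284428596`). The extra digits are a display difference, not extra precision. `model.encode` returns a NumPy array, which prints 8 digits by default, while `embed_documents` returns plain Python lists of floats, which print in full. The practical reason to prefer `HuggingFaceEmbeddings` is convenience: using LangChain-supported tools all the way to the end keeps the pipeline in one API.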
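
A quick way to check how the two printed outputs relate is to round the long values to the 8 decimals that NumPy shows by default and compare them with the short ones. The sketch below copies the first three numbers of the first vector from each output above; the variable names are hypothetical.

```python
# First three values of the first vector, copied from the two outputs above
numpy_printed = ["0.00377308", "0.00456841", "0.04387417"]  # sentence-transformers
langchain_values = [
    0.0037730755284428596,
    0.0045684087090194225,
    0.04387417435646057,
]  # HuggingFaceEmbeddings

matches = []
for short, full in zip(numpy_printed, langchain_values):
    rounded = f"{full:.8f}"  # Same number of decimals as the NumPy printout
    matches.append(rounded == short)
    print(f"{full!r} -> {rounded} (NumPy printed {short})")

all_match = all(matches)
print(f"All values agree after rounding: {all_match}")
```

For these three values the rounded numbers match exactly, which suggests the outputs differ in how they are displayed rather than in how precisely they were computed.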

### (Optional) Saving/Caching embeddings locally