# RAG using OpenAI LLM

Sources:
1. [Vipra Singh](<https://medium.com/@vipra_singh/building-llm-applications-introduction-part-1-1c90294b155b#4d28>)
2. [brightinventions](<https://brightinventions.pl/blog/build-llm-application-with-rag-langchain/>)
3. [LangChain RAG](<https://python.langchain.com/v0.2/docs/tutorials/rag/#built-in-chains>)
4. [SBERT.net](<https://sbert.net>)

* Only works on a single `.txt` file at a time for now.
* Uses a randomly generated story from **ChatGPT**; any plain-text story should work. Just point `file_name` (read from the `FILE_NAME` environment variable) at your new file.


Libraries:
1. `sentence-transformers`: For embeddings
2. `PyTorch`: For CUDA (GPU) support
3. `OpenAI`: For the LLM
4. `LangChain`: For chaining prompts and the LLM
5. `LlamaIndex`: For indexing
6. `FAISS`: For vector storage and retrieval

`LlamaIndex` isn't used in this notebook yet.

## Loading environment variables

```python
import os
import dotenv

# Locate the nearest .env file and load its variables into os.environ
dotenv_file = dotenv.find_dotenv()
dotenv.load_dotenv(dotenv_file)

openai_api_key = os.getenv("OPENAI_API_KEY")
file_name = os.getenv("FILE_NAME")
file_url = os.getenv("FILE_URL")
```
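`os.getenv` silently returns `None` when a variable is missing from `.env`, and the problem only shows up later as a less obvious error in a downstream cell. A small fail-fast helper can catch this up front. `require_env` is a hypothetical helper for illustration, not part of `python-dotenv`:

```python
import os


def require_env(name: str) -> str:
    """Return the value of environment variable `name`.

    Hypothetical helper: raises a clear error when the variable is unset or empty,
    instead of letting None propagate into later cells.
    """
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"Missing environment variable {name!r}; check your .env file.")
    return value


# Usage, after dotenv.load_dotenv(...) has run:
# openai_api_key = require_env("OPENAI_API_KEY")
# file_name = require_env("FILE_NAME")
# file_url = require_env("FILE_URL")
```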

## Loading the document



### Downloading the source document

```python
import requests

# Download the file only if it isn't already on disk
if not os.path.exists(file_name):
    response = requests.get(file_url, stream=True)
    response.raise_for_status()  # Fail loudly instead of saving an HTTP error page

    with open(file_name, mode='wb') as file:
        for chunk in response.iter_content(chunk_size=256):  # chunk_size is in bytes
            file.write(chunk)
    print("The file has been downloaded successfully.")
else:
    print("File already exists.")
```

```
File already exists.
```


### Loading the document into memory

```python
# https://python.langchain.com/v0.2/docs/how_to/document_loader_directory/#auto-detect-file-encodings-with-textloader
# https://docs.kanaries.net/topics/LangChain/langchain-document-loader
from langchain_community.document_loaders import TextLoader

loader = TextLoader(file_path=file_name)
document = loader.load()  # list[Document], one entry per loaded file
print(f"Loaded {len(document)} documents: ")
for doc in document:
    print(f"file_name: {doc.metadata['source']}")
```

```
Loaded 1 documents: 
file_name: document.txt
```


`loader.load()` returns a list of `Document` objects (here, a single one). Each exposes its text through `page_content` and its origin through `metadata` (e.g. `metadata['source']`).

## Splitting and Chunking

Splitting a long document into smaller chunks keeps each piece within the model's context window and lets retrieval return only the passages relevant to a question instead of the whole file.

```python
# https://medium.com/the-modern-scientist/building-generative-ai-applications-using-langchain-and-openai-apis-ee3212400630
# https://python.langchain.com/v0.2/docs/concepts/#text-splitters
# https://python.langchain.com/v0.2/docs/how_to/recursive_text_splitter/
from langchain_text_splitters import RecursiveCharacterTextSplitter

print(f"Document has {len(document)} chunk.")

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=256,        # Maximum chunk length as measured by length_function (characters here)
    chunk_overlap=32,      # Characters shared between neighbouring chunks to preserve context
    length_function=len,
    is_separator_regex=False
)

texts = text_splitter.split_documents(document)
print(f"Document is now split into {len(texts)} chunks.")
```

```
Document has 1 chunk.
Document is now split into 14 chunks.
```
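To make `chunk_size` and `chunk_overlap` concrete, here is a deliberately simplified character-window splitter. It is not LangChain's algorithm: `RecursiveCharacterTextSplitter` first tries to split on separators (by default blank lines, then newlines, then spaces) and only then merges pieces up to `chunk_size`, whereas this sketch cuts at fixed character offsets. `sliding_window_chunks` is a hypothetical name:

```python
def sliding_window_chunks(text: str, chunk_size: int = 256, chunk_overlap: int = 32) -> list[str]:
    """Split text into fixed-size character windows overlapping by chunk_overlap characters.

    Simplified illustration only: ignores paragraph/line/word boundaries.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window moves forward
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):  # this window already reaches the end
            break
    return chunks


sample = "abcdefghij" * 10  # 100 characters
chunks = sliding_window_chunks(sample, chunk_size=40, chunk_overlap=10)
print(len(chunks))                        # 3
print(chunks[0][-10:] == chunks[1][:10])  # True: neighbours share 10 characters
```

The overlap means a sentence cut at a chunk boundary still appears whole (or nearly so) in one of the two neighbouring chunks, at the cost of storing some text twice.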


## Embedding Models

Embedding models turn a piece of text into a fixed-length vector of numbers, so that texts with similar meanings end up close together in vector space.
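Closeness between two embeddings is commonly measured with cosine similarity. A pure-Python sketch, using made-up 3-dimensional vectors purely for illustration (real `all-MiniLM-L6-v2` embeddings have 384 dimensions):

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between a and b: 1.0 = same direction, 0.0 = orthogonal."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


# Toy "embeddings" (invented values)
garden = [0.9, 0.1, 0.0]
flowers = [0.8, 0.3, 0.1]
invoice = [0.0, 0.1, 0.9]
print(cosine_similarity(garden, flowers) > cosine_similarity(garden, invoice))  # True
```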

### Loading embedding model

```python
model_id = 'sentence-transformers/all-MiniLM-L6-v2'
```

```python
# Extract the plain text of each chunk: list[Document] -> list[str]
str_sentences = [text.page_content for text in texts]
```

### Embedding the chunks

There are two methods I've seen online:
1. Using `sentence-transformers` (SBERT) directly, without the LangChain integration
2. Using `HuggingFaceEmbeddings` from the LangChain integration

#### Using `sentence-transformers` directly

These cells are commented out; the output below is from an earlier run.

```python
# Using SBERT Sentence-Transformers
# https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2

# from sentence_transformers import SentenceTransformer

# model = SentenceTransformer(model_id)
```

```python
# %%time
# embeddings = model.encode(str_sentences)
# print(embeddings)
```

**Output**:
<small>

```
[[ 0.00377308  0.00456841  0.04387417 ...  0.09883708  0.02295836
  -0.03164019]
 [ 0.00901005  0.08310206  0.02108724 ...  0.01976032  0.03389375
   0.00992528]
 [-0.00360388  0.02749235  0.16526043 ...  0.12410361 -0.01022669
  -0.01867055]
 ...
 [-0.00293806  0.02712424  0.07696037 ...  0.10188963  0.05918914
   0.01326817]
 [-0.01563196  0.10205315  0.04504438 ...  0.04045928 -0.05388908
  -0.0288553 ]
 [ 0.01877778 -0.00323905  0.02495503 ...  0.13343076  0.02986323
  -0.00972282]]
CPU times: total: 78.1 ms
Wall time: 69.8 ms
```

</small>

#### Using the `HuggingFaceEmbeddings` LangChain integration

```python
# https://api.python.langchain.com/en/latest/embeddings/langchain_community.embeddings.huggingface.HuggingFaceEmbeddings.html
# https://python.langchain.com/v0.2/docs/how_to/embed_text/
import torch
from langchain_huggingface import HuggingFaceEmbeddings

model_name = model_id
# Use the GPU when available, otherwise fall back to the CPU
model_kwargs = {'device': 'cuda' if torch.cuda.is_available() else 'cpu'}
encode_kwargs = {'normalize_embeddings': False}  # Keep the raw (unnormalized) vectors

embedding_model = HuggingFaceEmbeddings(
    model_name=model_name,
    model_kwargs=model_kwargs,
    encode_kwargs=encode_kwargs
)
```
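The `normalize_embeddings` flag controls whether each vector is scaled to unit length. For unit-length vectors, the plain dot product equals cosine similarity, and squared Euclidean distance equals `2 - 2 * dot`, so ranking by smallest L2 distance gives the same order as ranking by largest cosine similarity. A small pure-Python illustration with toy vectors; `l2_normalize` and `dot` are hypothetical helpers:

```python
import math


def l2_normalize(v: list[float]) -> list[float]:
    """Scale v to unit length (the effect normalize_embeddings=True has on each embedding)."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def dot(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


a = l2_normalize([3.0, 4.0])  # [0.6, 0.8]
b = l2_normalize([4.0, 3.0])  # [0.8, 0.6]

# For unit vectors: squared Euclidean distance = 2 - 2 * dot product
squared_l2 = sum((x - y) ** 2 for x, y in zip(a, b))
print(round(dot(a, b), 6), round(squared_l2, 6))  # 0.96 0.08
```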

### (Optional) Saving/Caching Embeddings Locally

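```python
# https://python.langchain.com/v0.2/docs/how_to/caching_embeddings/
from langchain.storage import LocalFileStore
from langchain.embeddings import CacheBackedEmbeddings

store = LocalFileStore('./cache/')  # Embeddings are persisted as files under ./cache/

# Wrap the embedding model so texts already in the store are not re-embedded;
# the namespace keeps vectors from different models from colliding
cache_embedder = CacheBackedEmbeddings.from_bytes_store(
    embedding_model, store, namespace=model_id
)
```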

There's no need to call `cache_embedder.embed_documents()` yourself: the vector store (next section) calls it internally when it indexes the chunks.
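Conceptually, a cache-backed embedder checks a key-value store before calling the underlying model and only embeds texts it hasn't seen. The sketch below mimics that idea with a plain dict and a namespace-prefixed SHA-256 key. It is a simplified stand-in, not LangChain's implementation (its real key and serialization scheme may differ), and `SimpleCachedEmbedder` / `fake_embed` are hypothetical names. Also note that, by default, `CacheBackedEmbeddings` caches document embeddings only; `embed_query` goes straight to the model.

```python
import hashlib
from typing import Callable


class SimpleCachedEmbedder:
    """Hypothetical, simplified analogue of a cache-backed embedder."""

    def __init__(self, embed_fn: Callable[[list[str]], list[list[float]]], namespace: str):
        self.embed_fn = embed_fn    # the expensive model call
        self.namespace = namespace  # keeps vectors from different models apart
        self.store: dict[str, list[float]] = {}
        self.model_calls = 0        # how many texts actually reached the model

    def _key(self, text: str) -> str:
        return self.namespace + hashlib.sha256(text.encode("utf-8")).hexdigest()

    def embed_documents(self, texts: list[str]) -> list[list[float]]:
        keys = [self._key(t) for t in texts]
        missing = [t for t, k in zip(texts, keys) if k not in self.store]
        if missing:
            self.model_calls += len(missing)
            for text, vector in zip(missing, self.embed_fn(missing)):
                self.store[self._key(text)] = vector
        return [self.store[k] for k in keys]


def fake_embed(texts: list[str]) -> list[list[float]]:
    """Stand-in for a real model: a 2-d 'embedding' from text length and vowel count."""
    return [[float(len(t)), float(sum(c in "aeiou" for c in t))] for t in texts]


embedder = SimpleCachedEmbedder(fake_embed, namespace="demo-model")
embedder.embed_documents(["Lily's flowers", "Thomas's vegetables"])
embedder.embed_documents(["Lily's flowers", "bees and butterflies"])
print(embedder.model_calls)  # 3: "Lily's flowers" came from the cache the second time
```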

#### Comparing the 2 embeddings

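Both approaches run the same `all-MiniLM-L6-v2` model, so the vectors themselves should be essentially identical. The longer decimals printed by `HuggingFaceEmbeddings` are not extra precision: it returns plain Python lists of floats, while `sentence-transformers` returns a float32 NumPy array, and converting float32 values to Python's 64-bit floats only exposes more (meaningless) digits. In my runs `HuggingFaceEmbeddings` was a few seconds slower, but using LangChain-supported tools all the way to the end might be more convenient.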
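The extra digits are a float32-to-float64 display effect. Round-tripping a number through 32-bit storage with the standard-library `struct` module reproduces it; this is roughly what happens when a float32 embedding is turned into a plain Python list. `as_float32` is a hypothetical helper:

```python
import struct


def as_float32(x: float) -> float:
    """Round x to the nearest 32-bit float, then widen it back to a Python (64-bit) float."""
    return struct.unpack("f", struct.pack("f", x))[0]


value = as_float32(0.1)
print(value)                    # 0.10000000149011612
print(value == 0.1)             # False: 0.1 has no exact binary representation
print(abs(value - 0.1) < 1e-8)  # True: float32 keeps only about 7 significant decimal digits
```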

## Vector Stores

We're using `FAISS` here; `ChromaDB` is still brand new, so there isn't much coverage of it yet.

```python
from langchain_community.vectorstores import FAISS

# Embed every chunk (through the cache) and index the vectors in an in-memory FAISS index
db = FAISS.from_documents(texts, cache_embedder)
```

```python
# Similarity search: embed the query, then return the closest chunks
query = "What happened to Thomas"
docs = db.similarity_search(query)
print(docs[0].page_content)
```

```
A few days later, Thomas realized something unusual. The vegetables near the flower seedlings he had missed were growing better than the others. The flowers attracted bees and butterflies, which helped pollinate his vegetable plants. Thomas started to
```
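By default, LangChain's `FAISS.from_documents` builds an exact (flat) index using Euclidean (L2) distance, and `similarity_search` returns the `k=4` nearest chunks unless told otherwise. The brute-force sketch below shows what that search does conceptually, with toy 2-d vectors and a hypothetical `knn_search` helper; real FAISS performs the same comparisons far faster in optimized C++.

```python
import heapq
import math


def l2_distance(a: list[float], b: list[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_search(query: list[float], vectors: list[list[float]], k: int = 4) -> list[tuple[int, float]]:
    """Exact nearest-neighbour search: compare the query with every stored vector and
    return (index, distance) pairs for the k closest, nearest first."""
    distances = ((i, l2_distance(query, v)) for i, v in enumerate(vectors))
    return heapq.nsmallest(k, distances, key=lambda pair: pair[1])


# Toy vectors standing in for chunk embeddings
chunk_vectors = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0], [0.5, 0.0], [9.0, 9.0]]
print([i for i, _ in knn_search([0.0, 0.1], chunk_vectors, k=3)])  # [0, 3, 1]
```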


```python
# Vector search: embed the query yourself, then search with the raw vector
embedding_vector = cache_embedder.embed_query(query)
docs = db.similarity_search_by_vector(embedding_vector)
print(docs[0].page_content)
```

```
A few days later, Thomas realized something unusual. The vegetables near the flower seedlings he had missed were growing better than the others. The flowers attracted bees and butterflies, which helped pollinate his vegetable plants. Thomas started to
```


## Retrievers

A retriever takes a query and returns the relevant documents; `as_retriever()` wraps the vector store in that interface.

```python
retriever = db.as_retriever()
```

```python
# Retrieve the most relevant chunks for the query
docs = retriever.invoke(input=query)
print(docs[0].page_content)
```

```
A few days later, Thomas realized something unusual. The vegetables near the flower seedlings he had missed were growing better than the others. The flowers attracted bees and butterflies, which helped pollinate his vegetable plants. Thomas started to
```

```python
# Specifying parameters: return only the top 3 chunks
retriever = db.as_retriever(search_kwargs={'k': 3})
```

```python
docs = retriever.invoke(input=query)
for doc in docs:
    print(doc.page_content)
```

```
A few days later, Thomas realized something unusual. The vegetables near the flower seedlings he had missed were growing better than the others. The flowers attracted bees and butterflies, which helped pollinate his vegetable plants. Thomas started to
He decided to leave some of the flower seedlings to grow among his vegetables. Over time, both gardens thrived. The flowers attracted more pollinators, and the vegetables grew bigger and healthier. Lily and Thomas learned to appreciate each other's
Lily noticed what Thomas had done and felt sad. “My flowers just wanted to spread their beauty,” she thought. But she didn’t say anything to Thomas.
```


## Prompt and LLM

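### Setting up logging for debugging

```python
import logging

logging.basicConfig()
# This logger belongs to LangChain's RePhraseQueryRetriever; it only produces output
# when that retriever is used (this notebook currently uses a plain FAISS retriever)
logging.getLogger('langchain.retrievers.re_phraser').setLevel(logging.INFO)
```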

### Custom Prompt

```python
prompt_template = """You are an English teacher teaching elementary students context clues, reading comprehension, and critical thinking. You have the students read from the context text. Your task is to answer questions by one of the following: 1. Directly copying and pasting passages from the context, 2. Inferring an answer that might not be directly contained in the context, or 3. Using critical thinking.
The context is as follows: <context>{context}</context>
The question asked by a student is as follows:
"""
```

```python
# https://python.langchain.com/v0.2/docs/integrations/retrievers/re_phrase/
# https://api.python.langchain.com/en/latest/chains/langchain.chains.retrieval.create_retrieval_chain.html
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate

# Alternative: a ready-made RAG prompt from LangChain Hub
# from langchain import hub
# retrieval_qa_chat_prompt = hub.pull("langchain-ai/retrieval-qa-chat")

# The system message carries the instructions and {context}; the student's question fills {input}
retrieval_qa_chat_prompt = ChatPromptTemplate.from_messages(
    [
        ("system", prompt_template),
        ("human", "{input}")
    ]
)

# temperature=0 for (near-)deterministic answers; pass the API key explicitly
llm = ChatOpenAI(temperature=0, api_key=openai_api_key)
```

```python
# https://python.langchain.com/v0.2/docs/tutorials/rag/#built-in-chains
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain.chains import create_retrieval_chain

# "Stuff" all retrieved chunks into the prompt's {context}, then call the LLM
combined_docs_chain = create_stuff_documents_chain(
    llm, retrieval_qa_chat_prompt
)

# The retriever fetches chunks for the input, then passes them to the chain above
retrieval_chain = create_retrieval_chain(retriever, combined_docs_chain)
```
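`create_stuff_documents_chain` "stuffs" every retrieved chunk into the prompt's `{context}` variable; by default it formats each document by its `page_content` and joins them with a blank line (`\n\n`). The plain-string sketch below imitates that step. `stuff_documents` is a hypothetical helper and the template is a shortened stand-in for `prompt_template`:

```python
def stuff_documents(chunks: list[str], template: str, separator: str = "\n\n") -> str:
    """Join retrieved chunk texts and insert them into the template's {context} slot."""
    return template.format(context=separator.join(chunks))


template = "Answer using only this context: <context>{context}</context>"
retrieved = [
    "Lily noticed what Thomas had done and felt sad.",
    "He decided to leave some of the flower seedlings to grow among his vegetables.",
]
print(stuff_documents(retrieved, template))
```

Because everything goes into a single prompt, `k` retrieved chunks of up to `chunk_size` characters each must fit comfortably within the LLM's context window.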

```python
# Running
retrieval_chain.invoke({'input': "What is Thomas' problem with Lily?"})
```

```
{'input': "What is Thomas' problem with Lily?",
 'context': [Document(metadata={'source': 'document.txt'}, page_content='Lily noticed what Thomas had done and felt sad. “My flowers just wanted to spread their beauty,” she thought. But she didn’t say anything to Thomas.'),
  Document(metadata={'source': 'document.txt'}, page_content="He decided to leave some of the flower seedlings to grow among his vegetables. Over time, both gardens thrived. The flowers attracted more pollinators, and the vegetables grew bigger and healthier. Lily and Thomas learned to appreciate each other's"),
  Document(metadata={'source': 'document.txt'}, page_content='From then on, Lily and Thomas’s gardens became the talk of the village. People admired the beautiful mix of flowers and vegetables and enjoyed the produce they shared.')],
 'answer': "Thomas' problem with Lily is that he initially didn't appreciate the beauty of her flowers and decided to remove them from his garden. Lily noticed this and felt sad, 
```

## Running the RAG

```python
query = "What do you think happen immediately after the end of the story?"
response = retrieval_chain.invoke({'input': query})
print(response['answer'])
```

```
Based on the information provided in the context, it can be inferred that Thomas likely had a change of heart and decided to let the flowers grow alongside his vegetables to continue benefiting from the bees and butterflies that were helping pollinate his plants. This change in perspective may have led to a more harmonious garden where both flowers and vegetables thrived together.
```


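## Conclusion and Future Possibilities

Right now, our RAG can't make complex inferences or come up with anything new beyond the existing story: asked what happens after the story ends, it restates events already in the retrieved chunks (Thomas letting the flowers grow among his vegetables).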

Possible TODO:
1. History
2. Chatbot
3. Streamlit
4. Better embeddings and retrieval

## Citation

```bibtex
@inproceedings{thakur-2020-AugSBERT,
  title = "Augmented {SBERT}: Data Augmentation Method for Improving Bi-Encoders for Pairwise Sentence Scoring Tasks",
  author = "Thakur, Nandan and Reimers, Nils and Daxenberger, Johannes  and Gurevych, Iryna",
  booktitle = "Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies",
  month = jun,
  year = "2021",
  address = "Online",
  publisher = "Association for Computational Linguistics",
  url = "https://www.aclweb.org/anthology/2021.naacl-main.28",
  pages = "296--310",
}
```