# Retrieval Augmented Generation (RAG)
Retrieval Augmented Generation (RAG) combines information retrieval with language models. It first searches for relevant facts in external sources, then feeds those facts to the language model alongside the user's prompt. This helps the model generate more accurate and factual responses, even on topics beyond its initial training data.


---
## 1.&nbsp; Installations and Settings 🛠️

**faiss-cpu** is the CPU build of FAISS, a library for fast similarity search over vectors. We'll use it as the database that stores and retrieves the numerical representations (embeddings) of our text.

In [1]:
%%bash
pip install -qqq -U langchain-huggingface
pip install -qqq -U langchain
pip install -qqq -U langchain-community
pip install -qqq -U faiss-cpu



In [2]:
import os
from google.colab import userdata # we stored our access token as a colab secret

os.environ["HUGGINGFACEHUB_API_TOKEN"] = userdata.get('HF_TOKEN')
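
If you're working outside Colab, `google.colab` won't be available. A minimal sketch, assuming you have already exported the token in your shell under the same variable name; `require_token` is a hypothetical helper, not part of any library:

```python
import os

def require_token(env=os.environ, name="HUGGINGFACEHUB_API_TOKEN"):
    """Return the access token from the environment, or fail with a clear message."""
    token = env.get(name, "").strip()
    if not token:
        raise RuntimeError(f"Set the {name} environment variable before creating the LLM")
    return token

# Example with a made-up environment mapping (not a real token)
print(require_token({"HUGGINGFACEHUB_API_TOKEN": "hf_example"}))
```

Failing early like this gives a clearer error than waiting for the first request to the model to be rejected.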

---
## 2.&nbsp; Setting up your LLM 🧠

In [3]:
from langchain_huggingface import HuggingFaceEndpoint

# This info's at the top of each HuggingFace model page
hf_model = "mistralai/Mistral-7B-Instruct-v0.3"

llm = HuggingFaceEndpoint(repo_id = hf_model)

The token has not been saved to the git credentials helper. Pass `add_to_git_credential=True` in this function directly or `--add-to-git-credential` if using via `huggingface-cli` if you want to set the git credential as well.
Token is valid (permission: write).
Your token has been saved to /root/.cache/huggingface/token
Login successful


---
## 3.&nbsp; Retrieval Augmented Generation 🔃

### 3.1.&nbsp; Find your data
Our model needs some information to work its magic! In this case, we'll be using a copy of Alice's Adventures in Wonderland, but feel free to swap it out for anything you like: legal documents, school textbooks, websites – the possibilities are endless!

In [4]:
!wget -O /content/alice_in_wonderland.txt https://www.gutenberg.org/cache/epub/11/pg11.txt

--2024-06-11 12:16:55--  https://www.gutenberg.org/cache/epub/11/pg11.txt
Resolving www.gutenberg.org (www.gutenberg.org)... 152.19.134.47, 2610:28:3090:3000:0:bad:cafe:47
Connecting to www.gutenberg.org (www.gutenberg.org)|152.19.134.47|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 174355 (170K) [text/plain]
Saving to: ‘/content/alice_in_wonderland.txt’


2024-06-11 12:16:55 (1.46 MB/s) - ‘/content/alice_in_wonderland.txt’ saved [174355/174355]



> If you're working locally, just download a txt book from Project Gutenberg. Here's the link to [Alice's Adventures in Wonderland](https://www.gutenberg.org/cache/epub/11/pg11.txt). Feel free to use any other book though.

### 3.2.&nbsp; Load the data
Now that we have the data, we have to load it in a format LangChain can understand. For this, LangChain has [loaders](https://python.langchain.com/docs/modules/data_connection/document_loaders/). There are loaders for CSV, text, PDF, and a host of other formats, so you're not restricted to plain text here.



In [5]:
from langchain_community.document_loaders import TextLoader

loader = TextLoader("/content/alice_in_wonderland.txt")
documents = loader.load()

### 3.3.&nbsp; Splitting the document
Obviously, a whole book is a lot to digest. This is made easier by [splitting](https://python.langchain.com/docs/modules/data_connection/document_transformers/) the document into chunks. You can split it by paragraphs, sentences, or even individual words, depending on what you want to analyse. LangChain provides tools such as the `RecursiveCharacterTextSplitter` (say that five times fast!) that try to respect the structure of the text while breaking it into manageable chunks. Below, `chunk_size` sets the maximum number of characters per chunk and `chunk_overlap` sets how many characters neighbouring chunks share, so a sentence cut at a boundary still appears whole in one of them.

Check out [this website](https://chunkviz.up.railway.app/) to help visualise the splitting process.


In [6]:
from langchain.text_splitter import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(chunk_size=800,
                                               chunk_overlap=150)

docs = text_splitter.split_documents(documents)
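
To see what chunk size and overlap mean, here is a deliberately simplified sketch. It is a plain sliding window over characters, not the algorithm `RecursiveCharacterTextSplitter` uses (which tries to split on separators such as paragraph and line breaks first). `split_with_overlap` is a hypothetical helper written for this illustration:

```python
def split_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Cut text into windows of chunk_size characters, each sharing
    chunk_overlap characters with the previous window."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far the window moves each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):  # this window reached the end of the text
            break
    return chunks

sample = "abcdefghijklmnopqrstuvwxyz"
toy_chunks = split_with_overlap(sample, chunk_size=10, chunk_overlap=3)
for chunk in toy_chunks:
    print(chunk)
# abcdefghij
# hijklmnopq
# opqrstuvwx
# vwxyz
```

Each chunk starts with the last three characters of the one before it, which is the same idea as the 150-character overlap used above.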

### 3.4.&nbsp; Creating vectors with embeddings

[Embeddings](https://python.langchain.com/docs/integrations/text_embedding) are a fancy way of saying we turn text into numbers that computers can work with. Each piece of text (here, each chunk) is mapped to a list of numbers based on its meaning, so that texts with similar meanings end up with similar numbers. The list of numbers produced is known as a vector. Vectors allow us to compare text and find chunks that contain similar information.

Different embedding models encode words and meanings in different ways, and finding the right one can be tricky. We're using open-source models from HuggingFace, who even have a handy [leaderboard of embeddings](https://huggingface.co/spaces/mteb/leaderboard) on their website. Just browse the options and see which one speaks your language (literally!).
> As we are doing a retrieval project, click on the `Retrieval` tab of the leaderboard to see the best embeddings for retrieval tasks.

In [7]:
from langchain_huggingface import HuggingFaceEmbeddings

# embeddings
embedding_model = "sentence-transformers/all-MiniLM-l6-v2"
embeddings_folder = "/content/"

embeddings = HuggingFaceEmbeddings(model_name=embedding_model,
                                   cache_folder=embeddings_folder)


👆 The embedding model downloads the first time it is used and is then cached in `cache_folder`. Colab storage is cleared between sessions, so the model will download again every time you start a new session there. If you're working locally, it stays on your machine once downloaded and won't be fetched again unless you delete it.

To see how the embedding model turns a sentence into a vector, let's look at an example:

In [8]:
test_text = "Why do data scientists make great comedians? They're always trying to make ANOVA pun"
query_result = embeddings.embed_query(test_text)
query_result

[0.009409730322659016,
 -0.023806262761354446,
 -0.012127556838095188,
 0.036123793572187424,
 -0.03382446989417076,
 -0.07974203675985336,
 0.07004599273204803,
 0.0746554508805275,
 0.040141817182302475,
 0.044190675020217896,
 -0.007506002672016621,
 -0.06001221016049385,
 -0.1002824455499649,
 0.03230968862771988,
 -0.039546359330415726,
 0.01690666750073433,
 -0.030313946306705475,
 -0.12780098617076874,
 -0.03218214958906174,
 -0.07546593993902206,
 7.65406948630698e-05,
 0.05085943266749382,
 0.12591633200645447,
 -0.04004546254873276,
 0.040401361882686615,
 -0.022957658395171165,
 -0.07265666872262955,
 -0.02543492801487446,
 -0.01982489600777626,
 0.011819686740636826,
 -0.03572341427206993,
 0.03657732158899307,
 0.07559993863105774,
 0.034250661730766296,
 -0.05330544710159302,
 -0.03082628920674324,
 0.02147844061255455,
 0.12243160605430603,
 -0.005445732735097408,
 0.048340149223804474,
 -0.004316544625908136,
 -0.043691329658031464,
 0.009050620719790459,
 0.02711021341

In [9]:
characters = len(test_text)
dimensions = len(query_result)
print(f"The {characters} character sentence was transformed into a {dimensions} dimension vector")

The 84 character sentence was transformed into a 384 dimension vector


Embedding vectors have a fixed length: every vector produced by this model has 384 dimensions, whether the input is a single word or a whole chunk. Choosing an embedding size involves a trade-off between accuracy and computational cost. Larger embeddings can capture more semantic information but require more memory and processing power. Start with the MiniLM model used here as a baseline and experiment with other models to find the right balance for your needs.
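
To put rough numbers on the memory side of that trade-off, here is a back-of-the-envelope sketch. It assumes each value is stored as a 32-bit float (4 bytes) and counts only the raw vectors, ignoring any index overhead. The 100,000 chunks and the 768 and 1024 dimension sizes are made-up illustration values, and `index_size_mb` is a hypothetical helper:

```python
def index_size_mb(num_chunks: int, dimensions: int, bytes_per_value: int = 4) -> float:
    """Approximate size of the raw vectors in megabytes (1 MB = 1,000,000 bytes)."""
    return num_chunks * dimensions * bytes_per_value / 1_000_000

for dims in (384, 768, 1024):
    print(f"{dims} dimensions: {index_size_mb(100_000, dims):.1f} MB")
# 384 dimensions: 153.6 MB
# 768 dimensions: 307.2 MB
# 1024 dimensions: 409.6 MB
```

Storage grows linearly with the number of dimensions, so doubling the embedding size doubles the memory needed for the same number of chunks.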

### 3.5.&nbsp; Creating a vector database
Imagine a library where books aren't just filed alphabetically, but also by their themes, characters, and emotions. That's the magic of vector databases: they unlock information beyond keywords, connecting ideas in unexpected ways.

In [10]:
from langchain_community.vectorstores import FAISS

vector_db = FAISS.from_documents(docs, embeddings)
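
Conceptually, a vector database stores one vector per chunk and returns the chunks whose vectors are closest to the query's vector. The sketch below shows that idea with a brute-force search in plain Python. The three-dimensional vectors are invented for illustration (real ones come from the embedding model), and cosine similarity is an assumption here: FAISS uses optimised index structures and may use a different distance measure.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: near 1.0 means similar direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Made-up vectors standing in for real embeddings
toy_index = {
    "The Hatter pours tea": [0.9, 0.1, 0.0],
    "The Queen shouts orders": [0.1, 0.9, 0.1],
    "Alice falls down the rabbit hole": [0.0, 0.2, 0.9],
}

def search(query_vector, index, k=2):
    """Score every stored vector against the query and return the k best texts."""
    scored = [(cosine_similarity(query_vector, vec), text) for text, vec in index.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [text for _, text in scored[:k]]

toy_query = [0.8, 0.2, 0.1]  # pretend this is the embedding of a question about tea
top_matches = search(toy_query, toy_index, k=2)
print(top_matches)
# ['The Hatter pours tea', 'The Queen shouts orders']
```

Comparing the query against every stored vector is fine for three chunks. Libraries such as FAISS exist to make this kind of search fast when there are many more.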

Once the database is made, you can save it to use over and over again in the future.

In [11]:
vector_db.save_local("/content/faiss_index")

Here's the code to load it again.

> We'll leave it commented out here as we don't need it right now - it's already stored above in the variable `vector_db`.

In [12]:
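# Recent langchain-community versions require allow_dangerous_deserialization=True
# because the saved index uses pickle. Only load index files you created yourself.
# new_db = FAISS.load_local("/content/faiss_index", embeddings, allow_dangerous_deserialization=True)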

You can also search your database to see which vectors are close to your input.

In [13]:
vector_db.similarity_search("What does the Mad Hatter drink?")

[Document(page_content='CHAPTER VII.\nA Mad Tea-Party\n\n\nThere was a table set out under a tree in front of the house, and the\nMarch Hare and the Hatter were having tea at it: a Dormouse was sitting\nbetween them, fast asleep, and the other two were using it as a\ncushion, resting their elbows on it, and talking over its head. “Very\nuncomfortable for the Dormouse,” thought Alice; “only, as it’s asleep,\nI suppose it doesn’t mind.”\n\nThe table was a large one, but the three were all crowded together at\none corner of it: “No room! No room!” they cried out when they saw\nAlice coming. “There’s _plenty_ of room!” said Alice indignantly, and\nshe sat down in a large arm-chair at one end of the table.\n\n“Have some wine,” the March Hare said in an encouraging tone.', metadata={'source': '/content/alice_in_wonderland.txt'}),
 Document(page_content='“I’d rather finish my tea,” said the Hatter, with an anxious look at\nthe Queen, who was reading the list of singers.\n\n“You may go,” said 

### 3.6.&nbsp; Adding a prompt
We can guide our model's behavior with a prompt, similar to how we gave instructions to the chatbot.
> Google have a good page about [prompting best practices](https://ai.google.dev/docs/prompt_best_practices).

In [14]:
from langchain.prompts.prompt import PromptTemplate

input_template = """Answer the question based only on the following context. Keep your answers short and succinct.

Context to answer question:
{context}

Question to be answered: {question}
Response:"""


prompt = PromptTemplate(template=input_template,
                        input_variables=["context", "question"])
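
To see what the model will eventually receive, you can fill the placeholders by hand. This sketch uses Python's built-in `str.format`, which for simple `{name}` placeholders like these behaves roughly like the template's default formatting. The context sentence is a shortened, made-up stand-in for a retrieved chunk:

```python
toy_template = """Answer the question based only on the following context. Keep your answers short and succinct.

Context to answer question:
{context}

Question to be answered: {question}
Response:"""

# Replace both placeholders with concrete text
filled_prompt = toy_template.format(
    context='"Off with their heads!" said the Queen.',
    question="Who likes to chop off heads?",
)
print(filled_prompt)
```

The prompt ends with `Response:` so that the model's continuation is the answer itself.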

### 3.7.&nbsp; RAG - chaining it all together
This is the final piece of the puzzle: we now bring everything together in a chain. Our vector database, our prompt, and our LLM combine to give us retrieval augmented generation.

In [15]:
from langchain.chains import RetrievalQA

qa_chain = RetrievalQA.from_chain_type(
    llm = llm,
    retriever = vector_db.as_retriever(search_kwargs={"k": 2}), # top 2 results only, speed things up
    return_source_documents = True,
    chain_type_kwargs = {"prompt": prompt},
)
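
Roughly speaking, the chain retrieves the top `k` chunks, places them all into the prompt's `{context}` slot (the "stuff" approach), and sends the filled prompt to the LLM. The sketch below mimics those steps in plain Python under loud assumptions: `toy_retriever` ranks by shared words instead of vector similarity, `toy_llm` is a placeholder that does not call any model, and all names are hypothetical.

```python
def words(text):
    """Lower-case words with surrounding punctuation stripped."""
    return {w.strip(".,!?") for w in text.lower().split()}

def toy_retriever(question, chunks, k=2):
    """Stand-in for vector search: rank chunks by words shared with the question."""
    ranked = sorted(chunks, key=lambda c: len(words(question) & words(c)), reverse=True)
    return ranked[:k]

def toy_llm(prompt_text):
    """Placeholder for the real model call; it only reports the prompt size."""
    return f"(model reply to a {len(prompt_text)}-character prompt)"

def toy_rag(question, chunks, template, k=2):
    source_chunks = toy_retriever(question, chunks, k)   # 1. retrieve
    context = "\n\n".join(source_chunks)                 # 2. stuff chunks into one context
    prompt_text = template.format(context=context, question=question)
    return {"query": question,                           # 3. generate and package the output
            "result": toy_llm(prompt_text),
            "source_documents": source_chunks}

toy_chunks = [
    "Alice drank from the bottle",
    "The Queen said off with their heads",
    "The Hatter asked for more tea",
]
toy_answer = toy_rag("Who likes to chop off heads?", toy_chunks,
                     "Context:\n{context}\n\nQuestion: {question}\nResponse:")
print(toy_answer["source_documents"][0])
# The Queen said off with their heads
```

The returned dictionary deliberately mirrors the `query`, `result`, and `source_documents` keys of the real chain's output.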

In [16]:
answer = qa_chain.invoke("Who likes to chop off heads?")

answer

{'query': 'Who likes to chop off heads?',
 'result': ' The Queen likes to chop off heads.',
 'source_documents': [Document(page_content='“Leave off that!” screamed the Queen. “You make me giddy.” And then,\nturning to the rose-tree, she went on, “What _have_ you been doing\nhere?”\n\n“May it please your Majesty,” said Two, in a very humble tone, going\ndown on one knee as he spoke, “we were trying—”\n\n“_I_ see!” said the Queen, who had meanwhile been examining the roses.\n“Off with their heads!” and the procession moved on, three of the\nsoldiers remaining behind to execute the unfortunate gardeners, who ran\nto Alice for protection.\n\n“You shan’t be beheaded!” said Alice, and she put them into a large\nflower-pot that stood near. The three soldiers wandered about for a\nminute or two, looking for them, and then quietly marched off after the\nothers.\n\n“Are their heads off?” shouted the Queen.', metadata={'source': '/content/alice_in_wonderland.txt'}),
  Document(page_content='“In m

#### 3.7.1.&nbsp; Exploring the returned dictionary

In [17]:
answer.keys()

dict_keys(['query', 'result', 'source_documents'])

##### `query`

The question that we asked.

In [18]:
answer['query']

'Who likes to chop off heads?'

##### `result`

The response.

In [19]:
answer['result']

' The Queen likes to chop off heads.'

In [20]:
print(answer['result'])

 The Queen likes to chop off heads.


##### `source_documents`

What information was used to form the response.

In [21]:
answer['source_documents']

[Document(page_content='“Leave off that!” screamed the Queen. “You make me giddy.” And then,\nturning to the rose-tree, she went on, “What _have_ you been doing\nhere?”\n\n“May it please your Majesty,” said Two, in a very humble tone, going\ndown on one knee as he spoke, “we were trying—”\n\n“_I_ see!” said the Queen, who had meanwhile been examining the roses.\n“Off with their heads!” and the procession moved on, three of the\nsoldiers remaining behind to execute the unfortunate gardeners, who ran\nto Alice for protection.\n\n“You shan’t be beheaded!” said Alice, and she put them into a large\nflower-pot that stood near. The three soldiers wandered about for a\nminute or two, looking for them, and then quietly marched off after the\nothers.\n\n“Are their heads off?” shouted the Queen.', metadata={'source': '/content/alice_in_wonderland.txt'}),
 Document(page_content='“In my youth,” said the sage, as he shook his grey locks,\n    “I kept all my limbs very supple\nBy the use of this oin

In [22]:
answer['source_documents'][0]

Document(page_content='“Leave off that!” screamed the Queen. “You make me giddy.” And then,\nturning to the rose-tree, she went on, “What _have_ you been doing\nhere?”\n\n“May it please your Majesty,” said Two, in a very humble tone, going\ndown on one knee as he spoke, “we were trying—”\n\n“_I_ see!” said the Queen, who had meanwhile been examining the roses.\n“Off with their heads!” and the procession moved on, three of the\nsoldiers remaining behind to execute the unfortunate gardeners, who ran\nto Alice for protection.\n\n“You shan’t be beheaded!” said Alice, and she put them into a large\nflower-pot that stood near. The three soldiers wandered about for a\nminute or two, looking for them, and then quietly marched off after the\nothers.\n\n“Are their heads off?” shouted the Queen.', metadata={'source': '/content/alice_in_wonderland.txt'})

In [23]:
answer['source_documents'][0].page_content

'“Leave off that!” screamed the Queen. “You make me giddy.” And then,\nturning to the rose-tree, she went on, “What _have_ you been doing\nhere?”\n\n“May it please your Majesty,” said Two, in a very humble tone, going\ndown on one knee as he spoke, “we were trying—”\n\n“_I_ see!” said the Queen, who had meanwhile been examining the roses.\n“Off with their heads!” and the procession moved on, three of the\nsoldiers remaining behind to execute the unfortunate gardeners, who ran\nto Alice for protection.\n\n“You shan’t be beheaded!” said Alice, and she put them into a large\nflower-pot that stood near. The three soldiers wandered about for a\nminute or two, looking for them, and then quietly marched off after the\nothers.\n\n“Are their heads off?” shouted the Queen.'

In [24]:
print(answer['source_documents'][0].page_content)

“Leave off that!” screamed the Queen. “You make me giddy.” And then,
turning to the rose-tree, she went on, “What _have_ you been doing
here?”

“May it please your Majesty,” said Two, in a very humble tone, going
down on one knee as he spoke, “we were trying—”

“_I_ see!” said the Queen, who had meanwhile been examining the roses.
“Off with their heads!” and the procession moved on, three of the
soldiers remaining behind to execute the unfortunate gardeners, who ran
to Alice for protection.

“You shan’t be beheaded!” said Alice, and she put them into a large
flower-pot that stood near. The three soldiers wandered about for a
minute or two, looking for them, and then quietly marched off after the
others.

“Are their heads off?” shouted the Queen.


The document's name is also returned, which is useful if you have multiple documents!

In [25]:
answer['source_documents'][0].metadata

{'source': '/content/alice_in_wonderland.txt'}

In [26]:
answer['source_documents'][0].metadata["source"]

'/content/alice_in_wonderland.txt'
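
With several files in the database, you might want a list of which ones contributed to an answer. A minimal sketch, assuming each returned document exposes `page_content` and `metadata` as shown above. `ToyDocument` is a stand-in class so the example runs on its own, and the second file name is hypothetical:

```python
from dataclasses import dataclass, field

@dataclass
class ToyDocument:
    """Minimal stand-in with the two attributes used in this notebook."""
    page_content: str
    metadata: dict = field(default_factory=dict)

toy_sources = [
    ToyDocument("Off with their heads!", {"source": "/content/alice_in_wonderland.txt"}),
    ToyDocument("In my youth, said the sage", {"source": "/content/alice_in_wonderland.txt"}),
    ToyDocument("A made-up second chunk", {"source": "/content/looking_glass.txt"}),  # hypothetical file
]

# A set removes duplicates; sorting gives a stable order for display
unique_sources = sorted({doc.metadata["source"] for doc in toy_sources})
print(unique_sources)
# ['/content/alice_in_wonderland.txt', '/content/looking_glass.txt']
```

The same comprehension should work on `answer['source_documents']`, since those objects carry the same two attributes.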

---
## Challenge 😀
Build a RAG chain that you can use to question the PDF of the Python edition of `An Introduction to Statistical Learning`. It's an excellent book covering the maths behind many machine learning methods, and well worth keeping a copy of on your hard drive.
* https://www.statlearning.com/
* https://python.langchain.com/docs/modules/data_connection/document_loaders/pdf