<a href="https://colab.research.google.com/github/MateoVB/AIT-Deep-Learning/blob/main/MateoVelarde_AIT_RAG_Assessment.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# RAG + LLM Assessment

Retrieval-Augmented Generation (RAG) is a method that combines the capabilities of large language models with external or proprietary data sources. It involves extracting relevant information from a large corpus and then generating context-appropriate responses to queries.
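The retrieve-then-generate idea can be sketched in a few lines of plain Python. This is a toy illustration, not the pipeline built in this notebook. It assumes a simple word-overlap score in place of embedding similarity and uses a hypothetical three-sentence corpus. It stops once it has built the augmented prompt that would be sent to an LLM.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lower-cased word tokens (a stand-in for a real tokenizer)
    return re.findall(r"[a-z']+", text.lower())


def overlap_score(query: str, doc: str) -> int:
    # Toy relevance score: number of word occurrences shared by query and document
    shared = Counter(tokenize(query)) & Counter(tokenize(doc))
    return sum(shared.values())


def retrieve(query: str, corpus: list[str], k: int = 2) -> list[str]:
    # Rank documents by score and keep the top k that share at least one word
    ranked = sorted(corpus, key=lambda doc: overlap_score(query, doc), reverse=True)
    return [doc for doc in ranked[:k] if overlap_score(query, doc) > 0]


def build_prompt(query: str, passages: list[str]) -> str:
    # Augment the question with the retrieved passages; this string would go to the LLM
    context = "\n\n".join(passages)
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"


corpus = [
    "Cats sleep for most of the day.",
    "The stock market closed higher today.",
    "Dogs enjoy long walks in the park.",
]
question = "How long do cats sleep?"
print(build_prompt(question, retrieve(question, corpus)))
```

To get the structure used in Steps 5 and 6, replace the word-overlap score with vector similarity over embeddings and send the returned prompt to an LLM.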

Your task is to create a Retrieval-Augmented Generation (RAG) system using a Large Language Model (LLM). The RAG system should be able to retrieve relevant information from a knowledge base and generate coherent and informative responses to user queries.

Steps:

1. Choose a domain and collect a suitable dataset of documents (at least 5 documents - PDFs or HTML pages) to serve as the knowledge base for your RAG system. Select one of the following topics:
   * latest scientific papers from arxiv.org,
   * fiction books released,
   * legal documents or,
   * social media posts.

   Make sure that the documents are newer than the training data of the applied LLM. (20 points)

2. Create three relevant prompts to the dataset, and one irrelevant prompt. (20 points)

3. Load an LLM with at least 5B parameters. (10 points)

4. Test the LLM with your prompts. The goal should be that without the collected dataset your model is unable to answer the question. If it gives you a good answer, select another question to answer and maybe a different dataset. (10 points)

5. Create a LangChain-based RAG system by setting up a vector database from the documents. (20 points)

6. Provide your three relevant and one irrelevant prompts to your RAG system. For the relevant prompts, your RAG system should return relevant answers, and for the irrelevant prompt, an empty answer. (20 points)


# Installing Dependencies

Here we install the project's dependencies on Google Colab's T4 GPU runtime. We then restart the kernel so that the newly installed packages take effect.

In [1]:
# Dependencies (version specifiers are quoted so the shell does not treat ">" as an output redirect;
# -q keeps the install logs short instead of writing them to a file named "null")
!pip install -q "transformers>=4.32.0" "optimum>=1.12.0"
!pip install -q auto-gptq --extra-index-url https://huggingface.github.io/autogptq-index/whl/cu118/
!pip install -q langchain
!pip install -q chromadb
!pip install -q sentence_transformers # ==2.2.2
!pip install -q unstructured
!pip install -q pdf2image
!pip install -q pdfminer.six
!pip install -q unstructured-pytesseract
!pip install -q unstructured-inference
!pip install -q faiss-gpu
!pip install -q pikepdf
!pip install -q pypdf
!pip install -q accelerate
!pip install -q pillow_heif
!pip install -q -i https://pypi.org/simple/ bitsandbytes

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
spacy 3.7.4 requires typer<0.10.0,>=0.3.0, but you have typer 0.12.3 which is incompatible.
weasel 0.3.4 requires typer<0.10.0,>=0.3.0, but you have typer 0.12.3 which is incompatible.
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
imageio 2.31.6 requires pillow<10.1.0,>=8.3.2, but you have pillow 10.3.0 which is incompatible.

In [None]:
# Restart the kernel so the newly installed packages are loaded
import os
os.kill(os.getpid(), 9)

# Imports

Here we import the necessary Python libraries. We also log in to Hugging Face with our private user access token, which is required to download gated models.

In [1]:
# Import the necessary Python libraries
import locale
import os
from getpass import getpass
from textwrap import fill

from huggingface_hub import login
from langchain.llms import HuggingFacePipeline
from transformers import AutoModelForCausalLM, AutoTokenizer, GenerationConfig, pipeline, BitsAndBytesConfig
from langchain.prompts import PromptTemplate
from langchain.vectorstores.utils import filter_complex_metadata # 'filter_complex_metadata' removes complex metadata that are not in str, int, float or bool format
from langchain.vectorstores import FAISS
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.chains import RetrievalQA

# Force UTF-8 as the preferred encoding (a common workaround for "A UTF-8 locale is required" errors in Colab)
locale.getpreferredencoding = lambda: "UTF-8"

# User access token from Hugging Face, needed for models with an accepted licence (such as Llama 3).
# Read it from the environment, or prompt for it, instead of hard-coding a secret in the notebook.
HUGGINGFACE_UAT = os.environ.get("HF_TOKEN") or getpass("Hugging Face access token: ")
login(HUGGINGFACE_UAT)

The token has not been saved to the git credentials helper. Pass `add_to_git_credential=True` in this function directly or `--add-to-git-credential` if using via `huggingface-cli` if you want to set the git credential as well.
Token is valid (permission: read).
Your token has been saved to /root/.cache/huggingface/token
Login successful


# Step 1

To build the knowledge base for our RAG system, we need to choose a domain and a dataset of documents. Our domain is "fiction books released", and our dataset consists of 5 books saved as PDFs. The LLM we load in Step 3 has a training-data cutoff of March 2023, and these books are newer than that, so the model cannot have seen them during training.

In [2]:
# Define the document paths
document_paths = ["books/book1.pdf", "books/book2.pdf", "books/book3.pdf", "books/book4.pdf", "books/book5.pdf"]

# Step 2

Now we will define three relevant prompts for the dataset of books and one irrelevant prompt.

In [3]:
# Relevant Prompts
prompt1 = "What is Rick Riordan's book The Chalice of the Gods about?"
prompt2 = "Who is seeking help in Adelle Waldman's book Help Wanted?"
prompt3 = "What is the storm referring to in Vanessa Chan's book The Storm We Made?"

# Irrelevant Prompt
irrelevant_prompt = "What are the latest stock market trends?"

# Step 3

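Now we load the LLM: Meta's Llama 3 8B Instruct (`meta-llama/Meta-Llama-3-8B-Instruct`). It has 8B parameters, which meets the 5B minimum. So that the model fits in the memory of the T4 GPU, we load it in 8-bit precision with bitsandbytes.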
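A rough calculation shows why 8-bit loading is needed on a T4. The sketch below counts weight memory only; activations and the KV cache need additional room. It assumes a rounded figure of 8.0 billion parameters for Llama 3 8B and the T4's 16 GB of memory.

```python
def weight_memory_gb(n_params: float, bytes_per_param: float) -> float:
    """Memory needed for the model weights alone, in GB (1 GB = 1e9 bytes)."""
    return n_params * bytes_per_param / 1e9


N_PARAMS = 8.0e9   # assumption: Llama 3 8B rounded to 8.0 billion parameters
T4_MEMORY_GB = 16  # memory of an NVIDIA T4

for dtype, nbytes in [("fp32", 4), ("fp16", 2), ("int8", 1)]:
    gb = weight_memory_gb(N_PARAMS, nbytes)
    print(f"{dtype}: {gb:.1f} GB of weights, fits on a T4: {gb < T4_MEMORY_GB}")
```

In half precision, the weights alone would fill the whole GPU. At 8 bits they take about half of it, which leaves room for generation.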

In [4]:
model_name = "meta-llama/Meta-Llama-3-8B-Instruct"

# Load the weights in 8-bit (bitsandbytes) so the 8B model fits on a 16 GB T4
quantization_config = BitsAndBytesConfig(load_in_8bit=True)

model = AutoModelForCausalLM.from_pretrained(model_name,
                                             device_map="auto",
                                             quantization_config=quantization_config,
                                             trust_remote_code=True)

tokenizer = AutoTokenizer.from_pretrained(model_name, use_fast=True)

# Start from the model's default generation settings and override a few of them
gen_cfg = GenerationConfig.from_pretrained(model_name)
gen_cfg.max_new_tokens = 512
gen_cfg.temperature = 0.0000001  # near-zero temperature makes sampling effectively greedy: (nearly) deterministic answers for RAG
gen_cfg.do_sample = True
gen_cfg.repetition_penalty = 1.11

pipe = pipeline(
    task="text-generation",
    model=model,
    tokenizer=tokenizer,
    generation_config=gen_cfg,
    return_full_text=True,  # a pipeline option, not a generation setting: echo the prompt before the answer
)

llm = HuggingFacePipeline(pipeline=pipe)

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.



Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
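The generation settings above combine `do_sample=True` with a near-zero temperature. Sampling requires a strictly positive temperature, so a tiny value is used instead of 0. The sketch below uses hypothetical next-token logits to show that this is effectively greedy decoding. Dividing the logits by a very small temperature before the softmax puts essentially all of the probability on the highest-scoring token. Setting `do_sample=False` would give greedy decoding directly.

```python
import math


def softmax_with_temperature(logits: list[float], temperature: float) -> list[float]:
    # Divide logits by T, subtract the max for numerical stability, then normalise
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.5]  # hypothetical next-token logits
for t in (1.0, 0.5, 1e-7):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 4) for p in probs])
```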


# Step 4
Now we test the LLM on its own, without any retrieved documents, using the four prompts from Step 2.

In [5]:
# Llama 3 chat format for a single user turn (no retrieved context yet)
template = """
<|begin_of_text|><|start_header_id|>user<|end_header_id|>

{text}<|eot_id|><|start_header_id|>assistant<|end_header_id|>
"""

# Prompt Template
prompt = PromptTemplate(
    input_variables=["text"],
    template=template,
)

# Test all 4 prompts on the bare LLM; .invoke() replaces the deprecated llm(...) call
for question in [prompt1, prompt2, prompt3, irrelevant_prompt]:
    result = llm.invoke(prompt.format(text=question))
    print(fill(result.strip(), width=100))
    print("\n")

Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


<|begin_of_text|><|start_header_id|>user<|end_header_id|>  What is Rick Riordan's book The Chalice
of the Gods about?<|eot_id|><|start_header_id|>assistant<|end_header_id|> Rick Riordan did not write
a book called "The Chalice of the Gods". It's possible that you may be thinking of another author or
book with a similar title.  However, Rick Riordan has written many popular books in various series,
such as Percy Jackson and the Olympians, The Heroes of Olympus, Magnus Chase and the Gods of Asgard,
and more. If you're looking for information on one of his actual books, I'd be happy to help!




Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


<|begin_of_text|><|start_header_id|>user<|end_header_id|>  Who is seeking help in Adelle Waldman's
book Help Wanted?<|eot_id|><|start_header_id|>assistant<|end_header_id|> Adelle Waldman did not
write a book called "Help Wanted". The title "Help Wanted" is actually the name of a 2013 novel by
Steve Toltz.   If you meant to ask about Adelle Waldman's book, she wrote "The Love Affairs of
Nathaniel P.", which was published in 2012. The main character, Nathan, is seeking help and guidance
as he navigates his relationships and personal life.




Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


<|begin_of_text|><|start_header_id|>user<|end_header_id|>  What is the storm referring to in Vanessa
Chan's book The Storm We Made?<|eot_id|><|start_header_id|>assistant<|end_header_id|> I apologize,
but I couldn't find any information on a book called "The Storm We Made" by Vanessa Chan. It's
possible that it's a lesser-known or upcoming book, or it may be a mistake.  Could you please
provide more context or details about the book, such as the publication date or a brief summary?
This might help me better assist you in finding the answer you're looking for.


<|begin_of_text|><|start_header_id|>user<|end_header_id|>  What are the latest stock market
trends?<|eot_id|><|start_header_id|>assistant<|end_header_id|> The latest stock market trends can be
influenced by a variety of factors, including global economic conditions, central bank policies,
corporate earnings, and investor sentiment. Here are some current trends in the stock market:  1.
**Global Market Volatility**: The stock marke
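The prompt is echoed back (`return_full_text=True`), so each result printed above starts with the Llama 3 chat markup. A small helper can keep only the text after the last assistant header. This helper is hypothetical and not part of LangChain or transformers. Alternatively, passing `return_full_text=False` to `pipeline(...)` should return only the newly generated text.

```python
ASSISTANT_HEADER = "<|start_header_id|>assistant<|end_header_id|>"


def extract_assistant_reply(generated: str) -> str:
    """Keep only the text after the last assistant header of a Llama 3 chat prompt."""
    reply = generated.rsplit(ASSISTANT_HEADER, 1)[-1]
    # Remove an end-of-turn marker if one was decoded, then surrounding whitespace
    return reply.replace("<|eot_id|>", "").strip()


sample = ("<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\nHi<|eot_id|>"
          "<|start_header_id|>assistant<|end_header_id|>\nHello there!")
print(extract_assistant_reply(sample))  # Hello there!
```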

# Step 5
Now we create a LangChain-based RAG system by setting up a vector database from the documents. First, we create an `UnstructuredPDFLoader` for each PDF file.

In [6]:
from langchain.document_loaders import UnstructuredPDFLoader
import os

loaders = [UnstructuredPDFLoader(fn) for fn in document_paths]

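Next, because the LLM's context window (its receptive field) is limited, we split each document into chunks of up to 1024 characters. Consecutive chunks overlap by 256 characters, so text near a chunk boundary keeps some of its surrounding context.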

In [7]:
from langchain.text_splitter import RecursiveCharacterTextSplitter

# One splitter for all documents: chunks of up to 1024 characters, 256 characters shared between neighbours
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1024, chunk_overlap=256)
chunked_pdf_doc = []

for loader in loaders:
    print("Loading raw document..." + loader.file_path)
    pdf_doc = loader.load()
    # Drop metadata values that are not str, int, float or bool
    updated_pdf_doc = filter_complex_metadata(pdf_doc)
    print("Splitting text...")
    documents = text_splitter.split_documents(updated_pdf_doc)
    chunked_pdf_doc.extend(documents)

len(chunked_pdf_doc)

Loading raw document...books/book1.pdf


[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.
[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data]     /root/nltk_data...
[nltk_data]   Unzipping taggers/averaged_perceptron_tagger.zip.


Splitting text...
Loading raw document...books/book2.pdf
Splitting text...
Loading raw document...books/book3.pdf
Splitting text...
Loading raw document...books/book4.pdf
Splitting text...
Loading raw document...books/book5.pdf
Splitting text...


2862
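A simplified fixed-size sliding window over characters shows the effect of `chunk_size` and `chunk_overlap`. This is a stand-in, not LangChain's algorithm. `RecursiveCharacterTextSplitter` first splits on separators such as blank lines, newlines and spaces, and then merges the pieces up to `chunk_size`, so its chunk boundaries differ. With the settings above, each new chunk advances by 1024 - 256 = 768 characters.

```python
def sliding_window_chunks(text: str, chunk_size: int = 1024, chunk_overlap: int = 256) -> list[str]:
    """Split text into fixed-size character windows that overlap by chunk_overlap characters."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must satisfy 0 <= chunk_overlap < chunk_size")
    step = chunk_size - chunk_overlap  # 768 new characters per chunk with the defaults
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):  # this window already reaches the end of the text
            break
    return chunks


print(sliding_window_chunks("abcdefghij", chunk_size=4, chunk_overlap=2))  # ['abcd', 'cdef', 'efgh', 'ghij']
```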

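Next, we load the embedding model. This is a small sentence-transformer model that maps each chunk to a dense vector. By default, `HuggingFaceEmbeddings` uses `sentence-transformers/all-mpnet-base-v2`.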

In [8]:
embeddings = HuggingFaceEmbeddings() # default model_name="sentence-transformers/all-mpnet-base-v2"

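Embedding models place semantically similar chunks at nearby points, and these points are typically compared with cosine similarity. The sketch below uses toy 3-dimensional vectors; all-mpnet-base-v2 produces 768-dimensional vectors. It also checks that, for unit-length vectors, squared Euclidean distance equals 2 - 2*cos. If this model's embeddings are unit-length, as we assume, ranking by L2 distance and ranking by cosine similarity therefore agree.

```python
import math


def normalize(v: list[float]) -> list[float]:
    # Scale a vector to unit length
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def cosine_similarity(a: list[float], b: list[float]) -> float:
    # Dot product of the unit-length versions of a and b
    return sum(x * y for x, y in zip(normalize(a), normalize(b)))


def squared_l2(a: list[float], b: list[float]) -> float:
    # Squared Euclidean distance between two vectors
    return sum((x - y) ** 2 for x, y in zip(a, b))


a, b = [1.0, 2.0, 0.0], [2.0, 1.0, 1.0]  # toy "embeddings"
cos = cosine_similarity(a, b)
d2 = squared_l2(normalize(a), normalize(b))
print(round(cos, 4), round(d2, 4), round(2 - 2 * cos, 4))  # the last two values agree
```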

We embed every chunk and store the texts, together with their vectors, in a FAISS index.

In [9]:
%%time
db_pdf = FAISS.from_documents(chunked_pdf_doc, embeddings)

CPU times: user 40.3 s, sys: 94.2 ms, total: 40.4 s
Wall time: 40.8 s
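`FAISS.from_documents` adds one vector per chunk to a FAISS index and, at query time, returns the nearest ones. Assuming the default is a flat L2 index (exact, exhaustive search), the lookup amounts to the plain-Python sketch below, shown on hypothetical 2-D vectors. Like a flat FAISS L2 index, it reports squared distances. FAISS performs the same comparison in optimized native code.

```python
def flat_l2_search(vectors: list[list[float]], query: list[float], k: int) -> list[tuple[int, float]]:
    """Exhaustive nearest-neighbour search returning (index, squared L2 distance) pairs."""
    scored = [
        (i, sum((v - q) ** 2 for v, q in zip(vec, query)))
        for i, vec in enumerate(vectors)
    ]
    # Smallest distance first, then keep the k best
    scored.sort(key=lambda pair: pair[1])
    return scored[:k]


# Toy 2-D "chunk embeddings" and a query vector (hypothetical values)
vectors = [[0.0, 0.0], [1.0, 0.0], [0.0, 3.0]]
print(flat_l2_search(vectors, [0.9, 0.1], k=2))
```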


We also define a new prompt template. It places the retrieved context in the system message and instructs the model to answer only from that context.

In [10]:
%%time

prompt_template = """
<|begin_of_text|><|start_header_id|>system<|end_header_id|>

Use the following context to answer the question at the end. Do not use any other information. If you can't find the relevant information in the context, just say you don't have enough information to answer the question. Don't try to make up an answer.

{context}<|eot_id|><|start_header_id|>user<|end_header_id|>

{question}<|eot_id|><|start_header_id|>assistant<|end_header_id|>
"""


CPU times: user 3 µs, sys: 1e+03 ns, total: 4 µs
Wall time: 42.2 µs
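Filling this template is plain string substitution: the retrieved chunks are concatenated into `{context}` and the user's query goes into `{question}`. The sketch below uses a shortened, hypothetical template and separates chunks with a blank line. We assume this separator matches the default of LangChain's "stuff" chain.

```python
def stuff_prompt(template: str, chunks: list[str], question: str, separator: str = "\n\n") -> str:
    # "Stuff" every retrieved chunk into one context string, then fill both placeholders
    return template.format(context=separator.join(chunks), question=question)


# Shortened, hypothetical version of the template above
template = (
    "Use the following context to answer the question at the end.\n\n"
    "{context}\n\n"
    "Question: {question}"
)
print(stuff_prompt(template, ["First retrieved chunk.", "Second retrieved chunk."], "What happens?"))
```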


We then use LangChain's `RetrievalQA` chain, which needs a retriever, an LLM and a `chain_type`. When the chain is called, the retriever fetches the chunks most similar to the question from the vector store. Here it returns at most `k = 4` chunks and discards any whose relevance score is below `score_threshold = 0.2`. With `chain_type="stuff"`, all retrieved chunks are concatenated ("stuffed") into the `{context}` slot of the prompt, and the LLM is called once to generate its response from them.

In [11]:
%%time
prompt = PromptTemplate(template=prompt_template, input_variables=["context", "question"])
Chain_pdf = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=db_pdf.as_retriever(search_type="similarity_score_threshold", search_kwargs={'k': 4, 'score_threshold': 0.2}),
    chain_type_kwargs={"prompt": prompt},
)

CPU times: user 1.7 ms, sys: 0 ns, total: 1.7 ms
Wall time: 1.66 ms
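The `similarity_score_threshold` retriever can be sketched as three steps: take the `k` nearest chunks, convert each distance to a relevance score, and drop scores below the threshold. The distance-to-relevance mapping `1 - d / sqrt(2)` is our assumption about what LangChain applies to Euclidean distances. The chunk names and distances are hypothetical.

```python
import math


def euclidean_relevance(distance: float) -> float:
    # Assumption: LangChain's mapping from Euclidean distance to a relevance score
    return 1.0 - distance / math.sqrt(2)


def filter_by_threshold(scored_chunks: list[tuple[str, float]], k: int = 4,
                        score_threshold: float = 0.2) -> list[tuple[str, float]]:
    # scored_chunks: (chunk, distance) pairs already sorted by ascending distance
    results = []
    for chunk, distance in scored_chunks[:k]:
        relevance = euclidean_relevance(distance)
        if relevance >= score_threshold:
            results.append((chunk, relevance))
    return results


# Hypothetical nearest chunks for one query
nearest = [("chunk A", 0.1), ("chunk B", 0.5), ("chunk C", 1.2)]
print(filter_by_threshold(nearest))  # chunk C scores about 0.15 and is dropped
```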


# Step 6
Finally, we pass the three relevant prompts and the irrelevant prompt to the RAG system. For the relevant prompts, the RAG system should return relevant answers; for the irrelevant prompt, it should return an empty answer.
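As far as we can tell, `RetrievalQA` still calls the LLM when no chunk passes the score threshold. In that case, an empty answer depends on the model obeying the system prompt. A hypothetical wrapper that skips generation when retrieval returns nothing makes the empty answer explicit. Here `retrieve` and `generate` are placeholders that could, for example, wrap `db_pdf.as_retriever(...)` and the LLM; the stub functions below are illustrative only.

```python
from typing import Callable


def answer_or_empty(question: str,
                    retrieve: Callable[[str], list[str]],
                    generate: Callable[[str, list[str]], str]) -> str:
    """Return "" when nothing relevant is retrieved; otherwise generate an answer."""
    chunks = retrieve(question)
    if not chunks:
        return ""  # nothing passed the similarity threshold: skip the LLM entirely
    return generate(question, chunks)


# Hypothetical stand-ins for the real retriever and LLM
def fake_retrieve(question: str) -> list[str]:
    return ["A passage about the chalice."] if "Chalice" in question else []


def fake_generate(question: str, chunks: list[str]) -> str:
    return f"Answer based on {len(chunks)} chunk(s)."


print(repr(answer_or_empty("What is The Chalice of the Gods about?", fake_retrieve, fake_generate)))
print(repr(answer_or_empty("What are the latest stock market trends?", fake_retrieve, fake_generate)))
```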

In [12]:
# Three relevant prompts, then the irrelevant prompt
for question in [prompt1, prompt2, prompt3, irrelevant_prompt]:
    result = Chain_pdf.invoke(question)
    print(fill(result['result'].strip(), width=100))
    print("\n")

Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


<|begin_of_text|><|start_header_id|>system<|end_header_id|>  Use the following context to answer the
question at the end. Do not use any other information. If you can't find the relevant information in
the context, just say you don't have enough information to answer the question. Don't try to make up
an answer.  The exterior gleamed with black-and-gold scenes of the gods at a feast, their
silhouettes outlined in threads of Celestial bronze. It was a fancy piece of pottery, but I wasn’t
sure how anyone could pour nectar out of it.  “The chalice of the gods,” I guessed. I wanted to add
that of course I understood why Gary had stolen it.  Unfortunately, I didn’t.  “So . . . since
you’re already immortal, would the cup make you  younger?” I asked. “Or has it been your lifelong
dream to serve drinks?”  “Oh, that’s disappointing. . . .” Gary closed his fist and the chalice
disappeared. “Perhaps I should have started with Annabeth Chase. I understand she’s smart.”  mouth.
“No. It isn’t right

Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


<|begin_of_text|><|start_header_id|>system<|end_header_id|>  Use the following context to answer the
question at the end. Do not use any other information. If you can't find the relevant information in
the context, just say you don't have enough information to answer the question. Don't try to make up
an answer.  Ada cast her gaze down and squeezed one hand in the other. “It is not what will happen
to your sister,” he said quickly, trying to  dispel the clouds that he knew had settled upon her
thoughts.  She looked back up. “I’ve been waiting for a letter, but I have yet to hear anything. I
did send money, so maybe they used it. I don’t know. I just hate being so far from them now.
Sometimes I wonder whether I did the wrong thing coming here.”  “You are trying to help.” “But what
if it isn’t enough?” He felt a kinship with her, both of them carrying around their own private
heartache over someone they loved. He smiled. “You saved me on the street, Ada,” he said. “If anyone
can save you

Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


<|begin_of_text|><|start_header_id|>system<|end_header_id|>  Use the following context to answer the
question at the end. Do not use any other information. If you can't find the relevant information in
the context, just say you don't have enough information to answer the question. Don't try to make up
an answer.  THAT MORNING THERE had been a storm setting up to the south, and the smell of rain had
roused her. Not rain falling, but rain coming. The air was fuzzy with that particular smell. Lucille
had lain still in bed for a few minutes upon waking, breathing it in and listening for thunder, but
all she heard was the sound of the birds, who seemed so blithely unconcerned by any change in the
weather that it made her wonder whether she was wrong. Maybe there was no gathering storm. Maybe she
only wanted there to be. She was not the only one who would have welcomed rain. The drought had been
going on for so long that hardly anyone on the island could get any crops to grow. Work was scarc