<a href="https://colab.research.google.com/github/Matthew-Z-Purvis/InfoChallenge2023/blob/main/Purvisassessment7.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# RAG + LLM Assessment

Your task is to create a Retrieval-Augmented Generation (RAG) system using a Large Language Model (LLM). The RAG system should be able to retrieve relevant information from a knowledge base and generate coherent and informative responses to user queries.

Steps:

1. Choose a domain and collect a suitable dataset of documents (at least 5 documents - PDFs or HTML pages) to serve as the knowledge base for your RAG system. Select one of the following topics:
   * latest scientific papers from arxiv.org,
   * fiction books released,
   * legal documents or,
   * social media posts.

   Make sure that the documents are newer than the training dataset of the applied LLM. (20 points)

2. Create three relevant prompts to the dataset, and one irrelevant prompt. (20 points)

3. Load an LLM with at least 5B parameters. (10 points)

4. Test the LLM with your prompts. The goal should be that without the collected dataset your model is unable to answer the question. If it gives you a good answer, select another question to answer and maybe a different dataset. (10 points)

5. Create a LangChain-based RAG system by setting up a vector database from the documents. (20 points)

6. Provide your three relevant and one irrelevant prompts to your RAG system. For the relevant prompts, your RAG system should return relevant answers, and for the irrelevant prompt, an empty answer. (20 points)


## Installing dependencies



In [1]:
# Quote the version specifiers: unquoted, the shell treats ">=4.32.0" as a redirect to a file named "=4.32.0"
!pip install "transformers>=4.32.0" "optimum>=1.12.0" > /dev/null
!pip install auto-gptq --extra-index-url https://huggingface.github.io/autogptq-index/whl/cu118/ > /dev/null
!pip install langchain > /dev/null
!pip install chromadb > /dev/null
!pip install sentence_transformers > /dev/null # ==2.2.2
!pip install unstructured > /dev/null
!pip install pdf2image > /dev/null
!pip install pdfminer.six > /dev/null
!pip install unstructured-pytesseract > /dev/null
!pip install unstructured-inference > /dev/null
!pip install faiss-gpu > /dev/null
!pip install pikepdf > /dev/null
!pip install pypdf > /dev/null
!pip install accelerate > /dev/null
!pip install pillow_heif > /dev/null
!pip install -i https://pypi.org/simple/ bitsandbytes > /dev/null

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
spacy 3.7.4 requires typer<0.10.0,>=0.3.0, but you have typer 0.12.3 which is incompatible.
weasel 0.3.4 requires typer<0.10.0,>=0.3.0, but you have typer 0.12.3 which is incompatible.
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
imageio 2.31.6 requires pillow<10.1.0,>=8.3.2, but you have pillow 10.3.0 which is incompatible.

In [None]:
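import os

# Kill the current process so that Colab restarts the runtime and picks up the freshly installed packages
os.kill(os.getpid(), 9)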

## Imports


In [1]:
from huggingface_hub import login
from langchain.llms import HuggingFacePipeline
from transformers import AutoModelForCausalLM, AutoTokenizer, GenerationConfig, pipeline, BitsAndBytesConfig
from textwrap import fill
from langchain.prompts import PromptTemplate
import locale
from langchain.document_loaders import UnstructuredURLLoader
from langchain.vectorstores.utils import filter_complex_metadata # 'filter_complex_metadata' removes complex metadata that are not in str, int, float or bool format
from langchain.vectorstores import FAISS
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.text_splitter import RecursiveCharacterTextSplitter

from langchain.chains import RetrievalQA

locale.getpreferredencoding = lambda: "UTF-8"

# Supply your private Hugging Face User Access Token at run time
# to be able to access models with an accepted licence.
# Never hardcode the token in a shared notebook.
from getpass import getpass
HUGGINGFACE_UAT = getpass("Hugging Face token: ")
login(HUGGINGFACE_UAT)

The token has not been saved to the git credentials helper. Pass `add_to_git_credential=True` in this function directly or `--add-to-git-credential` if using via `huggingface-cli` if you want to set the git credential as well.
Token is valid (permission: read).
Your token has been saved to /root/.cache/huggingface/token
Login successful


In [2]:
model_name = "meta-llama/Meta-Llama-3-8B-Instruct" # 8B language model from Meta AI

quantization_config = BitsAndBytesConfig(load_in_8bit=True)

model = AutoModelForCausalLM.from_pretrained(model_name,
                                             device_map="auto",
                                             quantization_config=quantization_config,
                                             trust_remote_code=True)

tokenizer = AutoTokenizer.from_pretrained(model_name, use_fast=True)

gen_cfg = GenerationConfig.from_pretrained(model_name)
gen_cfg.max_new_tokens=512
gen_cfg.temperature=0.0000001 # 0.0 # For RAG we would like to have deterministic answers
gen_cfg.return_full_text=True
gen_cfg.do_sample=True
gen_cfg.repetition_penalty=1.11

pipe=pipeline(
    task="text-generation",
    model=model,
    tokenizer=tokenizer,
    generation_config=gen_cfg
)

llm = HuggingFacePipeline(pipeline=pipe)

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.


Testing the model with basic prompts.

In [3]:
template = """
<|begin_of_text|><|start_header_id|>user<|end_header_id|>

{text}<|eot_id|><|start_header_id|>assistant<|end_header_id|>
"""

prompt = PromptTemplate(
    input_variables=["text"],
    template=template,
)


In [4]:
text = "What is The Cult of the Chafing Dish about?" # without the knowledge base the LLM hallucinates an answer; it does not even get the author right
result = llm(prompt.format(text=text))
print(fill(result.strip(), width=100))

Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


<|begin_of_text|><|start_header_id|>user<|end_header_id|>  What is The Cult of the Chafing Dish
about?<|eot_id|><|start_header_id|>assistant<|end_header_id|> A fascinating topic! "The Cult of the
Chafing Dish" is a 2019 article by food writer and historian, Michael Krondl, that explores the
history and cultural significance of chafing dishes in American society.  In essence, the article
delves into how chafing dishes, those ubiquitous, electrically heated serving vessels often used at
buffets and parties, have become an integral part of American social gatherings. Krondl argues that
these humble appliances have transcended their functional purpose to become a symbol of community,
hospitality, and even nostalgia.  He examines how chafing dishes have been used throughout American
history, from early 20th-century dinner parties to modern-day wedding receptions and corporate
events. He also discusses how they've influenced the way we eat, interact with each other, and
perceive social statu
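
The output above repeats the whole prompt before the generated text, so the answer itself is hard to spot. The helper below is a minimal sketch of one way to keep only the generated part, assuming the Llama 3 header tokens used in the template above; `extract_answer` and the `sample` string are hypothetical and are not part of the original notebook. Passing `return_full_text=False` to the `pipeline(...)` call is another option that may avoid the echo at the source.

```python
ASSISTANT_HEADER = "<|start_header_id|>assistant<|end_header_id|>"
END_OF_TURN = "<|eot_id|>"


def extract_answer(full_text: str) -> str:
    """Return only the text generated after the last assistant header."""
    # rpartition returns the whole string as the last element when the
    # header is absent, so plain answers pass through unchanged.
    _, _, answer = full_text.rpartition(ASSISTANT_HEADER)
    # Drop a trailing end-of-turn marker if the model emitted one.
    answer = answer.split(END_OF_TURN, 1)[0]
    return answer.strip()


# Hypothetical sample shaped like the pipeline output above.
sample = (
    "<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\n"
    "What is a chafing dish?<|eot_id|>"
    "<|start_header_id|>assistant<|end_header_id|>\n"
    "A portable dish for cooking at the table.<|eot_id|>"
)
print(extract_answer(sample))
```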

In [5]:
# load 5 books that were released recently

web_loader = UnstructuredURLLoader(
    urls=[
        "https://www.gutenberg.org/cache/epub/73560/pg73560-images.html",
        "https://www.gutenberg.org/cache/epub/73555/pg73555-images.html",
        "https://www.gutenberg.org/cache/epub/73549/pg73549-images.html",
        "https://www.gutenberg.org/cache/epub/73505/pg73505-images.html",
        "https://www.gutenberg.org/cache/epub/73548/pg73548-images.html",
    ],
    mode="elements",
    strategy="fast",
)
web_doc = web_loader.load()
updated_web_doc = filter_complex_metadata(web_doc)

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.
[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data]     /root/nltk_data...
[nltk_data]   Unzipping taggers/averaged_perceptron_tagger.zip.


In [6]:
text_splitter = RecursiveCharacterTextSplitter(chunk_size=2048, chunk_overlap=512)
chunked_web_doc = text_splitter.split_documents(updated_web_doc)
len(chunked_web_doc)

2661
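
To make the roles of `chunk_size` and `chunk_overlap` concrete, here is a simplified sliding-window splitter. It is only an illustration under stated assumptions: `sliding_chunks` is a hypothetical helper, and it ignores the separator-aware logic of `RecursiveCharacterTextSplitter`, which prefers to break on paragraph and line boundaries.

```python
def sliding_chunks(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Split text into fixed-size windows that share chunk_overlap characters."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks


# Tiny example: windows of 4 characters that overlap by 2.
print(sliding_chunks("abcdefghij", chunk_size=4, chunk_overlap=2))
```

With the settings used above (2048 and 512), each window in this simplified model would advance by 1536 characters.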

In [7]:
embeddings = HuggingFaceEmbeddings() # default model_name="sentence-transformers/all-mpnet-base-v2"


In [8]:
%%time

# Create the vectorized db with FAISS

db_web = FAISS.from_documents(chunked_web_doc, embeddings)

CPU times: user 9.58 s, sys: 37.2 ms, total: 9.62 s
Wall time: 9.91 s


In [9]:
%%time

prompt_template = """
<|begin_of_text|><|start_header_id|>system<|end_header_id|>

Use the following context to answer the question at the end. Do not use any other knowledge or information that you may have.
If you can't find the relevant information in the following context, your answer should use no words. Do not try to make up an answer.
For example, if the context is "" and the question is "What is a car?", you should output
CONTEXT:

{context}<|eot_id|><|start_header_id|>user<|end_header_id|>

{question}<|eot_id|><|start_header_id|>assistant<|end_header_id|>
"""


CPU times: user 4 µs, sys: 2 µs, total: 6 µs
Wall time: 7.15 µs
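
In the template above, the example sentence ends with "you should output" and runs straight into the `CONTEXT:` label, so the model may read `CONTEXT:` as the text it is meant to produce. The following is a hedged sketch of a possible rewording that finishes the instruction and separates it from the label. `REVISED_PROMPT_TEMPLATE` is a hypothetical name, and whether the model then returns an empty answer has not been verified here.

```python
# A possible rewording of the system prompt (an assumption, not the prompt
# that produced the outputs in this notebook).
REVISED_PROMPT_TEMPLATE = """<|begin_of_text|><|start_header_id|>system<|end_header_id|>

Use only the following context to answer the question. Do not use any other knowledge.
If the context does not contain the answer, reply with an empty string. Do not make up an answer.

CONTEXT:
{context}<|eot_id|><|start_header_id|>user<|end_header_id|>

{question}<|eot_id|><|start_header_id|>assistant<|end_header_id|>

"""

# Fill the template with plain str.format to inspect the final prompt text.
filled = REVISED_PROMPT_TEMPLATE.format(
    context="Buck Brady was always whittling.",
    question="Who is Buck Brady?",
)
print(filled)
```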


In [10]:
prompt = PromptTemplate(template=prompt_template, input_variables=["context", "question"])
Chain_web = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    # Alternative: retriever=db.as_retriever(search_type="similarity_score_threshold", search_kwargs={'k': 5, 'score_threshold': 0.8})
    # Similarity search is the default way to retrieve documents relevant to a query; MMR is available by setting search_type="mmr".
    # k defines how many documents are returned; it defaults to 4.
    # score_threshold sets a minimum relevance for returned documents when the "similarity_score_threshold" search type is used.
    # return_source_documents=True, # Optional parameter, returns the source documents used to answer the question
    retriever=db_web.as_retriever(search_type="similarity_score_threshold", search_kwargs={'k': 10, 'score_threshold': 0.1}),
    chain_type_kwargs={"prompt": prompt},
)
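
The interaction of `k` and `score_threshold` can be illustrated with a toy retriever. This is a sketch under stated assumptions: it scores by cosine similarity over hand-written vectors, whereas the FAISS store above may use a different distance and score normalisation, and `cosine_similarity` and `retrieve` are hypothetical helpers.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_vec, doc_vecs, k, score_threshold):
    """Return indices of at most k documents scoring at or above the threshold."""
    scored = [(cosine_similarity(query_vec, v), i) for i, v in enumerate(doc_vecs)]
    kept = [(s, i) for s, i in scored if s >= score_threshold]
    kept.sort(reverse=True)  # best score first
    return [i for _, i in kept[:k]]


docs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
print(retrieve([1.0, 0.0], docs, k=2, score_threshold=0.5))
```

A query that is unrelated to every document scores below the threshold everywhere, so the retriever returns nothing and the context passed to the LLM is empty.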


In [17]:
query = "Who are the characters in The Cult of the Chafing Dish?"
result = Chain_web.invoke(query)
result

Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


{'query': 'Who are the characters in The Cult of the Chafing Dish?',
 'result': '\n<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\nUse the following context to answer the question at the end. Do not use any other knowledge or information that you may have.\nIf you can\'t find the relevant information in the following context, your answer should use no words. Do not try to make up an answer.\nFor example, if the context is "" and the question is "What is a car?", you should output\nCONTEXT:\n\nTitle: The Cult of the Chafing Dish\n\nThe Project Gutenberg eBook of The Cult of the Chafing Dish\n\n*** END OF THE PROJECT GUTENBERG EBOOK THE CULT OF THE CHAFING DISH ***\n\nTHE·CULT·OF·THE·CHAFINGDISH···\n\n*** START OF THE PROJECT GUTENBERG EBOOK THE CULT OF THE CHAFING DISH ***\n\nCHAPTER·1·THE·CHAFING·DISH\n\nAntiquity of the Chafing Dish, 3\n\nChafing Dish, Antiquity of, 3\n\nTo bachelors, male and female, in chambers,\r\nlodgings, diggings, and the like, in fact to\r\nall w

In [18]:
print(fill(result['result'].strip(), width=100))

<|begin_of_text|><|start_header_id|>system<|end_header_id|>  Use the following context to answer the
question at the end. Do not use any other knowledge or information that you may have. If you can't
find the relevant information in the following context, your answer should use no words. Do not try
to make up an answer. For example, if the context is "" and the question is "What is a car?", you
should output CONTEXT:  Title: The Cult of the Chafing Dish  The Project Gutenberg eBook of The Cult
of the Chafing Dish  *** END OF THE PROJECT GUTENBERG EBOOK THE CULT OF THE CHAFING DISH ***
THE·CULT·OF·THE·CHAFINGDISH···  *** START OF THE PROJECT GUTENBERG EBOOK THE CULT OF THE CHAFING
DISH ***  CHAPTER·1·THE·CHAFING·DISH  Antiquity of the Chafing Dish, 3  Chafing Dish, Antiquity of,
3  To bachelors, male and female, in chambers,  lodgings, diggings, and the like, in fact to  all
who “batch”; to young couples with a  taste for theatres, concerts, and homely late  suppers; to
yachtsmen, shoot

In [14]:
%%time

query = "What are the places mentioned in How it feels to be colored me?"
result = Chain_web.invoke(query)
print(fill(result['result'].strip(), width=100))

Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


<|begin_of_text|><|start_header_id|>system<|end_header_id|>  Use the following context to answer the
question at the end. Do not use any other knowledge or information that you may have. If you can't
find the relevant information in the following context, your answer should use no words. Do not try
to make up an answer. For example, if the context is "" and the question is "What is a car?", you
should output CONTEXT:  How It Feels  to Be Colored Me  Title: How it feels to be colored me  The
Project Gutenberg eBook of How it feels to be colored me  *** END OF THE PROJECT GUTENBERG EBOOK HOW
IT FEELS TO BE COLORED ME ***  *** START OF THE PROJECT GUTENBERG EBOOK HOW IT FEELS TO BE COLORED
ME ***  I remember the very day  that I became colored. Up to my  thirteenth year I lived in the
little Negro town of Eatonville,  Florida. It is exclusively a colored  town. The only white people
I  knew passed through the town  going to or coming from Orlando.  The native whites rode dusty
horses, the

In [19]:
%%time

query = "Who is BUCK BRADY?"
result = Chain_web.invoke(query)
print(fill(result['result'].strip(), width=100))

Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


<|begin_of_text|><|start_header_id|>system<|end_header_id|>  Use the following context to answer the
question at the end. Do not use any other knowledge or information that you may have. If you can't
find the relevant information in the following context, your answer should use no words. Do not try
to make up an answer. For example, if the context is "" and the question is "What is a car?", you
should output CONTEXT:  Title: By order of Buck Brady  BY ORDER OF BUCK BRADY  —BUCK BRADY.  “You’ve
got a sweet idea of fun,”  growled Pete. “That moth eaten old  sidewinder is jist thirty-two years
old,  and if we hadn’t turned them guns over  to him he’d jist about ruined the whole  gang of us
with his pet Winchester. When  you see ‘By order of Buck Brady,’ you  better read the upper part of
it and act  accordingly.”  Buck Brady was always whittling.  Thin shavings were an  obsession with
Buck. He would  sit for hours, tilted back in a  broken chair against the shady side of his  little
offic

In [24]:
%%time

query = "Who is going to win the United States 2024 presidential election" # should return a blank output
result = Chain_web.invoke(query)
print(fill(result['result'].strip(), width=100))

Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


<|begin_of_text|><|start_header_id|>system<|end_header_id|>  Use the following context to answer the
question at the end. Do not use any other knowledge or information that you may have. If you can't
find the relevant information in the following context, your answer should use no words. Do not try
to make up an answer. For example, if the context is "" and the question is "What is a car?", you
should output CONTEXT:  <|eot_id|><|start_header_id|>user<|end_header_id|>  Who is going to win the
United States 2024 presidential election<|eot_id|><|start_header_id|>assistant<|end_header_id|>
CONTEXT:
CPU times: user 1.19 s, sys: 4.62 ms, total: 1.2 s
Wall time: 1.53 s
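
In the last run the retrieved context is empty, yet the model still emits `CONTEXT:` rather than nothing. One way to guarantee an empty answer for irrelevant prompts is to skip generation whenever retrieval returns no documents. The sketch below assumes two hypothetical callables, `retrieve` and `generate`, standing in for the retriever and the LLM call; it is not the chain used above.

```python
from typing import Callable, Sequence


def answer_or_empty(
    query: str,
    retrieve: Callable[[str], Sequence[str]],
    generate: Callable[[str, Sequence[str]], str],
) -> str:
    """Return an empty string when nothing relevant is retrieved."""
    docs = retrieve(query)
    if not docs:
        # No context passed the relevance threshold: do not call the LLM.
        return ""
    return generate(query, docs)


# Stub retriever and generator, for illustration only.
def fake_retrieve(query: str) -> list[str]:
    return ["Buck Brady was always whittling."] if "Buck" in query else []


def fake_generate(query: str, docs: Sequence[str]) -> str:
    return f"Based on {len(docs)} passage(s): {docs[0]}"


print(repr(answer_or_empty("Who wins the 2024 election?", fake_retrieve, fake_generate)))
print(answer_or_empty("Who is Buck Brady?", fake_retrieve, fake_generate))
```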
