# RAG

## Requirements

In [1]:
%%capture
!pip install transformers accelerate bitsandbytes langchain langchain-community sentence-transformers faiss-gpu pandas gdown

## Dataset

In [2]:
!gdown --fuzzy https://drive.google.com/file/d/1Lq2zVJlN_B4kUAu4VafQ4jXMIQiAR9vI/view?usp=sharing

Downloading...
From (original): https://drive.google.com/uc?id=1Lq2zVJlN_B4kUAu4VafQ4jXMIQiAR9vI
From (redirected): https://drive.google.com/uc?id=1Lq2zVJlN_B4kUAu4VafQ4jXMIQiAR9vI&confirm=t&uuid=f823fa5c-5658-42eb-9b5d-98416da52b84
To: /content/IMDB_crawled.json
100% 292M/292M [00:06<00:00, 44.3MB/s]


## Config

In [3]:
class Config:
    EMBEDDING_MODEL_NAME="thenlper/gte-base"
    LLM_MODEL_NAME="HuggingFaceH4/zephyr-7b-beta"
    K = 5 # top K retrieval

## Preprocessing

In [4]:
import pandas as pd

df = pd.read_json('IMDB_crawled.json')

In [5]:
df.columns

Index(['id', 'title', 'first_page_summary', 'release_year', 'mpaa', 'budget',
       'gross_worldwide', 'rating', 'directors', 'writers', 'stars',
       'related_links', 'languages', 'countries_of_origin', 'summaries',
       'synposis', 'reviews', 'genres'],
      dtype='object')

In [6]:
import os

os.makedirs('data', exist_ok=True)

# keep only the columns needed for retrieval, since the embedding model's context window is limited
df = df[["title", "release_year", "first_page_summary", "genres", "rating"]]

df.to_csv('data/imdb.csv', index=False)
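
To see what the embedding model will receive, it can help to look at how one CSV row becomes text. The sketch below uses only the standard library to mimic the `column: value` per-line layout that LangChain's `CSVLoader` produces for each row's `page_content`. The sample row and the `row_to_text` helper are made up for illustration and are not part of the notebook.

```python
import csv
import io

# Hypothetical sample with the same columns as data/imdb.csv.
sample_csv = io.StringIO(
    "title,release_year,first_page_summary,genres,rating\n"
    "Example Movie,1999,A short summary.,\"['Crime', 'Drama']\",7.5\n"
)

def row_to_text(row: dict) -> str:
    """Join one CSV row into 'column: value' lines, one line per column."""
    return "\n".join(f"{key}: {value}" for key, value in row.items())

rows = list(csv.DictReader(sample_csv))
text = row_to_text(rows[0])
print(text)
```

Each row becomes one short document, which is why dropping long columns such as reviews keeps the text within the embedding model's input limit.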

## Vectorizer

Load the CSV file and vectorize its rows using `HuggingFaceEmbeddings`.
Store the results in a FAISS vectorstore.
Save the vectorstore to a pickle file for future use.

In [7]:
import pickle

from langchain_community.document_loaders.csv_loader import CSVLoader
from langchain_community.vectorstores.utils import DistanceStrategy
from langchain_community.vectorstores.faiss import FAISS

from langchain_community.embeddings import HuggingFaceEmbeddings

# load the csv
loader = CSVLoader(file_path='data/imdb.csv')
docs = loader.load()
# load the embeddings model
# note: the model name is hard-coded here, so Config.EMBEDDING_MODEL_NAME is not used
model_name = "sentence-transformers/all-mpnet-base-v2"
model_kwargs = {'device': 'cuda'}
encode_kwargs = {'normalize_embeddings': False}
hf = HuggingFaceEmbeddings(
    model_name=model_name,
    model_kwargs=model_kwargs,
    encode_kwargs=encode_kwargs
)
# embed the documents with the model and store them in a FAISS vectorstore
vectorstore = FAISS.from_documents(docs, hf)

with open("data/vectorstore.pkl", "wb") as f:
    pickle.dump(vectorstore, f)

Load the vectorstore as a retriever.

In [8]:
# with open("data/vectorstore.pkl", "rb") as f:
#     vectorstore = pickle.load(f)

# build a retriever over the vectorstore that returns the top K documents
retriever = vectorstore.as_retriever(search_kwargs={"k": Config.K})
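
Conceptually, the retriever embeds the query and returns the documents whose vectors are closest to it. The sketch below is a brute-force illustration of that idea in plain Python, assuming Euclidean (L2) distance, which to my knowledge is the default for LangChain's FAISS wrapper. It is not how FAISS is implemented, and `top_k_by_l2` and the toy vectors are hypothetical.

```python
import math

def top_k_by_l2(query_vec, doc_vecs, k):
    """Return indices of the k vectors closest to query_vec (Euclidean distance)."""
    distances = [
        (math.dist(query_vec, vec), idx) for idx, vec in enumerate(doc_vecs)
    ]
    distances.sort()  # smallest distance first
    return [idx for _, idx in distances[:k]]

# Toy 2-D "embeddings"; real embeddings have hundreds of dimensions.
doc_vecs = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0), (0.9, 1.2)]
nearest = top_k_by_l2((1.0, 1.0), doc_vecs, k=2)
print(nearest)
```

A larger `k` gives the LLM more candidate movies to choose from, at the cost of a longer prompt.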

## LLM

Load the LLM with 4-bit quantization.

In [9]:
import torch

from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
from transformers import pipeline

from langchain_community.llms.huggingface_pipeline import HuggingFacePipeline

# load the quantization config
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
)

model = AutoModelForCausalLM.from_pretrained(Config.LLM_MODEL_NAME, quantization_config=bnb_config, device_map="cuda:0")
tokenizer = AutoTokenizer.from_pretrained(Config.LLM_MODEL_NAME)

# init the pipeline
READER_LLM = pipeline("text-generation", model=model, tokenizer=tokenizer, max_new_tokens=200)

llm = HuggingFacePipeline(
    pipeline=READER_LLM
)
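
A back-of-the-envelope calculation shows why 4-bit loading matters on a single GPU. The sketch below assumes roughly 7 billion weights (taken from the "7b" in the model name) and that every weight is quantized. It ignores quantization constants, layers kept in higher precision, activations, and the KV cache, so real usage will be higher. `weight_memory_gib` is a hypothetical helper.

```python
def weight_memory_gib(num_params: float, bits_per_param: float) -> float:
    """Approximate memory needed to hold the weights alone, in GiB."""
    return num_params * bits_per_param / 8 / 1024**3

# Assumption: roughly 7 billion weights, from the "7b" in the model name.
num_params = 7e9

fp16_gib = weight_memory_gib(num_params, 16)
nf4_gib = weight_memory_gib(num_params, 4)
print(f"fp16: {fp16_gib:.1f} GiB, 4-bit: {nf4_gib:.1f} GiB")
```

Under these assumptions the 4-bit weights take about a quarter of the memory of the fp16 weights.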

Initialize the prompt template for the query chain. The query chain derives a search query from the chat history. You may change the prompt to get better results.

In [26]:
from langchain.prompts import PromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough

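class LoggerStrOutputParser(StrOutputParser):
    def parse(self, text: str) -> str:
        # keep only the text after the last assistant tag
        text = text.split("<|assistant|>\n")[-1]
        # log the parsed text; both chains use this parser, so the final
        # answer is also printed with the "QUERY:" prefix
        print(f"QUERY: {text}")
        return text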

query_transform_prompt = PromptTemplate(
    input_variables=["messages"],
    template="""<|system|>You are a helpful assistant.
{messages}
<|user|>
give me ONE simple search query for user's last request in above conversation.
<|assistant|>"""
)

# init the query chain
query_transforming_retriever_chain = (
    {"messages": RunnablePassthrough()}
    | query_transform_prompt
    | llm
    | LoggerStrOutputParser()
)
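
The parser's post-processing step can be tried on its own. The sketch below isolates the split on the assistant tag and runs it on a made-up generation. Whether the raw LLM output still contains the prompt depends on the pipeline settings, so the `raw` string and the `extract_answer` helper are assumptions for illustration.

```python
ASSISTANT_TAG = "<|assistant|>\n"

def extract_answer(generated: str) -> str:
    """Keep what follows the last assistant tag; return the input unchanged if absent."""
    return generated.split(ASSISTANT_TAG)[-1]

# Hypothetical raw generation that still contains the prompt.
raw = (
    "<|system|>You are a helpful assistant.\n"
    "<|user|>\ngive me ONE simple search query\n"
    "<|assistant|>\ngangster movies"
)
print(extract_answer(raw))
```

Because the split keeps the last piece, text without the tag passes through unchanged.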

Initialize the main retrieval chain, which passes the retrieved documents to the LLM and returns its output.

In [27]:
from langchain.chains.combine_documents import create_stuff_documents_chain


prompt = PromptTemplate(
    input_variables=["context", "messages"],
    template="""<|system|>You are a helpful assistant.

Here are the movies you MUST choose from:

{context}
-----------------
{messages}
<|assistant|>""")

# init the retrieval chain; the invoke() input is sent to the retriever and also fills {messages}
retrieval_chain = (
    {"context": retriever, "messages": RunnablePassthrough()}
    | prompt
    | llm
    | LoggerStrOutputParser()
)
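
In the chain above, `{context}` receives the retriever's output directly, so the prompt contains the string form of a list of document objects, metadata included. One common alternative is to join only the text of each document first. The sketch below shows such a helper with a minimal stand-in class. `Doc` and `format_docs` are hypothetical names, and in a real chain the function would likely be piped after the retriever, as in `retriever | format_docs`.

```python
from dataclasses import dataclass

@dataclass
class Doc:
    """Hypothetical stand-in for a retrieved document with a page_content field."""
    page_content: str

def format_docs(docs) -> str:
    """Join the text of each document, separated by a blank line."""
    return "\n\n".join(doc.page_content for doc in docs)

docs = [
    Doc("title: Movie A\nrating: 7.1"),
    Doc("title: Movie B\nrating: 6.4"),
]
context = format_docs(docs)
print(context)
```

A cleaner context uses fewer prompt tokens and may make it easier for the LLM to stick to the listed movies.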

Write a conversation helper class for easier testing.

In [23]:
class Conversation:
    def __init__(self):
        self.messages = []

    def add_assistant_message(self, message):
        self.messages.append(('assistant', message))

    def add_user_message(self, message):
        self.messages.append(('user', message))

    def get_messages(self):
        # concatenate the messages with the roles in the instruction format
        return "\n".join([f'<|{message[0]}|>\n"{message[1]}."' for message in self.messages])

    def chat(self, message):
        self.add_user_message(message)
        messages = self.get_messages()
        # turn the chat history into a search query, then answer from the retrieved documents
        query = query_transforming_retriever_chain.invoke(messages)
        response = retrieval_chain.invoke(query)
        self.add_assistant_message(response)
        return response
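
To check what the query chain actually receives, the formatting in `get_messages` can be reproduced without loading any model. The sketch below mirrors that method as a standalone function and applies it to a made-up history. `format_messages` and the sample turns are for illustration only.

```python
def format_messages(messages):
    """Mirror of Conversation.get_messages for a list of (role, text) pairs."""
    return "\n".join(f'<|{role}|>\n"{text}."' for role, text in messages)

# Hypothetical chat history.
history = [
    ("user", "give me a cool gangster movie"),
    ("assistant", "Try American Gangster"),
    ("user", "give me a newer one"),
]
formatted = format_messages(history)
print(formatted)
```

Note that each message is wrapped in double quotes and gets a trailing period, so a message that already ends with punctuation will end up with two marks.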

## Test

Talk with the RAG pipeline to see how well it performs.

In [28]:
c = Conversation()
A = c.chat('give me a cool gangster movie')
# print(A)

QUERY: "recommend a stylish and gritty gangster movie for me to watch"
QUERY: Based on your preference for a stylish and gritty gangster movie, I would recommend "American Gangster" (2007) starring Denzel Washington and Russell Crowe. This biographical crime drama is based on the true story of Frank Lucas, a Harlem drug lord who rose to power in the 1970s. The movie is known for its gritty and realistic portrayal of the criminal underworld, as well as its stylish cinematography and powerful performances by the lead actors. With a rating of 7.8 on IMDb, "American Gangster" is a must-watch for any fan of the gangster genre.

If you're interested in a more classic example of the genre, you might also consider "Oscar" (1991), starring Marlon Brando and Johnny Depp. This crime comedy follows the story of a gangster trying to keep a promise he


In [29]:
A = c.chat('give me a newer one')
# print(A)

QUERY: "Recommend a more recent gangster movie similar to 'American Gangster' with a rating of 7.8 or higher on IMDb."
QUERY: Based on your criteria, I would recommend the movie "Blitz" (2011) as a more recent gangster movie similar to "American Gangster" with a rating of 7.8 or higher on IMDb. While "Blitz" is not specifically labeled as a documentary, it is a crime thriller that is based on true events, making it a biographical film in a similar vein to "American Gangster". The movie follows a tough cop who is determined to bring down a ruthless gang leader, and it has received critical acclaim with a rating of 7.8 on IMDb. Some of the genres listed on IMDb for "Blitz" include crime, drama, and thriller, which are all similar to the genres listed for "American Gangster". I hope this recommendation helps, and I hope you enjoy watching "Blitz"!
