# RAG
[colab link](https://colab.research.google.com/drive/1NqjR4Ykko7f_Ue9p4eQ93E4qPM2TyD5e?usp=sharing)

## Requirements

In [4]:
%%capture
!pip install transformers accelerate>=0.20.3 bitsandbytes langchain langchain-community faiss-gpu pandas gdown protobuf

In [1]:
%%capture
!pip install sentence-transformers

In [2]:
from sentence_transformers import SentenceTransformer

  from tqdm.autonotebook import tqdm, trange


In [9]:
!pip install datasets

Collecting datasets
  Downloading datasets-2.20.0-py3-none-any.whl (547 kB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m547.8/547.8 kB[0m [31m8.2 MB/s[0m eta [36m0:00:00[0m
Collecting pyarrow>=15.0.0 (from datasets)
  Downloading pyarrow-16.1.0-cp310-cp310-manylinux_2_28_x86_64.whl (40.8 MB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m40.8/40.8 MB[0m [31m21.3 MB/s[0m eta [36m0:00:00[0m
Collecting dill<0.3.9,>=0.3.0 (from datasets)
  Downloading dill-0.3.8-py3-none-any.whl (116 kB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m116.3/116.3 kB[0m [31m16.4 MB/s[0m eta [36m0:00:00[0m
Collecting requests>=2.32.2 (from datasets)
  Downloading requests-2.32.3-py3-none-any.whl (64 kB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m64.9/64.9 kB[0m [31m8.4 MB/s[0m eta [36m0:00:00[0m
Collecting xxhash (from datasets)
  Downloading xxhash-3.4.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1

## Dataset

In [5]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [2]:
%cd drive/MyDrive/

/content/drive/MyDrive


In [None]:
!gdown --fuzzy https://drive.google.com/file/d/1Lq2zVJlN_B4kUAu4VafQ4jXMIQiAR9vI/view?usp=sharing

Downloading...
From (original): https://drive.google.com/uc?id=1Lq2zVJlN_B4kUAu4VafQ4jXMIQiAR9vI
From (redirected): https://drive.google.com/uc?id=1Lq2zVJlN_B4kUAu4VafQ4jXMIQiAR9vI&confirm=t&uuid=ee13769f-40be-449d-9834-c73fdd518996
To: /content/IMDB_crawled.json
100% 292M/292M [00:02<00:00, 125MB/s] 


## Config

In [3]:
class Config:
    EMBEDDING_MODEL_NAME="thenlper/gte-base"
    LLM_MODEL_NAME="HuggingFaceH4/zephyr-7b-beta"
    K = 5 # top K retrieval

## Preprocessing

In [11]:
import pandas as pd

df = pd.read_json('IMDB_crawled.json')

In [12]:
import os
import numpy as np

os.makedirs('data', exist_ok=True)

# preprocess your data and only store the needed data as the context window for embedding model is limited

df = df[['title', 'first_page_summary', 'rating', 'reviews', 'genres']]
df = df.replace(to_replace='None', value=np.nan).dropna()

def concatenate(x):
  if len(x) > 0:
    return x[0].strip()
  return ''

def concatenate_reviews(x):
  if len(x) > 0:
    return x[0][0].strip()
  return ''

df['reviews'] = df['reviews'].apply(concatenate_reviews)
df['genres'] = df['genres'].apply(concatenate)

df.to_csv('data/imdb.csv', index=False)

## Vectorizer

load the CSV file and vectorize the rows using HuggingFaceEmbeddings.
Store the results using FAISS vectorstore.
Save the vectorestore in a pickle file for future usages.

In [4]:
import pickle

from langchain.document_loaders.csv_loader import CSVLoader
from langchain.vectorstores.utils import DistanceStrategy
from langchain.vectorstores.faiss import FAISS

from langchain_community.embeddings import HuggingFaceEmbeddings

# load the csv

loader = CSVLoader(file_path="data/imdb.csv")
documents = loader.load()

# load the embeddings model

embeddings = HuggingFaceEmbeddings(model_name=Config.EMBEDDING_MODEL_NAME)

# save embed the documents using the model in a vectorstore

db = FAISS.from_documents(documents, embeddings)
db.save_local('data/vectorstore')

  warn_deprecated(
  from tqdm.autonotebook import tqdm, trange
The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


modules.json:   0%|          | 0.00/385 [00:00<?, ?B/s]

README.md:   0%|          | 0.00/68.1k [00:00<?, ?B/s]

sentence_bert_config.json:   0%|          | 0.00/57.0 [00:00<?, ?B/s]



config.json:   0%|          | 0.00/618 [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/219M [00:00<?, ?B/s]

tokenizer_config.json:   0%|          | 0.00/314 [00:00<?, ?B/s]

vocab.txt:   0%|          | 0.00/232k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/712k [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/125 [00:00<?, ?B/s]

1_Pooling/config.json:   0%|          | 0.00/190 [00:00<?, ?B/s]

load the vectorstore as a retriever.

In [10]:
# load the retriever from the vectorstore

new_db = FAISS.load_local('data/vectorstore', embeddings, allow_dangerous_deserialization=True)
retriever = new_db.as_retriever()

## LLM

load the quantized LLM.

In [16]:
import torch

from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
from transformers import pipeline

from langchain_community.llms.huggingface_pipeline import HuggingFacePipeline

# load the quantization config
bnb_config = BitsAndBytesConfig(load_in_4bit=True)

model = AutoModelForCausalLM.from_pretrained(Config.LLM_MODEL_NAME, quantization_config=bnb_config, device_map="cuda:0")
tokenizer = AutoTokenizer.from_pretrained(Config.LLM_MODEL_NAME)

# init the pipeline
READER_LLM = pipeline("text-generation", model=model, tokenizer=tokenizer, max_new_tokens=1000, top_k=Config.K, temperature=0.1)

llm = HuggingFacePipeline(
    pipeline=READER_LLM,
)

config.json:   0%|          | 0.00/638 [00:00<?, ?B/s]

model.safetensors.index.json:   0%|          | 0.00/23.9k [00:00<?, ?B/s]

Downloading shards:   0%|          | 0/8 [00:00<?, ?it/s]

model-00001-of-00008.safetensors:   0%|          | 0.00/1.89G [00:00<?, ?B/s]

model-00002-of-00008.safetensors:   0%|          | 0.00/1.95G [00:00<?, ?B/s]

model-00003-of-00008.safetensors:   0%|          | 0.00/1.98G [00:00<?, ?B/s]

model-00004-of-00008.safetensors:   0%|          | 0.00/1.95G [00:00<?, ?B/s]

model-00005-of-00008.safetensors:   0%|          | 0.00/1.98G [00:00<?, ?B/s]

model-00006-of-00008.safetensors:   0%|          | 0.00/1.95G [00:00<?, ?B/s]

model-00007-of-00008.safetensors:   0%|          | 0.00/1.98G [00:00<?, ?B/s]

model-00008-of-00008.safetensors:   0%|          | 0.00/816M [00:00<?, ?B/s]

Loading checkpoint shards:   0%|          | 0/8 [00:00<?, ?it/s]

generation_config.json:   0%|          | 0.00/111 [00:00<?, ?B/s]

tokenizer_config.json:   0%|          | 0.00/1.43k [00:00<?, ?B/s]

tokenizer.model:   0%|          | 0.00/493k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/1.80M [00:00<?, ?B/s]

added_tokens.json:   0%|          | 0.00/42.0 [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/168 [00:00<?, ?B/s]

  warn_deprecated(


initialize the prompt template for the query chain. query chain is used to get a query from the chat history. you may change the prompt as you like to get better results.

In [25]:
from langchain.prompts import PromptTemplate

from langchain_core.output_parsers import StrOutputParser

class LoggerStrOutputParser(StrOutputParser):
    def parse(self, text: str) -> str:
        # process the LLM output
        text = text.split('\n')[-1]
        print(f"QUERY: {text}")
        return text

query_transform_prompt = PromptTemplate(
    input_variables=["messages"],
    template="""<|system|>You are a helpful assistant.
{messages}
<|user|>
give me the search query about the above conversation.
<|assistant|>"""
)

# init the query chain
query_transforming_retriever_chain = query_transform_prompt | llm | LoggerStrOutputParser() | retriever

initialize the main retrieval chain that gives the resulting documents to LLM and gets the output back.

In [29]:
from langchain.chains.combine_documents import create_stuff_documents_chain

from langchain_core.runnables import RunnablePassthrough

prompt = PromptTemplate(
    input_variables=["context", "messages"],
    template="""<|system|>You are a helpful assistant.

Here are the movies you MUST choose from:

{context}
-----------------
{messages}
<|assistant|>""")

# init the retriver chain
retrieval_chain = create_stuff_documents_chain(llm, prompt)

def get_movies(messages):
  docs = query_transforming_retriever_chain.invoke(messages)
  movies = retrieval_chain.invoke({'context': docs, 'messages': messages})
  return movies

write the conversation helper class for easier testing.

In [30]:
class Conversation:
    def __init__(self):
        self.messages = []

    def add_assistant_message(self, message):
        self.messages.append(('assistant', message))

    def add_user_message(self, message):
        self.messages.append(('user', message))

    def get_messages(self):
        # concatenate the messages with the roles in the instruction format
        messages_text = ''
        for person, message in self.messages:
          messages_text += f'<|{person}|>\n{message}'
        return messages_text

    def chat(self, message):
        self.add_user_message(message)
        messages = self.get_messages()
        # invoke the chain
        response = get_movies(messages)
        response = response.split('<|assistant|>')[-1]
        self.add_assistant_message(response)
        return response

## Test

talk with the RAG to see how good it performs.

In [31]:
import warnings
warnings.filterwarnings("ignore")

c = Conversation()
A = c.chat('give me a cool gangster movie')
print(A)

QUERY: search query: "recommend a gripping gangster movie with a charismatic protagonist navigating the dangerous underworld of organized crime, filled with intense action sequences, gritty realism, and a captivating storyline that keeps you on the edge of your seat."

Based on your preferences and the movies you've already seen, I would highly recommend "Goodfellas" (1990) directed by Martin Scorsese. It's a classic gangster movie that tells the story of Henry Hill and his life in the mafia, covering his relationships with his wife Karen and his mob partners Jimmy Conway and Tommy DeVito. The movie is known for its impeccable pacing, storytelling, and entertainment value. It's a must-watch for any fan of the genre. Enjoy!


In [32]:
A = c.chat('give me a newer one')
print(A)

QUERY: - "Recent mafia movies with a similar focus on the use of music and cinematography to create a memorable and immersive experience as '

If you're looking for a more recent gangster movie, I would suggest "The Irishman" (2019) directed by Martin Scorsese. It's a crime drama based on the book "I Heard You Paint Houses" by Charles Brandt. The movie follows the story of Frank Sheeran, a hitman and union official, and his relationships with Jimmy Hoffa and the Bufalino crime family. It features an all-star cast including Robert De Niro, Al Pacino, and Joe Pesci. The movie is known for its impressive production values, including de-aging technology used to make the actors look younger in certain scenes. It's a long movie, but it's definitely worth watching if you're a fan of the genre. Enjoy!
