<a href="https://www.kaggle.com/code/aisuko/retrieval-augmented-generation-arg?scriptVersionId=166817782" target="_blank"><img align="left" alt="Kaggle" title="Open in Kaggle" src="https://kaggle.com/static/images/open-in-kaggle.svg"></a>

# Overview

We introduced **Retrieval Augmented Generation (RAG)** in the article [Retrieval Augmented Generation (RAG)](https://open.substack.com/pub/aisuko/p/retrieval-augmented-generationrag?r=fbe14&utm_campaign=post&utm_medium=web&showWelcomeOnShare=true). RAG lets LLMs reach beyond their static training knowledge by retrieving relevant information at query time. It draws on external sources, such as internal company documents, so that the answer to a query is grounded in relevant context.

In this notebook, we use RAG with its default database from the paper. Retrieval-augmented generation (RAG) models combine the strengths of pretrained dense passage retrieval (DPR) and seq2seq models. A RAG model retrieves documents, passes them to a seq2seq model, then marginalizes over the retrieved documents to generate outputs. The **retriever** and the **seq2seq model** are initialized from pretrained models and fine-tuned jointly, allowing both retrieval and generation to adapt to downstream tasks. The transformers classes used in this notebook are `AutoTokenizer` (which resolves to `RagTokenizer` for this checkpoint), `RagRetriever`, and `RagSequenceForGeneration`.
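To make the "retrieve, generate, marginalize" idea concrete, here is a minimal sketch in plain Python. It is not the transformers implementation. The function names and all numbers are made up for illustration. It contrasts the two marginalization schemes from the RAG paper: RAG-Sequence sums over documents once per output sequence, while RAG-Token sums over documents at every token position.

```python
import math

def softmax(scores):
    """Turn raw retrieval scores into a distribution p(z|x) over documents."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def rag_sequence_prob(doc_scores, token_probs):
    """RAG-Sequence: p(y|x) = sum_z p(z|x) * prod_i p(y_i | x, z, y_<i).

    token_probs[z][i] is the generator's probability of token i given doc z.
    """
    doc_probs = softmax(doc_scores)
    return sum(p_z * math.prod(probs) for p_z, probs in zip(doc_probs, token_probs))

def rag_token_prob(doc_scores, token_probs):
    """RAG-Token: p(y|x) = prod_i sum_z p(z|x) * p(y_i | x, z, y_<i)."""
    doc_probs = softmax(doc_scores)
    n_tokens = len(token_probs[0])
    return math.prod(
        sum(p_z * probs[i] for p_z, probs in zip(doc_probs, token_probs))
        for i in range(n_tokens)
    )

# Hypothetical toy values: 2 retrieved docs and a 2-token answer.
doc_scores = [0.0, 0.0]      # equal scores, so p(z|x) = [0.5, 0.5]
token_probs = [
    [0.9, 0.1],              # doc 0 supports token 1 but not token 2
    [0.1, 0.9],              # doc 1 supports token 2 but not token 1
]

seq = rag_sequence_prob(doc_scores, token_probs)
tok = rag_token_prob(doc_scores, token_probs)
print(f"RAG-Sequence: {seq:.3f}, RAG-Token: {tok:.3f}")
```

In this toy case RAG-Token assigns the answer a higher probability, because it can draw each token from a different document, whereas RAG-Sequence requires a single document to explain the whole sequence.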

In [1]:
%%capture
!conda install -c pytorch -c nvidia faiss-gpu=1.8.0 -y

In [2]:
%%capture
!pip install transformers==4.38.2
# !pip install accelerate==0.27.2
!pip install datasets==2.18.0 # Fix issue numpy attributes error of RagRetriever
# !pip install peft==0.9.0
# !pip install bitsandbytes==0.42.0

In [3]:
import os
import torch
import warnings
# for checking faiss-gpu
import faiss

from huggingface_hub import login
from kaggle_secrets import UserSecretsClient
user_secrets = UserSecretsClient()
login(token=user_secrets.get_secret("HUGGINGFACE_TOKEN"))


os.environ["MODEL_NAME"] = "facebook/rag-token-base"
os.environ["DATASET"] = ""

if torch.cuda.is_available():
    torch.backends.cudnn.deterministic=True
    # https://github.com/huggingface/transformers/issues/28731
    torch.backends.cuda.enable_mem_efficient_sdp(False)
    device='cuda'
else:
    device='cpu'
    
warnings.filterwarnings('ignore')

Token will not been saved to git credential helper. Pass `add_to_git_credential=True` if you want to set the git credential as well.
Token is valid (permission: write).
Your token has been saved to /root/.cache/huggingface/token
Login successful


In [4]:
from transformers import AutoTokenizer

tokenizer=AutoTokenizer.from_pretrained(os.getenv("MODEL_NAME"))

The tokenizer class you load from this checkpoint is not the same type as the class this function is called from. It may result in unexpected tokenization. 
The tokenizer class you load from this checkpoint is 'RagTokenizer'. 
The class this function is called from is 'DPRQuestionEncoderTokenizer'.
The tokenizer class you load from this checkpoint is not the same type as the class this function is called from. It may result in unexpected tokenization. 
The tokenizer class you load from this checkpoint is 'RagTokenizer'. 
The class this function is called from is 'DPRQuestionEncoderTokenizerFast'.


The tokenizer class you load from this checkpoint is not the same type as the class this function is called from. It may result in unexpected tokenization. 
The tokenizer class you load from this checkpoint is 'RagTokenizer'. 
The class this function is called from is 'BartTokenizer'.
The tokenizer class you load from this checkpoint is not the same type as the class this function is called from. It may result in unexpected tokenization. 
The tokenizer class you load from this checkpoint is 'RagTokenizer'. 
The class this function is called from is 'BartTokenizerFast'.


In [5]:
from transformers import RagRetriever

retriever=RagRetriever.from_pretrained(
    os.getenv("MODEL_NAME"), 
    index_name="exact", 
    use_dummy_dataset=True,
    trust_remote_code=True
)

The argument `trust_remote_code` is to be used with Auto classes. It has no effect here and is ignored.


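As a rough mental model of what the retriever does, DPR-style retrieval scores each passage by the inner product between the question embedding and the passage embedding, then keeps the top-k passages. The sketch below is a brute-force illustration in plain Python under that assumption. It is not how `RagRetriever` or FAISS is implemented, and the `retrieve` helper and the toy vectors are hypothetical.

```python
def inner_product(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def retrieve(query_vec, doc_vecs, k=2):
    """Return (doc_index, score) pairs for the k highest-scoring documents."""
    scored = [(i, inner_product(query_vec, d)) for i, d in enumerate(doc_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# Hypothetical 3-dimensional embeddings; real DPR embeddings are much larger.
doc_vecs = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.7, 0.7, 0.0],
]
query_vec = [0.9, 0.1, 0.0]

top = retrieve(query_vec, doc_vecs, k=2)
for idx, score in top:
    print(f"doc {idx}: score {score:.2f}")
```

A FAISS index serves the same purpose but is built to search millions of passages efficiently, which is why the notebook installs `faiss-gpu`.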
In [6]:
from transformers import RagSequenceForGeneration

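# `device` is not a from_pretrained argument, so the model is loaded on CPU here;
# call model.to(device) and move the inputs as well if GPU inference is needed.
model=RagSequenceForGeneration.from_pretrained(os.getenv("MODEL_NAME"), retriever=retriever)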
type(model)

transformers.models.rag.modeling_rag.RagSequenceForGeneration

In [7]:
input_ids=tokenizer("How many people live in Melbourne?", return_tensors="pt")

In [8]:
generated=model.generate(input_ids["input_ids"], max_length=50)
generated_str=tokenizer.decode(generated[0], skip_special_tokens=True)
generated_str

'The!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!'
