# Retrieval Augmented Generation

Retrieval Augmented Generation (RAG) is a useful way to tailor the knowledge of an LLM to your own documents and knowledge base. In this notebook I will build a RAG model with the Hugging Face Transformers library. The pretrained RAG model used here retrieves its supporting passages from Wikipedia text.
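To make the retrieve-then-generate idea concrete, here is a minimal, self-contained sketch in plain Python. It is not how the Transformers RAG model works internally: the word-overlap scorer is a hypothetical stand-in for the dense (DPR) retriever, the `DOCUMENTS` list is made-up data, and the generation step is left out, so the sketch stops at building the prompt a generator would condition on.

```python
import re
from collections import Counter

# Hypothetical knowledge base standing in for your own documents
DOCUMENTS = [
    "Canberra is the capital city of Australia.",
    "Sydney is the largest city in Australia by population.",
    "Wellington is the capital of New Zealand.",
]


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def overlap_score(query: str, doc: str) -> int:
    """Count shared words between query and document (crude stand-in for a dense retriever)."""
    q, d = Counter(tokenize(query)), Counter(tokenize(doc))
    return sum((q & d).values())


def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    """Return the k documents with the highest overlap score."""
    return sorted(docs, key=lambda doc: overlap_score(query, doc), reverse=True)[:k]


def build_prompt(query: str, passages: list[str]) -> str:
    """Prepend the retrieved passages to the question so a generator can condition on them."""
    context = "\n".join(f"- {p}" for p in passages)
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"


question = "What is the capital of Australia?"
passages = retrieve(question, DOCUMENTS)
print(build_prompt(question, passages))
```

Replacing `overlap_score` with an embedding similarity and sending the prompt to an LLM gives the basic shape of a RAG pipeline over your own documents.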

Useful Links:
<ul><li><a href="https://huggingface.co/transformers/v3.3.1/model_doc/rag.html" target="_blank">HF Transformers RAG Documentation</a></li>
    <li><a href="https://arxiv.org/abs/2005.11401" target="_blank">RAG Paper</a></li>
</ul>

Required Python Libraries:
<ul><li>transformers</li>
    <li>datasets</li>
    <li>faiss</li>
</ul>

In [1]:
# It's always a good idea to make sure you have the latest package versions
# (the CPU build of FAISS is published on PyPI as faiss-cpu)
#!pip install -U transformers faiss-cpu datasets

In [2]:
# Suppress Python warnings and limit Transformers logging to errors
import warnings
warnings.filterwarnings(action='ignore')

from transformers.utils import logging
logging.set_verbosity_error() 

In [3]:
from transformers import RagTokenizer, RagRetriever, RagTokenForGeneration

This pretrained model retrieves from the "wiki_dpr" dataset, which contains about 21M passages taken from a 2018 Wikipedia dump. The dataset takes up around 78GB of disk space, but the retrieval index can be loaded in a compressed form (`index_name="compressed"`) that needs significantly less disk space and RAM than the exact index. The compression is lossy, so retrieval is somewhat less accurate than with the exact index, and that trade-off should be taken into account when deploying RAG.
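The memory saving can be estimated with simple arithmetic, and the accuracy cost can be illustrated on a toy index. This sketch assumes 768-dimensional float32 passage embeddings (the output size of DPR's BERT-base encoder) and uses 8-bit scalar quantization as an illustrative compression scheme; the actual compressed wiki_dpr index may use a different FAISS quantizer, and the random vectors below are synthetic, not real DPR embeddings. It compares exact inner-product top-10 search with search over the quantized vectors.

```python
import random

# Back-of-the-envelope memory cost (assumes ~21M passages, 768-dim vectors)
NUM_PASSAGES = 21_000_000
EMBED_DIM = 768
float32_bytes = NUM_PASSAGES * EMBED_DIM * 4  # 4 bytes per float32 component
uint8_bytes = NUM_PASSAGES * EMBED_DIM * 1    # 1 byte per quantized component
print(f"float32: {float32_bytes / 1e9:.1f} GB, 8-bit: {uint8_bytes / 1e9:.1f} GB")


def quantize(vec, lo, hi):
    """Map each component in [lo, hi] to an integer code in 0..255."""
    step = (hi - lo) / 255
    return [round((x - lo) / step) for x in vec]


def dequantize(codes, lo, hi):
    """Reconstruct approximate floats from 8-bit codes."""
    step = (hi - lo) / 255
    return [lo + c * step for c in codes]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def top_k(query, vectors, k):
    """Exact maximum-inner-product search: indices of the k highest-scoring vectors."""
    scores = [dot(query, v) for v in vectors]
    return sorted(range(len(vectors)), key=lambda i: scores[i], reverse=True)[:k]


# Toy index of synthetic embeddings (small dimension keeps it fast)
rng = random.Random(0)
dim, n = 64, 200
vectors = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(n)]
query = [rng.uniform(-1, 1) for _ in range(dim)]

lo, hi = -1.0, 1.0
codes = [quantize(v, lo, hi) for v in vectors]   # what a compressed index would store
approx = [dequantize(c, lo, hi) for c in codes]  # lossy reconstruction used for scoring

exact_hits = top_k(query, vectors, 10)
compressed_hits = top_k(query, approx, 10)
overlap = len(set(exact_hits) & set(compressed_hits))
print(f"top-10 overlap, exact vs 8-bit: {overlap}/10")
```

On real data the overlap depends on the quantizer and the embedding distribution, so it is worth measuring retrieval quality on your own queries before settling on the compressed index.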

In [4]:
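model_name = "facebook/rag-token-nq"

# Wraps the question-encoder tokenizer (for inputs) and the generator tokenizer (for decoding)
tokenizer = RagTokenizer.from_pretrained(model_name)
# Retriever backed by the compressed wiki_dpr index
# (for a quick test, use_dummy_dataset=True loads a small dummy index instead)
retriever = RagRetriever.from_pretrained(model_name, dataset="wiki_dpr", index_name="compressed")
# Generator that fetches passages through the retriever when generating
model = RagTokenForGeneration.from_pretrained(model_name, retriever=retriever)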

Loading dataset shards:   0%|          | 0/161 [00:00<?, ?it/s]

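The model loaded above is `RagTokenForGeneration`, the RAG-Token variant described in the RAG paper linked at the top. In RAG-Token, each generated token is marginalized over the retrieved passages: the retriever's scores are turned into a probability for each passage, and the generator's next-token probabilities under each passage are averaged with those weights. The toy sketch below shows one such decoding step in plain Python; the scores, vocabulary, and probabilities are hypothetical, and this is not the Transformers implementation.

```python
import math


def softmax(scores):
    """Turn retrieval scores into a probability distribution over passages."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def rag_token_step(doc_scores, per_doc_token_probs):
    """One RAG-Token decoding step: p(y_i | x, y_<i) = sum_z p(z | x) * p(y_i | x, z, y_<i)."""
    doc_probs = softmax(doc_scores)
    vocab = per_doc_token_probs[0].keys()
    return {
        tok: sum(p_z * dist[tok] for p_z, dist in zip(doc_probs, per_doc_token_probs))
        for tok in vocab
    }


# Hypothetical numbers: two retrieved passages and a two-word vocabulary
doc_scores = [2.0, 0.0]  # e.g. question-passage inner products from the retriever
per_doc_token_probs = [
    {"canberra": 0.9, "sydney": 0.1},  # generator conditioned on passage 1
    {"canberra": 0.3, "sydney": 0.7},  # generator conditioned on passage 2
]
next_token = rag_token_step(doc_scores, per_doc_token_probs)
print({tok: round(p, 3) for tok, p in next_token.items()})
```

Because the mixing happens per token, RAG-Token can draw different tokens of an answer from different passages, whereas the RAG-Sequence variant in the same paper conditions the whole answer on one passage at a time before marginalizing.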
In [5]:
# Encode the question with the question-encoder tokenizer
input_dict = tokenizer("What is the capital of Australia?", return_tensors="pt")

# Retrieve supporting passages and generate an answer
generated = model.generate(input_ids=input_dict["input_ids"])
print(tokenizer.batch_decode(generated, skip_special_tokens=True)[0])

 canberra


In conclusion, RAG is an effective way to deploy LLMs in an enterprise setting, because it lets a model answer from a specific document collection. Related techniques, such as agents and tool use, are developing quickly, and RAG remains one of the most popular applications of LLMs.