## **RAG: Document Retrieval and Generation**

The goal of this notebook is to demonstrate the process of document retrieval and generation using a Retrieval-Augmented Generation (RAG) approach. It showcases how to index a set of documents using embeddings and FAISS, retrieve the most relevant documents based on a given query, and then generate a contextually accurate response by concatenating the retrieved documents with a prompt.

**1. Library Testing and GPU Availability**:

In this section, it will be tested and checked the availability of several libraries including:

- **scikit-learn** (sklearn)
- **FAISS** (Facebook AI Similarity Search)
- **Transformers** (Hugging Face)
- **TensorFlow**
- **PyTorch** (torch)

In [None]:
import sklearn
print(sklearn.__version__)

1.5.2


In [None]:
#!pip install faiss-gpu-cu11

In [3]:
import faiss
print(faiss.get_num_gpus())  # Should print the number of GPUs

1


In [16]:
from transformers import pipeline
print("Transformers library is working!")

Transformers library is working!


In [6]:
import tensorflow as tf
print("Is GPU available:", tf.config.list_physical_devices('GPU'))
print("Built with CUDA:", tf.test.is_built_with_cuda())
print("Built with cuDNN:", tf.test.is_built_with_gpu_support())
print("GPU Device Name:", tf.test.gpu_device_name())

Is GPU available: [PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]
Built with CUDA: True
Built with cuDNN: True
GPU Device Name: /device:GPU:0


2024-11-26 13:59:09.249212: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:894] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
2024-11-26 13:59:09.249910: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:894] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
2024-11-26 13:59:09.249993: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:894] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysf

In [15]:
import torch 
print(torch.__version__)
print(torch.cuda.is_available())

2.4.1
True


In [8]:
import torch
print("Is CUDA available:", torch.cuda.is_available())
print("cuDNN Version:", torch.backends.cudnn.version())
print("Is cuDNN Enabled:", torch.backends.cudnn.enabled)

Is CUDA available: True
cuDNN Version: 90100
Is cuDNN Enabled: True


#### **Retrieval-augmented generation (RAG)**

**2. Example Execution:**

A simple retrieval-augmented generation (RAG) pipeline is created using the Hugging Face Transformers library. A pre-trained model and tokenizer are loaded, followed by a retrieval task using FAISS, and text is generated based on the retrieved context.

In [None]:
# HuggingFace token
import os
from dotenv import load_dotenv

load_dotenv() # load variables from .env 

hf_token=os.getenv("HF_TOKEN")

In [None]:
# Import necessary libraries
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM, AutoModel
import faiss
import torch
import warnings
warnings.filterwarnings("ignore")
from sklearn.feature_extraction.text import TfidfVectorizer

# Data example
documents = [
    "A Torre Eiffel tem 324 metros de altura.",
    "A Estátua da Liberdade fica em Nova York.",
    "O Monte Everest é a montanha mais alta do mundo."
]

# Document Vectorization
tokenizer = AutoTokenizer.from_pretrained("sentence-transformers/all-MiniLM-L6-v2")
model = AutoModel.from_pretrained("sentence-transformers/all-MiniLM-L6-v2")

def embed(texts):
    inputs = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        embeddings = model(**inputs).last_hidden_state.mean(dim=1)
    return embeddings.numpy()

document_embeddings = embed(documents)   

# Indexing with FAISS
index = faiss.IndexFlatL2(document_embeddings.shape[1])
index.add(document_embeddings)

# Generator Model
generator_tokenizer = AutoTokenizer.from_pretrained("unicamp-dl/ptt5-base-portuguese-vocab") # available for pt-br language
generator_model = AutoModelForSeq2SeqLM.from_pretrained("unicamp-dl/ptt5-base-portuguese-vocab")

# Search and Generation Function
def rag(question, top_k=1):
    # Question Embedding
    question_embedding = embed([question])
    # Retrieval of the Most Similar Documents
    distances, indices = index.search(question_embedding, top_k)
    retrieved_docs = [documents[idx] for idx in indices[0]]
    # Concatenate Retrieved Documents to the Prompt
    input_text = " ".join(retrieved_docs) + " " + question
    inputs = generator_tokenizer.encode(input_text, return_tensors="pt", truncation=True)
    with warnings.catch_warnings():
       warnings.simplefilter("ignore")
    outputs = generator_model.generate(inputs, max_length=50, num_beams=2)
    answer = generator_tokenizer.decode(outputs[0], skip_special_tokens=True)
    return answer 

# Usage Example
question = "Qual é a altura da Torre Eiffel?"
print(rag(question))

Asking to truncate to max_length but no maximum length is provided and the model has no predefined maximum length. Default to no truncation.


A Torre Eiffel tem 324 metros de altura. Qual é a altura da Torre Eiffel?


In [20]:
from transformers import AutoTokenizer, AutoModelForMaskedLM, pipeline

# Load the model and tokenizer
model = AutoModelForMaskedLM.from_pretrained('neuralmind/bert-base-portuguese-cased')
tokenizer = AutoTokenizer.from_pretrained('neuralmind/bert-base-portuguese-cased', do_lower_case=False)

# Create the pipeline for the fill-mask task
pipe = pipeline('fill-mask', model=model, tokenizer=tokenizer, device=0)

# Test the pipeline
result = pipe('Tinha uma [MASK] no meio do caminho.')
print(result)

Some weights of the model checkpoint at neuralmind/bert-base-portuguese-cased were not used when initializing BertForMaskedLM: ['bert.pooler.dense.bias', 'bert.pooler.dense.weight', 'cls.seq_relationship.bias', 'cls.seq_relationship.weight']
- This IS expected if you are initializing BertForMaskedLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForMaskedLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


[{'score': 0.1428781896829605, 'token': 5028, 'token_str': 'pedra', 'sequence': 'Tinha uma pedra no meio do caminho.'}, {'score': 0.062133897095918655, 'token': 7411, 'token_str': 'árvore', 'sequence': 'Tinha uma árvore no meio do caminho.'}, {'score': 0.05514989048242569, 'token': 5675, 'token_str': 'estrada', 'sequence': 'Tinha uma estrada no meio do caminho.'}, {'score': 0.029918920248746872, 'token': 1105, 'token_str': 'casa', 'sequence': 'Tinha uma casa no meio do caminho.'}, {'score': 0.025660440325737, 'token': 3466, 'token_str': 'cruz', 'sequence': 'Tinha uma cruz no meio do caminho.'}]


In [None]:
# Another Example - RAG
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
from sentence_transformers import SentenceTransformer 
import faiss
import numpy as np

# Simulating a Document Database
documents = [
    "A energia renovável é obtida de recursos naturais que se regeneram naturalmente.",
    "Os principais tipo de energia renovável são solar, eólica, hidroelétrica e biomassa.",
    "A energia renovável ajuda a reduzir a emissão de gases de efeito estufa."
    "Investir em energia renovável pode impulsionar a economia verde."
]

# Step 1: Document Indexing Using Embeddings"
embedder = SentenceTransformer('all-MiniLM-L6-v2')
document_embeddings = embedder.encode(documents)

# Creating the FAISS Index"
d = document_embeddings.shape[1]
index = faiss.IndexFlatL2(d)
index.add(document_embeddings)

# Function to Retrieve Relevant Documents
def retrieve_documents(query, k=2):
    query_embedding = embedder.encode([query])
    distances, indices = index.search(query_embedding, k)
    retrieved_docs = [documents[idx] for idx in indices[0]]
    return retrieved_docs

# Load T5 tokenizer and model fine-tuned on Portuguese
generator_tokenizer = AutoTokenizer.from_pretrained("unicamp-dl/ptt5-base-portuguese-vocab")
generator_model = AutoModelForSeq2SeqLM.from_pretrained("unicamp-dl/ptt5-base-portuguese-vocab")

# Function to Generate the Response
def generate_answer(question, retrieved_docs):
    context = " ".join(retrieved_docs)
    input_text = f"summarize: question: {question} context: {context}"
    inputs = generator_tokenizer(input_text, return_tensors="pt", max_length=512, truncation=True)
    outputs = generator_model.generate(**inputs, max_length=150, num_beams=5, early_stopping=True)
    answer = generator_tokenizer.decode(outputs[0], skip_special_tokens=True)
    return answer

# Usage Example
question = "Fale sobre energia renovável"
retrieved_docs = retrieve_documents(question)
answer = generate_answer(question, retrieved_docs)
print("Pergunta:", question)
print("Resposta:", answer)

Pergunta: Fale sobre energia renovável
Resposta: summarize: question: Fale sobre energia renovável context: A energia renovável ajuda a reduzir a emissão de gases de efeito estufa.Investir em energia renovável pode impulsionar a economia verde. A energia renovável é obtida de recursos naturais que se regeneram naturalmente.
