<a href="https://colab.research.google.com/github/aliikhwan99/RAG/blob/main/building_basic_RAG.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

**RAG (Retrieval-Augmented Generation)** is a framework that **combines** retrieval-based methods with **generative models**. It enhances a **generative model** (such as OpenAI's GPT or Google's T5) by supplying **external knowledge** sources during generation, which improves factual accuracy and relevance.

# What is RAG?

* Retrieval-Augmented: The system retrieves relevant information from a knowledge base or external documents (like Wikipedia, custom datasets, etc.) to provide context.

* Generation: A generative model uses the retrieved information to produce coherent and contextually enriched responses or outputs.

* RAG frameworks are widely used in applications like question answering, summarization, and document generation where factual accuracy is critical.
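
To make the two halves concrete, here is a toy sketch of the retrieve-then-generate flow. It uses no ML models: `retrieve` ranks documents by shared words as a stand-in for vector search, and `generate` only builds the augmented prompt a real model would receive. Both function names and the sample knowledge base are hypothetical, chosen for illustration.

```python
# Toy illustration of the retrieve-then-generate flow (no ML models involved).

def retrieve(query, knowledge_base, k=1):
    """Rank documents by how many query words they share (stand-in for vector search)."""
    query_words = set(query.lower().split())
    scored = [(len(query_words & set(doc.lower().split())), doc) for doc in knowledge_base]
    # Highest overlap first; Python's sort is stable, so ties keep their original order
    scored.sort(key=lambda pair: pair[0], reverse=True)
    # Drop documents that share no words with the query
    return [doc for score, doc in scored[:k] if score > 0]


def generate(query, context_docs):
    """Stand-in for a generative model: returns the augmented prompt it would be given."""
    context = " ".join(context_docs) if context_docs else "(no context found)"
    return f"Question: {query} Context: {context}"


knowledge_base = [
    "RAG combines retrieval with generation.",
    "FAISS performs vector search.",
]
print(generate("what is rag", retrieve("what is rag", knowledge_base)))
```

The steps below replace each stand-in with a real component: embeddings and faiss for `retrieve`, and a seq2seq model for `generate`.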

# How to Build RAG in Python

You can build a RAG pipeline using libraries like Hugging Face Transformers, faiss (for dense vector search), and OpenAI embeddings. Here's a high-level approach:

# Step 1: Define Your Knowledge Base

* Collect your data: This could be documents, articles, or any textual information you want the model to retrieve from.
* Preprocess the data: Tokenize, clean, and structure it for embedding generation.
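
One common way to structure text for embedding is to split long documents into overlapping chunks. The following is a minimal sketch under that assumption; `clean_text`, `chunk_words`, and the default sizes are hypothetical choices rather than part of any library.

```python
import re


def clean_text(text):
    """Collapse runs of whitespace and strip leading/trailing spaces."""
    return re.sub(r"\s+", " ", text).strip()


def chunk_words(text, chunk_size=50, overlap=10):
    """Split text into chunks of at most chunk_size words, sharing `overlap` words between neighbors."""
    if chunk_size <= 0 or overlap < 0 or overlap >= chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    words = clean_text(text).split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once the current chunk reaches the end of the text
        if start + chunk_size >= len(words):
            break
    return chunks


sample = "Retrieval  augmented generation\ncombines search with a generative model."
for chunk in chunk_words(sample, chunk_size=4, overlap=1):
    print(chunk)
```

The overlap keeps a sentence that straddles a chunk boundary retrievable from either side; suitable sizes depend on the embedding model's input limit.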

# Step 2: Embed the Knowledge Base

Use embeddings to convert your textual data into vector representations.

* Libraries: Use sentence-transformers (e.g., sentence-transformers/all-mpnet-base-v2) or OpenAI's embedding models.

In [None]:
%pip install sentence-transformers


In [None]:
import sentence_transformers
print("sentence-transformers version:", sentence_transformers.__version__)

In [None]:
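from sentence_transformers import SentenceTransformer

# Load the embedding model under its own name so the generative model loaded in Step 4 does not overwrite it
embedding_model = SentenceTransformer('all-mpnet-base-v2')

# Sample data
documents = ["This is a document about AI.", "Another document on machine learning."]

# Encode the documents into 768-dimensional vectors (returned as a torch tensor)
embeddings = embedding_model.encode(documents, convert_to_tensor=True)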
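
Embeddings are useful because texts with similar meaning map to nearby vectors, so relevance becomes a geometric comparison. As a rough sketch of that idea, the helper below computes cosine similarity in plain Python. The three 3-dimensional vectors are made-up toy values, not real model output.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # Convention chosen here: a zero vector is similar to nothing
    return dot / (norm_a * norm_b)


# Toy vectors standing in for real embeddings
ai_doc = [0.9, 0.1, 0.0]
ml_doc = [0.8, 0.3, 0.1]
cooking_doc = [0.0, 0.2, 0.9]

print("AI vs ML:     ", cosine_similarity(ai_doc, ml_doc))
print("AI vs cooking:", cosine_similarity(ai_doc, cooking_doc))
```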

# Step 3: Set Up a Retrieval System
Store the embeddings for fast retrieval.

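Use faiss for nearest-neighbor search. The `IndexFlatL2` index used here performs exact (brute-force) search; faiss also provides approximate nearest neighbor (ANN) indexes for larger collections.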

In [None]:
%pip install faiss-cpu

In [None]:
import faiss
import numpy as np

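# Convert the embeddings tensor to a float32 NumPy array (FAISS expects float32);
# .cpu() keeps this working when the tensor lives on a GPU
embeddings_np = embeddings.detach().cpu().numpy().astype("float32")

# Build a flat index that compares vectors by L2 (Euclidean) distance
index = faiss.IndexFlatL2(embeddings_np.shape[1])
index.add(embeddings_np)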
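
To show what a flat L2 index computes, here is a pure-Python sketch of exact nearest-neighbor search. `l2_search` is a hypothetical helper, not a faiss function. It returns squared distances and positions, which mirrors the pair that `index.search` returns for `IndexFlatL2`.

```python
def l2_search(vectors, query, k=1):
    """Return (squared_distances, indices) for the k stored vectors closest to query."""
    scored = []
    for i, vec in enumerate(vectors):
        # Squared Euclidean distance between the stored vector and the query
        dist = sum((x - q) ** 2 for x, q in zip(vec, query))
        scored.append((dist, i))
    scored.sort()  # Smallest distance first
    top = scored[:k]
    return [d for d, _ in top], [i for _, i in top]


# Toy 2-dimensional vectors standing in for document embeddings
vectors = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
distances, indices = l2_search(vectors, [0.9, 1.2], k=2)
print(indices, distances)
```

This compares the query against every stored vector, which is fine for small collections; approximate indexes trade some accuracy for speed at larger scale.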


# Step 4: Build the Generative Model
Use a pre-trained generative model (like GPT or T5), optionally fine-tuned on your own data.

Load it through Hugging Face's transformers library; the cell below uses the small `t5-small` checkpoint.

In [None]:
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("t5-small")
model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")

# Step 5: Combine Retrieval and Generation
**1. Accept a query and retrieve the top-k relevant documents using faiss:**

In [None]:
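from sentence_transformers import SentenceTransformer
import numpy as np

# Load SentenceTransformer model for embeddings
embedding_model = SentenceTransformer('all-mpnet-base-v2')

# Generate query embedding
query = "What is AI?"
query_embedding = embedding_model.encode([query])[0]  # This produces a NumPy array

# Search the FAISS index from Step 3 for the k nearest document vectors
k = 1
distances, indices = index.search(np.asarray([query_embedding], dtype="float32"), k)

# Map the returned positions back to documents (FAISS pads missing results with -1)
retrieved_docs = [documents[i] for i in indices[0] if i != -1]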


**2. Append the retrieved documents to the query and generate a response:**

In [None]:
input_text = f"Question: {query} Context: {retrieved_docs[0]}"
inputs = tokenizer(input_text, return_tensors="pt", max_length=512, truncation=True)
outputs = model.generate(**inputs)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
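
The cell above passes only the first retrieved document to the model. When several documents are retrieved, one option is to join them under a size budget before tokenizing. The sketch below assumes a simple character budget; `build_prompt` and `max_context_chars` are hypothetical names, and the prompt layout copies the one used above.

```python
def build_prompt(query, retrieved_docs, max_context_chars=1000):
    """Join retrieved documents, in rank order, until the character budget is used up."""
    context_parts = []
    used = 0
    for doc in retrieved_docs:
        # Stop at the first document that would exceed the budget
        if used + len(doc) > max_context_chars:
            break
        context_parts.append(doc)
        used += len(doc)
    context = " ".join(context_parts)
    return f"Question: {query} Context: {context}"


docs = ["This is a document about AI.", "Another document on machine learning."]
print(build_prompt("What is AI?", docs))
```

The tokenizer's `truncation=True` still applies afterwards, so the budget here only controls which documents are included, not the final token count.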

In [None]:
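# Simulate an empty retrieval result to show how to guard the prompt-building step
retrieved_docs = []

# Only build the prompt when at least one document came back
if retrieved_docs:
    input_text = f"Question: {query} Context: {retrieved_docs[0]}"
else:
    print("No documents retrieved. Please check the retrieval process.")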


In [None]:
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
import faiss
import numpy as np

# Initialize the model and tokenizer
model_name = "facebook/bart-large-cnn"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

# Example documents
documents = ["This is a document about AI.", "Another document on machine learning."]

# Mock embeddings: random vectors stand in for real ones, so the document retrieved here is arbitrary
embeddings = np.random.rand(len(documents), 768).astype("float32")  # Mock embeddings
index = faiss.IndexFlatL2(768)  # Initialize FAISS index (768 matches all-mpnet-base-v2)
index.add(embeddings)

# Query
query = "What is AI?"
query_embedding = np.random.rand(768).astype("float32")  # Mock query embedding
D, I = index.search(np.array([query_embedding]), k=1)

# Retrieve documents
retrieved_docs = [documents[i] for i in I[0]]

# Generate response
if retrieved_docs:
    input_text = f"Question: {query} Context: {retrieved_docs[0]}"
    inputs = tokenizer(input_text, return_tensors="pt", max_length=512, truncation=True)
    outputs = model.generate(**inputs)
    response = tokenizer.decode(outputs[0], skip_special_tokens=True)
    print(response)
else:
    print("No documents retrieved.")
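
The cell above can be folded into one reusable function. The following is a rough sketch in which the embedding, search, and generation steps are passed in as callables, so real models or test stubs can be swapped freely. `rag_answer` and its parameter names are hypothetical, and the lambdas in the usage lines are stubs rather than real models.

```python
def rag_answer(query, embed, search, documents, generate, k=1):
    """Embed the query, fetch the top-k documents, and generate from the augmented prompt.

    embed(query) -> vector, search(vector, k) -> list of document positions,
    generate(prompt) -> text. Returns None when nothing is retrieved.
    """
    query_vector = embed(query)
    doc_ids = search(query_vector, k)
    # Skip out-of-range positions such as the -1 that FAISS returns for missing results
    retrieved_docs = [documents[i] for i in doc_ids if 0 <= i < len(documents)]
    if not retrieved_docs:
        return None
    prompt = f"Question: {query} Context: {' '.join(retrieved_docs)}"
    return generate(prompt)


# Usage with stand-in callables (stubs, not real models)
documents = ["This is a document about AI.", "Another document on machine learning."]
answer = rag_answer(
    "What is AI?",
    embed=lambda q: [0.0],         # stub embedding
    search=lambda vec, k: [0],     # stub search: always returns the first document
    documents=documents,
    generate=lambda prompt: prompt,  # stub generator: echoes the prompt
)
print(answer)
```

With the real components, `embed` would wrap `embedding_model.encode`, `search` would wrap `index.search`, and `generate` would wrap the tokenizer and `model.generate` calls.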
