<a href="https://colab.research.google.com/github/bhanuchaddha/Understanding-RAG/blob/main/BasicRAG.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Basic RAG (Retrieval-Augmented Generation)**

### **Introduction:**
Retrieval-Augmented Generation (RAG) combines information retrieval with text generation: a vector search finds the documents most relevant to a query, and a pre-trained language model generates a response conditioned on them. This example uses:
- **BERT** (`bert-base-uncased`) to turn documents and queries into embeddings.
- **FAISS** (Facebook AI Similarity Search) to index document embeddings and retrieve the most relevant ones.
- **GPT-Neo** for text generation based on the query and retrieved documents.
- **Gradio** to build a simple interactive UI to input a query and get a response.
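
The retrieve-then-generate flow can be sketched without any of these libraries. The toy example below is an illustration only: `toy_embed`, `toy_retrieve`, and `toy_generate` are hypothetical stand-ins (word-count vectors and a template string) for the BERT embeddings, FAISS search, and GPT-Neo generation used in the rest of this notebook, and the two documents are made up.

```python
from collections import Counter
import math

def toy_embed(text):
    # Hypothetical stand-in for a neural embedding: a bag-of-words count vector.
    return Counter(text.lower().replace(".", "").split())

def cosine(a, b):
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def toy_retrieve(query, docs, top_k=1):
    # Rank documents by similarity to the query and keep the best top_k.
    q = toy_embed(query)
    ranked = sorted(docs, key=lambda d: cosine(q, toy_embed(d)), reverse=True)
    return ranked[:top_k]

def toy_generate(query, context):
    # Hypothetical stand-in for a language model: it only formats the prompt.
    return f"Q: {query}\nContext: {' '.join(context)}"

docs = ["Cats sleep most of the day.", "Self-driving cars use AI."]
context = toy_retrieve("How do self-driving cars work?", docs)
print(toy_generate("How do self-driving cars work?", context))
```

The real pipeline follows the same three stages; only the quality of each stage changes.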

### **Step 1: Install Required Libraries**
We need to install a few libraries to run the RAG model:
- **transformers**: to load pre-trained models like `bert-base-uncased` and `gpt-neo`.
- **faiss-cpu**: for efficient similarity search and document retrieval.
- **gradio**: to create the interactive user interface.
- **torch**: the PyTorch backend that the `transformers` models run on.

In [None]:
!pip install transformers faiss-cpu torch gradio langchain

### **Step 2: Import Necessary Libraries**
In this step, we import the libraries needed to build the RAG system:
- `AutoTokenizer` and `AutoModel` load the BERT tokenizer and model used to compute embeddings.
- `faiss` creates an index over the document embeddings and runs similarity searches.
- `pipeline` loads GPT-Neo for text generation.

In [None]:
import faiss
import numpy as np
from transformers import AutoTokenizer, AutoModel, pipeline

### **Step 3: Loading Tokenizer and Model for Embeddings**
Here, we load a pre-trained BERT model (`bert-base-uncased`) and its tokenizer. This model will be used to generate embeddings for our documents.

In [None]:
# Load the tokenizer and model for embeddings
tokenizer = AutoTokenizer.from_pretrained('bert-base-uncased')
model = AutoModel.from_pretrained('bert-base-uncased')

### **Step 4: Define Documents**
We define a list of documents from which relevant information will be retrieved for the user's query. These are short sentences about AI, most of them about self-driving cars.

In [None]:
documents = [
    "AI is revolutionizing self-driving technology.",
    "Recent AI advancements focus on safety in autonomous vehicles.",
    "AI is helping to reduce human errors in self-driving cars.",
    "Large language models are driving advancements in machine learning."
]

### **Step 5: Function to Generate Embeddings**
The `get_embeddings` function takes a list of text strings, tokenizes them, and runs them through the BERT model. It returns one vector per text: the final hidden state of the first token (`[CLS]`). These embeddings are what we index with FAISS for similarity search.

In [None]:
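def get_embeddings(texts):
    # Tokenize the batch, padding and truncating to a common length
    inputs = tokenizer(texts, return_tensors='pt', padding=True, truncation=True)
    outputs = model(**inputs)
    # Use the first ([CLS]) token's final hidden state as each text's embedding
    return outputs.last_hidden_state[:, 0, :].detach().numpy()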
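
`outputs.last_hidden_state` has shape `(batch, sequence_length, hidden_size)`, and the slice `[:, 0, :]` keeps only the first token's vector. The NumPy sketch below uses a small random array as a stand-in for the model output to show that slice next to mean pooling over non-padding tokens, a common alternative. The array sizes and the `attention_mask` values are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for last_hidden_state: 2 texts, 4 token positions, hidden size 3.
hidden = rng.normal(size=(2, 4, 3)).astype(np.float32)
# 1 marks a real token, 0 marks padding (the second text has one padded position).
attention_mask = np.array([[1, 1, 1, 1], [1, 1, 1, 0]], dtype=np.float32)

# First-token pooling: the same slice that get_embeddings applies.
cls_vectors = hidden[:, 0, :]

# Mean pooling: average the token vectors, ignoring padded positions.
mask = attention_mask[:, :, None]
mean_vectors = (hidden * mask).sum(axis=1) / mask.sum(axis=1)

print(cls_vectors.shape, mean_vectors.shape)
```

Both approaches produce one vector per text, so either could feed the FAISS index; this notebook uses the first-token slice.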

### **Step 6: Compute Embeddings for the Documents**
We use the `get_embeddings` function to convert the documents into embeddings. Each document becomes one row of the resulting array (768 values per row for `bert-base-uncased`), which can then be indexed for similarity search.

In [None]:
# Compute embeddings for documents
embeddings = get_embeddings(documents)
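
FAISS operates on `float32` matrices with one row per vector. Below is a minimal sketch of checking and converting an array before indexing, using a made-up array in place of the real BERT output; `as_faiss_input` is a hypothetical helper name, not a FAISS function.

```python
import numpy as np

def as_faiss_input(array):
    # Ensure a 2-D, C-contiguous float32 matrix (rows = vectors).
    matrix = np.ascontiguousarray(array, dtype=np.float32)
    if matrix.ndim != 2:
        raise ValueError(f"expected a 2-D array, got shape {matrix.shape}")
    return matrix

# Stand-in for 4 document vectors stored in the wrong dtype.
fake_embeddings = np.ones((4, 8), dtype=np.float64)
prepared = as_faiss_input(fake_embeddings)
print(prepared.shape, prepared.dtype)
```

The array returned by `get_embeddings` should already be `float32`, so a check like this mainly helps when embeddings come from another source.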

### **Step 7: Indexing the Embeddings using FAISS**
FAISS is used to index the document embeddings so that we can search for the documents most similar to the user's query. Here, we create a flat L2 index (an exact search over all stored vectors) sized to the embedding dimension and add the document embeddings to it.

In [None]:
# Index the embeddings using FAISS
dimension = embeddings.shape[1]
index = faiss.IndexFlatL2(dimension)
index.add(embeddings)
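
`IndexFlatL2` performs an exact, brute-force search and reports squared Euclidean distances. The NumPy sketch below reproduces that computation on made-up 2-D vectors so the numbers are easy to check by hand; `l2_search` is a hypothetical helper, not part of FAISS.

```python
import numpy as np

def l2_search(vectors, queries, top_k):
    # Squared Euclidean distance between every query and every stored vector.
    diffs = queries[:, None, :] - vectors[None, :, :]
    dists = (diffs ** 2).sum(axis=2)
    # Indices of the top_k smallest distances per query, nearest first.
    order = np.argsort(dists, axis=1)[:, :top_k]
    return np.take_along_axis(dists, order, axis=1), order

vectors = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 3.0]], dtype=np.float32)
queries = np.array([[1.0, 1.0]], dtype=np.float32)

distances, indices = l2_search(vectors, queries, top_k=2)
print(distances, indices)
```

For the query `[1, 1]`, the nearest stored vector is `[1, 0]` (squared distance 1), followed by `[0, 0]` (squared distance 2). The return shape, one row of distances and one row of indices per query, matches what `index.search` returns.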

### **Step 8: Load GPT-Neo for Text Generation**
We load GPT-Neo (specifically `EleutherAI/gpt-neo-1.3B`), an open-source language model with 1.3 billion parameters. It will generate a response based on the query and the retrieved documents.

In [None]:
# Load GPT-Neo for text generation
generator = pipeline('text-generation', model='EleutherAI/gpt-neo-1.3B')

### **Step 9: Retrieval and Generation Function**
The `retrieve_and_generate` function takes a user's query, generates its embedding, retrieves the top-k most relevant documents, and uses those documents to help GPT-Neo generate a response.
- First, it calculates the embedding of the query and finds the nearest documents in the FAISS index.
- Then, it combines the retrieved documents with the query to create a 'prompt' for GPT-Neo, which will generate the response.

In [None]:

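def retrieve_and_generate(query, top_k=2):
    # Embed the query and find the top_k nearest documents in the FAISS index
    query_embedding = get_embeddings([query])
    distances, indices = index.search(query_embedding, top_k)
    retrieved_docs = [documents[idx] for idx in indices[0]]
    # Build the prompt: the query followed by the retrieved documents
    prompt = query + "\n\nRelevant information:\n" + "\n".join(retrieved_docs)
    # max_new_tokens limits only the generated continuation, not prompt + output
    generated_text = generator(prompt, max_new_tokens=100, do_sample=False)[0]['generated_text']
    return generated_text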
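
The prompt assembly step can be examined in isolation, without loading any model. The sketch below mirrors the string concatenation in `retrieve_and_generate`; `build_prompt` is a hypothetical helper introduced only for illustration, and the query is made up.

```python
def build_prompt(query, retrieved_docs):
    # Query first, then a header line, then the retrieved documents, one per line.
    return query + "\n\nRelevant information:\n" + "\n".join(retrieved_docs)

prompt = build_prompt(
    "What is AI doing for self-driving cars?",
    [
        "AI is revolutionizing self-driving technology.",
        "AI is helping to reduce human errors in self-driving cars.",
    ],
)
print(prompt)
```

Printing the prompt like this is a quick way to see exactly what text the generator is asked to continue.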


### **Step 10: Gradio Interface Setup**
We use **Gradio** to create a simple user interface where the user can input a query. The Gradio interface will call the `retrieve_and_generate` function to display the generated response.

In [None]:

import gradio as gr

interface = gr.Interface(
    fn=retrieve_and_generate,
    inputs='text',
    outputs='text',
    title='Retrieval-Augmented Generation (RAG) Demo',
    description='Ask a question related to AI and see RAG in action. It retrieves relevant information and generates a response using GPT-Neo.'
)

interface.launch()
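
Gradio passes the contents of the text box to the function as a string. One possible refinement, sketched here under assumptions, is to wrap the RAG function so that blank input gets a short message instead of a model call. `answer_query` and `fake_rag` are hypothetical names; `fake_rag` stands in for `retrieve_and_generate` so the sketch runs without loading any models.

```python
def answer_query(query, rag_fn):
    # Guard against blank or missing input before calling the RAG function.
    query = (query or "").strip()
    if not query:
        return "Please enter a question."
    return rag_fn(query)

def fake_rag(query):
    # Stand-in for retrieve_and_generate; returns a placeholder string.
    return f"[generated answer for: {query}]"

print(answer_query("   ", fake_rag))
print(answer_query("What is RAG?", fake_rag))
```

A wrapper like this could be supplied as `fn` in place of the raw function, for example via `lambda q: answer_query(q, retrieve_and_generate)`.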


### **Step 11: Run the Gradio Interface**
The `interface.launch()` call at the end of Step 10 starts the Gradio app. This creates an interface where users can type a query and get a response generated by the RAG pipeline.

After running that cell, you'll get a link where you can interact with the model through a simple web UI.

### **Example Query:**
You can test the app with the following query:

```plaintext
Tell me about the latest AI trends in self-driving cars.
```

The pipeline retrieves the two most relevant documents (the default `top_k=2`) and passes them, together with the query, to GPT-Neo to generate a response.

### **Conclusion:**
This notebook demonstrates a basic Retrieval-Augmented Generation (RAG) pipeline: BERT embeddings, FAISS for similarity search, GPT-Neo for text generation, and Gradio for the user interface. The setup provides a foundation for more advanced systems with larger document collections, more complex retrieval mechanisms, or fine-tuned models.