<a href="https://colab.research.google.com/github/bhanuchaddha/Understanding-RAG/blob/main/BasicRAG.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
!pip install transformers faiss-cpu torch gradio langchain


In [3]:
import faiss
import numpy as np
from transformers import AutoTokenizer, AutoModel, pipeline

# Load the tokenizer and model for embeddings
tokenizer = AutoTokenizer.from_pretrained('bert-base-uncased')
model = AutoModel.from_pretrained('bert-base-uncased')

documents = [
    "AI is revolutionizing self-driving technology.",
    "Recent AI advancements focus on safety in autonomous vehicles.",
    "AI is helping to reduce human errors in self-driving cars.",
    "Large language models are driving advancements in machine learning."
]

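def get_embeddings(texts):
    # Tokenize the batch, padding to the longest text and truncating over-long inputs
    inputs = tokenizer(texts, return_tensors='pt', padding=True, truncation=True)
    outputs = model(**inputs)
    # Use the hidden state of the first ([CLS]) token as the text embedding;
    # detach() drops the autograd graph so the tensor can be converted to NumPy
    return outputs.last_hidden_state[:, 0, :].detach().numpy()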

# Compute embeddings for documents
embeddings = get_embeddings(documents)

# Index the embeddings using FAISS
dimension = embeddings.shape[1]
index = faiss.IndexFlatL2(dimension)
index.add(embeddings)

# Load GPT-Neo for text generation
generator = pipeline("text-generation", model="EleutherAI/gpt-neo-1.3B")

def retrieve_and_generate(query, top_k=2):
    query_embedding = get_embeddings([query])
    distances, indices = index.search(query_embedding, top_k)

    # Retrieve top_k documents
    retrieved_docs = [documents[idx] for idx in indices[0]]

    # Combine retrieved information with the query
    prompt = query + "\n\nRelevant information:\n" + "\n".join(retrieved_docs)

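    # Generate response with greedy decoding; max_new_tokens caps only the generated
    # continuation, whereas max_length would also count the prompt tokens
    generated_text = generator(prompt, max_new_tokens=100, do_sample=False)[0]['generated_text']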

    return generated_text

# Code Explanation

## Imports

•	faiss: A library for efficient similarity search and clustering of dense vectors. It’s often used to build an index of embeddings and perform fast nearest-neighbor search. Here, IndexFlatL2 performs an exact (brute-force) search by L2 distance.

•	numpy: A numerical library in Python for handling arrays and matrices. Here the embeddings are converted to NumPy arrays before being added to the FAISS index.

•	transformers (AutoTokenizer, AutoModel, pipeline): From Hugging Face. AutoTokenizer and AutoModel load the bert-base-uncased tokenizer and model used to compute embeddings, and pipeline sets up GPT-Neo for text generation.
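
To make the role of the index more concrete, the following is a minimal sketch of what an exact L2 search computes, written with NumPy only instead of FAISS. The function `flat_l2_search` and the toy vectors are hypothetical and exist only for illustration. Like `IndexFlatL2`, it reports squared L2 distances, but it is not how FAISS is implemented internally.

```python
import numpy as np

def flat_l2_search(index_vectors, queries, top_k):
    """Brute-force nearest-neighbor search by squared L2 distance (illustrative helper)."""
    # Pairwise differences via broadcasting: shape (n_queries, n_vectors, dim)
    diffs = queries[:, None, :] - index_vectors[None, :, :]
    distances = np.sum(diffs ** 2, axis=2)
    # Indices of the top_k smallest distances for each query, closest first
    indices = np.argsort(distances, axis=1, kind="stable")[:, :top_k]
    return np.take_along_axis(distances, indices, axis=1), indices

# Made-up 2-D vectors standing in for document embeddings
toy_vectors = np.array(
    [[0.0, 0.0], [1.0, 0.0], [0.0, 3.0], [5.0, 5.0]], dtype="float32"
)
toy_query = np.array([[0.9, 0.1]], dtype="float32")

toy_distances, toy_indices = flat_l2_search(toy_vectors, toy_query, top_k=2)
print(toy_indices)  # [[1 0]]: vector 1 is closest, then vector 0
```

The return order (distances first, then indices) mirrors the `distances, indices = index.search(query_embedding, top_k)` call in the notebook, so `toy_indices[0]` plays the same role as `indices[0]` when looking up the retrieved documents.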

In [4]:
import gradio as gr

# Create Gradio interface
interface = gr.Interface(
    fn=retrieve_and_generate,
    inputs="text",
    outputs="text",
    title="Retrieval-Augmented Generation (RAG) Demo",
    description="Ask a question related to AI and see RAG in action. It retrieves relevant information and generates a response using GPT-Neo."
)

# Launch the Gradio app
interface.launch()

Setting queue=True in a Colab notebook requires sharing enabled. Setting `share=True` (you can turn this off by setting `share=False` in `launch()` explicitly).

Colab notebook detected. To show errors in colab notebook, set debug=True in launch()
* Running on public URL: https://e4a8ee00b17cbf894e.gradio.live

This share link expires in 72 hours. For free permanent hosting and GPU upgrades, run `gradio deploy` from the terminal in the working directory to deploy to Hugging Face Spaces (https://huggingface.co/spaces)


