# Retrieval-Augmented Generation (RAG) with Hugging Face and Gemini

This notebook demonstrates how to build a simple RAG system. RAG enhances the capabilities of Large Language Models (LLMs) by providing them with external information during the text generation process.

### How does it work?
1.  **Retrieval**: When you ask a question (a "query"), the system first searches a knowledge base (in our case, the text from a PDF you upload) to find the most relevant snippets of text.
2.  **Augmentation**: These relevant text snippets are then added to your original query to form a new, more detailed prompt.
3.  **Generation**: This augmented prompt is sent to an LLM (like Google's Gemini), which then generates an answer based on the provided context.

This process allows the LLM to answer questions about specific documents it wasn't originally trained on.
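
The three steps above can be sketched with a toy example. This is only an illustration under simplifying assumptions: word-overlap scoring stands in for embedding similarity, the generation step is left out, and the `score`, `retrieve`, and `augment` helpers are hypothetical names that are not used elsewhere in this notebook.

```python
# Toy illustration of the retrieve -> augment steps; generation is left out.
# Word-overlap scoring is a stand-in for real embedding similarity.

def score(query: str, snippet: str) -> int:
    """Count how many distinct query words also appear in the snippet."""
    query_words = set(query.lower().split())
    snippet_words = set(snippet.lower().split())
    return len(query_words & snippet_words)

def retrieve(query: str, knowledge_base: list[str], k: int = 1) -> list[str]:
    """Return the k snippets with the highest overlap score."""
    ranked = sorted(knowledge_base, key=lambda s: score(query, s), reverse=True)
    return ranked[:k]

def augment(query: str, snippets: list[str]) -> str:
    """Place the retrieved snippets in front of the question."""
    context = "\n".join(snippets)
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"

knowledge_base = [
    "FAISS is a library for similarity search over dense vectors.",
    "Gradio builds simple web interfaces for Python functions.",
]
query = "What is FAISS used for?"
prompt = augment(query, retrieve(query, knowledge_base))
print(prompt)  # this string is what would be sent to the LLM
```

The rest of the notebook follows the same shape, replacing the overlap score with sentence embeddings and a FAISS index.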

### 1. Install and Import Dependencies

In [1]:
# Install necessary libraries
!pip install -q google-generativeai pypdf sentence-transformers faiss-cpu gradio


### 2. Configure API Key

In [3]:
import os
import getpass
import google.generativeai as genai

# Get the API key from the user
if "GEMINI_API_KEY" not in os.environ:
    os.environ["GEMINI_API_KEY"] = getpass.getpass("Enter your Gemini API key: ")

# Configure the Gemini API
genai.configure(api_key=os.environ["GEMINI_API_KEY"])

# Initialize the generative model
model = genai.GenerativeModel('gemini-2.5-flash')
print("Gemini API configured successfully!")

Gemini API configured successfully!


### 3. Upload and Process PDF Document

In [4]:
from google.colab import files
from pypdf import PdfReader

# Upload the PDF
print("Upload your PDF")
uploaded = files.upload()

if not uploaded:
    print("No file uploaded. Please run the cell again and upload a PDF.")
else:
    # Get the filename
    pdf_filename = next(iter(uploaded))
    print(f"Uploaded PDF: {pdf_filename}")

    # Read the PDF
    reader = PdfReader(pdf_filename)
    pdf_text = ""
    for page in reader.pages:
        # Fall back to an empty string if a page yields no text
        pdf_text += page.extract_text() or ""

    # Split the text into chunks
    text_chunks = [pdf_text[i:i + 1000] for i in range(0, len(pdf_text), 1000)]
    print(f"Created {len(text_chunks)} text chunks.")

Upload your PDF


Saving s41598-025-89230-7.pdf to s41598-025-89230-7.pdf
Uploaded PDF: s41598-025-89230-7.pdf
Created 92 text chunks.
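
The cell above cuts the text every 1000 characters, so a sentence that straddles a boundary is split across two chunks. One common variation, shown here only as a sketch, is to let consecutive chunks overlap. The `chunk_text` helper and its `overlap` parameter are assumptions for illustration and are not part of the original notebook.

```python
def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> list[str]:
    """Split text into fixed-size character chunks that share `overlap` characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - overlap  # how far the window advances each time
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

sample = "abcdefghij" * 3  # 30 characters
chunks = chunk_text(sample, chunk_size=10, overlap=4)
for chunk in chunks:
    print(repr(chunk))
```

With `overlap=0` the helper reduces to the same fixed-size split used in the cell above. Overlap increases the number of chunks, so it also increases the number of embeddings to compute.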


### 4. Create Text Embeddings

In [5]:
from sentence_transformers import SentenceTransformer

# Load a pre-trained model from Hugging Face
embedding_model = SentenceTransformer('all-MiniLM-L6-v2')

# Create embeddings for the text chunks
if 'text_chunks' in locals() and text_chunks:
    embeddings = embedding_model.encode(text_chunks)
    print(f"Created {len(embeddings)} embeddings.")
else:
    print("No text chunks to embed. Please upload a PDF first.")

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


Created 92 embeddings.


### 5. Build the Vector Store

In [6]:
import faiss
import numpy as np

# Create a FAISS index
if 'embeddings' in locals() and len(embeddings) > 0:
    d = embeddings.shape[1]  # dimension of the embeddings
    index = faiss.IndexFlatL2(d)
    index.add(np.array(embeddings).astype('float32'))
    print("FAISS index created.")
else:
    print("No embeddings to create an index from. Please create embeddings first.")

FAISS index created.
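
`IndexFlatL2` performs an exhaustive search: it compares the query against every stored vector and reports squared Euclidean (L2) distances. The pure-Python sketch below mimics that behavior on a tiny example to show what `index.search` returns conceptually. The `flat_l2_search` function is a hypothetical stand-in, not the FAISS implementation.

```python
def squared_l2(a: list[float], b: list[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def flat_l2_search(vectors: list[list[float]], query: list[float], k: int):
    """Brute-force nearest-neighbor search: score every vector, keep the k closest."""
    scored = sorted((squared_l2(v, query), i) for i, v in enumerate(vectors))
    top = scored[:k]
    distances = [d for d, _ in top]
    indices = [i for _, i in top]
    return distances, indices

vectors = [[0.0, 0.0], [1.0, 1.0], [3.0, 4.0]]
distances, indices = flat_l2_search(vectors, query=[2.0, 1.0], k=2)
print(indices)    # positions of the two closest vectors, nearest first
print(distances)  # their squared L2 distances
```

Smaller distances mean closer matches, which is why the first index returned by a search is treated as the most relevant chunk.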


### 6. Implement the RAG Chain

In [7]:
def get_rag_answer(query, k=3):
    """
    Performs retrieval-augmented generation.

    Args:
        query: The user's question
        k: Number of relevant chunks to retrieve (default: 3)

    Returns:
        The generated answer based on retrieved context
    """
    # 1. Retrieval
    query_embedding = embedding_model.encode([query])
    distances, indices = index.search(np.array(query_embedding).astype('float32'), k)

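    # Get the most relevant text chunks.
    # FAISS pads the result with -1 when fewer than k vectors are available,
    # so skip those entries instead of indexing text_chunks[-1] by accident.
    retrieved_chunks = [text_chunks[i] for i in indices[0] if i >= 0]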

    # 2. Augmentation
    context = "\n\n".join(retrieved_chunks)
    augmented_prompt = f"""Based on the following context, answer the question.

Context:
{context}

Question: {query}

Answer:"""

    # 3. Generation
    response = model.generate_content(augmented_prompt)

    return response.text

print("RAG function defined successfully!")

RAG function defined successfully!
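
The augmentation step in `get_rag_answer` is plain string formatting, so it can be tested without calling Gemini. The sketch below is one possible variation, not the notebook's method: it numbers each chunk and asks the model to rely only on the supplied context. The `build_prompt` helper and its exact wording are assumptions for illustration.

```python
def build_prompt(query: str, chunks: list[str]) -> str:
    """Assemble an augmented prompt from a question and its retrieved chunks."""
    # Number the chunks so the separate passages stay distinguishable
    numbered = "\n\n".join(
        f"[{n}] {chunk.strip()}" for n, chunk in enumerate(chunks, start=1)
    )
    return (
        "Answer the question using only the context below. "
        "If the context does not contain the answer, say so.\n\n"
        f"Context:\n{numbered}\n\n"
        f"Question: {query}\n\n"
        "Answer:"
    )

example_prompt = build_prompt(
    "What does the paper propose?",
    ["  A hybrid framework for diagnosis. ", "Results are reported on two datasets."],
)
print(example_prompt)
```

Keeping prompt construction in its own function makes it easy to inspect what the model actually receives when an answer looks wrong.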


### 7. Ask a Question to Your Document

In [10]:
# Ask a question
user_query = "Who are you"
answer = get_rag_answer(user_query)
print(f"Question: {user_query}")
print(f"Answer: {answer}")

Question: Who are you
Answer: I am a large language model, an AI assistant. I do not have a personal identity or name. The provided context describes a research paper introducing a "hybrid framework" and "proposed model" for skin cancer diagnosis, not information about who I am.


### 8. Interactive Gradio Interface

Now let's create a user-friendly interface using Gradio where you can interactively ask questions about your uploaded PDF.

In [11]:
import gradio as gr

def chat_with_pdf(question, history):
    """
    Function to handle the chat interface.

    Args:
        question: The user's question
        history: Chat history (not used in basic version)

    Returns:
        The answer from the RAG system
    """
    if not question.strip():
        return "Please enter a question."

    try:
        answer = get_rag_answer(question)
        return answer
    except Exception as e:
        return f"Error: {str(e)}. Make sure you've uploaded and processed a PDF first!"

# Create the Gradio interface
with gr.Blocks(theme=gr.themes.Soft()) as demo:
    gr.Markdown(
        """
        # 📚 RAG-Powered PDF Q&A System

        Ask questions about your uploaded PDF document. The system will retrieve relevant
        information and generate accurate answers using Gemini AI.

        **Note:** Make sure you've run all the previous cells and uploaded a PDF before using this interface.
        """
    )

    with gr.Row():
        with gr.Column():
            question_input = gr.Textbox(
                label="Your Question",
                placeholder="Ask anything about your PDF document...",
                lines=3
            )
            submit_btn = gr.Button("Ask Question", variant="primary")

        with gr.Column():
            answer_output = gr.Textbox(
                label="Answer",
                lines=10,
                placeholder="The answer will appear here..."
            )

    # Example questions
    gr.Examples(
        examples=[
            ["What is the main topic of this document?"],
            ["Can you summarize the key points?"],
            ["What are the important details mentioned?"],
        ],
        inputs=question_input
    )

    # Set up the interaction
    submit_btn.click(
        fn=lambda q: chat_with_pdf(q, None),
        inputs=question_input,
        outputs=answer_output
    )

    question_input.submit(
        fn=lambda q: chat_with_pdf(q, None),
        inputs=question_input,
        outputs=answer_output
    )

# Launch the interface
demo.launch(debug=True, share=True)

Colab notebook detected. This cell will run indefinitely so that you can see errors and logs. To turn off, set debug=False in launch().
* Running on public URL: https://88158f1f9d3810b49a.gradio.live

This share link expires in 1 week. For free permanent hosting and GPU upgrades, run `gradio deploy` from the terminal in the working directory to deploy to Hugging Face Spaces (https://huggingface.co/spaces)




