#  Lab 3: Build a RAG PDF Chatbot

Welcome to Lab 3! This is the culmination of everything we've learned in this series. Now you'll build a complete **Retrieval Augmented Generation** application that can:

-  Upload and process PDF documents
-  Convert document chunks into embeddings
-  Store vectors in Qdrant database
-  Chat with your documents using Gemini
-  Present it all in a beautiful Streamlit UI

**What you'll learn:**
- How RAG works end-to-end
- PDF text extraction and chunking
- Combining vector search with LLM generation
- Building interactive AI applications with Streamlit

**Prerequisites:**
- Completed Lab 1 (LLM basics) and Lab 2 (Vector databases)
- Google Cloud account with Vertex AI enabled
- Qdrant Cloud account (free tier is enough)

---


##  Understanding RAG (Retrieval Augmented Generation)

### What is RAG?

**RAG** is a technique that enhances LLM responses by providing relevant context from your own documents. Instead of relying solely on the LLM's training data, RAG retrieves specific information to answer questions accurately.

### How RAG Works (The Pipeline):

```
+-----------------------------------------------------------------+
|  1. INGESTION (One-time setup)                                  |
|     PDF → Extract Text → Split into Chunks → Create Embeddings  |
|           → Store in Vector Database                            |
+-----------------------------------------------------------------+
                              ↓
+-----------------------------------------------------------------+
|  2. RETRIEVAL (When user asks a question)                       |
|     User Question → Create Embedding → Search Vector DB         |
|           → Get Top-K Similar Chunks                            |
+-----------------------------------------------------------------+
                              ↓
+-----------------------------------------------------------------+
|  3. GENERATION (Create the answer)                              |
|     Retrieved Chunks + User Question → Send to LLM → Answer     |
+-----------------------------------------------------------------+
```
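
To make the three phases concrete without any cloud services, here is a toy sketch that swaps the real embedding model for simple word counts and Qdrant for an in-memory list. The helpers `embed`, `retrieve`, and `build_prompt` are hypothetical and not part of the lab; the real pipeline later in this lab uses `text-embedding-004` vectors, Qdrant search, and Gemini for the final answer.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy 'embedding': lowercase word counts (a stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


# 1. INGESTION: split the document into chunks and store (vector, text) pairs
document = (
    "Qdrant stores vectors. Gemini generates answers. "
    "Chunks are embedded before storage. Streamlit renders the chat UI."
)
chunks = [s.strip() + "." for s in document.split(".") if s.strip()]
index = [(embed(c), c) for c in chunks]


# 2. RETRIEVAL: embed the question and keep the top-k most similar chunks
def retrieve(question: str, k: int = 2) -> list:
    q = embed(question)
    ranked = sorted(index, key=lambda pair: cosine(q, pair[0]), reverse=True)
    return [text for _, text in ranked[:k]]


# 3. GENERATION: assemble the prompt that would be sent to the LLM
def build_prompt(question: str, k: int = 2) -> str:
    context = "\n\n".join(retrieve(question, k))
    return f"Context from documents:\n{context}\n\nUser: {question}\nAssistant:"


print(build_prompt("Where are vectors stored?", k=1))
```

Real embeddings capture meaning rather than exact word overlap (so "stored" and "stores" would match), but the flow of ingest, retrieve top-k, then prompt the LLM is the same.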

### Why RAG?

|        Without RAG              |               With RAG                        |
|---------------------------------|-----------------------------------------------|
| LLM only knows training data    | LLM has access to **your specific documents** |
| May hallucinate facts           | **Grounded** in actual document content       |
| Can't answer about private data | Can answer about **any uploaded document**    |
| Generic responses               | **Specific, accurate** responses              |

---


##  Step 1: Install Required Libraries

We need the following libraries for our RAG application:
- `qdrant-client`: Vector database client
- `google-generativeai`: Google's Gemini API
- `google-cloud-aiplatform`: Vertex AI for embeddings
- `PyPDF2`: PDF text extraction
- `streamlit`: Web UI framework


In [None]:
#  Install all required packages (run once)
%pip install qdrant-client google-generativeai google-cloud-aiplatform PyPDF2 streamlit

print(" All packages installed!")
print("\n If this is your first time, restart the kernel before continuing.")




##  Step 2: Import Libraries

Let's import all the libraries we need. Each serves a specific purpose in our RAG pipeline.


In [2]:
# Standard library imports
import os
import uuid

# Google AI imports
import google.generativeai as genai  # Gemini for text generation
import vertexai                       # Vertex AI platform
from vertexai.language_models import TextEmbeddingModel  # For creating embeddings

# Qdrant imports
from qdrant_client import QdrantClient
from qdrant_client.http import models

# PDF processing
from PyPDF2 import PdfReader

# UI (will be used in the Streamlit app)
# import streamlit as st  # Uncomment when running as Streamlit app

print(" All libraries imported successfully!")


 All libraries imported successfully!


---

##  Step 3: Configure Services

We need to set up connections to:
1. **Google AI (Gemini)** - For text generation
2. **Vertex AI** - For embeddings
3. **Qdrant** - For vector storage

### Configure Google Generative AI (Gemini)

When running in Google Cloud (e.g. Vertex AI Workbench), Gemini is accessed via Application Default Credentials (ADC) automatically. For local development, either paste a Gemini API key below or create ADC with `gcloud auth application-default login`.


In [31]:
# ============================================
#   GOOGLE CLOUD CONFIGURATION
# ============================================

# Option 1: If running in Google Cloud (Vertex AI Workbench), ADC works automatically
# Option 2: For local development, set your API key below

# For local development, get an API key from: https://makersuite.google.com/app/apikey
GOOGLE_API_KEY = ""  # Leave empty if using ADC in Google Cloud

if GOOGLE_API_KEY:
    genai.configure(api_key=GOOGLE_API_KEY)
    print(" Configured with API key")
else:
    genai.configure()  # Uses Application Default Credentials
    print(" Configured with Application Default Credentials (ADC)")

# Initialize Vertex AI for embeddings
# Replace 'test_project' with your Google Cloud project ID, or set the GOOGLE_CLOUD_PROJECT environment variable
PROJECT_ID = os.environ.get("GOOGLE_CLOUD_PROJECT", "test_project")
vertexai.init(project=PROJECT_ID, location="us-central1")
print(f" Vertex AI initialized for project: {PROJECT_ID}")


 Configured with Application Default Credentials (ADC)
 Vertex AI initialized for project: test_project


### Configure Qdrant Connection

Enter your Qdrant Cloud credentials (you can reuse the same credentials from Lab 2).


In [None]:
# ============================================
#   QDRANT CONFIGURATION
# ============================================

# Your Qdrant API Key (from https://cloud.qdrant.io)
QDRANT_API_KEY = ""  # Paste your API key here

# Your Qdrant Cluster URL (make sure port is 6333!)
QDRANT_URL = ""  # Example: "https://xxx.us-east4-0.gcp.cloud.qdrant.io:6333"

# ============================================

# Validate Qdrant credentials
if not QDRANT_API_KEY or not QDRANT_URL:
    print(" ERROR: Missing Qdrant credentials!")
    print("   Please fill in QDRANT_API_KEY and QDRANT_URL above.")
else:
    print(" Qdrant credentials configured")
    print(f" URL: {QDRANT_URL[:50]}..." if len(QDRANT_URL) > 50 else f" URL: {QDRANT_URL}")


 Qdrant credentials configured
 URL: https://d78b1147-cde0-4b94-aa1e-9b6b2278050c.us-ea...
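
Pasting keys directly into cells is convenient, but Step 6 writes the current values into `rag_chatbot_app.py` as plain text. A minimal sketch of the alternative, assuming you export environment variables named `QDRANT_URL`, `QDRANT_API_KEY`, and (optionally) `GOOGLE_API_KEY` before starting Jupyter; `load_credentials` is a hypothetical helper, not part of the lab.

```python
import os
from typing import Mapping, Optional


def load_credentials(env: Optional[Mapping[str, str]] = None) -> dict:
    """Read lab credentials from environment variables instead of hard-coding them.

    The variable names are assumptions made for this sketch.
    """
    env = os.environ if env is None else env
    creds = {
        "QDRANT_URL": env.get("QDRANT_URL", ""),
        "QDRANT_API_KEY": env.get("QDRANT_API_KEY", ""),
        "GOOGLE_API_KEY": env.get("GOOGLE_API_KEY", ""),  # optional when using ADC
    }
    # Fail fast if the required Qdrant settings are missing
    missing = [name for name in ("QDRANT_URL", "QDRANT_API_KEY") if not creds[name]]
    if missing:
        raise ValueError(f"Missing required environment variables: {', '.join(missing)}")
    return creds
```

In the notebook you could then write `creds = load_credentials()` and set `QDRANT_URL = creds["QDRANT_URL"]`, and so on, so the secrets never appear in a cell.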


### Configure LLM Settings

These settings control how the Gemini model generates responses.

**Parameters explained:**
- **MODEL_NAME**: Which Gemini model to use
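- **TEMPERATURE**: Controls randomness (lower = more focused and repeatable, higher = more varied and creative)
- **TOP_P**: Nucleus sampling; the model only samples from the smallest set of most-likely tokens whose combined probability reaches TOP_P (lower = less diverse)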
- **MAX_OUTPUT**: Maximum tokens in the response
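
To build intuition for TEMPERATURE and TOP_P, here is a simplified, textbook model of how they reshape a next-token probability distribution. It is not Gemini's actual decoder; the helper `sampling_distribution` and the toy logits are made up for illustration.

```python
import math


def sampling_distribution(logits: dict, temperature: float, top_p: float) -> dict:
    """Illustrative only: apply temperature, then a top-p (nucleus) cut, to toy logits."""
    if temperature <= 0:
        # Temperature 0 behaves like greedy decoding: all probability on the top token
        best = max(logits, key=logits.get)
        return {best: 1.0}
    # Temperature divides the logits before softmax (lower = sharper, higher = flatter)
    scaled = {tok: value / temperature for tok, value in logits.items()}
    peak = max(scaled.values())
    exps = {tok: math.exp(v - peak) for tok, v in scaled.items()}  # subtract max for stability
    total = sum(exps.values())
    probs = sorted(((tok, e / total) for tok, e in exps.items()), key=lambda p: p[1], reverse=True)
    # Top-p keeps the smallest set of most-likely tokens whose cumulative probability reaches top_p
    kept, cumulative = [], 0.0
    for tok, p in probs:
        kept.append((tok, p))
        cumulative += p
        if cumulative >= top_p:
            break
    # Renormalize the surviving tokens so they sum to 1
    norm = sum(p for _, p in kept)
    return {tok: p / norm for tok, p in kept}


logits = {"Paris": 4.0, "Lyon": 2.0, "Nice": 1.0}  # made-up scores for three candidate tokens
print(sampling_distribution(logits, temperature=0.7, top_p=0.9))  # only "Paris" survives
print(sampling_distribution(logits, temperature=1.0, top_p=0.9))  # "Paris" and "Lyon" survive
```

Raising the temperature flattens the distribution, so more tokens are needed to reach the TOP_P threshold and the output becomes more varied.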


In [8]:

#  LLM CONFIGURATION
# =====================

# Model options: "gemini-2.0-flash-001", "gemini-1.5-flash", "gemini-1.5-pro"
MODEL_NAME = "gemini-2.0-flash-001"

# Temperature: 0.0 (focused/factual) and up; Gemini 2.0 models accept values up to 2.0
TEMPERATURE = 0.7

# Top-P: Controls diversity (0.1 to 1.0)
TOP_P = 0.9

# Maximum output tokens
MAX_OUTPUT = 8192

# Default collection name for PDFs
DEFAULT_COLLECTION = "pdfs_collection"

# ============================================

print(f" Model: {MODEL_NAME}")
print(f"  Temperature: {TEMPERATURE}")
print(f"  Top-P: {TOP_P}")
print(f" Max Output Tokens: {MAX_OUTPUT}")


 Model: gemini-2.0-flash-001
  Temperature: 0.7
  Top-P: 0.9
 Max Output Tokens: 8192


---

##  Step 4: Define Core RAG Functions

Now we'll create the core functions that power our RAG application. These handle:
1. **Qdrant connection** - Initialize database client
2. **Collection management** - Create/recreate vector collections
3. **PDF ingestion** - Extract text, chunk it, create embeddings, store in Qdrant
4. **Response generation** - Retrieve context and generate answers

### 4.1 Qdrant Helper Functions


In [10]:
def init_qdrant(qdrant_url: str, qdrant_api_key: str) -> QdrantClient:
    """
    Initialize and return a Qdrant client.
    
    Args:
        qdrant_url: Your Qdrant cluster URL
        qdrant_api_key: Your Qdrant API key
    
    Returns:
        QdrantClient instance
    """
    return QdrantClient(url=qdrant_url, api_key=qdrant_api_key, timeout=30)


def create_qdrant_collection(collection_name: str, qdrant_url: str, qdrant_api_key: str, 
                             vector_size: int = 768, distance: str = "Cosine"):
    """
    Create (or recreate) a Qdrant collection for storing document embeddings.
    
    Args:
        collection_name: Name for the collection
        qdrant_url: Qdrant cluster URL
        qdrant_api_key: Qdrant API key
        vector_size: Dimension of embedding vectors (768 for text-embedding-004)
        distance: Distance metric for similarity ("Cosine", "Euclid", "Dot", or "Manhattan")
    """
    client = init_qdrant(qdrant_url, qdrant_api_key)
    
    # Delete the collection if it already exists, then create it fresh.
    # collection_exists + create_collection replace the deprecated recreate_collection.
    if client.collection_exists(collection_name):
        client.delete_collection(collection_name=collection_name)
    client.create_collection(
        collection_name=collection_name,
        vectors_config=models.VectorParams(size=vector_size, distance=distance)
    )
    
    print(f" Collection '{collection_name}' created successfully!")

print(" Qdrant helper functions defined")


 Qdrant helper functions defined
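
The `distance` argument decides how Qdrant scores similarity. Qdrant names its options `Cosine`, `Euclid`, `Dot`, and `Manhattan`; this plain-Python sketch computes the math behind the first three so you can see how they can disagree. The function names and toy vectors are illustrative, not Qdrant's API.

```python
import math


def cosine_similarity(a, b):
    """Angle-based similarity: ignores vector length (higher = more similar)."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def dot_product(a, b):
    """Raw dot product: rewards both direction and length (higher = more similar)."""
    return sum(x * y for x, y in zip(a, b))


def euclidean_distance(a, b):
    """Straight-line distance (lower = more similar)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


query = [1.0, 0.0]
doc_a = [2.0, 0.0]  # same direction as the query, but twice as long
doc_b = [0.6, 0.8]  # different direction, unit length

print("cosine:", cosine_similarity(query, doc_a), cosine_similarity(query, doc_b))
print("dot:   ", dot_product(query, doc_a), dot_product(query, doc_b))
print("euclid:", euclidean_distance(query, doc_a), euclidean_distance(query, doc_b))
```

Cosine and dot product rank `doc_a` first, while Euclidean distance ranks `doc_b` first because `doc_a` is far away in length. Cosine is a common default for text embeddings because it compares direction only.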


### 4.2 PDF Ingestion Function

This is the **Ingestion** phase of RAG. The function:
1. Reads PDF files and extracts text
2. Splits text into smaller chunks (for better retrieval)
3. Creates embeddings for each chunk using Google's embedding model
4. Stores chunks and embeddings in Qdrant


In [11]:
def ingest_pdfs_to_qdrant(pdf_files, collection_name: str, qdrant_url: str, 
                          qdrant_api_key: str, chunk_size: int = 500):
    """
    Ingest PDF files into Qdrant vector database.
    
    This function:
    1. Extracts text from each PDF
    2. Splits text into chunks of specified size
    3. Creates embeddings using Google's text-embedding-004 model
    4. Stores chunks with embeddings in Qdrant
    
    Args:
        pdf_files: List of PDF file paths or file objects
        collection_name: Name of the Qdrant collection
        qdrant_url: Qdrant cluster URL
        qdrant_api_key: Qdrant API key
        chunk_size: Number of characters per chunk (default: 500)
    """
    client = init_qdrant(qdrant_url, qdrant_api_key)
    
    # Initialize the embedding model
    embedding_model = TextEmbeddingModel.from_pretrained("text-embedding-004")
    
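    # Vertex AI limits how many texts (and tokens) one embedding request may contain,
    # so chunks are embedded in batches. 100 is a conservative size; check current quotas.
    batch_size = 100
    
    for pdf_file in pdf_files:
        print(f" Processing: {pdf_file if isinstance(pdf_file, str) else pdf_file.name}")
        
        # Extract text from PDF (newline between pages so words don't run together)
        reader = PdfReader(pdf_file)
        full_text = "\n".join(page.extract_text() or "" for page in reader.pages)
        print(f"    Extracted {len(full_text)} characters")
        if not full_text.strip():
            print("    No extractable text (e.g. a scanned PDF); skipping")
            continue
        
        # Split into fixed-size character chunks
        chunks = [full_text[i:i + chunk_size] for i in range(0, len(full_text), chunk_size)]
        print(f"    Created {len(chunks)} chunks")
        
        # Create embeddings for all chunks, one batch at a time
        embeddings = []
        for start in range(0, len(chunks), batch_size):
            embeddings.extend(embedding_model.get_embeddings(chunks[start:start + batch_size]))
        
        # Create points for Qdrant
        points = []
        for emb, chunk in zip(embeddings, chunks):
            points.append(
                models.PointStruct(
                    id=str(uuid.uuid4()),  # Unique ID for each chunk
                    vector=emb.values,      # The embedding vector
                    payload={"text": chunk} # Store the original text
                )
            )
        
        # Upsert points to Qdrant
        client.upsert(collection_name=collection_name, points=points)
        print(f"    Uploaded {len(points)} chunks to Qdrant")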

print(" PDF ingestion function defined")


 PDF ingestion function defined
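
The chunking step cuts the text at fixed character offsets, so a sentence can be split across two chunks and lose meaning in both. A common refinement (not used in this lab) is to let consecutive chunks overlap. The sketch below is a hypothetical `chunk_text` helper; with `overlap=0` it produces exactly the same chunks as the list comprehension in `ingest_pdfs_to_qdrant`.

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 0) -> list:
    """Fixed-size character chunking with optional overlap between neighbours."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    if not text:
        return []
    step = chunk_size - overlap
    # Stop early enough that the last chunk is not a fragment already contained in the previous one
    last_start = max(len(text) - overlap, 1)
    return [text[i:i + chunk_size] for i in range(0, last_start, step)]


print(chunk_text("abcdefghij", chunk_size=4))             # no overlap
print(chunk_text("abcdefghij", chunk_size=4, overlap=2))  # neighbours share 2 characters
```

Overlap trades a little extra storage and embedding cost for a better chance that any given sentence appears intact in at least one chunk.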


### 4.3 Response Generation Function

This is the **Retrieval + Generation** phase. The function:
1. Takes the user's question
2. Searches Qdrant for relevant document chunks
3. Combines the context with the question
4. Sends to Gemini to generate an answer


In [21]:
def get_bot_response(messages: list, model_name: str, temperature: float, 
                     top_p: float, max_output: int, collection_name: str,
                     qdrant_url: str, qdrant_api_key: str, k: int = 3) -> str:
    """
    Generate a response using RAG: retrieve relevant context, then generate answer.
    
    Args:
        messages: List of conversation messages [{"role": "user/assistant", "content": "..."}]
        model_name: Gemini model to use
        temperature: Response randomness (0-1)
        top_p: Response diversity (0-1)
        max_output: Maximum output tokens
        collection_name: Qdrant collection with document embeddings (None to skip RAG)
        qdrant_url: Qdrant cluster URL
        qdrant_api_key: Qdrant API key
        k: Number of chunks to retrieve (default: 3)
    
    Returns:
        Generated response text
    """
    # Build conversation history for multi-turn conversations
    history = ""
    for msg in messages:
        speaker = "User" if msg["role"] == "user" else "Assistant"
        history += f"{speaker}: {msg['content']}\n"
    
    # RETRIEVAL: Get relevant context from Qdrant
    context = ""
    if collection_name and qdrant_url and qdrant_api_key:
        # Create embedding for the user's question
        embedding_model = TextEmbeddingModel.from_pretrained("text-embedding-004")
        query_embedding = embedding_model.get_embeddings([messages[-1]["content"]])[0].values
        
        # Search Qdrant for similar chunks
        client = init_qdrant(qdrant_url, qdrant_api_key)
        search_result = client.query_points(
            collection_name=collection_name,
            query=query_embedding,
            limit=k
        )
        hits = search_result.points
        
        # Combine retrieved chunks into context
        context = "\n\n".join(hit.payload.get("text", "") for hit in hits)
        print(f" Retrieved {len(hits)} relevant chunks from documents")
    
    # GENERATION: Create the prompt and generate response
    if context:
        prompt = f"""You are a helpful assistant. Use the following context from documents to answer the user's question. If the context doesn't contain relevant information, say so.

Context from documents:
{context}

Conversation:
{history}
Assistant:"""
    else:
        prompt = f"""You are a helpful assistant.

Conversation:
{history}
Assistant:"""
    
    # Call Gemini
    model = genai.GenerativeModel(
        model_name=model_name,
        generation_config={
            "temperature": temperature,
            "max_output_tokens": max_output,
            "top_p": top_p
        }
    )
    response = model.generate_content(prompt)
    return response.text

print(" Response generation function defined")


 Response generation function defined
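
Because `get_bot_response` builds its prompt inline, it can help to pull the formatting into a pure function so you can print and inspect exactly what Gemini receives without making an API call. `build_rag_prompt` and its `max_messages` option are hypothetical additions; the prompt text mirrors the one in `get_bot_response`.

```python
from typing import Optional


def build_rag_prompt(messages: list, context_chunks: list, max_messages: Optional[int] = None) -> str:
    """Rebuild the prompt format used in get_bot_response, without calling Gemini.

    max_messages (hypothetical) keeps only the most recent messages so long chats
    do not grow the prompt without bound; None keeps the whole history.
    """
    recent = messages[-max_messages:] if max_messages else messages
    lines = []
    for msg in recent:
        speaker = "User" if msg["role"] == "user" else "Assistant"
        lines.append(f"{speaker}: {msg['content']}\n")
    history = "".join(lines)
    context = "\n\n".join(context_chunks)
    if context:
        return (
            "You are a helpful assistant. Use the following context from documents to answer "
            "the user's question. If the context doesn't contain relevant information, say so.\n\n"
            f"Context from documents:\n{context}\n\n"
            f"Conversation:\n{history}\nAssistant:"
        )
    return f"You are a helpful assistant.\n\nConversation:\n{history}\nAssistant:"


msgs = [
    {"role": "user", "content": "Hi"},
    {"role": "assistant", "content": "Hello"},
    {"role": "user", "content": "What is RAG?"},
]
print(build_rag_prompt(msgs, ["RAG = retrieval + generation."]))
```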


---

##  Step 5: Test the RAG Pipeline (Notebook Version)

Before running the full Streamlit app, let's test each component in the notebook.

### 5.1 Test: Ingest a PDF

First, let's ingest a sample PDF. You can use any PDF file you have available.


In [13]:
# Test PDF Ingestion
# Replace with the path to your PDF file

PDF_PATH = "data/your_document.pdf"  # Update this path!

# Check if credentials are set
if not QDRANT_API_KEY or not QDRANT_URL:
    print(" Please set your Qdrant credentials in Step 3 first!")
else:
    # Uncomment the lines below when you have a PDF to test
    # print(" Creating collection...")
    # create_qdrant_collection(
    #     collection_name=DEFAULT_COLLECTION,
    #     qdrant_url=QDRANT_URL,
    #     qdrant_api_key=QDRANT_API_KEY,
    #     vector_size=768  # text-embedding-004 dimension
    # )
    # 
    # print("\n Ingesting PDF...")
    # ingest_pdfs_to_qdrant(
    #     pdf_files=[PDF_PATH],
    #     collection_name=DEFAULT_COLLECTION,
    #     qdrant_url=QDRANT_URL,
    #     qdrant_api_key=QDRANT_API_KEY,
    #     chunk_size=500
    # )
    # print("\n PDF ingestion complete!")
    
    print(" Uncomment the code above and update PDF_PATH to test ingestion")


 Uncomment the code above and update PDF_PATH to test ingestion


### 5.2 Test: Ask a Question

After ingesting a PDF, test the RAG pipeline by asking a question about the document.


In [14]:
# Test asking a question (after ingesting a PDF)

# Your test question
TEST_QUESTION = "What is this document about?"

# Create a simple message history
messages = [{"role": "user", "content": TEST_QUESTION}]

# Check if we can run the test
if not QDRANT_API_KEY or not QDRANT_URL:
    print(" Please set your Qdrant credentials first!")
else:
    print(f" Question: {TEST_QUESTION}")
    print("-" * 50)
    
    # Uncomment below after ingesting a PDF
    # response = get_bot_response(
    #     messages=messages,
    #     model_name=MODEL_NAME,
    #     temperature=TEMPERATURE,
    #     top_p=TOP_P,
    #     max_output=MAX_OUTPUT,
    #     collection_name=DEFAULT_COLLECTION,
    #     qdrant_url=QDRANT_URL,
    #     qdrant_api_key=QDRANT_API_KEY,
    #     k=3  # Retrieve top 3 chunks
    # )
    # print(f"\n Answer:\n{response}")
    
    print(" Uncomment the code above after ingesting a PDF to test Q&A")


 Question: What is this document about?
--------------------------------------------------
 Uncomment the code above after ingesting a PDF to test Q&A




## Step 6: Launch the Streamlit Chatbot UI

The code below creates a beautiful web UI for your RAG chatbot using Streamlit, an open-source Python framework for building web apps.

Don't worry about the Streamlit code for now; it's just meant to be an easy way to launch our UI and try out the techniques we've learned above.

### How to Run (3 Easy Steps):

1. **Step 6.part1:** Run the first cell below to create the `rag_chatbot_app.py` file
2. **Step 6.part2:** Run the second cell to launch the Streamlit app
3. **Open the URL** that appears (usually `http://localhost:8501`)

### The UI Features:
-  PDF upload functionality  
-  Interactive chat interface
-  Multi-turn conversation support
-  Real-time RAG responses

> **Important**: Make sure you've filled in your `QDRANT_API_KEY` and `QDRANT_URL` in Step 3 before running!


In [None]:
#  Step 6-part1: Create the Streamlit app file
# This cell writes a complete Python file with all dependencies

# First, let's save the credentials to include in the file
print(" Creating rag_chatbot_app.py...")
print(f"   Using QDRANT_URL: {QDRANT_URL[:40]}..." if QDRANT_URL else "   WARNING: QDRANT_URL is empty!")
print(f"   Using MODEL: {MODEL_NAME}")

# Write the file with the current configuration
app_code = f'''# RAG PDF Chatbot - Streamlit Application
# Generated from Lab 3 Notebook

import os
import uuid
import streamlit as st
import google.generativeai as genai
import vertexai
from vertexai.language_models import TextEmbeddingModel
from qdrant_client import QdrantClient
from qdrant_client.http import models
from PyPDF2 import PdfReader

# ============================================
# CONFIGURATION (from notebook)
# ============================================
QDRANT_API_KEY = "{QDRANT_API_KEY}"
QDRANT_URL = "{QDRANT_URL}"
MODEL_NAME = "{MODEL_NAME}"
TEMPERATURE = {TEMPERATURE}
TOP_P = {TOP_P}
MAX_OUTPUT = {MAX_OUTPUT}
DEFAULT_COLLECTION = "{DEFAULT_COLLECTION}"

# Initialize Google AI
GOOGLE_API_KEY = "{GOOGLE_API_KEY if 'GOOGLE_API_KEY' in dir() else ''}"
if GOOGLE_API_KEY:
    genai.configure(api_key=GOOGLE_API_KEY)
else:
    genai.configure()

# Initialize Vertex AI
PROJECT_ID = os.environ.get("GOOGLE_CLOUD_PROJECT", "{PROJECT_ID if 'PROJECT_ID' in dir() else 'your-project-id'}")
vertexai.init(project=PROJECT_ID, location="us-central1")

# ============================================
# RAG FUNCTIONS
# ============================================
def init_qdrant(qdrant_url, qdrant_api_key):
    return QdrantClient(url=qdrant_url, api_key=qdrant_api_key, timeout=30)

def create_qdrant_collection(collection_name, qdrant_url, qdrant_api_key, vector_size=768, distance="Cosine"):
    client = init_qdrant(qdrant_url, qdrant_api_key)
    # Delete any existing collection, then create a fresh one
    if client.collection_exists(collection_name):
        client.delete_collection(collection_name=collection_name)
    client.create_collection(
        collection_name=collection_name,
        vectors_config=models.VectorParams(size=vector_size, distance=distance)
    )

def ingest_pdfs_to_qdrant(pdf_files, collection_name, qdrant_url, qdrant_api_key, chunk_size=500, batch_size=100):
    client = init_qdrant(qdrant_url, qdrant_api_key)
    embedding_model = TextEmbeddingModel.from_pretrained("text-embedding-004")
    
    for pdf_file in pdf_files:
        reader = PdfReader(pdf_file)
        full_text = "\\n".join(page.extract_text() or "" for page in reader.pages)
        if not full_text.strip():
            continue  # No extractable text (e.g. a scanned PDF)
        chunks = [full_text[i:i + chunk_size] for i in range(0, len(full_text), chunk_size)]
        # Embed in batches to stay under Vertex AI's per-request limits
        embeddings = []
        for start in range(0, len(chunks), batch_size):
            embeddings.extend(embedding_model.get_embeddings(chunks[start:start + batch_size]))
        
        points = []
        for emb, chunk in zip(embeddings, chunks):
            points.append(models.PointStruct(
                id=str(uuid.uuid4()),
                vector=emb.values,
                payload={{"text": chunk}}
            ))
        client.upsert(collection_name=collection_name, points=points)

def get_bot_response(messages, model_name, temperature, top_p, max_output,
                     collection_name, qdrant_url, qdrant_api_key, k=3):
    history = ""
    for msg in messages:
        speaker = "User" if msg["role"] == "user" else "Assistant"
        history += f"{{speaker}}: {{msg['content']}}\\n"
    
    context = ""
    if collection_name and qdrant_url and qdrant_api_key:
        embedding_model = TextEmbeddingModel.from_pretrained("text-embedding-004")
        query_embedding = embedding_model.get_embeddings([messages[-1]["content"]])[0].values
        client = init_qdrant(qdrant_url, qdrant_api_key)
        search_result = client.query_points(collection_name=collection_name, query=query_embedding, limit=k)
        hits = search_result.points
        context = "\\n\\n".join(hit.payload.get("text", "") for hit in hits)
    
    if context:
        prompt = f"""You are a helpful assistant. Use the following context from documents to answer the user's question. If the context doesn't contain relevant information, say so.

Context from documents:
{{context}}

Conversation:
{{history}}
Assistant:"""
    else:
        prompt = f"""You are a helpful assistant.

Conversation:
{{history}}
Assistant:"""
    
    model = genai.GenerativeModel(
        model_name=model_name,
        generation_config={{"temperature": temperature, "max_output_tokens": max_output, "top_p": top_p}}
    )
    response = model.generate_content(prompt)
    return response.text

# ============================================
# STREAMLIT UI
# ============================================
st.set_page_config(page_title="My PDF RAG Chatbot", layout="wide")
st.title("Chat with My PDFs")

if "messages" not in st.session_state:
    st.session_state.messages = []
if "qdrant_collection" not in st.session_state:
    st.session_state.qdrant_collection = None

uploaded_files = st.file_uploader(
    "Upload PDFs (optional: chat works even without PDFs)",
    type=["pdf"],
    accept_multiple_files=True,
)

if uploaded_files and not st.session_state.qdrant_collection:
    with st.spinner("Creating Qdrant collection & ingesting PDFs..."):
        create_qdrant_collection(
            collection_name=DEFAULT_COLLECTION,
            qdrant_url=QDRANT_URL,
            qdrant_api_key=QDRANT_API_KEY,
        )
        ingest_pdfs_to_qdrant(
            pdf_files=uploaded_files,
            collection_name=DEFAULT_COLLECTION,
            qdrant_url=QDRANT_URL,
            qdrant_api_key=QDRANT_API_KEY,
        )
        st.session_state.qdrant_collection = DEFAULT_COLLECTION
        st.success(" PDFs ingested! Future replies will include RAG context.")

for msg in st.session_state.messages:
    with st.chat_message(msg["role"]):
        st.markdown(msg["content"])

if user_input := st.chat_input("Type your question..."):
    st.session_state.messages.append({{"role": "user", "content": user_input}})
    st.chat_message("user").markdown(user_input)

    with st.chat_message("assistant"):
        with st.spinner("Thinking..."):
            reply = get_bot_response(
                messages=st.session_state.messages,
                model_name=MODEL_NAME,
                temperature=TEMPERATURE,
                top_p=TOP_P,
                max_output=MAX_OUTPUT,
                collection_name=st.session_state.qdrant_collection,
                qdrant_url=QDRANT_URL,
                qdrant_api_key=QDRANT_API_KEY,
            )
            st.markdown(reply)

    st.session_state.messages.append({{"role": "assistant", "content": reply}})
'''

# Write the file
with open("rag_chatbot_app.py", "w", encoding="utf-8") as f:
    f.write(app_code)

print("\n File created: rag_chatbot_app.py")
print("\n Now run the next cell to launch the app!")


 Creating rag_chatbot_app.py...
   Using QDRANT_URL: https://d78b1147-cde0-4b94-aa1e-9b6b2278...
   Using MODEL: gemini-2.0-flash-001

 File created: rag_chatbot_app.py

 Now run the next cell to launch the app!


In [None]:
# Step 6-part2: Launch the Streamlit App
# Run this cell to start the chatbot server

import sys

print(" Launching Streamlit app...")
print("   This will open a new browser tab automatically!")
print("\n To stop the app: Click 'Interrupt Kernel' or press Ctrl+C")
print("-" * 50)

# Run streamlit using Python module syntax (more reliable)
!{sys.executable} -m streamlit run rag_chatbot_app.py


 Launching Streamlit app...
   This will open a new browser tab automatically!

 To stop the app: Click 'Interrupt Kernel' or press Ctrl+C
--------------------------------------------------
^C


### Example: Working RAG Chatbot

Here's what the chatbot looks like in action! This example shows a user uploading the famous "Attention Is All You Need" paper and asking questions about it:

![RAG Chatbot Demo](images/streamlit_demo.png)

**What's happening:**
1.  We uploaded "Attention Is All You Need.pdf"
2.  We asked: "What is this document about?"
3.  The chatbot retrieved relevant chunks from the PDF and generated an accurate response about the Transformer architecture paper!


---

## Lab Exercises

Now it's your turn! Complete these exercises to deepen your understanding.

### Lab Exercise 1: Experiment with Chunk Size

The `chunk_size` parameter affects how text is split. Smaller chunks = more precise retrieval but less context. Larger chunks = more context but might include irrelevant information.

**Task**: Modify the ingestion function to use different chunk sizes and observe how it affects retrieval quality.


In [None]:
#  Lab Exercise 1: Experiment with Chunk Sizes
# Try ingesting the same PDF with different chunk sizes

CHUNK_SIZES_TO_TEST = [250, 500, 1000]


### Lab Exercise 2: Adjust Top-K Retrieved Chunks

The `k` parameter in `get_bot_response` controls how many chunks are retrieved. More chunks = more context but potentially more noise.

**Task**: Test the same question with k=1, k=3, and k=5. How does the response quality change?


In [26]:
#  Lab Exercise 2: Test Different k Values
# How many chunks should we retrieve?

K_VALUES_TO_TEST = [1, 3, 5]
TEST_QUESTION_EX2 = "What are the main topics covered in this document?"


### Lab Exercise 3: Customize the System Prompt

The current prompt in `get_bot_response` is generic. Customize it for a specific use case!

**Task**: Modify the prompt to make the chatbot act as a specific persona (e.g., a technical support agent, a research assistant, etc.)


In [30]:
# Lab Exercise 3: Custom System Prompt
# Create a specialized chatbot persona


---

## Conclusion

Congratulations! You've built a complete **RAG (Retrieval Augmented Generation)** application!

### What You've Learned:

| Concept | What It Does |
|---------|--------------|
| **RAG Pipeline** | Combines retrieval + generation for accurate answers |
| **PDF Processing** | Extract and chunk text from documents |
| **Embeddings** | Convert text to vectors for similarity search |
| **Vector Search** | Find relevant document chunks |
| **Prompt Engineering** | Combine context + question for LLM |
| **Streamlit** | Build interactive web applications |

### Key Parameters to Tune:

| Parameter | Effect | Typical Range |
|-----------|--------|---------------|
| `chunk_size` | How much context per chunk | 250-1000 characters |
| `k` (top-k) | How many chunks to retrieve | 1-5 chunks |
| `temperature` | Response creativity | 0.0-1.0 |
| `vector_size` | Embedding dimensions | Must match the embedding model (768 for `text-embedding-004`) |

### Next Steps:
1.  Try with your own PDF documents
2.  Customize the Streamlit UI
3.  Experiment with different embedding models
4.  Add metadata filtering (like in Lab 2)
5.  Deploy your chatbot to the cloud!

Note: To see how your lab answers compare with ours, check the Solutions folder!

### Additional Resources:
- [Qdrant Documentation](https://qdrant.tech/documentation/)
- [Google Gemini API](https://ai.google.dev/docs)
- [Streamlit Documentation](https://docs.streamlit.io/)
- [RAG Best Practices](https://www.pinecone.io/learn/retrieval-augmented-generation/)

---

**You've completed all 3 labs! You now have hands-on experience with:**
-  **Lab 1**: LLM prompting and structured output
-  **Lab 2**: Vector databases and similarity search
-  **Lab 3**: Full RAG application with UI

Happy building!!! 
