#**Chat with PDF**
###**Steps:**
- Upload your PDF and ask questions about it from the Gradio UI
- Extract text from the PDF
- Split the text into chunks
- Compute embeddings for the chunks and the question
- **Retrieve** the most relevant chunks for the question using cosine similarity
- Pass those **chunks to the LLM** as context for response generation
- **Generate** the response


###**Install Required Packages**

In [1]:
!pip install gradio openai PyPDF2 scikit-learn


###**Import Required Libraries**

In [2]:
import gradio as gr
from openai import OpenAI
from PyPDF2 import PdfReader
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

###**Retrieve API Key from Secrets and Set It as an Environment Variable**

In [3]:
# Retrieve the API key from Colab's secrets
from google.colab import userdata
OPENAI_API_KEY = userdata.get('OPENAI_API_KEY')

In [4]:
# Set OPENAI_API_KEY as an environment variable
import os
os.environ['OPENAI_API_KEY'] = OPENAI_API_KEY

In [5]:
# OpenAI() reads OPENAI_API_KEY from the environment
client = OpenAI()

###**Function to Extract Text from PDF**

In [6]:
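def extract_pdf_text(pdf_file):
    """Return the concatenated text of every page in the PDF."""
    reader = PdfReader(pdf_file)
    text = ""
    for page in reader.pages:
        # Guard against pages with no extractable text (e.g. scanned images)
        text += page.extract_text() or ""
    return text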

###**Function to Split Text into Chunks**
Why is chunking required in RAG?

🔹 1. LLMs have token limits
Large language models (such as GPT-4 or Mistral) have a limited context window (e.g., 4k–32k tokens for many models), so you can't pass an entire textbook or corpus to the model at once.

✅ Chunking breaks long documents into manageable pieces that fit into the model's context window.

🔹 2. Smaller chunks improve retrieval precision
If your chunks are too large, each retrieved chunk carries irrelevant information along with the relevant passage.
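
To make the token-limit point concrete, here is a rough sketch that estimates whether a text fits into a context window. It assumes the common rule of thumb of about 4 characters per token for English text. The helper names and the 8,000-token limit are hypothetical, and a real tokenizer would give exact counts.

```python
import math

CHARS_PER_TOKEN = 4  # assumption: rough rule of thumb for English text


def estimate_tokens(text: str) -> int:
    """Rough token estimate from the character count (not a real tokenizer)."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def fits_in_context(text: str, context_limit: int = 8000) -> bool:
    """Return True if the estimated token count is within the assumed limit."""
    return estimate_tokens(text) <= context_limit


document = "word " * 20000                  # 100,000 characters of filler text
print(estimate_tokens(document))            # estimate for the whole document
print(fits_in_context(document))            # whole document vs. the assumed limit
print(fits_in_context(document[:500]))      # a single 500-character chunk
```

Under these assumptions the full document exceeds the limit while a single 500-character chunk fits easily, which is why only a few retrieved chunks are sent to the model.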

In [7]:
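def split_text(text, chunk_size=500, overlap=50):
    # Sizes are in characters. The window advances by chunk_size - overlap,
    # so the step must be positive.
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    chunks = []
    for i in range(0, len(text), chunk_size - overlap):
        chunks.append(text[i:i + chunk_size])
    return chunks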
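
The sliding window is easier to see on a tiny input. The sketch below repeats the same splitting logic with a small `chunk_size` and `overlap` chosen purely for illustration, so that the shared characters between neighbouring chunks are visible.

```python
def split_text(text, chunk_size=500, overlap=50):
    # Same sliding-window logic as the cell above
    chunks = []
    for i in range(0, len(text), chunk_size - overlap):
        chunks.append(text[i:i + chunk_size])
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"
demo_chunks = split_text(sample, chunk_size=10, overlap=3)

for chunk in demo_chunks:
    print(chunk)

# The last 3 characters of one chunk reappear at the start of the next
print(demo_chunks[0][-3:] == demo_chunks[1][:3])
```

The overlap keeps a sentence that straddles a chunk boundary from being cut off in both neighbouring chunks.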

###**Function for Embeddings**

In [8]:
def get_embeddings(text):
    response = client.embeddings.create(
        input=text,
        model="text-embedding-3-small"
    )
    return np.array(response.data[0].embedding)
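
Calling this function once per chunk means one API request per chunk. The embeddings endpoint also accepts a list of strings as `input`, so a batched variant is possible. The following is a minimal sketch under that assumption: `get_embeddings_batch` is a hypothetical helper, the batch size of 100 is an arbitrary choice, and the client is passed in explicitly so the function can be exercised without a network call.

```python
import numpy as np


def get_embeddings_batch(client, texts, model="text-embedding-3-small", batch_size=100):
    """Embed many texts with fewer API calls (hypothetical helper).

    `client` is expected to behave like the OpenAI client created earlier.
    """
    vectors = []
    for start in range(0, len(texts), batch_size):
        batch = texts[start:start + batch_size]
        response = client.embeddings.create(input=batch, model=model)
        # Sort by `index` so the vectors line up with the input order
        ordered = sorted(response.data, key=lambda item: item.index)
        vectors.extend(np.array(item.embedding) for item in ordered)
    return vectors
```

With this helper, the per-chunk list comprehension could become `chunk_embeddings = get_embeddings_batch(client, chunks)`.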

###**Function to Retrieve Relevant Chunks**

In [9]:
def retrieve_relevant_chunks(query, chunks, chunk_embeddings):
    query_embedding = get_embeddings(query)
    similarities = cosine_similarity([query_embedding], chunk_embeddings)[0]
    top_indices = np.argsort(similarities)[::-1][:3]  # Indices of the top 3 most similar chunks
    # Map the indices back to the chunk texts; these form the context for the LLM
    return [chunks[i] for i in top_indices]
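
To see what the retrieval step computes, here is a minimal sketch that ranks chunks with hand-made 2-D vectors instead of real embeddings. It uses plain NumPy rather than scikit-learn but computes the same quantity, the dot product divided by the product of the vector norms. The function name `top_k_chunks` and the toy vectors are made up for illustration.

```python
import numpy as np


def top_k_chunks(query_vec, chunk_vecs, chunks, k=3):
    """Return the k chunks whose vectors have the highest cosine similarity."""
    matrix = np.asarray(chunk_vecs, dtype=float)
    query = np.asarray(query_vec, dtype=float)
    # Cosine similarity: dot product divided by the product of the norms
    sims = matrix @ query / (np.linalg.norm(matrix, axis=1) * np.linalg.norm(query))
    order = np.argsort(sims)[::-1][:k]  # indices by descending similarity
    return [(chunks[i], float(sims[i])) for i in order]


toy_chunks = ["about cats", "about dogs", "about taxes"]
toy_vectors = [[1.0, 0.0], [0.8, 0.6], [0.0, 1.0]]
query_vector = [1.0, 0.1]

ranked = top_k_chunks(query_vector, toy_vectors, toy_chunks, k=2)
for text, score in ranked:
    print(f"{score:.3f}  {text}")
```

The query vector points almost along the first axis, so the chunk whose vector points the same way ranks first. This sketch does not handle all-zero vectors, which would cause a division by zero.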

###**Function to Generate a Response**

In [10]:
def generate_response(context, query):
    messages = [
        {"role": "system", "content": "You are an assistant that answers questions based on the provided context in 30 words."},
        {"role": "user", "content": f"Context: {context}\n\nQuestion: {query}\n\nAnswer:"}
    ]
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=messages,
        max_tokens=300
    )
    return response.choices[0].message.content

###**Global Variables to Store Chunks and Embeddings**

In [14]:
chunks = []
chunk_embeddings = []

###**Gradio Interface Functions**

In [11]:
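def process_pdf(pdf_file):
    global chunks, chunk_embeddings
    # Nothing to do if the button is clicked before a file is uploaded
    if pdf_file is None:
        return "Please upload a PDF first."
    text = extract_pdf_text(pdf_file)
    if not text.strip():
        return "No extractable text was found in this PDF."
    chunks = split_text(text)
    chunk_embeddings = [get_embeddings(chunk) for chunk in chunks]
    return "PDF processed successfully! You can now chat with it."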

In [12]:
def chat_with_pdf(query):
    global chunks, chunk_embeddings
    if not chunks or not chunk_embeddings:
        return "Please upload and process a PDF first."
    relevant_chunks = retrieve_relevant_chunks(query, chunks, chunk_embeddings)
    context = "\n".join(relevant_chunks)
    return generate_response(context, query)

###**Gradio Interface**

In [13]:
# Gradio app
with gr.Blocks() as app:
    gr.Markdown("# Chat with Your PDF 📄🤖")
    pdf_file = gr.File(label="Upload PDF", file_types=[".pdf"])
    process_button = gr.Button("Process PDF")
    process_status = gr.Textbox(label="Status", interactive=False)

    query = gr.Textbox(label="Ask a Question")
    chat_button = gr.Button("Chat with PDF")
    response = gr.Textbox(label="Response", interactive=False)

    process_button.click(process_pdf, inputs=pdf_file, outputs=process_status)
    chat_button.click(chat_with_pdf, inputs=query, outputs=response)

app.launch()



