<a href="https://colab.research.google.com/github/LashawnFofung/RAG-Pipelines/blob/main/Gradio/Task_Gradio_Chatbot_with_Lite_RAG_Implementation.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Gradio Chatbot with Lite RAG Implementation**
<br>

### **📑 Multi-PDF Research Buddy**

*Streamlined Document Intelligence with Gradio & PyMuPDF*

<br>

This notebook provides a lightweight, local environment to upload, index, and query multiple PDF documents simultaneously. It uses a Gradio-powered UI to create a seamless research workspace directly inside Google Colab.

<br>

### **⚙️ Key Features**

- **Multi-File Support:** Upload several research papers or reports at once.

- **Instant Indexing:** Uses `PyMuPDF` (fitz) for high-speed text extraction.

- **Intelligent Modes:**

  - **Standard Q&A:** Keyword-based sentence retrieval (RAG foundation).

  - **Summary:** Quick overview of document contents.

  - **Key Takeaways:** Automated theme identification through text analysis.

- **Minimalist UI:** A clean, slate-themed monochrome interface for focused work.
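
To make the **Summary** and **Key Takeaways** modes more concrete, here is a minimal sketch of the kind of text analysis involved. The function names (`preview_summary`, `long_word_themes`) and their parameters are illustrative and do not appear in the app. The theme function also ranks long words by frequency with `collections.Counter`, a swapped-in variant: the app itself only collects unique long words without counting them.

```python
from collections import Counter

def preview_summary(text, max_chars=300):
    """Report the word count and the opening characters of the text."""
    word_count = len(text.split())
    return f"{word_count} words. Opening text: {text[:max_chars]}..."

def long_word_themes(text, min_len=8, top_n=5):
    """Return the most frequent long words as rough 'themes' (illustrative heuristic)."""
    # Strip common punctuation and lowercase before measuring length.
    cleaned = (w.strip(",.()").lower() for w in text.split())
    counts = Counter(w for w in cleaned if len(w) >= min_len)
    return [word for word, _ in counts.most_common(top_n)]

demo_text = "Retrieval retrieval retrieval augmented augmented generation pipeline."
print(preview_summary(demo_text, max_chars=20))
print(long_word_themes(demo_text, top_n=2))
```

Ranking by frequency makes the output stable between runs, whereas slicing a plain `set` can return different words in different sessions.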

<br>

### **🧠 How My Retrieval System Works**

I've designed this notebook to use what is known as a "Lite" or "Naive" RAG (Retrieval-Augmented Generation) architecture. Instead of just relying on a model's pre-existing knowledge, I am forcing the system to look at the specific documents you've uploaded.

<br>

Here is how I've broken down the logic:

- **Retrieval:** When you ask a question, I don't just guess. I use `re.split` to break your PDFs into individual sentences, then run a case-insensitive substring search (`search_query in s.lower()`) to find the sentences that contain your query text.

- **Augmentation:** Once I find those relevant sentences, I "augment" the system's knowledge by pulling them into a dedicated `context` variable. This acts as the short-term memory for your specific query.

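- **Generation:** In this current version, I am "generating" a response by returning those exact text snippets to you. Because the answer is quoted verbatim from the source, it cannot contain "hallucinated" content, though a match is only found when your query text occurs literally in a sentence.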

- **[!TIP] Taking it to the next level:** To turn this into a **"Full RAG"** pipeline, I would simply take that `context` variable and feed it into a generative LLM (like Gemini). Instead of showing you raw snippets, the AI would read those snippets and write a natural, cohesive answer for you.
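
The three steps above can be sketched in a few lines of standard-library Python. This is a minimal illustration, not the app's exact code: the names `retrieve_sentences`, `answer`, and `max_hits` are hypothetical, and the split pattern uses `\s+` instead of the app's single-space pattern so that line breaks in extracted PDF text also end a sentence.

```python
import re

def retrieve_sentences(document_text, query, max_hits=3):
    """Retrieval: return up to max_hits sentences containing the query as a substring."""
    # Split after sentence-ending punctuation that is followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", document_text)
    needle = query.lower().strip()
    if not needle:
        return []
    hits = [s.strip() for s in sentences if needle in s.lower()]
    return hits[:max_hits]

def answer(document_text, query):
    """Augmentation and 'generation': build a context string and quote it back."""
    context = " ".join(retrieve_sentences(document_text, query))
    if not context:
        return f"No direct mention of '{query}' found."
    return f'From the documents: "{context}"'

sample = (
    "Transformers rely on attention. Attention weights are computed per token! "
    "Convolutions use fixed kernels. Is attention expensive?"
)
print(answer(sample, "attention"))
```

Note that the whole query string must appear inside a sentence, so short keywords ("attention") match far more reliably than full questions ("How does attention work?").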

<br>
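
As a rough sketch of the "Full RAG" step, the retrieved snippets can be folded into a prompt and handed to any text-generation function. Everything named here (`build_prompt`, `full_rag_answer`, `echo_model`) is hypothetical, and `echo_model` is only a stand-in so the example runs without an API key. A real setup would replace it with a call to the LLM client of your choice.

```python
from typing import Callable, List

def build_prompt(question: str, snippets: List[str]) -> str:
    """Combine retrieved snippets and the user's question into one grounded prompt."""
    numbered = "\n".join(f"[{i}] {s}" for i, s in enumerate(snippets, start=1))
    return (
        "Answer the question using only the context below. "
        "If the context is insufficient, say so.\n\n"
        f"Context:\n{numbered}\n\n"
        f"Question: {question}\nAnswer:"
    )

def full_rag_answer(question: str, snippets: List[str],
                    generate: Callable[[str], str]) -> str:
    """Send the augmented prompt to a generator; skip the call if nothing was retrieved."""
    if not snippets:
        return "No relevant passages were retrieved."
    return generate(build_prompt(question, snippets))

def echo_model(prompt: str) -> str:
    """Placeholder for a real LLM call (assumption: any str -> str function works here)."""
    return f"(model received {len(prompt)} characters)"

print(build_prompt("What is RAG?", ["RAG retrieves text.", "Then it generates."]))
print(full_rag_answer("What is RAG?", ["RAG retrieves text."], echo_model))
```

Passing the generator in as a function keeps the retrieval code independent of any particular LLM provider.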


### **How to Use**

- **Installation:** Run the first cell to install the necessary libraries (`PyMuPDF`, Gradio, etc.).

- **Launch:** Run the Main Application cell. Click the **public URL** or use the **inline frame** to view the app.

- **Analyze:** Upload your PDFs, click **"Process All Documents"**, and start chatting with your data!

# **1. Setup & Installation**

In [1]:
# Install both UI and Document Intelligence libraries
!pip install -q pymupdf gradio llama-index llama-index-readers-file google-generativeai jedi

# **2. Main Application (Frontend & Backend Logic)**

In [None]:
# Frontend and backend logic live in a single code block,
# so running this cell launches the entire app at once and avoids "Variable Not Defined" errors.

import gradio as gr
import time
import fitz  # PyMuPDF
import re

# --- 1. LOGIC FUNCTIONS ---

# Global variable to store combined text from all uploaded PDFs
document_memory = ""

def process_pdf(files):
    """Extracts text from multiple PDFs and merges them into document_memory."""
    global document_memory
    if not files:
        return "⚠️ Error: No files detected. Please upload one or more PDFs."

    document_memory = "" # Reset memory for new batch
    total_pages = 0
    file_names = []

    try:
        for file in files:
            # Open each PDF in a context manager so it is closed after extraction
            with fitz.open(file.name) as doc:
                total_pages += len(doc)
                file_names.append(file.name.split('/')[-1])
                for page in doc:
                    document_memory += page.get_text()

        if not document_memory.strip():
            return "⚠️ Warning: The uploaded files appear to be empty or image-based scans."

        return (f"✅ SUCCESS: Indexed {len(files)} files ({total_pages} total pages).\n"
                f"Documents: {', '.join(file_names)}\n"
                f"Status: Ready for cross-document analysis.")
    except Exception as e:
        return f"❌ Error processing files: {str(e)}"

def handle_chat(message, history, mode):
    """Analyzes merged document_memory to answer questions."""
    global document_memory
    if not message:
        return "", history

    if not document_memory:
        bot_response = "I have no documents in my memory. Please upload PDFs and click 'Process'!"
    else:
        search_query = message.lower()

        if mode == "Summary":
            bot_response = f"Comprehensive Summary: Across the provided documents ({len(document_memory.split())} words), the text discusses: {document_memory[:300]}..."

        elif mode == "Key Takeaways":
            # Extract candidate themes: unique words longer than 7 characters.
            # A set has no fixed order, so the five words shown can vary between sessions.
            themes = list(set([w.strip(',.()').lower() for w in document_memory.split() if len(w) > 7]))
            bot_response = f"Primary themes identified across files: {', '.join(themes[:5])}."

        else:
            # Q&A Logic: Sentence retrieval
            sentences = re.split(r'(?<=[.!?]) +', document_memory)
            matching_sentences = [s.strip() for s in sentences if search_query in s.lower()]

            if matching_sentences:
                # Provide up to 3 contextually relevant sentences
                context = " ".join(matching_sentences[:3])
                bot_response = f"From the documents: \"{context}\""
            else:
                bot_response = f"I couldn't find a direct mention of '{message}' in the combined text. Try another keyword."

    history.append({"role": "user", "content": message})
    history.append({"role": "assistant", "content": bot_response})
    return "", history

# --- 2. THEME & STYLING ---

monochrome_theme = gr.themes.Soft(
    primary_hue="slate",
    secondary_hue="gray",
    neutral_hue="slate",
).set(
    button_primary_background_fill="*neutral_900",
    button_primary_text_color="white",
    block_title_text_weight="700",
)

custom_css = """
#status_box { background-color: #f8fafc; border: 1px solid #cbd5e1; }
#chat_win { border-radius: 10px; }
"""

# --- 3. UI LAYOUT ---

gr.close_all() # Close any Gradio apps still running from earlier cell runs

with gr.Blocks(theme=monochrome_theme, css=custom_css, title="Multi-PDF Research Buddy") as demo:
    gr.Markdown("# 📑 Multi-PDF Research Buddy")
    gr.Markdown("### *Analyze multiple research papers simultaneously in a minimalist workspace.*")

    with gr.Row():
        # LEFT COLUMN
        with gr.Column(scale=2, min_width=350):
            gr.Markdown("#### ⚙️ Document Controls")
            # EXTRA FEATURE: Multiple file support
            pdf_input = gr.File(
                label="Upload PDFs",
                file_types=[".pdf"],
                file_count="multiple" # This enables multi-file upload
            )

            mode_select = gr.Dropdown(
                choices=["Standard Q&A", "Summary", "Key Takeaways"],
                value="Standard Q&A",
                label="Analysis Mode"
            )

            process_btn = gr.Button("🔄 Process All Documents", variant="primary")

            status_output = gr.Textbox(
                label="System Status",
                lines=6,
                elem_id="status_box",
                interactive=False
            )

            exit_btn = gr.Button("🚪 Shut Down Portal", variant="stop")

        # RIGHT COLUMN
        with gr.Column(scale=3):
            gr.Markdown("#### 💬 Chat Interface")
            chatbot = gr.Chatbot(
                label="Analysis History",
                height=500,
                type="messages",
                allow_tags=False,
                elem_id="chat_win"
            )

            user_input = gr.Textbox(placeholder="Query your documents...", label="Your Question")

            with gr.Row():
                send_btn = gr.Button("📤 Send", variant="primary")
                clear_btn = gr.Button("🗑️ Clear Chat History")

    # --- 4. EVENT LISTENERS ---

    process_btn.click(fn=process_pdf, inputs=pdf_input, outputs=status_output)

    send_btn.click(handle_chat, inputs=[user_input, chatbot, mode_select], outputs=[user_input, chatbot])
    user_input.submit(handle_chat, inputs=[user_input, chatbot, mode_select], outputs=[user_input, chatbot])

    clear_btn.click(lambda: [], None, chatbot, queue=False)

    exit_btn.click(lambda: gr.Info("Session Terminated."), None, None).then(fn=demo.close)

# --- 5. LAUNCH ---
if __name__ == "__main__":
    demo.launch(debug=True, share=True)