<a href="https://colab.research.google.com/github/Qasim-Gill/rag-pdf-upload-chatbot/blob/main/rag_practice.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
# Install the required libraries.
# The Streamlit code in the next cell must also be saved as app.py
# (everything except this pip line) so it can be launched with `streamlit run app.py`.
!pip install -q pymupdf sentence-transformers faiss-cpu transformers langchain pdfplumber streamlit


In [None]:
import fitz  # PyMuPDF (correct way to import)
import faiss
import numpy as np
import pdfplumber
import streamlit as st
from sentence_transformers import SentenceTransformer
from langchain.text_splitter import RecursiveCharacterTextSplitter
from transformers import T5ForConditionalGeneration, T5Tokenizer

# Load models once and reuse them across Streamlit reruns
model_name = "google/flan-t5-large"  # LLM Model

@st.cache_resource
def load_models():
    embedder = SentenceTransformer("all-MiniLM-L6-v2")  # Embedding Model
    return embedder, T5Tokenizer.from_pretrained(model_name), T5ForConditionalGeneration.from_pretrained(model_name)

embed_model, tokenizer, model = load_models()

# Streamlit UI
st.title("📄 PDF Chatbot with RAG")
st.sidebar.header("Upload a PDF")
uploaded_file = st.sidebar.file_uploader("Choose a PDF file", type="pdf")

if uploaded_file:
    # Extract Text from PDF
    def extract_text_from_pdf(file):
        # Read from the `file` argument rather than the outer `uploaded_file` variable
        doc = fitz.open(stream=file.read(), filetype="pdf")
        return "\n".join([page.get_text() for page in doc])

    pdf_text = extract_text_from_pdf(uploaded_file)

    # Chunking Text
    def chunk_text(text, chunk_size=512, overlap=50):
        splitter = RecursiveCharacterTextSplitter(chunk_size=chunk_size, chunk_overlap=overlap)
        return splitter.split_text(text)

    chunks = chunk_text(pdf_text)

    # Generate Embeddings (encode all chunks in one batched call)
    embeddings = np.asarray(embed_model.encode(chunks), dtype=np.float32)

    # Store embeddings in FAISS
    dimension = embeddings.shape[1]  # Get embedding size
    index = faiss.IndexFlatL2(dimension)  # Create FAISS index
    index.add(embeddings)  # Add vectors to index

    # Retrieve Relevant Chunks
    def retrieve_relevant_chunks(query, top_k=3):
        query_embedding = embed_model.encode(query).reshape(1, -1).astype(np.float32)
        _, indices = index.search(query_embedding, top_k)
        # FAISS pads the result with -1 when fewer than top_k vectors exist; skip those
        return [chunks[i] for i in indices[0] if i != -1]

    # Generate Answer with FLAN-T5
    def generate_answer(question, context):
        prompt = f"Context: {context}\nQuestion: {question}\nAnswer:"
        inputs = tokenizer(prompt, return_tensors="pt", truncation=True, max_length=512)
        outputs = model.generate(**inputs, max_length=256, do_sample=True, top_p=0.9, temperature=0.7)
        return tokenizer.decode(outputs[0], skip_special_tokens=True)

    # User Input
    user_question = st.text_input("Ask a question:")
    if st.button("Get Answer") and user_question:
        retrieved_chunks = retrieve_relevant_chunks(user_question)
        context = " ".join(retrieved_chunks)  # Combine relevant chunks
        answer = generate_answer(user_question, context)
        st.write("**Answer:**", answer)


You are using the default legacy behaviour of the <class 'transformers.models.t5.tokenization_t5.T5Tokenizer'>. This is expected, and simply means that the `legacy` (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set `legacy=False`. This should only be set if you understand what it means, and thoroughly read the reason why this was added as explained in https://github.com/huggingface/transformers/pull/24565


2025-02-12 13:28:58.857 
  command:

    streamlit run /usr/local/lib/python3.11/dist-packages/colab_kernel_launcher.py [ARGUMENTS]


In [None]:
# Print this VM's public IP; the localtunnel landing page may ask for it as the tunnel password
!wget -qO- ipv4.icanhazip.com

35.240.175.116


In [None]:
!streamlit run app.py & npx localtunnel --port 8501


Collecting usage statistics. To deactivate, set browser.gatherUsageStats to false.

  You can now view your Streamlit app in your browser.

  Local URL: http://localhost:8501
  Network URL: http://172.28.0.12:8501
  External URL: http://35.240.175.116:8501

Need to install the following packages:
localtunnel@2.0.2
Ok to proceed? (y) y

your url is: https://few-ravens-cheer.loca.lt
2025-02-12 13:30:07.188433: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:477] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
E0000 00:00:1739

In [None]:
!wget https://github.com/cloudflare/cloudflared/releases/latest/download/cloudflared-linux-amd64 -O cloudflared
!chmod +x cloudflared


In [None]:
!streamlit run app.py & ./cloudflared tunnel --url http://localhost:8501


### **How FAISS Stores Embeddings Without Signup**  

FAISS (**Facebook AI Similarity Search**) is an open-source similarity-search library rather than a hosted database. The index used in this notebook lives **entirely in RAM**, so it needs no cloud account or external storage.  

#### **How It Works:**  
1. **Creating an Index in RAM:**  
   ```python
   index = faiss.IndexFlatL2(dimension)  # Create FAISS index
   ```
   This initializes an empty FAISS index in memory.  

2. **Adding Embeddings:**  
   ```python
   index.add(embeddings)  # Add vectors to index
   ```
   This stores the embeddings inside the FAISS index **temporarily**; they are lost when the program stops.  
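
To make the two steps above concrete, here is a minimal pure-Python sketch of the exhaustive search that a flat L2 index performs: compare the query against every stored vector and keep the closest ones by squared Euclidean distance. The function name `flat_l2_search` and the toy vectors are illustrative assumptions, not part of FAISS, which does the same comparison in optimized native code.

```python
import heapq

def flat_l2_search(vectors, query, top_k=3):
    """Brute-force nearest-neighbour search by squared L2 distance.

    Hypothetical helper for illustration only; it mirrors the idea behind
    a flat L2 index, not its implementation.
    """
    scored = []
    for i, vec in enumerate(vectors):
        # Squared Euclidean distance between the stored vector and the query
        dist = sum((a - b) ** 2 for a, b in zip(vec, query))
        scored.append((dist, i))
    nearest = heapq.nsmallest(top_k, scored)  # smallest distance first
    return [d for d, _ in nearest], [i for _, i in nearest]

# Toy 2-dimensional "embeddings" (made-up values)
vectors = [[0.0, 0.0], [1.0, 0.0], [0.0, 3.0]]
distances, indices = flat_l2_search(vectors, [1.0, 1.0], top_k=2)
```

Because every vector is compared against the query, the cost grows linearly with the number of chunks, which is usually acceptable for a single PDF.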

#### **Where FAISS Stores Data:**  
- **In RAM (temporary, lost on restart)**  
- **Not in Cloud (no signup needed)**  
- **Not Persistent (does not save automatically)**  

#### **How to Save and Reload FAISS Index Locally:**  
To persist the FAISS index, manually save and reload it:  
```python
faiss.write_index(index, "faiss_index.bin")  # Save to a file
index = faiss.read_index("faiss_index.bin")  # Load from a file
```

#### **Want Persistent Storage?**  
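For long-term storage, options include:  
- **FAISS + SQLite/PostgreSQL** (save the index file locally and keep chunk text and metadata in a database)  
- **Pinecone** (managed, cloud-based vector database; requires an account)  
- **Weaviate/Chroma** (other vector databases with built-in persistence)  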
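
A saved FAISS index holds only vectors, so the chunk text has to be persisted separately. The sketch below shows one way the SQLite pairing could look, using only the standard library: each chunk is stored under the same integer position it has in the index, and search results are mapped back to text. The table name `chunks` and the helpers `save_chunks` and `load_chunks` are hypothetical names chosen for this example.

```python
import sqlite3

def save_chunks(conn, chunks):
    """Store each chunk under its position, matching the row order of the index."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS chunks (id INTEGER PRIMARY KEY, text TEXT NOT NULL)"
    )
    conn.execute("DELETE FROM chunks")  # replace any previous document
    conn.executemany(
        "INSERT INTO chunks (id, text) VALUES (?, ?)", list(enumerate(chunks))
    )
    conn.commit()

def load_chunks(conn, ids):
    """Map search-result ids back to text, skipping ids with no row (such as -1)."""
    texts = []
    for i in ids:
        # int() also converts NumPy integer types, which sqlite3 cannot bind directly
        row = conn.execute("SELECT text FROM chunks WHERE id = ?", (int(i),)).fetchone()
        if row is not None:
            texts.append(row[0])
    return texts

conn = sqlite3.connect(":memory:")  # use a file path instead to persist across runs
save_chunks(conn, ["first chunk", "second chunk", "third chunk"])
texts = load_chunks(conn, [2, 0, -1])
```

In the notebook's flow, `ids` would be `indices[0]` from `index.search`, and the database file would sit next to `faiss_index.bin`.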


### **📌 Summary of the Code**  

This **Streamlit-based RAG (Retrieval-Augmented Generation) PDF chatbot** allows users to upload a PDF, extract its text, generate embeddings, retrieve relevant content, and answer questions using an AI model.

---

### **📂 Workflow**
1. **📄 Upload PDF** → Extracts text using **PyMuPDF (`fitz`)**  
2. **📑 Chunking** → Splits text into overlapping pieces using **LangChain's `RecursiveCharacterTextSplitter`**  
3. **🔍 Embeddings** → Converts text chunks into vectors using **`all-MiniLM-L6-v2` (Sentence Transformers)**  
4. **📌 Store in FAISS** → Indexes embeddings in memory for **fast similarity search**  
5. **🔎 Retrieve Chunks** → Finds the most relevant text chunks based on user queries  
6. **💡 Generate Answer** → Uses **FLAN-T5 Large (Google)** to generate a response  
7. **📝 Display Response** → Shows AI-generated answers in **Streamlit UI**  
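
The chunking step in the workflow above can be illustrated with a simplified sliding window. This is a sketch under stated assumptions: it cuts at fixed character offsets, whereas LangChain's splitter also tries to break on separators such as paragraphs and sentences. The function name `sliding_window_chunks` is hypothetical.

```python
def sliding_window_chunks(text, chunk_size=512, overlap=50):
    """Split text into fixed-size character windows that share `overlap` characters."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks

pieces = sliding_window_chunks("abcdefghij", chunk_size=4, overlap=1)
```

The overlap means a sentence that straddles a boundary still appears whole in at least one chunk, provided it is shorter than the overlap.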

---

### **🤖 Models Used**
✅ **`all-MiniLM-L6-v2` (Sentence Transformers)** → **Embeddings**  
✅ **`google/flan-t5-large` (Hugging Face)** → **Answer generation**  

**💡 Purpose:**  
- **Embeddings** → Convert text into vectors for similarity search (FAISS)  
- **FLAN-T5** → Generates human-like answers based on retrieved text  

---

### **🚀 How It Works**
- **User uploads a PDF** 📄  
- **System processes and stores vector embeddings** 🔍  
- **User asks a question** ❓  
- **FAISS retrieves relevant chunks** 📌  
- **FLAN-T5 generates an answer from the retrieved chunks** 💡  
- **Displays the answer in Streamlit UI** 🎯  
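
Between retrieval and generation, the retrieved chunks are merged into one prompt. Because the notebook truncates the prompt to 512 tokens and the question comes after the context, an overlong context could push the question out of the model input. A minimal sketch of one way to guard against that is shown below; the helper `build_prompt` and the character budget `max_context_chars` are assumptions for illustration, and characters are only a rough proxy for tokens.

```python
def build_prompt(question, retrieved_chunks, max_context_chars=1500):
    """Assemble the prompt, keeping whole chunks until the context budget is used up."""
    context_parts = []
    used = 0
    for chunk in retrieved_chunks:
        if used + len(chunk) > max_context_chars:
            break  # stop before the context grows past the budget
        context_parts.append(chunk)
        used += len(chunk) + 1  # +1 for the joining space
    context = " ".join(context_parts)
    # Same prompt layout as generate_answer in the notebook
    return f"Context: {context}\nQuestion: {question}\nAnswer:"

prompt = build_prompt(
    "What is FAISS?",
    ["FAISS is a similarity search library.", "x" * 2000],
)
```

Chunks arrive ordered by similarity, so dropping from the end discards the least relevant text first.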

---

### **🔥 Why is This Useful?**
✔ Converts **static PDFs into interactive chatbots**  
✔ **Efficient retrieval** using **FAISS**  
✔ Uses **openly available pretrained NLP models** (Embeddings + LLM)  
✔ **Open-source** and runs in **Google Colab**  

