<a href="https://colab.research.google.com/github/Purvesh-Chitre/Assignment_Tasks/blob/Assignment_3/Assignment_Task_3.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Assignment Task 3<br>
Build a RAG-based Chatbot in Google Colab!<br>

Build a chatbot that:<br>
-	Uses open-source LLMs (like Mistral, LLaMA, or GPT4All).<br>
-	Supports Retrieval-Augmented Generation (RAG).<br>
-	Allows PDF uploads and extracts relevant information.<br>
-	Uses a vector database (like FAISS or ChromaDB) for efficient similarity search.<br>
-	Runs in Google Colab for easy testing.<br>

## **Step 1: Install Required Libraries** <br>

First, we need to install:<br>
-	LangChain (for LLM & RAG integration)<br>
-	FAISS (for similarity search)<br>
-	PyMuPDF (for extracting text from PDFs)<br>
-	Hugging Face Transformers (for the open-source LLM)<br>
-	Sentence Transformers (for embedding generation)<br>
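
After the install cells have run, it can help to confirm which versions were actually installed, since LangChain's import paths change between releases. The following is a minimal sketch using only the standard library; the helper name `installed_versions` is hypothetical, and the package list simply mirrors the names passed to `pip` in this notebook.

```python
from importlib import metadata

# Distribution names as passed to pip in this notebook's install cells.
REQUIRED = [
    "langchain", "langchain-community", "faiss-cpu", "pymupdf",
    "transformers", "sentence-transformers", "accelerate", "torch", "chromadb",
]

def installed_versions(packages):
    """Map each distribution name to its installed version, or None if it is missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions

# Print one line per package so missing installs are easy to spot.
for name, version in installed_versions(REQUIRED).items():
    print(f"{name}: {version or 'NOT INSTALLED'}")
```

Running this once before the imports makes it easier to tell a missing package apart from an import path that has moved.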

In [1]:
!pip install langchain faiss-cpu transformers accelerate torch sentence-transformers chromadb



In [2]:
!pip install -U langchain-community langchain



In [3]:
!pip install pymupdf



### Import Required Libraries

In [4]:
# Import the torch library for building and training neural networks
import torch
# Import the langchain library for developing applications powered by language models
import langchain
# Import the pymupdf library (also known as fitz) for working with PDF documents
import pymupdf
# Import the sentence_transformers library for creating sentence embeddings
import sentence_transformers
# Import the chromadb library for working with the Chroma vector database
import chromadb
# Import the os module for interacting with the operating system (e.g., file paths)
import os
# Import the fitz module (PyMuPDF) for extracting text from PDFs
import fitz
# Import the faiss library for efficient similarity search and clustering of dense vectors
import faiss
# Import HuggingFaceEmbeddings from langchain_community (the older langchain.embeddings path is deprecated)
from langchain_community.embeddings import HuggingFaceEmbeddings
# Import the FAISS wrapper from langchain_community for using FAISS as a vector store
from langchain_community.vectorstores import FAISS
# Import the CharacterTextSplitter class from langchain.text_splitter for splitting text into chunks
from langchain.text_splitter import CharacterTextSplitter
# Import HuggingFacePipeline from langchain_community for using Hugging Face pipelines as LangChain LLMs
from langchain_community.llms import HuggingFacePipeline
# Import the RetrievalQA class from langchain.chains for creating question-answering chains over documents
from langchain.chains import RetrievalQA
# Import AutoModelForCausalLM, AutoTokenizer, and the pipeline factory from transformers for loading the model and building a text-generation pipeline
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline
# Import the files module from google.colab for interacting with the Google Colab environment (e.g., file uploads)
from google.colab import files

print("All required libraries are successfully installed!")

All required libraries are successfully installed!


### Mount Google Drive

In [5]:
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


## **Step 2: Load an Open-Source LLM** <br>

Next, we load Mistral 7B Instruct v0.3 and proceed with the RAG chatbot.<br>
The model is downloaded from Hugging Face, so we first log in with an access token that has Read permission. If you do not have a token yet, generate one and paste it into the login box.<br>



In [6]:
from huggingface_hub import login
login()


The next cell confirms that the login succeeded by printing the logged-in username.

In [7]:
!huggingface-cli whoami

PurveshP


Run this to load the Mistral 7B Instruct v0.3 model:<br>

In [16]:
# Load Mistral 7B Instruct v0.3 in half precision (float16)
model_name = "mistralai/Mistral-7B-Instruct-v0.3"
# token=True reuses the token stored by login(); it replaces the deprecated use_auth_token argument
tokenizer = AutoTokenizer.from_pretrained(model_name, token=True)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16, device_map="auto", token=True)

# Create a text-generation pipeline
hf_pipeline = pipeline("text-generation", model=model, tokenizer=tokenizer, max_new_tokens=200,
                       pad_token_id=tokenizer.eos_token_id) # Use EOS as the pad token to avoid the missing-pad-token warning

# Fix: Wrap it in `HuggingFacePipeline` to make it compatible with LangChain
llm = HuggingFacePipeline(pipeline=hf_pipeline)

print("Mistral 7B v0.3 model loaded successfully with LangChain compatibility!")



Loading checkpoint shards:   0%|          | 0/3 [00:00<?, ?it/s]

Device set to use cuda:0


Mistral 7B v0.3 model loaded successfully with LangChain compatibility!


### Load a PDF from Google Drive & Extract Text

In [17]:
import os

# Path to the PDF inside Google Drive
pdf_path = "/content/drive/MyDrive/PDF_upload/1. Leadership that gets results.pdf"

# Verify if the file exists
if os.path.exists(pdf_path):
    print(f"Found PDF: {pdf_path}")
else:
    print("PDF file not found. Check your Google Drive path.")

Found PDF: /content/drive/MyDrive/PDF_upload/1. Leadership that gets results.pdf


In [18]:
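# fitz (PyMuPDF) was imported above and handles the PDF text extraction

# Function to extract text from the PDF
def extract_text_from_pdf(pdf_path):
    # Open the document in a context manager so the file is closed afterwards
    with fitz.open(pdf_path) as doc:
        # Join the plain text of every page, one page per line break
        text = "\n".join(page.get_text("text") for page in doc)
    return text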

# Extract text from the found PDF
pdf_text = extract_text_from_pdf(pdf_path)
print(f"Extracted {len(pdf_text)} characters from PDF.")

Extracted 55294 characters from PDF.


## **Step 3: Store Extracted Text in FAISS (Vector Database)** <br>
Now, let’s split the text and store it for retrieval.<br>
What This Does:<br>
-	Splits the extracted text into smaller chunks (target size of 500 characters, with a 100-character overlap between neighbouring chunks).<br>
-	Converts each chunk into a numerical embedding so that similar text can be found by vector search.<br>
-	Stores the embeddings in FAISS, a fast vector similarity search library.<br>
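
To make the retrieval idea concrete, here is a minimal brute-force sketch of nearest-neighbour search over embeddings. It is not how FAISS is implemented (FAISS uses optimized indexes); the toy 3-dimensional vectors stand in for the 384-dimensional embeddings produced by `all-MiniLM-L6-v2`, and the function names and the choice of Euclidean distance are assumptions made for illustration.

```python
import math

def l2_distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest_chunks(query_vec, chunk_vecs, k=2):
    """Return the indices of the k chunk vectors closest to query_vec, nearest first."""
    ranked = sorted(range(len(chunk_vecs)),
                    key=lambda i: l2_distance(query_vec, chunk_vecs[i]))
    return ranked[:k]

# Toy embeddings: one vector per text chunk (hypothetical values).
toy_chunks = [
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [0.8, 0.2, 0.1],
]
toy_query = [1.0, 0.0, 0.0]

# Indices of the two chunks whose embeddings lie closest to the query.
print(nearest_chunks(toy_query, toy_chunks, k=2))
```

The chunks at the returned indices are the ones that would be handed to the LLM as context.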

In [19]:
# Split text into smaller chunks
text_splitter = CharacterTextSplitter(chunk_size=500, chunk_overlap=100)
text_chunks = text_splitter.split_text(pdf_text)

# Generate embeddings
embedding_model = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")

# Store in FAISS (Vector Database)
vector_db = FAISS.from_texts(text_chunks, embedding_model)
print(f"Stored {len(text_chunks)} text chunks in FAISS for retrieval.")



Stored 13 text chunks in FAISS for retrieval.
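
For comparison, a strict sliding window with the same settings (500 characters, 100 overlap) can be sketched in plain Python. This is an illustrative stand-in with a hypothetical function name, not the algorithm `CharacterTextSplitter` uses.

```python
def chunk_text(text, chunk_size=500, chunk_overlap=100):
    """Split text into fixed-size windows; each window repeats the last
    chunk_overlap characters of the previous one."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window already reaches the end of the text.
        if start + chunk_size >= len(text):
            break
        start += step
    return chunks

# Placeholder text with the same length as the extracted PDF text.
sample = "x" * 55294
print(len(chunk_text(sample)))
```

A strict window over 55,294 characters yields 138 chunks, while the run above reports 13, so the stored chunks average over 4,000 characters each. A likely explanation is that `CharacterTextSplitter` only splits at its separator (a blank line by default) and keeps a piece whole when it contains no separator, even if it is longer than `chunk_size`. Larger chunks mean a longer prompt once they are retrieved, which matters for GPU memory.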


## **Step 4: Set Up the Retrieval-Augmented Generation (RAG) Chatbot** <br>
Now, we connect the FAISS database to our Mistral 7B model for intelligent responses.<br>
What This Does:<br>
-	Connects our PDF text database with Mistral 7B.<br>
-	Uses retrieval-augmented generation (RAG) to fetch relevant text before generating responses.<br>
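
Conceptually, the "stuff" chain type places all retrieved chunks into a single prompt together with the question. The sketch below illustrates that idea in plain Python; the prompt wording, the function name, and the `max_chunks` parameter are hypothetical and do not reproduce LangChain's actual template.

```python
def build_stuff_prompt(question, retrieved_chunks, max_chunks=4):
    """Concatenate up to max_chunks retrieved chunks and the question into one prompt."""
    context = "\n\n".join(retrieved_chunks[:max_chunks])
    return (
        "Use the following context to answer the question.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )

# Hypothetical retrieved chunks for illustration only.
example_chunks = ["First retrieved passage.", "Second retrieved passage."]
print(build_stuff_prompt("What is this document about?", example_chunks))
```

Because every retrieved chunk is copied into the prompt, the prompt length grows with both the number of chunks and their size. In LangChain, the number of chunks can be reduced with `vector_db.as_retriever(search_kwargs={"k": 2})`.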

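## Challenges of this chatbot and LLM: <br>
Because the LLM is large, generating each answer takes a long time and uses most of the available GPU memory. <br>
One option is to load the model in quantized form with a library such as bitsandbytes, which mainly reduces memory use. <br>
Another option is to limit how many document chunks the retriever returns, and how long they are, so that the prompt stays short. <br>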
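
A rough estimate shows why quantization matters here. The sketch below computes the memory needed for the model weights alone at different precisions, and includes a hedged example of 4-bit loading through the `BitsAndBytesConfig` class in `transformers`. The parameter count of about 7.25 billion is an assumed round figure, `load_model_4bit` is a hypothetical helper that is defined but not called, and it would additionally require `pip install bitsandbytes`, which this notebook does not install.

```python
GIB = 1024 ** 3

def estimate_weight_gib(n_params, bits_per_param):
    """Approximate memory for the weights alone (no activations or KV cache), in GiB."""
    return n_params * bits_per_param / 8 / GIB

def load_model_4bit(model_name):
    """Load a causal LM with 4-bit weights. Needs a CUDA GPU and the bitsandbytes package."""
    import torch
    from transformers import AutoModelForCausalLM, BitsAndBytesConfig

    quant_config = BitsAndBytesConfig(
        load_in_4bit=True,                     # store weights in 4-bit precision
        bnb_4bit_quant_type="nf4",             # NormalFloat4 quantization
        bnb_4bit_compute_dtype=torch.float16,  # run matrix multiplications in float16
    )
    return AutoModelForCausalLM.from_pretrained(
        model_name, quantization_config=quant_config, device_map="auto"
    )

# Assumed approximate parameter count for a 7B-class model.
N_PARAMS = 7_250_000_000
for bits in (16, 8, 4):
    print(f"{bits}-bit weights: ~{estimate_weight_gib(N_PARAMS, bits):.1f} GiB")
```

At 16 bits the weights alone come close to the capacity of a GPU with roughly 15 GiB of memory, leaving little room for the prompt and generated tokens. Quantized weights free up memory but may reduce output quality somewhat.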

In [20]:
# Create a RAG-based chatbot, reusing the LangChain-wrapped LLM (`llm`) created earlier
rag_chain = RetrievalQA.from_chain_type(
    llm=llm,
    retriever=vector_db.as_retriever(),
    chain_type="stuff"
)

# Function to chat with the PDF content
def chat_with_rag(query):
    # RetrievalQA expects the question under "query" and returns the answer under "result"
    response = rag_chain.invoke({"query": query})
    return response["result"]

### Creating the chatbot

In [22]:
while True:
    query = input("Ask something about the PDF (or type 'exit' to stop): ")
    if query.lower() == "exit":
        print("Exiting chatbot...")
        break
    response = chat_with_rag(query)
    print("\nChatbot Response:\n", response)

Ask something about the PDF (or type 'exit' to stop): what is this book about?


OutOfMemoryError: CUDA out of memory. Tried to allocate 142.00 MiB. GPU 0 has a total capacity of 14.74 GiB of which 138.12 MiB is free. Process 94345 has 14.60 GiB memory in use. Of the allocated memory 14.14 GiB is allocated by PyTorch, and 341.68 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation.  See documentation for Memory Management  (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)

**Challenges:**<br>
An earlier run produced a successful result. However, I then modified some parameters to improve the output quality, which resulted in a long runtime.<br>
As the error above shows, the GPU ran out of memory. Because of these resource and time limits, the code is operational but the run shown here was terminated.<br>
I am sharing the code as it stands at this point in time.<br>