# Conversational Agent for ML Textbook (RAG-based Chatbot using Hugging Face)

**Student:** Jaydutt Dave (PGE5)  
**Project Deadline:** 11 May 2025  

---

## 📚 Project Overview

This project demonstrates a **Retrieval-Augmented Generation (RAG)** based conversational agent using a machine learning textbook as the knowledge base.  
Users can ask questions in natural language and receive contextually relevant answers based on the textbook content.

The chatbot leverages the following technologies:

- **PDF Text Extraction** with `pypdf` (through LangChain's `PyPDFLoader`)
- **Text Embedding** using Hugging Face’s `all-MiniLM-L6-v2`
- **Vector Similarity Search** using **FAISS**
- **Answer Generation** using Hugging Face models such as `google/flan-t5-base` or `deepset/roberta-base-squad2`
- **User Interface** built with **Gradio**

### 🔄 Attempted Approach with OpenAI

Initially, OpenAI's GPT model was integrated for answer generation. However, due to **quota limitations** and the requirement of a **paid subscription**, this approach was not feasible.  
To keep the solution free and open-source, the project was successfully migrated to use **Hugging Face models and APIs** instead.

By pairing semantic search over the textbook with an open-source language model, the chatbot lets users explore machine learning concepts interactively, with answers grounded in the textbook content.
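
The retrieve-then-generate flow described in this overview can be sketched in a few lines of plain Python. This is only an illustration of the idea, not the notebook's implementation: `build_prompt`, `answer_question`, and the two `toy_*` stand-ins are hypothetical names, and the prompt wording is an assumption rather than the template LangChain uses.

```python
from typing import Callable, List


def build_prompt(question: str, context_chunks: List[str]) -> str:
    """Place the retrieved chunks ahead of the question in a single prompt."""
    context = "\n\n".join(context_chunks)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


def answer_question(
    question: str,
    retrieve: Callable[[str], List[str]],
    generate: Callable[[str], str],
) -> str:
    """Retrieve relevant chunks first, then generate from them."""
    chunks = retrieve(question)
    prompt = build_prompt(question, chunks)
    return generate(prompt)


# Stand-ins for the real retriever and language model (illustration only).
def toy_retrieve(question: str) -> List[str]:
    return ["Supervised learning trains a model on labelled examples."]


def toy_generate(prompt: str) -> str:
    return f"(model output for a prompt of {len(prompt)} characters)"


print(answer_question("What is supervised learning?", toy_retrieve, toy_generate))
```

In the notebook itself, the retriever role is played by the FAISS store and the generator role by the Hugging Face model.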

---

✅ **Tech Stack Used:**
- Python, Transformers, LangChain
- Hugging Face Hub models, downloaded and run locally with `transformers`
- Gradio for chatbot UI

⚠️ **Note:** No OpenAI API is used in this project. All models and components are open-source and run via Hugging Face integration.

---


**Install required packages**

In [2]:
!pip install -q langchain langchain-community langchain-huggingface faiss-cpu sentence-transformers gradio transformers pypdf

**Authenticate with HuggingFace**

In [3]:
from huggingface_hub import login

# Prompts for your access token instead of hardcoding it in the notebook
login()
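
One way to keep the token out of the notebook is to read it from the `HF_TOKEN` environment variable. The sketch below assumes that variable has been set beforehand; `resolve_hf_token` is a hypothetical helper, not part of `huggingface_hub`.

```python
import os
from typing import Mapping, Optional


def resolve_hf_token(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the token stored in HF_TOKEN, or raise if it is missing."""
    env = os.environ if env is None else env
    token = env.get("HF_TOKEN", "").strip()
    if not token:
        raise RuntimeError("Set the HF_TOKEN environment variable before running.")
    return token


# Intended use in the cell above (not executed in this sketch):
# login(token=resolve_hf_token())
```

Failing early with a clear message is a design choice here; it avoids a confusing authentication error later in the run.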

**Load and split the PDF**

In [4]:
from langchain_community.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter

# Point this at your PDF (here, a file on a mounted Google Drive)
loader = PyPDFLoader("/content/drive/MyDrive/ml_intro.pdf")
pages = loader.load()  # one Document per page; chunking happens in the next step

# Chunk into manageable parts
splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
docs = splitter.split_documents(pages)
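
To see what `chunk_size=500` and `chunk_overlap=50` mean in practice, here is a simplified sliding-window splitter. It is a sketch only: `RecursiveCharacterTextSplitter` also tries to break on separators such as paragraphs and sentences, which this hypothetical `sliding_window_chunks` function does not attempt.

```python
from typing import List


def sliding_window_chunks(
    text: str, chunk_size: int = 500, chunk_overlap: int = 50
) -> List[str]:
    """Cut text into fixed-size windows that share chunk_overlap characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be non-negative and smaller than chunk_size")

    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


sample = "x" * 1200
print([len(c) for c in sliding_window_chunks(sample)])  # [500, 500, 300]
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, provided it is shorter than the overlap.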

**Generate Embeddings and Create Vector Store**

In [5]:
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_community.vectorstores import FAISS

embedding = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")
db = FAISS.from_documents(docs, embedding)
retriever = db.as_retriever(search_kwargs={"k": 1})
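
The effect of `k` in `search_kwargs` can be illustrated with a brute-force nearest-neighbour search over toy vectors. This sketch ranks by Euclidean (L2) distance as an assumption; the metric actually used depends on how the FAISS index is configured, and `nearest_chunks` is a hypothetical helper rather than a FAISS or LangChain function.

```python
import math
from typing import List, Sequence, Tuple


def l2_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_chunks(
    query_vec: Sequence[float],
    chunk_vecs: Sequence[Sequence[float]],
    k: int = 1,
) -> List[Tuple[int, float]]:
    """Return (chunk index, distance) for the k chunks closest to the query."""
    scored = [(i, l2_distance(query_vec, v)) for i, v in enumerate(chunk_vecs)]
    scored.sort(key=lambda pair: pair[1])  # smallest distance first
    return scored[:k]


# Toy 2-D "embeddings"; real all-MiniLM-L6-v2 vectors have 384 dimensions.
chunk_vecs = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]]
query_vec = [1.0, 0.0]

print(nearest_chunks(query_vec, chunk_vecs, k=1))
print(nearest_chunks(query_vec, chunk_vecs, k=2))
```

With `k=1` only the single closest chunk reaches the language model, so an answer that spans several passages may be missed; a larger `k` trades a longer prompt for more context.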


**Load HuggingFace Model (FLAN-T5)**

In [6]:
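from transformers import AutoTokenizer, AutoModelForSeq2SeqLM, pipeline
from langchain_huggingface import HuggingFacePipeline

model_name = "google/flan-t5-base"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

# FLAN-T5 is a sequence-to-sequence model, so it runs under the
# "text2text-generation" task. HuggingFacePipeline wraps generation pipelines;
# an extractive "question-answering" pipeline cannot be driven by a text prompt.
gen_pipeline = pipeline(
    "text2text-generation",
    model=model,
    tokenizer=tokenizer,
    max_new_tokens=256,
)

llm = HuggingFacePipeline(pipeline=gen_pipeline)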


**Create the Retrieval-QA Chain**

In [7]:
from langchain.chains import RetrievalQA

qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    retriever=retriever,
    return_source_documents=True
)

def chat_function(question):
    try:
        result = qa_chain.invoke({"query": question})
        return result["result"]
    except Exception as e:
        return f"❌ Error: {str(e)}"
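
Because the chain is built with `return_source_documents=True`, its result also carries the retrieved chunks under `"source_documents"`, which `chat_function` currently discards. The sketch below shows one way to append page references to the answer. It assumes each document's metadata holds a zero-based `"page"` entry, as `PyPDFLoader` typically sets; `FakeDocument` and `format_answer_with_sources` are hypothetical names used only to make the example runnable.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class FakeDocument:
    """Stand-in for a LangChain Document, used only in this sketch."""
    page_content: str
    metadata: Dict[str, Any] = field(default_factory=dict)


def format_answer_with_sources(result: Dict[str, Any]) -> str:
    """Append one-based page numbers of the retrieved chunks to the answer."""
    answer = str(result.get("result", "")).strip()
    pages: List[int] = []
    for doc in result.get("source_documents", []):
        page = doc.metadata.get("page")
        # Convert the zero-based page index to a one-based page number.
        if isinstance(page, int) and page + 1 not in pages:
            pages.append(page + 1)
    if not pages:
        return answer
    cited = ", ".join(str(p) for p in sorted(pages))
    return f"{answer}\n\nSource pages: {cited}"


example = {
    "result": "Overfitting means the model memorises the training data.",
    "source_documents": [FakeDocument("...", {"page": 11})],
}
print(format_answer_with_sources(example))
```

Inside `chat_function`, the same helper could be applied to the dictionary returned by `qa_chain.invoke` in place of `result["result"]`.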


**Launch Gradio Chatbot UI**

In [11]:
import gradio as gr

with gr.Blocks() as demo:
    gr.Markdown("## 🤖 Machine Learning Textbook Chatbot")

    question_input = gr.Textbox(label="Ask a question")
    answer_output = gr.Textbox(label="Answer", lines=3)
    chat_history = gr.Textbox(label="Chat History", lines=10, interactive=False)

    def handle_query(question, history):
        answer = chat_function(question)
        new_history = history + f"\nYou: {question}\nBot: {answer}\n"
        return new_history, answer

    question_input.submit(handle_query, [question_input, chat_history], [chat_history, answer_output])

demo.launch()
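
The history handling can be separated from the UI so that it is easy to test. The sketch below is a hypothetical variant of `handle_query`: it ignores blank questions, avoids the leading newline on the first turn, and returns a third value intended to clear the input box. `make_handler` is an illustrative name, not a Gradio function.

```python
from typing import Callable, Tuple


def make_handler(answer_fn: Callable[[str], str]):
    """Build a submit handler returning (history, answer, cleared input)."""

    def handle(question: str, history: str) -> Tuple[str, str, str]:
        question = question.strip()
        if not question:
            # Nothing to ask: keep the history and leave the other boxes empty.
            return history, "", ""
        answer = answer_fn(question)
        new_history = f"{history}\nYou: {question}\nBot: {answer}\n".lstrip("\n")
        return new_history, answer, ""

    return handle


# Quick check with a stand-in for chat_function.
handler = make_handler(lambda q: q.upper())
print(handler("what is bias?", ""))
```

To wire this into the interface, pass `make_handler(chat_function)` to `question_input.submit` with `[chat_history, answer_output, question_input]` as the outputs.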



