<a href="https://colab.research.google.com/github/Rohit-Munda/GenAIWorkshop/blob/main/langchain_chatbot_with_gradio_prompt.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# 🤖 Build a Simple Chatbot with LangChain, ChromaDB, and Gradio
In this demo, you'll build a basic chatbot that:
- Accepts a `.pdf` or `.txt` document upload
- Splits the document into chunks
- Embeds and indexes the chunks using ChromaDB
- Allows users to ask questions
- Uses LangChain to retrieve relevant chunks and generate answers

We'll also use **Gradio** to build a simple user interface.

## ✅ Install Required Libraries

In [1]:
!pip install -q langchain langchain_community chromadb pypdf gradio sentence-transformers


## ✅ Import Required Libraries

In [2]:
import gradio as gr
from google.colab import userdata
# Third-party integrations are imported from langchain_community, the package installed above
from langchain_community.embeddings import SentenceTransformerEmbeddings
from langchain_community.vectorstores import Chroma
from langchain_community.document_loaders import TextLoader, PyPDFLoader
from langchain_community.llms import HuggingFaceHub
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.chains import RetrievalQA

## 📄 Step 1: Document Upload, Chunking, and Embedding

In [3]:

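def process_file(file_path):
    """Load a .pdf or .txt file, chunk it, embed the chunks, and return a retriever."""
    # Pick a loader from the file extension (case-insensitive, so ".PDF" also works)
    lower_path = file_path.lower()
    if lower_path.endswith(".pdf"):
        loader = PyPDFLoader(file_path)
    elif lower_path.endswith(".txt"):
        loader = TextLoader(file_path)
    else:
        raise ValueError("Unsupported file type: expected .pdf or .txt")

    documents = loader.load()

    # Split into chunks of up to 500 characters, with 50 characters of overlap
    splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
    chunks = splitter.split_documents(documents)

    # Create embeddings
    embeddings = SentenceTransformerEmbeddings(model_name="all-MiniLM-L6-v2")

    # Set up the vector store with Chroma. Reusing the same persist_directory
    # adds these chunks to whatever an earlier upload already stored there.
    vectordb = Chroma.from_documents(chunks, embedding=embeddings, persist_directory="./chroma_db")
    # Newer Chroma versions may persist automatically and deprecate this explicit call
    vectordb.persist()

    # Set up the retriever
    retriever = vectordb.as_retriever()
    return retriever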
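
To make `chunk_size=500` and `chunk_overlap=50` more concrete, here is a minimal sketch of fixed-width chunking with overlap. It is a simplification for illustration only: `chunk_text` is a hypothetical helper, and `RecursiveCharacterTextSplitter` itself prefers to break on separators such as paragraphs and sentences rather than at fixed offsets.

```python
def chunk_text(text: str, chunk_size: int = 500, chunk_overlap: int = 50) -> list[str]:
    """Hypothetical helper: slide a fixed-width window over the text.

    Each chunk starts (chunk_size - chunk_overlap) characters after the
    previous one, so neighbouring chunks share chunk_overlap characters.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")

    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the window has reached the end of the text
        if start + chunk_size >= len(text):
            break
    return chunks


# A 1200-character sample made of repeating digits
sample = "".join(str(i % 10) for i in range(1200))
chunks = chunk_text(sample)
print([len(c) for c in chunks])          # lengths of the resulting chunks
print(chunks[0][-50:] == chunks[1][:50])  # neighbouring chunks overlap
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, which tends to help retrieval.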


## 🧠 Step 2: LangChain RetrievalQA Chain Setup

In [8]:

from langchain.prompts import PromptTemplate

retriever = None

# Prompt template for RetrievalQA
prompt_template = PromptTemplate(
    input_variables=["context", "question"],
    template="""Use the following context to answer the question. Return ONLY the answer, nothing else.

Context: {context}
Question: {question}

Answer:"""
)

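def upload_and_process(file):
    global retriever
    # The change event also fires when the file is cleared, in which case file is None
    if file is None:
        return "⚠️ Please upload a .pdf or .txt file."
    file_path = file.name
    retriever = process_file(file_path)
    return "✅ Document processed! You can now ask questions."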

def answer_query(query):
    if retriever is None:
        return "⚠️ Please upload and process a document first."
    if not query or not query.strip():
        return "⚠️ Please enter a question."

    llm = HuggingFaceHub(
        repo_id="mistralai/Mistral-Nemo-Instruct-2407",
        huggingfacehub_api_token=userdata.get("HF_TOKEN"),  # read from Colab secrets
        model_kwargs={"temperature": 0.2, "max_length": 256},
    )
    qa_chain = RetrievalQA.from_chain_type(
        llm=llm,
        retriever=retriever,
        chain_type="stuff",  # "stuff" places all retrieved chunks into a single prompt
        chain_type_kwargs={"prompt": prompt_template},
        return_source_documents=False
    )
    result = qa_chain.run({"query": query})
    # Keep only the text after the final "Answer:" in case the model echoes the prompt
    return result.split("Answer:")[-1].strip()
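
The last line of `answer_query` is easier to follow with a standalone example. The sketch below fills the same template with plain `str.format` and shows what `split("Answer:")[-1].strip()` returns. It assumes that the model may echo the prompt before its answer; the `extract_answer` helper, the sample context, and the simulated model output are all made up for illustration.

```python
TEMPLATE = """Use the following context to answer the question. Return ONLY the answer, nothing else.

Context: {context}
Question: {question}

Answer:"""


def extract_answer(generated: str) -> str:
    """Hypothetical helper: keep the text after the last 'Answer:' marker."""
    return generated.split("Answer:")[-1].strip()


prompt = TEMPLATE.format(
    context="Paris is the capital of France.",
    question="What is the capital of France?",
)

# Simulated output of a model that repeats the prompt before answering
echoed_output = prompt + " Paris"
# Simulated output of a model that returns only the completion
bare_output = "  Paris\n"

print(extract_answer(echoed_output))
print(extract_answer(bare_output))
```

Both calls yield the same answer, so the extraction works whether or not the prompt is echoed. One caveat: if the model's answer itself contains the text `Answer:`, everything before the last occurrence is discarded.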


## 🖼️ Step 3: Gradio UI

In [9]:

with gr.Blocks() as demo:
    gr.Markdown("## 📄 Upload a Document and Ask Questions")

    with gr.Row():
        file_input = gr.File(label="Upload .pdf or .txt file")
        file_output = gr.Textbox(label="Status")

    with gr.Row():
        question_input = gr.Textbox(label="Ask a question")
        answer_output = gr.Textbox(label="Answer")

    file_input.change(upload_and_process, inputs=file_input, outputs=file_output)
    # Run the query when the user presses Enter, not on every keystroke
    question_input.submit(answer_query, inputs=question_input, outputs=answer_output)

demo.launch(share=True, debug=True)



## ✅ Summary

In this notebook, you've learned how to:
- Load documents in `.txt` or `.pdf` format
- Split documents into chunks
- Embed and store the chunks in ChromaDB
- Use LangChain's `RetrievalQA` to answer questions
- Build an interactive chatbot interface using Gradio

This forms the foundation of a **Retrieval-Augmented Generation (RAG)** chatbot.
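
To see what the retrieval step of RAG does conceptually, here is a minimal pure-Python sketch that ranks stored chunks by similarity to a query vector. The three-dimensional vectors, the sample texts, and the `cosine` and `top_k` helpers are invented for illustration; in the notebook the embeddings come from `all-MiniLM-L6-v2`, and the search is done by ChromaDB, whose distance metric depends on its configuration and may not be cosine similarity.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query_vec, indexed_chunks, k=2):
    """Return the texts of the k chunks most similar to the query vector."""
    ranked = sorted(
        indexed_chunks,
        key=lambda item: cosine(query_vec, item[1]),
        reverse=True,
    )
    return [text for text, _ in ranked[:k]]


# Toy "vector store": (chunk text, made-up embedding) pairs
index = [
    ("Chroma stores embeddings.", [0.9, 0.1, 0.0]),
    ("Gradio builds web UIs.", [0.0, 0.2, 0.9]),
    ("Embeddings map text to vectors.", [0.8, 0.3, 0.1]),
]

query_vec = [1.0, 0.2, 0.0]  # made-up embedding of the user's question
print(top_k(query_vec, index, k=2))
```

The retrieved texts are what the `stuff` chain would place into the `{context}` slot of the prompt before calling the language model.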
