# 🤖 Context-Aware Chatbot (RAG + LangChain)

This notebook builds a retrieval-augmented chatbot that answers multi-turn questions using a local knowledge corpus.  
It uses document ingestion, chunking, sentence-transformer embeddings, a Chroma vectorstore, and a small HuggingFace generation model for responses.


In [10]:
%pip install chromadb


In [1]:
# imports and environment hints
import os
import glob
from pathlib import Path
import json
import time

# LangChain + vectorstore + embeddings
from langchain.document_loaders import TextLoader, PyPDFLoader, DirectoryLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.vectorstores import Chroma
from langchain.chains import ConversationalRetrievalChain

# transformers for generation (Flan-T5)
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM, pipeline

# utilities
import numpy as np
import pandas as pd


# 📂 Load local text and PDF documents

Place `.txt`, `.md`, or `.pdf` files in a folder named `./docs/`. This cell discovers the files, picks a loader for each one by extension, and loads them into `raw_docs`.


In [2]:
docs_path = Path("docs")
docs_path.mkdir(exist_ok=True)

# list files (you should put your corpus files here)
file_list = sorted([str(p) for p in docs_path.glob("*") if p.suffix.lower() in [".txt", ".pdf", ".md"]])
print("Found files:", file_list)

# helper: choose loader by extension
def load_documents(file_paths):
    docs = []
    for p in file_paths:
        if p.lower().endswith(".pdf"):
            loader = PyPDFLoader(p)
            docs.extend(loader.load())
        elif p.lower().endswith(".txt") or p.lower().endswith(".md"):
            loader = TextLoader(p, encoding="utf8")
            docs.extend(loader.load())
        else:
            print("Skipping:", p)
    return docs

raw_docs = load_documents(file_list)
print(f"Loaded {len(raw_docs)} raw documents.")


Found files: ['docs/AI_ML Engineering – Advanced Internship Tasks.pdf']
Loaded 6 raw documents.


# ✂️ Clean and chunk documents for retrieval

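We use a `RecursiveCharacterTextSplitter` with 800-character chunks and a 120-character overlap, so each chunk stays small enough to embed and retrieve while neighbouring chunks share some context. The `chunk` index stored in the metadata restarts at 0 for every loaded document (one per PDF page here), so the same index can appear more than once per source file.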


In [3]:
text_splitter = RecursiveCharacterTextSplitter(chunk_size=800, chunk_overlap=120)
split_docs = []
for d in raw_docs:
    chunks = text_splitter.split_text(d.page_content)
    for i, chunk in enumerate(chunks):
        split_docs.append({"page_content": chunk, "metadata": {"source": getattr(d, "metadata", {}).get("source", "local"), "chunk": i}})
print("Total chunks:", len(split_docs))


Total chunks: 10
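
The effect of `chunk_size` and `chunk_overlap` can be illustrated with a minimal sketch in plain Python. It is a simplified fixed-window splitter, not the `RecursiveCharacterTextSplitter` algorithm (which prefers to break on separators such as paragraphs and spaces); the function name `chunk_with_overlap` and the sample string are hypothetical.

```python
def chunk_with_overlap(text, chunk_size=800, chunk_overlap=120):
    """Split text into character windows; consecutive windows share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks

# hypothetical 2000-character sample
sample = "".join(str(i % 10) for i in range(2000))
chunks = chunk_with_overlap(sample)
print(len(chunks), [len(c) for c in chunks])
```

The tail of each chunk is repeated at the head of the next, which is what keeps a sentence that straddles a boundary retrievable from either side.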


# 🧠 Sentence-transformer embeddings

We use `sentence-transformers/all-MiniLM-L6-v2` via LangChain's HuggingFaceEmbeddings wrapper.


In [5]:
import torch

hf_embedding_model = "sentence-transformers/all-MiniLM-L6-v2"
# use the GPU when one is available, otherwise fall back to CPU
embedding_device = "cuda" if torch.cuda.is_available() else "cpu"
embeddings = HuggingFaceEmbeddings(model_name=hf_embedding_model, model_kwargs={"device": embedding_device})
print("Embeddings initialized:", hf_embedding_model)


Embeddings initialized: sentence-transformers/all-MiniLM-L6-v2


# 💾 Create or load Chroma vectorstore

Chroma persists to `./chroma_db/`. If you already have a DB it will be reused.


In [6]:
persist_dir = "chroma_db"
collection_name = "developershub_corpus"

if os.path.exists(persist_dir) and os.listdir(persist_dir):
    # try to load existing
    vectordb = Chroma(persist_directory=persist_dir, embedding_function=embeddings, collection_name=collection_name)
    print("Loaded existing Chroma DB from", persist_dir)
else:
    # create new from split_docs
    texts = [d["page_content"] for d in split_docs]
    metadatas = [d["metadata"] for d in split_docs]
    vectordb = Chroma.from_texts(texts, embedding=embeddings, metadatas=metadatas, persist_directory=persist_dir, collection_name=collection_name)
    vectordb.persist()
    print("Created and persisted new Chroma DB at", persist_dir)


Created and persisted new Chroma DB at chroma_db
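
Because an existing DB is reused as-is, files added to `./docs/` later are not indexed automatically. One possible guard, sketched here under assumptions, is to record a fingerprint of the corpus and rebuild the index when it changes. The helper `corpus_fingerprint` is hypothetical and is not part of Chroma or LangChain; the temporary folder stands in for `./docs/`.

```python
import hashlib
import tempfile
from pathlib import Path

def corpus_fingerprint(folder, suffixes=(".txt", ".pdf", ".md")):
    """Hash the names and bytes of all corpus files in a stable (sorted) order."""
    digest = hashlib.sha256()
    for path in sorted(Path(folder).glob("*")):
        if path.is_file() and path.suffix.lower() in suffixes:
            digest.update(path.name.encode("utf8"))
            digest.update(path.read_bytes())
    return digest.hexdigest()

# demo on a throwaway folder (assumption: stands in for ./docs/)
with tempfile.TemporaryDirectory() as tmp:
    doc = Path(tmp) / "notes.txt"
    doc.write_text("first version", encoding="utf8")
    before = corpus_fingerprint(tmp)
    doc.write_text("second version", encoding="utf8")
    after = corpus_fingerprint(tmp)

print("corpus changed:", before != after)
```

Storing the fingerprint next to the DB and comparing it on startup would be one way to decide between loading and rebuilding.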




# 🔍 Retriever: configure how retrieval works

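We use the Chroma retriever to return the top `k` chunks per query (`k = 5` here). Score thresholding is optional and is not enabled in this cell; LangChain's `as_retriever` accepts `search_type="similarity_score_threshold"` with a `score_threshold` entry in `search_kwargs` for that purpose.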


In [7]:
retriever = vectordb.as_retriever(search_kwargs={"k": 5})
print("Retriever ready. k =", retriever.search_kwargs.get("k"))


Retriever ready. k = 5
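
What "top `k` with an optional score threshold" means can be shown with a minimal, library-free sketch. It uses cosine similarity on toy 3-dimensional vectors for illustration only; the metric the vectorstore actually applies depends on how the collection is configured, and the names `top_k` and `toy_index` are hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def top_k(query_vec, indexed, k=5, score_threshold=None):
    """Rank (text, vector) pairs by similarity, optionally dropping low scores, then keep k."""
    scored = [(cosine(query_vec, vec), text) for text, vec in indexed]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    if score_threshold is not None:
        scored = [pair for pair in scored if pair[0] >= score_threshold]
    return scored[:k]

# toy index: made-up vectors, not real embeddings
toy_index = [
    ("housing prices", [1.0, 0.0, 0.0]),
    ("image features", [0.6, 0.8, 0.0]),
    ("support tickets", [0.0, 0.0, 1.0]),
]
hits = top_k([1.0, 0.0, 0.0], toy_index, k=2)
filtered = top_k([1.0, 0.0, 0.0], toy_index, k=5, score_threshold=0.5)
print([text for _, text in hits], len(filtered))
```

With a threshold, fewer than `k` results can come back, which is often preferable to padding the prompt with unrelated chunks.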


# 🧩 Load a small HuggingFace generation model (Flan-T5)

We use `google/flan-t5-small` for lightweight generation; `device=-1` runs the pipeline on CPU. Swap in a larger model if you have a GPU.


In [8]:
gen_model_name = "google/flan-t5-small"
tokenizer = AutoTokenizer.from_pretrained(gen_model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(gen_model_name)
# create a generation pipeline for simple calls
generator = pipeline("text2text-generation", model=model, tokenizer=tokenizer, device=-1)
print("Generator initialized:", gen_model_name)


tokenizer_config.json: 0.00B [00:00, ?B/s]

spiece.model:   0%|          | 0.00/792k [00:00<?, ?B/s]

tokenizer.json: 0.00B [00:00, ?B/s]

special_tokens_map.json: 0.00B [00:00, ?B/s]

config.json: 0.00B [00:00, ?B/s]

model.safetensors:   0%|          | 0.00/308M [00:00<?, ?B/s]

generation_config.json:   0%|          | 0.00/147 [00:00<?, ?B/s]

Device set to use cpu


Generator initialized: google/flan-t5-small


# 🧾 Build a conversational retrieval chain using a custom generator wrapper

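LangChain provides `ConversationalRetrievalChain`. Because the chain expects an LLM object, we wrap the `generator` pipeline in a small `LLM` subclass whose `_call` method takes the prompt text and returns the generated string.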


In [10]:
from langchain.llms.base import LLM
from typing import Optional, List, Mapping, Any
from pydantic import Field

class HuggingFaceLLMWrapper(LLM):
    """Small wrapper to let LangChain call the HF pipeline as an LLM class."""
    pipeline: Any = Field(default=None) # Add pipeline field

    def __init__(self, pipeline):
        super().__init__() # Call super().__init__() to initialize Pydantic model
        self.pipeline = pipeline

    @property
    def _identifying_params(self) -> Mapping[str, Any]:
        return {"pipeline": str(self.pipeline)}

    @property
    def _llm_type(self) -> str:
        return "huggingface-flan-t5"

    def _call(self, prompt: str, stop: Optional[List[str]] = None) -> str:
        # max_new_tokens caps only the generated text; stop sequences are not applied here
        out = self.pipeline(prompt, max_new_tokens=256, do_sample=False)
        return out[0]["generated_text"]

hf_llm = HuggingFaceLLMWrapper(generator)
rag_chain = ConversationalRetrievalChain.from_llm(hf_llm, retriever, return_source_documents=True)
print("Conversational RAG chain ready.")

Conversational RAG chain ready.
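
The flow behind a conversational retrieval chain can be sketched in a few lines: fold the history into a standalone question, retrieve for that question, then generate from the retrieved context. This is a simplified illustration, not LangChain's implementation (the real chain uses an LLM call to rewrite the follow-up question, while plain string concatenation stands in for that step here). `condense_question`, `answer_turn`, and the two stub callables are hypothetical.

```python
from typing import Callable, List, Tuple

def condense_question(question: str, history: List[Tuple[str, str]]) -> str:
    """Fold earlier turns into the question so retrieval sees the conversation context."""
    if not history:
        return question
    turns = " ".join(f"User: {q} Bot: {a}" for q, a in history)
    return f"{turns} Follow-up: {question}"

def answer_turn(question: str, history: List[Tuple[str, str]],
                retrieve: Callable[[str], List[str]],
                generate: Callable[[str], str]) -> Tuple[str, str]:
    """One turn: condense, retrieve, build the prompt, generate."""
    standalone = condense_question(question, history)
    context = "\n\n".join(retrieve(standalone))
    prompt = f"Context:\n{context}\n\nQuestion: {standalone}\nAnswer:"
    return generate(prompt), prompt

# stubs standing in for the retriever and the generation model
def fake_retrieve(query: str) -> List[str]:
    return ["Evaluate performance using MAE and RMSE"]

def fake_generate(prompt: str) -> str:
    return "MAE and RMSE"

history = [("What is Task 3?", "Housing price prediction")]
answer, prompt = answer_turn("Which metrics does it use?", history, fake_retrieve, fake_generate)
print(answer)
```

The point of the condensing step is that a follow-up such as "Which metrics does it use?" carries no retrievable keywords on its own.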


# 🗣️ Interactive conversation demo

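Run the cell, then call `ask_bot(...)` with your own question to simulate a conversation. `ask_bot` appends each question/answer pair to `chat_history` and passes that list to the chain, which is how context carries across turns.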


In [11]:
chat_history = []
def ask_bot(user_input):
    result = rag_chain.invoke({"question": user_input, "chat_history": chat_history})
    answer = result["answer"]
    sources = result.get("source_documents", [])
    # append to history for multi-turn context
    chat_history.append((user_input, answer))
    return answer, sources

# example queries (replace with your own)
q1 = "What are the main objectives described in the corpus?"
a1, s1 = ask_bot(q1)
print("Q:", q1)
print("A:", a1[:800])
print("Sources:", [m.metadata for m in s1][:3])


Token indices sequence length is longer than the specified maximum sequence length for this model (607 > 512). Running this sequence through the model will result in indexing errors
Both `max_new_tokens` (=256) and `max_length`(=256) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)


Q: What are the main objectives described in the corpus?
A: Streamlit Skills Gained:  Conversational AI development  Document embedding and vector search  Retrieval-Augmented Generation (RAG)  LLM integration and deployment Task 5: Auto Tagging Support Tickets Using LLM Objective: Automatically tag support tickets into categories using a large language model (LLM)  Deploy the chatbot with Streamlit Skills Gained:  Conversational AI development  Document embedding and vector search  Retrieval-Augmented Generation (RAG)  LLM integration and deployment Task 5: Context-Aware Chatbot Using LangChain or RAG Objective: Build a conversational chatbot that can remember context and retrieve external information during conversations
Sources: [{'chunk': 0, 'source': 'docs/AI_ML Engineering – Advanced Internship Tasks.pdf'}, {'chunk': 0, 'source': 'docs/AI_ML Engineering – Advanced Internship Tasks.pdf'}, {'source': 'docs/AI_ML Engineering – Advanced Internship Tasks.pdf', 'chunk': 1}]
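
The recorded run logged a 607-token prompt against a 512-token limit. One way to keep prompts within a limit is to pack retrieved chunks in rank order until a budget is used up. The sketch below is illustrative only: it approximates token counts by whitespace-separated words, which is an assumption (a real count should come from the model's tokenizer), and `pack_context` and `template_overhead` are hypothetical names.

```python
def approx_tokens(text):
    """Rough stand-in for a tokenizer: count whitespace-separated words."""
    return len(text.split())

def pack_context(chunks, question, budget=512, template_overhead=64):
    """Keep chunks in rank order until the approximate token budget is exhausted."""
    remaining = budget - template_overhead - approx_tokens(question)
    kept = []
    for chunk in chunks:
        cost = approx_tokens(chunk)
        if cost > remaining:
            break  # stop at the first chunk that does not fit
        kept.append(chunk)
        remaining -= cost
    return kept

# three made-up chunks of 200 "tokens" each
ranked_chunks = ["a " * 200, "b " * 200, "c " * 200]
kept = pack_context(ranked_chunks, "What evaluation metrics are recommended?")
print(len(kept), "of", len(ranked_chunks), "chunks kept")
```

Dropping whole chunks from the end of the ranking keeps the question and the instruction text of the prompt intact, which cutting the prompt at a fixed length would not guarantee.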


# 📚 Inspect the top retrieved chunks for a given query

This helps debug and verify whether retrieval finds relevant passages.


In [12]:
def show_retrieved(query, k=5):
    docs = retriever.invoke(query)
    for i, d in enumerate(docs[:k]):
        print("---- RETRIEVED", i, "----")
        print("META:", d.metadata)
        print(d.page_content[:800])
        print()
show_retrieved("multimodal housing images and tabular features")


---- RETRIEVED 0 ----
META: {'chunk': 0, 'source': 'docs/AI_ML Engineering – Advanced Internship Tasks.pdf'}
●  Production-readiness  practices   
Task  3:  Multimodal  ML  –  Housing  Price  Prediction  Using  Images  +  Tabular  Data   
Objective:   Predict  housing  prices  using  both  structured  data  and  house  images.   
Dataset:   Housing  Sales  Dataset  +  Custom  Image  Dataset  (your  own  or  any  public  source)  
Instructions:   
●  Use  CNNs  to  extract  features  from  images   
●  Combine  extracted  image  features  with  tabular  data   
●  Train  a  model  using  both  modalities   
●  Evaluate  performance  using  MAE  and  RMSE   
Skills  Gained:   
●  Multimodal  machine  learning   
●  Convolutional  Neural  Networks  (CNNs)   
●  Feature  fusion  (image  +  tabular)   
●  Regression  modeling  and  evaluation

---- RETRIEVED 1 ----
META: {'source': 'docs/AI_ML Engineering – Advanced Internship Tasks.pdf', 'chunk': 1}
●  Feature  fusion  (image  +  tabular) 



# 🔧 Tweak retriever parameters for better recall

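Raise `k` to pull in more candidate chunks, at the cost of a longer prompt (the run above already logged a 607-token prompt against a 512-token limit at `k = 5`). In more advanced setups you can also change the search type or the similarity function through the retriever and Chroma options.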


In [13]:
retriever.search_kwargs["k"] = 8
print("Updated retriever k to", retriever.search_kwargs["k"])


Updated retriever k to 8
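
A larger `k` often returns near-duplicate chunks. Maximal marginal relevance (MMR) is one common alternative to plain top-`k`: it trades relevance to the query against redundancy with chunks already selected. The sketch below is a minimal pure-Python version on made-up 2-dimensional vectors; the function names and data are hypothetical and it is not the vectorstore's own implementation.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def mmr(query_vec, candidates, k=3, lambda_mult=0.5):
    """Return candidate indices chosen greedily by relevance minus redundancy."""
    selected = []
    remaining = list(range(len(candidates)))
    while remaining and len(selected) < k:
        def score(i):
            relevance = cosine(query_vec, candidates[i])
            # similarity to the closest already-selected candidate (0.0 when none yet)
            redundancy = max((cosine(candidates[i], candidates[j]) for j in selected), default=0.0)
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected

query = [1.0, 0.0]
vectors = [
    [0.9, 0.1],    # most relevant
    [0.9, 0.12],   # near-duplicate of the first
    [0.6, -0.8],   # less relevant but different
]
diverse = mmr(query, vectors, k=2, lambda_mult=0.5)
plain = mmr(query, vectors, k=2, lambda_mult=1.0)  # lambda_mult=1.0 ignores redundancy
print(diverse, plain)
```

With `lambda_mult=0.5` the near-duplicate is skipped in favour of the more distinct vector, while `lambda_mult=1.0` reduces to ordinary relevance ranking.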


# 🧪 Evaluate the bot with multiple example prompts

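Run a fixed list of prompts through `ask_bot` and collect the answers and source metadata in a DataFrame for review. Because `ask_bot` appends to the shared `chat_history`, each prompt is answered with the earlier turns as context.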


In [14]:
test_prompts = [
    "Summarize the methodology described in the documents.",
    "How were images used in the pipeline?",
    "What evaluation metrics are recommended?"
]
results = []
for p in test_prompts:
    ans, srcs = ask_bot(p)
    results.append({"prompt": p, "answer": ans, "sources": [d.metadata for d in srcs]})
pd.DataFrame(results)


Both `max_new_tokens` (=256) and `max_length`(=256) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)


| | prompt | answer | sources |
|---|---|---|---|
| 0 | Summarize the methodology described in the doc... | Key results or observations 4. Submission on G... | [{'chunk': 0, 'source': 'docs/AI_ML Engineerin... |
| 1 | How were images used in the pipeline? | Using Streamlit or Gradio for live interaction | [{'source': 'docs/AI_ML Engineering – Advanced... |
| 2 | What evaluation metrics are recommended? | Use prompt engineering or fine-tuning with an ... | [{'chunk': 1, 'source': 'docs/AI_ML Engineerin... |
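
Reviewing answers by eye does not scale, so a rough automatic check can help flag obviously off-topic answers. The sketch below scores an answer by the fraction of expected keywords it contains. The keyword lists are assumptions chosen for illustration, the second answer is a hypothetical good response rather than recorded output, and `keyword_recall` is a hypothetical helper, not an established metric implementation.

```python
def keyword_recall(answer, expected_keywords):
    """Fraction of expected keywords that appear in the answer (case-insensitive)."""
    if not expected_keywords:
        return 0.0
    lowered = answer.lower()
    found = sum(1 for kw in expected_keywords if kw.lower() in lowered)
    return found / len(expected_keywords)

checks = [
    # recorded answer to "How were images used in the pipeline?"
    {"answer": "Using Streamlit or Gradio for live interaction",
     "expected": ["CNN", "image features"]},
    # hypothetical on-topic answer, for contrast
    {"answer": "Evaluate performance using MAE and RMSE",
     "expected": ["MAE", "RMSE"]},
]
scores = [keyword_recall(c["answer"], c["expected"]) for c in checks]
print(scores)
```

A score of 0.0 does not prove an answer is wrong, but it is a cheap signal for which rows to inspect first.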


# 🧾 Persist the vectorstore and optionally export the retriever state

The Chroma DB is already persisted; this cell also writes a small JSON manifest recording the persist directory, collection name, and model names so the index can be reproduced.


In [15]:
vectordb.persist()
with open("chroma_manifest.json", "w", encoding="utf8") as f:
    json.dump({"persist_dir": persist_dir, "collection": collection_name, "embedding_model": hf_embedding_model, "generator": gen_model_name}, f, indent=2)
print("Saved chroma manifest and persisted DB.")


Saved chroma manifest and persisted DB.
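
The manifest becomes useful when it is checked on reload, since querying an index with a different embedding model than the one that built it gives meaningless similarities. A minimal sketch of such a check follows; `manifest_mismatches` is a hypothetical helper and the temporary file stands in for `chroma_manifest.json`.

```python
import json
import tempfile
from pathlib import Path

def manifest_mismatches(manifest_path, current):
    """Return {key: (saved, current)} for every setting that differs from the manifest."""
    saved = json.loads(Path(manifest_path).read_text(encoding="utf8"))
    return {k: (saved.get(k), v) for k, v in current.items() if saved.get(k) != v}

# demo with a throwaway manifest
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "chroma_manifest.json"
    path.write_text(json.dumps({
        "persist_dir": "chroma_db",
        "collection": "developershub_corpus",
        "embedding_model": "sentence-transformers/all-MiniLM-L6-v2",
    }), encoding="utf8")
    same = manifest_mismatches(path, {"embedding_model": "sentence-transformers/all-MiniLM-L6-v2"})
    diff = manifest_mismatches(path, {"embedding_model": "some-other-model"})

print(same, diff)
```

An empty result means the current settings match what the index was built with.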


# 🧩 Add one-off documents to the index (useful for updates)

This cell shows how to add new text and update embeddings without rebuilding everything.


In [16]:
def add_text_to_index(text, metadata=None):
    vectordb.add_texts([text], metadatas=[metadata or {}])
    vectordb.persist()

# example:
# add_text_to_index("This is a new policy document paragraph.", {"source": "policy.docx", "chunk": 0})
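
Calling `add_text_to_index` twice with the same text stores it twice. One possible way to avoid that, sketched here under assumptions, is to derive a stable ID from the content and skip IDs that were already added. The helpers `content_id` and `new_items` are hypothetical; such IDs could then be passed through the `ids` argument of `add_texts`, assuming the installed LangChain version accepts it.

```python
import hashlib

def content_id(text, source=""):
    """Stable short ID from the source name and whitespace-normalized text."""
    normalized = " ".join(text.split())
    return hashlib.sha256(f"{source}\n{normalized}".encode("utf8")).hexdigest()[:16]

def new_items(texts, seen_ids, source=""):
    """Return (id, text) pairs not seen before, recording their IDs in seen_ids."""
    fresh = []
    for text in texts:
        cid = content_id(text, source)
        if cid not in seen_ids:
            seen_ids.add(cid)
            fresh.append((cid, text))
    return fresh

seen = set()
first = new_items(["This is a new policy document paragraph."], seen, source="policy.docx")
again = new_items(["This is a new  policy document paragraph. "], seen, source="policy.docx")
print(len(first), len(again))
```

Normalizing whitespace before hashing means trivially reformatted copies of the same paragraph map to the same ID.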
