
# Task 4: Context-Aware Chatbot Using LangChain / RAG

## Problem Statement & Objective
Build a conversational chatbot that can:

- Remember context (conversational memory)
- Retrieve answers from a vectorized knowledge base (RAG)
- Deploy with Streamlit

This notebook walks through the full pipeline: dataset loading, embeddings, the vectorstore, the retrieval chain, and a deployment guide.


In [None]:

# Install required libraries (uncomment if running first time)
# !pip install -U langchain sentence-transformers faiss-cpu streamlit scikit-learn matplotlib pandas


In [None]:

import os
import glob
import pandas as pd
import numpy as np

from typing import List
from sentence_transformers import SentenceTransformer
import faiss

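# LangChain imports
# Note: these are the legacy import paths. Recent LangChain releases move the embeddings,
# vectorstores, and LLM wrappers into the separate `langchain_community` package, so pin an
# older `langchain` version or adjust the paths if any of these imports fail.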
from langchain.schema import Document
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.vectorstores import FAISS
from langchain.chains import ConversationalRetrievalChain
from langchain.llms import OpenAI  # optional
from langchain.memory import ConversationBufferMemory


In [None]:

# Helper functions to load corpus (from txt files)

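def load_corpus_from_txts(folder_path: str) -> List[Document]:
    """Read every .txt file in folder_path into one Document per file."""
    docs = []
    # Sort the paths so the document order is the same on every run.
    for path in sorted(glob.glob(os.path.join(folder_path, "*.txt"))):
        with open(path, "r", encoding="utf-8") as f:
            text = f.read()
        # Keep the file name so retrieved chunks can be traced back to their source.
        metadata = {"source": os.path.basename(path)}
        docs.append(Document(page_content=text, metadata=metadata))
    return docs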

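def chunk_documents(documents: List[Document], chunk_size: int = 500, overlap: int = 50) -> List[Document]:
    """Split each document into overlapping character windows."""
    # Guard against a non-positive step, which would make the chunking loop run forever.
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - overlap
    new_docs = []
    for d in documents:
        text = d.page_content
        for start in range(0, len(text), step):
            new_meta = dict(d.metadata)
            new_meta["chunk_start"] = start  # character offset of this chunk in the source text
            new_docs.append(Document(page_content=text[start:start + chunk_size], metadata=new_meta))
    return new_docs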

print("Functions ready. Place your .txt files in a 'data/' folder.")
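
To make the overlap arithmetic in `chunk_documents` concrete, here is a minimal sketch that works on character offsets only, not on LangChain `Document` objects. The helper name `chunk_spans` is hypothetical and is not part of the notebook; it assumes the same defaults (`chunk_size=500`, `overlap=50`), so each window starts 450 characters after the previous one.

```python
def chunk_spans(text_length: int, chunk_size: int = 500, overlap: int = 50):
    """Return the (start, end) character offsets of each sliding-window chunk."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - overlap  # distance between consecutive chunk starts
    spans = []
    start = 0
    while start < text_length:
        # The last chunk is cut short at the end of the text.
        spans.append((start, min(start + chunk_size, text_length)))
        start += step
    return spans


spans = chunk_spans(1200)
print(spans)  # [(0, 500), (450, 950), (900, 1200)]
```

Consecutive chunks share 50 characters, which reduces the chance that a sentence relevant to a query is split across a chunk boundary.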


In [None]:

MODEL_NAME = "all-MiniLM-L6-v2"

def build_vectorstore(documents: List[Document], persist_directory: str = None):
    """Embed the chunks with a sentence-transformers model and index them in FAISS."""
    if not documents:
        raise ValueError("No documents to index; check that the data folder contains .txt files.")
    hf_embeddings = HuggingFaceEmbeddings(model_name=MODEL_NAME, model_kwargs={"device": "cpu"})
    # from_documents embeds page_content and carries each chunk's metadata into the index.
    vectorstore = FAISS.from_documents(documents, hf_embeddings)
    if persist_directory:
        vectorstore.save_local(persist_directory)
    return vectorstore

# Example usage (if data folder available):
# docs = load_corpus_from_txts("data")
# chunks = chunk_documents(docs)
# vectorstore = build_vectorstore(chunks, persist_directory="faiss_store")
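
Conceptually, the retriever built on this vectorstore embeds the user's question and returns the chunks whose vectors lie closest to it. The sketch below illustrates that idea in plain Python under loud assumptions: the three-dimensional vectors are hand-made toy values rather than real `all-MiniLM-L6-v2` embeddings, cosine similarity is used for readability (the actual FAISS index may use a different distance metric), and the helper names `cosine` and `top_k` are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec, indexed, k=2):
    """Return the k texts whose vectors are most similar to query_vec."""
    scored = [(cosine(query_vec, vec), text) for text, vec in indexed]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:k]]


# Toy "index": (chunk text, made-up embedding vector)
toy_index = [
    ("Streamlit serves the chat UI.", [0.9, 0.1, 0.0]),
    ("FAISS stores chunk embeddings.", [0.1, 0.9, 0.1]),
    ("Memory keeps earlier turns.", [0.0, 0.2, 0.9]),
]
query = [0.2, 0.8, 0.1]  # pretend embedding of a question about the vector index
print(top_k(query, toy_index, k=1))  # ['FAISS stores chunk embeddings.']
```

In the real pipeline the same role is played by `vectorstore.as_retriever()`, with the embedding model producing the vectors.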


In [None]:

def build_conversational_chain(vectorstore, llm_choice: str = "mock", openai_api_key: str = None):
    memory = ConversationBufferMemory(memory_key="chat_history", return_messages=True)
    if llm_choice == "openai":
        if not openai_api_key:
            raise ValueError("OpenAI API key required")
        os.environ["OPENAI_API_KEY"] = openai_api_key
        llm = OpenAI(temperature=0)
    else:
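        from langchain.llms.fake import FakeListLLM
        # A follow-up turn calls the LLM twice (question condensing, then answering), so repeat
        # the canned reply; some FakeListLLM versions do not wrap around when the list runs out.
        llm = FakeListLLM(responses=["[Mock reply] Replace with real LLM for actual answers."] * 50)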
    qa = ConversationalRetrievalChain.from_llm(llm=llm, retriever=vectorstore.as_retriever(), memory=memory)
    return qa
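
The memory is what makes the chain context-aware: on a follow-up turn, the stored history and the new question are combined into a standalone question before retrieval. The sketch below shows that idea in plain Python. It is an illustration only: `SimpleBufferMemory` and `build_condense_prompt` are hypothetical names, and the prompt wording is made up rather than copied from LangChain's template.

```python
class SimpleBufferMemory:
    """Toy stand-in for a buffer memory: keeps every turn, in order."""

    def __init__(self):
        self.turns = []  # list of (role, text) pairs

    def save_turn(self, question, answer):
        self.turns.append(("Human", question))
        self.turns.append(("AI", answer))

    def as_text(self):
        return "\n".join(f"{role}: {text}" for role, text in self.turns)


def build_condense_prompt(memory, follow_up):
    """Combine the stored history with a follow-up into one rephrasing prompt."""
    return (
        "Given the conversation below, rephrase the follow-up as a standalone question.\n\n"
        f"Chat history:\n{memory.as_text()}\n\n"
        f"Follow-up: {follow_up}\n"
        "Standalone question:"
    )


memory = SimpleBufferMemory()
memory.save_turn("What is FAISS?", "A library for vector similarity search.")
prompt = build_condense_prompt(memory, "Does it run on CPU?")
print(prompt)
```

Without the history, the retriever would see only "Does it run on CPU?" and would have no way to know that "it" refers to FAISS.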


In [None]:

# Example (after building vectorstore):
# qa_chain = build_conversational_chain(vectorstore, llm_choice="mock")
# The chain's input key is "question"; chat_history is filled in by the memory.
# res = qa_chain.run(question="What is this document about?")
# print(res)



## 🚀 Streamlit Deployment

Once your dataset and vectorstore are ready, you can deploy the chatbot with Streamlit.

1. Save the main script as `Task4_ContextAware_RAG_Chatbot.py` (provided separately).
2. In a terminal, run:
   ```bash
   streamlit run Task4_ContextAware_RAG_Chatbot.py
   ```
3. Open the browser link (usually `http://localhost:8501`) to interact with the chatbot.




## ✅ Final Summary

- We built a **context-aware chatbot** using LangChain with RAG.
- We used **sentence-transformers embeddings** with a **FAISS vectorstore** for retrieval.
- We added **conversation memory** so follow-up questions keep their context.
- Deployment is handled with **Streamlit**.


You can now upload this notebook and script to GitHub for submission.
