## Problem Statement: InsureLLM Internal Knowledge Chatbot
**Project Title**: InsureLLM-Bot: An Intelligent Assistant for Internal Document Retrieval

**Introduction**:
InsureLLM is a growing insurance technology company with an expanding internal knowledge base. Critical information about the company's history, employee roles, product specifications, and client contracts is currently stored in a collection of Markdown (.md) files distributed across multiple folders. As the company scales, employees find it increasingly difficult and time-consuming to manually search through these documents to find specific answers, leading to decreased productivity and inconsistent information sharing.

**The Problem**:
There is no centralized, intelligent system for employees to quickly and accurately query the company's internal knowledge base. An employee needing to know the CEO's prior experience or the specifics of a product's features has to manually locate and read through potentially dozens of files. This process is inefficient and prone to human error.

**Objective**:
The goal of this project is to develop a Retrieval-Augmented Generation (RAG) application that serves as an internal chatbot for InsureLLM. This chatbot will allow employees to ask questions in natural language and receive concise, accurate answers sourced directly from the company's private documents.

**Technical Requirements**:

- **Data Ingestion**: The system must automatically load all .md files from a knowledge-base directory containing the subfolders: company, employee, product, and contract.

- **Vector Database**: The loaded documents must be chunked, converted into vector embeddings, and stored in a local ChromaDB vector store for efficient similarity searches.

- **LLM Integration**: The application will use a Google Gemini model (e.g., gemini-1.5-flash-latest) through the GoogleGenerativeAI class of the langchain_google_genai LangChain integration to interpret queries and generate answers.

- **RAG Pipeline**: A conversational retrieval chain must be implemented to:

    - Take a user's question.

    - Retrieve the most relevant document chunks from ChromaDB.

    - Augment the LLM's context with this retrieved information.

    - Generate a factually grounded answer.

- **User Interface**: A simple, web-based chat interface will be created using Gradio to allow for interactive and conversational queries.

- **Memory**: The chatbot must have conversational memory to understand follow-up questions within the context of an ongoing conversation.
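
The retrieve, augment, and generate steps of the RAG pipeline can be illustrated without any external services. The sketch below is a toy stand-in, not the project's implementation: it scores chunks with a bag-of-words cosine similarity instead of Gemini embeddings and ChromaDB, and it stops at building the augmented prompt rather than calling an LLM. The names `embed`, `cosine`, `retrieve`, and `build_prompt` are hypothetical, and the sample chunks are placeholders invented for illustration.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy embedding: a bag-of-words count vector (stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(question, chunks, k=2):
    """Return the k chunks most similar to the question."""
    q = embed(question)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]


def build_prompt(question, context_chunks):
    """Augment the question with retrieved context before it goes to the LLM."""
    context = "\n\n".join(context_chunks)
    return f"Answer using only this context:\n\n{context}\n\nQuestion: {question}"


# Placeholder chunks, invented for illustration only.
sample_chunks = [
    "Avery Lancaster is the CEO of InsureLLM.",
    "PolicyGuard Pro is an InsureLLM product.",
    "Standard contract terms cover renewal and termination.",
]

question = "Who is the CEO?"
top_chunks = retrieve(question, sample_chunks, k=1)
prompt = build_prompt(question, top_chunks)
print(prompt)
```

In the actual notebook, the embedding model and ChromaDB take the place of `embed` and `retrieve`, and the conversational chain assembles the prompt and calls Gemini.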

## Success Criteria:
The project will be considered a success when an employee, such as a new hire, can ask the chatbot a series of questions and receive accurate answers. For example:

- "Can you describe InsureLLM in a few sentences?"

- "Who is the CEO?"

- "What was Avery Lancaster's experience before joining the company?"

The final deliverable will be a well-documented Jupyter Notebook that demonstrates the entire end-to-end process, from data loading to the interactive chat application.

## Sample Knowledge Base Structure & Content
To make the demo work, your knowledge-base folder should be structured like this, with .md files inside:

```
knowledge-base/
├── company/
│   └── about_us.md
├── employee/
│   └── avery_lancaster_ceo.md
├── product/
│   └── policyguard_pro.md
└── contract/
    └── standard_terms.md
```
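
If the folder does not exist yet, a small script can scaffold it. This is a minimal sketch that assumes the four folders and file names from the tree above; the helper `scaffold_knowledge_base` and the placeholder text it writes are hypothetical, and the files still need real content before the chatbot can answer anything useful.

```python
from pathlib import Path

# Folder -> file names, mirroring the sample tree above.
SAMPLE_LAYOUT = {
    "company": ["about_us.md"],
    "employee": ["avery_lancaster_ceo.md"],
    "product": ["policyguard_pro.md"],
    "contract": ["standard_terms.md"],
}


def scaffold_knowledge_base(root="knowledge-base"):
    """Create the sample folders and placeholder .md files; return the paths created."""
    created = []
    for folder, filenames in SAMPLE_LAYOUT.items():
        folder_path = Path(root) / folder
        folder_path.mkdir(parents=True, exist_ok=True)
        for name in filenames:
            file_path = folder_path / name
            if not file_path.exists():  # never overwrite real content
                file_path.write_text(
                    f"# {file_path.stem}\n\nPlaceholder content.\n", encoding="utf-8"
                )
                created.append(file_path)
    return created
```

Calling `scaffold_knowledge_base()` from the notebook's working directory creates the tree; calling it again leaves existing files untouched.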

In [None]:
# imports
import os
import glob
import gradio as gr
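# DirectoryLoader, TextLoader, and Chroma are imported from langchain_community,
# where recent LangChain 0.x releases keep third-party integrations.
from langchain_community.document_loaders import DirectoryLoader, TextLoader
from langchain.text_splitter import CharacterTextSplitter
from langchain_google_genai import GoogleGenerativeAIEmbeddings, GoogleGenerativeAI
from langchain_community.vectorstores import Chroma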
from langchain.memory import ConversationBufferMemory
from langchain.chains import ConversationalRetrievalChain

In [None]:
# Load API Key
# --------------------
# IMPORTANT: This cell sets your Google API key as an environment variable.
# LangChain's Google libraries automatically look for this variable to authenticate.
# Replace "YOUR_API_KEY_HERE" with your actual Gemini API key.
# setdefault keeps a key that is already set in your environment.

os.environ.setdefault("GOOGLE_API_KEY", "YOUR_API_KEY_HERE")

In [None]:
# Load Documents from Knowledge Base
# ------------------------------------------
# This cell scans the 'knowledge-base' directory and all its subfolders for Markdown (.md) files.
# It uses LangChain's DirectoryLoader to read the content of each file.
# Each document is also tagged with metadata indicating its source folder (e.g., 'company', 'employee').

print("Loading documents from the knowledge base...")
folders = [p for p in glob.glob("knowledge-base/*") if os.path.isdir(p)]  # skip stray top-level files
text_loader_kwargs = {'encoding': 'utf-8'}
documents = []

for folder in folders:
    doc_type = os.path.basename(folder)
    loader = DirectoryLoader(
        folder,
        glob="**/*.md",
        loader_cls=TextLoader,
        loader_kwargs=text_loader_kwargs
    )
    folder_docs = loader.load()
    for doc in folder_docs:
        doc.metadata["doc_type"] = doc_type
        documents.append(doc)
print(f"Successfully loaded {len(documents)} documents.")
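
A quick sanity check after loading is to count documents per `doc_type`. The helper below is a hypothetical addition; it assumes only that each document exposes a `metadata` dict, as the documents loaded above do, and it uses stand-in objects so it can run without the knowledge base.

```python
from collections import Counter
from types import SimpleNamespace


def count_by_doc_type(docs):
    """Count documents per 'doc_type' metadata value; missing tags are grouped as 'untagged'."""
    return Counter(doc.metadata.get("doc_type", "untagged") for doc in docs)


# Stand-in objects with the same `.metadata` attribute as the loaded documents.
demo_docs = [
    SimpleNamespace(metadata={"doc_type": "company"}),
    SimpleNamespace(metadata={"doc_type": "employee"}),
    SimpleNamespace(metadata={"doc_type": "employee"}),
    SimpleNamespace(metadata={}),
]
print(dict(count_by_doc_type(demo_docs)))
```

In the notebook, `count_by_doc_type(documents)` would show one entry per subfolder; a missing folder name in the result suggests a path or glob problem.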

In [None]:
# Split Documents into Chunks
# -----------------------------------
# Whole documents can exceed an LLM's context window and make retrieval less precise.
# Here, we split the loaded documents into smaller chunks of text.
# `chunk_overlap` repeats some text between neighboring chunks so context is not lost at the boundaries.

print("Splitting documents into chunks...")
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
chunks = text_splitter.split_documents(documents)
print(f"Created {len(chunks)} document chunks.")
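
To make `chunk_size` and `chunk_overlap` concrete, here is a simplified fixed-width sliding window. It is not how `CharacterTextSplitter` works internally (that class splits on a separator and then merges the pieces up to the size limit); the function `sliding_chunks` is a hypothetical illustration of the overlap idea only.

```python
def sliding_chunks(text, chunk_size, chunk_overlap):
    """Split text into fixed-width windows; each window repeats the tail of the previous one."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):  # the window already reached the end
            break
    return chunks


demo = sliding_chunks("abcdefghij", chunk_size=4, chunk_overlap=2)
print(demo)  # ['abcd', 'cdef', 'efgh', 'ghij']
```

Each chunk shares its last two characters with the start of the next one, which is the same role the 200-character overlap plays for the 1000-character chunks above.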

In [None]:
# Create and Persist the Vector Database
# -----------------------------------------------
# This is the core of the "Retrieval" part of RAG.
# 1. Initialize the Gemini embedding model ('models/embedding-001').
# 2. Use Chroma.from_documents to:
#    a. Convert all text chunks into numerical vectors (embeddings) using the Gemini model.
#    b. Store these vectors in a local directory named 'vector_db'.
# This process only needs to be run once. For subsequent runs, we can load the saved database.
# Re-running this cell against an existing 'vector_db' folder adds the same chunks again.

print("Creating and persisting the vector database...")
embeddings = GoogleGenerativeAIEmbeddings(model="models/embedding-001")
vector_store = Chroma.from_documents(
    documents=chunks,
    embedding=embeddings,
    persist_directory="./vector_db"
)
print("✅ Vector store created successfully in the 'vector_db' folder!")
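
Because the build step is meant to run once, a guard around it can avoid re-embedding on every notebook run. The sketch below assumes that a non-empty `vector_db` directory means a store was already persisted; that is a heuristic rather than something Chroma guarantees, and `vector_store_exists` is a hypothetical helper.

```python
from pathlib import Path


def vector_store_exists(persist_directory="./vector_db"):
    """Heuristic: treat a non-empty directory as an already-built vector store."""
    path = Path(persist_directory)
    return path.is_dir() and any(path.iterdir())


if vector_store_exists():
    print("Found an existing store; the build cell can be skipped.")
else:
    print("No store found; run the build cell.")
```

In the notebook, this check could wrap the `Chroma.from_documents(...)` call, falling back to `Chroma(persist_directory="./vector_db", embedding_function=embeddings)` when the store is already there.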

In [None]:
# Set Up the Conversational RAG Chain
# -------------------------------------------
# This cell assembles all the components into a complete, stateful chatbot pipeline.
# 1. Load the persisted ChromaDB from disk.
# 2. Create a 'retriever', a component that searches the vector store for relevant chunks.
# 3. Initialize the Gemini LLM ('gemini-1.5-flash-latest') for generating answers.
# 4. Set up ConversationBufferMemory to store the chat history.
# 5. Create the ConversationalRetrievalChain, which orchestrates the entire process:
#    - Takes a question.
#    - Uses memory to rephrase it if needed.
#    - Sends it to the retriever.
#    - Bundles the question and retrieved documents into a prompt for the LLM.
#    - Gets the final answer from the LLM.

print("Setting up the conversational RAG chain...")
db = Chroma(persist_directory="./vector_db", embedding_function=embeddings)
retriever = db.as_retriever(search_kwargs={'k': 3}) # Retrieve top 3 relevant chunks
llm = GoogleGenerativeAI(model="gemini-1.5-flash-latest", temperature=0.1)
memory = ConversationBufferMemory(memory_key='chat_history', return_messages=True)

qa_chain = ConversationalRetrievalChain.from_llm(
    llm=llm,
    retriever=retriever,
    memory=memory
)
print("✅ Chatbot is ready.")
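
Before wiring up a UI, the chain can be exercised with the questions from the Success Criteria. The helper `run_smoke_test` and the `EchoChain` stub are hypothetical; the stub only mimics the `invoke({"question": ...})` call and `"answer"` key used with `qa_chain` in this notebook, so the sketch runs without an API key.

```python
def run_smoke_test(chain, questions):
    """Ask each question in order and collect (question, answer) pairs."""
    transcript = []
    for question in questions:
        result = chain.invoke({"question": question})
        transcript.append((question, result["answer"]))
    return transcript


# Questions taken from the Success Criteria section.
SUCCESS_QUESTIONS = [
    "Can you describe InsureLLM in a few sentences?",
    "Who is the CEO?",
    "What was Avery Lancaster's experience before joining the company?",
]


class EchoChain:
    """Stub with the same invoke() shape as qa_chain, so the helper can run offline."""

    def invoke(self, inputs):
        return {"answer": f"(stub) {inputs['question']}"}


for q, a in run_smoke_test(EchoChain(), SUCCESS_QUESTIONS):
    print(f"Q: {q}\nA: {a}\n")
```

With the real chain, `run_smoke_test(qa_chain, SUCCESS_QUESTIONS)` would run the questions as one conversation, since each call also updates the chain's memory. The answers still need to be checked by hand against the source documents.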

In [None]:
# Launch the Gradio Web Interface
# ---------------------------------------
# This final cell creates and launches the user interface for our chatbot.
# 1. Define a 'chat' function that takes a user's message and history.
# 2. Inside the function, it calls our 'qa_chain' to get the answer.
# 3. It then launches a Gradio ChatInterface, which provides a clean, web-based UI
#    that appears directly in the Jupyter Notebook output.

def chat(message, history):
    """Handle one chat turn for the Gradio interface.

    Gradio passes `history`, but it is unused here: the chain's own memory tracks the conversation.
    """
    result = qa_chain.invoke({"question": message})
    return result["answer"]

print("Launching Gradio Chat Interface...")
# Keep a reference to the interface itself, then launch it.
view = gr.ChatInterface(
    fn=chat,
    title="InsureLLM RAG Chatbot 🤖",
    description="Ask me anything about InsureLLM's company info, employees, products, or contracts."
)
view.launch()
