# Supervisor-Worker Agentic Architecture for RAG Chatbot

This notebook implements a Supervisor-Worker agentic architecture using the LangGraph framework to build a Retrieval-Augmented Generation (RAG) chatbot that answers queries about a university Master's program in Applied Data Science.

## Architecture Overview

The architecture consists of:

1. **Supervisor Agent**: Manages workflow execution, routes queries, evaluates results, and decides whether to re-retrieve documents or refine the answer.
2. **Worker Agents**:
   - **Router Agent**: Classifies incoming queries to identify the type of information needed
   - **Retriever Agent**: Performs semantic similarity retrieval from ChromaDB
   - **Grader/Verifier Agent**: Evaluates the quality and relevance of retrieved documents
   - **Reasoner/Answer Generator**: Synthesizes answers from graded documents
   - **Summarizer Agent** (optional): Generates concise summaries of lengthy content
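The control flow described above can be sketched without any framework. The example below is a minimal illustration, not the notebook's LangGraph implementation: the `ChatState` fields, the keyword-based router, the placeholder corpus, and the `max_attempts` parameter are all assumptions made for this sketch.

```python
from typing import List, TypedDict


class ChatState(TypedDict):
    """Hypothetical shared state passed between the supervisor and workers."""
    query: str
    category: str
    documents: List[str]
    graded: List[str]
    answer: str
    attempts: int


def router(state: ChatState) -> ChatState:
    # Toy keyword classifier standing in for an LLM-based router.
    text = state["query"].lower()
    state["category"] = "tuition" if ("tuition" in text or "fee" in text) else "general"
    return state


def retriever(state: ChatState) -> ChatState:
    # Placeholder corpus; the notebook retrieves from ChromaDB instead.
    corpus = {
        "tuition": ["[placeholder tuition document]"],
        "general": ["[placeholder general document]"],
    }
    state["documents"] = corpus[state["category"]]
    state["attempts"] += 1
    return state


def grader(state: ChatState) -> ChatState:
    # Toy relevance check: keep only non-empty documents.
    state["graded"] = [doc for doc in state["documents"] if doc.strip()]
    return state


def reasoner(state: ChatState) -> ChatState:
    # Stand-in for LLM synthesis: join the graded documents.
    state["answer"] = " ".join(state["graded"])
    return state


def supervisor(query: str, max_attempts: int = 2) -> ChatState:
    """Route the query, then retrieve and grade until documents pass or attempts run out."""
    state: ChatState = {
        "query": query, "category": "", "documents": [],
        "graded": [], "answer": "", "attempts": 0,
    }
    state = router(state)
    while state["attempts"] < max_attempts:
        state = grader(retriever(state))
        if state["graded"]:
            break  # at least one document passed grading
    if state["graded"]:
        state = reasoner(state)
    else:
        state["answer"] = "No relevant documents found."
    return state
```

Calling `supervisor("How much is tuition?")` returns the final state, whose `answer` field holds the synthesized text. In a LangGraph version, each worker would typically become a graph node and the supervisor's loop would become conditional edges.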

## Data Context

The chatbot draws on a ChromaDB vector database built from markdown files that cover:
- Faculty and instructor bios
- Detailed course descriptions
- Tuition, fees, and financial aid details
- General program information, events, and academic content

## Setup and Imports

First, let's import the necessary libraries and set up the environment.

In [None]:
# Core imports
import os
from typing import List, Dict, Any, Literal, TypedDict, Optional, Annotated, Tuple, Union
import json
from pprint import pprint

# LangGraph imports
from langgraph.graph import StateGraph, END
from langgraph.checkpoint.memory import MemorySaver
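# create_agent_executor may not be available in every LangGraph release;
# guard the import so the rest of the notebook still loads without it.
try:
    from langgraph.prebuilt import create_agent_executor
except ImportError:
    create_agent_executor = None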
import langsmith

# LangChain imports
from langchain_core.messages import HumanMessage, AIMessage, SystemMessage, FunctionMessage
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate, PromptTemplate
from langchain_openai import ChatOpenAI

# Vector database
import chromadb
from sentence_transformers import SentenceTransformer

# For visualizing the graph (optional)
import networkx as nx
import matplotlib.pyplot as plt

## Environment Setup

Load the OpenAI API key used by the agents and define the ChromaDB path and collection name.

In [None]:
# Environment variables
import os
from dotenv import load_dotenv

# Load environment variables from .env file (if you have one)
load_dotenv()

# Set your OpenAI API key here if not using .env file
# os.environ["OPENAI_API_KEY"] = "your-api-key"

# Path to ChromaDB
CHROMA_DIR = "../data/chroma_db"
COLLECTION_NAME = "ms_applied_data_science"
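
A missing API key or a wrong database path tends to surface only later, as an opaque error inside an agent call. The helper below is a minimal sketch of an early check, assuming the key is read from `OPENAI_API_KEY` as in the commented line above; `check_environment` is a hypothetical name and not part of the original notebook.

```python
import os
from pathlib import Path
from typing import List, Mapping


def check_environment(chroma_dir: str, env: Mapping[str, str] = os.environ) -> List[str]:
    """Return a list of setup problems; an empty list means the basics are in place."""
    problems = []
    # The OpenAI client reads its key from this environment variable.
    if not env.get("OPENAI_API_KEY"):
        problems.append("OPENAI_API_KEY is not set")
    # The persistent ChromaDB store is expected to be an existing directory.
    if not Path(chroma_dir).is_dir():
        problems.append(f"ChromaDB directory not found: {chroma_dir}")
    return problems
```

For example, `check_environment(CHROMA_DIR)` can be run right after `load_dotenv()` and its result printed before any agent is constructed.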

In [None]:
# Initialize ChromaDB client and connect to existing collection
def initialize_chroma_client():
    """Initialize the ChromaDB client and connect to the existing collection."""
    try:
        client = chromadb.PersistentClient(path=CHROMA_DIR)
        collection = client.get_collection(name=COLLECTION_NAME)
        print(f"Successfully connected to ChromaDB collection: {COLLECTION_NAME}")
        return client, collection
    except Exception as e:
        print(f"Error connecting to ChromaDB: {e}")
        print("Please make sure you've run the vector_database.ipynb notebook first.")
        return None, None

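# Initialize Sentence Transformer model for query embedding
def initialize_embedding_model(model_name="all-MiniLM-L6-v2"):
    """Load the Sentence Transformer model used to embed queries.

    The model should match the one used to build the collection; otherwise
    query and document embeddings are not comparable.
    """
    try: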
        model = SentenceTransformer(model_name)
        print(f"Loaded Sentence Transformer model: {model_name}")
        return model
    except Exception as e:
        print(f"Error loading model: {e}")
        return None

# Initialize ChromaDB and embedding model
chroma_client, chroma_collection = initialize_chroma_client()
embedding_model = initialize_embedding_model()
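
With both objects initialized, a retrieval call could look like the sketch below. It assumes the collection was built with the same embedding model and that documents and metadata were stored alongside the embeddings. The `retrieve` helper and its result layout are illustrative, not the notebook's Retriever Agent.

```python
from typing import Any, Dict, List


def retrieve(query: str, model: Any, collection: Any, k: int = 3) -> List[Dict[str, Any]]:
    """Embed the query and return up to k nearest documents with metadata and distance."""
    # Both initializers above return None on failure, so fail early with a clear message.
    if model is None or collection is None:
        raise RuntimeError("Embedding model or ChromaDB collection failed to initialize.")
    embedding = model.encode(query)
    # SentenceTransformer.encode returns a NumPy array; convert it to a plain list.
    if hasattr(embedding, "tolist"):
        embedding = embedding.tolist()
    result = collection.query(
        query_embeddings=[embedding],
        n_results=k,
        include=["documents", "metadatas", "distances"],
    )
    # Results are nested per query; index 0 selects the single query sent above.
    return [
        {"text": doc, "metadata": meta, "distance": dist}
        for doc, meta, dist in zip(
            result["documents"][0], result["metadatas"][0], result["distances"][0]
        )
    ]
```

A call such as `retrieve("What are the tuition fees?", embedding_model, chroma_collection)` would return a list of dictionaries ordered from nearest to farthest, which a grader step could then filter.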