# Pre-requisites
- WSL
- Miniconda3 

# Setup environment
- Create conda env `conda create langchain python=3.11`
- Set the "langchain" env that has been just created as the running env in VS code


Install langchain and openai package

In [14]:
! pip install langchain langchain-openai langchain-community chromadb pypdf duckduckgo-search langchain-chroma



# Init variables

## ⚠️ Important: Azure OpenAI Configuration

Before running this notebook, you need to create a `.env` file in the project root directory with your Azure OpenAI credentials:

```
AZURE_OPENAI_ENDPOINT=https://your-resource-name.openai.azure.com/
AZURE_OPENAI_KEY=your-api-key-here
AZURE_OPENAI_API_VERSION=2024-02-15-preview
AZURE_OPENAI_DEPLOYMENT_NAME=your-chat-model-deployment-name
AZURE_OPENAI_EMBEDDING_DEPLOYMENT=your-embedding-model-deployment-name
```

A `.env.example` file has been created in the project root for reference. Copy it to `.env` and update with your actual credentials.

In [15]:
import openai, os
from dotenv import load_dotenv

load_dotenv()

# Azure OpenAI Configuration
AZURE_OPENAI_ENDPOINT = os.getenv("AZURE_OPENAI_ENDPOINT")
AZURE_OPENAI_KEY = os.getenv("AZURE_OPENAI_KEY")
AZURE_OPENAI_API_VERSION = os.getenv("AZURE_OPENAI_API_VERSION", "2024-02-15-preview")
AZURE_OPENAI_DEPLOYMENT_NAME = os.getenv("AZURE_OPENAI_DEPLOYMENT_NAME")
AZURE_OPENAI_EMBEDDING_DEPLOYMENT = os.getenv("AZURE_OPENAI_EMBEDDING_DEPLOYMENT_NAME", "text-embedding-3-small")

print("✓ Environment variables loaded")

✓ Environment variables loaded


# Overviews
The BonBon FAQ.pdf file contains frequently asked questions and answers for customer support scenario. The topics are around IT related issue troubleshooting such as networking, software, hardware. You are requested to provide a solution to build a chat bot capable of answering the user questions with LangChain.

## Assignment 1: Document Indexing (mandatory)

- The content of BonBon FAQ.pdf should be indexed to the local Chroma vector DB from where the chatbot can lookup the appropriate information to answer questions.
- Should use some embedding model such as Azure Open AI text-embedding-3-small to create vectors, feel free to use any other open source embedding model if it works.

In [16]:
# Assignment 1: Document Indexing
from langchain_community.document_loaders import PyPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_openai import AzureOpenAIEmbeddings
from langchain_chroma import Chroma

print("Step 1: Loading PDF document...")
# Load the PDF document
loader = PyPDFLoader("data/BonBon FAQ.pdf")
documents = loader.load()
print(f"✓ Loaded {len(documents)} pages from BonBon FAQ.pdf")

print("\nStep 2: Splitting documents into chunks...")
# Split the documents into smaller chunks for better retrieval
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=200,
    length_function=len,
)
splits = text_splitter.split_documents(documents)
print(f"✓ Created {len(splits)} text chunks")

print("\nStep 3: Creating embeddings...")
# Initialize Azure OpenAI Embeddings
embeddings = AzureOpenAIEmbeddings(
    azure_endpoint=AZURE_OPENAI_ENDPOINT,
    api_key=AZURE_OPENAI_KEY,
    api_version=AZURE_OPENAI_API_VERSION,
    deployment=AZURE_OPENAI_EMBEDDING_DEPLOYMENT
)
print("✓ Embeddings model initialized")

print("\nStep 4: Creating Chroma vector store...")
# Create Chroma vector store
vectorstore = Chroma.from_documents(
    documents=splits,
    embedding=embeddings,
    persist_directory="./chroma_db",
    collection_name="bonbon_faq"
)
print(f"✓ Vector store created with {vectorstore._collection.count()} vectors")

print("\n✅ Document indexing completed successfully!")
print("   - Source: BonBon FAQ.pdf")
print(f"   - Pages indexed: {len(documents)}")
print(f"   - Text chunks: {len(splits)}")
print("   - Vector DB: Chroma (local)")
print(f"   - Embedding model: {AZURE_OPENAI_EMBEDDING_DEPLOYMENT}")

Step 1: Loading PDF document...
✓ Loaded 14 pages from BonBon FAQ.pdf

Step 2: Splitting documents into chunks...
✓ Created 36 text chunks

Step 3: Creating embeddings...
✓ Embeddings model initialized

Step 4: Creating Chroma vector store...
✓ Vector store created with 36 vectors

✅ Document indexing completed successfully!
   - Source: BonBon FAQ.pdf
   - Pages indexed: 14
   - Text chunks: 36
   - Vector DB: Chroma (local)
   - Embedding model: text-embedding-3-large


## Assignment 2: Building Chatbot (mandatory)
- You are requested to build a chatbot solution for customer support scenario using Conversational ReAct agent supported in LangChain
- The chatbot is able to support user to answer FAQs in the sample BonBon FAQ.pdf file.
- The chatbot should use Azure Open AI GPT-4o LLM as the reasoning engine.
- The chatbot should be context aware, meaning that it should be able to chat with users in the conversation manner.
- The agent is equipped the following tools:
  - Internet Search: Help the chatbot automatically find out more about something using Duck Duck Go internet search
  - Knowledge Base Search: Help the chatbot to lookup information in the private knowledge base
- In case user asks for information related to topics in the BonBon FAQ.pdf file such as internet connection, printer, malware issues the chatbot must use the private knowledge base, otherwise it should search on the internet to answer the question.
- In the answer of chatbot, it should mention the source file and the page that the answer belongs to, for example the answer should mention "BonBon FQA.pdf (page 2)"

In [17]:
# Assignment 2: Building Chatbot with Conversational ReAct Agent
from langchain_openai import AzureChatOpenAI
from langchain_core.tools import tool
from langchain_community.tools import DuckDuckGoSearchRun
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langgraph.prebuilt import create_react_agent
from langgraph.checkpoint.memory import MemorySaver

print("Step 1: Initializing Azure OpenAI GPT-4o...")
# Initialize the LLM (GPT-4o)
llm = AzureChatOpenAI(
    azure_endpoint=AZURE_OPENAI_ENDPOINT,
    api_key=AZURE_OPENAI_KEY,
    api_version=AZURE_OPENAI_API_VERSION,
    deployment_name=AZURE_OPENAI_DEPLOYMENT_NAME,
    temperature=0,
    streaming=True
)
print("✓ GPT-4o initialized")

print("\nStep 2: Creating tools...")

# Tool 1: Internet Search using DuckDuckGo
try:
    internet_search = DuckDuckGoSearchRun()
    print("✓ Internet search tool created")
except Exception as e:
    print(f"⚠️ DuckDuckGo search not available: {e}")
    print("  Install with: pip install -U ddgs")
    # Create a mock tool for demonstration
    @tool
    def internet_search(query: str) -> str:
        """Search the internet for information using DuckDuckGo."""
        return "Internet search is not configured. Please install ddgs package."

# Tool 2: Knowledge Base Search
@tool
def knowledge_base_search(query: str) -> str:
    """
    Search the BonBon FAQ knowledge base for IT support information.
    Use this tool when users ask about:
    - Internet connection issues
    - Printer problems
    - Malware or virus issues
    - Software troubleshooting
    - Hardware problems
    - Any IT-related FAQs
    
    Args:
        query: The search query to find information in the knowledge base
        
    Returns:
        Relevant information from the BonBon FAQ with source citation
    """
    # Load the existing vector store
    vectorstore = Chroma(
        persist_directory="./chroma_db",
        embedding_function=embeddings,
        collection_name="bonbon_faq"
    )
    
    # Perform similarity search
    results = vectorstore.similarity_search(query, k=3)
    
    if not results:
        return "No relevant information found in the knowledge base."
    
    # Format the response with source citations
    response_parts = []
    for i, doc in enumerate(results, 1):
        page_num = doc.metadata.get('page', 'unknown')
        source = doc.metadata.get('source', 'BonBon FAQ.pdf')
        content = doc.page_content.strip()
        
        response_parts.append(
            f"[Source {i}: {source} (page {page_num + 1})]\n{content}"
        )
    
    return "\n\n".join(response_parts)

print("✓ Knowledge base search tool created")

# Combine tools
tools = [internet_search, knowledge_base_search]
print(f"✓ Total tools available: {len(tools)}")

print("\nStep 3: Creating conversational agent...")

# Create system prompt
system_prompt = """You are a helpful IT support chatbot assistant for BonBon company.

Your responsibilities:
1. Answer user questions about IT-related issues using the knowledge base
2. For topics in the BonBon FAQ (networking, printer, malware, software, hardware), ALWAYS use the knowledge_base_search tool
3. For general questions outside the FAQ, use the internet_search tool
4. Always cite your sources with the format "BonBon FAQ.pdf (page X)" when using the knowledge base
5. Be conversational and remember the context of the conversation
6. Be helpful, professional, and clear in your responses

When you provide an answer from the knowledge base, always mention the source file and page number."""

# Create prompt template
prompt = ChatPromptTemplate.from_messages([
    ("system", system_prompt),
    ("placeholder", "{messages}")
])

# Create memory for conversation context
memory = MemorySaver()

# Create the ReAct agent with memory
agent = create_react_agent(
    llm,
    tools,
    prompt=prompt,
    checkpointer=memory
)

print("✓ Conversational ReAct agent created")

print("\n✅ Chatbot setup completed successfully!")
print("\nAgent capabilities:")
print("  - Context-aware conversations (with memory)")
print("  - Knowledge base search (BonBon FAQ)")
print("  - Internet search (DuckDuckGo)")
print("  - Source citation for answers")
print("\nAgent is ready to chat!")

Step 1: Initializing Azure OpenAI GPT-4o...
✓ GPT-4o initialized

Step 2: Creating tools...
⚠️ DuckDuckGo search not available: Could not import ddgs python package. Please install it with `pip install -U ddgs`.
  Install with: pip install -U ddgs
✓ Knowledge base search tool created
✓ Total tools available: 2

Step 3: Creating conversational agent...
✓ Conversational ReAct agent created

✅ Chatbot setup completed successfully!

Agent capabilities:
  - Context-aware conversations (with memory)
  - Knowledge base search (BonBon FAQ)
  - Internet search (DuckDuckGo)
  - Source citation for answers

Agent is ready to chat!


/var/folders/bm/rywd5s097fz2htlkqjrbkp440000gn/T/ipykernel_11513/834732341.py:112: LangGraphDeprecatedSinceV10: create_react_agent has been moved to `langchain.agents`. Please update your import to `from langchain.agents import create_agent`. Deprecated in LangGraph V1.0 to be removed in V2.0.
  agent = create_react_agent(


## Testing the Chatbot

Let's test the chatbot with different types of questions:

In [18]:
# Helper function to chat with the agent
def chat(message, thread_id="default-session"):
    """
    Send a message to the chatbot and get a response.
    
    Args:
        message: The user's message
        thread_id: Session ID to maintain conversation context
    """
    config = {"configurable": {"thread_id": thread_id}}
    
    print(f"\n{'='*60}")
    print(f"USER: {message}")
    print(f"{'='*60}")
    
    response = agent.invoke(
        {"messages": [("human", message)]},
        config
    )
    
    assistant_message = response["messages"][-1].content
    print(f"ASSISTANT: {assistant_message}")
    print(f"{'='*60}\n")
    
    return assistant_message

print("✓ Chat helper function ready")

✓ Chat helper function ready


### Test 1: Knowledge Base Question (IT Support from FAQ)

In [19]:
# Test with an IT support question that should be in the FAQ
chat("My internet connection is not working. How can I fix it?")


USER: My internet connection is not working. How can I fix it?
ASSISTANT: To troubleshoot your internet connection issue, please follow these steps:

1. **Check Physical Connections**:
   - Ensure all cables (Ethernet, modem, router, etc.) are securely connected.
   - Power cycle your modem and router by unplugging them from the power source, waiting for 30 seconds, and then plugging them back in.

2. **Verify Wi-Fi Settings (if using wireless)**:
   - Make sure the Wi-Fi on your device is turned on.
   - Check if you are connected to the correct Wi-Fi network.
   - Try disconnecting and reconnecting to the Wi-Fi network.

3. **Test Connectivity on Other Devices**:
   - Check if other devices (e.g., smartphones, tablets, or other computers) can connect to the internet. This will help determine if the issue is specific to your device or a broader network problem.

4. **Restart Your Device**:
   - Restart your computer or device to refresh the network settings.

5. **Disable/Enable Netw

'To troubleshoot your internet connection issue, please follow these steps:\n\n1. **Check Physical Connections**:\n   - Ensure all cables (Ethernet, modem, router, etc.) are securely connected.\n   - Power cycle your modem and router by unplugging them from the power source, waiting for 30 seconds, and then plugging them back in.\n\n2. **Verify Wi-Fi Settings (if using wireless)**:\n   - Make sure the Wi-Fi on your device is turned on.\n   - Check if you are connected to the correct Wi-Fi network.\n   - Try disconnecting and reconnecting to the Wi-Fi network.\n\n3. **Test Connectivity on Other Devices**:\n   - Check if other devices (e.g., smartphones, tablets, or other computers) can connect to the internet. This will help determine if the issue is specific to your device or a broader network problem.\n\n4. **Restart Your Device**:\n   - Restart your computer or device to refresh the network settings.\n\n5. **Disable/Enable Network Adapters**:\n   - For Windows:\n     - Go to the Cont

### Test 2: Another Knowledge Base Question (Printer Issue)

In [None]:
# Test with a printer-related question
chat("My printer is not printing. What should I do?")


USER: My printer is not printing. What should I do?


### Test 3: Testing Conversation Context (Memory)

In [None]:
# Test conversation context - asking a follow-up question
# This should remember the previous printer question
chat("Can you give me more details about that solution?")

### Test 4: Internet Search Question (Outside FAQ)

In [None]:
# Test with a general question that's NOT in the FAQ
# Should use internet search
chat("What is the latest version of Python?")

## Summary

This solution implements:

### Assignment 1: Document Indexing ✅
- Loads the BonBon FAQ.pdf file
- Splits documents into chunks for better retrieval
- Uses Azure OpenAI embeddings (text-embedding-3-small)
- Stores vectors in local Chroma database
- Maintains page metadata for source citation

### Assignment 2: Conversational ReAct Agent ✅
- Uses Azure OpenAI GPT-4o as the reasoning engine
- Context-aware with conversation memory (MemorySaver)
- Equipped with two tools:
  1. **Internet Search**: DuckDuckGo search for general questions
  2. **Knowledge Base Search**: Searches BonBon FAQ for IT support
- Automatically chooses the right tool based on the question
- Provides source citations with file name and page number
- Conversational and maintains context across messages

### Key Features:
- ✅ PDF document indexing with Chroma
- ✅ Azure OpenAI embeddings
- ✅ GPT-4o powered chatbot
- ✅ Conversational memory
- ✅ Tool-based agent (ReAct pattern)
- ✅ Internet and knowledge base search
- ✅ Source citation (file + page number)
- ✅ Context-aware conversations

## How to Run

1. **Set up environment variables** in `.env` file:
   ```
   AZURE_OPENAI_ENDPOINT=your-endpoint
   AZURE_OPENAI_KEY=your-key
   AZURE_OPENAI_API_VERSION=2024-02-15-preview
   AZURE_OPENAI_DEPLOYMENT_NAME=gpt-4o
   AZURE_OPENAI_EMBEDDING_DEPLOYMENT=text-embedding-3-small
   ```

2. **Install dependencies**:
   Run the first code cell to install all required packages

3. **Run cells in order**:
   - Initialize variables
   - Document indexing (Assignment 1)
   - Build chatbot (Assignment 2)
   - Test the chatbot

4. **Optional**: Install DuckDuckGo search support:
   ```bash
   pip install -U ddgs
   ```

The chatbot is now ready to answer questions! You can use the `chat()` function to interact with it.