# Pre-requisites
- WSL
- Miniconda3 

# Setup environment
- Create the conda env: `conda create -n langchain python=3.11`
- In VS Code, select the newly created "langchain" env as the running environment


Install the langchain and openai packages

In [1]:
! pip install python-dotenv



In [2]:
! pip install langchain openai



# Init variables

Set the value of `OPENAI_API_KEY`, which you get from the training team, in the `.env` file. The cell below also reads `OPENAI_API_BASE` and `OPENAI_API_VERSION` from the same file.

In [3]:
import os
import openai
from dotenv import load_dotenv

load_dotenv()
openai.api_type = "azure"
openai.api_key = os.getenv("OPENAI_API_KEY")
openai.api_base = os.getenv("OPENAI_API_BASE")
openai.api_version = os.getenv("OPENAI_API_VERSION")
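
A variable that is missing from `.env` usually only shows up later as an authentication or endpoint error, so it can help to check the values right after loading them. The following is a minimal sketch, not part of the assignment: the helper name `missing_env_vars` is hypothetical, and the variable list is simply the three names read in the cell above.

```python
import os
from typing import Iterable, List, Mapping, Optional

# Names read in the cell above; extend this tuple to match your own .env file.
REQUIRED_VARS = ("OPENAI_API_KEY", "OPENAI_API_BASE", "OPENAI_API_VERSION")


def missing_env_vars(names: Iterable[str] = REQUIRED_VARS,
                     env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the names that are unset or blank in env (defaults to os.environ)."""
    env = os.environ if env is None else env
    return [name for name in names if not (env.get(name) or "").strip()]


# Hypothetical usage: report anything that load_dotenv() did not provide.
missing = missing_env_vars()
if missing:
    print("Missing in .env:", ", ".join(missing))
```

Passing a plain dict as `env` makes the check easy to try without touching the real environment.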


# Overview
The BonBon FAQ.pdf file contains frequently asked questions and answers for a customer support scenario. The topics cover troubleshooting IT-related issues such as networking, software, and hardware. You are asked to build a chatbot with LangChain that can answer user questions.

## Assignment 1: Document Indexing (mandatory)

- The content of BonBon FAQ.pdf should be indexed into a local Chroma vector DB, from which the chatbot can look up the appropriate information to answer questions.
- Use an embedding model such as Azure OpenAI text-embedding-ada-002 to create the vectors; feel free to use any other open-source embedding model if it works.
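
To make the "index, then look up" idea concrete, here is a toy sketch of vector search using only the standard library. It is an illustration under loud assumptions: `embed` is a hypothetical bag-of-words stand-in for a real embedding model such as text-embedding-ada-002, the in-memory list stands in for Chroma, and the two sample chunks are made up rather than taken from the PDF.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def embed(text: str) -> Dict[str, int]:
    """Toy embedding: lower-cased word counts (stand-in for a real model)."""
    return dict(Counter(text.lower().split()))


def cosine(a: Dict[str, int], b: Dict[str, int]) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(value * b.get(word, 0) for word, value in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_index(chunks: List[str]) -> List[Tuple[str, Dict[str, int]]]:
    """Store each chunk next to its vector (the role the vector DB plays)."""
    return [(chunk, embed(chunk)) for chunk in chunks]


def search(index: List[Tuple[str, Dict[str, int]]], query: str, k: int = 1) -> List[str]:
    """Return the k chunks whose vectors are most similar to the query vector."""
    query_vec = embed(query)
    ranked = sorted(index, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


# Made-up sample chunks, only for illustration.
index = build_index([
    "restart the router to fix the internet connection",
    "reinstall the printer driver when printing fails",
])
print(search(index, "my printer is not printing"))
```

A real embedding model captures meaning rather than exact word overlap, which is why the assignment asks for one; the ranking step works the same way.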

In [4]:
! pip install chromadb tiktoken pypdf duckduckgo-search
! pip install -U langchain-community
! pip install -U langchain-openai
! pip install -U langchain-chroma
! pip install -U langchain-core
! pip install scikit-learn



In [5]:
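# Loaders and vector stores now live in the langchain-community / langchain-chroma packages installed above
from langchain_community.document_loaders import PyPDFLoader
from langchain_openai import AzureOpenAIEmbeddings
from langchain_chroma import Chroma
from dotenv import load_dotenv
import os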

# Load environment variables
load_dotenv()

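def initialize_faq_vectorstore(pdf_path: str = "data/BonBon FAQ.pdf", persist_dir: str = "chroma_db") -> Chroma:
    """Load the FAQ PDF, embed each page, and persist the vectors to a local Chroma DB."""
    # PyPDFLoader yields one document per page, with the page number in its metadata
    pdf_loader = PyPDFLoader(pdf_path)
    faq_documents = pdf_loader.load()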

    embedding_model = AzureOpenAIEmbeddings(
        deployment=os.getenv("OPENAI_API_DELOYMENT_NAME"),
        model=os.getenv("OPENAI_API_MODEL_NAME"),
        openai_api_key=os.getenv("OPENAI_API_KEY"),
        azure_endpoint=os.getenv("AZURE_OPENAI_ENDPOINT"),
        openai_api_version=os.getenv("OPENAI_API_VERSION")
    )

    vector_store = Chroma.from_documents(
        documents=faq_documents,
        embedding=embedding_model,
        persist_directory=persist_dir
    )
    return vector_store

# Initialize the vectorstore
vectordb = initialize_faq_vectorstore()


Failed to send telemetry event ClientStartEvent: capture() takes 1 positional argument but 3 were given
Failed to send telemetry event ClientCreateCollectionEvent: capture() takes 1 positional argument but 3 were given


## Assignment 2: Building Chatbot (mandatory)
- Build a chatbot solution for a customer support scenario using the Conversational ReAct agent supported in LangChain.
- The chatbot should be able to answer users' questions based on the FAQs in the sample BonBon FAQ.pdf file.
- The chatbot should use the Azure OpenAI GPT-3.5 LLM as the reasoning engine.
- The chatbot should be context-aware, meaning it should be able to chat with users in a conversational manner.
- The agent is equipped with the following tools:
  - Internet Search: lets the chatbot automatically find out more about a topic using DuckDuckGo internet search
  - Knowledge Base Search: lets the chatbot look up information in the private knowledge base
- If the user asks about topics covered in the BonBon FAQ.pdf file, such as internet connection, printer, or malware issues, the chatbot must use the private knowledge base; otherwise it should search the internet to answer the question.
- The chatbot's answer should mention the source file and the page the answer comes from, for example "BonBon FAQ.pdf (page 2)".
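
The source-citation requirement can be sketched independently of any LLM call. The helper below is a hypothetical illustration: it assumes each retrieved chunk carries a `source` path and a 0-based `page` number in its metadata, so the page is shifted by one to match what a reader sees in the PDF.

```python
from pathlib import PurePath
from typing import Dict, Iterable, List


def format_sources(metadatas: Iterable[Dict]) -> List[str]:
    """Turn chunk metadata into unique, sorted 'file (page N)' strings."""
    seen = set()
    for meta in metadatas:
        page = meta.get("page")
        if page is None:
            continue  # skip chunks with no page information
        name = PurePath(meta.get("source", "unknown")).name
        seen.add((name, int(page) + 1))  # assumed 0-based page -> 1-based
    return [f"{name} (page {page})" for name, page in sorted(seen)]


# Hypothetical metadata, shaped like what a PDF loader attaches to each page.
print(format_sources([
    {"source": "data/BonBon FAQ.pdf", "page": 1},
    {"source": "data/BonBon FAQ.pdf", "page": 1},
    {"source": "data/BonBon FAQ.pdf", "page": 0},
]))
```

Collecting the pairs in a set keeps the same page from being cited twice when several retrieved chunks come from it.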

In [None]:
import os
import sys
import time
import logging
from typing import Optional
from pathlib import Path


from dotenv import load_dotenv
from langchain_community.document_loaders import PyPDFLoader
from langchain_openai import AzureOpenAIEmbeddings, AzureChatOpenAI
from langchain.chains import RetrievalQA
from langchain_community.tools import DuckDuckGoSearchRun

# Prefer the dedicated langchain-chroma package; fall back to the community wrapper
try:
    from langchain_chroma import Chroma
except ImportError:
    from langchain_community.vectorstores import Chroma


class SmartFAQBot:
    def __init__(self, pdf_file: str = "data/BonBon FAQ.pdf", db_path: str = "chroma_db", max_results: int = 3):
        self.pdf_file = Path(pdf_file)
        self.db_path = Path(db_path)
        self.max_results = max_results
        
        self.embedder: Optional[AzureOpenAIEmbeddings] = None
        self.vector_store: Optional[Chroma] = None
        self.llm: Optional[AzureChatOpenAI] = None
        self.search_tool: Optional[DuckDuckGoSearchRun] = None
        
        logging.basicConfig(level=logging.ERROR)
        self.logger = logging.getLogger(__name__)
        
        load_dotenv()
        self._init_all()
    
    def _init_all(self):
        try:
            self._init_embedder()
            self._load_pdf_data()
            self._init_chat()
            self._init_search()
        except Exception as e:
            self.logger.error(f"Bot initialization failed: {e}")
            raise
    
    def _init_embedder(self):
        self.embedder = AzureOpenAIEmbeddings(
            deployment=os.getenv("OPENAI_API_DELOYMENT_NAME"),
            model=os.getenv("OPENAI_API_MODEL_NAME"),
            openai_api_key=os.getenv("OPENAI_API_KEY"),
            azure_endpoint=os.getenv("AZURE_OPENAI_ENDPOINT"),
            openai_api_version=os.getenv("OPENAI_API_VERSION")
        )
    
    def _load_pdf_data(self):
        if not self.pdf_file.exists():
            raise FileNotFoundError(f"PDF file not found: {self.pdf_file}")
        
        if self.db_path.exists() and any(self.db_path.iterdir()):
            self.vector_store = Chroma(persist_directory=str(self.db_path), embedding_function=self.embedder)
        else:
            loader = PyPDFLoader(str(self.pdf_file))
            documents = loader.load()
            
            if not documents:
                raise ValueError("PDF contains no readable content")
            
            self.vector_store = Chroma.from_documents(
                documents=documents, embedding=self.embedder, persist_directory=str(self.db_path)
            )
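            # Chroma 0.4+ persists automatically; only older wrappers still expose persist()
            if hasattr(self.vector_store, "persist"):
                self.vector_store.persist()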
    
    def _init_chat(self):
        try:
            endpoint = os.getenv("AZURE_OPENAI_CHAT_ENDPOINT") or os.getenv("AZURE_OPENAI_ENDPOINT")
            api_key = os.getenv("OPENAI_API_CHAT_KEY") or os.getenv("OPENAI_API_KEY")
            
            self.llm = AzureChatOpenAI(
                deployment_name=os.getenv("OPENAI_API_CHAT_DEPLOYMENT_NAME"),
                model_name=os.getenv("OPENAI_API_CHAT_MODEL_NAME"),
                openai_api_key=api_key,
                azure_endpoint=endpoint,
                openai_api_version=os.getenv("OPENAI_API_VERSION"),
                temperature=0.1,
                max_tokens=1000
            )
        except Exception as e:
            self.logger.error(f"Chat model setup failed: {e}")
            raise
    
    def _init_search(self):
        self.search_tool = DuckDuckGoSearchRun()
    
    def _is_faq_query(self, question: str) -> bool:
        tech_keywords = [
            "internet", "printer", "computer", "software", "hardware", "network", 
            "connection", "troubleshoot", "fix", "error", "slow", "not working", 
            "can't", "won't", "malware", "virus", "install", "setup", "configure", 
            "password", "wifi", "browser", "email", "login", "update", "crash",
            "freeze", "boot", "startup", "driver", "device", "usb", "monitor",
            "keyboard", "mouse", "sound", "audio", "video", "screen", "display"
        ]
        
        question_lower = question.lower()
        return any(keyword in question_lower for keyword in tech_keywords)
    
    def _get_faq_answer(self, question: str) -> str:
        try:
            retriever = self.vector_store.as_retriever(search_kwargs={"k": self.max_results})
            qa_chain = RetrievalQA.from_chain_type(
                llm=self.llm, retriever=retriever, return_source_documents=True, verbose=False
            )
            
            result = qa_chain.invoke({"query": question})
            answer = result["result"]
            docs = result.get("source_documents", [])
            
            page_refs = set()
            for doc in docs:
                source = doc.metadata.get("source", "BonBon FAQ.pdf")
                page_num = doc.metadata.get("page")
                if page_num is not None:
                    page_refs.add((Path(source).name, int(page_num) + 1))
            
            response = f"\n\n**Answer:**\n{answer.strip()}"
            
            if page_refs:
                sources = "\n".join([f"- {file} (Page {page})" for file, page in sorted(page_refs)])
                response += f"\n\n**Sources:**\n{sources}"
            
            return response + "\n" + "-" * 50
            
        except Exception as e:
            return f"\n\n**FAQ Error:**\nCould not retrieve answer: {str(e)}\n" + "-" * 50
    
    def _web_search(self, question: str) -> str:
        try:
            time.sleep(1)
            result = self.search_tool.run(question)
            return f"\n\n**Web Search:**\n{result.strip()}\n" + "-" * 50
        except Exception:
            return (f"\n\n**Search Unavailable:**\n"
                   f"Web search is currently unavailable. Please try again later or "
                   f"ask about topics in our FAQ database.\n" + "-" * 50)
    
    def get_response(self, question: str) -> str:
        if not question.strip():
            return "Please ask a valid question."
        
        try:
            if self._is_faq_query(question):
                return self._get_faq_answer(question)
            else:
                return self._web_search(question)
        except Exception as e:
            return f"**Error:** An unexpected error occurred: {str(e)}"
    
    def start_chat(self):
        print("Smart FAQ Bot is ready!")
        print("Ask technical questions or general topics.")
        print("Commands: 'exit', 'quit', 'q' to stop.\n")
        
        while True:
            try:
                user_question = input("Your question (or 'exit'): ").strip()
                
                if user_question.lower() in ["exit", "quit", "q"]:
                    print("Goodbye!")
                    break
                
                if not user_question:
                    print("Please enter a question.\n")
                    continue
                
                response = self.get_response(user_question)
                print(response)
                print()
                
            except KeyboardInterrupt:
                print("\nBye!")
                break
            except Exception as e:
                print(f"Error: {e}\n")


def init_vector_db(pdf_file: str = "data/BonBon FAQ.pdf", db_path: str = "chroma_db") -> Chroma:
    # Building the bot also builds (or reloads) the vector store; errors propagate to the caller
    bot = SmartFAQBot(pdf_file=pdf_file, db_path=db_path)
    return bot.vector_store


def create_faq_bot(pdf_file: str = "data/BonBon FAQ.pdf", db_path: str = "chroma_db") -> SmartFAQBot:
    return SmartFAQBot(pdf_file=pdf_file, db_path=db_path)


if __name__ == "__main__":
    try:
        bot = create_faq_bot()
        bot.start_chat()
    except Exception as e:
        print(f"Bot startup failed: {e}")
        sys.exit(1)

ERROR:chromadb.telemetry.product.posthog:Failed to send telemetry event ClientStartEvent: capture() takes 1 positional argument but 3 were given
ERROR:chromadb.telemetry.product.posthog:Failed to send telemetry event ClientCreateCollectionEvent: capture() takes 1 positional argument but 3 were given


Smart FAQ Bot is ready!
Ask technical questions or general topics.
Commands: 'exit', 'quit', 'q' to stop.



Your question (or 'exit'):  1 + 1  = ?




**Web Search:**
In mathematics, 1 + 1 always equals 2. But in life, why does 1 + 1 have many answers? Awareness and attitude determine the result. For many people, the seemingly very simple question "Why does 1 + 1 = 2?" is one of the hardest questions to answer. Why? Because it is almost self-evident. You have 1 apple, then someone gives you 1 more, and you have 2; that is simply how it is. Ever wondered why 1 + 1 always equals 2 and not 3, 4, or 5? In just one minute, we break down the fundamental principles of mathematics that make this equation a universal truth. Explore the ... 1.1.1.1 with WARP replaces your connection with a more modern and optimized protocol — making things much more private compared to traditional wired protocols. Available on both PC and mobile, this anti-spy software makes sure that your data is always safe each time you go online. What is 1 + 1 in Boolean Algebra? A Beginner's Guide Boolean algebra, a fascinating branch of mathematics, forms t

Your question (or 'exit'):   How do I reset my password?


ERROR:chromadb.telemetry.product.posthog:Failed to send telemetry event CollectionQueryEvent: capture() takes 1 positional argument but 3 were given




**Answer:**
To reset your password, you can go to the "Where to Reset my Password for which application" web page at the following link: www.anycorp.intranet.passwordreset.com. There, you will be able to select the application for which you need to reset your password and follow the provided instructions.

**Sources:**
- BonBon FAQ.pdf (Page 3)
--------------------------------------------------
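
The session above answers each question on its own. One possible way to approach the context-awareness requirement from Assignment 2 is to keep a short history of turns and prepend it to each new question. The sketch below is only an assumption about how that could look: `ChatHistory` and its methods are hypothetical names, not LangChain classes.

```python
from collections import deque
from typing import Deque, Tuple


class ChatHistory:
    """Keeps the last max_turns (question, answer) pairs."""

    def __init__(self, max_turns: int = 3) -> None:
        # deque(maxlen=...) silently drops the oldest turn once the limit is reached
        self.turns: Deque[Tuple[str, str]] = deque(maxlen=max_turns)

    def add(self, question: str, answer: str) -> None:
        """Record one completed turn."""
        self.turns.append((question, answer))

    def contextualize(self, question: str) -> str:
        """Prefix the new question with earlier turns so a follow-up can be resolved."""
        if not self.turns:
            return question
        lines = [f"User: {q}\nAssistant: {a}" for q, a in self.turns]
        return "\n".join(lines) + f"\nUser: {question}"


history = ChatHistory(max_turns=2)
history.add("How do I reset my password?", "Use the password reset page.")
print(history.contextualize("What if that page does not load?"))
```

If this text were passed on in place of the raw question, keyword-based routing would also see words from earlier turns, so the routing check is probably best run on the raw question alone.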



## Assignment 3: Build a new assistant based on BonBon source code (optional)
Objectives:
- Run the code and index the sample BonBon FAQ.pdf file into Azure Cognitive Search
- Explore the code and implement a new assistant that behaves the same as the one above
- Explore other features, such as RBAC and the admin portal features

Please contact the training team if you need the source code of BonBon.