# Pre-requisites
- WSL
- Miniconda3 

# Setup environment
- Create the conda env: `conda create -n langchain python=3.11`
- Select the newly created "langchain" env as the Python interpreter (kernel) in VS Code


Install the python-dotenv, langchain, and openai packages

In [1]:
! pip install python-dotenv



In [2]:
! pip install langchain openai



# Init variables

Set `OPENAI_API_KEY` in the `.env` file to the value you receive from the training team. The cell below also reads `OPENAI_API_BASE` and `OPENAI_API_VERSION` from the same file.

In [3]:
import os
import openai
from dotenv import load_dotenv

load_dotenv()
openai.api_type = "azure"
openai.api_key = os.getenv("OPENAI_API_KEY")
openai.api_base = os.getenv("OPENAI_API_BASE")
openai.api_version = os.getenv("OPENAI_API_VERSION")
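A missing entry in `.env` only surfaces later as an authentication or request error, so it can help to check the settings up front. The following is a minimal sketch, assuming the three names read in the cell above are the ones you need; `missing_settings` and `REQUIRED_SETTINGS` are hypothetical names introduced for illustration.

```python
import os
from typing import Iterable, List, Mapping

# Names read by the cell above; extend this tuple if your .env uses more settings.
REQUIRED_SETTINGS = ("OPENAI_API_KEY", "OPENAI_API_BASE", "OPENAI_API_VERSION")


def missing_settings(env: Mapping[str, str],
                     required: Iterable[str] = REQUIRED_SETTINGS) -> List[str]:
    """Return the required names that are absent or blank in `env` (hypothetical helper)."""
    return [name for name in required if not (env.get(name) or "").strip()]


if __name__ == "__main__":
    # Run after load_dotenv() so values from .env are visible in os.environ.
    missing = missing_settings(os.environ)
    if missing:
        print("Missing settings in .env:", ", ".join(missing))
    else:
        print("All required settings are present.")
```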


# Overview
The BonBon FAQ.pdf file contains frequently asked questions and answers for a customer support scenario. The topics cover troubleshooting of IT-related issues such as networking, software, and hardware. Your task is to build a chatbot with LangChain that can answer user questions based on this file.

## Assignment 1: Document Indexing (mandatory)

- The content of BonBon FAQ.pdf should be indexed into a local Chroma vector DB, from which the chatbot can look up the appropriate information to answer questions.
- Use an embedding model such as Azure OpenAI text-embedding-ada-002 to create the vectors; feel free to use any other open-source embedding model if it works.
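To make the idea of indexing and lookup concrete, here is a toy sketch that runs without any API key. It is not the assignment's solution: `toy_embed` is a hypothetical stand-in that uses word counts instead of a real embedding model, and the three sample pages are invented for illustration. The flow is the same as with Chroma: embed each page once, then rank pages by cosine similarity to the embedded question.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Tuple


def toy_embed(text: str) -> Dict[str, int]:
    """Toy stand-in for an embedding model: lowercase word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Dict[str, int], b: Dict[str, int]) -> float:
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def search(index: List[Tuple[str, Dict[str, int]]], query: str, k: int = 1) -> List[str]:
    """Return the k indexed texts most similar to the query."""
    q = toy_embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


# Invented sample pages, standing in for the pages of the PDF.
pages = [
    "Restart the router to fix a slow internet connection.",
    "Reinstall the printer driver when the printer is offline.",
    "Run a full antivirus scan to remove malware.",
]
index = [(page, toy_embed(page)) for page in pages]  # the "indexing" step
print(search(index, "my printer is offline"))        # the "lookup" step
```

A real embedding model replaces `toy_embed` with dense vectors that capture meaning rather than exact word overlap, and the vector DB replaces the linear scan in `search`.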

In [4]:
! pip install chromadb tiktoken pypdf duckduckgo-search
! pip install -U langchain-community
! pip install -U langchain-openai
! pip install -U langchain-chroma
! pip install -U langchain-core
! pip install scikit-learn

Collecting langchain-community
  Downloading langchain_community-0.3.27-py3-none-any.whl.metadata (2.9 kB)
Downloading langchain_community-0.3.27-py3-none-any.whl (2.5 MB)
Installing collected packages: langchain-community
  Attempting uninstall: langchain-community
    Found existing installation: langchain-community 0.3.26
    Uninstalling langchain-community-0.3.26:
      Successfully uninstalled langchain-community-0.3.26
Successfully installed langchain-community-0.3.27


In [5]:
# Import from the split packages installed above; the langchain.* paths are deprecated aliases
from langchain_community.document_loaders import PyPDFLoader
from langchain_openai import AzureOpenAIEmbeddings
from langchain_chroma import Chroma
from dotenv import load_dotenv
import os

# Load environment variables
load_dotenv()

def initialize_faq_vectorstore(pdf_path: str = "data/BonBon FAQ.pdf", persist_dir: str = "chroma_db") -> Chroma:
    pdf_loader = PyPDFLoader(pdf_path)
    faq_documents = pdf_loader.load()

    embedding_model = AzureOpenAIEmbeddings(
        deployment=os.getenv("OPENAI_API_DELOYMENT_NAME"),
        model=os.getenv("OPENAI_API_MODEL_NAME"),
        openai_api_key=os.getenv("OPENAI_API_KEY"),
        azure_endpoint=os.getenv("AZURE_OPENAI_ENDPOINT"),
        openai_api_version=os.getenv("OPENAI_API_VERSION")
    )

    vector_store = Chroma.from_documents(
        documents=faq_documents,
        embedding=embedding_model,
        persist_directory=persist_dir
    )
    return vector_store

# Initialize the vectorstore
vectordb = initialize_faq_vectorstore()


Failed to send telemetry event ClientStartEvent: capture() takes 1 positional argument but 3 were given
Failed to send telemetry event ClientCreateCollectionEvent: capture() takes 1 positional argument but 3 were given
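Re-running the indexing cell against an existing `chroma_db` directory may add the same pages a second time, because no document IDs are supplied and fresh ones are generated on each run. One way to make indexing repeatable, sketched here under assumptions, is to derive a deterministic ID for each page and pass the list through the `ids` argument of `Chroma.from_documents` (check that your installed version accepts it). `stable_chunk_id` and `ids_for` are hypothetical helper names, and the dict layout is only for illustration.

```python
import hashlib
from typing import List, Mapping


def stable_chunk_id(source: str, page: int, text: str) -> str:
    """Derive a deterministic ID from the chunk's origin and content (hypothetical helper)."""
    digest = hashlib.sha256(f"{source}|{page}|{text}".encode("utf-8")).hexdigest()
    return digest[:32]


def ids_for(chunks: List[Mapping]) -> List[str]:
    """Compute one stable ID per chunk; identical input yields identical IDs."""
    return [stable_chunk_id(c["source"], c["page"], c["text"]) for c in chunks]


# Invented sample chunks; with LangChain documents you would read
# doc.metadata["source"], doc.metadata["page"] and doc.page_content instead.
sample = [
    {"source": "data/BonBon FAQ.pdf", "page": 0, "text": "How do I reset my password?"},
    {"source": "data/BonBon FAQ.pdf", "page": 1, "text": "How do I connect to Wi-Fi?"},
]
print(ids_for(sample))
```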


## Assignment 2: Building Chatbot (mandatory)
- Build a chatbot for the customer support scenario using the Conversational ReAct agent supported in LangChain.
- The chatbot should answer users' FAQs based on the sample BonBon FAQ.pdf file.
- The chatbot should use the Azure OpenAI GPT-3.5 LLM as its reasoning engine.
- The chatbot should be context-aware, meaning it can carry on a multi-turn conversation with users.
- The agent is equipped with the following tools:
  - Internet Search: lets the chatbot look up additional information using DuckDuckGo internet search
  - Knowledge Base Search: lets the chatbot look up information in the private knowledge base
- If the user asks about topics covered in BonBon FAQ.pdf, such as internet connection, printer, or malware issues, the chatbot must use the private knowledge base; otherwise it should search the internet to answer the question.
- The chatbot's answer should cite the source file and the page the answer comes from, for example "BonBon FAQ.pdf (page 2)".
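The requirements above describe a ReAct-style agent: the model alternates between reasoning, choosing a tool, and reading the tool's result until it can answer. The sketch below illustrates that loop in plain Python so it runs without an API key. It is not LangChain's implementation: `react_loop`, `scripted_llm`, the text format (`Action:`, `Action Input:`, `Observation:`, `Final Answer:`), and the canned tool outputs are all assumptions made for illustration.

```python
import re
from typing import Callable, Dict, List

# Assumed reply format: "Action: <tool name>" followed by "Action Input: <text>".
ACTION_RE = re.compile(r"Action:\s*(?P<tool>.+?)\s*\nAction Input:\s*(?P<arg>.+)", re.S)


def react_loop(llm: Callable[[str], str], tools: Dict[str, Callable[[str], str]],
               question: str, history: List[str], max_steps: int = 3) -> str:
    """Run a minimal Thought/Action/Observation loop until the model gives a final answer."""
    scratchpad = ""
    for _ in range(max_steps):
        prompt = "\n".join(history) + f"\nQuestion: {question}\n{scratchpad}"
        reply = llm(prompt)
        if "Final Answer:" in reply:
            answer = reply.split("Final Answer:", 1)[1].strip()
            # Storing the turn lets later questions see earlier ones (context awareness).
            history += [f"Human: {question}", f"AI: {answer}"]
            return answer
        match = ACTION_RE.search(reply)
        if match is None or match["tool"] not in tools:
            return "Sorry, I could not decide which tool to use."
        observation = tools[match["tool"]](match["arg"].strip())
        scratchpad += f"{reply}\nObservation: {observation}\n"
    return "Sorry, I could not find an answer in time."


def scripted_llm(prompt: str) -> str:
    """Stand-in for the chat model: act first, then answer once an observation exists."""
    if "Observation:" not in prompt:
        return ("Thought: this is a printer issue.\n"
                "Action: Knowledge Base Search\nAction Input: printer offline")
    observation = prompt.rsplit("Observation:", 1)[1].strip()
    return f"Thought: I have what I need.\nFinal Answer: {observation}"


# Canned tool outputs; real tools would query the vector DB and DuckDuckGo.
tools = {
    "Knowledge Base Search": lambda q: "Reinstall the driver. Source: BonBon FAQ.pdf (page 2)",
    "Internet Search": lambda q: f"(web results for {q!r})",
}
history: List[str] = []
print(react_loop(scripted_llm, tools, "My printer is offline", history))
```

In a real agent the tool descriptions are part of the prompt, so the model itself decides between the knowledge base and the internet; the `max_steps` cap guards against a model that never reaches a final answer.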

In [None]:
import os
import sys
import logging
from typing import Optional
from pathlib import Path


from dotenv import load_dotenv
# Import from the split packages; the langchain.* paths are deprecated aliases
from langchain_community.document_loaders import PyPDFLoader
from langchain_openai import AzureOpenAIEmbeddings, AzureChatOpenAI
from langchain.chains import RetrievalQA
from langchain_community.tools import DuckDuckGoSearchRun
from langchain_core.messages import HumanMessage

try:
    from langchain_chroma import Chroma
except ImportError:
    from langchain_community.vectorstores import Chroma

# Load settings from .env so the os.getenv calls below can see them
load_dotenv()


class SmartFAQBot:
    def __init__(self, pdf_file: str = "data/BonBon FAQ.pdf", db_path: str = "chroma_db", max_results: int = 3):
        self.pdf_file = Path(pdf_file)
        self.db_path = Path(db_path)
        self.max_results = max_results
        
        self.embedder: Optional[AzureOpenAIEmbeddings] = None
        self.vector_store: Optional[Chroma] = None
        self.llm: Optional[AzureChatOpenAI] = None
        self.search_tool: Optional[DuckDuckGoSearchRun] = None
        
        logging.basicConfig(level=logging.ERROR)
        self.logger = logging.getLogger(__name__)
        
        self._init_all()
    
    def _init_all(self):
        try:
            self._init_embedder()
            self._load_pdf_data()
            self._init_chat()
            self._init_search()
        except Exception as e:
            self.logger.error(f"Bot initialization failed: {e}")
            raise
    
    def _init_embedder(self):
        self.embedder = AzureOpenAIEmbeddings(
            deployment=os.getenv("OPENAI_API_DELOYMENT_NAME"),
            model=os.getenv("OPENAI_API_MODEL_NAME"),
            openai_api_key=os.getenv("OPENAI_API_KEY"),
            azure_endpoint=os.getenv("AZURE_OPENAI_ENDPOINT"),
            openai_api_version=os.getenv("OPENAI_API_VERSION")
        )
    
    def _load_pdf_data(self):
        if not self.pdf_file.exists():
            raise FileNotFoundError(f"PDF file not found: {self.pdf_file}")
        
        if self.db_path.exists() and any(self.db_path.iterdir()):
            self.vector_store = Chroma(persist_directory=str(self.db_path), embedding_function=self.embedder)
        else:
            loader = PyPDFLoader(str(self.pdf_file))
            documents = loader.load()
            
            if not documents:
                raise ValueError("PDF contains no readable content")
            
            self.vector_store = Chroma.from_documents(
                documents=documents, embedding=self.embedder, persist_directory=str(self.db_path)
            )
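            # Newer Chroma wrappers persist automatically and may not expose persist()
            if hasattr(self.vector_store, "persist"):
                self.vector_store.persist()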
    
    def _init_chat(self):
        try:
            endpoint = os.getenv("AZURE_OPENAI_CHAT_ENDPOINT") or os.getenv("AZURE_OPENAI_ENDPOINT")
            api_key = os.getenv("OPENAI_API_CHAT_KEY") or os.getenv("OPENAI_API_KEY")
            
            self.llm = AzureChatOpenAI(
                deployment_name=os.getenv("OPENAI_API_CHAT_DEPLOYMENT_NAME"),
                model_name=os.getenv("OPENAI_API_CHAT_MODEL_NAME"),
                openai_api_key=api_key,
                azure_endpoint=endpoint,
                openai_api_version=os.getenv("OPENAI_API_VERSION"),
                temperature=0.1,
                max_tokens=1000
            )
        except Exception as e:
            self.logger.error(f"Chat model setup failed: {e}")
            raise
    
    def _init_search(self):
        self.search_tool = DuckDuckGoSearchRun()
    
    def _classify_query(self, question: str) -> str:
        # Keyword fallback, used when the LLM call fails or returns an unexpected label
        tech_keywords = [
            "internet", "printer", "computer", "software", "hardware", "network", 
            "connection", "troubleshoot", "fix", "error", "slow", "not working", 
            "can't", "won't", "malware", "virus", "install", "setup", "configure", 
            "password", "wifi", "browser", "email", "login", "update", "crash",
            "freeze", "boot", "startup", "driver", "device", "usb", "monitor",
            "keyboard", "mouse", "sound", "audio", "video", "screen", "display"
        ]

        def keyword_fallback() -> str:
            question_lower = question.lower()
            if any(keyword in question_lower for keyword in tech_keywords):
                return "FAQ"
            return "WEB"

        try:
            classification_prompt = f"""
            You are a query classifier. Based on the following question, determine whether it should be answered using:
            1. FAQ - if it's about technical issues, troubleshooting, software/hardware problems, IT support, or specific product/service questions that would typically be in a FAQ document
            2. WEB - if it's about current events, general knowledge, recent information, or topics not typically covered in technical FAQ documents
            
            Question: "{question}"
            
            Respond with only one word: either "FAQ" or "WEB"
            """
            
            message = HumanMessage(content=classification_prompt)
            response = self.llm.invoke([message])
            
            classification = response.content.strip().upper()
            
            if classification not in ["FAQ", "WEB"]:
                return keyword_fallback()
            
            return classification
            
        except Exception as e:
            self.logger.error(f"Query classification failed: {e}")
            return keyword_fallback()
    
    def _get_faq_answer(self, question: str) -> str:
        try:
            retriever = self.vector_store.as_retriever(search_kwargs={"k": self.max_results})
            qa_chain = RetrievalQA.from_chain_type(
                llm=self.llm, retriever=retriever, return_source_documents=True, verbose=False
            )
            
            result = qa_chain.invoke({"query": question})
            answer = result["result"]
            docs = result.get("source_documents", [])
            
            page_refs = set()
            for doc in docs:
                source = doc.metadata.get("source", "BonBon FAQ.pdf")
                page_num = doc.metadata.get("page")
                if page_num is not None:
                    page_refs.add((Path(source).name, int(page_num) + 1))
            
            response = f"\n\n**Answer:**\n{answer.strip()}"
            
            if page_refs:
                sources = "\n".join([f"- {file} (Page {page})" for file, page in sorted(page_refs)])
                response += f"\n\n**Sources:**\n{sources}"
            
            return response + "\n" + "-" * 50
            
        except Exception as e:
            return f"\n\n**FAQ Error:**\nCould not retrieve answer: {str(e)}\n" + "-" * 50
    
    def _web_search(self, question: str) -> str:
        try:
            result = self.search_tool.run(question)
            return f"\n\n**Web Search:**\n{result.strip()}\n" + "-" * 50
        except Exception:
            return (f"\n\n**Search Unavailable:**\n"
                   f"Web search is currently unavailable. Please try again later or "
                   f"ask about topics in our FAQ database.\n" + "-" * 50)
    
    def get_response(self, question: str) -> str:
        if not question.strip():
            return "Please ask a valid question."
        
        try:
            query_type = self._classify_query(question)
            
            if query_type == "FAQ":
                return self._get_faq_answer(question)
            else:
                return self._web_search(question)
                
        except Exception as e:
            return f"**Error:** An unexpected error occurred: {str(e)}"
    
    def start_chat(self):
        print("Smart FAQ Bot is ready!")
        print("Ask technical questions or general topics.")
        print("Commands: 'exit', 'quit', 'q' to stop.\n")
        
        while True:
            try:
                user_question = input("Your question (or 'exit'): ").strip()
                
                if user_question.lower() in ["exit", "quit", "q"]:
                    print("Goodbye!")
                    break
                
                if not user_question:
                    print("Please enter a question.\n")
                    continue
                
                response = self.get_response(user_question)
                print(response)
                print()
                
            except KeyboardInterrupt:
                print("\nBye!")
                break
            except Exception as e:
                print(f"Error: {e}\n")


def create_faq_bot(pdf_file: str = "data/BonBon FAQ.pdf", db_path: str = "chroma_db") -> SmartFAQBot:
    return SmartFAQBot(pdf_file=pdf_file, db_path=db_path)


if __name__ == "__main__":
    try:
        bot = create_faq_bot()
        bot.start_chat()
    except Exception as e:
        print(f"Bot startup failed: {e}")
        sys.exit(1)

ERROR:chromadb.telemetry.product.posthog:Failed to send telemetry event ClientStartEvent: capture() takes 1 positional argument but 3 were given
ERROR:chromadb.telemetry.product.posthog:Failed to send telemetry event ClientCreateCollectionEvent: capture() takes 1 positional argument but 3 were given


Smart FAQ Bot is ready!
Ask technical questions or general topics.
Commands: 'exit', 'quit', 'q' to stop.



Your question (or 'exit'):  How do I connect to Any Corp’s Corporate Wi-Fi network


ERROR:chromadb.telemetry.product.posthog:Failed to send telemetry event CollectionQueryEvent: capture() takes 1 positional argument but 3 were given




**Answer:**
To connect to Any Corp’s Corporate Wi-Fi network, follow these steps based on your device's operating system:
- For Windows: Click on the Wi-Fi icon in the system tray at the bottom right corner of the screen, then click "Network & Internet settings" and select "Wi-Fi" from the left-hand menu.
- For Mac: Click on the Wi-Fi icon in the menu bar at the top right corner of the screen.
- For iOS (iPhone/iPad): Go to "Settings" > "Wi-Fi."
- For Android: Go to "Settings" > "Network & internet" > "Wi-Fi."

**Sources:**
- BonBon FAQ.pdf (Page 4)
--------------------------------------------------



Your question (or 'exit'):  who is president in Russia




**Web Search:**
Dmitry Anatolyevich Medvedev[a][b] (born 14 September 1965) is a Russian politician and lawyer who has served as Deputy Chairman of the Security Council of Russia since 2020. [2] Medvedev was also President of Russia between 2008 and 2012 and Prime Minister of Russia between 2012 and 2020. [3] Medvedev was elected President in the 2008 election. Vladimir Putin (born October 7, 1952, Leningrad, Russia, U.S.S.R. [now St. Petersburg, Russia]) is a Russian intelligence officer and politician who has served as president (1999-2008 and 2012- ) of Russia and as the country's prime minister (1999 and 2008-12). Who Is The Current President Of Russia? The current President of Russia is Vladimir Putin. He has been in top political positions in Russia since 1999, either as president or prime minister. Putin first became president in 2000, winning the election with about 53% of the vote after Boris Yeltsin resigned. Vladimir Putin was first elected president of Russia in March 200

Your question (or 'exit'):  How do I connect to the Any Corp’s Corporate VPN (Virtual Private Network)?




**Answer:**
To connect to Any Corp's Corporate VPN, you need to follow these general steps:

1) Obtain VPN Credentials: Your company's IT department will provide you with the necessary credentials, including a username, password, and possibly the VPN server address.
2) Install VPN Software (if required): Download and install any custom VPN client software provided by Any Corp on your computer or device.
3) Configure VPN Settings: If using built-in clients, you can configure the VPN connection using the built-in VPN clients of your operating system. For Windows, go to "Settings" > "Network & Internet" > "VPN." For macOS, go to "System Preferences" > "Network."
4) Mobile Device Configuration: If connecting from a mobile device, enter the VPN details provided by your IT department in the device's network or connection settings.
5) Manual Configuration (if necessary): If you need to set up the VPN manually, gather specific information like the VPN server address, protocol, username, pass

Your question (or 'exit'):  What is nodeJS




**Web Search:**
Node.js is a powerful, open-source, and cross-platform JavaScript runtime environment built on Chrome's V8 engine. It allows you to run JavaScript code outside the browser, making it ideal for building scalable server-side and networking applications. Node.js is a runtime environment that allows you to execute JavaScript on the server side. Traditionally, JavaScript was confined to web browsers, but Node.js extends its capabilities by allowing it to run on servers. Learn what Node.js is, how it works, and why it is used for web development. Node.js is a JavaScript runtime environment that runs web applications outside the browser, using an asynchronous, event-driven model. Node.js is a JavaScript Runtime Environment that allows developers to create server-side applications. Previously, JavaScript was used only for client-side scripting in browsers. With the evolution of Node.js, JavaScript is also used for writing backend logic. Launched in 2009, Node.js focused on br



**FAQ Error:**
Could not retrieve answer: Error code: 400 - {'error': {'message': "'input' is a required property", 'type': 'invalid_request_error', 'param': None, 'code': None}}
--------------------------------------------------



Your question (or 'exit'):  My computer is infected with malware. What steps should I take to remove it?




**FAQ Error:**
Could not retrieve answer: Missing some input keys: {'query'}
--------------------------------------------------



Your question (or 'exit'):  My computer is infected with malware. What steps should I take to remove it?




**FAQ Error:**
Could not retrieve answer: Error code: 400 - {'error': {'message': "'input' is a required property", 'type': 'invalid_request_error', 'param': None, 'code': None}}
--------------------------------------------------



Your question (or 'exit'):  


Please enter a question.



Your question (or 'exit'):  What steps should I take to remove it?




**Web Search:**
Learn how to protect yourself, how to tell if your device has malware, and how to remove it. What Is Malware? Malware is harmful software that's installed on your device without your knowledge. Viruses, spyware, and ransomware are common types of malware. This in-depth guide will provide readers with expert methodology for detecting, troubleshooting, and completely ridding Windows PCs of viruses, adware, spyware, ransomware, bots, trojans, rogue security software, browser hijackers, and other malware using failsafe removal processes. We look at the steps you should take to get your PC virus-free. Does My Computer Have A Virus? Chances are that you'll spot certain signs if your computer has a virus. It may perform more... To remove malware from your PC, disconnect from the internet, enter Safe Mode, check Task Manager for suspicious processes or high resource usage, scan for malware, analyze your web browser for malicious extensions, and then clear caches and temporary
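
In the last exchange above, the follow-up "What steps should I take to remove it?" was answered from web search, which suggests that each question is classified and answered without the preceding turns. One possible way to carry context forward, sketched here under assumptions, is to keep a short window of recent turns and prepend it to the text that is classified and sent to the retriever. `ChatHistory` and `contextualize` are hypothetical names, and the prompt wording is illustrative only.

```python
from collections import deque
from typing import Deque, Tuple


class ChatHistory:
    """Keep the most recent question/answer pairs (hypothetical helper)."""

    def __init__(self, max_turns: int = 3) -> None:
        # A bounded deque drops the oldest turn automatically.
        self.turns: Deque[Tuple[str, str]] = deque(maxlen=max_turns)

    def add(self, question: str, answer: str) -> None:
        self.turns.append((question, answer))

    def contextualize(self, question: str) -> str:
        """Prefix the question with recent turns so pronouns like 'it' can be resolved."""
        if not self.turns:
            return question
        lines = [f"User: {q}\nAssistant: {a}" for q, a in self.turns]
        return ("Conversation so far:\n" + "\n".join(lines)
                + f"\nFollow-up question: {question}")


chat = ChatHistory(max_turns=2)
chat.add("My computer is infected with malware.", "Run a full antivirus scan.")
print(chat.contextualize("What steps should I take to remove it?"))
```

With the earlier turn included, the word "malware" is visible to both the classifier and the retriever, so the follow-up can be routed to the knowledge base. The window size trades context against prompt length.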

## Assignment 3: Build a new assistant based on BonBon source code (optional)
Objectives:
- Run the code and index the sample BonBon FAQ.pdf file into Azure Cognitive Search
- Explore the code and implement a new assistant that behaves the same way as the chatbot above
- Explore other features, such as RBAC and the admin portal

Please contact the training team if you need the BonBon source code.