# Prerequisites
- WSL
- Miniconda3

# Setup environment
- Create the conda env: `conda create -n langchain python=3.11`
- Select the newly created `langchain` env as the active environment in VS Code

Install the required packages:

In [None]:
! pip install langchain langchain-community langchain-openai langchain-chroma openai pypdf python-dotenv

In [1]:
# Remove any previous index so that re-running the notebook does not add duplicate chunks
! rm -rf chroma_bonbon_azure

# Init variables

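The notebook reads its Azure OpenAI settings (API key, API version, endpoint, model, and deployment) from a `.env` file. Define the `EMBED_*` variables used below in that file before running the next cells.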

In [2]:
import os
from dotenv import load_dotenv

load_dotenv()

True

In [4]:
api_type = os.getenv("EMBED_OPENAI_API_TYPE")
embed_openai_api_key = os.getenv("EMBED_OPENAI_API_KEY")
embed_openai_api_version = os.getenv("EMBED_OPENAI_API_VERSION")
embed_azure_openai_endpoint = os.getenv("EMBED_AZURE_OPENAI_ENDPOINT")
embed_model = os.getenv("EMBED_MODEL")
embed_deployment = os.getenv("EMBED_DEPLOYMENT")

persist_directory = "chroma_bonbon_azure"
pdf_path = "BonBon_FAQ.pdf"
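A missing entry in `.env` only shows up later as an authentication or endpoint error, so it can help to check the settings up front. The following is a minimal sketch, not part of the original notebook; `REQUIRED_VARS` and `missing_env_vars` are hypothetical names introduced here, and the variable list simply mirrors the `os.getenv` calls above.

```python
import os

# Names taken from the os.getenv calls above; adjust to match your own .env file.
REQUIRED_VARS = [
    "EMBED_OPENAI_API_KEY",
    "EMBED_OPENAI_API_VERSION",
    "EMBED_AZURE_OPENAI_ENDPOINT",
    "EMBED_MODEL",
    "EMBED_DEPLOYMENT",
]


def missing_env_vars(names, environ=None):
    """Return the names that are unset or empty in the given mapping."""
    env = os.environ if environ is None else environ
    return [name for name in names if not env.get(name)]


missing = missing_env_vars(REQUIRED_VARS)
if missing:
    print("Missing settings in .env:", ", ".join(missing))
```

Passing a plain dict as `environ` makes the helper easy to try without touching the real environment.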

# Overview

The BonBon FAQ.pdf file contains frequently asked questions and answers for a customer support scenario. The topics cover troubleshooting of IT issues such as networking, software, and hardware. The task is to build a chatbot with LangChain that can answer user questions.

## Assignment 1: Document Indexing (mandatory)


- The content of BonBon FAQ.pdf should be indexed into a local Chroma vector DB, which the chatbot can query for the information it needs to answer questions.
- Use an embedding model such as Azure OpenAI text-embedding-3-small to create the vectors; any other open-source embedding model is acceptable if it works.

In [12]:
from langchain_community.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_chroma import Chroma
from langchain_openai import AzureOpenAIEmbeddings

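loader = PyPDFLoader(pdf_path)
# load() returns one Document per PDF page, so the page metadata stays accurate.
documents = loader.load()

for i, doc in enumerate(documents):
    doc.metadata["source"] = pdf_path
    # PyPDFLoader numbers pages from 0; store 1-based page numbers for citations.
    doc.metadata["page"] = doc.metadata.get("page", i) + 1

# Split each page into overlapping chunks; chunks inherit the page metadata.
text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
docs = text_splitter.split_documents(documents)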

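# Pass the EMBED_* settings explicitly instead of relying on default environment variables.
embedding_model = AzureOpenAIEmbeddings(
    azure_deployment=embed_deployment,
    model=embed_model,
    api_version=embed_openai_api_version,
    azure_endpoint=embed_azure_openai_endpoint,
    openai_api_key=embed_openai_api_key,
)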

vectordb = Chroma.from_documents(
    documents=docs,
    embedding=embedding_model,
    persist_directory=persist_directory
)
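To get a feel for what `chunk_size=500` and `chunk_overlap=50` mean, here is a simplified sketch using plain fixed-size windows. It is not the algorithm `RecursiveCharacterTextSplitter` uses (that splitter prefers to break on separators such as paragraphs, newlines, and spaces, so real chunk boundaries will differ); `sliding_chunks` is a hypothetical helper for illustration only.

```python
def sliding_chunks(text, chunk_size=500, chunk_overlap=50):
    """Split text into fixed-size windows that overlap by chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # each window starts this far after the previous one
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the current window already reaches the end of the text
    return chunks


sample = "x" * 1200
print([len(chunk) for chunk in sliding_chunks(sample)])  # [500, 500, 300]
```

Each chunk repeats the last 50 characters of the previous one, which reduces the chance that an answer is cut in half at a chunk boundary.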

## Assignment 2: Building Chatbot (mandatory)

- Build a chatbot for the customer support scenario using the Conversational ReAct agent supported in LangChain.
- The chatbot must be able to answer the FAQs in the sample BonBon FAQ.pdf file.
- The chatbot should use the Azure OpenAI GPT-4o LLM as its reasoning engine.
- The chatbot should be context aware, meaning it can carry on a multi-turn conversation with the user.
- The agent is equipped with the following tools:
  - Internet Search: lets the chatbot look up additional information using DuckDuckGo internet search
  - Knowledge Base Search: lets the chatbot look up information in the private knowledge base
- If the user asks about topics covered in the BonBon FAQ.pdf file, such as internet connection, printer, or malware issues, the chatbot must use the private knowledge base; otherwise it should search the internet to answer the question.
- The chatbot's answer should cite the source file and the page the answer comes from, for example "BonBon FAQ.pdf (page 2)".

In [13]:
def format_qa_answer(docs):
  """Join retrieved chunks into one string, each followed by its source and page."""
  result = []
  for doc in docs:
    content = doc.page_content.strip()
    source = doc.metadata.get("source", "Unknown")
    page = doc.metadata.get("page", "?")
    result.append(f"{content}\n(Source: {source} - page {page})")
  return "\n\n".join(result)

def knowledge_search_tool(query: str) -> str:
  """Return the two most similar chunks from the Chroma index, with citations."""
  results = vectordb.similarity_search(query, k=2)
  return format_qa_answer(results)
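If the index has been built more than once into the same `persist_directory`, `similarity_search` can return the same chunk several times. As an optional step before formatting, the results could be deduplicated while preserving order. This is a minimal sketch; `FakeDoc` is a hypothetical stand-in for a LangChain `Document`, `dedupe_docs` is not part of the notebook, and the sample texts are made up.

```python
from dataclasses import dataclass, field


@dataclass
class FakeDoc:
    """Hypothetical stand-in for a LangChain Document, for illustration only."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def dedupe_docs(docs):
    """Drop documents whose text and page were already seen, keeping the original order."""
    seen = set()
    unique = []
    for doc in docs:
        key = (doc.page_content.strip(), doc.metadata.get("page"))
        if key not in seen:
            seen.add(key)
            unique.append(doc)
    return unique


# Made-up sample hits: the first two are identical.
hits = [
    FakeDoc("Q How do I reset my password?", {"page": 4}),
    FakeDoc("Q How do I reset my password?", {"page": 4}),
    FakeDoc("Q How do I connect to Wi-Fi?", {"page": 2}),
]
print(len(dedupe_docs(hits)))  # 2
```

Because the helper only relies on `page_content` and `metadata`, it should also accept the objects returned by `vectordb.similarity_search`.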

In [15]:
chat_openai_api_key = os.getenv("CHAT_OPENAI_API_KEY")
chat_openai_api_version = os.getenv("CHAT_OPENAI_API_VERSION")
chat_azure_openai_endpoint = os.getenv("CHAT_AZURE_OPENAI_ENDPOINT")
chat_model = os.getenv("CHAT_MODEL")
chat_deployment = os.getenv("CHAT_DEPLOYMENT")

In [20]:
from langchain_openai import AzureChatOpenAI, AzureOpenAIEmbeddings
from langchain.agents import initialize_agent, Tool, AgentType
from langchain.memory import ConversationBufferMemory
from langchain_chroma import Chroma
from langchain_community.tools import DuckDuckGoSearchRun

llm = AzureChatOpenAI(
  azure_deployment=chat_deployment,
  model=chat_model,
  api_version=chat_openai_api_version,
  azure_endpoint=chat_azure_openai_endpoint,
  openai_api_key=chat_openai_api_key
)

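# Pass the EMBED_* settings explicitly instead of relying on default environment variables.
embedding_model = AzureOpenAIEmbeddings(
  azure_deployment=embed_deployment,
  model=embed_model,
  api_version=embed_openai_api_version,
  azure_endpoint=embed_azure_openai_endpoint,
  openai_api_key=embed_openai_api_key
)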

vectordb = Chroma(
  persist_directory=persist_directory,
  embedding_function=embedding_model
)

search_tool = DuckDuckGoSearchRun()
tools = [
  Tool(
    name="KnowledgeBaseSearch",
    func=knowledge_search_tool,
    description="Useful for answering questions about internal systems like internet connection, printer, malware issues based on BonBon_FAQ.pdf. Must include source and page in final answer."
  ),
  Tool(
    name="InternetSearch",
    func=search_tool.run,
    description="Useful for answering general questions or anything not found in BonBon_FAQ.pdf"
  ),
]

memory = ConversationBufferMemory(memory_key="chat_history", return_messages=True)
agent = initialize_agent(
  tools=tools,
  llm=llm,
  agent=AgentType.CHAT_CONVERSATIONAL_REACT_DESCRIPTION,
  verbose=True,
  memory=memory,
  handle_parsing_errors=True
)

In [21]:
agent.run("I want to reset my password, how to do it?")



> Entering new AgentExecutor chain...
```json
{
    "action": "KnowledgeBaseSearch",
    "action_input": "resetting password"
}
```
Observation: the priority of
incidents so that the Service Desk team can effectively allocate resources 
and provide timely
support.
Frequently Asked Questions:
 Q How do I reset my password?
A Go to “Where to Reset my Password for which application” web page @ 
the following link –
www.anycorp.intranet.passwordreset/com. There you will be able to select 
application for which you
need to reset your password and will receive further instructions
(Source: BonBon_FAQ.pdf - page 4)

the priority of
incidents so that the Service Desk team can effectively allocate resources 
and provide timely
support.
Frequently Asked Questions:
 Q How do I reset my password?
A Go to “Where to Reset my Password for which application” web page @ 
the following link –
www.anycorp.intranet.passwordreset/com. There you will be able t

"To reset your password, go to the 'Where to Reset my Password for which application' web page at the following link: www.anycorp.intranet.passwordreset/com. There, you can select the application for which you need to reset your password and receive further instructions. (Source: BonBon_FAQ.pdf - page 4)"

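The assignment requires every knowledge-base answer to cite its source file and page. One way to check this automatically is to parse the citation out of the final answer. The sketch below assumes the `(Source: <file> - page <n>)` format produced by `format_qa_answer`; `CITATION_RE` and `extract_citations` are hypothetical names, and the sample answer is a shortened version of the output above.

```python
import re

# Matches citations in the format emitted by format_qa_answer,
# e.g. "(Source: BonBon_FAQ.pdf - page 4)".
CITATION_RE = re.compile(r"\(Source:\s*(?P<source>.+?)\s*-\s*page\s*(?P<page>\d+)\)")


def extract_citations(answer):
    """Return a list of (source, page) pairs found in the answer text."""
    return [
        (match.group("source"), int(match.group("page")))
        for match in CITATION_RE.finditer(answer)
    ]


answer = "To reset your password, go to the password reset page. (Source: BonBon_FAQ.pdf - page 4)"
print(extract_citations(answer))  # [('BonBon_FAQ.pdf', 4)]
```

An empty list signals that the agent answered without a citation, which is expected for internet searches but not for knowledge-base answers.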
## [SKIP] Assignment 3: Build a new assistant based on BonBon source code (optional)
Objectives:
- Run the code and index the sample BonBon FAQ.pdf file into Azure Cognitive Search
- Explore the code and implement a new assistant that behaves the same as the one above
- Explore other features, such as RBAC and the admin portal

Please contact the training team if you need the BonBon source code.