# Pre-requisites
- WSL
- Miniconda3 

# Setup environment
- Create a conda env: `conda create -n langchain python=3.11`
- Select the newly created "langchain" env as the Python interpreter (kernel) in VS Code


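Install LangChain, the OpenAI integrations, and the other packages this notebook imports

In [37]:
! pip install langchain langchain-community langchain-openai langchain-chroma openai python-dotenv pypdf duckduckgo-search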



# Init variables

Set `OPENAI_API_KEY` (provided by the training team) and `AZURE_OPENAI_ENDPOINT` in the `.env` file; both are read with `os.getenv` in the cells below.
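
A missing variable otherwise only surfaces later as an authentication or endpoint error, so it may help to check the `.env` values up front. This is a minimal sketch, assuming the two variable names read with `os.getenv` later in this notebook; the helper name `missing_env_vars` is hypothetical.

```python
import os

# Variable names this notebook reads with os.getenv (assumed to be the full set)
REQUIRED_VARS = ("OPENAI_API_KEY", "AZURE_OPENAI_ENDPOINT")


def missing_env_vars(names=REQUIRED_VARS, env=None):
    """Return the names that are unset or blank in the given mapping."""
    env = os.environ if env is None else env
    return [name for name in names if not (env.get(name) or "").strip()]


# Example with an explicit mapping instead of the real environment
print(missing_env_vars(env={"OPENAI_API_KEY": "abc"}))
```

Calling `missing_env_vars()` with no arguments right after `load_dotenv()` checks the real environment.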

In [38]:
import openai, os
from dotenv import load_dotenv

load_dotenv()
openai.api_type = "azure"
openai.api_version = "2023-07-01-preview"

# Overview
The BonBon FAQ.pdf file contains frequently asked questions and answers for a customer support scenario. The topics cover troubleshooting IT issues such as networking, software, and hardware. Your task is to build a chatbot with LangChain that can answer user questions.

## Assignment 1: Document Indexing (mandatory)

- Index the content of BonBon FAQ.pdf in a local Chroma vector DB so the chatbot can look up the appropriate information to answer questions.
- Use an embedding model such as Azure OpenAI text-embedding-ada-002 to create the vectors; any other open-source embedding model is fine if it works.
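
To illustrate what the vector lookup does conceptually: each chunk is stored as a vector, and a query is matched to the chunks whose vectors are closest to it. The sketch below uses tiny hand-made vectors and plain cosine similarity over toy data. It is not how Chroma is implemented, and the names `cosine` and `top_k` are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec, indexed, k=2):
    """Return the k texts whose vectors are most similar to query_vec.

    indexed is a list of (text, vector) pairs.
    """
    ranked = sorted(indexed, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


# Toy index: made-up 3-dimensional "embeddings" for illustration only
index = [
    ("How do I reset my password?", [0.9, 0.1, 0.0]),
    ("Printer is offline", [0.1, 0.9, 0.1]),
    ("Wi-Fi keeps dropping", [0.0, 0.2, 0.9]),
]

print(top_k([1.0, 0.0, 0.1], index, k=1))
```

In the real pipeline the vectors come from the embedding model, and the `k` in `search_kwargs` plays the role of `k` here.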

In [39]:
# Document loaders live in langchain_community; the langchain.document_loaders path is deprecated
from langchain_community.document_loaders import DirectoryLoader, PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_openai import AzureOpenAIEmbeddings
from langchain_chroma import Chroma

OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")
OPENAI_DEPLOYMENT_ENDPOINT = os.getenv("AZURE_OPENAI_ENDPOINT")
OPENAI_DEPLOYMENT_VERSION = "2024-06-01"
OPENAI_EMBEDDING_MODEL_NAME = "text-embedding-ada-002"

# Load and process all pdf files in chosen directory
loader = DirectoryLoader('./data', glob="./*.pdf", loader_cls=PyPDFLoader)
original_documents = loader.load()

# Function to adjust page numbers in the original documents
def adjust_page_numbers(documents):
    for doc in documents:
        # Convert from 0-based to 1-based
        doc.metadata['page'] += 1  # Directly modify the original document
    return documents  # Return the modified documents (optional)

# Adjust page numbers
documents = adjust_page_numbers(original_documents)

# Split the text into smaller chunks
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=200,
    separators=[".", ":", "&", "\n\n", "\n", " ", ""]
    )
texts = text_splitter.split_documents(documents)

# Import the Azure OpenAI Embeddings model
embeddings = AzureOpenAIEmbeddings(
    model=OPENAI_EMBEDDING_MODEL_NAME,
    azure_endpoint=OPENAI_DEPLOYMENT_ENDPOINT,
    chunk_size=1,
    api_key=OPENAI_API_KEY,
    api_version=OPENAI_DEPLOYMENT_VERSION,
)

# Supplying a persist_directory will store the embeddings on disk
persist_directory = 'anhtuan_vectorstore_db'
vectordb = Chroma.from_documents(documents=texts, 
                                 embedding=embeddings,
                                 persist_directory=persist_directory)

# Now we can load the persisted database from disk, and use it as normal. 
vectordb = None
vectordb = Chroma(persist_directory=persist_directory, 
                  embedding_function=embeddings)

# For search testing
retriever = vectordb.as_retriever(search_kwargs={"k": 2})
answers = retriever.invoke("How can i reset my password?")
print(answers)


[Document(metadata={'page': 3, 'source': 'data/BonBon FAQ.pdf'}, page_content=".  \n• Remember, the priority levels may vary based on the organization's policies and the nature \nof the incident. It's important to have well -defined criteria for determining the priority of \nincidents so that the Service Desk team can effectively allocate resour ces and provide timely \nsupport.  \n \nFrequently Asked Questions:  \n \n1) Q: How do I reset my password?  \nA: Go to “Where to Reset my Password for which application” web page @ the following link – \nwww.anycorp.intranet.passwordreset/com . There you will be able to select application for which you \nneed to reset your password and will receive further instructions."), Document(metadata={'page': 3, 'source': 'data/BonBon FAQ.pdf'}, page_content=".  \n• Remember, the priority levels may vary based on the organization's policies and the nature \nof the incident. It's important to have well -defined criteria for determining the priority of \n
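
The output above appears to return the same chunk twice for `k=2`. One possible cause (an assumption, not verified here) is that the indexing cell was run more than once against the same `persist_directory`, which would add the same chunks again. Either way, duplicates can be filtered after retrieval. This is a minimal sketch using a small stand-in `Doc` class in place of LangChain's `Document`; `Doc` and `dedupe_docs` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Stand-in for a retrieved document (page_content plus metadata)."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def dedupe_docs(docs):
    """Drop documents that repeat the same source, page, and content, keeping order."""
    seen = set()
    unique = []
    for doc in docs:
        key = (doc.metadata.get("source"), doc.metadata.get("page"), doc.page_content)
        if key not in seen:
            seen.add(key)
            unique.append(doc)
    return unique


meta = {"source": "data/BonBon FAQ.pdf", "page": 3}
results = [Doc("chunk A", dict(meta)), Doc("chunk A", dict(meta)), Doc("chunk B", dict(meta))]
print(len(dedupe_docs(results)))
```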

## Assignment 2: Building Chatbot (mandatory)
- Build a chatbot solution for the customer support scenario using the Conversational ReAct agent supported in LangChain.
- The chatbot should be able to answer FAQs from the sample BonBon FAQ.pdf file.
- The chatbot should use the Azure OpenAI GPT-3.5 LLM as the reasoning engine.
- The chatbot should be context-aware, meaning it should be able to chat with users in a conversational manner.
- The agent is equipped with the following tools:
  - Internet Search: lets the chatbot find out more about a topic using DuckDuckGo internet search
  - Knowledge Base Search: lets the chatbot look up information in the private knowledge base
- If the user asks about topics covered in the BonBon FAQ.pdf file, such as internet connection, printer, or malware issues, the chatbot must use the private knowledge base; otherwise it should search the internet to answer the question.
- The chatbot's answer should mention the source file and the page the answer comes from, for example "BonBon FAQ.pdf (page 2)".
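
For the agent to cite its source, the Knowledge Base Search tool can return text that already carries the file name and page instead of raw document objects. This is a minimal sketch, assuming each retrieved document exposes `page_content` and a `metadata` dict with `source` and `page` keys, as in the retriever output shown earlier. `Doc` is a stand-in class and `format_with_citation` is a hypothetical helper.

```python
import os
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Stand-in for a retrieved document (page_content plus metadata)."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def format_with_citation(docs):
    """Render each document as its text followed by 'Source: <file> (page N)'."""
    parts = []
    for doc in docs:
        source = os.path.basename(doc.metadata.get("source", "unknown"))
        page = doc.metadata.get("page", "?")
        parts.append(f"{doc.page_content.strip()}\nSource: {source} (page {page})")
    return "\n\n".join(parts)


sample = [Doc("Q: How do I reset my password?", {"source": "data/BonBon FAQ.pdf", "page": 3})]
print(format_with_citation(sample))
```

A wrapper such as `lambda q: format_with_citation(retriever.invoke(q))` could then serve as the tool's `func`, so the observation the agent sees already includes the citation.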

In [43]:
from langchain_openai import AzureChatOpenAI
from langchain.agents import Tool
from duckduckgo_search import DDGS
from langchain import hub
from langchain.memory import ConversationBufferMemory
from langchain.agents import AgentExecutor
from langchain.agents import create_react_agent

OPENAI_MODEL_NAME = "gpt-4o"

# Import the Azure OpenAI Chat model
llm = AzureChatOpenAI(
    deployment_name=OPENAI_MODEL_NAME,
    azure_endpoint=OPENAI_DEPLOYMENT_ENDPOINT,
    api_version=OPENAI_DEPLOYMENT_VERSION,
    api_key=OPENAI_API_KEY,
    temperature=0,
    max_tokens=None,
    timeout=None,
    max_retries=3
)

# Custom DuckDuckGo Chat Function (to prevent rate limiting)
def duckduckgo_chat_function(query: str) -> str:
    """Function to use DuckDuckGo's chat capabilities."""
    # Call DDGS().chat with the specified model
    return DDGS().chat(query, model='claude-3-haiku')

# DuckDuckGo Internet Search Tool
duckduckgo_tool = Tool(
    name="Internet Search",
    description="Search the internet for answers using DuckDuckGo, only search one time",
    func=duckduckgo_chat_function
)

# Knowledge Base Search Tool (retriever from Chroma vector store)
knowledge_base_tool = Tool(
    name="Knowledge Base Search (BonBon FAQ)",
    description="Search the BonBon FAQ.pdf",
    # invoke() replaces the deprecated get_relevant_documents() and matches the search test above
    func=vectordb.as_retriever(search_kwargs={"k": 1}).invoke
)

# Create a list of tools
tools = [duckduckgo_tool, knowledge_base_tool]

# Define the template for the React prompt
template = """
Answer the following questions as best you can. You can use history {chat_history} to fill in unknown context. You have access to the following tools:

{tools}

Do not repeat thoughts or actions; if repetition occurs, return the final answer immediately.
Return the answer immediately if you encounter any error or reach the max iteration.

Use the following format:
Remember: if any invalid format occurs, terminate and return the answer
Question: the input question you must answer
Thought: you should always think about what to do
Only show 'Action' if it exists > Action: the action to take; search the BonBon FAQ tool first. It should be one of [{tool_names}] (one of [Internet Search, Knowledge Base Search (BonBon FAQ)]). If not, terminate and answer
Only show 'Action Input' if it exists > Action Input: the input to the action
Observation: the result of the action
... (this Thought/Action/Action Input/Observation can repeat at most 5 times.)
Thought: I now know the final answer.
Only show 'Action' if it exists > Action: return the final answer
Final Answer: the final answer to the original input question. If you used the BonBon FAQ search, include the page of the PDF file that contains the question

Remember: Do not show 'Action', 'Action Input' or 'Thought' if the line is in an invalid format or does not exist.

Remember: Return the page number of the PDF file that contains the question if you use the BonBon FAQ. (e.g. Page ? of BonBon FAQ)

Remember: Add a newline before printing the 'Final Answer' and the 'Thought', and after the action input has been provided.

Begin!

Current conversation:
Chat History:
{chat_history}
Last line:
Human: {input}
Thought: {agent_scratchpad}
"""
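# Build the prompt directly from the template so its input variables (including chat_history) are inferred
from langchain_core.prompts import PromptTemplate
prompt_react = PromptTemplate.from_template(template)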

# Initialize the ReAct agent
react_agent = create_react_agent(llm=llm, tools=tools, prompt=prompt_react)

# Define the Conversational ReAct agent with tools
conversational_agent = AgentExecutor(
    agent=react_agent,
    tools=tools,
    verbose=True,
    handle_parsing_errors=True,
    # The prompt is a plain string template, so render the history as text rather than message objects
    memory=ConversationBufferMemory(memory_key="chat_history", return_messages=False),
    max_iterations=3,
)

# Chatbot interaction loop
while True:
    user_input = input("input: ")
    
    # Exit condition
    if user_input.lower() in ['exit', 'quit']:
        print("Chatbot: Goodbye!")
        break
    
    # Process user input through the agent
    response = conversational_agent.invoke({"input": user_input})
    print(response.get("output"))





> Entering new AgentExecutor chain...
Hello! How can I assist you today?
Invalid Format: Missing 'Action:' after 'Thought:
Hello! How can I assist you today?
Invalid Format: Missing 'Action:' after 'Thought:
Hello! How can I assist you today?
Invalid Format: Missing 'Action:' after 'Thought:

> Finished chain.
Agent stopped due to iteration limit or time limit.
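
The run above appears to hit the iteration limit because a plain greeting contains neither an `Action:` / `Action Input:` pair nor a `Final Answer:` line, so each attempt is rejected as an invalid format. The sketch below is a simplified approximation of that check for illustration only, not the library's actual parser; `classify_react_output` is a hypothetical name.

```python
import re

# Simplified pattern: an "Action:" line followed by an "Action Input:" line
ACTION_PATTERN = re.compile(r"Action\s*:\s*(.+?)\s*Action\s*Input\s*:\s*(.*)", re.DOTALL)


def classify_react_output(text):
    """Classify model output as 'finish', 'action', or 'invalid' (simplified)."""
    if "Final Answer:" in text:
        return "finish"
    if ACTION_PATTERN.search(text):
        return "action"
    return "invalid"


print(classify_react_output("Hello! How can I assist you today?"))
print(classify_react_output("Final Answer: Hello! How can I assist you today?"))
```

One possible remedy is to tell the model in the prompt to answer greetings and small talk directly with a `Final Answer:` line, so such replies parse as a finish step.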


> Entering new AgentExecutor chain...
Question: why is sunlight at 12pm dangerous
Thought: I should search the BonBon FAQ to see if there is any information related to the dangers of sunlight at 12pm.

Action: Knowledge Base Search (BonBon FAQ)
Action Input: sunlight at 12pm dangerous
[Document(metadata={'page': 6, 'source': 'data/BonBon FAQ.pdf'}, page_content='. \nOpen the Task Manager (Ctrl+Shift+Esc on Windows) and go to the "Startup" tab. Disable any \nunnecessary programs from starting up.  \n5. Disable 

## Assignment 3: Build a new assistant based on BonBon source code (optional)
Objectives:
- Run the code and index the sample BonBon FAQ.pdf file into Azure Cognitive Search
- Explore the code and implement a new assistant that behaves the same as the chatbot above
- Explore other features such as RBAC and the admin portal features

Please contact the training team if you need the BonBon source code.