<a href="https://colab.research.google.com/github/amadeus-art/azure-openai-coding-dojo/blob/main/azure_openai_qa.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>
# Question Answering on Documents using Azure OpenAI, Langchain and ChromaDB

One of the main problems of Large Language Models (LLMs) is that they hallucinate (produce inaccurate or false information) when asked questions that fall outside their training data. In addition, their knowledge stays current only if they are retrained or fine-tuned on recent data.

This tutorial shows how to use the `langchain` library to answer questions with a "knowledge base informed" LLM: relevant passages are retrieved from a set of documents and passed to the model as context.

## Install dependencies

In [18]:
!pip install langchain==0.1 langchain_openai python-dotenv chromadb -q




## Environment setup
Before executing the following cells, make sure to set the following environment variables in the `.env` file or export them:
* `AZURE_OPENAI_KEY`
* `AZURE_OPENAI_ENDPOINT`
* `MODEL_DEPLOYMENT_NAME`
* `EMBEDDING_DEPLOYMENT_NAME`

<br/>
<img src="../assets/keys_endpoint.png" width="800"/>

In [4]:
import os

from dotenv import load_dotenv, find_dotenv

_ = load_dotenv(find_dotenv())  # read local .env file

openai_api_version = '2023-08-01-preview'  # latest version as of 15-09-2023; may change in the future

# these are the names of the deployments you created in the Azure portal within the above resource
model_deployment_name = os.getenv("MODEL_DEPLOYMENT_NAME")
embedding_deployment_name = os.getenv("EMBEDDING_DEPLOYMENT_NAME")
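
A missing variable tends to surface only later, as an authentication or deployment error. The following is a minimal sketch of an early check, meant to run after `load_dotenv`. The helper name `missing_env_vars` is an assumption made for illustration; it is not part of `python-dotenv` or `langchain`.

```python
import os

# Variable names taken from the list in the "Environment setup" section.
REQUIRED_VARS = (
    "AZURE_OPENAI_KEY",
    "AZURE_OPENAI_ENDPOINT",
    "MODEL_DEPLOYMENT_NAME",
    "EMBEDDING_DEPLOYMENT_NAME",
)


def missing_env_vars(names=REQUIRED_VARS, env=None):
    """Return the names that are unset or empty in `env` (defaults to os.environ)."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]


missing = missing_env_vars()
if missing:
    print("Missing environment variables:", ", ".join(missing))
```

Passing a plain dictionary as `env` makes the check easy to try without touching the real environment.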

### Example of an LLM response to a question past its knowledge cutoff

In [6]:
from langchain_openai import AzureChatOpenAI
from langchain.schema import HumanMessage

llm = AzureChatOpenAI(
    # key and endpoint are read from the .env file
    openai_api_version=openai_api_version,
    deployment_name=model_deployment_name,
    temperature=0
)

question_1 = HumanMessage(content="Who won the 2022 football world cup?")
question_2 = HumanMessage(content="List the finalists of the 2022 football world cup")

print(llm.invoke([question_1]).content, "\n")
print(llm.invoke([question_2]).content)

As an AI language model, I cannot predict future events. The 2022 FIFA World Cup is scheduled to take place in Qatar from November 21 to December 18, 2022. The winner of the tournament will be determined at that time. 

As an AI language model, I cannot provide real-time information. The 2022 FIFA World Cup has not yet taken place, so the finalists are not known at this time. The tournament is scheduled to be held in Qatar from November 21 to December 18, 2022. The finalists will be determined through the qualification process, which is still ongoing.


## Q&A on Docs
The overall architecture of the Q&A on Docs is depicted below:
<br/>
<img src="../assets/qa_docs.png"/>

### Step 1 - Load the document(s)
Specify a `DocumentLoader` to load your unstructured data as `Documents`. A `Document` is a piece of text (the `page_content`) and its associated metadata.

In [7]:
from langchain.document_loaders import WebBaseLoader

# NOTICE: this loader is not specifically designed for Wikipedia, it is just an example
loader = WebBaseLoader("https://en.wikipedia.org/wiki/2022_FIFA_World_Cup")
data = loader.load()

### Step 2 - Split
Split the `Document` into chunks for embedding and vector storage.

In [8]:
from langchain.text_splitter import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=600,
    chunk_overlap=0
)
all_splits = text_splitter.split_documents(data)

print(len(all_splits))

428
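
To make the roles of `chunk_size` and `chunk_overlap` concrete, here is a deliberately simplified sketch that cuts text at fixed character offsets. It is not how `RecursiveCharacterTextSplitter` works internally: that splitter prefers to break on separators such as paragraphs, lines, and spaces. The function name `split_fixed` is hypothetical.

```python
def split_fixed(text, chunk_size=600, chunk_overlap=0):
    """Cut `text` into windows of at most `chunk_size` characters.

    Consecutive windows share `chunk_overlap` characters.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap  # how far each window advances
    return [text[start:start + chunk_size] for start in range(0, len(text), step)]


sample = "abcdefghij"
print(split_fixed(sample, chunk_size=4, chunk_overlap=0))
# ['abcd', 'efgh', 'ij']
print(split_fixed(sample, chunk_size=4, chunk_overlap=2))
# ['abcd', 'cdef', 'efgh', 'ghij', 'ij']
```

With an overlap of 0, as in the cell above, no text is repeated between chunks; a positive overlap repeats the tail of each chunk at the start of the next, at the cost of more chunks to embed.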


### Step 3 - Store
To look up the document splits later, we first need to store them. The most common approach is to embed the contents of each split and store both the embedding and the text in a vector store, where the embedding serves as the index.

NOTICE: At the time of writing, Azure OpenAI embedding models only support batches of at most 16 chunks. For this reason, `chunk_size` is set to 16.

In [9]:
from langchain_openai import AzureOpenAIEmbeddings
from langchain.vectorstores import Chroma

embedding = AzureOpenAIEmbeddings(
    # keys and endpoint are read from the .env file
    openai_api_version=openai_api_version,
    deployment=embedding_deployment_name,
    chunk_size=16,  # number of chunks sent per embedding request (see the notice above)
)
vectorstore = Chroma.from_documents(documents=all_splits, embedding=embedding)
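
The batch limit mentioned in the notice can be pictured with a small sketch: the chunks are sent to the embedding endpoint in groups of at most 16. The helper `batched` below is a hypothetical stand-in written for illustration, not the code `langchain` uses.

```python
def batched(items, batch_size=16):
    """Yield consecutive slices of `items`, each with at most `batch_size` elements."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


chunks = [f"chunk {i}" for i in range(428)]  # same count as the splits above
batches = list(batched(chunks))
print(len(batches), len(batches[-1]))
# 27 12
```

Under this assumption, the 428 splits would be embedded in 27 requests, the last one carrying the remaining 12 chunks.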

### Step 4 - Retrieve
Retrieve the splits relevant to a question using similarity search. By default, langchain retrieves the top 4 docs. Later on we will increase this number to improve the chances that the retrieved chunks contain the answer.

In [10]:
question = "Who won 2022 world cup"
docs = vectorstore.similarity_search(question)
len(docs)

4
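
Conceptually, similarity search embeds the question, scores every stored chunk against it, and returns the best `k`. The sketch below illustrates the idea with a toy word-count "embedding" and cosine similarity. Both are assumptions chosen for readability: a real embedding model produces dense vectors, and the vector store may use a different distance metric and an approximate index.

```python
import math
from collections import Counter


def embed(text):
    """Toy embedding: lowercase word counts (a stand-in for a real embedding model)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query, texts, k=4):
    """Return the `k` texts most similar to `query`, best first."""
    query_vec = embed(query)
    ranked = sorted(texts, key=lambda t: cosine(query_vec, embed(t)), reverse=True)
    return ranked[:k]


corpus = [
    "argentina won the final on penalties",
    "the tournament was held in qatar",
    "the opening ceremony took place at al bayt stadium",
]
print(top_k("who won the final", corpus, k=1))
# ['argentina won the final on penalties']
```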

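### Step 5 - Generate
Distill the retrieved documents into an answer using an LLM/chat model. The chain below is composed with the LangChain Expression Language (LCEL); an alternative based on the `RetrievalQA` chain is shown in the comments.

The prompt is customized in this example for didactic purposes; this is not mandatory.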



In [11]:
from langchain.chains import RetrievalQA
from langchain_openai import AzureChatOpenAI

llm = AzureChatOpenAI(
    openai_api_version=openai_api_version,
    deployment_name=model_deployment_name,
    temperature=0
)

In [24]:
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough
from langchain.prompts import PromptTemplate

template = """Use the following pieces of context to answer the question at the end.
Use three sentences maximum and keep the answer as concise as possible.
Don't try to make up the answer, only use the context to answer the question.
The pieces of context refer to the Football World Cup 2022.

Context:
{context}

Question: {question}
Helpful Answer:"""

prompt = PromptTemplate.from_template(template)

# by default, langchain retrieves the top 4 chunks, here we prefer to
# retrieve more chunks to increase the chances of finding the answer
retriever = vectorstore.as_retriever(search_kwargs={"k": 12})

qa_chain = (
    {"context": retriever,  "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)

# Alternative (without LCEL):
# qa_chain = RetrievalQA.from_chain_type(
#     llm,
#     retriever=vectorstore.as_retriever(search_kwargs={"k": 12}),
#     chain_type_kwargs={"prompt": prompt},
# )

response = qa_chain.invoke(question)
response

'Argentina won the 2022 World Cup.'
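
To see roughly what the model receives, the sketch below fills a shortened version of the template by hand. The chain above passes the retriever output to the prompt directly, so the chunks are presumably rendered with their default string representation; a common variation, assumed here, joins only the chunk texts first. `Doc`, `format_docs`, and `build_prompt` are hypothetical names used for illustration, and the sample texts are taken from the answers shown in this notebook.

```python
from dataclasses import dataclass

# Shortened version of the template used in the chain above.
TEMPLATE = """Use the following pieces of context to answer the question at the end.

Context:
{context}

Question: {question}
Helpful Answer:"""


@dataclass
class Doc:
    """Hypothetical stand-in for a retrieved document chunk."""
    page_content: str


def format_docs(docs):
    """Join chunk texts with blank lines so the model sees plain text."""
    return "\n\n".join(doc.page_content for doc in docs)


def build_prompt(docs, question):
    """Fill the template placeholders with the retrieved context and the question."""
    return TEMPLATE.format(context=format_docs(docs), question=question)


docs = [
    Doc("Argentina won the final against France."),
    Doc("Kylian Mbappé finished as the top scorer with 8 goals."),
]
print(build_prompt(docs, "Who won 2022 world cup"))
```

Printing the filled prompt is a quick way to check whether the retrieved chunks actually contain the answer before blaming the model.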

In [26]:
qa_chain.invoke("Summarize the final of the 2022 world cup with a few sentences with a lot of details")

'In the final of the 2022 World Cup, Argentina and France played to a thrilling 3-3 draw after extra time. The match was eventually decided by a penalty shootout, with Argentina emerging as the champions with a 4-2 victory. Lionel Messi was named the best player of the tournament, while Kylian Mbappé finished as the top scorer with 8 goals.'

In [27]:
qa_chain.invoke("Tell me about the opening ceremony of the world cup 2022")

"The opening ceremony of the World Cup 2022 took place on November 20, 2022, at the Al Bayt Stadium in Al Khor, Qatar. It featured appearances by Morgan Freeman and Ghanim Al-Muftah, as well as performances by South Korean singer Jungkook and Qatari singer Fahad Al Kubaisi. It was the first time that the Qur'an had been recited as part of the opening ceremony."

In [28]:
qa_chain.invoke("Which team did the Netherlands play after the groups?")

'The Netherlands played against the United States after the groups.'

In [29]:
# sad question ...
qa_chain.invoke("How about Italy? Did they participate in the tournament?")

'No, Italy did not participate in the tournament.'

In [30]:
# tricky question
qa_chain.invoke("After France won the final, what happened?")

'France did not win the final. Argentina won the final against France.'