# Question Answering

In this lesson, we're going to cover how to do question answering with the documents that you've just retrieved. \
This comes after we've done the whole storage and ingestion, and after we've retrieved the relevant splits.\
Now we need to pass those splits into a language model to get an answer. 

![](images/retrievalqa.png)

- The general flow goes like this: the question comes in, we look up the relevant documents, and we then pass those splits, along with a system prompt and the human question, to the language model to get the answer. 
- By default, we just pass all the chunks into the same context window, in a single call to the language model.
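
As a rough illustration of that default flow, here is a minimal pure-Python sketch of how the retrieved chunks, a system prompt, and the question could be joined into one prompt. The function name `build_stuff_prompt`, the prompt wording, and the sample chunks are all hypothetical; this is not the prompt that LangChain itself builds.

```python
def build_stuff_prompt(system_prompt, chunks, question):
    """Join every retrieved chunk into one context block for a single LLM call."""
    context = "\n\n".join(chunks)
    return f"{system_prompt}\n\n{context}\n\nQuestion: {question}"

# Hypothetical chunks standing in for retrieved document splits.
chunks = [
    "Chunk A: the course covers supervised learning.",
    "Chunk B: later lectures cover reinforcement learning.",
]
prompt = build_stuff_prompt(
    "Answer using only the context below.",  # hypothetical system prompt
    chunks,
    "What are major topics for this class?",
)
print(prompt)
```

Everything ends up in one string, which is why this approach needs only one model call but is bounded by the size of the context window.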

However, there are a few other methods we can use, each with its own pros and cons. \
Most of the pros come from the fact that sometimes there are so many documents that you simply can't pass them all into the same context window.

![](images/qa_add_met.png)

- Map_reduce, Refine, and Map_rerank are three methods to get around this issue of short context windows, and we'll cover two of them, map_reduce and refine, in the lesson today.

In [1]:
import os
import openai
import sys
sys.path.append('../..')

# from dotenv import load_dotenv, find_dotenv
# _ = load_dotenv(find_dotenv()) # read local .env file
openai.api_key = os.environ.get('OPENAI_API_KEY')

The code below pins the OpenAI LLM version that was used when this lesson was filmed, until that version is deprecated (currently scheduled for Sept 2023). 
LLM responses can often vary from run to run, and they may be significantly different when a different model version is used.

In [2]:
import datetime
current_date = datetime.datetime.now().date()
if current_date < datetime.date(2023, 10, 9):
    llm_name = "gpt-3.5-turbo-0301"
else:
    llm_name = "gpt-3.5-turbo"
print(llm_name)

gpt-3.5-turbo


We will then load in our vector database that was persisted from before, 
and check that it is correct. 

In [3]:
from langchain.vectorstores import Chroma
from langchain.embeddings.openai import OpenAIEmbeddings

persist_directory = 'docs/chroma/'
embedding = OpenAIEmbeddings()
vectordb = Chroma(persist_directory=persist_directory, embedding_function=embedding)

In [4]:
print(vectordb._collection.count())

209


And we can see that it has the same 209 documents from before.

We do a quick check of similarity search, just to make sure it's working, with this first question: what are major topics for this class? 

In [5]:
question = "What are major topics for this class?"
docs = vectordb.similarity_search(question,k=3)
len(docs)

3
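
For intuition about what a top-`k` similarity search does, here is a toy sketch over hand-written 2-D vectors. It assumes cosine similarity as the ranking score; the metric actually used by the vector store may differ, and the vectors and the helper names `cosine_similarity` and `top_k` are made up for illustration.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, doc_vecs, k):
    """Return the indices of the k document vectors most similar to the query."""
    ranked = sorted(
        range(len(doc_vecs)),
        key=lambda i: cosine_similarity(query_vec, doc_vecs[i]),
        reverse=True,
    )
    return ranked[:k]

# Hypothetical embeddings; real ones have hundreds or thousands of dimensions.
doc_vecs = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.7, 0.7]]
query_vec = [1.0, 0.05]
best = top_k(query_vec, doc_vecs, k=3)
print(best)
```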

Now, we initialize the language model that we're going to use to answer the question. 
 
We're going to use the ChatOpenAI model with GPT-3.5, and we're going to set **temperature equal to zero**.\
This is really good when we want factual answers to come out, because it gives low variability and usually the highest-fidelity, most reliable answers.
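
To see why a low temperature reduces variability, here is a small sketch of temperature-scaled softmax over some made-up token scores. It illustrates the general idea only; how a hosted model handles `temperature=0` internally is an assumption here (a temperature of exactly zero cannot be divided by, so the sketch uses a small positive value instead).

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert raw scores to probabilities; lower temperature sharpens the distribution."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]  # hypothetical scores for three candidate tokens
warm = softmax_with_temperature(logits, 1.0)
cold = softmax_with_temperature(logits, 0.05)
print([round(p, 3) for p in warm])
print([round(p, 3) for p in cold])
```

At the lower temperature, almost all of the probability mass lands on the top-scoring token, so repeated sampling tends to produce the same output.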

In [6]:
from langchain.chat_models import ChatOpenAI

llm = ChatOpenAI(model_name=llm_name, temperature=0)

We're then going to import the RetrievalQA chain. 
- This is doing question answering backed by a retrieval step. 

In [7]:
from langchain.chains import RetrievalQA

- We can create it by passing in a language model, and then the vector database as a retriever. 

In [8]:
qa_chain = RetrievalQA.from_chain_type(
    llm,
    retriever=vectordb.as_retriever()
)

- We can then call it with the query being equal to the question that we want to ask. 

In [9]:
result = qa_chain({"query": question})


- And then when we look at the result, we get an answer.

In [10]:
result["result"]

'The major topics for this class are machine learning and its various extensions.'

### Prompt

The main part that's important here is the prompt that we're using. \
This is the prompt that takes in the documents and the question and passes them to a language model. 
- Here we define a prompt template. 
- It has some instructions about how to use the following pieces of context, and then it has a placeholder for a context variable. 
- This is where the documents will go. It also has a placeholder for the question variable.

In [11]:
from langchain.prompts import PromptTemplate

# Build prompt
template = """Use the following pieces of context to answer the question at the end. If you don't know the answer, just say that you don't know, don't try to make up an answer. Use three sentences maximum. Keep the answer as concise as possible. Always say "thanks for asking!" at the end of the answer. 
{context}
Question: {question}
Helpful Answer:"""
QA_CHAIN_PROMPT = PromptTemplate.from_template(template)
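
To make the two placeholders concrete, here is a sketch that fills a shortened copy of the template using plain Python string formatting. This only mimics the substitution step; it does not use LangChain, and the sample chunk is a hypothetical stand-in for retrieved text.

```python
# Shortened copy of the template above, kept in plain Python for illustration.
template = (
    "Use the following pieces of context to answer the question at the end.\n"
    "{context}\n"
    "Question: {question}\n"
    "Helpful Answer:"
)

# Hypothetical retrieved chunk; in the chain this comes from the retriever.
chunks = ["I also assume familiarity with basic probability and statistics."]

filled = template.format(
    context="\n\n".join(chunks),
    question="Is probability a class topic?",
)
print(filled)
```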


We can now create a new RetrievalQA chain. \
We're going to use the same language model as before and the same vector database as before, but we're going to pass in a few new arguments.

In [12]:
# Run chain
qa_chain = RetrievalQA.from_chain_type(
    llm,
    retriever=vectordb.as_retriever(),
    return_source_documents=True,
    chain_type_kwargs={"prompt": QA_CHAIN_PROMPT}
)

- We've got **return_source_documents**, which we set to True. 
  - This will let us easily inspect the documents that we retrieve. 
- Then we also pass in a prompt, set to the QA_CHAIN_PROMPT that we defined above.

Let's try out a new question.

In [14]:
question = "Is probability a class topic?"

In [15]:
result = qa_chain({"query": question})


In [16]:
result["result"]

'Yes, probability is a topic that will be covered in the class. Thanks for asking!'

To get a better intuition for where it's getting this data from, we can take a look at some of the source documents that were returned. 

In [17]:
result["source_documents"][0]

Document(page_content="of this class will not be very program ming intensive, although we will do some \nprogramming, mostly in either MATLAB or Octa ve. I'll say a bit more about that later.  \nI also assume familiarity with basic proba bility and statistics. So most undergraduate \nstatistics class, like Stat 116 taught here at Stanford, will be more than enough. I'm gonna \nassume all of you know what ra ndom variables are, that all of you know what expectation \nis, what a variance or a random variable is. And in case of some of you, it's been a while \nsince you've seen some of this material. At some of the discussion sections, we'll actually \ngo over some of the prerequisites, sort of as  a refresher course under prerequisite class. \nI'll say a bit more about that later as well.  \nLastly, I also assume familiarity with basi c linear algebra. And again, most undergraduate \nlinear algebra courses are more than enough. So if you've taken courses like Math 51, \n103, Math 113 or 

If you look through them, you should see that all the information in the answer comes from one of these source documents.
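
One way to look through the sources quickly is to print a short preview of each one. The sketch below uses a stand-in `Document` dataclass and a hand-built result dictionary so that it runs without any API calls; the `source` metadata key, the file name, and the helper `summarize_sources` are assumptions made for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class Document:
    """Stand-in for the retrieved document type; only the fields used here."""
    page_content: str
    metadata: dict = field(default_factory=dict)

def summarize_sources(result, preview_chars=60):
    """Return one short line per source document in a chain result."""
    lines = []
    for i, doc in enumerate(result["source_documents"]):
        source = doc.metadata.get("source", "unknown")  # assumed metadata key
        preview = doc.page_content[:preview_chars].replace("\n", " ")
        lines.append(f"[{i}] {source}: {preview}")
    return lines

# Hypothetical result shaped like the dictionary returned by the chain.
fake_result = {
    "result": "Yes, probability is a topic. Thanks for asking!",
    "source_documents": [
        Document("I also assume familiarity with basic\nprobability and statistics.",
                 {"source": "lecture01.pdf"}),
        Document("Lastly, I also assume familiarity with basic linear algebra."),
    ],
}
lines = summarize_sources(fake_result)
for line in lines:
    print(line)
```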

So far, we've been using the **stuff** technique, the technique that we use by default, which basically just stuffs all the documents into the final prompt. 
- This is really good because it only involves one call to the language model.
- However, this does have the limitation that if there are too many documents, they may not all fit inside the context window.
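
A rough way to anticipate that limitation is to estimate the prompt size before stuffing. The sketch below uses a crude characters-per-token heuristic, which is an assumption (a real tokenizer gives exact counts), and the `max_tokens` budget is a hypothetical number, not a documented limit of any particular model.

```python
def fits_in_context(chunks, question, max_tokens, chars_per_token=4):
    """Rough check: estimate tokens from character count (a crude heuristic)."""
    total_chars = sum(len(chunk) for chunk in chunks) + len(question)
    estimated_tokens = total_chars / chars_per_token
    return estimated_tokens <= max_tokens

question = "Is probability a class topic?"
few_chunks = ["x" * 4000] * 3    # about 3,000 estimated tokens
many_chunks = ["x" * 4000] * 10  # about 10,000 estimated tokens

print(fits_in_context(few_chunks, question, max_tokens=4096))
print(fits_in_context(many_chunks, question, max_tokens=4096))
```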

### RetrievalQA chain types

A different type of technique that we can use to do question answering over documents is the **map_reduce** technique.
- In this technique, each of the individual documents is first sent to the language model by itself to get an initial answer. 
- And then those answers are composed into a final answer with a final call to the language model. 
- This involves many more calls to the language model, but it does have the advantage that it can operate over arbitrarily many documents.
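
The steps above can be sketched in a few lines of plain Python. The sketch uses a stand-in `fake_llm` that records prompts and returns canned replies, so it shows the call pattern only; the prompt wording is made up, and the real chain may add further steps (for example, combining partial answers in several stages) that are not shown here.

```python
def map_reduce_answer(llm, chunks, question):
    """Ask the question of each chunk separately, then combine the partial answers."""
    # Map step: one independent call per chunk.
    partial_answers = [
        llm(f"Context:\n{chunk}\n\nQuestion: {question}") for chunk in chunks
    ]
    # Reduce step: one final call over the partial answers.
    summaries = "\n".join(partial_answers)
    return llm(f"Partial answers:\n{summaries}\n\nQuestion: {question}")

calls = []

def fake_llm(prompt):
    """Stand-in for a real model: records the prompt and returns a canned reply."""
    calls.append(prompt)
    return f"answer #{len(calls)}"

chunks = ["chunk one", "chunk two", "chunk three"]
final = map_reduce_answer(fake_llm, chunks, "Is probability a class topic?")
print(len(calls))  # one call per chunk plus one combining call
```

Note that each map call sees only its own chunk, so no single call has the full picture until the final combining step.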

In [18]:
qa_chain_mr = RetrievalQA.from_chain_type(
    llm,
    retriever=vectordb.as_retriever(),
    chain_type="map_reduce"
)

In [19]:
result = qa_chain_mr({"query": question})

In [20]:
result["result"]

'Based on the provided information, it is not clear whether probability is a specific topic covered in the class.'

When we run the previous question through this chain, we can see another limitation of this method. Or actually, we can see two. 
- One, it's a lot slower. Two, the result is actually worse. 
- There is no clear answer on this question based on the given portion of the document. 
- This may occur because it's answering based on each document individually. 
- And so, if there is information that's spread across two documents, it doesn't have it all in the same context.

In [21]:
# Same setup as before, but with the refine chain type
qa_chain_refine = RetrievalQA.from_chain_type(
    llm,
    retriever=vectordb.as_retriever(),
    chain_type="refine"
)
result = qa_chain_refine({"query": question})
result["result"]


'Based on the new context provided, it is still not explicitly mentioned whether probability is a class topic. The discussion sections will cover statistics and algebra as refreshers, and later in the quarter, they will go over extensions for the material taught in the main lectures. While probability is not mentioned directly, it is possible that it could be covered as an extension or in relation to the machine learning field. Therefore, the original answer still stands.'

You'll notice that this is a better result than the **map_reduce** chain. That's because using the **refine** chain does allow you to combine information, albeit sequentially, and it actually encourages more carrying over of information than the **map_reduce** chain.
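
The sequential carrying-over of information can be sketched as a simple loop. As with any toy version, this uses a stand-in `fake_llm` and made-up prompt wording, so it shows only the shape of the technique, not the prompts the refine chain actually sends.

```python
def refine_answer(llm, chunks, question):
    """Answer from the first chunk, then revise that answer with each later chunk."""
    answer = llm(f"Context:\n{chunks[0]}\n\nQuestion: {question}")
    for chunk in chunks[1:]:
        # Each later call sees the running answer plus one new chunk.
        answer = llm(
            f"Existing answer: {answer}\n"
            f"New context:\n{chunk}\n\n"
            f"Refine the existing answer if the new context helps. "
            f"Question: {question}"
        )
    return answer

calls = []

def fake_llm(prompt):
    """Stand-in for a real model: records the prompt and returns a canned reply."""
    calls.append(prompt)
    return f"answer #{len(calls)}"

chunks = ["chunk one", "chunk two", "chunk three"]
final = refine_answer(fake_llm, chunks, "Is probability a class topic?")
print(len(calls))  # one call per chunk, each building on the previous answer
```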

### RetrievalQA limitations
 
The RetrievalQA chain fails to preserve conversational history.

In [22]:
qa_chain = RetrievalQA.from_chain_type(
    llm,
    retriever=vectordb.as_retriever()
)

In [23]:
question = "Is probability a class topic?"
result = qa_chain({"query": question})
result["result"]

'Yes, probability is a topic that will be covered in this class. The instructor assumes familiarity with basic probability and statistics, so it is expected that students have prior knowledge in this area.'

In [24]:
question = "why are those prerequesites needed?"
result = qa_chain({"query": question})
result["result"]

'The prerequisites are needed because they provide the foundational knowledge and skills necessary to understand and apply machine learning algorithms. Basic knowledge of computer science and computer skills are important for programming and implementing machine learning algorithms. Familiarity with probability and statistics is necessary for understanding the mathematical concepts and principles underlying machine learning. Basic knowledge of linear algebra is required for working with matrices and vectors, which are fundamental components of many machine learning algorithms.'

That doesn't relate at all to the answer before where we were asking about probability. \
What's going on here? 
- Basically, the chain that we're using doesn't have any concept of state. 
- It doesn't remember what previous questions or previous answers were. 
- For that, we'll need to introduce **memory**, and that's what we'll cover in the next section. 
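
To illustrate what "no concept of state" means, here is a toy comparison between a stateless call and a wrapper that prepends earlier turns to each new prompt. This is only a sketch of the idea under stated assumptions: the class `HistoryAwareAsker`, the prompt layout, and the recording stand-in model are all hypothetical and are not LangChain's memory implementation.

```python
def make_recording_llm(log):
    """Build a stand-in model that records each prompt and returns a canned reply."""
    def llm(prompt):
        log.append(prompt)
        return f"reply {len(log)}"
    return llm

def stateless_ask(llm, question):
    """Each call sees only the current question, like the chain above."""
    return llm(f"Question: {question}")

class HistoryAwareAsker:
    """Keeps earlier question/answer pairs and prepends them to each new prompt."""

    def __init__(self, llm):
        self.llm = llm
        self.history = []

    def ask(self, question):
        past = "".join(f"Q: {q}\nA: {a}\n" for q, a in self.history)
        answer = self.llm(f"{past}Question: {question}")
        self.history.append((question, answer))
        return answer

stateless_log, stateful_log = [], []

plain_llm = make_recording_llm(stateless_log)
stateless_ask(plain_llm, "Is probability a class topic?")
stateless_ask(plain_llm, "Why are those prerequisites needed?")

asker = HistoryAwareAsker(make_recording_llm(stateful_log))
asker.ask("Is probability a class topic?")
asker.ask("Why are those prerequisites needed?")

# Does the second prompt still mention the first question?
print("probability" in stateless_log[1])
print("probability" in stateful_log[1])
```

In the stateless case the second prompt carries no trace of the first question, which is why "those prerequisites" cannot be resolved against the earlier exchange.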