# Question Answering

<img src="https://raw.githubusercontent.com/Vishesh8/databricks-tests/refs/heads/main/training-images/question-answering.png" width="768">

In [0]:
%pip install -qU openai databricks-langchain langchain-chroma transformers langsmith
dbutils.library.restartPython()



In [0]:
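from openai import OpenAI
import os

# Personal access token stored in a Databricks secret scope
DATABRICKS_TOKEN = dbutils.secrets.get(scope="db-field-eng", key="va-pat-token")

# OpenAI-compatible client pointed at the workspace's Model Serving endpoints
client = OpenAI(
  api_key=DATABRICKS_TOKEN,
  base_url="https://e2-demo-field-eng.cloud.databricks.com/serving-endpoints"
)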

In [0]:
from databricks_langchain import ChatDatabricks
from databricks_langchain import DatabricksEmbeddings
from langchain_chroma import Chroma

In [0]:
# Set temperature = 0 on the generation model so our Q&A answers have low variability and stay factual
llm = ChatDatabricks(endpoint="databricks-meta-llama-3-3-70b-instruct", temperature=0)
embedding = DatabricksEmbeddings(endpoint="databricks-gte-large-en")

# Load the persisted Chroma vector store with the same embedding model used to build it
persist_directory = './data/docs/chroma/'
vectordb = Chroma(persist_directory=persist_directory, embedding_function=embedding)

# Number of chunks in the collection (reads a private attribute of the Chroma wrapper)
print(vectordb._collection.count())

208


In [0]:
question = "What are major topics for this class?"
docs = vectordb.similarity_search(question, k=3)

docs[0].page_content[0:500]

"okay?  \nSo as an overview of what we're going to do in this class, this class is sort of organized \ninto four major sections. We're gonna talk about four major topics in this class, the first \nof which is supervised learning. So let me give you an example of that.  \nSo suppose you collect a data set of housing prices. And one of the TAs, Dan Ramage, \nactually collected a data set for me last week to use in the example later. But suppose that \nyou go to collect statistics about how much houses co"
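Conceptually, `similarity_search(question, k=3)` embeds the question and returns the `k` stored chunks whose embeddings are closest to it. The sketch below shows only that ranking step in plain Python. It assumes cosine similarity and uses tiny hypothetical 3-dimensional vectors; the metric Chroma actually uses depends on how the collection was configured, and real `databricks-gte-large-en` embeddings are far longer.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def top_k_similar(query_vec, doc_vecs, k=3):
    """Indices of the k stored vectors most similar to the query, best first."""
    scored = sorted(((cosine(query_vec, v), i) for i, v in enumerate(doc_vecs)), reverse=True)
    return [i for _, i in scored[:k]]

# Hypothetical toy vectors standing in for chunk embeddings
toy_vectors = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
toy_query = [1.0, 0.05, 0.0]
print(top_k_similar(toy_query, toy_vectors, k=3))  # [0, 1, 2]
```

Note that chunks 0 and 1 are nearly identical, yet plain similarity search returns both; this is the redundancy that MMR is meant to reduce.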

## RetrievalQA Chain

General flow for a Q&A chain:
- A question comes in
- Relevant documents are looked up
- The retrieved chunks, along with a `system prompt` and the `human question`, are passed to the LLM to generate a response

`chain_type` defaults to `stuff`, i.e., all retrieved chunks are passed together into the same context window
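The flow above, with the default `stuff` chain type, can be sketched in plain Python. This is not LangChain's implementation: `fake_llm` is a hypothetical stand-in for the chat model, the chunk strings are placeholders, and the prompt is flattened into a single string loosely modeled on the default system prompt that appears in the verbose output further down.

```python
SYSTEM_TEMPLATE = (
    "Use the following pieces of context to answer the user's question.\n"
    "If you don't know the answer, just say that you don't know, "
    "don't try to make up an answer.\n"
    "----------------\n"
    "{context}"
)

def fake_llm(prompt: str) -> str:
    """Hypothetical stand-in for the chat model; a real call would hit the serving endpoint."""
    return f"(answer generated from a {len(prompt)}-character prompt)"

def stuff_qa(question: str, retrieved_chunks: list, llm=fake_llm, separator: str = "\n\n"):
    """Answer a question by stuffing every retrieved chunk into one prompt."""
    # 1. Concatenate ("stuff") all retrieved chunks into a single context string
    context = separator.join(retrieved_chunks)
    # 2. System instructions + context, followed by the human question
    prompt = "System: " + SYSTEM_TEMPLATE.format(context=context) + "\nHuman: " + question
    # 3. Exactly one LLM call, however many chunks were retrieved
    return prompt, llm(prompt)

demo_prompt, demo_answer = stuff_qa(
    "What are major topics for this class?", ["<chunk 1 text>", "<chunk 2 text>"]
)
print(demo_prompt)
print(demo_answer)
```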

In [0]:
from langchain.chains import RetrievalQA

In [0]:
qa_chain = RetrievalQA.from_chain_type(
  llm=llm,
  retriever=vectordb.as_retriever(search_type="mmr"),   # MMR (maximal marginal relevance) reduces near-duplicate chunks in the results
  chain_type_kwargs={"verbose":True}                     # Print the formatted prompt sent to the LLM
)

result = qa_chain.invoke(question)
result



[1m> Entering new StuffDocumentsChain chain...[0m


[1m> Entering new LLMChain chain...[0m
Prompt after formatting:
[32;1m[1;3mSystem: Use the following pieces of context to answer the user's question. 
If you don't know the answer, just say that you don't know, don't try to make up an answer.
----------------
okay?  
So as an overview of what we're going to do in this class, this class is sort of organized 
into four major sections. We're gonna talk about four major topics in this class, the first 
of which is supervised learning. So let me give you an example of that.  
So suppose you collect a data set of housing prices. And one of the TAs, Dan Ramage, 
actually collected a data set for me last week to use in the example later. But suppose that 
you go to collect statistics about how much houses cost in a certain geographic area. And 
Dan, the TA, collected data from housing prices in Portland, Oregon. So what you can do 
is let's say plot the square footage of the house aga




[1m> Finished chain.[0m

[1m> Finished chain.[0m


{'query': 'What are major topics for this class?',
 'result': 'The class is organized into four major sections, and the first major topic is supervised learning. The other three topics are not specified in the provided context.'}
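The `search_type="mmr"` retriever above uses maximal marginal relevance: instead of taking the top-`k` most similar chunks, it greedily picks chunks that are relevant to the query but dissimilar to those already picked. A minimal plain-Python sketch of the greedy selection, assuming cosine similarity and hypothetical toy vectors; `lambda_mult` here mirrors the name of LangChain's MMR trade-off parameter, but the vector store's real implementation (which first fetches a larger candidate pool) may differ in detail.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def mmr(query_vec, doc_vecs, k=3, lambda_mult=0.5):
    """Greedy MMR: each step picks the candidate maximising
    lambda_mult * sim(query, doc) - (1 - lambda_mult) * max sim(doc, already selected)."""
    selected, candidates = [], list(range(len(doc_vecs)))
    while candidates and len(selected) < k:
        def score(i):
            relevance = cosine(query_vec, doc_vecs[i])
            redundancy = max((cosine(doc_vecs[i], doc_vecs[j]) for j in selected), default=0.0)
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected

# Hypothetical vectors: chunk 1 is an exact duplicate of chunk 0
mmr_vectors = [[1.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.7, 0.7, 0.0], [0.0, 0.0, 1.0]]
mmr_query = [1.0, 0.2, 0.0]
print(mmr(mmr_query, mmr_vectors, k=2))                   # [0, 2]: the duplicate is skipped
print(mmr(mmr_query, mmr_vectors, k=2, lambda_mult=1.0))  # [0, 1]: pure relevance keeps it
```

With `lambda_mult=1.0` the redundancy term vanishes and MMR reduces to plain similarity ranking.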

### Prompt

In [0]:
from langchain.prompts import PromptTemplate

# Build prompt
template = """Use the following pieces of context to answer the question at the end. If you don't know the answer, just say that you don't know, don't try to make up an answer. Use three sentences maximum. Keep the answer as concise as possible. Always say "thanks for asking!" at the end of the answer. 
{context}
Question: {question}
Helpful Answer:"""

QA_CHAIN_PROMPT = PromptTemplate.from_template(template)
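The template only *asks* the model to keep to three sentences and end with "thanks for asking!"; nothing enforces it. A minimal heuristic post-check can flag answers that ignore those instructions. This is an illustrative sketch, not part of LangChain: `check_answer` is a hypothetical helper, and the sentence splitting is naive (it will miscount abbreviations such as "e.g.").

```python
import re

SIGN_OFF = "thanks for asking!"

def check_answer(answer: str, max_sentences: int = 3) -> dict:
    """Heuristic check of the two formatting rules in QA_CHAIN_PROMPT."""
    text = answer.strip()
    ends_with_sign_off = text.lower().endswith(SIGN_OFF)
    # Do not count the sign-off itself as one of the allowed sentences
    body = text[: -len(SIGN_OFF)] if ends_with_sign_off else text
    # Naive split on ., ! or ? followed by whitespace or the end of the string
    sentences = [s for s in re.split(r"[.!?]+(?:\s+|$)", body) if s.strip()]
    return {
        "ends_with_sign_off": ends_with_sign_off,
        "sentence_count": len(sentences),
        "within_limit": len(sentences) <= max_sentences,
    }

# Applied to the answer this chain returns further down in the notebook
print(check_answer(
    "The class assumes familiarity with basic probability and statistics, and will review "
    "some of these concepts in discussion sections as a refresher. Probability is a "
    "prerequisite for the class, but it will not be a primary topic of focus. thanks for asking!"
))
```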

In [0]:
# Build the chain with the custom prompt
qa_chain = RetrievalQA.from_chain_type(
  llm=llm,
  retriever=vectordb.as_retriever(search_type="mmr", search_kwargs={"k": 3}),
  return_source_documents=True,                     # Also return the retrieved docs for inspection
  chain_type_kwargs={"prompt": QA_CHAIN_PROMPT, "verbose":True}     # Replace the default prompt, which has no instructions on answer length or the closing "thanks for asking!"
)

In [0]:
question = "Is probability a class topic?"

result = qa_chain.invoke(question)
result["result"]



[1m> Entering new StuffDocumentsChain chain...[0m


[1m> Entering new LLMChain chain...[0m
Prompt after formatting:
[32;1m[1;3mUse the following pieces of context to answer the question at the end. If you don't know the answer, just say that you don't know, don't try to make up an answer. Use three sentences maximum. Keep the answer as concise as possible. Always say "thanks for asking!" at the end of the answer. 
statistics for a while or maybe algebra, we'll go over those in the discussion sections as a 
refresher for those of you that want one.  
Later in this quarter, we'll also use the discussion sections to go over extensions for the 
material that I'm teaching in the main lectures. So machine learning is a huge field, and 
there are a few extensions that we really want to teach but didn't have time in the main 
lectures for.

of this class will not be very programming intensive, although we will do some 
programming, mostly in either MATLAB or Octave. I'll say a bit more




[1m> Finished chain.[0m

[1m> Finished chain.[0m


'The class assumes familiarity with basic probability and statistics, and will review some of these concepts in discussion sections as a refresher. Probability is a prerequisite for the class, but it will not be a primary topic of focus. thanks for asking!'

In [0]:
result["source_documents"][0]

Document(id='9b309f2a-af07-49f1-b087-82a1140a7e4e', metadata={'author': '', 'creationdate': '2008-07-11T11:25:23-07:00', 'creator': 'PScript5.dll Version 5.2.2', 'moddate': '2008-07-11T11:25:23-07:00', 'page': 8, 'page_label': '9', 'producer': 'Acrobat Distiller 8.1.0 (Windows)', 'source': './data/docs/cs229_lectures/MachineLearning-Lecture01.pdf', 'title': '', 'total_pages': 22}, page_content="statistics for a while or maybe algebra, we'll go over those in the discussion sections as a \nrefresher for those of you that want one.  \nLater in this quarter, we'll also use the discussion sections to go over extensions for the \nmaterial that I'm teaching in the main lectures. So machine learning is a huge field, and \nthere are a few extensions that we really want to teach but didn't have time in the main \nlectures for.")

### RetrievalQA Chain Types
Besides question answering, these chain types are useful in other applications. For example, `map_reduce` is commonly used for summarization tasks.

#### 1. stuff
Stuffs all retrieved chunks into the prompt as context and passes it to the language model in a single call. This is the most commonly used chain type and the default when `chain_type` is not passed explicitly.

<img src="https://raw.githubusercontent.com/Vishesh8/databricks-tests/refs/heads/main/training-images/chain-type-stuff.png" width="768">

**Pros:** makes a single LLM call, and the LLM sees all of the data at once. It is simple to understand and cheap.

**Cons:** LLMs have a limited context window, so many or large documents can produce a prompt that exceeds it.
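Before choosing `stuff`, it can help to estimate whether the stuffed prompt fits. A rough sketch, assuming the common rule of thumb of about 4 characters per token for English text; the context window, answer reserve, and template overhead below are hypothetical defaults, not the limits of the Llama endpoint used here. For exact counts, run the model's actual tokenizer instead.

```python
def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough token estimate (~4 characters per token is a rule of thumb for English)."""
    return int(len(text) / chars_per_token) + 1

def stuffed_prompt_fits(chunks, question, context_window=8192,
                        reserved_for_answer=512, template_overhead=200):
    """Estimate whether a stuff-style prompt plus room for the answer fits the window.

    All three limits are hypothetical; substitute the real values for your model and prompt.
    """
    prompt_tokens = (template_overhead + estimate_tokens(question)
                     + sum(estimate_tokens(c) for c in chunks))
    return prompt_tokens + reserved_for_answer <= context_window, prompt_tokens

few = ["x" * 1500] * 3    # three ~1,500-character chunks
many = ["x" * 1500] * 50  # fifty such chunks
print(stuffed_prompt_fits(few, "Is probability a class topic?"))
print(stuffed_prompt_fits(many, "Is probability a class topic?"))
```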

#### 2. map_reduce
In this technique, each retrieved document is sent to the language model together with the question to get an individual answer, and then one more LLM call combines all the individual answers into a final answer.

<img src="https://raw.githubusercontent.com/Vishesh8/databricks-tests/refs/heads/main/training-images/chain-type-map-reduce.png" width="768">

**Pros:** it can operate over an arbitrary number of documents, and the per-document calls can run in parallel.

**Cons:** it makes many more LLM calls, and it treats every document as independent, which is not always desirable.
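The map and reduce steps can be sketched in plain Python to make the call pattern concrete: `n` independent map calls (run in parallel here with a thread pool) followed by one reduce call. `CountingLLM` is a hypothetical stand-in for the chat model and both prompts are illustrative assumptions. LangChain's actual `map_reduce` chain has its own prompts and may add intermediate "collapse" calls when the partial answers are too long.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

class CountingLLM:
    """Hypothetical stand-in for a chat model that counts how often it is called."""

    def __init__(self):
        self.calls = 0
        self._lock = threading.Lock()  # map calls may run concurrently

    def __call__(self, prompt: str) -> str:
        with self._lock:
            self.calls += 1
        return f"partial answer from: {prompt[:40]!r}"

MAP_PROMPT = "Use this context to answer the question.\nContext: {chunk}\nQuestion: {question}"
REDUCE_PROMPT = "Combine these partial answers into one final answer.\n{partials}\nQuestion: {question}"

def map_reduce_qa(question, chunks, llm, max_workers=4):
    # Map: one independent call per chunk, so the calls can run in parallel
    def answer_chunk(chunk):
        return llm(MAP_PROMPT.format(chunk=chunk, question=question))

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        partials = list(pool.map(answer_chunk, chunks))
    # Reduce: one final call that sees the partial answers, not the raw chunks
    return llm(REDUCE_PROMPT.format(partials="\n".join(partials), question=question))

counting_llm = CountingLLM()
mr_answer = map_reduce_qa("Is probability a class topic?", ["<chunk 1>", "<chunk 2>", "<chunk 3>"], counting_llm)
print(counting_llm.calls)  # 3 map calls + 1 reduce call = 4
```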

In [0]:
# The single `prompt` kwarg used with `stuff` is not accepted by map_reduce;
# it takes separate `question_prompt` and `combine_prompt` entries in chain_type_kwargs instead
qa_chain_mr = RetrievalQA.from_chain_type(
  llm=llm,
  retriever=vectordb.as_retriever(search_type="mmr", search_kwargs={"k": 10}),   # Each chunk is answered in its own call, so more chunks can be retrieved without overflowing the context window
  chain_type="map_reduce",
)

In [0]:
result = qa_chain_mr.invoke(question)
result["result"]




"It appears that probability is likely a relevant topic in the class, although the text does not always explicitly mention it. In some cases, the text mentions statistics, machine learning, or other related fields that often involve probability, and in one case, it explicitly states that the instructor assumes familiarity with basic probability and statistics. However, without more context, it's difficult to say for certain whether probability is a specific class topic. Based on the available information, it seems probable that probability is a class topic, but it's not a definitive answer."

To understand what's going on inside these chains, we can use MLflow Tracing on Databricks.
If you want to use the LangSmith platform instead, the prerequisites are:
- Go to [LangSmith](https://www.langchain.com/langsmith) and sign up
- Create an API key from your account's settings
- Set the following environment variables before running the chain: \
<code>os.environ["LANGSMITH_TRACING"] = "true"</code> \
<code>os.environ["LANGSMITH_ENDPOINT"] = "https://api.smith.langchain.com"</code> \
<code>os.environ["LANGSMITH_API_KEY"] = "<lc_api_key>"</code>

In [0]:
import mlflow
mlflow.langchain.autolog()

qa_chain_mr = RetrievalQA.from_chain_type(
  llm=llm,
  retriever=vectordb.as_retriever(search_type="mmr", search_kwargs={"k": 10}),    # Each chunk is answered in its own call, so more chunks can be retrieved without overflowing the context window
  chain_type="map_reduce",
)

result = qa_chain_mr.invoke(question)
result["result"]

'Yes, probability is a relevant topic, as the instructor assumes familiarity with basic probability and statistics, and mentions that it may be covered as a refresher topic in discussion sections.'

Trace(request_id=tr-010819ca8acb4dd18f30b0ea27581ef3)

The trace above shows one LLM call per retrieved chunk, followed by one final call that uses the responses from all the chunks as context to generate the final answer.

#### 3. refine
Like `map_reduce`, this is used over many documents, but it builds the answer iteratively: each new document is used to refine the answer produced from the previous ones.

<img src="https://raw.githubusercontent.com/Vishesh8/databricks-tests/refs/heads/main/training-images/chain-type-refine.png" width="768">

**Pros:** good for combining information and building up an answer over time when documents depend on each other

**Cons:** generally produces longer answers. The number of calls is about the same as `map_reduce` (one per document), but it is slower because each call depends on the previous one, so the calls cannot run in parallel
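The refine loop can be sketched in plain Python: one call answers from the first chunk, then each later call receives the current answer plus one new chunk, so the calls must run strictly in sequence. `make_recording_llm` is a hypothetical stand-in for the chat model, and the prompt wording is an assumption rather than LangChain's default refine prompt.

```python
def refine_qa(question, chunks, llm):
    """Refine-style QA: answer from the first chunk, then improve the answer chunk by chunk."""
    if not chunks:
        raise ValueError("refine needs at least one chunk")
    # First call: answer from the first chunk alone
    answer = llm(f"Context: {chunks[0]}\nQuestion: {question}\nAnswer:")
    # Each later call sees the current answer plus one new chunk, so calls run in order
    for chunk in chunks[1:]:
        answer = llm(
            f"Existing answer: {answer}\nNew context: {chunk}\n"
            f"Refine the existing answer to '{question}' if the new context helps; "
            "otherwise return it unchanged."
        )
    return answer

def make_recording_llm():
    """Hypothetical stand-in for a chat model that records each prompt it receives."""
    prompts = []
    def fake_llm(prompt):
        prompts.append(prompt)
        return f"answer v{len(prompts)}"
    return fake_llm, prompts

refine_llm, refine_prompts = make_recording_llm()
final_answer = refine_qa("Is probability a class topic?", ["<chunk 1>", "<chunk 2>", "<chunk 3>"], refine_llm)
print(final_answer, len(refine_prompts))  # answer v3 3
```

Unlike the `map_reduce` sketch, there is no final combining call: the last refinement is the answer, and every prompt after the first carries the previous answer forward.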

In [0]:
qa_chain_ref = RetrievalQA.from_chain_type(
  llm=llm,
  retriever=vectordb.as_retriever(search_type="mmr", search_kwargs={"k": 10}),
  chain_type="refine",
)

result = qa_chain_ref.invoke(question)
result["result"]

'Yes, probability is likely a class topic. The context suggests that the class is focused on machine learning, and the conversation between the instructor (Andrew Ng) and the student involves discussing mathematical concepts and algorithms, such as least squares regression. This implies that the class covers theoretical foundations of machine learning, which often involve probability theory. Therefore, it is reasonable to assume that probability will be covered in the class.'

Trace(request_id=tr-6203a6dbbcd2498890f0c7b4ae81969b)

The trace above shows sequential LLM calls, one per chunk. Each subsequent call asks the language model to improve the previous response using the current chunk as additional context. This can give a better result than the `map_reduce` chain, because refining sequentially combines information across chunks and carries more of it forward from one step to the next than `map_reduce` does.

#### 4. map_rerank
Makes a single LLM call for each document and also asks the model to return a relevance score, then selects the answer with the highest score. This relies on the language model to assign sensible scores, so the prompt usually has to state that higher relevance means a higher score and give further instructions on how to score.

<img src="https://raw.githubusercontent.com/Vishesh8/databricks-tests/refs/heads/main/training-images/chain-type-map-rerank.png" width="768">

**Pros:** all the LLM calls are independent, so they can be batched and run faster

**Cons:** it makes one LLM call per document, so it can be expensive
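A plain-Python sketch of the map_rerank idea: one independent call per chunk, each response parsed into an answer and a score, and only the highest-scoring answer kept. It assumes each response ends with a `Score: <int>` line, similar in spirit to the format LangChain's default map_rerank prompt asks for, but the prompt text, `parse_scored`, and the canned responses below are hypothetical.

```python
import re

def parse_scored(response: str):
    """Split a response of the form '<answer>\\nScore: <int>' into (answer, score)."""
    match = re.search(r"^(.*?)\s*Score:\s*(\d+)\s*$", response, flags=re.DOTALL)
    if match is None:
        return response.strip(), 0  # unparseable responses get the lowest score
    return match.group(1).strip(), int(match.group(2))

def map_rerank(question, chunks, llm):
    # One independent call per chunk; each response must end with a score line
    prompt = ("Answer the question from the context, then on a new line write "
              "'Score: <0-100>' where a higher score means the context is more relevant.\n"
              "Context: {c}\nQuestion: {q}")
    scored = [parse_scored(llm(prompt.format(c=c, q=question))) for c in chunks]
    # Keep the single highest-scoring answer; the others are discarded, not merged
    return max(scored, key=lambda pair: pair[1])

# Hypothetical canned responses standing in for real model output
CANNED = {
    "<chunk 1>": "The text does not say.\nScore: 10",
    "<chunk 2>": "Basic probability is assumed as a prerequisite.\nScore: 85",
    "<chunk 3>": "Possibly, via statistics.\nScore: 40",
}

def canned_llm(prompt):
    chunk = prompt.split("Context: ")[1].split("\n")[0]
    return CANNED[chunk]

print(map_rerank("Is probability a class topic?", list(CANNED), canned_llm))
```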

### RetrievalQA Limitations

A chat application gives a conversational experience because we can ask follow-up questions. The RetrievalQA chain, however, does not preserve conversational history.

In [0]:
mlflow.langchain.autolog(disable=True)

qa_chain = RetrievalQA.from_chain_type(
  llm=llm,
  retriever=vectordb.as_retriever(search_type="mmr", search_kwargs={"k": 3}),
)

In [0]:
question = "Is probability a class topic?"

result = qa_chain.invoke(question)
result["result"]

"The class assumes familiarity with basic probability, but it doesn't appear to be a main topic of the class. Instead, the instructor mentions that they will review some probability concepts, such as random variables, expectation, and variance, in the discussion sections as a refresher for those who need it."

In [0]:
question = "why are those prerequisites needed?"

result = qa_chain.invoke(question)
result["result"]

'The prerequisites are needed because the class will be covering topics that build upon those foundational concepts. \n\n* Big O notation, data structures, and programming skills are likely needed for implementing and analyzing algorithms.\n* Probability and statistics are needed for understanding and working with random variables, expectations, and variances, which are probably crucial concepts in the class.\n* Linear algebra is needed for working with matrices, vectors, and possibly eigenvectors, which are likely used in the class for solving problems or modeling systems.\n\nThe instructor assumes that students have a solid grasp of these concepts so that they can focus on the more advanced topics that the class will be covering, without having to spend too much time reviewing the basics.'

Note how the above answer is not tied to our first question about probability; it discusses a generic list of prerequisites instead. The QA chain has no concept of state, so it doesn't remember previous questions and answers. We can fix this with `LangChain Memory`.
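One common pattern for follow-up questions (the idea behind LangChain's `ConversationalRetrievalChain`, for example) is to keep a buffer of previous turns and ask the LLM to rewrite the follow-up into a standalone question before retrieval. A minimal plain-Python sketch of such a buffer and condensing prompt; `ChatHistory` and `condense_prompt` are hypothetical names, not LangChain APIs, and the prompt wording is an assumption.

```python
from dataclasses import dataclass, field

@dataclass
class ChatHistory:
    """Minimal hypothetical buffer of (question, answer) turns."""
    turns: list = field(default_factory=list)

    def add(self, question: str, answer: str) -> None:
        self.turns.append((question, answer))

    def as_text(self, max_turns: int = 5) -> str:
        # Keep only the most recent turns so the prompt does not grow without bound
        recent = self.turns[-max_turns:]
        return "\n".join(f"Human: {q}\nAI: {a}" for q, a in recent)

def condense_prompt(history: ChatHistory, follow_up: str) -> str:
    """Build a prompt asking an LLM to rewrite a follow-up as a standalone question."""
    return (
        "Given the conversation below, rephrase the follow-up question "
        "so it can be understood without the conversation.\n\n"
        f"{history.as_text()}\n\nFollow-up question: {follow_up}\nStandalone question:"
    )

history = ChatHistory()
history.add("Is probability a class topic?",
            "The class assumes familiarity with basic probability, but it doesn't appear to be a main topic.")
print(condense_prompt(history, "why are those prerequisites needed?"))
```

The LLM's standalone rewrite of the follow-up, which can now mention probability explicitly, is what would be sent to the retriever and the QA chain.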