# Question Answering

### Overview 

Recall the overall workflow for retrieval augmented generation (RAG):

![](images/lesson-2-overview.png)

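We have discussed `Document Loading` and `Splitting`, as well as `Storage` and `Retrieval`. This lesson covers the remaining step: passing the retrieved documents to an LLM so that it can answer a question.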

### Setup
Let's load our vectorDB. 

In [1]:
import os
import openai

openai.api_key = os.environ['OPENAI_API_KEY']

The code below selects the OpenAI model version that was used during filming (`gpt-3.5-turbo-0301`) until it is deprecated, currently scheduled for September 2023. 
LLM responses often vary from run to run, and they may differ significantly when a different model version is used.

In [2]:
import datetime
current_date = datetime.datetime.now().date()
if current_date < datetime.date(2023, 9, 2):
    llm_name = "gpt-3.5-turbo-0301"
else:
    llm_name = "gpt-3.5-turbo"
print(llm_name)

gpt-3.5-turbo


In [3]:
from langchain.vectorstores import Chroma
from langchain.embeddings.openai import OpenAIEmbeddings

persist_directory = 'docs/chroma/'
embedding = OpenAIEmbeddings()
vectordb = Chroma(persist_directory=persist_directory, embedding_function=embedding)

In [4]:
print(vectordb._collection.count())

223


In [5]:
question = "What are major disorders addressed in this corpus?"
docs = vectordb.similarity_search(question, k=3)  # retrieve the 3 most similar chunks
len(docs)

3

In [6]:
docs[0]

Document(page_content='Table 3. continued from previous page.\nFeatureFrequency\nCommentNearly allCommon\xa01InfrequentDisinhibition● Impulsivity, socially unacceptable behavior, risk takingApathy●Indifference,  lack of interest Delusions/hallucinations●Often  bizarre delusions, mostly visual hallucinations Psychosis● Psychosis, often  as initial symptom Anxiety● Generalized stress & apprehensionRepetitive, compulsive behavior●Often  complex, ritualistic behaviors mimicking OCD Preference for sweet food●↑ craving for sweet foodsMotor symptomsUpper MND● Weakness, spasticity, altered muscle toneLower MND● Weakness, fasciculations, atrophy\nBulbar involvementDysarthria● Motor language deficit Dysphagia● Problems swallowing food &/or liquidsParkinsonism●Extrapyramidal findings  such as resting tremor, rigidity, \nakinesiaMND = motor neuron disease; OCD = obsessive compulsive disorder1. Features are ranked as common if present in >33%, if frequency was mentioned.\nInitial manifestations may

In [7]:
from langchain.chat_models import ChatOpenAI

# temperature=0 reduces randomness so that answers are easier to reproduce
llm = ChatOpenAI(model_name=llm_name, temperature=0)

### RetrievalQA chain

In [8]:
from langchain.chains import RetrievalQA

In [9]:
qa_chain = RetrievalQA.from_chain_type(
    llm,
    retriever=vectordb.as_retriever()
)

In [10]:
question = "What are major disorders addressed in this corpus?"

result = qa_chain({"query": question})

In [11]:
print(result['result'])

The major disorders addressed in this corpus are:

1. Frontotemporal Dementia (FTD)
2. Cleidocranial Dysplasia (CCD) Spectrum Disorder


### Prompt

In [12]:
from langchain.prompts import PromptTemplate

# Build prompt
template = """Use the following pieces of context to answer the question at the end. If you don't know the answer, just say that you don't know, don't try to make up an answer. Use three sentences maximum. Keep the answer as concise as possible. Always say "thanks for asking!" at the end of the answer. 
{context}
Question: {question}
Helpful Answer:"""
QA_CHAIN_PROMPT = PromptTemplate.from_template(template)
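
To make the two placeholders concrete, here is a minimal sketch of how a template like the one above might be filled before it reaches the model. It uses plain `str.format` as a stand-in for `PromptTemplate`, and the `fill_prompt` helper and sample chunks are hypothetical; in the chain, `{context}` is populated from the retrieved documents.

```python
from typing import List

# Shortened stand-in for the template above.
template = """Use the following pieces of context to answer the question at the end.
{context}
Question: {question}
Helpful Answer:"""


def fill_prompt(template: str, chunks: List[str], question: str) -> str:
    """Join retrieved chunks and substitute both placeholders (hypothetical helper)."""
    context = "\n\n".join(chunks)
    return template.format(context=context, question=question)


# Hypothetical retrieved chunks; real ones would come from the retriever.
chunks = ["Chunk A about FTD.", "Chunk B about ALS."]
prompt = fill_prompt(template, chunks, "Is ALS a disease that is discussed?")
print(prompt)
```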


In [13]:
# Run chain
qa_chain = RetrievalQA.from_chain_type(
    llm,
    retriever=vectordb.as_retriever(),
    return_source_documents=True,
    chain_type_kwargs={"prompt": QA_CHAIN_PROMPT}
)

In [14]:
result = qa_chain({"query": "Is cancer a disease that is discussed?"})
print(result['result'])

Based on the given context, there is no mention of cancer being discussed in relation to genetic risk, prenatal testing, or preimplantation genetic testing. Therefore, it cannot be determined whether cancer is a disease that is discussed. Thanks for asking!


In [15]:
result = qa_chain({"query": "Is ALS a disease that is discussed?"})
print(result['result'])

Yes, ALS is a disease that is discussed in the given context. Thanks for asking!
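
Because the chain above was built with `return_source_documents=True`, the result should also carry the retrieved chunks under a `source_documents` key. The sketch below shows one way to list them. `FakeDoc`, `describe_sources`, and the sample values are hypothetical stand-ins so that the snippet runs without an API key, and the `source` metadata key is an assumption about how the documents were loaded.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class FakeDoc:
    """Hypothetical stand-in for a LangChain Document."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def describe_sources(result: dict, width: int = 60) -> List[str]:
    """Return one short line per source document in a RetrievalQA-style result."""
    lines = []
    for i, doc in enumerate(result.get("source_documents", []), start=1):
        source = doc.metadata.get("source", "unknown")  # assumed metadata key
        snippet = doc.page_content[:width].replace("\n", " ")
        lines.append(f"[{i}] {source}: {snippet}")
    return lines


# Hypothetical result; with the real chain, pass the `result` returned above.
fake_result = {
    "result": "placeholder answer",
    "source_documents": [
        FakeDoc("ALS is characterized by\nmotor neuron loss.", {"source": "doc_a.pdf"}),
        FakeDoc("FTD affects behavior and language."),
    ],
}
for line in describe_sources(fake_result):
    print(line)
```

Checking the sources this way helps confirm whether an answer is grounded in the retrieved text.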


### RetrievalQA chain types

So far, the QA chain has used the "stuff" chain type, which is the default: all retrieved documents are placed into a single prompt. Other chain types are available, such as "map_reduce" and "refine".
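
As a rough illustration of what "stuff" means, the sketch below places every retrieved chunk into one prompt and makes a single model call. This is a simplified stand-in, not LangChain's implementation: `stuff_answer` and `echo_llm` are hypothetical names, and the model is a plain callable so that the snippet runs offline.

```python
from typing import Callable, List


def stuff_answer(llm: Callable[[str], str], chunks: List[str], question: str) -> str:
    """Put all chunks into one prompt and call the model once."""
    context = "\n\n".join(chunks)
    prompt = f"{context}\nQuestion: {question}\nHelpful Answer:"
    return llm(prompt)


calls = []  # records every prompt the fake model receives


def echo_llm(prompt: str) -> str:
    """Hypothetical model: logs the prompt and returns a canned reply."""
    calls.append(prompt)
    return "canned answer"


chunks = ["chunk one", "chunk two", "chunk three"]
answer = stuff_answer(echo_llm, chunks, "Is dementia a disease that is discussed?")
print(len(calls))  # number of model calls made
```

A single call is cheap, but all chunks must fit within the model's context window, which is the usual motivation for the other chain types.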

If you wish to experiment on the `LangChain plus platform`:

 * Go to the [LangChain plus platform](https://www.langchain.plus/) and sign up
 * Create an API key from your account's settings
 * Use this API key in the code below
 * Uncomment the code

Note: the endpoint in the video differs from the one below. Use the one below.

In [16]:
#import os
#os.environ["LANGCHAIN_TRACING_V2"] = "true"
#os.environ["LANGCHAIN_ENDPOINT"] = "https://api.langchain.plus"
#os.environ["LANGCHAIN_API_KEY"] = "..." # replace dots with your api key

In [17]:
qa_chain_map_reduce = RetrievalQA.from_chain_type(
    llm,
    retriever=vectordb.as_retriever(),
    chain_type="map_reduce"
)

In [18]:
question = "Is dementia a disease that is discussed?"
result = qa_chain_map_reduce({"query": question})
result["result"]

'Yes, dementia is discussed in the text.'
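
The "map_reduce" idea can be sketched in an offline style: each chunk is sent to the model on its own (map), and the per-chunk answers are then combined by one more call (reduce). This is a simplified illustration rather than LangChain's implementation; `map_reduce_answer`, `fake_llm`, and the prompt wording are hypothetical.

```python
from typing import Callable, List


def map_reduce_answer(llm: Callable[[str], str], chunks: List[str], question: str) -> str:
    """Answer per chunk (map), then merge the partial answers (reduce)."""
    partials = [
        llm(f"{chunk}\nQuestion: {question}\nRelevant answer:") for chunk in chunks
    ]
    summaries = "\n".join(partials)
    return llm(f"{summaries}\nQuestion: {question}\nFinal answer:")


prompts = []  # every prompt the fake model sees, in order


def fake_llm(prompt: str) -> str:
    """Hypothetical model: logs the prompt and returns a numbered reply."""
    prompts.append(prompt)
    return f"reply {len(prompts)}"


chunks = ["chunk one", "chunk two", "chunk three"]
final = map_reduce_answer(fake_llm, chunks, "Is dementia a disease that is discussed?")
print(len(prompts))  # one call per chunk plus one combining call
```

This approach can handle more chunks than fit into one prompt, at the cost of more model calls, and each chunk is read in isolation during the map step.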

In [19]:
qa_chain_refine = RetrievalQA.from_chain_type(
    llm,
    retriever=vectordb.as_retriever(),
    chain_type="refine"
)
result = qa_chain_refine({"query": question})
result["result"]

"Dementia is a disease that receives significant attention and discussion in various fields, including the medical, scientific, and caregiving communities. It refers to a group of symptoms that affect cognitive abilities, such as memory, thinking, and reasoning, to the extent that it interferes with daily functioning. Dementia can be caused by different underlying conditions, including Alzheimer's disease, Parkinson's disease, Huntington's disease, and dementia with Lewy bodies.\n\nThe impact of dementia is profound, not only on individuals affected by the disease but also on their families and caregivers. As a progressive condition, dementia worsens over time, leading to a decline in cognitive function and often requiring long-term care.\n\nExtensive research and discussions surrounding dementia focus on understanding its causes, developing effective treatments, and improving care for affected individuals. Scientists and healthcare professionals explore various aspects of dementia, in

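The "refine" strategy is sequential: the model answers from the first chunk, then revises that answer once for each remaining chunk. Below is a minimal offline sketch under stated assumptions: `refine_answer` and `fake_llm` are hypothetical, the prompt wording is invented for illustration, and none of this is LangChain's actual implementation.

```python
from typing import Callable, List


def refine_answer(llm: Callable[[str], str], chunks: List[str], question: str) -> str:
    """Build an answer from the first chunk, then update it chunk by chunk."""
    if not chunks:
        raise ValueError("at least one chunk is required")
    answer = llm(f"{chunks[0]}\nQuestion: {question}\nAnswer:")
    for chunk in chunks[1:]:
        # Each later call sees the previous answer plus one new chunk.
        answer = llm(
            f"Existing answer: {answer}\n"
            f"New context: {chunk}\n"
            f"Question: {question}\nRefined answer:"
        )
    return answer


prompts = []  # every prompt the fake model sees, in order


def fake_llm(prompt: str) -> str:
    """Hypothetical model: logs the prompt and returns a numbered draft."""
    prompts.append(prompt)
    return f"draft {len(prompts)}"


chunks = ["chunk one", "chunk two", "chunk three"]
final = refine_answer(fake_llm, chunks, "Is dementia a disease that is discussed?")
print(len(prompts))  # one call per chunk, made in order
```

Unlike the map step of "map_reduce", these calls cannot run in parallel, because each one depends on the previous answer.
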
### RetrievalQA limitations
 
The `RetrievalQA` chain does not preserve conversational history: each query is answered independently of the ones before it.

In [20]:
qa_chain = RetrievalQA.from_chain_type(
    llm,
    retriever=vectordb.as_retriever()
)

In [21]:
question = "Is cancer a disease that is discussed?"
result = qa_chain({"query": question})
result["result"]

'The provided context does not specifically mention cancer. Therefore, it is unclear whether cancer is being discussed in this particular context.'

In [22]:
question = "What is the age of onset (typical, min, max)?"
result = qa_chain({"query": question})
result["result"]

'The typical age of onset for C9orf72 frontotemporal dementia and/or amyotrophic lateral sclerosis (C9orf72-FTD/ALS) is usually between 50-64 years. However, the age of onset can range from 20 to 91 years.'

The point is simply that the model does not have access to past questions or answers. This will be covered in the next section.
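
To see why a follow-up question would lose its context, consider this minimal sketch of a stateless QA call. The `answer` function is a hypothetical stand-in for `qa_chain`, not its implementation; the point is only that each call receives the current question and nothing else.

```python
def answer(question: str) -> dict:
    """Hypothetical stateless QA call: the only input is the current question."""
    # No history argument and no stored state, so earlier turns are invisible here.
    return {"query": question, "result": f"(answer based only on: {question!r})"}


first = answer("Is cancer a disease that is discussed?")
second = answer("What is the age of onset (typical, min, max)?")

# The second call carries no trace of the first question.
print("cancer" in second["result"])
```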