# **LangChain:** Question Answering

## Overview

Recall the overall workflow for retrieval augmented generation (RAG):

![overview.jpeg](Images/RAG.jpg)

We discussed `Document Loading` and `Splitting` as well as `Storage` and `Retrieval`.
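
To recall what the `Retrieval` step does, here is a toy sketch: it ranks stored embedding vectors by cosine similarity to a query vector and keeps the top `k`, which is the idea behind `similarity_search(question, k=3)` below. The vectors are made-up 3-dimensional stand-ins for real embeddings. Cosine similarity is chosen here for illustration and is not necessarily the distance metric Chroma uses by default.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec, doc_vecs, k=3):
    """Return the indices of the k stored vectors most similar to the query, best first."""
    scored = [(cosine_similarity(query_vec, v), i) for i, v in enumerate(doc_vecs)]
    scored.sort(reverse=True)
    return [i for _, i in scored[:k]]


# Hypothetical "embeddings" standing in for the real ones stored in the vector DB.
store = [
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
print(top_k([1.0, 0.05, 0.0], store, k=2))  # [0, 1]
```

A real vector store uses an index to avoid scoring every vector, but the ranking idea is the same.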

Let's load our vectorDB. 

In [1]:
import os
import openai
import sys
sys.path.append('../..')

from dotenv import load_dotenv, find_dotenv
_ = load_dotenv(find_dotenv()) # read local .env file

openai.api_key  = os.environ['OPENAI_API_KEY']

The code below pins the OpenAI model version used in the video until that version is deprecated (currently scheduled for Sept 2023).
LLM responses often vary between runs, and they may differ significantly when a different model version is used.
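
The same date check can be wrapped in a function that takes the date as a parameter, which makes the pinning logic easy to test without waiting for the cutoff. `choose_model` is a hypothetical helper, not part of the OpenAI or LangChain libraries; the model names and cutoff date are copied from the cell below.

```python
import datetime

# Values copied from the notebook cell below.
PINNED_MODEL = "gpt-3.5-turbo-0301"
DEFAULT_MODEL = "gpt-3.5-turbo"
CUTOFF = datetime.date(2023, 9, 2)


def choose_model(today=None):
    """Hypothetical helper: return the pinned model before CUTOFF, else the default alias."""
    if today is None:
        today = datetime.date.today()
    return PINNED_MODEL if today < CUTOFF else DEFAULT_MODEL


print(choose_model(datetime.date(2023, 8, 1)))  # gpt-3.5-turbo-0301
print(choose_model(datetime.date(2024, 1, 1)))  # gpt-3.5-turbo
```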

In [2]:
import datetime
current_date = datetime.datetime.now().date()
if current_date < datetime.date(2023, 9, 2):
    llm_name = "gpt-3.5-turbo-0301"
else:
    llm_name = "gpt-3.5-turbo"
print(llm_name)

gpt-3.5-turbo


In [3]:
from langchain.vectorstores import Chroma
from langchain.embeddings.openai import OpenAIEmbeddings
persist_directory = '/home/centrox_ai/Desktop/ABDULLAH/langchain/LangChain-Chat-with-your-Data/chroma/'
embedding = OpenAIEmbeddings()
vectordb = Chroma(persist_directory=persist_directory, embedding_function=embedding)



In [4]:
print(vectordb._collection.count())

2225


In [11]:
question = "who was the successor of Aurelian after he was murdered?"
docs = vectordb.similarity_search(question,k=3)
len(docs)

3

In [12]:
from langchain.chat_models import ChatOpenAI
llm = ChatOpenAI(model_name=llm_name, temperature=0)

### RetrievalQA chain

In [13]:
from langchain.chains import RetrievalQA

In [14]:
qa_chain = RetrievalQA.from_chain_type(
    llm,
    retriever=vectordb.as_retriever()
)

In [15]:
result = qa_chain({"query": question})

In [16]:
result["result"]

'After Aurelian was murdered, he was succeeded by Tacitus, who ruled for less than two years before meeting a similar fate. Following Tacitus, Marcus Aurelius Probus, an able Illyrian officer, became the emperor.'
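
By default, `RetrievalQA.from_chain_type` uses the `"stuff"` chain type: all retrieved documents are concatenated into a single prompt and the LLM is called once. The plain-Python sketch below mimics that idea; `stuff_prompt`, `stuff_qa`, and `fake_llm` are hypothetical names, and the prompt wording is made up rather than LangChain's actual prompt.

```python
def stuff_prompt(question, docs, separator="\n\n"):
    """Build one prompt that contains every retrieved document ("stuff" strategy)."""
    context = separator.join(docs)
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"


def fake_llm(prompt):
    """Hypothetical stand-in for an LLM call: reports how much text it received."""
    return f"(answer based on {len(prompt)} characters of prompt)"


def stuff_qa(question, docs, llm=fake_llm):
    # Exactly one LLM call, with all documents in the same context window.
    return llm(stuff_prompt(question, docs))


docs = ["first retrieved chunk", "second retrieved chunk"]
print(stuff_prompt("Who succeeded Aurelian?", docs))
print(stuff_qa("Who succeeded Aurelian?", docs))
```

The trade-off: "stuff" is simple and needs only one call, but every retrieved document has to fit into the model's context window at once.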

### Prompt

In [17]:
from langchain.prompts import PromptTemplate

# Build prompt
template = """Use the following pieces of context to answer the question at the end. If you don't know the answer, just say that you don't know, don't try to make up an answer. Use three sentences maximum. Keep the answer as concise as possible. Always say "thanks for asking!" at the end of the answer. 
{context}
Question: {question}
Helpful Answer:"""
QA_CHAIN_PROMPT = PromptTemplate.from_template(template)
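
`PromptTemplate.from_template` infers the input variables (`context` and `question`) from the curly-brace placeholders in the string. As a rough analogy only (not LangChain's internals), the standard library can list and fill the same placeholders with `string.Formatter` and `str.format`. The template here is a shortened copy of the one above with the same placeholders.

```python
from string import Formatter

# Shortened copy of the template above; the placeholders are unchanged.
template = """Use the following pieces of context to answer the question at the end.
Always say "thanks for asking!" at the end of the answer.
{context}
Question: {question}
Helpful Answer:"""


def placeholders(tmpl):
    """Return the named fields of a str.format-style template, in order of appearance."""
    return [field for _, field, _, _ in Formatter().parse(tmpl) if field]


print(placeholders(template))  # ['context', 'question']

filled = template.format(context="<retrieved text>", question="Who succeeded Aurelian?")
print(filled)
```

In the chain, `{context}` receives the retrieved documents and `{question}` the user's query.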


In [18]:
# Run chain
qa_chain = RetrievalQA.from_chain_type(
    llm,
    retriever=vectordb.as_retriever(),
    return_source_documents=True,
    chain_type_kwargs={"prompt": QA_CHAIN_PROMPT}
)

In [28]:
question = "who was the successor of Aurelian after he was murdered?"

In [29]:
result = qa_chain({"query": question})

In [30]:
result["result"]

'After Aurelian was murdered, he was succeeded by Tacitus, who also met a similar fate after ruling for less than two years. Thanks for asking!'

In [22]:
result["source_documents"][0]

Document(page_content='Instructor (Andrew Ng) :Yeah, yeah. I mean, you’re asking about overfitting, whether \nthis is a good model. I thi nk let’s – the thing’s you’re mentioning are maybe deeper \nquestions about learning algorithms  that we’ll just come back to later, so don’t really \nwant to get into that right now. Any more questions? Okay.  \nSo this endows linear regression with a proba bilistic interpretati on. I’m actually going to \nuse this probabil – use this, sort of, probabilist ic interpretation in order to derive our next \nlearning algorithm, which will be our first classification algorithm. Okay? So you’ll recall \nthat I said that regression problems are where the variable Y that you’re trying to predict \nis continuous values. Now I’m actually gonna ta lk about our first cl assification problem, \nwhere the value Y you’re trying to predict will be discreet value. You can take on only a \nsmall number of discrete values and in th is case I’ll talk about binding class

### RetrievalQA chain types

In [31]:
qa_chain_mr = RetrievalQA.from_chain_type(
    llm,
    retriever=vectordb.as_retriever(),
    chain_type="map_reduce"
)

In [32]:
result = qa_chain_mr({"query": question})

In [33]:
result["result"]

'The successor of Aurelian after he was murdered was Marcus Claudius Tacitus.'
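
With `chain_type="map_reduce"`, each retrieved document is first sent to the LLM on its own (the map step), and the per-document answers are then combined in a final call (the reduce step). This costs more LLM calls, and the combine step sees only the intermediate answers, not the original text, so details can be lost. The toy sketch below shows that control flow with a hypothetical `recording_llm` stand-in; the prompts are made up and LangChain's actual implementation differs.

```python
def map_reduce_qa(question, docs, llm):
    """Answer per document (map), then combine the partial answers (reduce)."""
    # Map: one LLM call per document, each seeing only that document.
    partial_answers = [
        llm(f"Context:\n{doc}\n\nQuestion: {question}\nRelevant answer:")
        for doc in docs
    ]
    # Reduce: one more call that sees only the partial answers.
    combined = "\n".join(partial_answers)
    return llm(f"Partial answers:\n{combined}\n\nQuestion: {question}\nFinal answer:")


calls = []


def recording_llm(prompt):
    """Hypothetical LLM stand-in that records each prompt and returns a numbered answer."""
    calls.append(prompt)
    return f"answer #{len(calls)}"


result = map_reduce_qa("Who succeeded Aurelian?", ["chunk 1", "chunk 2", "chunk 3"], recording_llm)
print(result)      # answer #4
print(len(calls))  # 4
```

Three documents lead to three map calls plus one reduce call.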

If you wish to experiment on the `LangChain plus platform`:

 * Go to [langchain plus platform](https://www.langchain.plus/) and sign up.
 * Create an API key from your account's settings.
 * Use this API key in the code below.
 * Uncomment the code.

Note: the endpoint in the video differs from the one below. Use the one below.

In [None]:
#import os
#os.environ["LANGCHAIN_TRACING_V2"] = "true"
#os.environ["LANGCHAIN_ENDPOINT"] = "https://api.langchain.plus"
#os.environ["LANGCHAIN_API_KEY"] = "..." # replace dots with your api key

In [34]:
qa_chain_mr = RetrievalQA.from_chain_type(
    llm,
    retriever=vectordb.as_retriever(),
    chain_type="map_reduce"
)
result = qa_chain_mr({"query": question})
result["result"]

'The successor of Aurelian after he was murdered was Marcus Claudius Tacitus.'

In [35]:
# "refine" chain: the variable name reflects the chain type
qa_chain_refine = RetrievalQA.from_chain_type(
    llm,
    retriever=vectordb.as_retriever(),
    chain_type="refine"
)
result = qa_chain_refine({"query": question})
result["result"]

'After Aurelian was murdered in 275 A.D., he was succeeded by Tacitus, who ruled for less than two years before being succeeded by Marcus Aurelius Probus. Probus, an able Illyrian officer, continued Aurelian\'s work of restoring unity to the empire. Prior to Aurelian\'s reign, the empire had faced challenges such as the breakaway Gallic Empire under the rule of Tetricus, who eventually surrendered to Aurelian at the Battle of Chalons in 274 A.D. This victory allowed Aurelian to restore unity to the empire and earn the title of "Restorer of the World" (restitutor orbis).'
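
With `chain_type="refine"`, documents are processed one at a time: the first produces an initial answer, and each later call receives the current answer plus the next document and may revise it. That sequential build-up may help explain why the refine output above is longer and pulls in more detail. A toy sketch of the loop, with a hypothetical `tracing_llm` stand-in and made-up prompts (not LangChain's implementation):

```python
def refine_qa(question, docs, llm):
    """Build an initial answer from the first document, then refine it with each later one."""
    if not docs:
        raise ValueError("refine needs at least one document")
    answer = llm(f"Context:\n{docs[0]}\n\nQuestion: {question}\nAnswer:")
    for doc in docs[1:]:
        # Each step sees the current answer and one new document, never all documents at once.
        answer = llm(
            f"Existing answer: {answer}\n"
            f"New context:\n{doc}\n"
            f"Refine the existing answer to the question '{question}' if the new context helps."
        )
    return answer


trace = []


def tracing_llm(prompt):
    """Hypothetical LLM stand-in: records each prompt and returns a version label."""
    trace.append(prompt)
    return f"v{len(trace)}"


final = refine_qa("Who succeeded Aurelian?", ["chunk 1", "chunk 2", "chunk 3"], tracing_llm)
print(final)                     # v3
print(trace[1].splitlines()[0])  # Existing answer: v1
```

Because the calls depend on each other, refine cannot run them in parallel the way the map step of map_reduce can.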

### RetrievalQA limitations
 
`RetrievalQA` does not preserve conversational history.

In [36]:
qa_chain = RetrievalQA.from_chain_type(
    llm,
    retriever=vectordb.as_retriever()
)

In [37]:
question = "who was the successor of Aurelian after he was murdered?"
result = qa_chain({"query": question})
result["result"]

'After Aurelian was murdered, he was succeeded by Tacitus, who also met a similar fate after ruling for less than two years.'

In [38]:
question = "when was Aurelian murdered?"
result = qa_chain({"query": question})
result["result"]

'Aurelian was murdered in 275 A.D.'

Note: the LLM response varies. A follow-up may still get a sensible answer when the needed information can be gleaned from the retrieved documents; here, for example, the second question names Aurelian explicitly. The point is simply that the model does not have access to past questions or answers. This will be covered in the next section.
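
One way around this limitation, sketched here under assumptions, is to keep previous turns and include them with each new question, so that a follow-up such as "when was he murdered?" has something to resolve "he" against. `ConversationBuffer` is a hypothetical plain-Python helper, not a LangChain class; the prompt format is made up.

```python
class ConversationBuffer:
    """Hypothetical helper: stores past (question, answer) turns and formats them into a prompt."""

    def __init__(self, max_turns=5):
        self.max_turns = max_turns
        self.turns = []

    def add(self, question, answer):
        self.turns.append((question, answer))
        # Keep only the most recent turns so the prompt does not grow without bound.
        self.turns = self.turns[-self.max_turns:]

    def build_prompt(self, new_question):
        history = "\n".join(f"Human: {q}\nAI: {a}" for q, a in self.turns)
        return f"Chat history:\n{history}\n\nFollow-up question: {new_question}"


buffer = ConversationBuffer(max_turns=2)
buffer.add("who was the successor of Aurelian after he was murdered?", "Tacitus.")
print(buffer.build_prompt("when was he murdered?"))
```

Capping the number of stored turns is a simple way to keep the history within the model's context window.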