# Document Question Answering Chatbot

In [1]:
from langchain.vectorstores import Chroma
from langchain.embeddings import OpenAIEmbeddings
from langchain.chat_models import ChatOpenAI
from langchain.chains import RetrievalQA, ConversationalRetrievalChain
from langchain.chains.question_answering import load_qa_chain
from langchain.prompts import PromptTemplate

In [2]:
import os
import textwrap

import openai
from dotenv import load_dotenv, find_dotenv

# Load OPENAI_API_KEY from a local .env file into the environment.
load_dotenv(find_dotenv())

openai.api_key = os.environ['OPENAI_API_KEY']

In [3]:
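def ask_question_with_context(qa, question, chat_history):
    """Ask the chain a question, print the wrapped answer, and return the new history."""
    result = qa({"question": question, "chat_history": chat_history})
    print(f"\n{textwrap.fill(result['answer'])}\n")
    # Note: this replaces the history, so only the latest exchange is carried forward.
    chat_history = [(question, result["answer"])]
    return chat_history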
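
The helper above carries only the most recent exchange into the next call. If longer follow-up chains matter, one alternative is to accumulate turns and cap the window. The sketch below is an assumption, not part of the original notebook: `ask_and_accumulate`, `max_turns`, and `fake_qa` are hypothetical names, and `fake_qa` stands in for the real chain so the example runs without an API key.

```python
import textwrap


def ask_and_accumulate(qa, question, chat_history, max_turns=5):
    """Ask a question and return the history extended with the new turn.

    `qa` is any callable that accepts {"question", "chat_history"} and
    returns a dict with an "answer" key. `max_turns` is a hypothetical
    cap chosen for this sketch, not a LangChain setting.
    """
    result = qa({"question": question, "chat_history": chat_history})
    print(f"\n{textwrap.fill(result['answer'])}\n")
    # Build a new list rather than mutating the caller's list.
    updated = chat_history + [(question, result["answer"])]
    # Keep only the most recent turns so the condensed prompt stays bounded.
    return updated[-max_turns:]


def fake_qa(inputs):
    """Stand-in for the real chain: echoes the question back."""
    return {"answer": f"echo: {inputs['question']}"}


history = []
for q in ["first", "second", "third"]:
    history = ask_and_accumulate(fake_qa, q, history, max_turns=2)
```

A larger window gives the question-rephrasing step more context at the cost of a longer prompt on every turn.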

## Initialize ChromaDB

In [4]:
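embeddings = OpenAIEmbeddings()
persist_directory = 'chroma/'
# Open the collection previously persisted in `persist_directory`.
vectordb = Chroma(persist_directory=persist_directory, embedding_function=embeddings)
# Sanity check: number of embedded chunks in the collection.
print(vectordb._collection.count())
vectordb.persist()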

199


## Create the chain

Initialize the LLM, the retriever, and the conversational retrieval chain we will use for question answering.

In [5]:
llm_model_name = "gpt-3.5-turbo"
#llm_model_name = "gpt-4"
llm = ChatOpenAI(model_name=llm_model_name, temperature=0)

retriever = vectordb.as_retriever(search_type="similarity", search_kwargs={"k":5})
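
To make `search_type="similarity"` with `k=5` more concrete, here is a minimal brute-force sketch of top-k retrieval over embedding vectors. It is an illustration under assumptions: Chroma uses its own index and configured distance metric, which may differ from the cosine similarity used here, and `cosine_similarity` and `top_k` are hypothetical helper names. The toy vectors are made up.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, doc_vecs, k=5):
    """Return indices of the k document vectors most similar to the query."""
    scored = [(cosine_similarity(query_vec, v), i) for i, v in enumerate(doc_vecs)]
    # Highest similarity first.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [i for _, i in scored[:k]]


# Toy 2-dimensional "embeddings"; real embeddings have many more dimensions.
docs = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7], [-1.0, 0.0]]
nearest = top_k([1.0, 0.1], docs, k=2)
```

The retrieved chunks are what the chain passes to the LLM as context, so `k` trades answer coverage against prompt length.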

In [6]:
QUESTION_PROMPT = PromptTemplate.from_template("""Given the following conversation and a follow up question, rephrase the follow up question to be a standalone question.

    Chat History:
    {chat_history}
    Follow Up Input: {question}
    Standalone question:""")
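
To see what the LLM receives when the follow-up question is condensed, the sketch below fills the same template with plain `str.format`. It assumes the history is rendered as alternating `Human:` / `Assistant:` lines; the chain's actual formatting may differ, and `format_history` and `build_condense_prompt` are hypothetical helpers. The sample turn is made up.

```python
TEMPLATE = (
    "Given the following conversation and a follow up question, "
    "rephrase the follow up question to be a standalone question.\n\n"
    "Chat History:\n{chat_history}\n"
    "Follow Up Input: {question}\n"
    "Standalone question:"
)


def format_history(chat_history):
    """Render (question, answer) tuples as alternating speaker lines."""
    lines = []
    for human, ai in chat_history:
        lines.append(f"Human: {human}")
        lines.append(f"Assistant: {ai}")
    return "\n".join(lines)


def build_condense_prompt(question, chat_history):
    """Fill the template with the rendered history and the follow-up question."""
    return TEMPLATE.format(
        chat_history=format_history(chat_history),
        question=question,
    )


prompt_text = build_condense_prompt(
    "Who wrote that paper?",
    [("Can you summarize the paper?", "The paper discusses neural question generation.")],
)
print(prompt_text)
```

Printing the filled prompt this way can help explain why a vague follow-up such as "Why?" only resolves correctly when the relevant earlier turn is still in the history.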

In [7]:
qa = ConversationalRetrievalChain.from_llm(
    llm=llm,
    retriever=retriever,
    condense_question_prompt=QUESTION_PROMPT,
    return_source_documents=True,
    verbose=False)
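
Because `return_source_documents=True` is set, each result should also carry the retrieved chunks under `"source_documents"`, which is useful for checking an answer against its evidence. The sketch below is an assumption-laden illustration: `FakeDocument` is a stand-in exposing `page_content` and `metadata`, `summarize_sources` is a hypothetical helper, the `"source"` metadata key depends on how the documents were loaded, and `paper.pdf` is a placeholder name.

```python
from dataclasses import dataclass, field


@dataclass
class FakeDocument:
    """Stand-in for a retrieved document with text and metadata."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def summarize_sources(result, width=60):
    """Return one short line per source document in a chain result."""
    lines = []
    for i, doc in enumerate(result.get("source_documents", []), start=1):
        # The "source" key is an assumption; loaders may use other keys.
        source = doc.metadata.get("source", "unknown")
        # Collapse whitespace and truncate to keep the line readable.
        snippet = " ".join(doc.page_content.split())[:width]
        lines.append(f"[{i}] {source}: {snippet}")
    return lines


example_result = {
    "answer": "placeholder answer",
    "source_documents": [
        FakeDocument("Some   chunk\nof text.", {"source": "paper.pdf"}),
        FakeDocument("Another chunk."),
    ],
}
source_lines = summarize_sources(example_result)
```

Listing the sources next to an answer makes it easier to spot cases where the retrieved chunks come from a reference list rather than from the paper's own front matter.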

In [8]:
chat_history = []

## Ask questions!

Now we can use the chain to ask questions. Enter `q` to quit.

In [9]:
while True:
    query = input('you: ')
    if query == 'q':
        break
    chat_history = ask_question_with_context(qa, query, chat_history)

you:  What is the question generation task in natural language processing?



The question generation task in natural language processing involves
training a machine to produce questions based on a given context and
answer. The goal is to generate fluent and high-quality questions that
are relevant to the context and can be answered correctly.



you:  Can you summarize the paper on that topic included in our data set?



The paper discusses recent advances in neural question generation,
which is a task in natural language processing. The authors explore
different models trained on various datasets, such as SQuAD, Natural
Questions (NQ), Question Answering in Context (QuAC), and TriviaQA.
They evaluate the models using metrics that account for semantic
similarity and lexical similarity to measure the quality of the
generated questions. The paper highlights the success of their models
in producing fluent and high-quality questions.



you:  Who wrote that paper?



The authors of the paper included in the dataset are Liangming Pan,
Wenqiang Lei, Tat-Seng Chua, and Min-Yen Kan.



you:  Are you sure about that?  I thought those people wrote a paper referenced in the paper you summarized.



Based on the provided context, there is no information about whether
the people mentioned in the context wrote a paper referenced in the
summarized paper.



you:  I thought Richard Robbins was an author of the paper we are talking about.



No, Richard Robbins is not mentioned as an author in the context
provided.



you:  Was the paper written for a class project?



Based on the provided context, there is no information to suggest that
the paper was written for a class project.



you:  What does the paper say about semantic evaluation metrics?



The paper states that metrics that account for semantic similarity
produce scores that are more reflective of successful question
generation than those based on lexical similarity. It also mentions
that semantic metrics are more robust indicators of success at
question generation than lexical metrics. Additionally, the paper
suggests that semantic metrics can be used together with lexical
metrics for a more comprehensive evaluation of question quality.



you:  What training data did they use?



The paper used four source datasets for training the models: SQuAD,
Natural Questions (NQ), Question Answering in Context (QuAC), and
TriviaQA. Additionally, they created a blended dataset by combining
the four source datasets and randomly shuffling the data. The models
were trained on this shuffled blended dataset.



you:  Which model performed the best?



Based on the information provided, the model trained on the shuffled
blended dataset had the highest performance.



you:  Did the models work well with the QuAC data?



The models struggled with the QuAC dataset in general, and the results
were low compared to the best performance on other datasets.



you:  Why?



The reason for the models struggling with the QuAC dataset and
achieving low results compared to other datasets is not explicitly
mentioned in the given context.



you:  q
