## Credit

Notes taken from the NLPlanet Practical NLP with Python course, section 2.17: Question Answering
* https://www.nlplanet.org/course-practical-nlp/02-practical-nlp-first-tasks/17-question-answering

Authored by Fabio Chiusano
* https://medium.com/@chiusanofabio94

**All text in single quotes ('...') is quoted from the NLPlanet course.**

## Question Answering

<u>Question Answering (QA) Models:</u>
* Models capable of retrieving answers to a question from a given text.
* Used for searching for specific information in large documents.
* Answers can be directly extracted or generated by the model.

<u>Question Answering Variants:</u>
* <u>How answers are created:</u>
    * Extractive QA:
        * 'The model extracts the answer from a context and provides it directly to the user. It is usually solved with BERT-like models.'
    * Generative QA:
        * 'The model generates free text directly based on the context. It leverages Text Generation models.'
* <u>Where answers are taken from:</u>
    * Open QA:
        * 'The answer is taken from a context.'
    * Closed QA:
        * 'No context is provided and the answer is completely generated by a model.'

<u>QA Datasets (Academic Benchmarks for Extractive QA):</u>
* SQuAD:
    * (The Stanford Question Answering Dataset)
    * 'A reading comprehension dataset, consisting of questions posed by crowd-workers on a set of Wikipedia articles, where the answer to every question is a segment of text from the corresponding reading passage.'
    * 'Contains 100,000+ question-answer pairs on 500+ articles.'
* SQuAD v2:
    * A harder benchmark than SQuAD that also includes unanswerable questions.
    * Combines the 100,000+ questions from SQuAD with 50,000+ unanswerable questions written by crowd-workers to appear similar to answerable questions.
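
To make the difference between the two benchmarks concrete, here is a minimal sketch of an answerable and an unanswerable record. The field layout (`question`, `context`, `answers` with `text` and `answer_start`) is assumed to follow the Hugging Face `squad` / `squad_v2` dataset cards; the records themselves are made up for illustration and are not taken from SQuAD, and `is_answerable` and `answers_are_spans` are hypothetical helpers.

```python
# Illustrative records only: assumed field layout, invented content.
context = (
    "Snow begins to form at temperatures at or below "
    "0 degrees Celsius or 32 degrees Fahrenheit."
)

# SQuAD-style record: the answer is a span of the context.
answerable = {
    "question": "How cold is snow?",
    "context": context,
    "answers": {"text": ["0 degrees Celsius"], "answer_start": [48]},
}

# SQuAD v2-style unanswerable record: the answer lists are empty.
unanswerable = {
    "question": "How deep is the snow in Oslo?",
    "context": context,
    "answers": {"text": [], "answer_start": []},
}


def is_answerable(record):
    """A record is answerable if it carries at least one answer text."""
    return len(record["answers"]["text"]) > 0


def answers_are_spans(record):
    """Check that every answer text appears in the context at its start offset."""
    ctx = record["context"]
    answers = record["answers"]
    return all(
        ctx[start:start + len(text)] == text
        for text, start in zip(answers["text"], answers["answer_start"])
    )


print(is_answerable(answerable), is_answerable(unanswerable))
print(answers_are_spans(answerable))
```

The span check is what makes a dataset extractive: every answer can be recovered by slicing the context.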

## QA in Python

In [None]:
# Install transformers library
!pip install transformers

In [1]:
# Imports
from transformers import pipeline

In [3]:
# Question Answering Example

# Download Model
qa_model = pipeline("question-answering")

# Test Model
question = "How cold is snow?"
context = "Snow begins to form at temperatures at or below 0 degrees Celsius or 32 degrees Fahrenheit."
qa_response = qa_model(question=question, context=context)
# The model returns a dictionary with the keys:
#   score:  the model's confidence in the extracted answer
#   start:  character index in the context where the extracted answer starts
#   end:    character index in the context just past the end of the extracted answer,
#           so that context[start:end] == answer
#   answer: the text extracted from the context, which should contain the answer
print(qa_response)

No model was supplied, defaulted to distilbert-base-cased-distilled-squad and revision 626af31 (https://huggingface.co/distilbert-base-cased-distilled-squad).
Using a pipeline without specifying a model name and revision in production is not recommended.


{'score': 0.2543421983718872, 'start': 48, 'end': 90, 'answer': '0 degrees Celsius or 32 degrees Fahrenheit'}
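
The `start` and `end` values in the output above are character offsets into the context, so the answer can be recovered by slicing. The sketch below hard-codes the response printed above instead of calling the model, so it runs without downloading anything; `highlight_answer` is a hypothetical helper added for illustration.

```python
# Response copied from the printed output above (no model call needed).
context = "Snow begins to form at temperatures at or below 0 degrees Celsius or 32 degrees Fahrenheit."
qa_response = {
    "score": 0.2543421983718872,
    "start": 48,
    "end": 90,
    "answer": "0 degrees Celsius or 32 degrees Fahrenheit",
}

# Slicing the context with start/end reproduces the answer text.
span = context[qa_response["start"]:qa_response["end"]]
print(span == qa_response["answer"])


def highlight_answer(context, response):
    """Wrap the extracted span in square brackets to show where it sits in the context."""
    start, end = response["start"], response["end"]
    return f"{context[:start]}[{context[start:end]}]{context[end:]}"


print(highlight_answer(context, qa_response))
```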


In [4]:
# More Complex Example

context = "Python is a high-level, general-purpose programming language. Its design " \
"philosophy emphasizes code readability with the use of significant indentation. " \
"Python is dynamically typed and garbage-collected. It supports multiple programming " \
"paradigms, including structured (particularly procedural), object-oriented and " \
"functional programming. It is often described as a \"batteries included\" language " \
"due to its comprehensive standard library."

question = "What does the Python design emphasize?"

qa_response = qa_model(question=question, context=context)
print(qa_response)

{'score': 0.7124265432357788, 'start': 95, 'end': 111, 'answer': 'code readability'}
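
The two responses printed so far differ a lot in confidence (about 0.25 versus 0.71). One simple way to use `score` is to discard answers below a cut-off. This is only a sketch: `answer_or_none` is a hypothetical helper, and the 0.5 threshold is an arbitrary assumption that would need tuning on real data.

```python
def answer_or_none(qa_response, min_score=0.5):
    """Return the answer text if the confidence reaches min_score, otherwise None."""
    if qa_response["score"] >= min_score:
        return qa_response["answer"]
    return None


# Responses copied from the printed outputs above (no model call needed).
snow_response = {
    "score": 0.2543421983718872,
    "start": 48,
    "end": 90,
    "answer": "0 degrees Celsius or 32 degrees Fahrenheit",
}
python_response = {
    "score": 0.7124265432357788,
    "start": 95,
    "end": 111,
    "answer": "code readability",
}

for response in (snow_response, python_response):
    print(answer_or_none(response))
```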


## Fast QA
* Running a passage-ranking model (available on Hugging Face) before the QA model can speed up the answer search: the QA model then only has to read the top-ranked passage instead of every passage.

In [22]:
# Using a passage ranking model

passage_pipe = pipeline("text-classification", model="cross-encoder/ms-marco-TinyBERT-L-2")

passages = [
    "Wormholes, theorized by Einstein, are tunnels in spacetime that could potentially create shortcuts for long journeys across the universe.",
    "The concept of wormholes remains a theoretical construct, suggesting the existence of shortcuts in spacetime to traverse vast cosmic distances.",
    "Einstein's theory of general relativity introduced the idea of wormholes, hypothetical tunnels connecting distant points in space."
]

question = "Who introduced the concept of wormholes?"

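# Note: question and passage are concatenated into a single string here.
# Cross-encoders are typically trained on (query, passage) pairs, so pair input may score differently.
passages_with_question = [f"{question} {passage}" for passage in passages]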

# Score each question + passage string with the ranking model
rankings = passage_pipe(passages_with_question)
print(f"Rankings Look Like: \n{rankings}")

# Get top-ranked passage (index of the highest score)
top_passage_index = max(range(len(rankings)), key=lambda i: rankings[i]['score'])
top_passage = passages[top_passage_index]

# Find answer
qa_response = qa_model(question=question, context=top_passage)

print(f"""
qa_response:
{qa_response}

Confidence:
{qa_response['score']}

Answer:
{qa_response['answer']}
""")

Rankings Look Like: 
[{'label': 'LABEL_0', 'score': 0.7719500660896301}, {'label': 'LABEL_0', 'score': 0.7145923376083374}, {'label': 'LABEL_0', 'score': 0.6634820103645325}]

qa_response:
{'score': 0.9973866939544678, 'start': 24, 'end': 32, 'answer': 'Einstein'}

Confidence:
0.9973866939544678

Answer:
Einstein
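
The rank-then-read pattern above can be sketched as a small reusable function. To keep the sketch runnable without downloading models, the scorer and reader are passed in as plain callables; `rank_then_read`, `overlap_score`, and `echo_reader` are hypothetical names, the word-overlap scorer is a toy stand-in for the cross-encoder (not how it actually scores), and the two passages are invented for illustration.

```python
import re


def rank_then_read(question, passages, score_fn, read_fn):
    """Score every passage, then run the reader only on the best-scoring one."""
    if not passages:
        raise ValueError("passages must not be empty")
    scores = [score_fn(question, passage) for passage in passages]
    best_index = max(range(len(passages)), key=lambda i: scores[i])
    best_passage = passages[best_index]
    return best_passage, read_fn(question, best_passage)


def overlap_score(question, passage):
    """Toy scorer: fraction of question words that also occur in the passage."""
    question_words = set(re.findall(r"\w+", question.lower()))
    passage_words = set(re.findall(r"\w+", passage.lower()))
    return len(question_words & passage_words) / len(question_words)


def echo_reader(question, context):
    """Toy reader: returns the whole context as the 'answer' with a dummy score."""
    return {"score": 0.0, "start": 0, "end": len(context), "answer": context}


toy_passages = [
    "The Nile is a river in Africa.",
    "Paris is the capital of France.",
]
toy_question = "What is the capital of France?"

best_passage, response = rank_then_read(
    toy_question, toy_passages, overlap_score, echo_reader
)
print(best_passage)
print(response["answer"])
```

In the notebook, `score_fn` would wrap `passage_pipe` and `read_fn` would wrap `qa_model`; only the selected passage is sent to the slower QA model.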

