# **Question Answering**

Extractive question answering is the task of extracting the answer to a given question from a text. In question answering, text summarization methods are used to find answers to user questions in documents [[1]](#scrollTo=5aLTXh5Sa1bC).

This notebook shows an example of extractive question answering using a DistilBERT transformer model fine-tuned on the SQuAD dataset.

## **Question answering with SQuAD dataset**

The Stanford Question Answering Dataset (SQuAD) is a question answering dataset consisting of 100,000+ questions about a set of Wikipedia articles, where the answer to each question is a segment of text from the corresponding reading passage ([Rajpurkar et al., 2016](https://nlp.stanford.edu/pubs/rajpurkar2016squad.pdf)). The dataset is freely available at [https://stanford-qa.com](https://stanford-qa.com).
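To make the "segment of text" idea concrete, here is a minimal sketch with one hand-written record in a SQuAD-like layout. The field names (`context`, `question`, `answers`, `text`, `answer_start`) follow the layout we believe the Hugging Face `datasets` copy of SQuAD uses; the record itself is made up for illustration and is not taken from the dataset. The hypothetical helper `answer_is_span` checks that each reference answer really is a span of the passage.

```python
# Hypothetical SQuAD-style record, written by hand for illustration only
record = {
    "context": "The Stanford Question Answering Dataset (SQuAD) is built from Wikipedia articles.",
    "question": "Where do the SQuAD passages come from?",
    "answers": {"text": ["Wikipedia articles"], "answer_start": [62]},
}


def answer_is_span(rec: dict) -> bool:
    """Check that every reference answer appears in the context at its start offset."""
    context = rec["context"]
    answers = rec["answers"]
    return all(
        context[start:start + len(text)] == text
        for text, start in zip(answers["text"], answers["answer_start"])
    )


print(answer_is_span(record))  # → True
```

Because every answer is a span, a model only has to predict where the answer starts and ends in the passage, which is what "extractive" means here.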

We use DistilBERT as the transformer model; it is smaller and faster than BERT ([distilbert-base-uncased](https://huggingface.co/distilbert-base-uncased)).

For question answering, we apply the following steps:
* Install the ``transformers`` library
* Import the ``pipeline`` function from the ``transformers`` library
* Create a question answering pipeline with the ``distilbert-base-uncased-distilled-squad`` model
* Create a sample text
* Perform the question answering task on the given text






### Install ``transformers``

To use the DistilBERT model in our task, we first have to install the ``transformers`` library.

```python
# Notebook shell command (Jupyter/Colab); in a terminal, run: pip install transformers
!pip install transformers
```

```
Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting transformers
  Downloading transformers-4.20.1-py3-none-any.whl (4.4 MB)
     |████████████████████████████████| 4.4 MB 7.8 MB/s
Collecting pyyaml>=5.1
  Downloading PyYAML-6.0-cp37-cp37m-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_12_x86_64.manylinux2010_x86_64.whl (596 kB)
     |████████████████████████████████| 596 kB 37.4 MB/s
Collecting tokenizers!=0.11.3,<0.13,>=0.11.1
  Downloading tokenizers-0.12.1-cp37-cp37m-manylinux_2_12_x86_64.manylinux2010_x86_64.whl (6.6 MB)
     |████████████████████████████████| 6.6 MB 10.1 MB/s
Collecting huggingface-hub<1.0,>=0.1.0
  Downloading huggingface_hub-0.8.1-py3-none-any.whl (101 kB)
     |████████████████████████████████| 101 kB 12.2 MB/s
Installing collected packages: pyyaml, tokenizers, huggingface-hub, transformers
  Attempting uninstall: pyyaml
    Found existing installation: PyYAML 3.13
    Uninstal
```
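To confirm which version ended up installed (the log above shows ``transformers`` 4.20.1 for this run), a small check can be run in the next cell. This is a sketch under one assumption: it uses the standard-library `importlib.metadata`, which needs Python 3.8 or newer. On the Python 3.7 runtime visible in the log (`cp37` wheels), `import transformers; print(transformers.__version__)` does the same job. `installed_version` is a hypothetical helper name.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package: str):
    """Return the installed version string of `package`, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Prints the version string, or None if transformers is missing
print("transformers:", installed_version("transformers"))
```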

### Import ``pipeline``

The ``pipeline()`` function is a factory that creates any of the pipelines available in ``transformers``. We import it to create a question answering pipeline.

```python
# Import pipeline from the transformers library
from transformers import pipeline
```

### Create question answering pipeline

We create a question answering pipeline with the ``pipeline()`` function. There are two ways to do this:


1. Specify the task: ``nlp = pipeline("question-answering")``, which lets the library choose a default model for the task
2. Specify the model name: ``nlp = pipeline(model="distilbert-base-uncased-distilled-squad")``, from which the task is inferred
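The two methods can also be combined by passing both the task identifier and the model name, so that nothing has to be inferred. A minimal sketch, where `build_qa_pipeline` is a hypothetical helper of ours and not part of ``transformers``:

```python
def build_qa_pipeline(model_name: str = "distilbert-base-uncased-distilled-squad"):
    """Hypothetical helper: create a question answering pipeline for `model_name`."""
    # Imported inside the function so this cell can be defined before transformers is installed
    from transformers import pipeline

    # Task identifier and model name given together: the task is explicit, the model is fixed
    return pipeline("question-answering", model=model_name)


# Usage (downloads the model files on the first call):
# nlp = build_qa_pipeline()
```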





The question answering pipeline is selected with the task identifier ``"question-answering"``. The ``pipeline`` API lets us use a model on a given text immediately: a pipeline groups together a pre-trained model with the preprocessing that was used during that model's training, and many NLP tasks have a pre-trained pipeline ready to use [[3]](https://pypi.org/project/transformers/).

The ``pipeline()`` abstraction is a wrapper around all the other available pipelines. Pipelines are a simple way to use models for inference: they are objects that abstract most of the complex code in the library, offering a simple API dedicated to several tasks, including Named Entity Recognition, Masked Language Modeling, Sentiment Analysis, Feature Extraction and Question Answering.

In this notebook, we use the second method with the model ``distilbert-base-uncased-distilled-squad``, a DistilBERT model fine-tuned on SQuAD v1.1.
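Much of what the pipeline hides happens after the model runs. An extractive QA model outputs one start logit and one end logit per token, and the answer is the span with the best combined start/end score. The sketch below is a simplified, pure-Python illustration of that decoding step under our own assumptions: a span (i, j) is scored as softmax(start)[i] times softmax(end)[j], and it must satisfy j ≥ i and cover at most `max_answer_len` tokens (the pipeline has a parameter with this name, which we believe defaults to 15). The real pipeline additionally excludes question and special tokens and maps token indices back to character offsets; both are left out here, and the logits are made up.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def best_span(start_logits, end_logits, max_answer_len=15):
    """Return (score, start_idx, end_idx) of the highest-scoring valid token span.

    A span (i, j) is valid when i <= j and it covers at most max_answer_len tokens;
    its score is P(start = i) * P(end = j).
    """
    p_start = softmax(start_logits)
    p_end = softmax(end_logits)
    best = (0.0, 0, 0)
    for i, ps in enumerate(p_start):
        # End positions at or after i, within the length limit
        for j in range(i, min(i + max_answer_len, len(p_end))):
            score = ps * p_end[j]
            if score > best[0]:
                best = (score, i, j)
    return best


# Toy example with made-up logits (not real model output)
tokens = ["squad", "is", "a", "qa", "dataset"]
start_logits = [0.1, 0.0, 0.2, 4.0, 1.0]
end_logits = [0.0, 0.1, 0.0, 1.0, 4.0]
score, i, j = best_span(start_logits, end_logits)
print(" ".join(tokens[i:j + 1]), round(score, 2))  # → qa dataset 0.81
```

The product of two probabilities explains why the pipeline's ``score`` can be well below 1 even when the answer looks right: the model must be confident about both the start and the end position.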

```python
# Load DistilBERT fine-tuned on SQuAD; the first call downloads the model files
nlp = pipeline(model="distilbert-base-uncased-distilled-squad")
```


### Create a sample text

```python
# Create a sample context. The leading "... " on each line is part of the string
# (it looks like a doctest continuation prompt) and counts toward the answer offsets.
context = r"""
... Extractive Question Answering is the task of extracting an answer from a text given a question. An example of a
... question answering dataset is the SQuAD dataset, which is entirely based on that task. If you would like to fine-tune
... a model on a SQuAD task, you may leverage the examples/question-answering/run_squad.py script.
... """
```

### Apply the question answering model to the given text

```python
# Print the answers to the following questions:
# "What is extractive question answering?"
# "What is a good example of a question answering dataset?"

result = nlp(question="What is extractive question answering?", context=context)
print(f"Answer: '{result['answer']}', score: {round(result['score'], 4)}, start: {result['start']}, end: {result['end']}")

result = nlp(question="What is a good example of a question answering dataset?", context=context)
print(f"Answer: '{result['answer']}', score: {round(result['score'], 4)}, start: {result['start']}, end: {result['end']}")
```

```
Answer: 'the task of extracting an answer from a text given a question', score: 0.564, start: 38, end: 99
Answer: 'SQuAD dataset', score: 0.4472, start: 155, end: 168
```
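The ``start`` and ``end`` values are character offsets into ``context`` (``end`` is exclusive), so ``context[start:end]`` gives back the answer text; for the first result above that is ``context[38:99]``, with the offsets counting the leading ``... `` on each line of the sample text. The hypothetical helper below, `highlight_answer`, uses this to mark the answer inside its passage. It assumes ``result`` is a dictionary with the ``answer``, ``start`` and ``end`` keys printed above; the demo data is hand-made, not model output.

```python
def highlight_answer(context: str, result: dict, marker: str = "**") -> str:
    """Hypothetical helper: return `context` with the answer span wrapped in `marker`.

    Assumes 'start'/'end' in `result` are character offsets into `context` (end exclusive).
    """
    start, end = result["start"], result["end"]
    # Sanity check: the offsets should select exactly the answer text
    if context[start:end] != result["answer"]:
        raise ValueError("result offsets do not match the answer text")
    return context[:start] + marker + context[start:end] + marker + context[end:]


# Self-contained example with a hand-made result
demo_context = "SQuAD is a question answering dataset."
demo_result = {"answer": "question answering dataset", "start": 11, "end": 37}
print(highlight_answer(demo_context, demo_result))  # → SQuAD is a **question answering dataset**.
```

In the notebook itself, ``print(highlight_answer(context, result))`` would mark the answer to the last question asked.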


# **References**

- [1] NLP and Computer Vision_DLMAINLPCV01 Course Book
- [2] https://huggingface.co/transformers/task_summary.html#extractive-question-answering
- [3] https://pypi.org/project/transformers/

Copyright © 2022 IU International University of Applied Sciences