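Extractive question answering is the task of extracting an answer from a text given a question. One example of a question answering dataset is SQuAD, which is built entirely around this task.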

In [0]:
!pip install transformers
import torch

Collecting transformers
  Downloading https://files.pythonhosted.org/packages/a3/78/92cedda05552398352ed9784908b834ee32a0bd071a9b32de287327370b7/transformers-2.8.0-py3-none-any.whl (563kB)

In [0]:
torch.cuda.get_device_name(0)

'Tesla P4'

In [0]:
from transformers import pipeline

nlp = pipeline("question-answering")

context = r"""
Extractive Question Answering is the task of extracting an answer from a text given a question. An example of a
question answering dataset is the SQuAD dataset, which is entirely based on that task. If you would like to fine-tune
a model on a SQuAD task, you may leverage the `run_squad.py`.
"""

print(nlp(question="What is extractive question answering?", context=context))
print(nlp(question="What is a good example of a question answering dataset?", context=context))



{'score': 0.622231578121184, 'start': 34, 'end': 96, 'answer': 'the task of extracting an answer from a text given a question.'}




{'score': 0.5115298754462394, 'start': 147, 'end': 161, 'answer': 'SQuAD dataset,'}
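The `start` and `end` values in each result appear to be character offsets into `context`. The short check below is a sketch under that assumption: it slices the same context string with the offsets printed above and compares each slice with the reported answer. The `results` list is copied by hand from the output (scores omitted) rather than produced by the pipeline.

```python
# Same context string as in the cell above (note the leading newline at index 0).
context = r"""
Extractive Question Answering is the task of extracting an answer from a text given a question. An example of a
question answering dataset is the SQuAD dataset, which is entirely based on that task. If you would like to fine-tune
a model on a SQuAD task, you may leverage the `run_squad.py`.
"""

# Offsets and answers copied from the pipeline output above.
results = [
    {"start": 34, "end": 96, "answer": "the task of extracting an answer from a text given a question."},
    {"start": 147, "end": 161, "answer": "SQuAD dataset,"},
]

for result in results:
    # Slice the context with the reported character offsets.
    span = context[result["start"]:result["end"]]
    print(repr(span), span == result["answer"])
```

Both comparisons hold, which also explains the trailing comma in `'SQuAD dataset,'`: the answer is a literal slice of the context, not a cleaned-up string.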


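### Below is an example of question answering using a model and a tokenizer. The process is as follows:
- Instantiate a tokenizer and a model from the checkpoint name. The model is identified as a BERT model and is loaded with the weights stored in the checkpoint.

- Define a text and a few questions.

- Iterate over the questions and build a sequence from the text and the current question, with the correct model-specific separators, token type ids, and attention mask. Pass this sequence through the model. This outputs a range of scores across all tokens of the sequence (question and text), for both the start and end positions.

- Compute the softmax of the scores to get probabilities over the tokens for the start and end positions. The code in this section takes the argmax of the raw scores directly, which selects the same positions because softmax preserves their order.

- Fetch the tokens between the identified start and end positions and convert them to a string.

- Print the results.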
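The steps above can be sketched in plain Python, without loading a model. This is only an illustration: the tokens and the start/end scores are made up, and `softmax` and `argmax` are small hypothetical helpers, not functions from the transformers library. The sketch shows the BERT-style pair layout (`[CLS] question [SEP] text [SEP]`), the matching token type ids, and how scores become probabilities and then an answer span.

```python
import math


def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def argmax(values):
    """Index of the largest value."""
    return max(range(len(values)), key=values.__getitem__)


# Hypothetical, already-tokenized question and text (not real tokenizer output).
question_tokens = ["who", "wrote", "it", "?"]
text_tokens = ["ada", "lovelace", "wrote", "it"]

# BERT-style pair layout: [CLS] question [SEP] text [SEP]
tokens = ["[CLS]"] + question_tokens + ["[SEP]"] + text_tokens + ["[SEP]"]
# Token type ids: 0 for the question segment, 1 for the text segment.
token_type_ids = [0] * (len(question_tokens) + 2) + [1] * (len(text_tokens) + 1)
# Attention mask: 1 for every real token (there is no padding here).
attention_mask = [1] * len(tokens)

# Made-up model outputs: one start score and one end score per token.
start_scores = [0.1, -1.0, -0.5, -1.2, -2.0, 0.0, 6.0, 1.5, -0.3, 0.2, -0.1]
end_scores = [0.0, -1.5, -0.7, -1.0, -2.2, 0.1, 1.0, 5.5, -0.2, 0.0, 0.2]

start_probs = softmax(start_scores)
end_probs = softmax(end_scores)

answer_start = argmax(start_probs)
answer_end = argmax(end_probs) + 1  # +1 so the slice includes the end token

answer = " ".join(tokens[answer_start:answer_end])
print(answer)  # ada lovelace
```

Because softmax is monotonic, `argmax(start_probs)` and `argmax(start_scores)` give the same index; the probabilities matter only when a confidence value is needed.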

In [0]:
from transformers import AutoTokenizer, AutoModelForQuestionAnswering

tokenizer = AutoTokenizer.from_pretrained("bert-large-uncased-whole-word-masking-finetuned-squad")
model = AutoModelForQuestionAnswering.from_pretrained("bert-large-uncased-whole-word-masking-finetuned-squad")

text = r"""
🤗 Transformers (formerly known as pytorch-transformers and pytorch-pretrained-bert) provides general-purpose
architectures (BERT, GPT-2, RoBERTa, XLM, DistilBert, XLNet…) for Natural Language Understanding (NLU) and Natural
Language Generation (NLG) with over 32+ pretrained models in 100+ languages and deep interoperability between
TensorFlow 2.0 and PyTorch.
"""

questions = [
    "How many pretrained models are available in Transformers?",
    "What does Transformers provide?",
    "Transformers provides interoperability between which frameworks?",
]



In [0]:
for question in questions:
    # Encode the question and the text as one sequence pair, with special tokens added
    inputs = tokenizer.encode_plus(question, text, add_special_tokens=True, return_tensors="pt")
    input_ids = inputs["input_ids"].tolist()[0]

    # The first two outputs are the start and end scores, one score per token.
    # Indexing avoids relying on the exact return type of the model call.
    outputs = model(**inputs)
    answer_start_scores, answer_end_scores = outputs[0], outputs[1]

    # Most likely start of the answer: the argmax of the start scores
    answer_start = torch.argmax(answer_start_scores)
    # Most likely end of the answer: the argmax of the end scores, plus 1 so the slice includes the end token
    answer_end = torch.argmax(answer_end_scores) + 1

    answer = tokenizer.convert_tokens_to_string(tokenizer.convert_ids_to_tokens(input_ids[answer_start:answer_end]))

    print(f"Question: {question}")
    print(f"Answer: {answer}\n")

Question: How many pretrained models are available in Transformers?
Answer: over 32 +

Question: What does Transformers provide?
Answer: general - purpose architectures

Question: Transformers provides interoperability between which frameworks?
Answer: tensorflow 2 . 0 and pytorch
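
The answers above are lowercased and have spaces around punctuation (`general - purpose`, `2 . 0`). The lowercasing comes from the uncased checkpoint. The spacing is consistent with how a WordPiece token list is typically turned back into a string: tokens are joined with spaces, and only `##` continuation pieces are glued to the previous token. The sketch below illustrates that behaviour under this assumption; `wordpieces_to_string` is a hypothetical stand-in, and the token splits are invented for illustration rather than taken from the real tokenizer.

```python
def wordpieces_to_string(tokens):
    """Join tokens with spaces, then glue '##' continuation pieces onto the previous token."""
    return " ".join(tokens).replace(" ##", "").strip()


# Hypothetical WordPiece splits; the real tokenizer may split these words differently.
pieces = ["tensor", "##flow", "2", ".", "0", "and", "py", "##tor", "##ch"]
print(wordpieces_to_string(pieces))  # tensorflow 2 . 0 and pytorch

# Punctuation is its own token, so it stays separated by spaces.
print(wordpieces_to_string(["general", "-", "purpose", "architectures"]))  # general - purpose architectures
```

If the original casing and spacing are needed, one option is to map the token span back to character offsets in `text` and slice the original string, as the pipeline output does with its `start` and `end` fields.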

