First, let's install the transformers package, which we will use throughout the lab.

In [1]:
!pip install transformers

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting transformers
  Downloading transformers-4.27.1-py3-none-any.whl (6.7 MB)
Collecting tokenizers!=0.11.3,<0.14,>=0.11.1
  Downloading tokenizers-0.13.2-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (7.6 MB)
Collecting huggingface-hub<1.0,>=0.11.0
  Downloading huggingface_hub-0.13.2-py3-none-any.whl (199 kB)
Installing collected packages: tokenizers, huggingface-hub, transformers
Successfully installed huggingface-hub-0.13.2 tokenizers-0.13.2 transformers-4.27.1
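
The log above shows transformers 4.27.1, but a later install may resolve to a different release, so it can help to record which version you actually ran. As a minimal sketch, the cell below uses the standard library's importlib.metadata to report it. The helper name `installed_version` is introduced here for illustration and is not part of the lab.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Prints None instead of raising if the package is not installed.
print("transformers:", installed_version("transformers"))
```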


## Pipeline:
We import the pipeline function from transformers. It bundles a model, its tokenizer, and the pre- and post-processing steps into a single callable.


In [2]:
from transformers import pipeline 

Now, we will create a question-answering pipeline that uses a DistilBERT model trained on SQuAD. It performs extractive question answering, i.e., it answers each question with the exact wording found in the context.

In [3]:
question_answerer = pipeline(task="question-answering", model="distilbert-base-cased-distilled-squad")

For example, here is the opening paragraph of the Wikipedia entry for NLP, which you will use as the context.

In [5]:
context = '''
Natural language processing (NLP) is an interdisciplinary subfield of linguistics, computer science, 
and artificial intelligence concerned with the interactions between computers and human language, 
in particular how to program computers to process and analyze large amounts of natural language data. 
The goal is a computer capable of "understanding" the contents of documents, including the contextual nuances of 
the language within them. The technology can then accurately extract information and insights contained in the documents 
as well as categorize and organize the documents themselves. Challenges in natural language processing frequently involve speech recognition, 
natural-language understanding, and natural-language generation.
'''

Now, you can ask your model anything related to that passage. 

In [9]:
result = question_answerer(question="What is Natural language processing (NLP)?", context=context)
print(result['answer'])

an interdisciplinary subfield of linguistics, computer science, 
and artificial intelligence
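
The printed answer spans two lines because an extractive answer is copied verbatim from the context, line break included. Besides `answer`, the result dictionary also carries `score`, `start`, and `end`, where `start` and `end` are character offsets into the context. The sketch below reproduces this without running the model: it takes the first two lines of the context, locates the answer text printed above, and collapses its whitespace for display. The variable names `snippet` and `clean` are chosen here for illustration.

```python
# First two lines of the lab's context, with their original line breaks.
snippet = (
    "Natural language processing (NLP) is an interdisciplinary subfield of "
    "linguistics, computer science, \n"
    "and artificial intelligence concerned with the interactions between "
    "computers and human language, \n"
)

# The answer text printed above, including its embedded newline.
answer = (
    "an interdisciplinary subfield of linguistics, computer science, \n"
    "and artificial intelligence"
)

# An extractive answer is a slice of the context: context[start:end] == answer.
start = snippet.find(answer)
end = start + len(answer)
print(snippet[start:end] == answer)

# Collapse runs of whitespace (spaces and newlines) into single spaces.
clean = " ".join(answer.split())
print(clean)
```

With a real result you would read the offsets directly, for example `context[result['start']:result['end']]`, instead of searching for the text.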


You can also pass several questions to the pipeline as a list. Each question is answered against the same context, and the results come back in the same order as the questions.

In [10]:
questions = ["What is Natural language processing (NLP)?",
             "What are the challenges in natural language processing?",
             "What is the goal of natural language processing?"]

results = question_answerer(question=questions, context=context)

for q, r in zip(questions, results):
    print(q, "\n>> " + r['answer'])

What is Natural language processing (NLP)? 
>> an interdisciplinary subfield of linguistics, computer science, 
and artificial intelligence
What are the challenges in natural language processing? 
>> speech recognition, 
natural-language understanding, and natural-language generation
What is the goal of natural language processing? 
>> a computer capable of "understanding" the contents of documents
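
If you ask questions in more than one place, a small wrapper can keep the calling code uniform. The sketch below is one possible shape, not part of the transformers API: `ask_all` is a hypothetical helper that accepts either one question or a list, allows for the pipeline returning a single dictionary when only one question is asked, and tidies the whitespace in each answer.

```python
from typing import Callable, Dict, List, Union


def ask_all(qa: Callable, questions: Union[str, List[str]], context: str) -> Dict[str, str]:
    """Map each question to its answer, with whitespace collapsed.

    `qa` is any callable with the question-answering pipeline's calling
    convention: qa(question=..., context=...) returning dicts with an
    'answer' key.
    """
    if isinstance(questions, str):
        questions = [questions]
    results = qa(question=questions, context=context)
    # A single question may yield one dict rather than a list of dicts.
    if isinstance(results, dict):
        results = [results]
    return {q: " ".join(r["answer"].split()) for q, r in zip(questions, results)}


# Usage in this lab (requires the pipeline and context defined earlier):
# answers = ask_all(question_answerer, questions, context)
# for q, a in answers.items():
#     print(q, "\n>> " + a)
```

Passing the pipeline in as an argument keeps the helper independent of any particular model, so the same function works if you later swap in a different checkpoint.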
