The following is a list of common NLP tasks, with some examples of each:

1. Classifying whole sentences: getting the sentiment of a review, detecting whether an email is spam, determining whether a sentence is grammatically correct or whether two sentences are logically related
2. Classifying each word in a sentence: identifying the grammatical components (noun, verb, adjective) or the named entities (person, location, organization)
3. Generating text content: Completing a prompt with auto-generated text, filling in the blanks in a text with masked words
4. Extracting an answer from a text: Given a question and a context, extracting the answer to the question based on the information provided in the context
5. Generating a new sentence from an input text: Translating a text into another language, summarizing a text
6. Generating a transcript of an audio sample or a description of an image.

In [1]:
# The most basic object in the 🤗 Transformers library is the pipeline() function. 
# It connects a model with its necessary preprocessing and postprocessing steps, 
# allowing us to directly input any text and get an intelligible answer:

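# Quote the extras specifier: unquoted, zsh treats the square brackets as a glob
# pattern and fails with "no matches found" (the output below was captured from
# the unquoted form).
%pip install datasets evaluate "transformers[sentencepiece]"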

from transformers import pipeline
classifier = pipeline("sentiment-analysis")
classifier("I've been waiting for a HuggingFace course my whole life.")

# The model is downloaded and cached when you create the classifier object. 
# If you rerun the command, the cached model will be used instead and there is no need to download the model again.


classifier(
    ["I've been waiting for a HuggingFace course my whole life.", "I hate this so much!"]
)

zsh:1: no matches found: transformers[sentencepiece]
Note: you may need to restart the kernel to use updated packages.


No model was supplied, defaulted to distilbert-base-uncased-finetuned-sst-2-english and revision af0f99b (https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.


[{'label': 'POSITIVE', 'score': 0.9598050713539124}]
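
The warning above recommends naming the model and revision explicitly. A minimal sketch of doing so, reusing the model name and revision hash printed in that warning. The `PINNED` dictionary and the `build_classifier` helper are hypothetical names introduced here, and the import is deferred so that nothing is loaded until the helper is called:

```python
# Pin the checkpoint reported in the warning above so that later runs
# do not silently pick up a different default model.
PINNED = {
    "task": "sentiment-analysis",
    "model": "distilbert-base-uncased-finetuned-sst-2-english",
    "revision": "af0f99b",
}


def build_classifier():
    """Create the sentiment pipeline from the pinned settings (hypothetical helper)."""
    from transformers import pipeline  # deferred: the model loads only on call

    return pipeline(**PINNED)
```

Calling `build_classifier()` should behave like `pipeline("sentiment-analysis")` above, without the "No model was supplied" warning.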

Zero-shot classification

Zero-shot classification means classifying texts that have not been labelled. This is a common scenario in real-world projects because annotating text is usually time-consuming and requires domain expertise.
For this use case, the zero-shot-classification pipeline is very powerful: it allows you to specify which labels to use for the classification, so you don’t have to rely on the labels of the pretrained model.

In [3]:
classifier = pipeline("zero-shot-classification")
classifier(
    "This is a course about the Transformer library",
    candidate_labels=["education", "business", "politics"]
)

No model was supplied, defaulted to facebook/bart-large-mnli and revision c626438 (https://huggingface.co/facebook/bart-large-mnli).
Using a pipeline without specifying a model name and revision in production is not recommended.


{'sequence': 'This is a course about the Transformer library',
 'labels': ['education', 'business', 'politics'],
 'scores': [0.9567619562149048, 0.03160896897315979, 0.011629075743258]}
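
A rough sketch of the idea behind this pipeline, assuming the default approach based on a natural language inference model such as the one named in the warning above: each candidate label is turned into a hypothesis sentence, the model scores whether the input entails it, and the entailment scores are normalized across the labels. The logits below are made up for illustration, and `build_pairs` and `softmax` are hypothetical helpers, not library functions:

```python
import math


def softmax(values):
    """Turn raw scores into probabilities that sum to 1."""
    highest = max(values)
    exps = [math.exp(v - highest) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def build_pairs(sequence, candidate_labels, template="This example is {}."):
    """Pair the input text with one hypothesis sentence per candidate label."""
    return [(sequence, template.format(label)) for label in candidate_labels]


sequence = "This is a course about the Transformer library"
labels = ["education", "business", "politics"]
pairs = build_pairs(sequence, labels)

# Hypothetical entailment logits, one per (text, hypothesis) pair.
# A real model would produce these values; they are invented here.
entailment_logits = [3.2, -0.2, -1.2]

scores = softmax(entailment_logits)
ranked = sorted(zip(labels, scores), key=lambda item: item[1], reverse=True)
print(ranked)
```

This is why the scores in the output above add up to 1: they are relative to the candidate labels supplied, not absolute probabilities.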

Text generation

The main idea here is that you provide a prompt and the model will auto-complete it by generating the remaining text. This is similar to the predictive text feature that is found on many phones. Text generation involves randomness, so it’s normal if you don’t get the same results as shown below.

In [6]:
generator = pipeline("text-generation", model="distilgpt2")
generator(
    "In this course, we will teach you how to",
    max_length=30,
    num_return_sequences=2,
)

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[{'generated_text': 'In this course, we will teach you how to manipulate the internet in the first place.'},
 {'generated_text': 'In this course, we will teach you how to create a virtual reality device that can be launched onto your computer.\n\n\n- View your real'}]
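
The randomness mentioned above comes from sampling each next token from a probability distribution. A toy sketch of why fixing a seed makes such sampling repeatable; the candidate words and weights are invented for illustration and do not come from distilgpt2. The 🤗 Transformers library also provides a `set_seed` helper for the same purpose:

```python
import random

# Hypothetical next-token candidates and probabilities (not real model output).
candidates = ["build", "create", "use", "train"]
weights = [0.4, 0.3, 0.2, 0.1]


def sample_tokens(seed, count=5):
    """Draw `count` tokens from the toy distribution using a seeded generator."""
    rng = random.Random(seed)
    return rng.choices(candidates, weights=weights, k=count)


first = sample_tokens(seed=42)
second = sample_tokens(seed=42)

# The same seed gives the same draws; without a fixed seed they can differ.
print(first == second)
```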

Fill-mask

The fill-mask pipeline fills in the blanks in a given text. The top_k argument controls how many candidates are returned. Note that here the model fills in the special <mask> word, which is often referred to as a mask token. Other mask-filling models might have different mask tokens, so it’s always good to verify the proper mask word when exploring other models.

In [7]:
unmasker = pipeline("fill-mask")
unmasker("This course will teach you all about <mask> models.", top_k=2)

No model was supplied, defaulted to distilroberta-base and revision ec58a5b (https://huggingface.co/distilroberta-base).
Using a pipeline without specifying a model name and revision in production is not recommended.


[{'score': 0.1961977630853653,
  'token': 30412,
  'token_str': ' mathematical',
  'sequence': 'This course will teach you all about mathematical models.'},
 {'score': 0.04052729532122612,
  'token': 38163,
  'token_str': ' computational',
  'sequence': 'This course will teach you all about computational models.'}]
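
Because the mask token differs between model families (RoBERTa-style models use `<mask>`, BERT-style models use `[MASK]`), one option is to keep the prompt as a template and substitute the token afterwards. A small sketch; `fill_template` and the `{mask}` placeholder are hypothetical conventions introduced here, not part of the library:

```python
def fill_template(template, mask_token, placeholder="{mask}"):
    """Replace the placeholder in `template` with the model's mask token."""
    if placeholder not in template:
        raise ValueError(f"template has no {placeholder!r} placeholder")
    return template.replace(placeholder, mask_token)


template = "This course will teach you all about {mask} models."

# With a loaded pipeline, the token can be read from unmasker.tokenizer.mask_token
# instead of being hard-coded as it is here.
roberta_style = fill_template(template, "<mask>")
bert_style = fill_template(template, "[MASK]")

print(roberta_style)
print(bert_style)
```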

Named entity recognition

Named entity recognition (NER) is a task where the model has to find which parts of the input text correspond to entities such as persons, locations, or organizations.



In [8]:
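# aggregation_strategy="simple" merges the sub-tokens of one entity into a single span;
# it replaces the older grouped_entities=True spelling.
ner = pipeline("ner", aggregation_strategy="simple")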
ner("My name is Monu Singh and I work at StanC in Bangalore")

No model was supplied, defaulted to dbmdz/bert-large-cased-finetuned-conll03-english and revision f2482bf (https://huggingface.co/dbmdz/bert-large-cased-finetuned-conll03-english).
Using a pipeline without specifying a model name and revision in production is not recommended.


[{'entity_group': 'PER',
  'score': 0.9987305,
  'word': 'Monu Singh',
  'start': 11,
  'end': 21},
 {'entity_group': 'ORG',
  'score': 0.99483955,
  'word': 'StanC',
  'start': 36,
  'end': 41},
 {'entity_group': 'LOC',
  'score': 0.99677545,
  'word': 'Bangalore',
  'start': 45,
  'end': 54}]
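
The `start` and `end` values in this output are character offsets into the input string, with `end` exclusive. A short sketch that slices the text with those offsets and groups the spans by entity type; the entity list is copied from the output above with the scores omitted, and `by_type` is a name introduced here:

```python
text = "My name is Monu Singh and I work at StanC in Bangalore"

# Entities copied from the pipeline output above (scores left out for brevity).
entities = [
    {"entity_group": "PER", "word": "Monu Singh", "start": 11, "end": 21},
    {"entity_group": "ORG", "word": "StanC", "start": 36, "end": 41},
    {"entity_group": "LOC", "word": "Bangalore", "start": 45, "end": 54},
]

by_type = {}
for entity in entities:
    # Slice the original text rather than trusting "word", which the
    # pipeline rebuilds from tokens and may format slightly differently.
    span = text[entity["start"]:entity["end"]]
    by_type.setdefault(entity["entity_group"], []).append(span)

print(by_type)
```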

Question Answering

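The question-answering pipeline answers a question using information from a given context. Note that it extracts the answer from the context rather than generating it, which is why the output includes start and end positions.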

In [9]:
question_answerer = pipeline("question-answering")
question_answerer(
    question="Where do I work?",
    context="My name is Monu Singh and I work at StanC in Bangalore",
)

No model was supplied, defaulted to distilbert-base-cased-distilled-squad and revision 626af31 (https://huggingface.co/distilbert-base-cased-distilled-squad).
Using a pipeline without specifying a model name and revision in production is not recommended.


{'score': 0.4609309434890747, 'start': 36, 'end': 41, 'answer': 'StanC'}

Summarization

Summarization is the task of reducing a text into a shorter text while keeping all (or most) of the important aspects referenced in the text.



In [10]:
summarizer = pipeline("summarization")
summarizer(
    """
    America has changed dramatically during recent years. Not only has the number of 
    graduates in traditional engineering disciplines such as mechanical, civil, 
    electrical, chemical, and aeronautical engineering declined, but in most of 
    the premier American universities engineering curricula now concentrate on 
    and encourage largely the study of engineering science. As a result, there 
    are declining offerings in engineering subjects dealing with infrastructure, 
    the environment, and related issues, and greater concentration on high 
    technology subjects, largely supporting increasingly complex scientific 
    developments. While the latter is important, it should not be at the expense 
    of more traditional engineering.

    Rapidly developing economies such as China and India, as well as other 
    industrial countries in Europe and Asia, continue to encourage and advance 
    the teaching of engineering. Both China and India, respectively, graduate 
    six and eight times as many traditional engineers as does the United States. 
    Other industrial countries at minimum maintain their output, while America 
    suffers an increasingly serious decline in the number of engineering graduates 
    and a lack of well-educated engineers.
"""
)

No model was supplied, defaulted to sshleifer/distilbart-cnn-12-6 and revision a4f8f3e (https://huggingface.co/sshleifer/distilbart-cnn-12-6).
Using a pipeline without specifying a model name and revision in production is not recommended.


[{'summary_text': ' America has changed dramatically during recent years . The number of engineering graduates in the U.S. has declined in traditional engineering disciplines such as mechanical, civil,    electrical, chemical, and aeronautical engineering . Rapidly developing economies such as China and India continue to encourage and advance the teaching of engineering .'}]