# Pipelines

Pipelines are a convenient way to use models for inference. A pipeline object hides most of the library's complex code (tokenization, the model forward pass, and post-processing) behind a simple, task-specific API. Supported tasks include sentiment analysis, named entity recognition, summarization, text generation, and question answering.

In [1]:
from transformers import pipeline

## Sentiment Analysis - Classification

### Passing one sentence

In [2]:
classifier = pipeline('sentiment-analysis')
classifier('I have been waiting to start Hugging Face course')

No model was supplied, defaulted to distilbert/distilbert-base-uncased-finetuned-sst-2-english and revision af0f99b (https://huggingface.co/distilbert/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.

[{'label': 'POSITIVE', 'score': 0.9987032413482666}]
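
The log above warns against relying on the default checkpoint. A minimal sketch of pinning it explicitly, reusing the model name and revision that the warning itself reports. The `PINNED` dictionary and the `build_classifier` helper are illustrative names, not part of the library, and the import is deferred so that nothing is downloaded until the helper is called.

```python
# Values copied from the warning printed by the default pipeline above.
PINNED = {
    "task": "sentiment-analysis",
    "model": "distilbert/distilbert-base-uncased-finetuned-sst-2-english",
    "revision": "af0f99b",
}


def build_classifier():
    """Create the sentiment pipeline with an explicit model and revision."""
    # Deferred import: transformers (and the weights) are only loaded on call.
    from transformers import pipeline

    return pipeline(**PINNED)


# Usage (downloads the checkpoint on first call):
# classifier = build_classifier()
# classifier('I have been waiting to start Hugging Face course')
```

With the model and revision fixed, a later change to the library's default checkpoint should not silently change the predictions.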

### Passing multiple sentences

In [3]:
classifier = pipeline('sentiment-analysis')
classifier(
    ['I have been waiting to start Hugging Face course',
     'That is such a bad news!'])

No model was supplied, defaulted to distilbert/distilbert-base-uncased-finetuned-sst-2-english and revision af0f99b (https://huggingface.co/distilbert/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.


[{'label': 'POSITIVE', 'score': 0.9987032413482666},
 {'label': 'NEGATIVE', 'score': 0.9998062252998352}]
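
The pipeline returns one dictionary per input sentence, in input order. A small sketch of pairing each sentence with its prediction, using the output recorded above as plain data. The `label_sentences` helper, the `min_score` threshold, and the `'UNSURE'` fallback are hypothetical choices for illustration.

```python
sentences = [
    'I have been waiting to start Hugging Face course',
    'That is such a bad news!',
]
# Predictions copied from the cell output above.
predictions = [
    {'label': 'POSITIVE', 'score': 0.9987032413482666},
    {'label': 'NEGATIVE', 'score': 0.9998062252998352},
]


def label_sentences(sentences, predictions, min_score=0.9):
    """Pair each sentence with its label; low-confidence results become 'UNSURE'."""
    labelled = []
    for sentence, pred in zip(sentences, predictions):
        label = pred['label'] if pred['score'] >= min_score else 'UNSURE'
        labelled.append((sentence, label))
    return labelled


labelled = label_sentences(sentences, predictions)
for sentence, label in labelled:
    print(f'{label:8} {sentence}')
```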

## Zero-shot Classification

In [5]:
classifier = pipeline("zero-shot-classification")
classifier(
    "This is a course about Natural language processing (NLP)",
    candidate_labels = ["education", "sports", "entertainment"]
)

No model was supplied, defaulted to facebook/bart-large-mnli and revision c626438 (https://huggingface.co/facebook/bart-large-mnli).
Using a pipeline without specifying a model name and revision in production is not recommended.


{'sequence': 'This is a course about Natural language processing (NLP)',
 'labels': ['education', 'entertainment', 'sports'],
 'scores': [0.614361047744751, 0.2008926123380661, 0.18474635481834412]}
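
Note that the returned `labels` are sorted by descending score rather than in the order they were passed in. A short sketch of reading the result, using the output above as plain data; it assumes the default single-label mode, in which the scores are normalized across the candidate labels.

```python
# Result copied from the cell output above.
result = {
    'sequence': 'This is a course about Natural language processing (NLP)',
    'labels': ['education', 'entertainment', 'sports'],
    'scores': [0.614361047744751, 0.2008926123380661, 0.18474635481834412],
}

# 'labels' and 'scores' are parallel lists, so they can be zipped into a mapping.
score_by_label = dict(zip(result['labels'], result['scores']))

# The best label is the first entry because the lists are sorted by score.
top_label = result['labels'][0]

# In single-label mode the scores should add up to roughly 1.
total = sum(result['scores'])

print(score_by_label)
print(top_label, round(total, 6))
```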

## Text Generation

In [6]:
generator = pipeline('text-generation')
generator("In this algorithm you will learn")

No model was supplied, defaulted to openai-community/gpt2 and revision 6c0e608 (https://huggingface.co/openai-community/gpt2).
Using a pipeline without specifying a model name and revision in production is not recommended.


Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[{'generated_text': 'In this algorithm you will learn how to store a random number within a string, then compare the result against the array of strings stored in the array and assign a single token to the key of the token in the first key string in the array.\n'}]

In [8]:
generator = pipeline('text-generation', model = 'distilgpt2')
generator(
    "In this algortihm you will learn",
    max_length = 50,
    num_return_sequences = 3
)

Truncation was not explicitly activated but `max_length` is provided a specific value, please use `truncation=True` to explicitly truncate examples to max length. Defaulting to 'longest_first' truncation strategy. If you encode pairs of sequences (GLUE-style) with the tokenizer you can select this strategy more precisely by providing a specific strategy to `truncation`.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[{'generated_text': 'In this algortihm you will learn from your past and the future. The future is never to come without the knowledge of your future and from that present.\n\nYou must also follow me on Twitter: @jeffdew_'},
 {'generated_text': "In this algortihm you will learn to read as well as learn how to read!\n\nSo, once you read this, here's my first question.\n1) I will have to ask you to use your word correctly by"},
 {'generated_text': 'In this algortihm you will learn a lot about the way animals deal with the environment in a food-loving world. For example, how are animals able to handle the environment in which animals live?\n\n\n\nMany animals ('}]
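
Each `generated_text` starts with the prompt itself. A minimal sketch of keeping only the newly generated part, using the three sequences recorded above as plain data. The `continuation` helper is a hypothetical name; the text-generation pipeline also accepts `return_full_text=False`, which should give a similar result without post-processing.

```python
prompt = "In this algortihm you will learn"

# Outputs copied verbatim from the cell output above.
outputs = [
    {'generated_text': 'In this algortihm you will learn from your past and the future. The future is never to come without the knowledge of your future and from that present.\n\nYou must also follow me on Twitter: @jeffdew_'},
    {'generated_text': "In this algortihm you will learn to read as well as learn how to read!\n\nSo, once you read this, here's my first question.\n1) I will have to ask you to use your word correctly by"},
    {'generated_text': 'In this algortihm you will learn a lot about the way animals deal with the environment in a food-loving world. For example, how are animals able to handle the environment in which animals live?\n\n\n\nMany animals ('},
]


def continuation(prompt, generated_text):
    """Return only the generated part, without the echoed prompt."""
    if generated_text.startswith(prompt):
        return generated_text[len(prompt):].strip()
    return generated_text.strip()


continuations = [continuation(prompt, o['generated_text']) for o in outputs]
for text in continuations:
    print(repr(text))
```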

## Fill Mask

In [14]:
fill_masker = pipeline('fill-mask')
fill_masker(
    "From this course, you will learn all about <mask> methods.",
    top_k = 3
)

No model was supplied, defaulted to distilbert/distilroberta-base and revision ec58a5b (https://huggingface.co/distilbert/distilroberta-base).
Using a pipeline without specifying a model name and revision in production is not recommended.
Some weights of the model checkpoint at distilbert/distilroberta-base were not used when initializing RobertaForMaskedLM: ['roberta.pooler.dense.bias', 'roberta.pooler.dense.weight']
- This IS expected if you are initializing RobertaForMaskedLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing RobertaForMaskedLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


[{'score': 0.042266182601451874,
  'token': 8326,
  'token_str': ' programming',
  'sequence': 'From this course, you will learn all about programming methods.'},
 {'score': 0.017150819301605225,
  'token': 25212,
  'token_str': ' optimization',
  'sequence': 'From this course, you will learn all about optimization methods.'},
 {'score': 0.016385400667786598,
  'token': 17325,
  'token_str': ' statistical',
  'sequence': 'From this course, you will learn all about statistical methods.'}]
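
Each prediction carries the filled-in `sequence` and the predicted `token_str`, which here begins with a space. A small sketch, using the output above as plain data, that rebuilds each sequence from the template and lists the candidates. The `fill` helper is a hypothetical name, and `<mask>` is the mask token of this RoBERTa-based checkpoint; other models may use a different one, such as `[MASK]`.

```python
template = "From this course, you will learn all about <mask> methods."

# Predictions copied from the cell output above.
predictions = [
    {'score': 0.042266182601451874, 'token': 8326, 'token_str': ' programming',
     'sequence': 'From this course, you will learn all about programming methods.'},
    {'score': 0.017150819301605225, 'token': 25212, 'token_str': ' optimization',
     'sequence': 'From this course, you will learn all about optimization methods.'},
    {'score': 0.016385400667786598, 'token': 17325, 'token_str': ' statistical',
     'sequence': 'From this course, you will learn all about statistical methods.'},
]


def fill(template, token_str, mask='<mask>'):
    """Replace the first mask token with the predicted word (leading space removed)."""
    return template.replace(mask, token_str.strip(), 1)


rebuilt = [fill(template, p['token_str']) for p in predictions]
candidates = [(p['token_str'].strip(), round(p['score'], 3)) for p in predictions]
print(candidates)
```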

## Named Entity Recognition (NER)

In [15]:
ner = pipeline('ner',
               aggregation_strategy = 'simple'  # replaces the deprecated grouped_entities = True
              )
ner("My name is John and I work at Amazon in Seattle.")

No model was supplied, defaulted to dbmdz/bert-large-cased-finetuned-conll03-english and revision f2482bf (https://huggingface.co/dbmdz/bert-large-cased-finetuned-conll03-english).
Using a pipeline without specifying a model name and revision in production is not recommended.


Some weights of the model checkpoint at dbmdz/bert-large-cased-finetuned-conll03-english were not used when initializing BertForTokenClassification: ['bert.pooler.dense.bias', 'bert.pooler.dense.weight']
- This IS expected if you are initializing BertForTokenClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForTokenClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


[{'entity_group': 'PER',
  'score': 0.9986583,
  'word': 'John',
  'start': 11,
  'end': 15},
 {'entity_group': 'ORG',
  'score': 0.99745816,
  'word': 'Amazon',
  'start': 30,
  'end': 36},
 {'entity_group': 'LOC',
  'score': 0.99858034,
  'word': 'Seattle',
  'start': 40,
  'end': 47}]
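
The `start` and `end` values are character offsets into the input string. A minimal sketch that uses them to mark the entities inline, with the output above as plain data. The `annotate` helper and the `[span](TYPE)` notation are illustrative choices, not something the pipeline produces.

```python
text = "My name is John and I work at Amazon in Seattle."

# Entities copied from the cell output above.
entities = [
    {'entity_group': 'PER', 'score': 0.9986583, 'word': 'John', 'start': 11, 'end': 15},
    {'entity_group': 'ORG', 'score': 0.99745816, 'word': 'Amazon', 'start': 30, 'end': 36},
    {'entity_group': 'LOC', 'score': 0.99858034, 'word': 'Seattle', 'start': 40, 'end': 47},
]


def annotate(text, entities):
    """Wrap each entity span as [span](TYPE), using the character offsets."""
    # Work right to left so earlier offsets stay valid after each insertion.
    for ent in sorted(entities, key=lambda e: e['start'], reverse=True):
        span = text[ent['start']:ent['end']]
        tagged = f"[{span}]({ent['entity_group']})"
        text = text[:ent['start']] + tagged + text[ent['end']:]
    return text


annotated = annotate(text, entities)
print(annotated)
```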

## Question Answering

In [18]:
que_answer = pipeline('question-answering')
que_answer(
    question = 'what is my name?',
    context='My name is John and I work at Amazon in Seattle.'
)

No model was supplied, defaulted to distilbert/distilbert-base-cased-distilled-squad and revision 626af31 (https://huggingface.co/distilbert/distilbert-base-cased-distilled-squad).
Using a pipeline without specifying a model name and revision in production is not recommended.


{'score': 0.9963988065719604, 'start': 11, 'end': 15, 'answer': 'John'}
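
This model is extractive: the answer is a span of the context, located by `start` and `end`, rather than newly generated text. A short sketch that recovers the answer from the offsets and rejects low-confidence results, using the output above as plain data. The `answer_or_none` helper and its `min_score` threshold are hypothetical.

```python
context = 'My name is John and I work at Amazon in Seattle.'

# Result copied from the cell output above.
result = {'score': 0.9963988065719604, 'start': 11, 'end': 15, 'answer': 'John'}


def answer_or_none(result, context, min_score=0.5):
    """Return the answer span from the context, or None if the score is too low."""
    if result['score'] < min_score:
        return None
    return context[result['start']:result['end']]


answer = answer_or_none(result, context)
print(answer)
```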

## Summarization

In [19]:
summarizer = pipeline('summarization')
summarizer(
    """
    Us is a 2019 psychological horror film written and directed by Jordan Peele, starring Lupita Nyong'o, Winston Duke, Elisabeth Moss, and Tim Heidecker.
    The film follows Adelaide Wilson (Nyong'o) and her family, who are attacked by a group of menacing doppelgängers, called the ‘Tethered’. The project was announced in February 2018, and much of the cast joined in the following months. Peele produced the film alongside Jason Blum and Sean McKittrick, having previously collaborated on Get Out and BlacKkKlansman, as well as Ian Cooper. Filming took place in California, mostly in Los Angeles, Pasadena and Santa Cruz, from July to October 2018.
    Us premiered at South by Southwest on March 8, 2019, and was theatrically released in the United States on March 22, 2019, by Universal Pictures. It was a critical and commercial success, grossing $256 million worldwide against a budget of $20 million, and received praise for Peele's screenplay and direction, Nyong'o's performance, and Michael Abels' score.
    """
)

No model was supplied, defaulted to sshleifer/distilbart-cnn-12-6 and revision a4f8f3e (https://huggingface.co/sshleifer/distilbart-cnn-12-6).
Using a pipeline without specifying a model name and revision in production is not recommended.


[{'summary_text': " Us premiered at South by Southwest on March 8, 2019, and was theatrically released in the U.S. on March 22, 2019 . It was a critical and commercial success, grossing $256 million worldwide against a budget of $20 million . The film follows Adelaide Wilson (Nyong'o) and her family, who are attacked by a group of menacing doppelgängers ."}]

In [21]:
summarizer = pipeline('summarization')
summarizer(
    """
    Us is a 2019 psychological horror film written and directed by Jordan Peele, starring Lupita Nyong'o, Winston Duke, Elisabeth Moss, and Tim Heidecker.
    The film follows Adelaide Wilson (Nyong'o) and her family, who are attacked by a group of menacing doppelgängers, called the ‘Tethered’. The project was announced in February 2018, and much of the cast joined in the following months. Peele produced the film alongside Jason Blum and Sean McKittrick, having previously collaborated on Get Out and BlacKkKlansman, as well as Ian Cooper. Filming took place in California, mostly in Los Angeles, Pasadena and Santa Cruz, from July to October 2018.
    Us premiered at South by Southwest on March 8, 2019, and was theatrically released in the United States on March 22, 2019, by Universal Pictures. It was a critical and commercial success, grossing $256 million worldwide against a budget of $20 million, and received praise for Peele's screenplay and direction, Nyong'o's performance, and Michael Abels' score.
    """,
    min_length = 50,
    max_length = 200
)

No model was supplied, defaulted to sshleifer/distilbart-cnn-12-6 and revision a4f8f3e (https://huggingface.co/sshleifer/distilbart-cnn-12-6).
Using a pipeline without specifying a model name and revision in production is not recommended.


[{'summary_text': " Us was written and directed by Jordan Peele, starring Lupita Nyong'o, Winston Duke, Elisabeth Moss, and Tim Heidecker . The film follows Adelaide Wilson and her family, who are attacked by a group of menacing doppelgängers . It was a critical and commercial success, grossing $256 million worldwide against a budget of $20 million ."}]
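
Both summaries above start with a space and put a space before each full stop, which appears to be a formatting quirk of this checkpoint's output. A minimal sketch of tidying the text before display, using the second summary as plain data. The `tidy_summary` helper is a hypothetical name and the regular expression is one possible cleanup, not something the pipeline does itself.

```python
import re

# Summary copied from the cell output above.
result = [{'summary_text': " Us was written and directed by Jordan Peele, starring Lupita Nyong'o, Winston Duke, Elisabeth Moss, and Tim Heidecker . The film follows Adelaide Wilson and her family, who are attacked by a group of menacing doppelgängers . It was a critical and commercial success, grossing $256 million worldwide against a budget of $20 million ."}]


def tidy_summary(text):
    """Remove whitespace before punctuation and trim both ends."""
    text = re.sub(r"\s+([.,!?;:])", r"\1", text)
    return text.strip()


summary = tidy_summary(result[0]['summary_text'])
print(summary)
```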

## Translation

In [22]:
translator = pipeline('translation',
                      model = 'Helsinki-NLP/opus-mt-en-fr'
                    )
translator('I am learning Natural Language Processing (NLP) course.')


[{'translation_text': "J'apprends le cours de traitement des langues naturelles (NLP)."}]