The pipeline selects a default pretrained model that has been fine-tuned for the specified task (in this case, sentiment analysis).

In [1]:
from transformers import pipeline 
classifier = pipeline("sentiment-analysis")
classifier("I've been waiting for Hugging face course my whole life")


[{'label': 'POSITIVE', 'score': 0.998302698135376}]

Examples of pipelines

In [4]:
"""
Zero-shot classification: classify text into categories the model was not
explicitly fine-tuned on. You specify the candidate labels yourself, so you
don't have to rely on the labels of the pretrained model.
"""
classifier1 = pipeline("zero-shot-classification")
classifier1(
    "This is a course about the Transformers library",
    candidate_labels=["education", "politics", "business"],
)

No model was supplied, defaulted to roberta-large-mnli and revision 130fb28 (https://huggingface.co/roberta-large-mnli).
Using a pipeline without specifying a model name and revision in production is not recommended.
All PyTorch model weights were used when initializing TFRobertaForSequenceClassification.

All the weights of TFRobertaForSequenceClassification were initialized from the PyTorch model.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFRobertaForSequenceClassification for predictions without further training.


{'sequence': 'This is a course about the Transformers library',
 'labels': ['education', 'business', 'politics'],
 'scores': [0.9562344551086426, 0.02697218582034111, 0.01679333671927452]}
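The output above pairs `labels` and `scores` by position, ordered from most to least likely, and in this run the three scores sum to roughly 1. A minimal sketch of picking the best label follows. It uses the printed result as a literal so it runs without downloading a model, and `top_label` is a hypothetical helper, not part of the library.

```python
# Result copied from the zero-shot output above.
result = {
    "sequence": "This is a course about the Transformers library",
    "labels": ["education", "business", "politics"],
    "scores": [0.9562344551086426, 0.02697218582034111, 0.01679333671927452],
}


def top_label(zero_shot_result):
    """Return the (label, score) pair with the highest score.

    Hypothetical helper: it does not rely on the lists being pre-sorted.
    """
    pairs = zip(zero_shot_result["labels"], zero_shot_result["scores"])
    return max(pairs, key=lambda pair: pair[1])


label, score = top_label(result)
print(f"{label}: {score:.3f}")
```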

In [5]:
# Text generation 
generator = pipeline("text-generation")
generator("In this course we will teach you how to")


No model was supplied, defaulted to gpt2 and revision 6c0e608 (https://huggingface.co/gpt2).
Using a pipeline without specifying a model name and revision in production is not recommended.
All PyTorch model weights were used when initializing TFGPT2LMHeadModel.

All the weights of TFGPT2LMHeadModel were initialized from the PyTorch model.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFGPT2LMHeadModel for predictions without further training.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[{'generated_text': 'In this course we will teach you how to create the most stylish design for online services. We will teach you the key things of mobile applications. We will review your application concepts to see what will work best for you, and we will tell you how'}]

We can also choose a specific model to use in the pipeline instead of the task default.

In [7]:
generator2 = pipeline("text-generation", model="distilgpt2")
generator2(
    "In this course we will teach you how to",
    max_length=30,
    num_return_sequences=2,
)

All PyTorch model weights were used when initializing TFGPT2LMHeadModel.

All the weights of TFGPT2LMHeadModel were initialized from the PyTorch model.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFGPT2LMHeadModel for predictions without further training.
Truncation was not explicitly activated but `max_length` is provided a specific value, please use `truncation=True` to explicitly truncate examples to max length. Defaulting to 'longest_first' truncation strategy. If you encode pairs of sequences (GLUE-style) with the tokenizer you can select this strategy more precisely by providing a specific strategy to `truncation`.
Setting `pad_token_id` to `eos_token_id`:50256 

[{'generated_text': 'In this course we will teach you how to make a good life and how to make a great home.'},
 {'generated_text': 'In this course we will teach you how to develop your own software on the web with a virtual machine (VMs) designed to be used for business'}]
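The log lines above warn that relying on a default model and revision is not recommended in production. One way to act on that, sketched here under assumptions, is to keep the task, model name, and revision in one explicit spec. The values are copied from the log of the default text-generation pipeline; `PINNED_GENERATOR` and `build_pipeline` are hypothetical names introduced for illustration.

```python
# Model name and revision copied from the log line printed for the default
# text-generation pipeline ("defaulted to gpt2 and revision 6c0e608").
PINNED_GENERATOR = {
    "task": "text-generation",
    "model": "gpt2",
    "revision": "6c0e608",
}


def build_pipeline(spec):
    """Create a pipeline from an explicit task/model/revision spec.

    Hypothetical helper. The import is lazy so that defining the spec
    does not require transformers to be installed.
    """
    from transformers import pipeline

    return pipeline(spec["task"], model=spec["model"], revision=spec["revision"])


# Usage (downloads the model on first call):
# generator = build_pipeline(PINNED_GENERATOR)
# generator("In this course we will teach you how to", max_length=30)
```

Keeping the spec as plain data makes it easy to review or store in configuration, and the pipeline is only built when it is actually needed.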

In [8]:
# Mask filling: Fill in the blanks in a sentence
unmasker = pipeline("fill-mask")
unmasker("This course will teach you all about <mask> models.", top_k=2)  # top_k controls how many candidate tokens are returned

No model was supplied, defaulted to distilroberta-base and revision ec58a5b (https://huggingface.co/distilroberta-base).
Using a pipeline without specifying a model name and revision in production is not recommended.
All PyTorch model weights were used when initializing TFRobertaForMaskedLM.

All the weights of TFRobertaForMaskedLM were initialized from the PyTorch model.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFRobertaForMaskedLM for predictions without further training.


[{'score': 0.19619673490524292,
  'token': 30412,
  'token_str': ' mathematical',
  'sequence': 'This course will teach you all about mathematical models.'},
 {'score': 0.04052700474858284,
  'token': 38163,
  'token_str': ' computational',
  'sequence': 'This course will teach you all about computational models.'}]
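The two scores above are small, which is easy to misread as low confidence. Assuming, as is the usual behaviour of this pipeline, that each score is a probability over the model's whole vocabulary for the masked position, the sketch below shows how much probability the top 2 candidates cover and how much is left for every other token. The values are copied from the output above.

```python
# Results copied from the fill-mask output above (top_k=2).
predictions = [
    {"score": 0.19619673490524292, "token_str": " mathematical"},
    {"score": 0.04052700474858284, "token_str": " computational"},
]

shown_mass = sum(p["score"] for p in predictions)
remaining_mass = 1.0 - shown_mass

for p in predictions:
    # token_str keeps the leading space produced by the tokenizer, so strip it.
    print(f"{p['token_str'].strip():<15} {p['score']:.3f}")
print(f"probability left for all other tokens: {remaining_mass:.3f}")
```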

In [9]:
# NER: the model has to find which parts of the text correspond to a person, an organization, a location, a date, etc.
# grouped_entities=True regroups the parts of the text that correspond to the same entity
# (recent transformers releases prefer aggregation_strategy="simple" for the same effect)
ner = pipeline("ner", grouped_entities=True)
ner("My name is Hawa and I work at Hugging Face in Paris")

No model was supplied, defaulted to dbmdz/bert-large-cased-finetuned-conll03-english and revision f2482bf (https://huggingface.co/dbmdz/bert-large-cased-finetuned-conll03-english).
Using a pipeline without specifying a model name and revision in production is not recommended.
All PyTorch model weights were used when initializing TFBertForTokenClassification.

All the weights of TFBertForTokenClassification were initialized from the PyTorch model.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFBertForTokenClassification for predictions without further training.


[{'entity_group': 'PER',
  'score': 0.9984326,
  'word': 'Hawa',
  'start': 11,
  'end': 15},
 {'entity_group': 'ORG',
  'score': 0.98952407,
  'word': 'Hugging Face',
  'start': 30,
  'end': 42},
 {'entity_group': 'LOC',
  'score': 0.9970331,
  'word': 'Paris',
  'start': 46,
  'end': 51}]
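The `start` and `end` values in the output above are character offsets into the input string, with `end` exclusive. A short sketch, using the printed entities as literals (scores omitted), slices the original text with those offsets and groups the spans by entity type. The `by_type` name is introduced here for illustration.

```python
from collections import defaultdict

text = "My name is Hawa and I work at Hugging Face in Paris"

# Entities copied from the NER output above (scores omitted for brevity).
entities = [
    {"entity_group": "PER", "word": "Hawa", "start": 11, "end": 15},
    {"entity_group": "ORG", "word": "Hugging Face", "start": 30, "end": 42},
    {"entity_group": "LOC", "word": "Paris", "start": 46, "end": 51},
]

by_type = defaultdict(list)
for entity in entities:
    # Slice the original text with the reported character offsets.
    span = text[entity["start"]:entity["end"]]
    by_type[entity["entity_group"]].append(span)

print(dict(by_type))
```

Slicing the source text rather than reading `word` preserves the original casing and spacing, which can differ from the tokenizer's reconstruction for some models.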

In [11]:
# Question answering 
question_answer = pipeline("question-answering")
question_answer(
    question="Where do I work?",
    context="My name is Hawa and I work at Hugging Face in Paris"
)

No model was supplied, defaulted to distilbert-base-cased-distilled-squad and revision 626af31 (https://huggingface.co/distilbert-base-cased-distilled-squad).
Using a pipeline without specifying a model name and revision in production is not recommended.
All PyTorch model weights were used when initializing TFDistilBertForQuestionAnswering.

All the weights of TFDistilBertForQuestionAnswering were initialized from the PyTorch model.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFDistilBertForQuestionAnswering for predictions without further training.


{'score': 0.521573543548584,
 'start': 30,
 'end': 51,
 'answer': 'Hugging Face in Paris'}
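The question-answering result above is extractive: the answer is a span of the context, not newly generated text. The sketch below checks this with the printed values by slicing the context with `start` and `end`. The `min_score` threshold in `accept_answer` is an arbitrary assumption for illustration, not a library default.

```python
context = "My name is Hawa and I work at Hugging Face in Paris"

# Result copied from the question-answering output above.
result = {
    "score": 0.521573543548584,
    "start": 30,
    "end": 51,
    "answer": "Hugging Face in Paris",
}

# The answer is exactly the context slice given by the character offsets.
extracted = context[result["start"]:result["end"]]


def accept_answer(qa_result, min_score=0.5):
    """Return the answer if its score reaches min_score, else None.

    Hypothetical helper; the default threshold is an arbitrary choice.
    """
    return qa_result["answer"] if qa_result["score"] >= min_score else None


print(extracted)
print(accept_answer(result))
```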

In [12]:
# Summarization
summarizer = pipeline("summarization")
summarizer(
      """
    America has changed dramatically during recent years. Not only has the number of 
    graduates in traditional engineering disciplines such as mechanical, civil, 
    electrical, chemical, and aeronautical engineering declined, but in most of 
    the premier American universities engineering curricula now concentrate on 
    and encourage largely the study of engineering science. As a result, there 
    are declining offerings in engineering subjects dealing with infrastructure, 
    the environment, and related issues, and greater concentration on high 
    technology subjects, largely supporting increasingly complex scientific 
    developments. While the latter is important, it should not be at the expense 
    of more traditional engineering.

    Rapidly developing economies such as China and India, as well as other 
    industrial countries in Europe and Asia, continue to encourage and advance 
    the teaching of engineering. Both China and India, respectively, graduate 
    six and eight times as many traditional engineers as does the United States. 
    Other industrial countries at minimum maintain their output, while America 
    suffers an increasingly serious decline in the number of engineering graduates 
    and a lack of well-educated engineers.
"""
)

No model was supplied, defaulted to t5-small and revision d769bba (https://huggingface.co/t5-small).
Using a pipeline without specifying a model name and revision in production is not recommended.
All PyTorch model weights were used when initializing TFT5ForConditionalGeneration.

All the weights of TFT5ForConditionalGeneration were initialized from the PyTorch model.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFT5ForConditionalGeneration for predictions without further training.

[{'summary_text': 'the number of graduates in traditional engineering disciplines has declined . in most of the premier american universities engineering curricula now concentrate on and encourage largely the study of engineering science . rapidly developing economies such as China and India continue to encourage and advance the teaching of engineering .'}]

In [13]:
# Translation
translator = pipeline("translation", model="Helsinki-NLP/opus-mt-fr-en")
translator("Ce cours est produit par hugging face.")

All model checkpoint layers were used when initializing TFMarianMTModel.

All the layers of TFMarianMTModel were initialized from the model checkpoint at Helsinki-NLP/opus-mt-fr-en.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFMarianMTModel for predictions without further training.


[{'translation_text': 'This course is produced by hugging face.'}]