# Pipeline

The `pipeline()` function is the most basic object in the Transformers library. It takes raw input, preprocesses it, and passes it to the model. It then post-processes the model's output and returns a clean, readable result.
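The sketch below makes those three stages concrete in plain Python. It is not how the Transformers library implements `pipeline()`. The word weights, the labels, and the `toy_*` helpers are all hypothetical stand-ins for a real tokenizer and model. It only mirrors the shape of the flow: preprocess the text, run a "model" that outputs logits, then post-process those logits with a softmax into a label and a score.

```python
import math

# Hypothetical word weights standing in for a trained model.
# Positive weights push toward POSITIVE, negative ones toward NEGATIVE.
TOY_WEIGHTS = {"love": 2.0, "great": 1.5, "waiting": 0.5, "hate": -2.5, "bad": -1.5}
LABELS = ["NEGATIVE", "POSITIVE"]


def toy_preprocess(text):
    """Lowercase and split into words (a stand-in for a real tokenizer)."""
    return [w.strip(".,!?'") for w in text.lower().split()]


def toy_model(tokens):
    """Return two logits [negative, positive] from the summed word weights."""
    s = sum(TOY_WEIGHTS.get(t, 0.0) for t in tokens)
    return [-s, s]


def toy_postprocess(logits):
    """Softmax the logits and return the most likely label with its probability."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(probs)), key=probs.__getitem__)
    return {"label": LABELS[best], "score": probs[best]}


def toy_classifier(texts):
    """Accept one string or a list of strings and always return a list."""
    if isinstance(texts, str):
        texts = [texts]
    return [toy_postprocess(toy_model(toy_preprocess(t))) for t in texts]


print(toy_classifier(["I love this course!", "I hate this so much!"]))
```

Like the real classifier in the cells below, the toy version returns a list of `{'label', 'score'}` dictionaries whether you pass one string or several.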

In [3]:
# Import pipeline
from transformers import pipeline

# Create a classifier for the sentiment-analysis task
classifier = pipeline("sentiment-analysis")

# It's better to create the pipeline in a separate cell, because loading it takes around 10 to 20 seconds.

No model was supplied, defaulted to distilbert-base-uncased-finetuned-sst-2-english (https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english)


In [5]:
# Now we can pass text to the classifier and get the result.
# Classifying a sentence takes less than a second.
classifier("I've been waiting for a HuggingFace course my whole life.")

[{'label': 'POSITIVE', 'score': 0.9598050713539124}]

In [7]:
# We can also pass several sentences at once, as a list

classifier(
    ["I've been waiting for a HuggingFace course my whole life.", "I hate this so much!"]
)

[{'label': 'POSITIVE', 'score': 0.9598050713539124},
 {'label': 'NEGATIVE', 'score': 0.9994558691978455}]

# Testing out different pipelines

## Zero-shot classification

This pipeline classifies text using labels you supply when you call it, so the model does not need to have been trained on those specific labels.

In [9]:
from transformers import pipeline

classifier = pipeline("zero-shot-classification")


No model was supplied, defaulted to facebook/bart-large-mnli (https://huggingface.co/facebook/bart-large-mnli)


In [10]:
classifier(
    "This is a course about the Transformers library",
    candidate_labels=["education", "politics", "business"],
)

{'sequence': 'This is a course about the Transformers library',
 'labels': ['education', 'business', 'politics'],
 'scores': [0.8445987701416016, 0.11197426170110703, 0.043426912277936935]}
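The default model here (`facebook/bart-large-mnli`) is a natural language inference model. The pipeline pairs the text with one hypothesis per candidate label (its default template is `"This example is {}."`) and asks how strongly the text entails each one. The sketch below mimics only the final single-label step: a softmax across one entailment logit per label, which is why the scores in the output above sum to (almost exactly) 1, followed by sorting. The logits are made-up numbers, and `toy_zero_shot` is a hypothetical helper, not the library's code.

```python
import math


def toy_zero_shot(sequence, candidate_labels, entailment_logits,
                  hypothesis_template="This example is {}."):
    """Mimic single-label zero-shot scoring from per-label entailment logits."""
    # One hypothesis per label; a real NLI model would score each
    # (sequence, hypothesis) pair. Here those scores are passed in directly.
    hypotheses = [hypothesis_template.format(label) for label in candidate_labels]
    # Softmax across the labels, so the scores sum to 1.
    m = max(entailment_logits)
    exps = [math.exp(x - m) for x in entailment_logits]
    total = sum(exps)
    scores = [e / total for e in exps]
    # Sort labels from most to least likely, as the pipeline output does.
    ranked = sorted(zip(candidate_labels, scores), key=lambda p: p[1], reverse=True)
    return {
        "sequence": sequence,
        "hypotheses": hypotheses,  # extra key, only for illustration
        "labels": [label for label, _ in ranked],
        "scores": [score for _, score in ranked],
    }


result = toy_zero_shot(
    "This is a course about the Transformers library",
    ["education", "politics", "business"],
    entailment_logits=[3.0, 0.0, 1.0],  # hypothetical model outputs
)
print(result["labels"], [round(s, 3) for s in result["scores"]])
```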

## Text generation

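This is exactly what it sounds like: you give the model a prompt, and it generates text that continues it. The default model samples its output, so you will likely get a different continuation each time you run it.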

In [13]:
from transformers import pipeline

generator = pipeline("text-generation")

No model was supplied, defaulted to gpt2 (https://huggingface.co/gpt2)


In [16]:
text = generator("In this course, we will teach you how to")
print(text)

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[{'generated_text': 'In this course, we will teach you how to use Linux to build new Linux applications. As your first step in building your own applications, you will learn how to download and install various kinds of Linux versions, and how to develop them.'}]


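## Using a specific model from the Hub

Instead of the default model, you can pass the name of any compatible model on the Hugging Face Hub with the `model` argument. Here we use `distilgpt2` for text generation.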

In [17]:
from transformers import pipeline

generator = pipeline("text-generation", model="distilgpt2")



In [19]:
generator(
    "In this course, we will teach you how to",
    max_length=30,
    num_return_sequences=2,
)

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[{'generated_text': 'In this course, we will teach you how to play the video game style of your game. By learning how to play and be able to play with'},
 {'generated_text': 'In this course, we will teach you how to build a great computer.\n\n\n\nThis course will be taught in Python 2 and includes a'}]
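The two arguments used above shape the output: `max_length` caps the total length (in Transformers it counts the prompt tokens too), and `num_return_sequences` asks for several independent continuations. The toy sketch below shows the underlying idea of autoregressive generation: pick one next word, append it, and repeat. The next-word table and `toy_generate` are hypothetical, and the sketch counts whole words rather than real subword tokens.

```python
import random

# Hypothetical next-word table standing in for a language model:
# each word maps to possible next words with relative weights.
TOY_LM = {
    "to": {"build": 3, "play": 2, "use": 1},
    "build": {"a": 3, "new": 1},
    "play": {"a": 2, "the": 2},
    "use": {"a": 1, "the": 2},
    "a": {"great": 2, "new": 1},
    "the": {"game": 2, "computer": 1},
    "new": {"computer": 1, "game": 1},
    "great": {"computer": 1, "game": 1},
    "game": {".": 1},
    "computer": {".": 1},
}
EOS = "."  # toy end-of-sequence marker


def toy_generate(prompt, max_length=12, num_return_sequences=1, seed=0):
    """Sample continuations one word at a time.

    max_length counts the prompt words too, mirroring how max_length
    includes the prompt tokens in Transformers.
    """
    rng = random.Random(seed)
    outputs = []
    for _ in range(num_return_sequences):
        words = prompt.split()
        while len(words) < max_length:
            options = TOY_LM.get(words[-1])
            if options is None:  # no known continuation: stop early
                break
            next_word = rng.choices(list(options), weights=list(options.values()))[0]
            words.append(next_word)
            if next_word == EOS:  # end of sequence: stop
                break
        outputs.append({"generated_text": " ".join(words)})
    return outputs


for out in toy_generate("we will teach you how to", max_length=10, num_return_sequences=2):
    print(out["generated_text"])
```

Sampling with `rng.choices` is what makes repeated runs differ; fixing the seed, as the sketch does, makes them repeatable.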

## Mask filling

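In other words, fill in the blanks: the model predicts the most likely words for the `<mask>` token in the text, and `top_k` sets how many candidates are returned.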

In [22]:
from transformers import pipeline

unmasker = pipeline("fill-mask")

No model was supplied, defaulted to distilroberta-base (https://huggingface.co/distilroberta-base)


In [23]:
unmasker("This course will teach you all about <mask> models.", top_k=2)

[{'score': 0.19619831442832947,
  'token': 30412,
  'token_str': ' mathematical',
  'sequence': 'This course will teach you all about mathematical models.'},
 {'score': 0.04052725434303284,
  'token': 38163,
  'token_str': ' computational',
  'sequence': 'This course will teach you all about computational models.'}]
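The `score` values come from a softmax over the model's whole vocabulary at the masked position, which is why the top two scores above sum to far less than 1. The sketch below reproduces that ranking step on a tiny, made-up vocabulary. `toy_fill_mask` and its logits are hypothetical, and a real model scores tens of thousands of subword tokens rather than a handful of whole words.

```python
import math


def toy_fill_mask(text, vocab_logits, top_k=2, mask="<mask>"):
    """Rank candidate words for a single mask slot and fill each one in.

    vocab_logits: hypothetical model scores for each word at the masked position.
    """
    if text.count(mask) != 1:
        raise ValueError("expected exactly one mask token")
    # Softmax over the whole (toy) vocabulary.
    m = max(vocab_logits.values())
    exps = {w: math.exp(s - m) for w, s in vocab_logits.items()}
    total = sum(exps.values())
    # Keep only the top_k most likely words.
    ranked = sorted(exps.items(), key=lambda p: p[1], reverse=True)[:top_k]
    return [
        {"score": e / total, "token_str": w, "sequence": text.replace(mask, w)}
        for w, e in ranked
    ]


preds = toy_fill_mask(
    "This course will teach you all about <mask> models.",
    {"mathematical": 2.0, "computational": 0.5, "language": 1.5, "banana": -3.0},
    top_k=2,
)
for p in preds:
    print(round(p["score"], 3), p["sequence"])
```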

## Named entity recognition

Finds the entities in the text, such as people (PER), organizations (ORG), and locations.

In [39]:
from transformers import pipeline

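# grouped_entities=True merges the tokens that belong to the same entity (e.g. "Bruno" and "Mars").
# Newer Transformers versions deprecate it in favor of aggregation_strategy="simple".
ner = pipeline("ner", grouped_entities=True)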

No model was supplied, defaulted to dbmdz/bert-large-cased-finetuned-conll03-english (https://huggingface.co/dbmdz/bert-large-cased-finetuned-conll03-english)


In [40]:
ner("I'm Bruno Mars, and I work at Microsoft.")

[{'entity_group': 'PER',
  'score': 0.99904335,
  'word': 'Bruno Mars',
  'start': 4,
  'end': 14},
 {'entity_group': 'ORG',
  'score': 0.9994773,
  'word': 'Microsoft',
  'start': 30,
  'end': 39}]
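Without grouping, the pipeline returns one prediction per token, so a name like "Bruno Mars" would come back in pieces. The sketch below shows the general idea of merging consecutive tags of the same type into one entity and averaging their scores. It assumes BIO-style tags (`B-` begins an entity, `I-` continues it, `O` is outside), and the per-token predictions and `toy_group_entities` helper are hypothetical. The library's own aggregation handles more cases, such as subword pieces.

```python
def toy_group_entities(text, token_preds):
    """Merge consecutive B-/I- tagged tokens of the same type into one entity.

    token_preds: (char_start, char_end, tag, score) for each token.
    """
    groups = []
    current = None
    for start, end, tag, score in token_preds:
        if tag == "O":  # outside any entity: close the current group
            current = None
            continue
        prefix, etype = tag.split("-", 1)
        if current is not None and prefix == "I" and current["entity_group"] == etype:
            # Continuation of the same entity: extend its span.
            current["end"] = end
            current["scores"].append(score)
        else:
            # Start a new entity group.
            current = {"entity_group": etype, "start": start, "end": end, "scores": [score]}
            groups.append(current)
    return [
        {
            "entity_group": g["entity_group"],
            "score": sum(g["scores"]) / len(g["scores"]),  # mean token score
            "word": text[g["start"]:g["end"]],
            "start": g["start"],
            "end": g["end"],
        }
        for g in groups
    ]


text = "I'm Bruno Mars, and I work at Microsoft."
token_preds = [  # hypothetical per-token predictions (some "O" tokens omitted)
    (0, 3, "O", 0.99),         # I'm
    (4, 9, "B-PER", 0.999),    # Bruno
    (10, 14, "I-PER", 0.998),  # Mars
    (14, 15, "O", 0.99),       # ,
    (30, 39, "B-ORG", 0.999),  # Microsoft
]
print(toy_group_entities(text, token_preds))
```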

## Question answering

Given a context passage, this pipeline answers questions about it by extracting the answer from that context.

In [41]:
from transformers import pipeline

question_answerer = pipeline("question-answering")

No model was supplied, defaulted to distilbert-base-cased-distilled-squad (https://huggingface.co/distilbert-base-cased-distilled-squad)


In [43]:
question_answerer(
    question="Where do I work?",
    context="My name is SuperSecureHuman and I work at Bonk Industries",
)

{'score': 0.9459362626075745,
 'start': 42,
 'end': 57,
 'answer': 'Bonk Industries'}
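The default model is extractive: it does not write a new answer, but scores every token in the context as a possible start and end of the answer, then returns the best span with its character offsets (`start`, `end`). The sketch below shows that span-selection step with made-up start/end scores over whitespace-separated words. `toy_extract_answer`, the logits, and the sketch's own `max_answer_len` cap are assumptions for illustration, not the pipeline's actual code.

```python
import math
import re


def softmax(xs):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def toy_extract_answer(context, offsets, start_logits, end_logits, max_answer_len=5):
    """Return the span whose p_start * p_end is highest.

    offsets: (char_start, char_end) of each word in the context.
    start_logits / end_logits: hypothetical per-word scores from a QA model.
    """
    p_start = softmax(start_logits)
    p_end = softmax(end_logits)
    best = (-1.0, 0, 0)
    for i in range(len(offsets)):
        # Only spans with end >= start and at most max_answer_len words.
        for j in range(i, min(i + max_answer_len, len(offsets))):
            score = p_start[i] * p_end[j]
            if score > best[0]:
                best = (score, i, j)
    score, i, j = best
    start, end = offsets[i][0], offsets[j][1]
    return {"score": score, "start": start, "end": end, "answer": context[start:end]}


context = "My name is SuperSecureHuman and I work at Bonk Industries"
# Character offsets of each whitespace-separated word (a stand-in for real tokenization).
offsets = [m.span() for m in re.finditer(r"\S+", context)]
# Hypothetical scores: "Bonk" looks like a start, "Industries" looks like an end.
start_logits = [0.0] * 8 + [5.0, 1.0]
end_logits = [0.0] * 8 + [1.0, 5.0]
print(toy_extract_answer(context, offsets, start_logits, end_logits))
```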

## Summarization

For those who are too lazy to read long texts :) This pipeline condenses a passage into a shorter summary.

In [1]:
from transformers import pipeline

summarizer = pipeline("summarization")

No model was supplied, defaulted to sshleifer/distilbart-cnn-12-6 (https://huggingface.co/sshleifer/distilbart-cnn-12-6)


In [47]:
summarizer(
"""
A transformer is a passive component that transfers electrical energy from one electrical circuit to another circuit, or multiple circuits. A varying current in any coil of the transformer produces a varying magnetic flux in the transformer's core, which induces a varying electromotive force across any other coils wound around the same core. Electrical energy can be transferred between separate coils without a metallic (conductive) connection between the two circuits. Faraday's law of induction, discovered in 1831, describes the induced voltage effect in any coil due to a changing magnetic flux encircled by the coil.
"""
)

Your max_length is set to 142, but you input_length is only 118. You might consider decreasing max_length manually, e.g. summarizer('...', max_length=59)


[{'summary_text': " A transformer is a passive component that transfers electrical energy from one electrical circuit to another circuit . Electrical energy can be transferred between separate coils without a metallic (conductive) connection between the two circuits . Faraday's law of induction, discovered in 1831, describes the induced voltage effect in any coil due to a changing magnetic flux encircled by the coil ."}]

## Translation

Time to build the next Google Translate. Here we pass a French-to-English model from the Hub with the `model` argument.

In [49]:
from transformers import pipeline

translator = pipeline("translation", model="Helsinki-NLP/opus-mt-fr-en")


In [51]:
translator("Ca va bien?")

[{'translation_text': 'Are you all right?'}]