# Hugging Face - Pipeline

## Imports

In [3]:
import transformers

In [4]:
from transformers import pipeline

## Sentiment Analysis

In [5]:
# No model is specified, so the pipeline picks a default pretrained model fine-tuned for sentiment analysis in English.
classifier = pipeline("sentiment-analysis")

No model was supplied, defaulted to distilbert/distilbert-base-uncased-finetuned-sst-2-english and revision 714eb0f (https://huggingface.co/distilbert/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.


In [6]:
classifier("I've been waiting for a HuggingFace course my whole life.")

[{'label': 'POSITIVE', 'score': 0.9598050713539124}]

In [7]:
classifier(
    ["I've been waiting for a HuggingFace course my whole life.", "I hate this so much!"]
)

[{'label': 'POSITIVE', 'score': 0.9598050713539124},
 {'label': 'NEGATIVE', 'score': 0.9994558691978455}]
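
Each result pairs a label with a probability. The model itself outputs one raw score (logit) per label, and the pipeline turns these into probabilities with a softmax and reports the most likely label. The sketch below is a minimal illustration that assumes hypothetical logits and the label order `{0: "NEGATIVE", 1: "POSITIVE"}`. The real values come from the model and its `config.id2label`.

```python
import math

# Assumed label mapping for illustration; a real model stores this in config.id2label.
ID2LABEL = {0: "NEGATIVE", 1: "POSITIVE"}

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def postprocess(logits):
    """Turn raw logits into the {'label', 'score'} dict that the pipeline prints."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return {"label": ID2LABEL[best], "score": probs[best]}

# Hypothetical logits for one sentence (not produced by the real model).
print(postprocess([-1.5, 1.7]))  # label 'POSITIVE', score about 0.96
```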

## Zero-shot classification
Zero-shot classification lets you choose the labels used for classification, so you don't have to rely on the labels the pretrained model was trained with.

In [8]:
zeroshotclassifier = pipeline("zero-shot-classification")

No model was supplied, defaulted to facebook/bart-large-mnli and revision d7645e1 (https://huggingface.co/facebook/bart-large-mnli).
Using a pipeline without specifying a model name and revision in production is not recommended.


In [9]:
zeroshotclassifier(
    "This is a course about the Transformers library",
    candidate_labels=["education", "politics", "business"],
)

{'sequence': 'This is a course about the Transformers library',
 'labels': ['education', 'business', 'politics'],
 'scores': [0.8445988893508911, 0.11197422444820404, 0.04342690482735634]}
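
The default model, `facebook/bart-large-mnli`, is a natural language inference (NLI) model. The zero-shot pipeline treats the input as a premise and turns each candidate label into a hypothesis, by default "This example is {label}.". It then ranks the labels by how strongly the model predicts entailment. The sketch below shows that ranking step for the single-label case, where the entailment logits are softmaxed across labels. It assumes hypothetical logits in place of a real NLI model.

```python
import math

def build_pairs(sequence, candidate_labels, template="This example is {}."):
    """Pair the input (premise) with one hypothesis per candidate label."""
    return [(sequence, template.format(label)) for label in candidate_labels]

def rank_labels(candidate_labels, entailment_logits):
    """Softmax the entailment logits across labels and sort them, highest first."""
    m = max(entailment_logits)
    exps = [math.exp(x - m) for x in entailment_logits]
    total = sum(exps)
    scores = [e / total for e in exps]
    ranked = sorted(zip(candidate_labels, scores), key=lambda p: p[1], reverse=True)
    return {"labels": [label for label, _ in ranked], "scores": [s for _, s in ranked]}

labels = ["education", "politics", "business"]
pairs = build_pairs("This is a course about the Transformers library", labels)
# Hypothetical entailment logits that an NLI model might assign to each pair.
result = rank_labels(labels, [2.5, -0.5, 0.5])
print(result["labels"])  # ['education', 'business', 'politics']
```

Because the scores are normalized across the candidates, they always sum to 1. That is why adding or removing a label changes the scores of all the others.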

## Text generation
The main idea here is that you provide a prompt and the model will auto-complete it by generating the remaining text.
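
Auto-completion works one step at a time. The model predicts a next token, appends it, and repeats until a length limit or an end token is reached. The toy sketch below uses a hypothetical hand-written word table (`BIGRAMS`) in place of GPT-2. A real model predicts subword tokens from a probability distribution over its whole vocabulary. As with `max_length` for GPT-2-style models, the limit here counts the prompt too. `num_return_sequences` simply repeats the sampling loop.

```python
import random

# Hypothetical toy "language model": for each word, the words that may follow it.
BIGRAMS = {
    "to": ["build", "train", "use"],
    "build": ["models", "apps"],
    "train": ["models"],
    "use": ["pipelines", "models"],
    "models": ["quickly", "."],
    "apps": ["."],
    "pipelines": ["."],
    "quickly": ["."],
}

def generate(prompt, max_length=12, num_return_sequences=2, seed=0):
    """Auto-complete `prompt` one word at a time; max_length counts prompt words too."""
    rng = random.Random(seed)
    outputs = []
    for _ in range(num_return_sequences):
        words = prompt.split()
        while len(words) < max_length:
            choices = BIGRAMS.get(words[-1])
            if not choices:  # no known continuation: stop, like hitting an end token
                break
            words.append(rng.choice(choices))  # sample the next word
        outputs.append({"generated_text": " ".join(words)})
    return outputs

for out in generate("In this course, we will teach you how to"):
    print(out["generated_text"])
```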

In [10]:
generator = pipeline("text-generation")

No model was supplied, defaulted to openai-community/gpt2 and revision 607a30d (https://huggingface.co/openai-community/gpt2).
Using a pipeline without specifying a model name and revision in production is not recommended.


In [12]:
generator("In this course, we will teach you how to", num_return_sequences=2, max_length=100)

Truncation was not explicitly activated but `max_length` is provided a specific value, please use `truncation=True` to explicitly truncate examples to max length. Defaulting to 'longest_first' truncation strategy. If you encode pairs of sequences (GLUE-style) with the tokenizer you can select this strategy more precisely by providing a specific strategy to `truncation`.
Setting `pad_token_id` to `eos_token_id`:None for open-end generation.


[{'generated_text': 'In this course, we will teach you how to work with your environment to solve problems on your own.\n\n\nThis class is designed for business to help you understand how to work effectively with your environment.\n\n\nWe will teach you to work effectively with the company as it goes about its business.\n\n\nWe will teach you the fundamentals and what to expect of companies in regards to business.\n\n\nWe will learn how to find and sell in a friendly environment.\n\n\nWe will learn how'},
 {'generated_text': "In this course, we will teach you how to create a beautiful web application from scratch, using Eclipse as a starting point. You'll understand how to easily start your projects and how to create an application using multiple browser plugins that can run concurrently, as well as how to build your application in Eclipse.\n\nPrerequisites\n\nThe only prerequisite to successfully make the application, is a Web browser. We will be using an old-school web browser called

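## Selecting a specific model
Instead of relying on the default, you can pass a compatible model from the Hugging Face Hub through the `model` argument. Here `distilgpt2`, a smaller distilled version of GPT-2, is used for text generation.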

In [13]:
generator_with_distilgpt2 = pipeline("text-generation", model="distilgpt2")

In [14]:
generator_with_distilgpt2(
    "In this course, we will teach you how to",
    max_length=30,
    num_return_sequences=2,
)

Truncation was not explicitly activated but `max_length` is provided a specific value, please use `truncation=True` to explicitly truncate examples to max length. Defaulting to 'longest_first' truncation strategy. If you encode pairs of sequences (GLUE-style) with the tokenizer you can select this strategy more precisely by providing a specific strategy to `truncation`.
Setting `pad_token_id` to `eos_token_id`:None for open-end generation.


[{'generated_text': 'In this course, we will teach you how to control yourself when choosing a specific set of exercises for use in physical training.\n\n\n\n\n'},
 {'generated_text': 'In this course, we will teach you how to generate your own video file using YouTube video or by using an iOS app in order to view video files'}]

## Mask filling
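The idea of this task is to fill in the blanks in a given text. Each blank is marked with the model's mask token (`<mask>` for this RoBERTa-based model), and `top_k` controls how many candidate fillers are returned.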

In [15]:
unmasker = pipeline("fill-mask")

No model was supplied, defaulted to distilbert/distilroberta-base and revision fb53ab8 (https://huggingface.co/distilbert/distilroberta-base).
Using a pipeline without specifying a model name and revision in production is not recommended.


Some weights of the model checkpoint at distilbert/distilroberta-base were not used when initializing RobertaForMaskedLM: ['roberta.pooler.dense.bias', 'roberta.pooler.dense.weight']
- This IS expected if you are initializing RobertaForMaskedLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing RobertaForMaskedLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


In [16]:
unmasker("This course will teach you all about <mask> models.", top_k=2)

[{'score': 0.19198444485664368,
  'token': 30412,
  'token_str': ' mathematical',
  'sequence': 'This course will teach you all about mathematical models.'},
 {'score': 0.04209214076399803,
  'token': 38163,
  'token_str': ' computational',
  'sequence': 'This course will teach you all about computational models.'}]
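
For each mask, the model scores every token in its vocabulary. The pipeline converts those scores to probabilities and keeps the `top_k` best candidates. The sketch below reproduces that selection over a hypothetical four-word vocabulary with made-up logits. A real RoBERTa token string also carries a leading space (`' mathematical'`), which is omitted here.

```python
import math

def fill_mask(template, candidate_logits, top_k=2, mask="<mask>"):
    """Rank candidate fillers for a single mask and return the top_k completions."""
    m = max(candidate_logits.values())
    exps = {w: math.exp(x - m) for w, x in candidate_logits.items()}
    total = sum(exps.values())
    ranked = sorted(exps.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
    return [
        {"score": e / total, "token_str": w, "sequence": template.replace(mask, w, 1)}
        for w, e in ranked
    ]

# Hypothetical logits over a tiny vocabulary (a real model scores its whole vocabulary).
logits = {"mathematical": 3.0, "computational": 1.5, "fashion": 0.2, "role": 0.8}
for pred in fill_mask("This course will teach you all about <mask> models.", logits):
    print(round(pred["score"], 3), pred["sequence"])
```

The multi-mask cell below shows that the pipeline returns one such list per `<mask>`, each predicted with the other masks still in place.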

In [17]:
unmasker("This product contains milk<mask>chocolate<mask>fruit<mask>juice<mask>water<mask>sugar", top_k=2)

[[{'score': 0.43303748965263367,
   'token': 50118,
   'token_str': '\n',
   'sequence': '<s>This product contains milk\nchocolate<mask>fruit<mask>juice<mask>water<mask>sugar</s>'},
  {'score': 0.14356417953968048,
   'token': 73,
   'token_str': '/',
   'sequence': '<s>This product contains milk/chocolate<mask>fruit<mask>juice<mask>water<mask>sugar</s>'}],
 [{'score': 0.7519227266311646,
   'token': 24410,
   'token_str': ' grape',
   'sequence': '<s>This product contains milk<mask>chocolate grapefruit<mask>juice<mask>water<mask>sugar</s>'},
  {'score': 0.0906941294670105,
   'token': 10267,
   'token_str': ' jack',
   'sequence': '<s>This product contains milk<mask>chocolate jackfruit<mask>juice<mask>water<mask>sugar</s>'}],
 [{'score': 0.24547459185123444,
   'token': 73,
   'token_str': '/',
   'sequence': '<s>This product contains milk<mask>chocolate<mask>fruit/juice<mask>water<mask>sugar</s>'},
  {'score': 0.06125649809837341,
   'token': 24410,
   'token_str': ' grape',
   'sequ

## Named entity recognition
The model has to find which parts of the input text correspond to entities such as persons, locations, or organizations.

In [18]:
# aggregation_strategy="simple" replaces the deprecated grouped_entities=True
ner = pipeline("ner", aggregation_strategy="simple")

No model was supplied, defaulted to dbmdz/bert-large-cased-finetuned-conll03-english and revision 4c53496 (https://huggingface.co/dbmdz/bert-large-cased-finetuned-conll03-english).
Using a pipeline without specifying a model name and revision in production is not recommended.


Some weights of the model checkpoint at dbmdz/bert-large-cased-finetuned-conll03-english were not used when initializing BertForTokenClassification: ['bert.pooler.dense.bias', 'bert.pooler.dense.weight']
- This IS expected if you are initializing BertForTokenClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForTokenClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


In [19]:
ner("My name is Sylvain and I work at Hugging Face in Brooklyn.")


[{'entity_group': 'PER',
  'score': 0.9981694,
  'word': 'Sylvain',
  'start': 11,
  'end': 18},
 {'entity_group': 'ORG',
  'score': 0.9796019,
  'word': 'Hugging Face',
  'start': 33,
  'end': 45},
 {'entity_group': 'LOC',
  'score': 0.9932106,
  'word': 'Brooklyn',
  'start': 49,
  'end': 57}]
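
Without grouping, the model tags each token separately, so "Hugging" and "Face" would appear as two ORG entries. Grouping with `grouped_entities=True` or `aggregation_strategy="simple"` merges consecutive tokens of the same entity type into one entity group. The sketch below is a simplified version of that idea. It assumes word-level tokens with hypothetical tags and scores, averages the token scores, and slices the text using the character offsets. The library's implementation also reassembles subword pieces and offers other strategies.

```python
def group_entities(text, tokens):
    """Merge consecutive tagged tokens of the same type into entity groups.

    tokens: list of (tag, score, start, end); tag is like "B-ORG", "I-ORG" or "O".
    """
    groups = []
    current = None
    for tag, score, start, end in tokens:
        if tag == "O":  # not an entity: close any open group
            current = None
            continue
        prefix, etype = tag.split("-", 1)
        if current is None or prefix == "B" or etype != current["entity_group"]:
            current = {"entity_group": etype, "scores": [], "start": start, "end": end}
            groups.append(current)
        current["scores"].append(score)
        current["end"] = end
    return [
        {
            "entity_group": g["entity_group"],
            "score": sum(g["scores"]) / len(g["scores"]),  # mean of the token scores
            "word": text[g["start"]:g["end"]],
            "start": g["start"],
            "end": g["end"],
        }
        for g in groups
    ]

text = "My name is Sylvain and I work at Hugging Face in Brooklyn."
# Hypothetical per-token predictions (tag, score, start, end).
tokens = [
    ("I-PER", 0.99, 11, 18),  # Sylvain
    ("O", 0.99, 19, 22),      # and
    ("I-ORG", 0.98, 33, 40),  # Hugging
    ("I-ORG", 0.97, 41, 45),  # Face
    ("O", 0.99, 46, 48),      # in
    ("I-LOC", 0.99, 49, 57),  # Brooklyn
]
for entity in group_entities(text, tokens):
    print(entity["entity_group"], entity["word"], entity["start"], entity["end"])
```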

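## Question answering
The model answers a question by extracting the relevant span from a given context; it does not generate new text.
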
In [20]:
question_answerer = pipeline("question-answering")

No model was supplied, defaulted to distilbert/distilbert-base-cased-distilled-squad and revision 564e9b5 (https://huggingface.co/distilbert/distilbert-base-cased-distilled-squad).
Using a pipeline without specifying a model name and revision in production is not recommended.


In [21]:
question_answerer(
    question="Where do I work?",
    context="My name is Sylvain and I work at Hugging Face in Brooklyn",
)

{'score': 0.6949769854545593, 'start': 33, 'end': 45, 'answer': 'Hugging Face'}
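
The `start` and `end` values are character offsets into the context, so `context[33:45]` is `'Hugging Face'`. An extractive QA model predicts, for each token, how likely it is to start or end the answer. The pipeline then picks the best valid span with start not after end and a bounded length, similar to its `max_answer_len` argument. The sketch below assumes whitespace-separated words as tokens and hypothetical logits. The real pipeline works on subword tokens and excludes the question tokens.

```python
import math
import re

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def extract_answer(context, start_logits, end_logits, max_answer_len=15):
    """Pick the token span (i, j), i <= j, maximizing P(start=i) * P(end=j)."""
    spans = [(m.start(), m.end()) for m in re.finditer(r"\S+", context)]  # words + char offsets
    p_start, p_end = softmax(start_logits), softmax(end_logits)
    best_i, best_j, best_score = 0, 0, -1.0
    for i in range(len(spans)):
        for j in range(i, min(i + max_answer_len, len(spans))):
            score = p_start[i] * p_end[j]
            if score > best_score:
                best_i, best_j, best_score = i, j, score
    start, end = spans[best_i][0], spans[best_j][1]
    return {"score": best_score, "start": start, "end": end, "answer": context[start:end]}

context = "My name is Sylvain and I work at Hugging Face in Brooklyn"
# Hypothetical logits, one per word: "Hugging" looks like a start, "Face" like an end.
start_logits = [0.0] * 12
end_logits = [0.0] * 12
start_logits[8] = 5.0  # "Hugging"
end_logits[9] = 5.0    # "Face"
print(extract_answer(context, start_logits, end_logits))
```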

In [26]:
question_answerer(
    question="What is Lübeck known for?",
    context="The Hanseatic city of Lübeck (Low German: Lübęk, Lübeek; Latin: Lubeca; Polish: Liubice; adjective: lübsch, lübisch, lübeckisch) is an independent city in northern Germany. It is located in the south-east of Schleswig-Holstein on the Bay of Lübeck, a bay of the Baltic Sea. Lübeck is the second largest city in Schleswig-Holstein in terms of population after the state capital Kiel and the largest in terms of area. The Hanseatic city on the Trave is one of the state's four regional centers, with a port, a specialized university with a hospital, a technical university and a music academy. The Lübeck Theater, the Music and Congress Hall and fifteen museums offer a rich cultural offering. Lübeck was founded in 1143 at its present location and was a city-state from 1226 to 1937. As the capital of the Hanseatic League, Lübeck was one of the most important cities in Northern Europe in the 13th and 14th centuries. The preserved areas of Lübeck's old town with over a thousand cultural monuments have been a UNESCO World Heritage Site since 1987. These include some important buildings of medieval brick architecture, such as St. Mary's Church in Lübeck, the town hall, one of the oldest cathedrals on the Baltic Sea, the Holy Ghost Hospital, the Holsten Gate and the castle gate. The city is also famous for its seven towers and Lübeck marzipan.",
)

{'score': 0.43178874254226685,
 'start': 1316,
 'end': 1328,
 'answer': 'seven towers'}

In [27]:
question_answerer(
    question="Which food is famous in Lübeck?",
    context="The Hanseatic city of Lübeck (Low German: Lübęk, Lübeek; Latin: Lubeca; Polish: Liubice; adjective: lübsch, lübisch, lübeckisch) is an independent city in northern Germany. It is located in the south-east of Schleswig-Holstein on the Bay of Lübeck, a bay of the Baltic Sea. Lübeck is the second largest city in Schleswig-Holstein in terms of population after the state capital Kiel and the largest in terms of area. The Hanseatic city on the Trave is one of the state's four regional centers, with a port, a specialized university with a hospital, a technical university and a music academy. The Lübeck Theater, the Music and Congress Hall and fifteen museums offer a rich cultural offering. Lübeck was founded in 1143 at its present location and was a city-state from 1226 to 1937. As the capital of the Hanseatic League, Lübeck was one of the most important cities in Northern Europe in the 13th and 14th centuries. The preserved areas of Lübeck's old town with over a thousand cultural monuments have been a UNESCO World Heritage Site since 1987. These include some important buildings of medieval brick architecture, such as St. Mary's Church in Lübeck, the town hall, one of the oldest cathedrals on the Baltic Sea, the Holy Ghost Hospital, the Holsten Gate and the castle gate. The city is also famous for its seven towers and Lübeck marzipan.",
)

{'score': 0.8702612519264221,
 'start': 1333,
 'end': 1348,
 'answer': 'Lübeck marzipan'}

## Summarization
Summarization is the task of reducing a text to a shorter one while keeping all (or most) of its important points.

In [28]:
summarizer = pipeline("summarization")

No model was supplied, defaulted to sshleifer/distilbart-cnn-12-6 and revision a4f8f3e (https://huggingface.co/sshleifer/distilbart-cnn-12-6).
Using a pipeline without specifying a model name and revision in production is not recommended.


In [30]:
summarizer(
    "Costumed men beat women with cow horns: this brutal custom, the “Klaasohm” festival, has a long tradition on Borkum. Following a Panorama report, there is great outrage - and the organizers are taking action. After a report about a brutal St. Nicholas custom on the North Sea island of Borkum, the organizers and the police have announced consequences. The police announced that there would be no more violence at the “Klaasohm” festival. The event will be accompanied by numerous police officers. “We have a zero-tolerance policy,” said a police spokesperson, and the police encouraged women who have experienced violence during the custom to press charges. “Anyone who has been a victim should not be afraid,” emphasized the police spokesperson. Offenses such as assault or grievous bodily harm are only time-barred after 20 to 30 years. Beatings with cow horns. The “Klaasohm” festival on the evening of December 5 has a long tradition on Borkum. Part of the festival is a procession in which women are caught and beaten with cow horns by costumed men, the “Klaasohms”, after a report on the tradition by the ARD magazine Panorama caused outrage across Germany. ",
    max_length=100,
    min_length=50,
)

[{'summary_text': ' A report about a brutal St. Nicholas custom on the island of Borkum caused outrage across Germany . The police announced that there would be no more violence at the “Klaasohm” festival . Police encouraged women who have experienced violence during the custom to press charges .'}]
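
The default model, `sshleifer/distilbart-cnn-12-6`, is abstractive: it writes its own sentences and does not copy them from the input. For contrast, the sketch below shows a different and much cruder technique, extractive summarization. It scores each sentence by how frequent its words are in the whole text and keeps the best ones in their original order. This is not what the pipeline does. The scoring rule and the short example text are made up for illustration.

```python
import re
from collections import Counter

def extractive_summary(text, num_sentences=2):
    """Keep the num_sentences sentences whose words are most frequent in the text."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s.strip()]
    words = re.findall(r"\w+", text.lower())
    freq = Counter(w for w in words if len(w) > 3)  # crude stop-word filter: skip short words

    def score(sentence):
        tokens = re.findall(r"\w+", sentence.lower())
        return sum(freq[t] for t in tokens) / (len(tokens) or 1)  # average word frequency

    top = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)[:num_sentences]
    return " ".join(sentences[i] for i in sorted(top))  # restore original order

# Made-up example text.
text = (
    "The festival has a long tradition on the island. "
    "Police announced a zero-tolerance policy for the festival. "
    "The weather was mild. "
    "Police asked festival victims to report violence."
)
summary = extractive_summary(text, num_sentences=2)
print(summary)
```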

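## Translation
Here the model is chosen explicitly: `Helsinki-NLP/opus-mt-fr-en` translates French text into English.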

In [35]:
translator = pipeline("translation", model="Helsinki-NLP/opus-mt-fr-en")

In [36]:
translator("Ce cours est produit par Hugging Face.")

[{'translation_text': 'This course is produced by Hugging Face.'}]
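
The Helsinki-NLP models on the Hub follow an `opus-mt-{source}-{target}` naming pattern, so switching language pairs is mostly a matter of changing the model id. The helper below is a hypothetical convenience function that only builds the string. Not every language pair is published, so check the Hub before relying on a given id.

```python
def opus_mt_model_id(src, tgt):
    """Build a Helsinki-NLP model id following the opus-mt-{src}-{tgt} naming pattern."""
    return f"Helsinki-NLP/opus-mt-{src}-{tgt}"

print(opus_mt_model_id("fr", "en"))  # Helsinki-NLP/opus-mt-fr-en
```

For example, `pipeline("translation", model=opus_mt_model_id("de", "en"))` would load a German-to-English model, provided that checkpoint exists on the Hub.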