# Hugging Face Transformers Pipeline Examples
This notebook walks through examples of several pipelines from the Hugging Face Transformers library, inspired by the [Hugging Face NLP Course](https://huggingface.co/learn/nlp-course/en/chapter1/3).
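Most cells below log "No model was supplied, defaulted to ..." and then warn that using a pipeline without a model name and revision is not recommended in production. One way to act on that warning is to pass `model` and `revision` explicitly. The mapping below is a hypothetical helper (`PINNED_MODELS` and `pinned_kwargs` are our names, not part of Transformers). It is built from the model ids and abbreviated revision hashes printed in this notebook's logs. The full commit id shown in each model's history on the Hub is the unambiguous value to pin.

```python
# Model ids and abbreviated revisions copied from this notebook's
# "No model was supplied, defaulted to ..." log lines (hypothetical helper).
# Prefer the full commit id from each model's Hub history when pinning.
PINNED_MODELS = {
    "feature-extraction": ("distilbert/distilbert-base-cased", "6ea8117"),
    "fill-mask": ("distilbert/distilroberta-base", "fb53ab8"),
    "ner": ("dbmdz/bert-large-cased-finetuned-conll03-english", "4c53496"),
    "question-answering": ("distilbert/distilbert-base-cased-distilled-squad", "564e9b5"),
    "sentiment-analysis": ("distilbert/distilbert-base-uncased-finetuned-sst-2-english", "714eb0f"),
    "summarization": ("sshleifer/distilbart-cnn-12-6", "a4f8f3e"),
    "text-generation": ("openai-community/gpt2", "607a30d"),
    "zero-shot-classification": ("facebook/bart-large-mnli", "d7645e1"),
}


def pinned_kwargs(task):
    """Return keyword arguments for transformers.pipeline() with model and revision set."""
    model, revision = PINNED_MODELS[task]
    return {"task": task, "model": model, "revision": revision}


print(pinned_kwargs("fill-mask"))
```

For example, `pipeline(**pinned_kwargs("fill-mask"))` should load the same checkpoint that the fill-mask cells fell back to, assuming the revision string resolves on the Hub.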

## Feature Extraction

In [None]:
from transformers import pipeline

feature_extractor = pipeline("feature-extraction")
text = "Transformers are powerful models for natural language processing."
features = feature_extractor(text)
print(features)


No model was supplied, defaulted to distilbert/distilbert-base-cased and revision 6ea8117 (https://huggingface.co/distilbert/distilbert-base-cased).
Using a pipeline without specifying a model name and revision in production is not recommended.
Device set to use cpu


[[[0.5115844011306763, 0.04001573100686073, 0.048634883016347885, -0.31373000144958496, -0.5183566808700562, -0.13176031410694122, 0.17424488067626953, -0.07944492995738983, 0.2053682655096054, -1.2887464761734009, -0.5515176653862, 0.006443561986088753, -0.13175353407859802, 0.05151928588747978, -0.6901046633720398, 0.007790588308125734, 0.3300333023071289, 0.2608015239238739, -0.0473315455019474, -0.2897743284702301, 0.12167643755674362, -0.16516034305095673, 0.5798889994621277, -0.27727168798446655, 0.1875155121088028, -0.07988892495632172, 0.4408264756202698, 0.12077152729034424, -0.15416131913661957, 0.2977027893066406, -0.015281862579286098, 0.23542667925357819, 0.005821977276355028, -0.04860702529549599, -0.31440430879592896, 0.12098716199398041, -0.08264835178852081, -0.33412182331085205, -0.19794556498527527, -0.11043064296245575, -0.39108461141586304, 0.15656737983226776, 0.7110542058944702, -0.11396275460720062, 0.07718120515346527, -0.3446686267852783, -0.05199480429291725,

In [None]:
from transformers import pipeline

feature_extractor = pipeline("feature-extraction")
text = "CPP is a great college."
features = feature_extractor(text)
print(features)

No model was supplied, defaulted to distilbert/distilbert-base-cased and revision 6ea8117 (https://huggingface.co/distilbert/distilbert-base-cased).
Using a pipeline without specifying a model name and revision in production is not recommended.
Device set to use cpu


[[[0.3971414566040039, 0.0387209914624691, -0.15291063487529755, -0.04688560217618942, -0.2999558448791504, -0.12602372467517853, 0.18216072022914886, -0.01665571704506874, -0.07056694477796555, -0.9736645221710205, -0.1956232637166977, -0.04907802492380142, -0.25109541416168213, -0.0782897025346756, -0.4497322738170624, 0.023899663239717484, 0.13205568492412567, -0.03560547158122063, -0.07035867869853973, 0.04044870659708977, 0.0014224074548110366, -0.21667976677417755, 0.49718406796455383, -0.2876790463924408, 0.27582797408103943, -0.11969372630119324, 0.2426980435848236, 0.3697463274002075, -0.15134930610656738, 0.19788649678230286, -0.22976331412792206, 0.2842572033405304, -0.09884006530046463, 0.11169132590293884, -0.301175981760025, 0.40446737408638, 0.10747236013412476, -0.3289119005203247, -0.0061807939782738686, -0.16862930357456207, -0.28093552589416504, 0.23115859925746918, 0.5681968927383423, 0.11964476108551025, -0.07357995212078094, -0.5085370540618896, 0.0397261120378971
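The feature-extraction output is a nested list with one vector per token, shaped roughly `[inputs][tokens][hidden_size]`. A common (but not the only) way to turn that into a single sentence vector is mean pooling, after which two sentences can be compared with cosine similarity. This is a minimal sketch on toy numbers, not real model output. The helper names `mean_pool` and `cosine_similarity` are ours. Note that averaging every position also includes the special tokens (such as `[CLS]` and `[SEP]`) that the tokenizer adds.

```python
import math


def mean_pool(features):
    """Average the token vectors of the first input: features[0] is [tokens][hidden]."""
    tokens = features[0]
    hidden = len(tokens[0])
    return [sum(tok[d] for tok in tokens) / len(tokens) for d in range(hidden)]


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy stand-ins for pipeline output: one input each, hidden size 3.
toy_a = [[[1.0, 0.0, 2.0], [3.0, 2.0, 0.0]]]
toy_b = [[[2.0, 1.0, 1.0], [2.0, 1.0, 1.0], [2.0, 1.0, 1.0]]]
vec_a = mean_pool(toy_a)
vec_b = mean_pool(toy_b)
print(vec_a, cosine_similarity(vec_a, vec_b))
```

With real output you would pass `features` from the cells above straight into `mean_pool`. Pooling at the model level with a library such as NumPy or PyTorch is faster, but the arithmetic is the same.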

## Fill-Mask

In [None]:
from transformers import pipeline

unmasker = pipeline("fill-mask")
unmasker("This course will teach you all about <mask> models.", top_k=2)

No model was supplied, defaulted to distilbert/distilroberta-base and revision fb53ab8 (https://huggingface.co/distilbert/distilroberta-base).
Using a pipeline without specifying a model name and revision in production is not recommended.
Some weights of the model checkpoint at distilbert/distilroberta-base were not used when initializing RobertaForMaskedLM: ['roberta.pooler.dense.bias', 'roberta.pooler.dense.weight']
- This IS expected if you are initializing RobertaForMaskedLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing RobertaForMaskedLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Device set to use cpu


[{'score': 0.19619767367839813,
  'token': 30412,
  'token_str': ' mathematical',
  'sequence': 'This course will teach you all about mathematical models.'},
 {'score': 0.04052715748548508,
  'token': 38163,
  'token_str': ' computational',
  'sequence': 'This course will teach you all about computational models.'}]

In [None]:
from transformers import pipeline

unmasker = pipeline("fill-mask")
unmasker("Transformers are revolutionizing <mask> processing.", top_k=3)


No model was supplied, defaulted to distilbert/distilroberta-base and revision fb53ab8 (https://huggingface.co/distilbert/distilroberta-base).
Using a pipeline without specifying a model name and revision in production is not recommended.
Some weights of the model checkpoint at distilbert/distilroberta-base were not used when initializing RobertaForMaskedLM: ['roberta.pooler.dense.bias', 'roberta.pooler.dense.weight']
- This IS expected if you are initializing RobertaForMaskedLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing RobertaForMaskedLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Device set to use cpu


[{'score': 0.2031552940607071,
  'token': 6029,
  'token_str': ' signal',
  'sequence': 'Transformers are revolutionizing signal processing.'},
 {'score': 0.1361866444349289,
  'token': 2274,
  'token_str': ' image',
  'sequence': 'Transformers are revolutionizing image processing.'},
 {'score': 0.09854312241077423,
  'token': 414,
  'token_str': ' data',
  'sequence': 'Transformers are revolutionizing data processing.'}]
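As we understand the fill-mask pipeline, each `score` is a probability from a softmax over the model's whole vocabulary at the `<mask>` position, and `top_k` keeps the highest-scoring tokens. That is why the returned scores do not sum to 1. The sketch below reproduces that ranking over a tiny, invented vocabulary. The helper `fill_mask_top_k` and its logits are hypothetical.

```python
import math


def softmax(logits):
    """Convert raw scores to probabilities (subtract the max for numerical stability)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fill_mask_top_k(template, vocab_logits, top_k=2):
    """Rank candidate tokens for <mask> by softmax probability over a toy vocabulary."""
    tokens = list(vocab_logits)
    probs = softmax([vocab_logits[t] for t in tokens])
    ranked = sorted(zip(tokens, probs), key=lambda pair: pair[1], reverse=True)[:top_k]
    return [
        {"score": p, "token_str": t, "sequence": template.replace("<mask>", t)}
        for t, p in ranked
    ]


# Hypothetical logits; a real model scores its entire vocabulary (~50k tokens for RoBERTa).
toy_logits = {"signal": 2.0, "image": 1.6, "data": 1.3, "banana": -3.0}
template = "Transformers are revolutionizing <mask> processing."
for candidate in fill_mask_top_k(template, toy_logits, top_k=3):
    print(candidate)
```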

## Named Entity Recognition (NER)

In [None]:
from transformers import pipeline

ner = pipeline("ner", aggregation_strategy="simple")
ner("My name is Sylvain and I work at Hugging Face in Brooklyn.")

No model was supplied, defaulted to dbmdz/bert-large-cased-finetuned-conll03-english and revision 4c53496 (https://huggingface.co/dbmdz/bert-large-cased-finetuned-conll03-english).
Using a pipeline without specifying a model name and revision in production is not recommended.
Some weights of the model checkpoint at dbmdz/bert-large-cased-finetuned-conll03-english were not used when initializing BertForTokenClassification: ['bert.pooler.dense.bias', 'bert.pooler.dense.weight']
- This IS expected if you are initializing BertForTokenClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForTokenClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Device set to use cpu


[{'entity_group': 'PER',
  'score': np.float32(0.9981694),
  'word': 'Sylvain',
  'start': 11,
  'end': 18},
 {'entity_group': 'ORG',
  'score': np.float32(0.9796019),
  'word': 'Hugging Face',
  'start': 33,
  'end': 45},
 {'entity_group': 'LOC',
  'score': np.float32(0.9932106),
  'word': 'Brooklyn',
  'start': 49,
  'end': 57}]

In [None]:
ner = pipeline("ner", aggregation_strategy="simple")
ner("Barack Obama was the 44th president of the United States.")

No model was supplied, defaulted to dbmdz/bert-large-cased-finetuned-conll03-english and revision 4c53496 (https://huggingface.co/dbmdz/bert-large-cased-finetuned-conll03-english).
Using a pipeline without specifying a model name and revision in production is not recommended.
Some weights of the model checkpoint at dbmdz/bert-large-cased-finetuned-conll03-english were not used when initializing BertForTokenClassification: ['bert.pooler.dense.bias', 'bert.pooler.dense.weight']
- This IS expected if you are initializing BertForTokenClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForTokenClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Device set to use cpu


[{'entity_group': 'PER',
  'score': np.float32(0.99918306),
  'word': 'Barack Obama',
  'start': 0,
  'end': 12},
 {'entity_group': 'LOC',
  'score': np.float32(0.9986908),
  'word': 'United States',
  'start': 43,
  'end': 56}]
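Entity grouping (`grouped_entities=True`, or `aggregation_strategy="simple"` in newer Transformers releases) merges adjacent tokens that carry the same entity type into one span. A new span starts on a `B-` tag or a type change. The grouped score is roughly the mean of the token scores. The sketch below approximates that behavior. The tags, scores and word-level tokens are invented (the real BERT model tags WordPiece sub-tokens), and here `word` is taken by slicing the text with the character offsets, whereas the pipeline builds it from the tokenizer.

```python
def group_entities(text, tokens):
    """Merge adjacent tokens with the same entity type into one entity span.

    Each token is a dict with 'entity' (e.g. 'B-ORG', 'I-ORG' or 'O'),
    'score', and character offsets 'start'/'end' into `text`.
    """
    groups = []
    current = None
    for tok in tokens:
        tag = tok["entity"]
        if tag == "O":
            current = None  # an 'O' token closes any open entity
            continue
        prefix, etype = tag.split("-", 1)
        # Start a new group on a B- tag, a type change, or after an 'O'.
        if current is None or prefix == "B" or etype != current["entity_group"]:
            current = {"entity_group": etype, "scores": [], "start": tok["start"]}
            groups.append(current)
        current["scores"].append(tok["score"])
        current["end"] = tok["end"]
    return [
        {
            "entity_group": g["entity_group"],
            "score": sum(g["scores"]) / len(g["scores"]),  # mean of token scores
            "word": text[g["start"]:g["end"]],
            "start": g["start"],
            "end": g["end"],
        }
        for g in groups
    ]


sentence = "My name is Sylvain and I work at Hugging Face in Brooklyn."
# Hypothetical word-level predictions; offsets match the sentence above.
toy_tokens = [
    {"entity": "B-PER", "score": 0.99, "start": 11, "end": 18},
    {"entity": "B-ORG", "score": 0.98, "start": 33, "end": 40},
    {"entity": "I-ORG", "score": 0.96, "start": 41, "end": 45},
    {"entity": "O", "score": 0.99, "start": 46, "end": 48},
    {"entity": "B-LOC", "score": 0.99, "start": 49, "end": 57},
]
for entity in group_entities(sentence, toy_tokens):
    print(entity)
```

The character offsets produced this way (for example 33 to 45 for "Hugging Face") line up with the `start`/`end` values in the first NER output above.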

## Question Answering

In [None]:
question_answerer = pipeline("question-answering")
question_answerer(
    question="Where is Hugging Face based?",
    context="Hugging Face Inc. is a company based in New York City.",
)

No model was supplied, defaulted to distilbert/distilbert-base-cased-distilled-squad and revision 564e9b5 (https://huggingface.co/distilbert/distilbert-base-cased-distilled-squad).
Using a pipeline without specifying a model name and revision in production is not recommended.
Device set to use cpu


{'score': 0.9631800651550293,
 'start': 40,
 'end': 53,
 'answer': 'New York City'}

In [None]:
question_answerer = pipeline("question-answering")
question_answerer(
    question="What college do I go to?",
    context="My name is Clarissa and I am currently a student at Cal Poly Pomona.",
)

No model was supplied, defaulted to distilbert/distilbert-base-cased-distilled-squad and revision 564e9b5 (https://huggingface.co/distilbert/distilbert-base-cased-distilled-squad).
Using a pipeline without specifying a model name and revision in production is not recommended.
Device set to use cpu


{'score': 0.9737498760223389,
 'start': 52,
 'end': 67,
 'answer': 'Cal Poly Pomona'}
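The question-answering pipeline is extractive: the model predicts a start logit and an end logit for every context token, and the pipeline returns the span whose combined start/end probability is highest. This is a simplified sketch of that selection step. The real pipeline also masks out question tokens, splits long contexts into overlapping windows, and maps tokens back to character offsets. The tokens and logits below are invented, and `best_span` is a hypothetical helper.

```python
import math


def softmax(logits):
    """Convert raw scores to probabilities (subtract the max for numerical stability)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def best_span(start_logits, end_logits, max_answer_len=15):
    """Return (start_idx, end_idx, score) maximizing P(start) * P(end) with start <= end."""
    p_start = softmax(start_logits)
    p_end = softmax(end_logits)
    best = (0, 0, -1.0)
    for i, ps in enumerate(p_start):
        # Only consider ends at or after the start, within the length limit.
        for j in range(i, min(i + max_answer_len, len(p_end))):
            score = ps * p_end[j]
            if score > best[2]:
                best = (i, j, score)
    return best


tokens = ["Hugging", "Face", "Inc.", "is", "based", "in", "New", "York", "City", "."]
start_logits = [0.5, 0.1, 0.0, 0.0, 0.0, 0.2, 4.0, 0.5, 0.3, 0.0]
end_logits = [0.0, 0.3, 0.2, 0.0, 0.0, 0.0, 0.4, 0.8, 4.0, 0.1]
i, j, score = best_span(start_logits, end_logits)
print(" ".join(tokens[i:j + 1]), round(score, 3))
```

The `start` and `end` fields in the outputs above are character offsets into `context`: `context[40:53]` is `'New York City'` in the first example and `context[52:67]` is `'Cal Poly Pomona'` in the second.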

## Sentiment Analysis

In [None]:
sentiment = pipeline("sentiment-analysis")
print(sentiment("I love watching movies with my family!"))

No model was supplied, defaulted to distilbert/distilbert-base-uncased-finetuned-sst-2-english and revision 714eb0f (https://huggingface.co/distilbert/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.
Device set to use cpu


[{'label': 'POSITIVE', 'score': 0.9998006224632263}]


In [None]:
sentiment = pipeline("sentiment-analysis")
print(sentiment("This movie was terrible and boring."))

No model was supplied, defaulted to distilbert/distilbert-base-uncased-finetuned-sst-2-english and revision 714eb0f (https://huggingface.co/distilbert/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.
Device set to use cpu


[{'label': 'NEGATIVE', 'score': 0.9997683167457581}]
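Each sentiment result is a label plus a probability. The classifier emits one logit per class, the pipeline applies a softmax, and it reports the most probable class. The sketch below assumes the SST-2 checkpoint's label mapping `{0: "NEGATIVE", 1: "POSITIVE"}`. The logits in the example call are invented, and `classify` is a hypothetical helper.

```python
import math

# Assumed label order for the default SST-2 checkpoint.
ID2LABEL = {0: "NEGATIVE", 1: "POSITIVE"}


def classify(logits, id2label=ID2LABEL):
    """Softmax the logits and return the top label with its probability."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(probs)), key=lambda i: probs[i])
    return {"label": id2label[best], "score": probs[best]}


print(classify([-4.2, 4.3]))  # hypothetical logits for a clearly positive sentence
```

With only two classes, the reported score is never below 0.5, so a score near 0.5 signals an uncertain prediction rather than a neutral one.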


## Summarization

In [None]:
from transformers import pipeline

summarizer = pipeline("summarization")
summarizer(
    """
Soccer, known as football in most parts of the world, is the most popular and widely followed sport globally, with billions of fans and participants. The game is played between two teams of eleven players on a rectangular field with a goal at each end. It is governed by a set of rules known as the Laws of the Game, which are maintained by the International Football Association Board. Soccer is admired not only for its simplicity but also for the deep strategy and teamwork it requires. From grassroots amateur leagues to elite professional clubs, the sport serves as a unifying force across nations, cultures, and languages. Major international tournaments such as the FIFA World Cup, held every four years, and continental competitions like the UEFA European Championship and Copa América, attract massive global audiences and foster intense national pride. The sport's accessibility, requiring little more than a ball and open space, contributes to its universal appeal. Soccer has also evolved into a massive commercial enterprise, with top clubs and players generating significant revenues through sponsorships, broadcasting rights, and merchandise. Beyond the financial aspect, the sport has a profound social and cultural impact, often acting as a platform for social change, community development, and the promotion of health and fitness. With the continued rise of women’s soccer and technological advancements such as VAR (Video Assistant Referee), the game is constantly evolving, yet its essence—a shared passion for the beautiful game—remains unchanged.
"""
)

No model was supplied, defaulted to sshleifer/distilbart-cnn-12-6 and revision a4f8f3e (https://huggingface.co/sshleifer/distilbart-cnn-12-6).
Using a pipeline without specifying a model name and revision in production is not recommended.


Device set to use cpu


[{'summary_text': ' Soccer is admired not only for its simplicity but also for its deep strategy and teamwork it requires . The game is played between two teams of eleven players on a rectangular field with a goal at each end . Major international tournaments such as the FIFA World Cup attract massive global audiences and foster intense national pride .'}]

In [None]:
from transformers import pipeline

summarizer = pipeline("summarization")
summarizer(
    """
Mexico, officially known as the United Mexican States, is a vibrant and culturally rich country located in the southern portion of North America. It shares its northern border with the United States, and its southern region connects with Central American countries like Guatemala and Belize. Known for its diverse landscapes, Mexico is home to deserts, mountains, jungles, and coastlines along both the Pacific Ocean and the Gulf of Mexico. The country has a deep historical legacy, dating back thousands of years to the civilizations of the Olmec, Maya, and Aztec, whose influence is still visible today in architecture, traditions, and archaeological sites such as Chichen Itza and Teotihuacan. Modern Mexico is a blend of indigenous heritage and Spanish colonial influence, reflected in its language, cuisine, music, and religious practices. Cities like Mexico City, Guadalajara, and Monterrey are major economic and cultural hubs, showcasing a mixture of ancient tradition and contemporary urban development. Mexican cuisine, recognized by UNESCO as an Intangible Cultural Heritage of Humanity, is known for its bold flavors and regional diversity, with dishes like tacos, tamales, mole, and pozole. The country also celebrates colorful festivals such as Día de los Muertos (Day of the Dead), which honors deceased loved ones with altars, marigolds, and offerings. Despite facing social and economic challenges, including inequality and issues related to organized crime, Mexico continues to grow as a key player in global trade, tourism, and cultural exchange. Its people are known for their warmth, resilience, and pride in a national identity that bridges ancient civilizations and modern aspirations.
"""
)

No model was supplied, defaulted to sshleifer/distilbart-cnn-12-6 and revision a4f8f3e (https://huggingface.co/sshleifer/distilbart-cnn-12-6).
Using a pipeline without specifying a model name and revision in production is not recommended.
Device set to use cpu


[{'summary_text': ' Mexico, officially known as the United Mexican States, is a vibrant and culturally rich country . Modern Mexico is a blend of indigenous heritage and Spanish colonial influence . Cities like Mexico City, Guadalajara, and Monterrey are major economic and cultural hubs . Despite facing social and economic challenges, Mexico continues to grow as a key player in global trade, tourism .'}]
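Both summaries contain stray spaces before punctuation (`' .'`) and a leading space. If you want cleaner display text, a small regex post-processing step is one option. This is a cosmetic sketch (the helper name `tidy_summary` is ours) and does not change what the model produced.

```python
import re


def tidy_summary(text):
    """Remove whitespace before punctuation and trim surrounding whitespace."""
    return re.sub(r"\s+([.,;:!?])", r"\1", text).strip()


raw = " Soccer is admired not only for its simplicity . The game is played between two teams ."
print(tidy_summary(raw))
```

Summary length can also be steered at call time, for example `summarizer(text, max_length=60, min_length=20, do_sample=False)`. The numbers here are arbitrary and are measured in tokens.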

## Text Generation

In [None]:
from transformers import pipeline

generator = pipeline("text-generation")
generator("In this course, we will teach you how to")

No model was supplied, defaulted to openai-community/gpt2 and revision 607a30d (https://huggingface.co/openai-community/gpt2).
Using a pipeline without specifying a model name and revision in production is not recommended.
Device set to use cpu
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[{'generated_text': 'In this course, we will teach you how to create and complete the online project to take advantage of this powerful database that will make all the difference!\n\nAs we explore what you will need, you can check out the full plan that this online'}]

In [None]:
generator = pipeline("text-generation")
generator("AI will take over the")

No model was supplied, defaulted to openai-community/gpt2 and revision 607a30d (https://huggingface.co/openai-community/gpt2).
Using a pipeline without specifying a model name and revision in production is not recommended.
Device set to use cpu
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[{'generated_text': "AI will take over the reins of the organisation by 2020, making it the'most efficient and responsive' IT provider.\n\nOn Thursday night, the IT department in Islamabad, Pakistan's leading IT outfit, said the Indian National Security Council had sent"}]
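The GPT-2 continuations above will most likely differ on every run because the pipeline samples the next token under GPT-2's default text-generation settings instead of always taking the most likely one. The sketch below shows temperature scaling plus top-k sampling over a toy vocabulary. The defaults `temperature=1.0` and `top_k=50` mirror those of Transformers' `generate()`. The logits and the helper name `sample_next_token` are hypothetical.

```python
import math
import random


def sample_next_token(logits, temperature=1.0, top_k=50, rng=random):
    """Pick a token index from logits using temperature scaling and top-k filtering."""
    # Keep only the k highest-scoring candidate indices.
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:top_k]
    # Lower temperature sharpens the distribution; higher temperature flattens it.
    scaled = [logits[i] / temperature for i in top]
    m = max(scaled)
    weights = [math.exp(s - m) for s in scaled]  # unnormalized softmax weights
    return rng.choices(top, weights=weights, k=1)[0]


toy_logits = [2.0, 1.5, 0.3, -1.0]  # hypothetical scores for a 4-token vocabulary
rng = random.Random(0)
print([sample_next_token(toy_logits, temperature=0.8, top_k=3, rng=rng) for _ in range(10)])
```

Setting `top_k=1` reduces the sketch to greedy decoding. In Transformers itself, `do_sample=False` requests greedy decoding and `transformers.set_seed(...)` makes sampled runs repeatable.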

## Translation

In [None]:
from transformers import pipeline
# Create a translation pipeline from English to Spanish
translator = pipeline("translation", model="Helsinki-NLP/opus-mt-en-es")
translator("CS4650 is an interesting topic.")


Device set to use cpu


[{'translation_text': 'CS4650 es un tema interesante.'}]

In [None]:
from transformers import pipeline
# Create a translation pipeline from English to Chinese
translator = pipeline("translation", model="Helsinki-NLP/opus-mt-en-zh")
# Translate an English sentence to Chinese
result = translator("Messi is the greatest player alive in soccer.")
print(result)


Device set to use cpu


[{'translation_text': '梅西是足球界最伟大的球员'}]
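Both translation cells name an OPUS-MT checkpoint explicitly, and the two ids follow the same `Helsinki-NLP/opus-mt-{source}-{target}` pattern. A tiny helper (hypothetical name `opus_mt_model_id`) makes the pattern explicit. Not every language pair has a published model, so check the Hub before relying on a generated id.

```python
def opus_mt_model_id(src, tgt):
    """Build a Helsinki-NLP OPUS-MT model id for a source/target language pair."""
    return f"Helsinki-NLP/opus-mt-{src}-{tgt}"


for target in ["es", "zh"]:
    print(opus_mt_model_id("en", target))
```

For instance, `pipeline("translation", model=opus_mt_model_id("en", "es"))` reproduces the first translation cell. Passing a list of sentences to the resulting translator returns one `{'translation_text': ...}` dict per sentence.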


## Zero-Shot Classification

In [None]:
from transformers import pipeline

classifier = pipeline("zero-shot-classification")
classifier(
    "This is a course about natural language processing.",
    candidate_labels=["education", "politics", "business"]
)


No model was supplied, defaulted to facebook/bart-large-mnli and revision d7645e1 (https://huggingface.co/facebook/bart-large-mnli).
Using a pipeline without specifying a model name and revision in production is not recommended.


Device set to use cpu


{'sequence': 'This is a course about natural language processing.',
 'labels': ['education', 'business', 'politics'],
 'scores': [0.7542569041252136, 0.18482503294944763, 0.06091810017824173]}

In [None]:
classifier = pipeline("zero-shot-classification")
classifier(
    "I love watching football on weekends.",
    candidate_labels=["sports", "cooking", "education"],
)

No model was supplied, defaulted to facebook/bart-large-mnli and revision d7645e1 (https://huggingface.co/facebook/bart-large-mnli).
Using a pipeline without specifying a model name and revision in production is not recommended.
Device set to use cpu


{'sequence': 'I love watching football on weekends.',
 'labels': ['sports', 'cooking', 'education'],
 'scores': [0.9976385831832886, 0.0013419223250821233, 0.0010195373324677348]}
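Zero-shot classification with `facebook/bart-large-mnli` reuses a natural-language-inference model. Each candidate label becomes a hypothesis (the pipeline's default template is `"This example is {}."`), and the model scores whether the input entails it. With the default `multi_label=False`, the entailment scores are normalized across labels with a softmax, which is why each output above has scores that sum to roughly 1, sorted from highest to lowest. The sketch below mimics that final step. The entailment logits are invented, and `zero_shot_scores` is a hypothetical helper.

```python
import math

HYPOTHESIS_TEMPLATE = "This example is {}."


def zero_shot_scores(entailment_logits, candidate_labels, template=HYPOTHESIS_TEMPLATE):
    """Softmax entailment logits across labels and sort, like single-label zero-shot output."""
    hypotheses = [template.format(label) for label in candidate_labels]
    m = max(entailment_logits)
    exps = [math.exp(x - m) for x in entailment_logits]
    total = sum(exps)
    ranked = sorted(
        zip(candidate_labels, (e / total for e in exps)),
        key=lambda pair: pair[1],
        reverse=True,
    )
    return hypotheses, {"labels": [label for label, _ in ranked], "scores": [s for _, s in ranked]}


hyps, out = zero_shot_scores([2.5, -0.5, 0.9], ["education", "politics", "business"])
print(hyps)
print(out)
```

With `multi_label=True`, the pipeline instead scores each label independently (entailment against contradiction), so the scores no longer need to sum to 1.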