# What is a pipeline?


In the context of the Hugging Face transformers library, a pipeline is a high-level API that simplifies the process of using pre-trained models for various natural language processing (NLP) tasks. It encapsulates complex processes such as tokenization, model loading, inference, and post-processing into a single, easy-to-use interface.

The `pipeline` function lets you create pipelines for a wide range of NLP tasks, including:

- Text generation
- Named entity recognition (NER)
- Sentiment analysis
- Question answering
- Language translation
- Text summarization
- Text classification
- And more

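When you create a pipeline, you pass the task you want to perform (e.g., `"sentiment-analysis"` or `"text-generation"`) as the first argument. If you do not also pass a `model`, the function downloads a default pre-trained checkpoint and tokenizer for that task and sets up the preprocessing and post-processing it needs. The "No model was supplied, defaulted to ..." messages in the outputs below come from this fallback; for production use, pass `model=` (and ideally `revision=`) explicitly, as the warning recommends.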


Pipelines provide a convenient way to apply state-of-the-art NLP models to your text data without having to manually handle the complexities of model loading, tokenization, and inference. They abstract away many of the technical details, allowing you to focus on using the models for your specific tasks.
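To make "encapsulating tokenization, inference, and post-processing" concrete, here is a purely illustrative sketch of the three stages a pipeline chains together. The class `ToySentimentPipeline`, its whitespace "tokenizer", and its keyword-counting "model" are hypothetical stand-ins, not transformers APIs; a real pipeline uses the checkpoint's own tokenizer and neural network in their place.

```python
import math


class ToySentimentPipeline:
    """Hypothetical stand-in showing the preprocess -> forward -> postprocess flow."""

    LABELS = ["NEGATIVE", "POSITIVE"]
    POSITIVE_WORDS = {"love", "great", "breathtaking", "grateful"}
    NEGATIVE_WORDS = {"hate", "tough", "wrong", "frustration"}

    def preprocess(self, text):
        # Toy "tokenizer": lowercase, replace punctuation with spaces, split on whitespace.
        cleaned = "".join(ch if ch.isalnum() or ch.isspace() else " " for ch in text.lower())
        return cleaned.split()

    def forward(self, tokens):
        # Toy "model": one logit per class, counted from keyword hits.
        pos = sum(tok in self.POSITIVE_WORDS for tok in tokens)
        neg = sum(tok in self.NEGATIVE_WORDS for tok in tokens)
        return [float(neg), float(pos)]  # order matches LABELS

    def postprocess(self, logits):
        # Softmax over the logits, then report the most likely label and its probability.
        exps = [math.exp(x) for x in logits]
        total = sum(exps)
        probs = [e / total for e in exps]
        best = max(range(len(probs)), key=probs.__getitem__)
        return {"label": self.LABELS[best], "score": probs[best]}

    def __call__(self, text):
        # Like a real pipeline, return a list with one result dict per input.
        return [self.postprocess(self.forward(self.preprocess(text)))]


classifier = ToySentimentPipeline()
print(classifier("I love you"))
```

The point of the design is that callers only ever see `classifier(text)`; swapping in a different tokenizer or model changes nothing on the calling side, which is what the real `pipeline` abstraction offers.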

# Sentiment analysis

In [None]:
from transformers import pipeline

classifier = pipeline("sentiment-analysis")

result = classifier("I hate you")[0]
print(f"label: {result['label']}, with score: {round(result['score'], 4)}")

result = classifier("I love you")[0]
print(f"label: {result['label']}, with score: {round(result['score'], 4)}")

No model was supplied, defaulted to distilbert/distilbert-base-uncased-finetuned-sst-2-english and revision af0f99b (https://huggingface.co/distilbert/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.


[{'label': 'NEGATIVE', 'score': 0.9991129040718079}]
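The classifier returns a list containing one dict per input, which is why the cells index the result with `[0]`; passing a list of strings returns one dict per string, in order. Because this default checkpoint only predicts `POSITIVE` or `NEGATIVE`, it can be convenient to fold label and score into a single signed number. The helper `signed_sentiment` below is hypothetical (not part of transformers) and runs on result dicts shaped like the ones printed in this notebook, so it needs no model download.

```python
def signed_sentiment(results):
    """Map [{'label': ..., 'score': ...}, ...] to floats in [-1, 1].

    Hypothetical helper: assumes the binary POSITIVE/NEGATIVE labels of the
    default SST-2 checkpoint; any other label raises a ValueError.
    """
    signs = {"POSITIVE": 1.0, "NEGATIVE": -1.0}
    signed = []
    for result in results:
        if result["label"] not in signs:
            raise ValueError(f"unexpected label: {result['label']!r}")
        signed.append(signs[result["label"]] * result["score"])
    return signed


# Result dicts in the shape the classifier returns (0.9999 is the rounded score printed below).
results = [
    {"label": "NEGATIVE", "score": 0.9991129040718079},
    {"label": "POSITIVE", "score": 0.9999},
]
print(signed_sentiment(results))  # [-0.9991129040718079, 0.9999]
```

With a model loaded, the same helper could be applied directly to a batch, e.g. `signed_sentiment(classifier(["I hate you", "I love you"]))`.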

In [None]:
result = classifier("I just watched the sunset from my balcony, and it was absolutely breathtaking. The colors were vibrant, and the whole sky seemed to be on fire. It was such a peaceful and magical moment. I feel so grateful to witness such beauty.")[0]
print(f"label: {result['label']}, with score: {round(result['score'], 4)}")

label: POSITIVE, with score: 0.9999


In [None]:
result = classifier("Today has been a really tough day. Everything seems to be going wrong, and I can't shake off this feeling of frustration and disappointment. I just wish things would get better soon.")[0]
print(f"label: {result['label']}, with score: {round(result['score'], 4)}")

label: NEGATIVE, with score: 0.9995


In [None]:
result = classifier("The 19th century was a time of immense change and upheaval, marked by rapid industrialization, urbanization, and technological advancements. It was a period of great innovation and progress, but also one of significant social and economic challenges.The Industrial Revolution transformed societies around the world, as new inventions and manufacturing techniques revolutionized the way goods were produced. Factories sprung up in cities, drawing rural populations to urban centers in search of employment opportunities. This mass migration led to overcrowding, poor living conditions, and the rise of urban poverty.At the same time, the 19th century saw remarkable advancements in science, medicine, and transportation. The steam engine, invented by James Watt")[0]
print(f"label: {result['label']}, with score: {round(result['score'], 4)}")

label: POSITIVE, with score: 0.999
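Two caveats apply to long inputs like the one above. This checkpoint has no neutral class, so even a largely factual passage is forced into `POSITIVE` or `NEGATIVE`. And this DistilBERT model accepts at most 512 tokens, so a much longer document would have to be truncated or split. One simple workaround, sketched here under those assumptions, is to classify sentence-based chunks separately and average their signed scores. `classify_long_text` and `stub_classifier` are hypothetical; `classify` can be any callable that behaves like the classifier above (string in, list of label/score dicts out), and the stub keys off one word so the example runs offline.

```python
import re


def classify_long_text(text, classify, max_chars=500):
    """Average signed sentiment over sentence-based chunks of `text`.

    Hypothetical helper. `max_chars` is a rough character budget, not an exact
    token limit; a single sentence longer than the budget becomes its own chunk.
    Returns (average signed score, number of chunks).
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        # Start a new chunk when adding this sentence would exceed the budget.
        if current and len(current) + 1 + len(sentence) > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = f"{current} {sentence}".strip()
    if current:
        chunks.append(current)
    if not chunks:
        raise ValueError("empty text")

    signed = []
    for chunk in chunks:
        result = classify(chunk)[0]
        sign = 1.0 if result["label"] == "POSITIVE" else -1.0
        signed.append(sign * result["score"])
    return sum(signed) / len(signed), len(chunks)


def stub_classifier(text):
    # Offline stand-in for the real pipeline (hypothetical scores).
    if "wrong" in text:
        return [{"label": "NEGATIVE", "score": 0.9}]
    return [{"label": "POSITIVE", "score": 0.8}]


text = "The day started well. Lunch was great. Then everything went wrong."
print(classify_long_text(text, stub_classifier, max_chars=40))
```

Averaging is only one aggregation choice; a length-weighted mean or reporting the per-chunk labels may suit other documents better.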


# Question-Answering

In [None]:
from transformers import pipeline
question_answerer = pipeline('question-answering')

context = r"""
Extractive Question Answering is the task of extracting an answer from a text given a question. An example of a
question answering dataset is the SQuAD dataset, which is entirely based on that task. If you would like to fine-tune
a model on a SQuAD task, you may leverage the examples/pytorch/question-answering/run_squad.py script.
"""

No model was supplied, defaulted to distilbert/distilbert-base-cased-distilled-squad and revision 626af31 (https://huggingface.co/distilbert/distilbert-base-cased-distilled-squad).
Using a pipeline without specifying a model name and revision in production is not recommended.


In [None]:
# result = question_answerer(question="What is extractive question answering?", context=context)
# print(f"Answer: '{result['answer']}', score: {round(result['score'], 4)}, start: {result['start']}, end: {result['end']}")

result = question_answerer(question="What is a good example of a question answering dataset?", context=context)
print(f"Answer: '{result['answer']}', score: {round(result['score'], 4)}, start: {result['start']}, end: {result['end']}")

Answer: 'SQuAD dataset', score: 0.5152, start: 147, end: 160
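Besides the answer text, the question-answering pipeline reports `start` and `end`, which are character offsets into `context`, so `context[start:end]` reproduces the answer. The hypothetical helper `highlight_answer` below uses that to mark the answer in place and to sanity-check the offsets; it runs on the result printed above without reloading the model.

```python
def highlight_answer(context, result, left="[[", right="]]"):
    """Wrap the predicted span in markers (hypothetical helper).

    `result` is a dict with 'answer', 'start' and 'end' keys, as returned by the
    question-answering pipeline; `start`/`end` are character offsets.
    """
    start, end = result["start"], result["end"]
    # Sanity check: the offsets should point exactly at the answer text.
    if context[start:end] != result["answer"]:
        raise ValueError("offsets do not match the answer text")
    return context[:start] + left + context[start:end] + right + context[end:]


context = r"""
Extractive Question Answering is the task of extracting an answer from a text given a question. An example of a
question answering dataset is the SQuAD dataset, which is entirely based on that task. If you would like to fine-tune
a model on a SQuAD task, you may leverage the examples/pytorch/question-answering/run_squad.py script.
"""

# The result printed above (score rounded).
result = {"answer": "SQuAD dataset", "score": 0.5152, "start": 147, "end": 160}
print(highlight_answer(context, result))
```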

In [None]:
from transformers import pipeline
question_answerer = pipeline('question-answering')

context = r"""
The Great Wall of China is one of the most impressive architectural feats in history. It is a series of fortifications made of stone, brick, tamped earth, wood, and other materials, generally built along an east-to-west line across the historical northern borders of China to protect the Chinese states and empires against the raids and invasions of the various nomadic groups. The Great Wall stretches over approximately 13,171 miles and was continuously built from the 3rd century BC to the 17th century AD.

One of the most famous sections of the Great Wall is the Badaling section, located near Beijing. This section is visited by millions of tourists every year due to its accessibility and well-preserved condition. Other notable sections include Mutianyu, Simatai, and Jinshanling, each offering unique experiences and breathtaking views.

Despite its name, the Great Wall of China is not a single continuous wall but rather a collection of walls and fortifications built by different dynasties over centuries. It symbolizes China's rich history, cultural heritage, and remarkable engineering achievements.

"""


questions = [
    "What materials were used to construct the Great Wall of China?",
    "How long is the Great Wall of China?",
    "When was the Great Wall of China continuously built from?",
    "Which section of the Great Wall is visited by millions of tourists every year?",
    "Is the Great Wall of China a single continuous wall?",
]

# Ask each question against the same context and print the extracted span.
for question in questions:
    result = question_answerer(question=question, context=context)
    print(f"Answer: '{result['answer']}', score: {round(result['score'], 4)}, start: {result['start']}, end: {result['end']}")

No model was supplied, defaulted to distilbert/distilbert-base-cased-distilled-squad and revision 626af31 (https://huggingface.co/distilbert/distilbert-base-cased-distilled-squad).
Using a pipeline without specifying a model name and revision in production is not recommended.


Answer: 'stone, brick, tamped earth, wood, and other materials', score: 0.449, start: 128, end: 181
Answer: '13,171 miles', score: 0.7249, start: 423, end: 435
Answer: '3rd century BC to the 17th century AD', score: 0.5212, start: 472, end: 509
Answer: 'Badaling', score: 0.377, start: 569, end: 577
Answer: 'not a single continuous wall', score: 0.2159, start: 893, end: 921
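The scores above vary widely. An extractive model can only copy a span from the context, so it cannot answer "yes" or "no" directly; for the last question it returned the span 'not a single continuous wall' with a score of only 0.2159. One simple way to act on this is to flag answers below a confidence threshold. The helper `split_by_confidence` and the 0.3 cutoff are arbitrary assumptions to tune on your own data, not library defaults. (The pipeline also accepts a `top_k` argument if you want several candidate spans per question.)

```python
def split_by_confidence(results, threshold=0.3):
    """Partition QA results into confident and low-confidence answers.

    Hypothetical helper; `threshold` is an arbitrary cutoff, not a library default.
    """
    confident = [r for r in results if r["score"] >= threshold]
    uncertain = [r for r in results if r["score"] < threshold]
    return confident, uncertain


# Answers and (rounded) scores printed above for the Great Wall questions.
results = [
    {"answer": "stone, brick, tamped earth, wood, and other materials", "score": 0.449},
    {"answer": "13,171 miles", "score": 0.7249},
    {"answer": "3rd century BC to the 17th century AD", "score": 0.5212},
    {"answer": "Badaling", "score": 0.377},
    {"answer": "not a single continuous wall", "score": 0.2159},
]
confident, uncertain = split_by_confidence(results)
for r in uncertain:
    print(f"low confidence ({r['score']}): {r['answer']}")
```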

# NAMED ENTITY RECOGNITION

In [None]:
from transformers import pipeline

ner_pipe = pipeline("ner")

sequence = """Hugging Face Inc. is a company based in New York City. Its headquarters are in DUMBO,
therefore very close to the Manhattan Bridge which is visible from the window."""

No model was supplied, defaulted to dbmdz/bert-large-cased-finetuned-conll03-english and revision f2482bf (https://huggingface.co/dbmdz/bert-large-cased-finetuned-conll03-english).
Using a pipeline without specifying a model name and revision in production is not recommended.


Some weights of the model checkpoint at dbmdz/bert-large-cased-finetuned-conll03-english were not used when initializing BertForTokenClassification: ['bert.pooler.dense.bias', 'bert.pooler.dense.weight']
- This IS expected if you are initializing BertForTokenClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForTokenClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).

In [None]:
for entity in ner_pipe(sequence):
    print(entity)

{'entity': 'I-ORG', 'score': 0.99957865, 'index': 1, 'word': 'Hu', 'start': 0, 'end': 2}
{'entity': 'I-ORG', 'score': 0.9909764, 'index': 2, 'word': '##gging', 'start': 2, 'end': 7}
{'entity': 'I-ORG', 'score': 0.9982224, 'index': 3, 'word': 'Face', 'start': 8, 'end': 12}
{'entity': 'I-ORG', 'score': 0.9994879, 'index': 4, 'word': 'Inc', 'start': 13, 'end': 16}
{'entity': 'I-LOC', 'score': 0.9994344, 'index': 11, 'word': 'New', 'start': 40, 'end': 43}
{'entity': 'I-LOC', 'score': 0.99931955, 'index': 12, 'word': 'York', 'start': 44, 'end': 48}
{'entity': 'I-LOC', 'score': 0.9993794, 'index': 13, 'word': 'City', 'start': 49, 'end': 53}
{'entity': 'I-LOC', 'score': 0.98625815, 'index': 19, 'word': 'D', 'start': 79, 'end': 80}
{'entity': 'I-LOC', 'score': 0.95142686, 'index': 20, 'word': '##UM', 'start': 80, 'end': 82}
{'entity': 'I-LOC', 'score': 0.9336589, 'index': 21, 'word': '##BO', 'start': 82, 'end': 84}
{'entity': 'I-LOC', 'score': 0.9761654, 'index': 28, 'word': 'Manhattan', 'star
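Each row above is a single word piece, so "Hugging Face Inc" is spread over four rows and "DUMBO" over three ("D", "##UM", "##BO"). The NER pipeline can merge these itself through its `aggregation_strategy` argument (for example `pipeline("ner", aggregation_strategy="simple")`). To show roughly what such grouping does, here is a simplified, hypothetical re-implementation, `group_entities`: consecutive tokens (adjacent `index` values) with the same entity type are merged, and `##` continuation pieces are glued onto the previous word. It is not the library's exact algorithm.

```python
def group_entities(tokens):
    """Merge consecutive word-piece predictions into whole entities.

    Simplified sketch, not transformers' own aggregation code: tokens are merged
    when their `index` values are adjacent and their entity type (the part after
    "I-" or "B-") matches; a "B-" tag always starts a new entity.
    """
    groups = []
    for tok in tokens:
        prefix, _, ent_type = tok["entity"].partition("-")
        prev = groups[-1] if groups else None
        if (
            prev is not None
            and prefix != "B"
            and prev["entity_group"] == ent_type
            and tok["index"] == prev["last_index"] + 1
        ):
            # "##" marks a continuation piece, so glue it on without a space.
            if tok["word"].startswith("##"):
                prev["word"] += tok["word"][2:]
            else:
                prev["word"] += " " + tok["word"]
            prev["end"] = tok["end"]
            prev["last_index"] = tok["index"]
            prev["scores"].append(tok["score"])
        else:
            groups.append({"entity_group": ent_type, "word": tok["word"], "start": tok["start"],
                           "end": tok["end"], "last_index": tok["index"], "scores": [tok["score"]]})
    # Report the mean token score per entity and drop the bookkeeping fields.
    return [
        {
            "entity_group": g["entity_group"],
            "word": g["word"],
            "score": sum(g["scores"]) / len(g["scores"]),
            "start": g["start"],
            "end": g["end"],
        }
        for g in groups
    ]


# The first ten token predictions printed above (the truncated last row is left out).
tokens = [
    {"entity": "I-ORG", "score": 0.99957865, "index": 1, "word": "Hu", "start": 0, "end": 2},
    {"entity": "I-ORG", "score": 0.9909764, "index": 2, "word": "##gging", "start": 2, "end": 7},
    {"entity": "I-ORG", "score": 0.9982224, "index": 3, "word": "Face", "start": 8, "end": 12},
    {"entity": "I-ORG", "score": 0.9994879, "index": 4, "word": "Inc", "start": 13, "end": 16},
    {"entity": "I-LOC", "score": 0.9994344, "index": 11, "word": "New", "start": 40, "end": 43},
    {"entity": "I-LOC", "score": 0.99931955, "index": 12, "word": "York", "start": 44, "end": 48},
    {"entity": "I-LOC", "score": 0.9993794, "index": 13, "word": "City", "start": 49, "end": 53},
    {"entity": "I-LOC", "score": 0.98625815, "index": 19, "word": "D", "start": 79, "end": 80},
    {"entity": "I-LOC", "score": 0.95142686, "index": 20, "word": "##UM", "start": 80, "end": 82},
    {"entity": "I-LOC", "score": 0.9336589, "index": 21, "word": "##BO", "start": 82, "end": 84},
]
for entity in group_entities(tokens):
    print(entity)
```

Because `start` and `end` are character offsets, `sequence[entity["start"]:entity["end"]]` recovers the original surface text of each merged entity.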

In [None]:
sequence1 = "The Great Wall of China is one of the most impressive architectural feats in history. It is a series of fortifications made of stone, brick, tamped earth, wood, and other materials, generally built along an east-to-west line across the historical northern borders of China to protect the Chinese states and empires against the raids and invasions of the various nomadic groups. The Great Wall stretches over approximately 13,171 miles and was continuously built from the 3rd century BC to the 17th century AD.One of the most famous sections of the Great Wall is the Badaling section, located near Beijing. This section is visited by millions of tourists every year due to its accessibility and well-preserved condition. Other notable sections include Mutianyu, Simatai, and Jinshanling, each offering unique experiences and breathtaking views.Despite its name, the Great Wall of China is not a single continuous wall but rather a collection of walls and fortifications built by different dynasties over centuries. It symbolizes China's rich history, cultural heritage, and remarkable engineering achievements. "


In [None]:
for entity in ner_pipe(sequence1):
    print(entity)

{'entity': 'I-MISC', 'score': 0.8898691, 'index': 2, 'word': 'Great', 'start': 4, 'end': 9}
{'entity': 'I-LOC', 'score': 0.5096978, 'index': 3, 'word': 'Wall', 'start': 10, 'end': 14}
{'entity': 'I-MISC', 'score': 0.9488885, 'index': 4, 'word': 'of', 'start': 15, 'end': 17}
{'entity': 'I-LOC', 'score': 0.5932887, 'index': 5, 'word': 'China', 'start': 18, 'end': 23}
{'entity': 'I-LOC', 'score': 0.99907494, 'index': 57, 'word': 'China', 'start': 267, 'end': 272}
{'entity': 'I-MISC', 'score': 0.9991984, 'index': 61, 'word': 'Chinese', 'start': 288, 'end': 295}
{'entity': 'I-MISC', 'score': 0.48654228, 'index': 78, 'word': 'Great', 'start': 382, 'end': 387}
{'entity': 'I-LOC', 'score': 0.7946715, 'index': 79, 'word': 'Wall', 'start': 388, 'end': 392}
{'entity': 'I-MISC', 'score': 0.48803532, 'index': 95, 'word': 'BC', 'start': 483, 'end': 485}
{'entity': 'I-MISC', 'score': 0.5708499, 'index': 110, 'word': 'Great', 'start': 548, 'end': 553}
{'entity': 'I-LOC', 'score': 0.836376, 'index': 11

# TEXT GENERATION

In [None]:
from transformers import pipeline

text_generator = pipeline("text-generation")

print(text_generator("As far as I am concerned, I will", max_length=50, do_sample=False))

No model was supplied, defaulted to openai-community/gpt2 and revision 6c0e608 (https://huggingface.co/openai-community/gpt2).
Using a pipeline without specifying a model name and revision in production is not recommended.


Truncation was not explicitly activated but `max_length` is provided a specific value, please use `truncation=True` to explicitly truncate examples to max length. Defaulting to 'longest_first' truncation strategy. If you encode pairs of sequences (GLUE-style) with the tokenizer you can select this strategy more precisely by providing a specific strategy to `truncation`.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[{'generated_text': 'As far as I am concerned, I will be the first to admit that I am not a fan of the idea of a "free market." I think that the idea of a free market is a bit of a stretch. I think that the idea'}]
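With `do_sample=False` the pipeline decodes greedily: at each step it appends the single most likely next token. Greedy decoding is deterministic, and it is prone to loops like the repeated "I think that the idea ..." above; generation options such as `no_repeat_ngram_size` and `repetition_penalty` exist to counter this. Note also that `max_length` counts the prompt tokens as well as the generated ones, while `max_new_tokens` limits only the new part. The toy sketch below shows the greedy mechanism with a hand-written next-word table standing in for GPT-2's predicted probabilities; the table values and the name `greedy_decode` are hypothetical.

```python
# Toy next-word distributions standing in for a language model (hypothetical values).
NEXT_WORD_PROBS = {
    "I": {"think": 0.6, "am": 0.4},
    "think": {"that": 0.7, "so": 0.3},
    "that": {"the": 0.6, "I": 0.4},
    "the": {"idea": 0.6, "market": 0.4},
    "idea": {"is": 0.5, "of": 0.3, "that": 0.2},
    "is": {"that": 0.6, "good": 0.4},
}


def greedy_decode(prompt, max_length):
    """Append the most likely next word until `max_length` words (prompt included)."""
    words = prompt.split()
    while len(words) < max_length:
        candidates = NEXT_WORD_PROBS.get(words[-1])
        if not candidates:
            break  # no known continuation: stop early
        words.append(max(candidates, key=candidates.get))
    return " ".join(words)


print(greedy_decode("I", max_length=12))
# I think that the idea is that the idea is that the
```

Because every step takes the argmax, once the sequence revisits a word it follows the same path again, which is exactly the looping behavior visible in the GPT-2 output.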


In [None]:
print(text_generator("Riding through the forest, I suddenly spotted a mysterious...", max_length=50, do_sample=True))

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[{'generated_text': 'Riding through the forest, I suddenly spotted a mysterious... a girl who looked like her father? Then a light red one followed me and walked away, and I jumped down and ran to my left.\nMy right hand was still on a tree'}]
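With `do_sample=True` the next token is instead drawn at random from the model's distribution, so each run can produce a different continuation, as in the cell above. Below is a sketch of one sampling step with an explicit temperature (values below 1 sharpen the distribution, values above 1 flatten it). The probabilities and the helper names `apply_temperature` and `sample_next` are made up for illustration; in transformers, `temperature`, `top_k`, and `top_p` are real generation arguments, and `transformers.set_seed` makes sampled output reproducible.

```python
import math
import random


def apply_temperature(probs, temperature):
    """Rescale a distribution as p ** (1 / T), renormalised.

    Equivalent to dividing the logits by T before the softmax.
    All probabilities must be positive.
    """
    weights = {w: math.exp(math.log(p) / temperature) for w, p in probs.items()}
    total = sum(weights.values())
    return {w: v / total for w, v in weights.items()}


def sample_next(probs, temperature=1.0, rng=random):
    """Draw one word from `probs` after temperature scaling."""
    scaled = apply_temperature(probs, temperature)
    words = list(scaled)
    return rng.choices(words, weights=[scaled[w] for w in words], k=1)[0]


# Hypothetical next-word probabilities after "...spotted a mysterious".
probs = {"figure": 0.5, "light": 0.3, "sound": 0.2}

rng = random.Random(0)  # fixed seed so the draws are reproducible
print([sample_next(probs, temperature=1.0, rng=rng) for _ in range(5)])
print(apply_temperature(probs, 0.5))  # sharper: "figure" gains probability
```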


In [None]:
print(text_generator("Climbing up the hill, I found a hidden...", max_length=30, do_sample=False))

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[{'generated_text': 'Climbing up the hill, I found a hidden...\n\n...room.\n\nI was in the room with the other two, and'}]


# Summarization

In [None]:
from transformers import pipeline

summarizer = pipeline("summarization")

ARTICLE = """ New York (CNN)When Liana Barrientos was 23 years old, she got married in Westchester County, New York.
A year later, she got married again in Westchester County, but to a different man and without divorcing her first husband.
Only 18 days after that marriage, she got hitched yet again. Then, Barrientos declared "I do" five more times, sometimes only within two weeks of each other.
In 2010, she married once more, this time in the Bronx. In an application for a marriage license, she stated it was her "first and only" marriage.
Barrientos, now 39, is facing two criminal counts of "offering a false instrument for filing in the first degree," referring to her false statements on the
2010 marriage license application, according to court documents.
Prosecutors said the marriages were part of an immigration scam.
On Friday, she pleaded not guilty at State Supreme Court in the Bronx, according to her attorney, Christopher Wright, who declined to comment further.
After leaving court, Barrientos was arrested and charged with theft of service and criminal trespass for allegedly sneaking into the New York subway through an emergency exit, said Detective
Annette Markowski, a police spokeswoman. In total, Barrientos has been married 10 times, with nine of her marriages occurring between 1999 and 2002.
All occurred either in Westchester County, Long Island, New Jersey or the Bronx. She is believed to still be married to four men, and at one time, she was married to eight men at once, prosecutors say.
Prosecutors said the immigration scam involved some of her husbands, who filed for permanent residence status shortly after the marriages.
Any divorces happened only after such filings were approved. It was unclear whether any of the men will be prosecuted.
The case was referred to the Bronx District Attorney\'s Office by Immigration and Customs Enforcement and the Department of Homeland Security\'s
Investigation Division. Seven of the men are from so-called "red-flagged" countries, including Egypt, Turkey, Georgia, Pakistan and Mali.
Her eighth husband, Rashid Rajput, was deported in 2006 to his native Pakistan after an investigation by the Joint Terrorism Task Force.
If convicted, Barrientos faces up to four years in prison.  Her next court appearance is scheduled for May 18.
"""

No model was supplied, defaulted to sshleifer/distilbart-cnn-12-6 and revision a4f8f3e (https://huggingface.co/sshleifer/distilbart-cnn-12-6).
Using a pipeline without specifying a model name and revision in production is not recommended.


In [None]:
print(summarizer(ARTICLE, max_length=130, min_length=30, do_sample=False))

[{'summary_text': ' Liana Barrientos, 39, is charged with two counts of "offering a false instrument for filing in the first degree" In total, she has been married 10 times, with nine of her marriages occurring between 1999 and 2002 . At one time, she was married to eight men at once, prosecutors say .'}]
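The summary above starts with a space and has spaces before full stops ("2002 .", "say ."). If you show summaries to users, a small regex cleanup helps. The helper `tidy_summary` is a hypothetical post-processing step, not something the pipeline does for you, and it does not repair the missing full stop after the quoted charge.

```python
import re


def tidy_summary(text):
    """Remove whitespace before punctuation and trim the ends (hypothetical helper)."""
    text = re.sub(r"\s+([.,;:!?])", r"\1", text)
    return text.strip()


# The summary_text printed above, split across lines for readability.
summary = (
    ' Liana Barrientos, 39, is charged with two counts of "offering a false instrument'
    ' for filing in the first degree" In total, she has been married 10 times, with nine'
    " of her marriages occurring between 1999 and 2002 . At one time, she was married to"
    " eight men at once, prosecutors say ."
)
print(tidy_summary(summary))
```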


# Translation

In [None]:
from transformers import pipeline

translator = pipeline("translation_en_to_de")
print(translator("Hugging Face is a technology company based in New York and Paris", max_length=40))

No model was supplied, defaulted to google-t5/t5-base and revision 686f1db (https://huggingface.co/google-t5/t5-base).
Using a pipeline without specifying a model name and revision in production is not recommended.


For now, this behavior is kept to avoid breaking backwards compatibility when padding/encoding with `truncation is True`.
- Be aware that you SHOULD NOT rely on google-t5/t5-base automatically truncating your input to 512 when padding/encoding.
- If you want to encode/pad to sequences longer than 512 you can either instantiate this tokenizer with `model_max_length` or pass `max_length` when encoding/padding.


[{'translation_text': 'Hugging Face ist ein Technologieunternehmen mit Sitz in New York und Paris.'}]
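The task name encodes the language pair: `translation_en_to_de` means English to German. T5 is a text-to-text model trained with task prefixes such as "translate English to German: ", and the t5-base configuration stores a prefix like this for each translation task it supports. The hypothetical helper `t5_prefix_for_task` below shows how such a task string maps to a prefix; its language table is an assumption covering only English, German, French, and Romanian.

```python
import re

# Assumed language-code table for this sketch (not read from any model config).
LANGUAGE_NAMES = {"en": "English", "de": "German", "fr": "French", "ro": "Romanian"}


def t5_prefix_for_task(task):
    """Turn a task name like 'translation_en_to_de' into a T5-style input prefix."""
    match = re.fullmatch(r"translation_([a-z]{2})_to_([a-z]{2})", task)
    if match is None:
        raise ValueError(f"not a translation task name: {task!r}")
    src, tgt = (LANGUAGE_NAMES[code] for code in match.groups())
    return f"translate {src} to {tgt}: "


prefix = t5_prefix_for_task("translation_en_to_de")
print(prefix + "Hugging Face is a technology company based in New York and Paris")
```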


# Fill mask

In [None]:
from pprint import pprint

from transformers import pipeline

fill_mask_pipeline = pipeline("fill-mask")
pprint(fill_mask_pipeline(f"HuggingFace is creating a {fill_mask_pipeline.tokenizer.mask_token} that the community uses to solve NLP tasks."))


No model was supplied, defaulted to distilbert/distilroberta-base and revision ec58a5b (https://huggingface.co/distilbert/distilroberta-base).
Using a pipeline without specifying a model name and revision in production is not recommended.
Some weights of the model checkpoint at distilbert/distilroberta-base were not used when initializing RobertaForMaskedLM: ['roberta.pooler.dense.bias', 'roberta.pooler.dense.weight']
- This IS expected if you are initializing RobertaForMaskedLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing RobertaForMaskedLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


[{'score': 0.1792747676372528,
  'sequence': 'HuggingFace is creating a tool that the community uses to solve '
              'NLP tasks.',
  'token': 3944,
  'token_str': ' tool'},
 {'score': 0.11349434405565262,
  'sequence': 'HuggingFace is creating a framework that the community uses to '
              'solve NLP tasks.',
  'token': 7208,
  'token_str': ' framework'},
 {'score': 0.052435800433158875,
  'sequence': 'HuggingFace is creating a library that the community uses to '
              'solve NLP tasks.',
  'token': 5560,
  'token_str': ' library'},
 {'score': 0.03493550419807434,
  'sequence': 'HuggingFace is creating a database that the community uses to '
              'solve NLP tasks.',
  'token': 8503,
  'token_str': ' database'},
 {'score': 0.028602533042430878,
  'sequence': 'HuggingFace is creating a prototype that the community uses to '
              'solve NLP tasks.',
  'token': 17715,
  'token_str': ' prototype'}]
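By default the fill-mask pipeline returns the five highest-scoring candidates (`top_k=5`), each with the completed `sequence`, the vocabulary id `token`, and `token_str`. With this RoBERTa-based checkpoint, `token_str` carries a leading space because the tokenizer folds word-initial spaces into the token, and the mask token is `<mask>` rather than BERT's `[MASK]`, which is why the prompt is built from `tokenizer.mask_token`. The scores are probabilities over the whole vocabulary, so the five shown need not sum to 1. The hypothetical helper `summarize_candidates` cleans the words and measures how much probability mass they cover.

```python
def summarize_candidates(predictions):
    """Return (clean candidate words, total probability of the shown candidates)."""
    words = [p["token_str"].strip() for p in predictions]
    covered = sum(p["score"] for p in predictions)
    return words, covered


# Scores and tokens copied from the output above (the `sequence` field is omitted).
predictions = [
    {"score": 0.1792747676372528, "token": 3944, "token_str": " tool"},
    {"score": 0.11349434405565262, "token": 7208, "token_str": " framework"},
    {"score": 0.052435800433158875, "token": 5560, "token_str": " library"},
    {"score": 0.03493550419807434, "token": 8503, "token_str": " database"},
    {"score": 0.028602533042430878, "token": 17715, "token_str": " prototype"},
]
words, covered = summarize_candidates(predictions)
print(words)  # ['tool', 'framework', 'library', 'database', 'prototype']
print(f"{covered:.1%} of the probability mass")
```

Here the five candidates cover only about 41% of the probability, so many other completions remain plausible to the model.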


In [None]:
from pprint import pprint

from transformers import pipeline

fill_mask_pipeline = pipeline("fill-mask")
pprint(fill_mask_pipeline(f"The cat sat on the {fill_mask_pipeline.tokenizer.mask_token}"))


No model was supplied, defaulted to distilbert/distilroberta-base and revision ec58a5b (https://huggingface.co/distilbert/distilroberta-base).
Using a pipeline without specifying a model name and revision in production is not recommended.
Some weights of the model checkpoint at distilbert/distilroberta-base were not used when initializing RobertaForMaskedLM: ['roberta.pooler.dense.bias', 'roberta.pooler.dense.weight']
- This IS expected if you are initializing RobertaForMaskedLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing RobertaForMaskedLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


[{'score': 0.23286384344100952,
  'sequence': 'The cat sat on the sofa',
  'token': 26711,
  'token_str': ' sofa'},
 {'score': 0.19330808520317078,
  'sequence': 'The cat sat on the couch',
  'token': 16433,
  'token_str': ' couch'},
 {'score': 0.05189194530248642,
  'sequence': 'The cat sat on the floor',
  'token': 1929,
  'token_str': ' floor'},
 {'score': 0.043266020715236664,
  'sequence': 'The cat sat on the toilet',
  'token': 11471,
  'token_str': ' toilet'},
 {'score': 0.03674289584159851,
  'sequence': 'The cat sat on the bed',
  'token': 3267,
  'token_str': ' bed'}]
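Here "sofa" and "couch" together take about 43% of the probability. If you only care about how a fixed set of candidates compare with each other, you can renormalize their scores among themselves. The pipeline also has a `targets` argument that restricts scoring to words you supply; the helper `renormalize` below is a separate, hypothetical post-processing step that works on the output above.

```python
def renormalize(predictions, keep):
    """Rescale the scores of the candidates whose word is in `keep` so they sum to 1."""
    chosen = {
        p["token_str"].strip(): p["score"]
        for p in predictions
        if p["token_str"].strip() in keep
    }
    if not chosen:
        raise ValueError("none of the requested words are among the predictions")
    total = sum(chosen.values())
    return {word: score / total for word, score in chosen.items()}


# Scores copied from the output above.
predictions = [
    {"score": 0.23286384344100952, "token_str": " sofa"},
    {"score": 0.19330808520317078, "token_str": " couch"},
    {"score": 0.05189194530248642, "token_str": " floor"},
    {"score": 0.043266020715236664, "token_str": " toilet"},
    {"score": 0.03674289584159851, "token_str": " bed"},
]
print(renormalize(predictions, keep={"sofa", "couch"}))
```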
