# Transformers Library

## 1. Pipeline Function
The pipeline() function connects a model with its necessary pre-processing and post-processing steps.

Pre-processing -> model -> post-processing

In [1]:
from transformers import pipeline

classifier = pipeline("sentiment-analysis")
classifier("I've been waiting for a HuggingFace course my whole life.")

No model was supplied, defaulted to distilbert/distilbert-base-uncased-finetuned-sst-2-english and revision 714eb0f (https://huggingface.co/distilbert/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.
Device set to use mps:0


[{'label': 'POSITIVE', 'score': 0.9598048329353333}]

In [2]:
classifier(
    ["I've been waiting for a HuggingFace course my whole life.", "I hate this so much!"]
)

[{'label': 'POSITIVE', 'score': 0.9598048329353333},
 {'label': 'NEGATIVE', 'score': 0.9994558691978455}]

By default, this pipeline selects a particular pretrained model that has been fine-tuned for sentiment analysis in English. The model is downloaded and cached when you create the classifier object. If you rerun the command, the cached model will be used instead and there is no need to download the model again.

There are three main steps involved when you pass some text to a pipeline:

- The text is preprocessed into a format the model can understand.
- The preprocessed inputs are passed to the model.
- The predictions of the model are post-processed, so you can make sense of them.
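The third step is the easiest to see in isolation. The sketch below is a pure-Python illustration, not the library's code. It assumes the model has produced one row of raw scores (logits) per input. It converts each row to probabilities with a softmax and maps the winning index to a label name. The logit values are made up, and the `{0: "NEGATIVE", 1: "POSITIVE"}` mapping is an assumption standing in for the model's `config.id2label`.

```python
import math

def softmax(logits):
    # Subtract the max before exponentiating for numerical stability
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def postprocess(logits, id2label):
    """Turn one row of logits into a pipeline-style {'label', 'score'} dict."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=lambda i: probs[i])
    return {"label": id2label[best], "score": probs[best]}

# Illustrative values only (assumed), not real model outputs
id2label = {0: "NEGATIVE", 1: "POSITIVE"}
batch_logits = [[-1.5, 1.7], [4.2, -3.3]]
results = [postprocess(row, id2label) for row in batch_logits]
print(results)
```

Pre-processing (tokenization) and the forward pass happen before this point; only the final conversion from scores to labels is shown here.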

#### Available pipelines for different modalities

The pipeline() function supports multiple modalities, allowing you to work with text, images, audio, and even multimodal tasks. In this course we’ll focus on text tasks, but it’s useful to understand the transformer architecture’s potential, so we’ll briefly outline it.

Here’s an overview of what’s available:

For a full and updated list of pipelines, see the 🤗 Transformers documentation.

#### Text pipelines

- text-generation: Generate text from a prompt
- text-classification: Classify text into predefined categories
- summarization: Create a shorter version of a text while preserving key information
- translation: Translate text from one language to another
- zero-shot-classification: Classify text without prior training on specific labels
- feature-extraction: Extract vector representations of text

#### Image pipelines

- image-to-text: Generate text descriptions of images
- image-classification: Identify objects in an image
- object-detection: Locate and identify objects in images

#### Audio pipelines

- automatic-speech-recognition: Convert speech to text
- audio-classification: Classify audio into categories
- text-to-speech: Convert text to spoken audio

#### Multimodal pipelines

- image-text-to-text: Respond to an image based on a text prompt

In [3]:
classifier_zero_shot = pipeline("zero-shot-classification")
classifier_zero_shot(
    "I've been waiting for a HuggingFace course my whole life.",
    candidate_labels=["education", "politics", "business"],
)

No model was supplied, defaulted to facebook/bart-large-mnli and revision d7645e1 (https://huggingface.co/facebook/bart-large-mnli).
Using a pipeline without specifying a model name and revision in production is not recommended.
Xet Storage is enabled for this repo, but the 'hf_xet' package is not installed. Falling back to regular HTTP download. For better performance, install the package with: `pip install huggingface_hub[hf_xet]` or `pip install hf_xet`
Device set to use mps:0


{'sequence': "I've been waiting for a HuggingFace course my whole life.",
 'labels': ['business', 'education', 'politics'],
 'scores': [0.35590553283691406, 0.346484899520874, 0.2976095378398895]}

In [4]:
generation = pipeline("text-generation")

# Generate text from a prompt
generation("In this course, we will learn to")

No model was supplied, defaulted to openai-community/gpt2 and revision 607a30d (https://huggingface.co/openai-community/gpt2).
Using a pipeline without specifying a model name and revision in production is not recommended.
Xet Storage is enabled for this repo, but the 'hf_xet' package is not installed. Falling back to regular HTTP download. For better performance, install the package with: `pip install huggingface_hub[hf_xet]` or `pip install hf_xet`
Device set to use mps:0
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[{'generated_text': 'In this course, we will learn to navigate virtual machines with an emphasis on using PowerShell-Sieve and using D-Word-Sieve to convert text into files in Microsoft Word.'}]

✏️ Try it out! Use the num_return_sequences and max_length arguments to generate two sentences of 15 words each. Note that max_length is measured in tokens (including the prompt), not words.

In [8]:
generation_with_params = pipeline("text-generation")
generation_with_params("In this course, we will learn to", num_return_sequences=2, max_length=15)

No model was supplied, defaulted to openai-community/gpt2 and revision 607a30d (https://huggingface.co/openai-community/gpt2).
Using a pipeline without specifying a model name and revision in production is not recommended.
Device set to use mps:0
Truncation was not explicitly activated but `max_length` is provided a specific value, please use `truncation=True` to explicitly truncate examples to max length. Defaulting to 'longest_first' truncation strategy. If you encode pairs of sequences (GLUE-style) with the tokenizer you can select this strategy more precisely by providing a specific strategy to `truncation`.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[{'generated_text': 'In this course, we will learn to make the right decision by making the'},
 {'generated_text': 'In this course, we will learn to use web services such as Amazon.'}]

### Using any model from the Hub in a pipeline

The previous examples used the default model for the task at hand, but you can also choose a particular model from the Hub to use in a pipeline for a specific task — say, text generation. Go to the [Model Hub](https://huggingface.co/models) and click on the corresponding tag on the left to display only the supported models for that task. You should get to a page like [this](https://huggingface.co/models?pipeline_tag=text-generation) one.

Let’s try the HuggingFaceTB/SmolLM2-360M model! Here’s how to load it in the same pipeline as before:

In [14]:
generation_with_model = pipeline("text-generation", model="HuggingFaceTB/SmolLM2-360M")
generation_with_model("In this course, we will learn to",
                    max_length=30,
                    num_return_sequences=2,
                    # num_beams=3,
                    do_sample=True,
                    )

Device set to use mps:0
Truncation was not explicitly activated but `max_length` is provided a specific value, please use `truncation=True` to explicitly truncate examples to max length. Defaulting to 'longest_first' truncation strategy. If you encode pairs of sequences (GLUE-style) with the tokenizer you can select this strategy more precisely by providing a specific strategy to `truncation`.
Setting `pad_token_id` to `eos_token_id`:0 for open-end generation.


[{'generated_text': 'In this course, we will learn to solve these and other similar problems of polynomial relations and their applications, using the algebraic closure of the field of rational'},
 {'generated_text': 'In this course, we will learn to set up, use, and interpret data and statistics. To be able to apply statistical thinking to problems in everyday'}]

#### Why use do_sample or num_beams?

- Greedy decoding (the default unless the model's generation config enables something else) always picks the highest-probability token at each step.
- Because it is deterministic, greedy decoding can only return one sequence.
- Using num_return_sequences=2 with greedy decoding therefore raises an error.
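Here is a toy sketch of greedy decoding. It assumes a made-up next-token table in place of a real model. Because the argmax at each step is fixed, every call returns the same single sequence.

```python
# Hypothetical toy "model": next-token probabilities given the previous token
NEXT = {
    "<s>": {"we": 0.6, "you": 0.4},
    "we": {"learn": 0.5, "build": 0.3, "</s>": 0.2},
    "you": {"learn": 0.9, "</s>": 0.1},
    "learn": {"models": 0.7, "</s>": 0.3},
    "build": {"models": 0.8, "</s>": 0.2},
    "models": {"</s>": 1.0},
}

def greedy_decode(start="<s>", max_steps=10):
    tokens = [start]
    for _ in range(max_steps):
        probs = NEXT[tokens[-1]]
        # Always take the single most likely next token
        nxt = max(probs, key=probs.get)
        tokens.append(nxt)
        if nxt == "</s>":
            break
    return tokens

print(greedy_decode())
```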

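##### Three solutions

1. Beam search (num_beams)

```python
# generator is a text-generation pipeline, e.g. pipeline("text-generation", model="HuggingFaceTB/SmolLM2-360M")
generator("prompt", num_return_sequences=2, num_beams=5)
```

- What it does: keeps several candidate sequences (beams) at each step instead of a single one
- Why it works: because several paths are explored, more than one finished sequence is available to return
- Rule: num_beams must be ≥ num_return_sequences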
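Below is a toy sketch of beam search over a made-up next-token table, which stands in for a real model. It scores partial sequences by their summed log-probabilities and keeps the `num_beams` best at each step. With this table, greedy decoding would commit to "we" (probability 0.6) and finish with a sequence probability of 0.21. A beam width of 2 keeps "you" alive and finds a 0.252 sequence instead. This is a simplified illustration, not how `generate()` implements beam search; for example, it ignores length penalties and early-stopping options.

```python
import math

# Hypothetical toy "model": next-token probabilities given the previous token
NEXT = {
    "<s>": {"we": 0.6, "you": 0.4},
    "we": {"learn": 0.5, "build": 0.3, "</s>": 0.2},
    "you": {"learn": 0.9, "</s>": 0.1},
    "learn": {"models": 0.7, "</s>": 0.3},
    "build": {"models": 0.8, "</s>": 0.2},
    "models": {"</s>": 1.0},
}

def beam_search(num_beams=2, num_return_sequences=2, max_steps=10):
    # Same rule as above: you cannot return more sequences than you keep
    if num_beams < num_return_sequences:
        raise ValueError("num_beams must be >= num_return_sequences")
    beams = [(0.0, ["<s>"])]  # (sum of log-probabilities, tokens)
    finished = []
    for _ in range(max_steps):
        # Extend every live beam with every possible next token
        candidates = [
            (logp + math.log(p), tokens + [tok])
            for logp, tokens in beams
            for tok, p in NEXT[tokens[-1]].items()
        ]
        candidates.sort(key=lambda c: c[0], reverse=True)
        beams = []
        # Keep only the num_beams best candidates; completed ones leave the beam
        for logp, tokens in candidates[:num_beams]:
            (finished if tokens[-1] == "</s>" else beams).append((logp, tokens))
        if not beams:
            break
    finished.extend(beams)  # include unfinished beams if max_steps ran out
    finished.sort(key=lambda c: c[0], reverse=True)
    return [(round(math.exp(logp), 3), tokens) for logp, tokens in finished[:num_return_sequences]]

results = beam_search(num_beams=2, num_return_sequences=2)
for prob, tokens in results:
    print(prob, " ".join(tokens))
```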

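2. Sampling (do_sample=True)

```python
generator("prompt", num_return_sequences=2, do_sample=True, temperature=0.7)
```

- What it does: draws each next token at random from the model's probability distribution instead of always picking the most likely one
- Why it works: the randomness means each returned sequence (and each run) can be different
- Parameters: temperature (typically around 0.1 to 1.0; lower values make the output more focused, higher values more random), top_k, top_p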
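The following is a minimal sketch of how temperature, top_k and top_p reshape the next-token distribution before a token is drawn. It is written in plain Python over a made-up list of logits. Here temperature is applied first, then top-k, then top-p. Treat it as an illustration of the idea rather than the exact `generate()` logic.

```python
import math
import random

def sample_next(logits, temperature=1.0, top_k=None, top_p=None, rng=random):
    """Sample one token index from raw logits with temperature, top-k and top-p filtering."""
    if temperature <= 0:
        raise ValueError("temperature must be > 0")
    # Temperature: divide logits before the softmax (<1 sharpens, >1 flattens)
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    # (probability, token index) pairs, most likely first
    probs = sorted(((e / total, i) for i, e in enumerate(exps)), reverse=True)
    # Top-k: keep only the k most likely tokens
    if top_k is not None:
        probs = probs[:top_k]
    # Top-p (nucleus): keep the smallest prefix whose cumulative probability reaches top_p
    if top_p is not None:
        kept, cum = [], 0.0
        for p, i in probs:
            kept.append((p, i))
            cum += p
            if cum >= top_p:
                break
        probs = kept
    # random.choices accepts unnormalized weights, so no explicit renormalization is needed
    weights = [p for p, _ in probs]
    indices = [i for _, i in probs]
    return rng.choices(indices, weights=weights, k=1)[0]

rng = random.Random(0)
logits = [2.0, 1.0, 0.5, -1.0]  # made-up scores for a 4-token vocabulary
print([sample_next(logits, temperature=0.7, top_k=3, rng=rng) for _ in range(10)])
```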

3. Beam search + sampling

```python
generator("prompt", num_return_sequences=2, num_beams=5, do_sample=True)
```

- What it does: combines both methods (beam-search multinomial sampling)
- Why it works: the beams keep the output coherent while sampling adds diversity

##### Quick comparison
| Method       | Quality | Speed  | Diversity | Multiple Outputs |
|--------------|---------|--------|-----------|------------------|
| Greedy       | Medium  | Fast   | Low       | ❌               |
| Beam Search  | High    | Medium | Medium    | ✅               |
| Sampling     | Variable| Fast   | High      | ✅               |


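##### When to use
- Greedy: fast, deterministic results
- Beam search: higher-likelihood, coherent outputs (the returned beams are often near-duplicates, as the Qwen example below shows)
- Sampling: creative, varied text
- Beam + sampling: a compromise between coherence and diversity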

In [17]:
generation_with_model = pipeline("text-generation", model="Gensyn/Qwen2.5-0.5B-Instruct")
generation_with_model("In this course, we will learn to",
                    max_length=30,
                    num_return_sequences=2,
                    num_beams=3,
                    # do_sample=True,
                    )

Sliding Window Attention is enabled but not implemented for `sdpa`; unexpected results may be encountered.
Device set to use mps:0
Truncation was not explicitly activated but `max_length` is provided a specific value, please use `truncation=True` to explicitly truncate examples to max length. Defaulting to 'longest_first' truncation strategy. If you encode pairs of sequences (GLUE-style) with the tokenizer you can select this strategy more precisely by providing a specific strategy to `truncation`.


[{'generated_text': 'In this course, we will learn to use a variety of tools and techniques to analyze and visualize data. We will cover a wide range of topics,'},
 {'generated_text': 'In this course, we will learn to use a variety of tools and techniques to analyze and visualize data. We will cover a wide range of topics including'}]

#### Mask filling

The next pipeline you’ll try is fill-mask. The idea of this task is to fill in the blanks in a given text:

In [18]:
unmasker = pipeline("fill-mask")
unmasker("This course will teach you all about <mask> models.", top_k=3)

No model was supplied, defaulted to distilbert/distilroberta-base and revision fb53ab8 (https://huggingface.co/distilbert/distilroberta-base).
Using a pipeline without specifying a model name and revision in production is not recommended.
Xet Storage is enabled for this repo, but the 'hf_xet' package is not installed. Falling back to regular HTTP download. For better performance, install the package with: `pip install huggingface_hub[hf_xet]` or `pip install hf_xet`
Some weights of the model checkpoint at distilbert/distilroberta-base were not used when initializing RobertaForMaskedLM: ['roberta.pooler.dense.bias', 'roberta.pooler.dense.weight']
- This IS expected if you are initializing RobertaForMaskedLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
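- This IS NOT expected if you are initializing RobertaForMaskedLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).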

[{'score': 0.19619810581207275,
  'token': 30412,
  'token_str': ' mathematical',
  'sequence': 'This course will teach you all about mathematical models.'},
 {'score': 0.04052705690264702,
  'token': 38163,
  'token_str': ' computational',
  'sequence': 'This course will teach you all about computational models.'},
 {'score': 0.03301771357655525,
  'token': 27930,
  'token_str': ' predictive',
  'sequence': 'This course will teach you all about predictive models.'}]

The top_k argument controls how many possibilities you want to be displayed. Note that here the model fills in the special <mask> word, which is often referred to as a mask token. Other mask-filling models might have different mask tokens, so it’s always good to verify the proper mask word when exploring other models. One way to check it is by looking at the mask word used in the widget.
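In 🤗 Transformers you can also read the mask token programmatically from the pipeline's tokenizer. It is available as `unmasker.tokenizer.mask_token`, which is `<mask>` for distilroberta-base and `[MASK]` for bert-base-cased, as in the examples here. The minimal sketch below assumes you write prompts with a neutral `{mask}` placeholder, a convention made up for this example, so the same sentence works with either model. `build_masked_prompt` is a hypothetical helper.

```python
def build_masked_prompt(template: str, mask_token: str) -> str:
    """Replace a model-agnostic {mask} placeholder with a specific model's mask token."""
    # This sketch only handles prompts with exactly one mask
    if template.count("{mask}") != 1:
        raise ValueError("template must contain exactly one {mask} placeholder")
    return template.replace("{mask}", mask_token)

template = "This course will teach you all about {mask} models."
# With a real pipeline you would pass unmasker.tokenizer.mask_token here
for token in ["<mask>", "[MASK]"]:
    print(build_masked_prompt(template, token))
```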

✏️ Try it out! Search for the bert-base-cased model on the Hub and identify its mask word in the Inference API widget. What does this model predict for the sentence in our pipeline example above?

In [19]:
unmasker = pipeline("fill-mask", model="bert-base-cased")
unmasker("This course will teach you all about [MASK] models.", top_k=3)

Xet Storage is enabled for this repo, but the 'hf_xet' package is not installed. Falling back to regular HTTP download. For better performance, install the package with: `pip install huggingface_hub[hf_xet]` or `pip install hf_xet`
Some weights of the model checkpoint at bert-base-cased were not used when initializing BertForMaskedLM: ['bert.pooler.dense.bias', 'bert.pooler.dense.weight', 'cls.seq_relationship.bias', 'cls.seq_relationship.weight']
- This IS expected if you are initializing BertForMaskedLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForMaskedLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Device set to use mps:0


[{'score': 0.25963249802589417,
  'token': 1648,
  'token_str': 'role',
  'sequence': 'This course will teach you all about role models.'},
 {'score': 0.09427230805158615,
  'token': 1103,
  'token_str': 'the',
  'sequence': 'This course will teach you all about the models.'},
 {'score': 0.03386767581105232,
  'token': 4633,
  'token_str': 'fashion',
  'sequence': 'This course will teach you all about fashion models.'}]

#### Named entity recognition

Named entity recognition (NER) is a task where the model has to find which parts of the input text correspond to entities such as persons, locations, or organizations. Let’s look at an example:

In [22]:
ner_finder = pipeline("ner", grouped_entities=True)
ner_finder("My name is Sylvain and I work at Hugging Face in Brooklyn.")

No model was supplied, defaulted to dbmdz/bert-large-cased-finetuned-conll03-english and revision 4c53496 (https://huggingface.co/dbmdz/bert-large-cased-finetuned-conll03-english).
Using a pipeline without specifying a model name and revision in production is not recommended.
Some weights of the model checkpoint at dbmdz/bert-large-cased-finetuned-conll03-english were not used when initializing BertForTokenClassification: ['bert.pooler.dense.bias', 'bert.pooler.dense.weight']
- This IS expected if you are initializing BertForTokenClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForTokenClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Device set to use mps:0


[{'entity_group': 'PER',
  'score': 0.9981694,
  'word': 'Sylvain',
  'start': 11,
  'end': 18},
 {'entity_group': 'ORG',
  'score': 0.9796019,
  'word': 'Hugging Face',
  'start': 33,
  'end': 45},
 {'entity_group': 'LOC',
  'score': 0.9932106,
  'word': 'Brooklyn',
  'start': 49,
  'end': 57}]

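The grouped_entities=True parameter groups the tokens that belong to the same entity together. In recent versions of 🤗 Transformers this flag is deprecated in favor of the equivalent aggregation_strategy="simple".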

Without grouped_entities=True (default), we would get something like this:

{'entity': 'I-PER', 'word': 'S', 'start': 11, 'end': 12},

{'entity': 'I-PER', 'word': '##yl', 'start': 12, 'end': 14},

{'entity': 'I-PER', 'word': '##va', 'start': 14, 'end': 16},


With grouped_entities=True, we get:

{'entity_group': 'PER', 'score': 0.998, 'word': 'Sylvain', 'start': 11, 'end': 18},

{'entity_group': 'ORG', 'score': 0.980, 'word': 'Hugging Face', 'start': 33, 'end': 45},

{'entity_group': 'LOC', 'score': 0.993, 'word': 'Brooklyn', 'start': 49, 'end': 57}


##### Key Benefits:

- Cleaner output: Combines subword tokens into complete words/phrases
- Easier to read: "Sylvain" instead of "S", "##yl", "##va", "##in"
- Better for processing: One entity per result instead of multiple tokens
- More intuitive: Matches how humans think about named entities

##### When to use it:
- Use grouped_entities=True: When you want clean, complete entity names
- Use default: When you need detailed token-level information for analysis
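The following is a simplified, pure-Python sketch of what grouping does. It assumes token results shaped like the ungrouped output above: `entity` tags with `B-`/`I-` prefixes, WordPiece `##` continuation pieces, and character offsets. It merges adjacent tokens with the same entity type and averages their scores. This is similar in spirit to `aggregation_strategy="simple"` but is not the library's implementation. The token scores and the split of "Hugging Face" into pieces are made-up inputs.

```python
def group_entities(tokens):
    """Merge adjacent token-level NER results into entity groups (simplified)."""
    groups = []
    for tok in tokens:
        prefix, _, label = tok["entity"].partition("-")
        if not label:  # tag without a B-/I- prefix, e.g. "PER"
            prefix, label = "I", prefix
        word = tok["word"]
        if word.startswith("##"):  # WordPiece continuation piece
            word = word[2:]
        prev = groups[-1] if groups else None
        # Continue the previous group if same type, not a "B-" start, and adjacent in the text
        if prev and prev["entity_group"] == label and prefix != "B" and tok["start"] - prev["end"] <= 1:
            gap = " " if tok["start"] > prev["end"] else ""  # assume a one-char gap is a space
            prev["word"] += gap + word
            prev["end"] = tok["end"]
            prev["scores"].append(tok["score"])
        else:
            groups.append({"entity_group": label, "word": word,
                           "start": tok["start"], "end": tok["end"], "scores": [tok["score"]]})
    # Replace each group's list of token scores with its average
    for g in groups:
        scores = g.pop("scores")
        g["score"] = sum(scores) / len(scores)
    return groups

# Made-up token-level results for "My name is Sylvain and I work at Hugging Face in Brooklyn."
tokens = [
    {"entity": "I-PER", "score": 0.99, "word": "S", "start": 11, "end": 12},
    {"entity": "I-PER", "score": 0.99, "word": "##yl", "start": 12, "end": 14},
    {"entity": "I-PER", "score": 0.99, "word": "##va", "start": 14, "end": 16},
    {"entity": "I-PER", "score": 0.99, "word": "##in", "start": 16, "end": 18},
    {"entity": "I-ORG", "score": 0.97, "word": "Hu", "start": 33, "end": 35},
    {"entity": "I-ORG", "score": 0.98, "word": "##gging", "start": 35, "end": 40},
    {"entity": "I-ORG", "score": 0.98, "word": "Face", "start": 41, "end": 45},
    {"entity": "I-LOC", "score": 0.99, "word": "Brooklyn", "start": 49, "end": 57},
]
for group in group_entities(tokens):
    print(group)
```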

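The "ner" task is an alias for the more general token-classification pipeline, so other token-classification models from the Hub can be used the same way. For example, with a part-of-speech (POS) tagging model each token is labeled with its grammatical category (such as NN or VBZ) instead of an entity type. Since no aggregation is requested here, subword pieces such as "Hu" and "##gging" appear as separate results:
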
In [24]:
pos_finder = pipeline("ner", model="QCRI/bert-base-multilingual-cased-pos-english")
pos_finder("My name is Sylvain and I work at Hugging Face in Brooklyn.")

Some weights of the model checkpoint at QCRI/bert-base-multilingual-cased-pos-english were not used when initializing BertForTokenClassification: ['bert.pooler.dense.bias', 'bert.pooler.dense.weight']
- This IS expected if you are initializing BertForTokenClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForTokenClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Device set to use mps:0


[{'entity': 'PRP$',
  'score': 0.99944586,
  'index': 1,
  'word': 'My',
  'start': 0,
  'end': 2},
 {'entity': 'NN',
  'score': 0.9995615,
  'index': 2,
  'word': 'name',
  'start': 3,
  'end': 7},
 {'entity': 'VBZ',
  'score': 0.9995523,
  'index': 3,
  'word': 'is',
  'start': 8,
  'end': 10},
 {'entity': 'NNP',
  'score': 0.9963742,
  'index': 4,
  'word': 'Sylvain',
  'start': 11,
  'end': 18},
 {'entity': 'CC',
  'score': 0.9996537,
  'index': 5,
  'word': 'and',
  'start': 19,
  'end': 22},
 {'entity': 'PRP',
  'score': 0.99956113,
  'index': 6,
  'word': 'I',
  'start': 23,
  'end': 24},
 {'entity': 'VBP',
  'score': 0.9976095,
  'index': 7,
  'word': 'work',
  'start': 25,
  'end': 29},
 {'entity': 'IN',
  'score': 0.999801,
  'index': 8,
  'word': 'at',
  'start': 30,
  'end': 32},
 {'entity': 'NNP',
  'score': 0.99481,
  'index': 9,
  'word': 'Hu',
  'start': 33,
  'end': 35},
 {'entity': 'NNP',
  'score': 0.918101,
  'index': 10,
  'word': '##gging',
  'start': 35,
  'end': 40},
 ...]

#### Question answering

The question-answering pipeline answers questions using information from a given context. Note that this pipeline works by extracting information from the provided context; it does not generate the answer.

In [26]:
qa = pipeline("question-answering")

context = """
I live in Paris, which is the capital of France.
"""

qa(question="What is the capital of France?", context=context)

No model was supplied, defaulted to distilbert/distilbert-base-cased-distilled-squad and revision 564e9b5 (https://huggingface.co/distilbert/distilbert-base-cased-distilled-squad).
Using a pipeline without specifying a model name and revision in production is not recommended.
Xet Storage is enabled for this repo, but the 'hf_xet' package is not installed. Falling back to regular HTTP download. For better performance, install the package with: `pip install huggingface_hub[hf_xet]` or `pip install hf_xet`
Device set to use mps:0


{'score': 0.9969314336776733, 'start': 11, 'end': 16, 'answer': 'Paris'}
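Because the pipeline is extractive, the returned `start` and `end` are character offsets into the context. Slicing the context with them gives back the answer. The small check below uses the exact context and result shown above. `highlight_answer` is a hypothetical helper written for this example.

```python
context = """
I live in Paris, which is the capital of France.
"""
# Result copied from the pipeline output above
result = {'score': 0.9969314336776733, 'start': 11, 'end': 16, 'answer': 'Paris'}

def highlight_answer(context: str, result: dict) -> str:
    """Return the context with the extracted answer span wrapped in brackets."""
    start, end = result["start"], result["end"]
    return context[:start] + "[" + context[start:end] + "]" + context[end:]

# The answer is literally a slice of the context
print(context[result["start"]:result["end"]])
print(highlight_answer(context, result).strip())
```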

#### Summarization

Summarization is the task of reducing a text into a shorter text while keeping all (or most) of the important aspects referenced in the text. Like with text generation, you can specify a max_length or a min_length for the result. Here’s an example:

In [27]:
summarization = pipeline("summarization")

summarization("""
    America has changed dramatically during recent years. Not only has the number of 
    graduates in traditional engineering disciplines such as mechanical, civil, 
    electrical, chemical, and aeronautical engineering declined, but in most of 
    the premier American universities engineering curricula now concentrate on 
    and encourage largely the study of engineering science. As a result, there 
    are declining offerings in engineering subjects dealing with infrastructure, 
    the environment, and related issues, and greater concentration on high 
    technology subjects, largely supporting increasingly complex scientific 
    developments. While the latter is important, it should not be at the expense 
    of more traditional engineering.

    Rapidly developing economies such as China and India, as well as other 
    industrial countries in Europe and Asia, continue to encourage and advance 
    the teaching of engineering. Both China and India, respectively, graduate 
    six and eight times as many traditional engineers as does the United States. 
    Other industrial countries at minimum maintain their output, while America 
    suffers an increasingly serious decline in the number of engineering graduates 
    and a lack of well-educated engineers.
""")

No model was supplied, defaulted to sshleifer/distilbart-cnn-12-6 and revision a4f8f3e (https://huggingface.co/sshleifer/distilbart-cnn-12-6).
Using a pipeline without specifying a model name and revision in production is not recommended.
Device set to use mps:0


[{'summary_text': ' America has changed dramatically during recent years . The number of engineering graduates in the U.S. has declined in traditional engineering disciplines such as mechanical, civil,    electrical, chemical, and aeronautical engineering . Rapidly developing economies such as China and India continue to encourage and advance the teaching of engineering .'}]

#### Translation

For translation, you can use a default model if you provide a language pair in the task name (such as "translation_en_to_fr"), but the easiest way is to pick the model you want to use on the Model Hub. Here we’ll try translating from French to English. Like with text generation and summarization, you can specify a max_length or a min_length for the result.
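If you prefer to choose the checkpoint programmatically, a small helper can build both forms mentioned above. This is only a sketch. The `opus-mt-{src}-{tgt}` pattern is an assumption generalized from `Helsinki-NLP/opus-mt-fr-en`. Not every language pair has such a checkpoint, or a default model for the task name, so check the Hub before relying on it. `translation_setup` is a hypothetical name.

```python
def translation_setup(src: str, tgt: str):
    """Build a task name and a guessed Helsinki-NLP checkpoint id for a language pair."""
    task = f"translation_{src}_to_{tgt}"
    # Assumed naming pattern, generalized from Helsinki-NLP/opus-mt-fr-en; verify the repo exists
    checkpoint = f"Helsinki-NLP/opus-mt-{src}-{tgt}"
    return task, checkpoint

task, checkpoint = translation_setup("fr", "en")
print(task, checkpoint)
# With 🤗 Transformers installed: pipeline("translation", model=checkpoint)
```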

In [1]:
from transformers import pipeline

translation = pipeline("translation", model="Helsinki-NLP/opus-mt-fr-en")
translation("Je suis un chat.", max_length=10)

Device set to use mps:0


[{'translation_text': "I'm a cat."}]

### Image and audio pipelines

Beyond text, Transformer models can also work with images and audio. Here are a few examples:

#### Image classification

In [2]:
from transformers import pipeline

img_classifier = pipeline(task="image-classification", model="google/vit-base-patch16-224")

img_classifier("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/pipeline-cat-chonk.jpeg")

Device set to use mps:0


[{'label': 'lynx, catamount', 'score': 0.43350061774253845},
 {'label': 'cougar, puma, catamount, mountain lion, painter, panther, Felis concolor',
  'score': 0.03479621186852455},
 {'label': 'snow leopard, ounce, Panthera uncia',
  'score': 0.03240187093615532},
 {'label': 'Egyptian cat', 'score': 0.023944765329360962},
 {'label': 'tiger cat', 'score': 0.0228891484439373}]

#### Automatic speech recognition

In [1]:
from transformers import pipeline
transcriber = pipeline("automatic-speech-recognition", model="openai/whisper-large-v3")

transcriber("https://huggingface.co/datasets/Narsil/asr_dummy/resolve/main/mlk.flac")

Device set to use mps:0
Due to a bug fix in https://github.com/huggingface/transformers/pull/28687 transcription using a multilingual Whisper will default to language detection followed by transcription instead of translation to English.This might be a breaking change for your use case. If you want to instead always translate your audio to English, make sure to pass `language='en'`.


{'text': ' I have a dream that one day this nation will rise up and live out the true meaning of its creed.'}

## Combining data from multiple sources

One powerful application of Transformer models is their ability to combine and process data from multiple sources. This is especially useful when you need to:

- Search across multiple databases or repositories
- Consolidate information from different formats (text, images, audio)
- Create a unified view of related information

For example, you could build a system that:

- Searches for information across databases in multiple modalities, such as text and images
- Combines results from different sources into a single coherent response, for example from an audio file and a text description
- Presents the most relevant information from a database of documents and metadata