### Examples to demonstrate the use of Pipelines from HuggingFace ###

In [1]:
from transformers import pipeline
import warnings
warnings.filterwarnings('ignore')



#### Fill "Mask" task ####

In [2]:
# Specifying the pipeline
bert_unmasker = pipeline('fill-mask', model="bert-base-uncased")
text = "I have to wake up in the morning and [MASK] a doctor"
result = bert_unmasker(text)
for r in result:
    print(r)

Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertForMaskedLM: ['bert.pooler.dense.bias', 'bert.pooler.dense.weight', 'cls.seq_relationship.bias', 'cls.seq_relationship.weight']
- This IS expected if you are initializing BertForMaskedLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForMaskedLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Device set to use cuda:0


{'score': 0.6457508206367493, 'token': 2156, 'token_str': 'see', 'sequence': 'i have to wake up in the morning and see a doctor'}
{'score': 0.17833824455738068, 'token': 2655, 'token_str': 'call', 'sequence': 'i have to wake up in the morning and call a doctor'}
{'score': 0.07508159428834915, 'token': 2424, 'token_str': 'find', 'sequence': 'i have to wake up in the morning and find a doctor'}
{'score': 0.056827399879693985, 'token': 2131, 'token_str': 'get', 'sequence': 'i have to wake up in the morning and get a doctor'}
{'score': 0.006895780097693205, 'token': 2022, 'token_str': 'be', 'sequence': 'i have to wake up in the morning and be a doctor'}
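
Each prediction is a plain dict, so the results can be post-processed with ordinary Python. The sketch below reuses the first three entries of the output above (scores rounded) and keeps only candidates above a cutoff. The helper name `confident_tokens` and the `min_score` value are illustrative assumptions, not part of the pipeline API.

```python
# First three predictions copied from the output above (scores rounded).
result = [
    {"score": 0.6458, "token_str": "see"},
    {"score": 0.1783, "token_str": "call"},
    {"score": 0.0751, "token_str": "find"},
]


def confident_tokens(predictions, min_score=0.1):
    """Return the candidate tokens whose score is at least min_score."""
    return [p["token_str"] for p in predictions if p["score"] >= min_score]


# Highest-scoring fill for the [MASK] position.
best = max(result, key=lambda p: p["score"])
print(best["token_str"])

# All fills above the (hypothetical) cutoff.
print(confident_tokens(result))
```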


In [3]:
# Sentiment analysis example

# By default, this pipeline selects a particular pretrained model that has been fine-tuned for sentiment analysis in English.
# The model is downloaded and cached when you create the classifier object.
# If you rerun the command, the cached model will be used instead and there is no need to download the model again.

classifier = pipeline("sentiment-analysis")
classifier("I've been waiting for a HuggingFace course my whole life.")

No model was supplied, defaulted to distilbert/distilbert-base-uncased-finetuned-sst-2-english and revision 714eb0f (https://huggingface.co/distilbert/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.
Device set to use cuda:0


[{'label': 'POSITIVE', 'score': 0.9598046541213989}]

In [4]:
# We can even pass several sentences

classifier(
    ["I've been waiting for a HuggingFace course my whole life.", "I hate this so much!"]
)

[{'label': 'POSITIVE', 'score': 0.9598046541213989},
 {'label': 'NEGATIVE', 'score': 0.9994558691978455}]
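
The log output above warns against relying on the default model in production. One possible way to address that is to pin the model and revision explicitly, using the values printed in that log message. This is a minimal sketch: `PINNED_SENTIMENT` and `build_classifier` are hypothetical names, and the import is done lazily so the spec can be inspected without loading the library.

```python
# Model name and revision copied from the log message printed above.
PINNED_SENTIMENT = {
    "task": "sentiment-analysis",
    "model": "distilbert/distilbert-base-uncased-finetuned-sst-2-english",
    "revision": "714eb0f",
}


def build_classifier(spec=None):
    """Create the pipeline from an explicit spec (downloads the model on first call)."""
    from transformers import pipeline  # lazy import: only needed when building

    return pipeline(**(spec or PINNED_SENTIMENT))


# Usage (requires network access or a warm cache):
# classifier = build_classifier()
# classifier("I hate this so much!")
```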

#### Zero-shot classification - Task where we need to classify texts that haven’t been labelled ####

 * This is a common scenario in real-world projects because annotating text is usually time-consuming and requires domain expertise. 
 
 * For this use case, the zero-shot-classification pipeline is very powerful: it allows you to specify which labels to use for the classification, so you don’t have to rely on the labels of the pretrained model.  
 
 * You’ve already seen how the model can classify a sentence as positive or negative using those two labels — but it can also classify the text using any other set of labels you like.
 
 * This pipeline is called zero-shot because you don’t need to fine-tune the model on your data to use it. It can directly return probability scores for any list of labels you want!
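
To make the idea concrete, here is a rough sketch of how such a pipeline can score arbitrary labels with a natural language inference (NLI) model: each label is turned into a hypothesis sentence, the model rates how strongly the input entails it, and the entailment logits are normalized across labels. The logits below are made-up placeholders, `zero_shot_scores` is a hypothetical helper, and the template string is assumed to match the library default.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def zero_shot_scores(labels, entailment_logits, template="This example is {}."):
    """Rank labels by the softmax of their entailment logits."""
    hypotheses = [template.format(label) for label in labels]
    scores = softmax(entailment_logits)
    return sorted(
        zip(labels, hypotheses, scores),
        key=lambda item: item[2],
        reverse=True,
    )


labels = ["education", "politics", "business"]
made_up_logits = [2.0, -1.0, 0.0]  # placeholders, NOT real model outputs
ranked = zero_shot_scores(labels, made_up_logits)
for label, hypothesis, score in ranked:
    print(f"{label:10s} {score:.3f}  <- {hypothesis}")
```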

In [5]:
# Zero shot classification

classifier = pipeline("zero-shot-classification")
print(classifier(
    "This is a course about the Transformers library",
    candidate_labels=["education", "politics", "business"],
))

print(classifier(
    "The movie was very good",
    candidate_labels = ["positive","negative","neutral"]
    ))

No model was supplied, defaulted to facebook/bart-large-mnli and revision d7645e1 (https://huggingface.co/facebook/bart-large-mnli).
Using a pipeline without specifying a model name and revision in production is not recommended.
Device set to use cuda:0


{'sequence': 'This is a course about the Transformers library', 'labels': ['education', 'business', 'politics'], 'scores': [0.8445959687232971, 0.1119764968752861, 0.043427541851997375]}
{'sequence': 'The movie was very good', 'labels': ['positive', 'neutral', 'negative'], 'scores': [0.9884912967681885, 0.007404666393995285, 0.004104042425751686]}


#### Text generation ####

* The main idea here is that you provide a prompt and the model will auto-complete it by generating the remaining text. 

* This is similar to the predictive text feature that is found on many phones. 

* Text generation involves randomness, so it’s normal if you don’t get the same results as shown below.

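* You can control how many different sequences are generated with the argument num_return_sequences and the total length of the output text (prompt included) with the argument max_length. To cap only the newly generated tokens, use max_new_tokens, which takes precedence when both are set.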
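
The randomness mentioned above comes from sampling the next token from a probability distribution. The toy sketch below imitates that with a made-up three-word distribution: a fixed seed reproduces the same draws, while an unseeded call may differ between runs. `NEXT_WORD` and `sample_continuations` are hypothetical; for the real pipeline, `transformers.set_seed` is the usual way to get repeatable samples.

```python
import random

# Toy next-word distribution; the words and probabilities are made up.
NEXT_WORD = {"build": 0.5, "train": 0.3, "deploy": 0.2}


def sample_continuations(prompt, num_return_sequences=2, seed=None):
    """Draw several one-word continuations of prompt from the toy distribution."""
    rng = random.Random(seed)
    words = list(NEXT_WORD)
    weights = list(NEXT_WORD.values())
    picks = rng.choices(words, weights=weights, k=num_return_sequences)
    return [f"{prompt} {word}" for word in picks]


prompt = "In this course, we will teach you how to"
print(sample_continuations(prompt, seed=0))
print(sample_continuations(prompt, seed=0))  # same seed, same draws
print(sample_continuations(prompt))          # unseeded: may differ per run
```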

In [6]:
# Text Generation

generator = pipeline("text-generation")
generator("In this course, we will teach you how to")

No model was supplied, defaulted to openai-community/gpt2 and revision 607a30d (https://huggingface.co/openai-community/gpt2).
Using a pipeline without specifying a model name and revision in production is not recommended.
Device set to use cuda:0
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[{'generated_text': "In this course, we will teach you how to create a simple and user-friendly application using React.\n\nThere is a ton of good content on React.js, so I recommend you to read it in order to learn about the concepts and how to use them.\n\nNow let's get started…\n\nStep One\n\nStep Two\n\nStep Three\n\nStep Four\n\nStep Five\n\nStep Six\n\nStep Seven\n\nStep Eight\n\nStep Nine\n\nStep Ten\n\nStep Eleven\n\nStep Twelve\n\nStep 13\n\nStep 14\n\nStep 15\n\nStep 16\n\nStep 17\n\nStep 18\n\nStep 19\n\nStep 20\n\nStep 21\n\nStep 22\n\nStep 23\n\nStep 24\n\nStep 25\n\nStep 26\n\nStep 27\n\nStep 28\n\nStep 29\n\nStep 30\n\nStep 31\n\nStep 32\n\nStep 33\n\nStep 34\n\nStep 35\n\nStep 36\n\nStep 37\n\nStep 38\n\nStep 39\n\nStep 40\n\nStep 41\n\nStep 42\n\nStep 43\n\nStep 44\n\nStep 45\n\nStep 46\n\nStep 47\n\nStep 48\n\nStep 49\n\nStep 50\n\nStep"}]

#### Using any model from the Hub in a pipeline ####

* The previous examples used the default model for the task at hand, but you can also choose a particular model from the Hub to use in a pipeline for a specific task — say, text generation. 

* Go to the Model Hub and click on the corresponding tag on the left to display only the supported models for that task.<br>
https://huggingface.co/models

In [7]:
generator = pipeline("text-generation", model="distilgpt2")
generator(
    "In this course, we will teach you how to",
    max_length=30,
    num_return_sequences=2,
)

Device set to use cuda:0
Truncation was not explicitly activated but `max_length` is provided a specific value, please use `truncation=True` to explicitly truncate examples to max length. Defaulting to 'longest_first' truncation strategy. If you encode pairs of sequences (GLUE-style) with the tokenizer you can select this strategy more precisely by providing a specific strategy to `truncation`.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
Both `max_new_tokens` (=256) and `max_length`(=30) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)


[{'generated_text': 'In this course, we will teach you how to use these techniques to create the most efficient, and efficient way to accomplish your goal.\n\n\n\n\nIf you want to see the videos you are going to need to see a video on the new video, click here.\nAnd in the meantime, you can use this tutorial to help you improve your workflow and performance for yourself. Be sure to check out the main video and videos by clicking the link below to see some more tutorials.\nThe video is now available on YouTube.\nIf you want to view the videos in the video, click here.'},
 {'generated_text': 'In this course, we will teach you how to create a simple video that shows you how to create a video that shows you how to create a video that shows you how to create a video that shows you how to create a video that shows you how to create a video that shows you how to create a video that shows you how to create a video that shows you how to create a video that shows you how to create a video that s

#### Named entity recognition ####

Named entity recognition (NER) is a task where the model has to find which parts of the input text correspond to entities such as persons, locations, or organizations. 

In [8]:
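# aggregation_strategy="simple" merges sub-word tokens into whole entities;
# it replaces the deprecated grouped_entities=True.
ner = pipeline("ner", aggregation_strategy="simple")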
ner("My name is Sylvain and I work at Hugging Face in Brooklyn.")

No model was supplied, defaulted to dbmdz/bert-large-cased-finetuned-conll03-english and revision 4c53496 (https://huggingface.co/dbmdz/bert-large-cased-finetuned-conll03-english).
Using a pipeline without specifying a model name and revision in production is not recommended.
Some weights of the model checkpoint at dbmdz/bert-large-cased-finetuned-conll03-english were not used when initializing BertForTokenClassification: ['bert.pooler.dense.bias', 'bert.pooler.dense.weight']
- This IS expected if you are initializing BertForTokenClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForTokenClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Device set to use cuda:0


[{'entity_group': 'PER',
  'score': np.float32(0.9981694),
  'word': 'Sylvain',
  'start': 11,
  'end': 18},
 {'entity_group': 'ORG',
  'score': np.float32(0.9796019),
  'word': 'Hugging Face',
  'start': 33,
  'end': 45},
 {'entity_group': 'LOC',
  'score': np.float32(0.9932106),
  'word': 'Brooklyn',
  'start': 49,
  'end': 57}]
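
The `start` and `end` fields are character offsets into the input string, so they can be used to slice or annotate the original text. The sketch below reuses the entities from the output above (scores omitted); `tag_entities` is a hypothetical helper written for illustration.

```python
text = "My name is Sylvain and I work at Hugging Face in Brooklyn."

# Entities copied from the output above (scores omitted).
entities = [
    {"entity_group": "PER", "word": "Sylvain", "start": 11, "end": 18},
    {"entity_group": "ORG", "word": "Hugging Face", "start": 33, "end": 45},
    {"entity_group": "LOC", "word": "Brooklyn", "start": 49, "end": 57},
]


def tag_entities(source, spans):
    """Wrap each span as [text](LABEL), right to left so earlier offsets stay valid."""
    tagged = source
    for span in sorted(spans, key=lambda s: s["start"], reverse=True):
        surface = source[span["start"]:span["end"]]
        marked = f"[{surface}]({span['entity_group']})"
        tagged = tagged[:span["start"]] + marked + tagged[span["end"]:]
    return tagged


print(tag_entities(text, entities))
```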

#### Summarization ####

Summarization is the task of reducing a text into a shorter text while keeping all (or most) of the important aspects referenced in the text. 

In [9]:
summarizer = pipeline("summarization")
summarizer(
    """
    Europe is the second-smallest continent. The name Europe, or Europa, is believed to be of Greek origin, as it is the name of a princess in Greek mythology. The name Europe may also come from combining the Greek roots eur- (wide) and -op (seeing) to form the phrase “wide-gazing.”

Europe is often described as a “peninsula of peninsulas.” A peninsula is a piece of land surrounded by water on three sides. Europe is a peninsula of the Eurasian supercontinent and is bordered by the Arctic Ocean to the north, the Atlantic Ocean to the west, and the Mediterranean, Black, and Caspian seas to the south.

Europe’s main peninsulas are the Iberian, Italian, and Balkan, located in southern Europe, and the Scandinavian and Jutland, located in northern Europe. The link between these peninsulas has made Europe a dominant economic, social, and cultural force throughout recorded history.
"""
)

No model was supplied, defaulted to sshleifer/distilbart-cnn-12-6 and revision a4f8f3e (https://huggingface.co/sshleifer/distilbart-cnn-12-6).
Using a pipeline without specifying a model name and revision in production is not recommended.
Device set to use cuda:0


[{'summary_text': " Europe is a peninsula of the Eurasian supercontinent . It is bordered by the Arctic Ocean to the north, the Atlantic to the west, and the Mediterranean, Black, and Caspian seas to the south . Europe's main peninsulas are the Iberian, Italian, and Balkan, located in southern Europe ."}]

#### Feature extraction

In [10]:
emb = pipeline("feature-extraction", model="google-bert/bert-base-uncased")
X = emb("industrial anomaly detection with audio", return_tensors=True)  # [1, seq_len, hidden]
print(f"Embeddings shape: {X.size()}\nEmbeddings:\n{X}")

Device set to use cuda:0


Embeddings shape: torch.Size([1, 7, 768])
Embeddings:
tensor([[[-0.4095,  0.1067, -0.3270,  ..., -0.5877, -0.2818,  0.2980],
         [ 0.2982,  0.3689, -0.4391,  ..., -0.4908,  0.2701,  0.0518],
         [ 0.1266, -0.3955, -0.0369,  ..., -0.5126,  0.2784,  0.0832],
         ...,
         [-0.4505,  0.0647, -0.3500,  ..., -0.5077, -0.3530, -0.1903],
         [-0.1960,  0.1788,  0.0403,  ..., -0.0181, -0.3388, -0.5101],
         [ 0.7388,  0.1887, -0.5517,  ...,  0.0199, -0.5753, -0.2244]]])


#### Translation ####

* For translation, you can use a default model if you provide a language pair in the task name (such as "translation_en_to_fr"), but the easiest way is to pick the model you want to use on the Model Hub.

* Here we’ll try translating from English to German with the default model for "translation_en_to_de".

In [11]:
# Create a translation pipeline for English to German
# The 'translation_en_to_de' task uses a default model suitable for this pair.
translator = pipeline("translation_en_to_de")

# Define the text to be translated
text_to_translate = "Hello world! This is an example of translation using Hugging Face pipelines."

# Perform the translation
translated_text = translator(text_to_translate)

# Print the result
print(translated_text)

No model was supplied, defaulted to google-t5/t5-base and revision a9723ea (https://huggingface.co/google-t5/t5-base).
Using a pipeline without specifying a model name and revision in production is not recommended.
Device set to use cuda:0


[{'translation_text': 'Hallo Welt, dies ist ein Beispiel für die Übersetzung mit Hugging Face Pipelines.'}]


#### Q&A with pipeline("question-answering")
What it does

Extracts an answer span from a given context for a question. (This is extractive QA—answers come from the context text.)

Key knobs you’ll actually use

- Model choice: pick a SQuAD/SQuAD2-finetuned checkpoint (e.g., deepset/roberta-base-squad2, distilbert-base-cased-distilled-squad).

- Device & precision: device=0 (single GPU) or device_map="auto" (multi-GPU/CPU offload); torch_dtype sets the precision the weights are loaded in.

- Top-k candidates: top_k returns N best spans with scores.

- Unanswerable questions: handle_impossible_answer=True (only for SQuAD2-style models).

- Answer length: max_answer_len limits span length (useful to prevent rambling answers).

- Long contexts: max_seq_len and doc_stride let the pipeline window over long passages.

Input / Output shape

- Input: {"question": "...", "context": "..."} or a list of such dicts (for batching).

- Output: dict (or list) with answer, score, start, end. With top_k>1, returns a list of candidates.
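
The long-context knobs listed above are easier to picture with a small example. The sketch below splits a token list into overlapping windows, treating `doc_stride` as the number of tokens shared by consecutive windows. It is a simplification under stated assumptions: the real pipeline works on tokenizer output and also reserves room for the question and special tokens, and `split_into_windows` is a hypothetical helper.

```python
def split_into_windows(tokens, max_len, doc_stride):
    """Split tokens into windows of at most max_len that overlap by doc_stride tokens."""
    if not 0 <= doc_stride < max_len:
        raise ValueError("doc_stride must be non-negative and smaller than max_len")
    step = max_len - doc_stride  # how far each new window advances
    windows = []
    start = 0
    while True:
        windows.append(tokens[start:start + max_len])
        if start + max_len >= len(tokens):
            break  # the last window reached the end of the context
        start += step
    return windows


tokens = list(range(10))  # stand-in for 10 context tokens
windows = split_into_windows(tokens, max_len=4, doc_stride=2)
for window in windows:
    print(window)
```

An answer span that straddles a window boundary is still fully contained in a neighbouring window as long as it is no longer than the overlap, which is the reason for overlapping at all.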

In [12]:
qa = pipeline(
    task="question-answering",
    model="deepset/roberta-base-squad2",   # SQuAD2 (can predict “no answer”)
    device_map="auto"
)

context = """The UR5e is a collaborative robot arm from Universal Robots.
It supports force-torque sensing and is commonly used in education and industry."""
question = "What is used a lot in education?"

qa(question=question, context=context)
# -> {'score': ..., 'start': ..., 'end': ..., 'answer': 'UR5e'}

Some parameters are on the meta device because they were offloaded to the cpu.
Device set to use cuda:0


{'score': 0.30193451046943665, 'start': 4, 'end': 8, 'answer': 'UR5e'}

#### Use the pipeline class effectively (battle-tested tips)

1. Pick the right checkpoint for the task (e.g., a QA-finetuned model for extractive QA). Pipeline auto-routes inputs/outputs per task. 


2. Control outputs explicitly:

- Scores for every class in classification → top_k=None (the older return_all_scores=True is deprecated).

- “Only completions, not the prompt” in generation → return_full_text=False. 


3. Batch properly: Feed a datasets.Dataset + KeyDataset to stream and batch on GPU—cleaner and typically as fast as manual loops. Tune batch_size only if you see under-utilization. 


4. Mind sequence lengths: For tasks that support it, use padding/truncation (or task-specific strategies) to avoid over-length errors. Check each pipeline’s doc for which kwargs are honored. 


5. Performance & memory: Use device_map="auto" + accelerate and torch_dtype="auto" to shrink memory and speed up inference on modern hardware. 


6. When you need raw features (embeddings for clustering, etc.), use the feature-extraction pipeline with return_tensors=True and pool as needed. 
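
For tip 6, one common pooling choice is to average the token vectors into a single sentence vector. The sketch below does this in plain Python on a tiny made-up input, assuming the nested-list layout `[1][seq_len][hidden]` that the feature-extraction pipeline returns when `return_tensors` is not set. `mean_pool` is a hypothetical helper; it averages every token (including special tokens) and ignores padding, so it suits a single unpadded sequence.

```python
def mean_pool(token_vectors):
    """Average a [seq_len][hidden] nested list over the token axis."""
    seq_len = len(token_vectors)
    hidden = len(token_vectors[0])
    return [sum(vec[i] for vec in token_vectors) / seq_len for i in range(hidden)]


# Tiny made-up stand-in for one sequence: 3 tokens, hidden size 2.
features = [[[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]]  # shape [1, 3, 2]
sentence_vector = mean_pool(features[0])
print(sentence_vector)
```

With tensors (`return_tensors=True`), the equivalent operation is a mean over the sequence dimension.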
