# Welcome to Pipelines!

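The HuggingFace transformers library provides APIs at two different levels: the high-level pipelines API covered in this notebook, and the more advanced APIs covered in the upcoming notebooks.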

## Hugging Face Pipelines

A **pipeline** is a high-level interface provided by Hugging Face that allows you to quickly run NLP tasks without worrying too much about the underlying details. It's like a ready-to-use tool that takes care of the complex steps for you. There are various types of pipelines:

- **Text classification**: Assigns labels to text (e.g., spam detection).
- **Named entity recognition (NER)**: Identifies specific entities (e.g., names of people, organizations, locations).
- **Question answering**: Answers questions based on a given text.
- **Text generation**: Generates new text based on a prompt (e.g., writing continuation).
- **Translation**: Translates text from one language to another.
- **Summarization**: Summarizes long pieces of text into shorter ones.
- **Sentiment analysis**: Detects the sentiment of a given word or sentence.

You create a pipeline using something like:

`my_pipeline = pipeline("the_task_I_want_to_do")`

Followed by

`result = my_pipeline(my_input)`

And that's it!

See the end of this notebook for a list of all pipelines.

## Notes:

When working with Data Science models, you could be carrying out two very different activities: Training and Inference.

## Training:

*   **Definition:** Training is the process of teaching a machine learning model to recognize patterns and relationships within data. It's like teaching a student by showing them examples.
*   **Goal:** To adjust the model’s internal parameters (weights and biases) so it can accurately predict or classify new data.
*   **Process:**
    *   **Data Preparation:** Cleaning, transforming, and preparing the dataset for the model.
    *   **Model Selection:** Choosing the appropriate machine learning algorithm (e.g., linear regression, decision tree, neural network) based on the problem.
    *   **Algorithm Execution:** The algorithm iteratively adjusts its parameters based on the training data.
    *   **Loss Function:** A metric that measures how well the model is performing (the difference between predicted and actual values).
    *   **Optimization:**  A process (like gradient descent) to minimize the loss function and improve the model’s accuracy.
*   **Think of it like:** A student studying for an exam – they learn from examples and practice to improve their understanding.

## Inference:

*   **Definition:** Inference is the process of using a *trained* model to make predictions or decisions on *new, unseen* data.
*   **Goal:** To apply the knowledge the model gained during training to solve a real-world problem.
*   **Process:**
    *   **Input Data:**  You provide the model with a new data point it hasn't seen before.
    *   **Prediction:** The model uses its learned parameters to generate a prediction (e.g., predicting the price of a house, classifying an image, recommending a product).
    *   **No Parameter Updates:** Importantly, the model *doesn’t* learn or change during inference. It simply uses its existing knowledge.
*   **Think of it like:**  The student taking the exam – they apply their learned knowledge to answer the questions.
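The split between training and inference can be sketched in plain Python, with no libraries. This is a toy illustration only: the data, the linear model, the learning rate and the function names (`predict`, `mse`, `train`) are all assumptions made for this sketch and have nothing to do with how Hugging Face models are trained internally.

```python
# Toy data following y = 2x + 1 (made up for illustration).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]


def predict(x, w, b):
    """Inference: apply the current parameters without changing them."""
    return w * x + b


def mse(w, b):
    """Loss function: mean squared error over the training data."""
    return sum((predict(x, w, b) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def train(steps=2000, lr=0.05):
    """Training: gradient descent repeatedly adjusts w and b to reduce the loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Gradients of the mean squared error with respect to w and b.
        grad_w = sum(2 * (predict(x, w, b) - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (predict(x, w, b) - y) for x, y in zip(xs, ys)) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


w, b = train()  # training: parameters change
print(f"learned w={w:.3f}, b={b:.3f}, loss={mse(w, b):.6f}")
print("prediction for x=10:", round(predict(10.0, w, b), 2))  # inference: parameters are fixed
```

Only `train` modifies `w` and `b`. Calling `predict` afterwards uses them as they are, which is the part a pipeline handles for you.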

The pipelines API in HuggingFace is only for **inference**: running a model that has already been trained. We will be training our own model, and for that we will need the more advanced HuggingFace APIs that we look at in the upcoming notebooks.

In [1]:
# Import Libraries

import torch
import os
import gc
from huggingface_hub import login
from transformers import pipeline
from diffusers import DiffusionPipeline
from datasets import load_dataset
import soundfile as sf
from IPython.display import Audio

[2025-12-23 21:49:54,851] [INFO] [real_accelerator.py:219:get_accelerator] Setting ds_accelerator to cuda (auto detect)


/usr/bin/ld: cannot find -lcufile: No such file or directory
collect2: error: ld returned 1 exit status


In [2]:
# Load the Hugging Face token from an environment variable and log in

hf_token = os.getenv("HUGGING_FACE_WRITE_TOKEN")
login(hf_token)

## Using Pipelines from Hugging Face

Pipelines are a simple way to run inference for common tasks. They pick reasonable defaults, so you don't have to worry about the plumbing.


### How it works:

STEP 1: Create a pipeline - a function you can then call

```python
my_pipeline = pipeline(task, model=xx, device=xx)
```

If you don't specify a model, Hugging Face picks the default model for the task and prints a warning such as: "No model was supplied, defaulted to distilbert/distilbert-base-uncased-finetuned-sst-2-english. Using a pipeline without specifying a model name and revision in production is not recommended." Specify "cuda" for the device to use an NVIDIA GPU. Specify "mps" on a Mac.


STEP 2: Then call it as many times as you want:

```python
my_pipeline(input1)
my_pipeline(input2)
```
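If the same notebook has to run on different machines, the device string can be chosen at runtime rather than hard-coded. The helper below is a minimal sketch: `pick_device` is a hypothetical name, not part of transformers, and it takes plain booleans so it can be run without PyTorch installed.

```python
def pick_device(cuda_available: bool, mps_available: bool) -> str:
    """Return the device string to pass to pipeline(..., device=...)."""
    if cuda_available:
        return "cuda"  # NVIDIA GPU
    if mps_available:
        return "mps"   # Apple Silicon GPU
    return "cpu"       # fallback when no GPU is available


# In this notebook the flags would come from PyTorch, for example:
#   device = pick_device(torch.cuda.is_available(), torch.backends.mps.is_available())
#   my_pipeline = pipeline("sentiment-analysis", device=device)
print(pick_device(cuda_available=False, mps_available=True))
```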

#### Sentiment analysis (or opinion mining) uses NLP and machine learning to automatically detect the emotional tone (positive, negative, neutral, or specific emotions like joy/anger) in given text

Note: If no model is supplied, the pipeline defaults to distilbert/distilbert-base-uncased-finetuned-sst-2-english

In [3]:
# Sentiment Analysis

my_simple_sentiment_analyzer = pipeline("sentiment-analysis", device="cuda")
result = my_simple_sentiment_analyzer("I'm super excited to be on the way to LLM mastery!")
print(result)

No model was supplied, defaulted to distilbert/distilbert-base-uncased-finetuned-sst-2-english and revision 714eb0f (https://huggingface.co/distilbert/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.
Device set to use cuda


[{'label': 'POSITIVE', 'score': 0.9993460774421692}]


In [4]:
result = my_simple_sentiment_analyzer("I should be more excited to be on the way to LLM mastery!")
print(result)

[{'label': 'POSITIVE', 'score': 0.9008280634880066}]


For finer-grained sentiment analysis, we will use a model called ```nlptown/bert-base-multilingual-uncased-sentiment```, which rates text on a scale of 1 to 5 stars instead of returning only POSITIVE or NEGATIVE. It is a multilingual model that supports English among other languages.

To use any other model from the Hugging Face Hub, pass its name as the `model` parameter in the `pipeline` call.

Note: If a model name contains `uncased`, the model is not case sensitive: the input text is lowercased before it reaches the model.

In [5]:
better_sentiment = pipeline("sentiment-analysis", model="nlptown/bert-base-multilingual-uncased-sentiment", device="cuda")
result = better_sentiment("I should be more excited to be on the way to LLM mastery!!")
print(result)

Device set to use cuda


[{'label': '3 stars', 'score': 0.3944801986217499}]


In [6]:
result = better_sentiment("I'm super excited to be on the way to LLM mastery!")
print(result)

[{'label': '5 stars', 'score': 0.6633582711219788}]
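The star labels come back as strings, so comparing or averaging them needs a small conversion step. The sketch below works on the two results printed above. `stars_to_int` is a hypothetical helper, and it assumes every label starts with the number of stars, as in `'3 stars'`.

```python
def stars_to_int(prediction: dict) -> int:
    """Turn a label such as '5 stars' into the integer 5."""
    return int(prediction["label"].split()[0])


# The two predictions copied from the cell outputs above.
results = [
    {"label": "3 stars", "score": 0.3944801986217499},
    {"label": "5 stars", "score": 0.6633582711219788},
]

for r in results:
    # The score is the model's confidence in that star rating, not the rating itself.
    print(stars_to_int(r), "stars, confidence", round(r["score"], 3))
```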


Each model takes up space in GPU VRAM. If we don't clean up unused models, CUDA will eventually raise an Out Of Memory (OOM) error, so we free the space using Python's garbage collector and PyTorch's CUDA methods.

In [7]:
# clean up memory

del my_simple_sentiment_analyzer, better_sentiment # deleting the models from memory
gc.collect() 
torch.cuda.ipc_collect()
torch.cuda.empty_cache()
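The same cleanup runs after every pipeline in this notebook, so it could be wrapped in a helper. This is a minimal sketch: `free_gpu_memory` is a hypothetical name, and the `try`/`except` around the import exists only so the sketch also runs on a machine without PyTorch. The `del` statement still has to stay in the notebook cell, because a function cannot remove the caller's variable.

```python
import gc

try:
    import torch
except ImportError:  # lets the sketch run where PyTorch is not installed
    torch = None


def free_gpu_memory() -> int:
    """Run the cleanup calls from the cell above; return gc's count of collected objects."""
    collected = gc.collect()  # drop Python objects that are no longer referenced
    if torch is not None and torch.cuda.is_available():
        torch.cuda.ipc_collect()  # release CUDA memory shared between processes
        torch.cuda.empty_cache()  # return cached, unused blocks to the GPU
    return collected


# Typical use in a later cell:
#   del my_pipeline
#   free_gpu_memory()
print(free_gpu_memory())
```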

#### Named Entity Recognition (NER) identifies and categorizes key info like people, places, organizations, dates, and money, using types like Person (PER), Organization (ORG), Location (LOC), Geopolitical Entity (GPE), and Date

Common Entity Types (Categories):
* Person (PER): Names of individuals (e.g., Albert Einstein).
* Organization (ORG): Companies, agencies, institutions (e.g., Google, United Nations).
* Location (LOC): Geographical places (e.g., Mount Everest).
* Geopolitical Entity (GPE): Countries, states (e.g., United States).
* Date/Time: Calendar dates and times (e.g., July 2021, 5 PM).
* Money/Percent: Monetary values and percentages (e.g., $100, 50%).
* Product: Objects, vehicles, software (e.g., iPhone, Boeing 747).
* Event: Named occurrences (e.g., World War II, Olympics).
* Facility (FAC): Buildings, airports (e.g., Eiffel Tower).
* Work of Art/Language/Law: Titles, languages, legal docs. 

In [8]:
# Named Entity Recognition

# If no model is supplied for NER, the pipeline defaults to dbmdz/bert-large-cased-finetuned-conll03-english

ner = pipeline("ner", device="cuda")
result = ner("AI Engineers are learning about the amazing pipelines from HuggingFace in Google Colab from Ed Donner")
for entity in result:
  print(entity)

No model was supplied, defaulted to dbmdz/bert-large-cased-finetuned-conll03-english and revision 4c53496 (https://huggingface.co/dbmdz/bert-large-cased-finetuned-conll03-english).
Using a pipeline without specifying a model name and revision in production is not recommended.
Some weights of the model checkpoint at dbmdz/bert-large-cased-finetuned-conll03-english were not used when initializing BertForTokenClassification: ['bert.pooler.dense.bias', 'bert.pooler.dense.weight']
- This IS expected if you are initializing BertForTokenClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForTokenClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Device set to use cuda


{'entity': 'I-ORG', 'score': np.float32(0.999476), 'index': 1, 'word': 'AI', 'start': 0, 'end': 2}
{'entity': 'I-ORG', 'score': np.float32(0.9962089), 'index': 2, 'word': 'Engineers', 'start': 3, 'end': 12}
{'entity': 'I-ORG', 'score': np.float32(0.8205686), 'index': 11, 'word': 'Hu', 'start': 59, 'end': 61}
{'entity': 'I-ORG', 'score': np.float32(0.65149444), 'index': 12, 'word': '##gging', 'start': 61, 'end': 66}
{'entity': 'I-ORG', 'score': np.float32(0.960993), 'index': 13, 'word': '##F', 'start': 66, 'end': 67}
{'entity': 'I-ORG', 'score': np.float32(0.9251713), 'index': 14, 'word': '##ace', 'start': 67, 'end': 70}
{'entity': 'I-MISC', 'score': np.float32(0.8882752), 'index': 16, 'word': 'Google', 'start': 74, 'end': 80}
{'entity': 'I-MISC', 'score': np.float32(0.6730776), 'index': 17, 'word': 'Cola', 'start': 81, 'end': 85}
{'entity': 'I-PER', 'score': np.float32(0.9989441), 'index': 20, 'word': 'Ed', 'start': 92, 'end': 94}
{'entity': 'I-PER', 'score': np.float32(0.99872154), 'i
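The output above is token-level: "HuggingFace" is split into the word pieces `Hu`, `##gging`, `##F` and `##ace`. The sketch below merges neighbouring tokens that share a label back into whole entities, using the `index`, `start` and `end` fields. `group_entities` is a hypothetical helper written for this illustration, the token list is copied from the complete lines of the output (scores dropped), and the pipeline also accepts an `aggregation_strategy` argument (for example `"simple"`) that does a similar grouping for you.

```python
text = "AI Engineers are learning about the amazing pipelines from HuggingFace in Google Colab from Ed Donner"

# Tokens copied from the complete lines of the output above, without scores.
tokens = [
    {"entity": "I-ORG", "index": 1, "start": 0, "end": 2},
    {"entity": "I-ORG", "index": 2, "start": 3, "end": 12},
    {"entity": "I-ORG", "index": 11, "start": 59, "end": 61},
    {"entity": "I-ORG", "index": 12, "start": 61, "end": 66},
    {"entity": "I-ORG", "index": 13, "start": 66, "end": 67},
    {"entity": "I-ORG", "index": 14, "start": 67, "end": 70},
    {"entity": "I-MISC", "index": 16, "start": 74, "end": 80},
    {"entity": "I-MISC", "index": 17, "start": 81, "end": 85},
    {"entity": "I-PER", "index": 20, "start": 92, "end": 94},
]


def group_entities(text, tokens):
    """Merge adjacent tokens that carry the same label into (text, label) pairs."""
    groups = []
    for tok in tokens:
        label = tok["entity"].split("-", 1)[-1]  # 'I-ORG' -> 'ORG'
        last = groups[-1] if groups else None
        if last and last["label"] == label and tok["index"] == last["last_index"] + 1:
            # Same label and directly following the previous token: extend the span.
            last["end"] = tok["end"]
            last["last_index"] = tok["index"]
        else:
            groups.append({"label": label, "start": tok["start"],
                           "end": tok["end"], "last_index": tok["index"]})
    # Slice the original text so word pieces are rejoined without '##' markers.
    return [(text[g["start"]:g["end"]], g["label"]) for g in groups]


for entity_text, label in group_entities(text, tokens):
    print(label, "->", entity_text)
```

Slicing the original sentence by character offsets avoids having to strip the `##` markers by hand.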

In [9]:
# Clean up memory

del ner
gc.collect()
torch.cuda.ipc_collect()
torch.cuda.empty_cache()

#### Question Answering (QA) in NLP is about building systems that understand human questions in natural language and provide precise answers from a given context text

Note: If no model is supplied for the question-answering task, the pipeline defaults to distilbert/distilbert-base-cased-distilled-squad

In [10]:
# Question Answering with Context

question="What are Hugging Face pipelines?"
context="Pipelines are a high level API for inference of LLMs with common tasks"

question_answerer = pipeline("question-answering", device="cuda")
result = question_answerer(question=question, context=context)
print(result)

No model was supplied, defaulted to distilbert/distilbert-base-cased-distilled-squad and revision 564e9b5 (https://huggingface.co/distilbert/distilbert-base-cased-distilled-squad).
Using a pipeline without specifying a model name and revision in production is not recommended.
Device set to use cuda


{'score': 0.2457699030637741, 'start': 35, 'end': 70, 'answer': 'inference of LLMs with common tasks'}
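The `start` and `end` fields in this result are character offsets into the context, so the answer is a span copied out of the context rather than newly written text. The sketch below checks that on the output above and adds a confidence cut-off. `answer_or_none` and its default `threshold` of 0.5 are hypothetical choices for illustration.

```python
context = "Pipelines are a high level API for inference of LLMs with common tasks"

# Result copied from the cell output above.
result = {"score": 0.2457699030637741, "start": 35, "end": 70,
          "answer": "inference of LLMs with common tasks"}

# Slicing the context with the offsets reproduces the answer text.
span = context[result["start"]:result["end"]]
print(span == result["answer"])


def answer_or_none(result, threshold=0.5):
    """Return the answer only when the model's score reaches the threshold."""
    return result["answer"] if result["score"] >= threshold else None


print(answer_or_none(result))       # score is below 0.5, so this prints None
print(answer_or_none(result, 0.2))  # a looser threshold keeps the answer
```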


In [11]:
# Clean up memory

del question_answerer
gc.collect()
torch.cuda.ipc_collect()
torch.cuda.empty_cache()

#### Text summarization is an AI and Natural Language Processing (NLP) task that condenses long documents into shorter, accurate versions, capturing key points and meaning for quick understanding. 

Note: If no model is supplied for the summarization task, the pipeline defaults to sshleifer/distilbart-cnn-12-6

In [12]:
# Text Summarization

summarizer = pipeline("summarization", device="cuda")
text = """
The Hugging Face transformers library is an incredibly versatile and powerful tool for natural language processing (NLP).
It allows users to perform a wide range of tasks such as text classification, named entity recognition, and question answering, among others.
It's an extremely popular library that's widely used by the open-source data science community.
It lowers the barrier to entry into the field by providing Data Scientists with a productive, convenient way to work with transformer models.
"""

# max_length and min_length set the upper and lower bounds on the length of the generated summary (in tokens)

# do_sample=False: This parameter controls the decoding strategy. 
# When False, the model uses a deterministic approach (e.g., greedy decoding or beam search) to generate the summary. 
# When True, the model samples from the probability distribution over possible outputs, introducing randomness in the generated summary.

summary = summarizer(text, max_length=50, min_length=25, do_sample=False)
print(summary[0]['summary_text'])

No model was supplied, defaulted to sshleifer/distilbart-cnn-12-6 and revision a4f8f3e (https://huggingface.co/sshleifer/distilbart-cnn-12-6).
Using a pipeline without specifying a model name and revision in production is not recommended.
Device set to use cuda


 The Hugging Face transformers library is an incredibly versatile and powerful tool for natural language processing . It allows users to perform a wide range of tasks such as text classification, named entity recognition, and question answering .
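The effect of `do_sample` in the summarization call can be illustrated with a toy next-token distribution. This sketch is not what the summarization model does internally: the three candidate tokens and their probabilities are invented, `greedy_pick` and `sample_pick` are hypothetical names, and only greedy decoding is shown for the deterministic case (beam search is left out).

```python
import random

# Invented probabilities for the next token, for illustration only.
next_token_probs = {"library": 0.6, "model": 0.3, "banana": 0.1}


def greedy_pick(probs):
    """do_sample=False (greedy): always take the most probable token."""
    return max(probs, key=probs.get)


def sample_pick(probs, rng):
    """do_sample=True: draw a token at random, weighted by its probability."""
    tokens = list(probs)
    weights = [probs[t] for t in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0]


rng = random.Random(0)  # fixed seed so repeated runs give the same draws
print("greedy: ", [greedy_pick(next_token_probs) for _ in range(5)])
print("sampled:", [sample_pick(next_token_probs, rng) for _ in range(5)])
```

The greedy list repeats the same token every time, while the sampled list can vary from draw to draw.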


In [13]:
# Clean up memory

del summarizer
gc.collect()
torch.cuda.ipc_collect()
torch.cuda.empty_cache()

#### Language translation in NLP is the process of using AI/ML models (like Transformers) to automatically convert text from a source language to a target language, focusing on meaning and context rather than just literal words

Note: If no model is supplied, the pipeline defaults to google-t5/t5-base

In [14]:
# Translation

translator = pipeline("translation_en_to_fr", device="cuda")
result = translator("The Data Scientists were truly amazed by the power and simplicity of the HuggingFace pipeline API.")
print(result[0]['translation_text'])

No model was supplied, defaulted to google-t5/t5-base and revision a9723ea (https://huggingface.co/google-t5/t5-base).
Using a pipeline without specifying a model name and revision in production is not recommended.
Device set to use cuda


Les Data Scientists ont été vraiment étonnés par la puissance et la simplicité de l'API du pipeline HuggingFace.


In [15]:
# Another translation, showing a model being specified
# All translation models are here: https://huggingface.co/models?pipeline_tag=translation&sort=trending

translator = pipeline("translation_en_to_es", model="Helsinki-NLP/opus-mt-en-es", device="cuda")
result = translator("The Data Scientists were truly amazed by the power and simplicity of the HuggingFace pipeline API.")
print(result[0]['translation_text'])

Device set to use cuda


Los científicos de datos estaban verdaderamente sorprendidos por el poder y la simplicidad de la API de tuberías HuggingFace.


In [16]:
# Clean up memory

del translator
gc.collect()
torch.cuda.ipc_collect()
torch.cuda.empty_cache()

#### Zero-shot classification is an AI method where models categorize text or data into classes they were never explicitly trained on, using their deep understanding of language and context from large datasets to infer relationships between input and new labels provided at inference time.

E.g., if you give the model a sentence and the candidate labels ```POSITIVE``` and ```NEGATIVE```, it will classify the sentence using only these two labels.

Note: If no model is supplied, the pipeline defaults to facebook/bart-large-mnli

In [17]:
# Classification

classifier = pipeline("zero-shot-classification", device="cuda")
result = classifier("Hugging Face's Transformers library is amazing!", candidate_labels=["technology", "sports", "politics"])
print(result)

No model was supplied, defaulted to facebook/bart-large-mnli and revision d7645e1 (https://huggingface.co/facebook/bart-large-mnli).
Using a pipeline without specifying a model name and revision in production is not recommended.
Device set to use cuda


{'sequence': "Hugging Face's Transformers library is amazing!", 'labels': ['technology', 'sports', 'politics'], 'scores': [0.9493840932846069, 0.03225000947713852, 0.018365921452641487]}
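The `labels` and `scores` lists in this result line up position by position, so they can be zipped into a mapping to read off the winner. The sketch below uses the output above as plain data. The names `ranked` and `best_label` are just for illustration.

```python
# Result copied from the cell output above.
result = {
    "sequence": "Hugging Face's Transformers library is amazing!",
    "labels": ["technology", "sports", "politics"],
    "scores": [0.9493840932846069, 0.03225000947713852, 0.018365921452641487],
}

# Pair each candidate label with its score.
ranked = dict(zip(result["labels"], result["scores"]))
best_label = max(ranked, key=ranked.get)

print(best_label, round(ranked[best_label], 3))
# In this output the scores add up to roughly 1 across the candidate labels.
print(round(sum(result["scores"]), 6))
```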


In [18]:
# Clean up memory

del classifier
gc.collect()
torch.cuda.ipc_collect()
torch.cuda.empty_cache()

#### Text generation in NLP is creating human-like, coherent text using AI, powered by deep learning models (like Transformers, LLMs) trained on vast data to predict word sequences, enabling applications from chatbots to content writing, by learning patterns to produce contextually relevant and fluent language, moving beyond old rule-based systems. 

Note: If no model is supplied, the pipeline defaults to openai-community/gpt2

In [19]:
# Text Generation
# Since GPT-2 is an old model, the text it generates is often of poor quality

generator = pipeline("text-generation", device="cuda")
result = generator("If there's one thing I want you to remember about using HuggingFace pipelines, it's")
print(result[0]['generated_text'])

No model was supplied, defaulted to openai-community/gpt2 and revision 607a30d (https://huggingface.co/openai-community/gpt2).
Using a pipeline without specifying a model name and revision in production is not recommended.
Device set to use cuda
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


If there's one thing I want you to remember about using HuggingFace pipelines, it's that they can produce everything from a photo editor and a video editor together. When I go to the website, it's just all about producing content and the


In [20]:
# Clean up memory

del generator
gc.collect()
torch.cuda.ipc_collect()
torch.cuda.empty_cache()

In [None]:
# Image Generation - remember this?! Now you know what's going on
# Pipelines can be used for diffusion models as well as transformers

from IPython.display import display
from diffusers import AutoPipelineForText2Image
import torch

pipe = AutoPipelineForText2Image.from_pretrained("stabilityai/sdxl-turbo", torch_dtype=torch.float16, variant="fp16")
pipe.to("cuda")
prompt = "A class of students learning AI engineering in a vibrant pop-art style"
image = pipe(prompt=prompt, num_inference_steps=4, guidance_scale=0.0).images[0]
display(image)

In [21]:
# Audio Generation

from transformers import pipeline
from datasets import load_dataset
import soundfile as sf
import torch
from IPython.display import Audio

synthesiser = pipeline("text-to-speech", "microsoft/speecht5_tts", device='cuda')
embeddings_dataset = load_dataset("matthijs/cmu-arctic-xvectors", split="validation", trust_remote_code=True)
speaker_embedding = torch.tensor(embeddings_dataset[7306]["xvector"]).unsqueeze(0)
speech = synthesiser("Hi to an artificial intelligence engineer, on the way to mastery!", forward_params={"speaker_embeddings": speaker_embedding})

Audio(speech["audio"], rate=speech["sampling_rate"])

Device set to use cuda


### All the available pipelines

Here are all the pipelines available from Transformers and Diffusers.

With thanks to student Lucky P for suggesting I include this!

There's a list of pipelines under the Tasks on this page (you have to scroll down a bit, then expand the parameters to see the Tasks):

https://huggingface.co/docs/transformers/main_classes/pipelines

There's also this list of Tasks for Diffusion models instead of Transformers, following the image generation example above where I use `AutoPipelineForText2Image` from the diffusers library.

https://huggingface.co/docs/diffusers/en/api/pipelines/overview

If you come up with some cool examples of other pipelines, please share them with me! It's wonderful how HuggingFace makes this advanced AI functionality available for inference with such a simple API.