# HuggingFace pipelines

In this session, we use this notebook to explore pipelines, the high-level API of the HuggingFace Transformers library.

In [1]:
pip install -q -U "transformers>=4.41" "diffusers" "datasets" "soundfile" "torch" "huggingface_hub" "python-dotenv"

Note: you may need to restart the kernel to use updated packages.


In [2]:
# Imports & environment
from __future__ import annotations

import os
from pathlib import Path

from dotenv import load_dotenv
from huggingface_hub import login
import torch
from transformers import pipeline
from diffusers import DiffusionPipeline
from datasets import load_dataset
import soundfile as sf

try:
    from IPython.display import Audio, display
except ImportError:
    Audio = None  # type: ignore
    display = print  # type: ignore

In [3]:
# Notebook path – __file__ is undefined in Jupyter
NOTEBOOK_DIR = Path.cwd()
load_dotenv(NOTEBOOK_DIR / ".env")  # pulls HF_TOKEN into env

hf_token: str | None = os.getenv("HF_TOKEN")
if hf_token:
    # write_permission is deprecated in recent huggingface_hub releases, so it is omitted here
    login(token=hf_token, add_to_git_credential=True)
else:
    print("⚠️  No HF_TOKEN found – running in anonymous mode (public models only).")

DEVICE = "cuda" if torch.cuda.is_available() else "cpu"
print(f"Using device: {DEVICE}")

Note: Environment variable`HF_TOKEN` is set and is the current active token independently from the token you've just configured.


Using device: cpu
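
The cells that follow repeat the expression `0 if DEVICE == "cuda" else -1` to turn the device string into the integer index passed to `pipeline(device=...)`. A small helper could centralize that mapping. The sketch below is only an illustration: `pipeline_device_index` is a hypothetical name, not part of `transformers`, and it assumes a single GPU at index 0.

```python
def pipeline_device_index(device: str) -> int:
    """Return 0 (first GPU) for "cuda", otherwise -1 (CPU).

    Hypothetical helper; mirrors the inline expression used in this notebook.
    """
    return 0 if device == "cuda" else -1


print(pipeline_device_index("cpu"))   # -1
print(pipeline_device_index("cuda"))  # 0
```

With such a helper, each cell could pass `device=pipeline_device_index(DEVICE)` instead of repeating the conditional.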


In [4]:
# Sentiment Analysis

sentiment = pipeline("sentiment-analysis", model="distilbert/distilbert-base-uncased-finetuned-sst-2-english", device=0 if DEVICE == "cuda" else -1)
sentiment("Absolutely thrilled to be diving headfirst into the world of LLMs — mastery, here I come!")


Device set to use cpu


[{'label': 'POSITIVE', 'score': 0.9996113181114197}]
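
The pipeline returns a list with one dict per input, each holding a `label` and a `score`. In practice it can be useful to treat low-confidence predictions separately. The sketch below works on a copy of the output above and needs no model; `confident_label`, the 0.9 threshold, and the `"UNCERTAIN"` fallback are assumptions made for illustration, not part of the pipeline API.

```python
# Copied from the output above.
sample = [{"label": "POSITIVE", "score": 0.9996113181114197}]


def confident_label(results, threshold=0.9, fallback="UNCERTAIN"):
    """Return the top label, or `fallback` when its score is below `threshold`."""
    top = max(results, key=lambda r: r["score"])
    return top["label"] if top["score"] >= threshold else fallback


print(confident_label(sample))  # POSITIVE
```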

In [5]:
# Named‑Entity Recognition

# aggregation_strategy="simple" replaces the deprecated grouped_entities=True
ner = pipeline("ner", aggregation_strategy="simple", device=0 if DEVICE == "cuda" else -1)
ner("Barack Obama made history as the 44th president of the United States, ushering in a new era of leadership and hope.")

No model was supplied, defaulted to dbmdz/bert-large-cased-finetuned-conll03-english and revision 4c53496 (https://huggingface.co/dbmdz/bert-large-cased-finetuned-conll03-english).
Using a pipeline without specifying a model name and revision in production is not recommended.
Some weights of the model checkpoint at dbmdz/bert-large-cased-finetuned-conll03-english were not used when initializing BertForTokenClassification: ['bert.pooler.dense.bias', 'bert.pooler.dense.weight']
- This IS expected if you are initializing BertForTokenClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForTokenClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Device set to use cpu


[{'entity_group': 'PER',
  'score': np.float32(0.9992453),
  'word': 'Barack Obama',
  'start': 0,
  'end': 12},
 {'entity_group': 'LOC',
  'score': np.float32(0.9990786),
  'word': 'United States',
  'start': 55,
  'end': 68}]
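
Each entity carries `start` and `end` character offsets into the input string, so the original text can be sliced directly instead of relying on the `word` field. The sketch below groups the entities from the output above by type. It runs without a model: the entity list is copied by hand (with the NumPy scores written as plain floats), and `group_entities` is a hypothetical helper name.

```python
from collections import defaultdict

text = (
    "Barack Obama made history as the 44th president of the United States, "
    "ushering in a new era of leadership and hope."
)

# Copied from the output above, scores converted to plain floats.
entities = [
    {"entity_group": "PER", "score": 0.9992453, "word": "Barack Obama", "start": 0, "end": 12},
    {"entity_group": "LOC", "score": 0.9990786, "word": "United States", "start": 55, "end": 68},
]


def group_entities(text, entities):
    """Map each entity type to the text spans found for it."""
    grouped = defaultdict(list)
    for ent in entities:
        # Slice the source text with the character offsets.
        grouped[ent["entity_group"]].append(text[ent["start"]:ent["end"]])
    return dict(grouped)


by_type = group_entities(text, entities)
print(by_type)  # {'PER': ['Barack Obama'], 'LOC': ['United States']}
```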

In [6]:
# Question Answering

qa = pipeline("question-answering", device=0 if DEVICE == "cuda" else -1)
qa(question="Who was the 44th president of the United States?", context="Barack Obama was the 44th president of the United States.")

No model was supplied, defaulted to distilbert/distilbert-base-cased-distilled-squad and revision 564e9b5 (https://huggingface.co/distilbert/distilbert-base-cased-distilled-squad).
Using a pipeline without specifying a model name and revision in production is not recommended.
Device set to use cpu


{'score': 0.9889456033706665, 'start': 0, 'end': 12, 'answer': 'Barack Obama'}
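
The `start` and `end` fields in the result are character offsets into `context`: this is an extractive model, so the answer is a span of the context rather than newly generated text. A minimal check of that relationship, using the values from the output above (`answer_span` is a hypothetical helper name):

```python
context = "Barack Obama was the 44th president of the United States."

# Copied from the output above.
result = {"score": 0.9889456033706665, "start": 0, "end": 12, "answer": "Barack Obama"}


def answer_span(context, result):
    """Recover the answer by slicing the context with the returned offsets."""
    return context[result["start"]:result["end"]]


span = answer_span(context, result)
print(span == result["answer"])  # True
```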

In [7]:
# Summarization

summariser = pipeline("summarization", device=0 if DEVICE == "cuda" else -1)
long_text = (
    "The Hugging Face Transformers library is a game-changer in natural language processing! From text classification "
    "to question answering and named entity recognition, it powers it all with ease. Beloved by the open-source community, "
    "it’s breaking down barriers and fast-tracking innovation like never before."
)
summariser(long_text, max_length=50, min_length=25, do_sample=False)[0]["summary_text"]

No model was supplied, defaulted to sshleifer/distilbart-cnn-12-6 and revision a4f8f3e (https://huggingface.co/sshleifer/distilbart-cnn-12-6).
Using a pipeline without specifying a model name and revision in production is not recommended.
Device set to use cpu


' Hugging Face Transformers library is a game-changer in natural language processing . From text classification to question answering and named entity recognition, it powers it all with ease . Beloved by the open-source community, it’s'

In [8]:
# Translation (EN→FR)

msg = "The Data Scientists were truly amazed by the power and simplicity of the HuggingFace pipeline API."

fr = pipeline("translation_en_to_fr", device=0 if DEVICE == "cuda" else -1)
fr(msg)[0]["translation_text"]

No model was supplied, defaulted to google-t5/t5-base and revision a9723ea (https://huggingface.co/google-t5/t5-base).
Using a pipeline without specifying a model name and revision in production is not recommended.
Device set to use cpu


"Les Data Scientists ont été vraiment étonnés par la puissance et la simplicité de l'API du pipeline HuggingFace."

In [9]:
# Zero‑Shot Classification

zsc = pipeline("zero-shot-classification", device=0 if DEVICE == "cuda" else -1)
zsc("Hugging Face's Transformers library is amazing!", candidate_labels=["technology", "sports", "politics"])


No model was supplied, defaulted to facebook/bart-large-mnli and revision d7645e1 (https://huggingface.co/facebook/bart-large-mnli).
Using a pipeline without specifying a model name and revision in production is not recommended.
Device set to use cpu


{'sequence': "Hugging Face's Transformers library is amazing!",
 'labels': ['technology', 'sports', 'politics'],
 'scores': [0.9493837952613831, 0.032250262796878815, 0.01836598850786686]}
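
The result keeps `labels` and `scores` as two parallel lists, which in the output above are ordered from highest to lowest score. Pairing them makes the ranking explicit and easier to consume downstream. The sketch below uses a copy of that output and needs no model; `rank_labels` is a hypothetical helper name.

```python
# Copied from the output above.
result = {
    "sequence": "Hugging Face's Transformers library is amazing!",
    "labels": ["technology", "sports", "politics"],
    "scores": [0.9493837952613831, 0.032250262796878815, 0.01836598850786686],
}


def rank_labels(result):
    """Return (label, score) pairs sorted from highest to lowest score."""
    pairs = zip(result["labels"], result["scores"])
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


best_label, best_score = rank_labels(result)[0]
print(best_label)  # technology
```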

In [10]:
# Text Generation

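gen = pipeline("text-generation", device=0 if DEVICE == "cuda" else -1)
# Note: max_length=60 is overridden by the default max_new_tokens=256 (see the warning below),
# so the output is much longer than 60 tokens; pass max_new_tokens=60 to cap the continuation.
gen("If there's one thing I want you to remember about using HuggingFace pipelines, it's", max_length=60)[0]["generated_text"]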

No model was supplied, defaulted to openai-community/gpt2 and revision 607a30d (https://huggingface.co/openai-community/gpt2).
Using a pipeline without specifying a model name and revision in production is not recommended.
Device set to use cpu
Truncation was not explicitly activated but `max_length` is provided a specific value, please use `truncation=True` to explicitly truncate examples to max length. Defaulting to 'longest_first' truncation strategy. If you encode pairs of sequences (GLUE-style) with the tokenizer you can select this strategy more precisely by providing a specific strategy to `truncation`.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
Both `max_new_tokens` (=256) and `max_length`(=60) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)


"If there's one thing I want you to remember about using HuggingFace pipelines, it's that they're the simplest and simplest way to make the data look clean.\n\nSo here's my idea: try to make your image look clean by using the same filters we use for our images. Here are our filter settings:\n\nFilter: #fetch-filter-from-image #make-image #filter-type: image image_type = 'text/svg';\n\nFilter: #filter-filter-from-image #filter-type: image image_type = 'text/css';\n\nFilter: #filter-filter-from-image #filter-type: image image_type = 'text/css';\n\nFilter: #filter-filter-from-image #filter-type: image image_type = 'text/css';\n\nFilter: #filter-filter-from-image #filter-type: image image_type = 'text/css';\n\nFilter: #filter-filter-from-image #filter-type: image image_type = 'text/css';\n\nFilter: #filter-filter-from-image #filter-type: image image_type = 'text/css';\n\nFilter: #filter-filter-from-image #filter"
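
The warning above says that `max_new_tokens` (=256) takes precedence over `max_length`, which is why the output runs far past 60 tokens and then falls into a loop. One possible way to get a shorter, less repetitive continuation is to pass `max_new_tokens` directly and add a repetition control. The sketch below only builds the keyword arguments: `GENERATION_KWARGS` is a hypothetical name, the values are illustrative rather than tuned, and the call itself is left as a comment because it assumes the `gen` pipeline from the cell above.

```python
# Illustrative generation settings (values are assumptions, not tuned).
GENERATION_KWARGS = {
    "max_new_tokens": 60,       # cap the continuation length directly
    "no_repeat_ngram_size": 3,  # forbid repeating any 3-token sequence
}

# Usage, assuming `gen` is the text-generation pipeline created above
# and `prompt` is the input string:
# gen(prompt, **GENERATION_KWARGS)[0]["generated_text"]
print(GENERATION_KWARGS)
```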