
# Hugging Face `pipeline`: Beginner-Friendly Tour (with Zero‑Shot Demos)

**Audience:** 4th‑year CS students — beginner friendly  
**Goal:** Learn how to use 🤗 Transformers **`pipeline`** to run **pretrained** models quickly (no fine‑tuning) and
see where **zero‑shot** pipelines are powerful.

> This follows the spirit of HF LLM Course Chapter 1.3 (Pipelines) and adds gentle explanations, tips, and exercises.



## 0) Setup (Install, Seeds, Device)


In [None]:

# If needed, uncomment to install dependencies:
# %pip install -U transformers torch accelerate sentencepiece sacremoses datasets

import os, random, time
import numpy as np
import torch
from transformers import pipeline

# --- Reproducibility (mostly affects sampling-based generators) ---
SEED = 42
def set_seed(seed=SEED):
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)

set_seed()

# --- Device selection for pipelines: device=0 means CUDA GPU 0; -1 means CPU ---
device = 0 if torch.cuda.is_available() else -1
print("Torch:", torch.__version__, "| CUDA available:", torch.cuda.is_available(), "| pipeline(device) =", device)



## 1) What *is* `pipeline`?



`pipeline(task, model=..., tokenizer=...)` is a **high‑level helper** that hides the usual steps:
1. **Tokenize** raw inputs (text → token IDs)  
2. **Run the model** forward pass  
3. **Post‑process** outputs into friendly Python objects (labels, scores, text, etc.)

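If you **omit `model=`**, it picks a default checkpoint for that task from the Hub and logs a warning; the default can change between library versions, so pin `model=` when you need stable results.  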
You can always **override** with a specific model ID later.



### 1.1 Minimal example: Sentiment Analysis (pretrained, no fine‑tuning)


In [None]:

# Default checkpoint for sentiment analysis, already fine-tuned for this task; we use it as-is (no training here)
sentiment = pipeline("sentiment-analysis", device=device)
result = sentiment("I absolutely love using Transformers—it's so convenient!")
print(result)  # [{'label': 'POSITIVE', 'score': ...}]



### 1.2 Multiple inputs (pass a list instead of looping; set `batch_size` for true batching)


In [None]:

texts = [
    "Today is an amazing day 😄",
    "The interface is confusing and slow.",
    "Meh, it's fine but could be better.",
]
for out in sentiment(texts):
    print(out)
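
Passing a list lets the pipeline iterate over the inputs for you, but how many texts go through the model at once is governed by the `batch_size` call argument. The sketch below only illustrates what that grouping means; `chunked` and `demo_texts` are hypothetical names for this illustration, not part of Transformers.

```python
from typing import Iterator, List


def chunked(items: List[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `items` with at most `batch_size` elements."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


demo_texts = ["a", "b", "c", "d", "e"]  # hypothetical inputs
batches = list(chunked(demo_texts, batch_size=2))
print(batches)  # [['a', 'b'], ['c', 'd'], ['e']]
```

With the real pipeline, the equivalent call would be `sentiment(texts, batch_size=8)`. Larger batches tend to help most on a GPU and are not guaranteed to be faster on CPU, so it is worth timing both.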



### 1.3 Under the hood
- Picks / loads a **tokenizer** and **model** (from the Hub cache after first download)  
- Handles **padding/truncation** defaults (configurable)  
- Converts logits → **labels/scores**  
- Runs on the device you pass (`device=0` for the first CUDA GPU, `device=-1` for CPU)
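
The "logits → labels/scores" step can be sketched in plain Python. This is a simplified illustration, not the library's implementation: the logits are made up, and `postprocess` is a hypothetical helper name.

```python
import math
from typing import Dict, List


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over a list of raw scores."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def postprocess(logits: List[float], id2label: Dict[int, str]) -> Dict[str, object]:
    """Turn raw logits into the {'label', 'score'} shape a classifier pipeline returns."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return {"label": id2label[best], "score": probs[best]}


# Hypothetical logits for one input; the mapping mirrors a binary sentiment model.
example = postprocess([-2.0, 3.0], {0: "NEGATIVE", 1: "POSITIVE"})
print(example)
```

The `id2label` mapping plays the role of the one stored in a model's config, which is how the pipeline knows that class index 1 means `POSITIVE`.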



## 2) Zero‑Shot Text Classification (No Task‑Specific Training)



**Problem:** Classify a sentence into **labels you choose on the fly** (e.g., `"bug report"`, `"feature request"`), even if the model was **not trained** specifically for those labels.

**How:** Use a Natural Language Inference (NLI) model (e.g., `facebook/bart-large-mnli`) under the hood.  
The pipeline scores how much the text **entails** each label hypothesis (e.g., “This text is about *a bug report*”).
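
To make the NLI trick concrete, here is a minimal sketch of the two ideas: each label is turned into a hypothesis sentence, and the entailment logits are normalized across labels. The template string is assumed to match the pipeline's default (`hypothesis_template`), the logits are invented, and the helper names are hypothetical.

```python
import math
from typing import Dict, List

# Assumed to match the pipeline's default hypothesis_template.
HYPOTHESIS_TEMPLATE = "This example is {}."


def build_hypotheses(labels: List[str]) -> List[str]:
    """Create one NLI hypothesis sentence per candidate label."""
    return [HYPOTHESIS_TEMPLATE.format(label) for label in labels]


def single_label_scores(labels: List[str], entailment_logits: List[float]) -> Dict[str, float]:
    """Softmax the entailment logits across labels so the scores sum to 1."""
    peak = max(entailment_logits)
    exps = [math.exp(x - peak) for x in entailment_logits]
    total = sum(exps)
    scored = {label: e / total for label, e in zip(labels, exps)}
    # Sort from most to least likely, like the pipeline output.
    return dict(sorted(scored.items(), key=lambda kv: kv[1], reverse=True))


demo_labels = ["customer support", "bug report", "feature request"]
hypotheses = build_hypotheses(demo_labels)
demo_scores = single_label_scores(demo_labels, [0.5, 3.0, 0.2])  # hypothetical logits
print(hypotheses)
print(demo_scores)
```

A custom template (for example, one that reads more naturally for your labels) can be passed to the real pipeline through its `hypothesis_template` argument.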


In [None]:

zs = pipeline(
    task="zero-shot-classification",
    model="facebook/bart-large-mnli",
    device=device
)

candidate_labels = ["customer support", "bug report", "feature request", "pricing", "complaint"]
text = "The app keeps crashing whenever I try to upload a photo. Please fix it."
zs(text, candidate_labels=candidate_labels)  # labels with probabilities



**Batch example:** Pass several texts to save time.


In [None]:

batch_texts = [
    "Could you add SSO with Okta?",
    "My credit card was charged twice.",
    "New release works flawlessly. Kudos to the team!",
]
zs(batch_texts, candidate_labels=candidate_labels)



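**Tip:** The pipeline already returns a score for every label. By default the scores are normalized across labels and sum to 1; set `multi_label=True` to score each label independently (useful when several labels can apply to the same text).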


In [None]:

zs(text, candidate_labels=candidate_labels, multi_label=True)  # treats labels as independent
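
Why do independent scores no longer sum to 1? A simplified sketch, assuming each label is scored with a two‑way softmax over its own contradiction and entailment logits (the neutral logit is ignored). The logits and the helper name are hypothetical.

```python
import math
from typing import Dict, Tuple


def entailment_probability(contradiction_logit: float, entailment_logit: float) -> float:
    """Two-way softmax over (contradiction, entailment) for a single label."""
    peak = max(contradiction_logit, entailment_logit)
    e_contra = math.exp(contradiction_logit - peak)
    e_entail = math.exp(entailment_logit - peak)
    return e_entail / (e_contra + e_entail)


# Hypothetical (contradiction, entailment) logits for one text and three labels.
nli_logits: Dict[str, Tuple[float, float]] = {
    "bug report": (-1.5, 3.0),
    "complaint": (-0.5, 2.0),
    "pricing": (2.5, -2.0),
}
multi_scores = {label: entailment_probability(c, e) for label, (c, e) in nli_logits.items()}
print(multi_scores)
```

Here both `bug report` and `complaint` score high at the same time, which is the behavior you want when a text can belong to more than one category.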



## 3) Masked Language Modeling — `fill-mask` (Pretrained, Zero‑Shot)



Given a sentence with a **mask token**, predict the missing word(s) using a pretrained masked‑LM (`bert-base-uncased`).


In [None]:

fill = pipeline("fill-mask", model="bert-base-uncased", device=device)
fill("Transformers are the most [MASK] library for NLP.")
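
Different model families use different mask tokens (BERT uses `[MASK]`, RoBERTa uses `<mask>`). One way to avoid hard‑coding the token is to write the sentence with a neutral placeholder and substitute the right token later. `with_mask` and the `<blank>` placeholder are hypothetical names for this sketch.

```python
def with_mask(template: str, mask_token: str, placeholder: str = "<blank>") -> str:
    """Swap a neutral placeholder for the model-specific mask token."""
    if placeholder not in template:
        raise ValueError(f"template must contain {placeholder!r}")
    return template.replace(placeholder, mask_token)


sentence = "Transformers are the most <blank> library for NLP."
print(with_mask(sentence, "[MASK]"))  # BERT-style mask token
print(with_mask(sentence, "<mask>"))  # RoBERTa-style mask token
```

With a loaded pipeline, the token can be read from the tokenizer instead of typed by hand: `fill(with_mask(sentence, fill.tokenizer.mask_token))`.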



## 4) Named Entity Recognition (NER) — Token Classification



Extract entities (people, organizations, locations) from text.  
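`aggregation_strategy="simple"` groups adjacent tokens that share an entity type into one span (e.g., `New` + `York` becomes a single `LOC` entity) instead of returning one prediction per word‑piece.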


In [None]:

ner = pipeline("token-classification", aggregation_strategy="simple", device=device)
ner("Hugging Face is based in Paris and New York, and Google is one of its partners.")
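
The grouping idea can be sketched on word‑level tags. This is a deliberate simplification: the real pipeline works on sub‑word tokens and also tracks scores and character offsets. The tags below are hand‑written, and `group_entities` is a hypothetical helper.

```python
from typing import Dict, List, Tuple


def group_entities(tagged: List[Tuple[str, str]]) -> List[Dict[str, str]]:
    """Merge B-/I- tagged words into entity spans; 'O' marks non-entities."""
    groups: List[Dict[str, str]] = []
    current = None
    for word, tag in tagged:
        if tag == "O":
            current = None  # an outside token closes any open span
            continue
        prefix, _, entity = tag.partition("-")
        if current is not None and prefix == "I" and current["entity_group"] == entity:
            current["word"] += " " + word  # continue the open span
        else:
            current = {"entity_group": entity, "word": word}  # start a new span
            groups.append(current)
    return groups


tagged = [
    ("Hugging", "B-ORG"), ("Face", "I-ORG"), ("is", "O"), ("based", "O"), ("in", "O"),
    ("Paris", "B-LOC"), ("and", "O"), ("New", "B-LOC"), ("York", "I-LOC"),
]
entities = group_entities(tagged)
print(entities)
```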



## 5) Extractive Question Answering (QA)



Given a **context** and a **question**, extract the answer span.  
We use a pretrained QA model (typically fine‑tuned on SQuAD) as‑is on our own text, with no further training.


In [None]:

qa = pipeline("question-answering", device=device)
context = """
The Transformer architecture, introduced in 2017, replaced recurrent networks for many NLP tasks.
It relies entirely on attention mechanisms to draw global dependencies between input and output.
"""
# Pass question and context as keyword arguments
qa(question="What architecture replaced recurrent networks?", context=context)



## 6) Summarization (Seq2Seq)



Abstractive summarization with a pretrained seq2seq model. `max_length` and `min_length` are measured in tokens, not words; keep `max_length` small for quick demos.


In [None]:

summarizer = pipeline("summarization", device=device)
article = (
    "Transformers have revolutionized natural language processing by enabling "
    "parallel training and capturing long-range dependencies efficiently. "
    "Libraries like Hugging Face Transformers provide user-friendly APIs for "
    "inference and fine-tuning, accelerating research and production adoption."
)
summarizer(article, max_length=40, min_length=10, do_sample=False)



## 7) Translation (EN → ES)



Pretrained machine translation models can translate out‑of‑the‑box.


In [None]:

translator = pipeline("translation", model="Helsinki-NLP/opus-mt-en-es", device=device)
translator("Transformers make transfer learning straightforward and effective.")



## 8) Text Generation (Causal LM)



Generate continuations with an autoregressive model (e.g., `gpt2`). Use a small `max_length` for speed; note that it counts the prompt tokens as well as the generated ones.


In [None]:

generator = pipeline("text-generation", model="gpt2", device=device)
prompt = "In a future where AI assists every developer,"
generator(prompt, max_length=40, num_return_sequences=1)
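
Whether output changes between runs depends on the decoding strategy: greedy decoding always takes the top token, while sampling draws from the distribution and therefore depends on the random seed. The toy sketch below shows one decoding step with invented logits; the helper names are hypothetical, and real generation repeats this step token by token.

```python
import math
import random
from typing import Dict


def greedy_pick(logits: Dict[str, float]) -> str:
    """Always choose the highest-scoring token (deterministic)."""
    return max(logits, key=logits.get)


def sample_pick(logits: Dict[str, float], rng: random.Random, temperature: float = 1.0) -> str:
    """Draw one token with probability proportional to softmax(logits / temperature)."""
    tokens = list(logits)
    scaled = [logits[t] / temperature for t in tokens]
    peak = max(scaled)
    weights = [math.exp(x - peak) for x in scaled]
    return rng.choices(tokens, weights=weights, k=1)[0]


# Hypothetical next-token logits after the prompt.
next_token_logits = {" developers": 2.0, " code": 1.5, " robots": 0.5}

greedy = greedy_pick(next_token_logits)

rng_a = random.Random(42)
run_a = [sample_pick(next_token_logits, rng_a) for _ in range(5)]
rng_b = random.Random(42)  # same seed, so the same draws
run_b = [sample_pick(next_token_logits, rng_b) for _ in range(5)]

print(greedy, run_a == run_b)
```

In the real pipeline the same choice is made with generation arguments such as `do_sample=False` (greedy) or `do_sample=True` with a `temperature`.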



## 9) (Optional) Zero‑Shot **Image** Classification with CLIP



Zero‑shot also works for images using **text prompts** as labels.  
Below is commented code (needs internet to fetch an image). Uncomment to try.


In [None]:

# from transformers import pipeline
# import requests
# from PIL import Image
# from io import BytesIO
# 
# img_url = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/tasks/image_classification.jpeg"
# image = Image.open(BytesIO(requests.get(img_url).content))
# labels = ["a cat", "a dog", "a bird", "a car"]
# clip = pipeline("zero-shot-image-classification", model="openai/clip-vit-base-patch32", device=device)
# clip(image, candidate_labels=labels)
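
Conceptually, CLIP embeds the image and each text label into the same vector space and ranks labels by similarity. A toy sketch with made‑up 3‑dimensional embeddings follows; real embeddings have hundreds of dimensions, and the `scale` value here is an arbitrary stand‑in for the model's learned logit scale.

```python
import math
from typing import Dict, List


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def zero_shot_image_scores(
    image_vec: List[float],
    text_vecs: Dict[str, List[float]],
    scale: float = 10.0,  # arbitrary value for this sketch
) -> Dict[str, float]:
    """Softmax over scaled image-text cosine similarities."""
    names = list(text_vecs)
    logits = [scale * cosine(image_vec, text_vecs[name]) for name in names]
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return {name: e / total for name, e in zip(names, exps)}


# Hypothetical embeddings standing in for the image and text encoder outputs.
image_embedding = [0.9, 0.1, 0.2]
label_embeddings = {
    "a cat": [1.0, 0.0, 0.1],
    "a dog": [0.2, 0.9, 0.1],
    "a car": [0.0, 0.2, 1.0],
}
image_scores = zero_shot_image_scores(image_embedding, label_embeddings)
print(image_scores)
```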



## 10) Tiny Timing Utility (CPU vs GPU)



Measure average runtime to compare CPU vs GPU or different models.


In [None]:

def time_call(fn, *args, repeat=3, **kwargs):
    """Return the average wall-clock runtime (s) of calling fn(*args, **kwargs)."""
    times = []
    for _ in range(repeat):
        t0 = time.perf_counter()  # monotonic, high-resolution timer (time is imported in Setup)
        fn(*args, **kwargs)
        times.append(time.perf_counter() - t0)
    return sum(times) / len(times)

avg = time_call(sentiment, texts, repeat=2)
print(f"Avg runtime over 2 runs: {avg:.3f}s")



## 11) Exercises (Try It Yourself)



1) **Zero‑Shot Labels:** Change `candidate_labels` in Section 2 (e.g., ["how‑to", "billing", "outage", "praise"]).  
2) **Domain Shift:** Paste your own domain text into QA / Summarization; compare outputs.  
3) **Model Cards:** In the Hub, open the model page used by a pipeline (e.g., `facebook/bart-large-mnli`). Skim the **Model Card** for intended use & limitations.  
4) **Batching:** Time single‑input vs list‑input for sentiment. Which is faster?  
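5) **Multilingual:** Try a multilingual checkpoint that is already fine‑tuned for the task, since the base `xlm-roberta-large` has no NER or NLI head (e.g., `joeddav/xlm-roberta-large-xnli` for zero‑shot), and test on non‑English text.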



## 12) Troubleshooting Tips



- **Downloads are slow / blocked:** Pre‑download models or set a local HF cache (`HF_HOME`).  
- **Out of GPU memory:** Use smaller models (e.g., `distilbert-base-uncased`), a smaller `max_length` or batch size, or fall back to CPU (`device=-1`).  
- **Mismatched mask token:** For `fill-mask`, ensure your input uses the right mask (e.g., `[MASK]` for BERT).  
- **Reproducibility:** Sampling‑based generation varies between runs; call `set_seed()` right before generating and keep sequences short in demos.
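
For the cache tip above, a minimal sketch of pointing `HF_HOME` at a custom directory. The path is hypothetical, and the variable should be set before `transformers` is imported (for example, in the very first notebook cell), since the library reads it at import time.

```python
import os

# Hypothetical cache location; pick any directory with enough free disk space.
cache_dir = os.path.join(os.path.expanduser("~"), "hf_cache")

# Set this before importing transformers so the library picks it up.
os.environ["HF_HOME"] = cache_dir
print("HF_HOME =", os.environ["HF_HOME"])
```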
