# Lab 2.4.2: Pipeline Showcase

**Module:** 2.4 - Hugging Face Ecosystem  
**Time:** 2 hours  
**Difficulty:** ⭐⭐ (Beginner-Intermediate)

---

## Learning Objectives

By the end of this notebook, you will:
- [ ] Use the Pipeline API for quick inference without boilerplate code
- [ ] Demonstrate 5 different pipeline types: text generation, sentiment, NER, QA, summarization
- [ ] Understand when to use pipelines vs. manual model loading
- [ ] Customize pipeline behavior with advanced parameters
- [ ] Build a multi-task inference demo

---

## Prerequisites

- Completed: Lab 2.4.1 (Hub Exploration)
- Knowledge of: Basic Python, model loading concepts

---

## Real-World Context

Imagine you're building a customer support chatbot that needs to:
1. Understand if the customer is happy or angry (sentiment)
2. Extract key entities like product names and dates (NER)
3. Answer questions about policies (QA)
4. Summarize long complaint emails (summarization)
5. Generate helpful responses (text generation)

Without pipelines, you'd write hundreds of lines of tokenization, model loading, and post-processing code. With pipelines, each task takes as little as **one line of code**.

---

## ELI5: What is a Pipeline?

> **Imagine you want a smoothie.** You could:
> - Option A: Buy fruits, wash them, peel them, cut them, put them in a blender, blend, pour into glass
> - Option B: Walk up to a smoothie bar and say "One strawberry smoothie please!"
>
> **Pipelines are like the smoothie bar.** They handle all the messy preparation work:
> - Loading the right model
> - Preparing your text (tokenization)
> - Running the model
> - Converting the output to something useful
>
> You just say: `pipeline('sentiment-analysis')('I love this!')` → a list with one result dict, holding the label `POSITIVE` and a confidence score
>
> **In AI terms:** A pipeline is a high-level abstraction that chains together tokenization → model inference → post-processing into a single, easy-to-use function.

---

## Part 1: Pipeline Basics

In [None]:
# Install required packages
# Note: These packages are pre-installed in the NGC PyTorch container.
# Running pip install ensures you have compatible versions.

!pip install -q "transformers>=4.35.0" "huggingface_hub>=0.19.0" "datasets>=2.14.0"

print("Packages ready!")

In [None]:
# Import the pipeline function
from transformers import pipeline
import torch
import warnings
warnings.filterwarnings('ignore')

# Check hardware
# The pipeline() device argument accepts an integer index (0 = first GPU, -1 = CPU),
# a string such as "cuda:0" or "cpu", or a torch.device object.
# This notebook uses the integer form throughout.
device = 0 if torch.cuda.is_available() else -1
print(f"Using device: {'GPU (cuda:0)' if device == 0 else 'CPU'}")

# Helper to show memory
def show_memory():
    if torch.cuda.is_available():
        used = torch.cuda.memory_allocated() / 1e9
        print(f"GPU Memory: {used:.2f} GB")

### The Simplest Pipeline

Creating a pipeline is as simple as specifying the task:

In [None]:
# Create a sentiment analysis pipeline
# This automatically downloads the default model for sentiment analysis!
sentiment_pipe = pipeline(
    "sentiment-analysis",
    device=device,
    torch_dtype=torch.bfloat16  # Use bfloat16 on DGX Spark
)

print("Pipeline created!")
print(f"Model: {sentiment_pipe.model.config._name_or_path}")
show_memory()

In [None]:
# Use the pipeline - it's that simple!
result = sentiment_pipe("I absolutely love learning about AI on my DGX Spark!")
print(result)

# Process multiple texts at once (batched inference)
texts = [
    "This is the best course ever!",
    "I'm struggling with this concept.",
    "The weather is nice today.",
    "I can't believe how terrible this experience was."
]

results = sentiment_pipe(texts)
print("\nBatch results:")
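for text, result in zip(texts, results):
    # Add an ellipsis only when the text was actually truncated
    preview = text if len(text) <= 40 else text[:40] + "..."
    print(f"  '{preview}' → {result['label']} ({result['score']:.2%})")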

### What Just Happened?

With just `pipeline("sentiment-analysis")`, Hugging Face:
1. Downloaded the default sentiment model (`distilbert-base-uncased-finetuned-sst-2-english`)
2. Created a tokenizer for that model
3. Set up the model for inference
4. Created a post-processor that converts logits → readable labels

All in one line! And when you call it, it handles tokenization, inference, and output formatting automatically.
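
Step 4, turning logits into a readable label, is the least visible part of the list above. The following is a minimal sketch of that idea in plain Python. The logits and the two-entry `id2label` mapping are made-up values for illustration (a real model supplies both), and this is not the library's own implementation.

```python
import math

def logits_to_prediction(logits, id2label):
    """Apply softmax to raw scores and report the most likely label."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(probs)), key=probs.__getitem__)
    return {"label": id2label[best], "score": probs[best]}

# Hypothetical raw model outputs for one sentence
example_logits = [-2.0, 3.0]
example_id2label = {0: "NEGATIVE", 1: "POSITIVE"}

prediction = logits_to_prediction(example_logits, example_id2label)
print(prediction)
```

The returned dict has the same `label` and `score` keys as the entries the sentiment pipeline returned in the cells above.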

---

## Part 2: The Five Essential Pipelines

Let's showcase the five pipelines this lab covers.

### Pipeline 1: Text Generation

In [None]:
# Clean up previous pipeline
del sentiment_pipe
torch.cuda.empty_cache()

print("Creating text generation pipeline...")

# Text Generation Pipeline
generator = pipeline(
    "text-generation",
    model="gpt2",  # Specify model explicitly
    device=device,
    torch_dtype=torch.bfloat16
)

show_memory()

In [None]:
# Generate text with different parameters
prompt = "Artificial intelligence will transform"

print("Text Generation Demo\n")
print(f"Prompt: '{prompt}'\n")

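# Greedy decoding. do_sample=False is passed explicitly because some
# checkpoints and library versions enable sampling by default.
result = generator(
    prompt,
    max_new_tokens=50,
    num_return_sequences=1,
    do_sample=False
)
print("Greedy (do_sample=False):")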
print(f"  {result[0]['generated_text']}\n")

# Creative generation with sampling
result = generator(
    prompt,
    max_new_tokens=50,
    num_return_sequences=2,
    do_sample=True,
    temperature=0.9,
    top_p=0.95
)
print("Creative (sampling, temp=0.9):")
for i, r in enumerate(result):
    print(f"  {i+1}. {r['generated_text']}\n")

#### Generation Parameters Explained

| Parameter | Effect | When to Use |
|-----------|--------|-------------|
| `max_new_tokens` | Limit output length | Always set this! |
| `do_sample=True` | Enable random sampling | Creative tasks |
| `temperature` | Higher = more random | 0.7-1.0 for creativity |
| `top_p` | Nucleus sampling threshold | 0.9-0.95 typical |
| `top_k` | Limit vocabulary per step | 50-100 typical |
| `num_return_sequences` | Multiple outputs | Compare options |
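
To make `temperature`, `top_k`, and `top_p` from the table more concrete, here is a simplified sketch in plain Python that shows which tokens remain candidates after each filter. The toy vocabulary and logits are invented, and the function name `sampling_distribution` is hypothetical. The library's actual logits processors differ in detail.

```python
import math

def softmax(scores):
    """Convert a dict of token -> score into token -> probability."""
    peak = max(scores.values())
    exps = {tok: math.exp(s - peak) for tok, s in scores.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}

def sampling_distribution(logits, temperature=1.0, top_k=None, top_p=None):
    """Return the renormalized probabilities of tokens that survive filtering."""
    # Temperature divides the logits: above 1 flattens, below 1 sharpens.
    probs = softmax({tok: s / temperature for tok, s in logits.items()})
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)
    # top_k keeps only the k most likely tokens, then renormalizes.
    if top_k is not None:
        ranked = ranked[:top_k]
        total = sum(p for _, p in ranked)
        ranked = [(tok, p / total) for tok, p in ranked]
    # top_p keeps the smallest prefix whose cumulative probability reaches top_p.
    if top_p is not None:
        kept, cumulative = [], 0.0
        for tok, p in ranked:
            kept.append((tok, p))
            cumulative += p
            if cumulative >= top_p:
                break
        ranked = kept
    total = sum(p for _, p in ranked)
    return {tok: p / total for tok, p in ranked}

# Invented next-token logits for illustration
toy_logits = {"the": 2.0, "a": 1.0, "robots": 0.5, "banana": -1.0}

base = sampling_distribution(toy_logits)
sharp = sampling_distribution(toy_logits, temperature=0.5)
only_top_k = sampling_distribution(toy_logits, top_k=2)
only_top_p = sampling_distribution(toy_logits, top_p=0.9)

for name, dist in [("base", base), ("temp=0.5", sharp),
                   ("top_k=2", only_top_k), ("top_p=0.9", only_top_p)]:
    print(name, {tok: round(p, 3) for tok, p in dist.items()})
```

Lowering the temperature concentrates probability on the most likely token, while `top_k` and `top_p` remove unlikely tokens from consideration before sampling.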

### Pipeline 2: Sentiment Analysis (already shown above)

Let's do a more comprehensive demo:

In [None]:
# Clean up
del generator
torch.cuda.empty_cache()

# Create sentiment pipeline with a three-class model (negative / neutral / positive) trained on tweets
sentiment = pipeline(
    "sentiment-analysis",
    model="cardiffnlp/twitter-roberta-base-sentiment-latest",
    device=device,
    torch_dtype=torch.bfloat16
)

print("Loaded Twitter-RoBERTa sentiment model")
show_memory()

In [None]:
# Test on realistic examples
customer_feedback = [
    "Your product changed my life! Best purchase ever!",
    "Waited 3 weeks for delivery. Unacceptable.",
    "It's okay I guess. Does what it says.",
    "The customer service was incredibly helpful and patient.",
    "Don't waste your money on this garbage.",
    "Shipped fast but packaging was damaged.",
    "I've recommended this to all my friends!",
    "Why is this so complicated to set up?"
]

print("Customer Feedback Sentiment Analysis\n")
print(f"{'Feedback':<55} {'Sentiment':<10} {'Score'}")
print("=" * 75)

results = sentiment(customer_feedback)
for text, result in zip(customer_feedback, results):
    display_text = text[:52] + "..." if len(text) > 55 else text
    print(f"{display_text:<55} {result['label']:<10} {result['score']:.2%}")

### Pipeline 3: Named Entity Recognition (NER)

In [None]:
# Clean up
del sentiment
torch.cuda.empty_cache()

# Create NER pipeline
ner = pipeline(
    "ner",
    model="dslim/bert-base-NER",
    device=device,
    aggregation_strategy="simple"  # Merge sub-word tokens into whole entity spans
)

print("Loaded BERT-NER model")
show_memory()

In [None]:
# Test NER on various texts
test_texts = [
    "Apple CEO Tim Cook announced new products in Cupertino, California.",
    "The DGX Spark was developed by NVIDIA and announced at CES 2025.",
    "Dr. Sarah Johnson from MIT published research in Nature last Tuesday.",
    "Amazon and Microsoft are competing with Google in cloud services."
]

print("Named Entity Recognition Demo\n")

for text in test_texts:
    print(f"Text: {text}")
    entities = ner(text)
    
    if entities:
        print("  Entities found:")
        for entity in entities:
            print(f"    - '{entity['word']}' → {entity['entity_group']} (confidence: {entity['score']:.2%})")
    else:
        print("  No entities found.")
    print()

#### NER Entity Types

| Entity | Meaning | Examples |
|--------|---------|----------|
| PER | Person | Tim Cook, Sarah Johnson |
| ORG | Organization | Apple, NVIDIA, MIT |
| LOC | Location | California, Cupertino |
| MISC | Miscellaneous | CES 2025, Nature |
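
Token classification models tag each token with labels such as `B-PER` (beginning of a person name) and `I-PER` (continuation), and an aggregation step merges them into the entity groups shown in the table. The sketch below illustrates that merging on whole words with hand-written tags. It is a simplified stand-in, not the pipeline's actual `aggregation_strategy` logic, and `group_entities` is a hypothetical helper name.

```python
def group_entities(tagged_tokens):
    """Merge consecutive B-/I- tagged tokens into (text, type) spans."""
    spans = []
    current_words, current_type = [], None
    for word, tag in tagged_tokens:
        if tag == "O":
            prefix, entity_type = "O", None
        else:
            prefix, entity_type = tag.split("-", 1)
        # A token extends the open span only if it is an I- tag of the same type.
        continues = prefix == "I" and entity_type == current_type
        if not continues:
            if current_words:
                spans.append((" ".join(current_words), current_type))
            current_words, current_type = [], None
        if entity_type is not None:
            current_words.append(word)
            current_type = entity_type
    if current_words:
        spans.append((" ".join(current_words), current_type))
    return spans

# Hand-written tags for illustration (a real model predicts these)
tagged = [
    ("Tim", "B-PER"), ("Cook", "I-PER"), ("visited", "O"),
    ("Cupertino", "B-LOC"), ("California", "B-LOC"),
]
entities = group_entities(tagged)
print(entities)
```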

### Pipeline 4: Question Answering

In [None]:
# Clean up
del ner
torch.cuda.empty_cache()

# Create QA pipeline
qa = pipeline(
    "question-answering",
    model="deepset/roberta-base-squad2",
    device=device,
    torch_dtype=torch.bfloat16
)

print("Loaded RoBERTa QA model (trained on SQuAD 2.0)")
show_memory()

In [None]:
# Context about DGX Spark
context = """
The NVIDIA DGX Spark is a desktop AI supercomputer designed for researchers, 
developers, and data scientists. Powered by the NVIDIA Blackwell GB10 Superchip, 
it features 128GB of unified LPDDR5X memory shared between CPU and GPU. 

The system delivers up to 1 PFLOP of FP4 compute performance and approximately 
209 TFLOPS at FP8 precision. It includes 6,144 CUDA cores and 192 fifth-generation 
Tensor Cores optimized for AI workloads.

Unlike cloud-based solutions, the DGX Spark runs locally on your desk, 
eliminating cloud costs and data privacy concerns. It can run models with 
up to 70 billion parameters using the unified memory architecture.

The device was announced at CES 2025 and runs NVIDIA DGX OS,
a Linux-based operating system. It supports popular frameworks like 
PyTorch and TensorFlow through NGC containers.
"""

questions = [
    "How much memory does the DGX Spark have?",
    "What chip powers the DGX Spark?",
    "How many CUDA cores are in the system?",
    "When was the DGX Spark announced?",
    "What operating system does it run?",
    "What is the maximum model size it can run?"
]

print("Question Answering Demo\n")
print("Context: Information about DGX Spark\n")
print("=" * 70)

for question in questions:
    result = qa(question=question, context=context)
    print(f"Q: {question}")
    print(f"A: {result['answer']} (confidence: {result['score']:.2%})")
    print()

#### QA Pipeline Tips

1. **Context matters**: The answer must be in the context (extractive QA)
2. **Confidence scores**: Low scores often mean the answer isn't in the context
3. **Chunk long documents**: Split into smaller contexts for better results
4. **SQuAD 2.0 models**: Can indicate "no answer" if the context doesn't contain it; with the pipeline, pass `handle_impossible_answer=True` to allow an empty answer
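
Tip 3 (chunking long documents) can be sketched as follows. The helper names `chunk_words` and `best_answer` are hypothetical, the window sizes are arbitrary, and `fake_qa` is a stand-in so the example runs without downloading a model. In the notebook you would pass the `qa` pipeline instead.

```python
def chunk_words(text, size=100, overlap=20):
    """Split text into overlapping windows of at most `size` words."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than size")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # the last window already reaches the end of the text
    return chunks

def best_answer(qa, question, document, size=100, overlap=20):
    """Run `qa` on every chunk and keep the highest-scoring result."""
    candidates = [
        qa(question=question, context=chunk)
        for chunk in chunk_words(document, size, overlap)
    ]
    return max(candidates, key=lambda r: r["score"], default=None)

def fake_qa(question, context):
    # Stand-in for a question-answering pipeline (hypothetical scoring rule)
    found = "128GB" in context
    return {"answer": "128GB" if found else "", "score": 0.9 if found else 0.1}

document = " ".join(["filler"] * 150 + ["It", "has", "128GB", "of", "memory."])
chunks = chunk_words(document, size=100, overlap=20)
result = best_answer(fake_qa, "How much memory does it have?", document)
print(len(chunks), result)
```

The overlap reduces the chance that an answer is cut in half at a chunk boundary.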

### Pipeline 5: Summarization

In [None]:
# Clean up
del qa
torch.cuda.empty_cache()

# Create summarization pipeline
summarizer = pipeline(
    "summarization",
    model="facebook/bart-large-cnn",
    device=device,
    torch_dtype=torch.bfloat16
)

print("Loaded BART-CNN summarization model")
show_memory()

In [None]:
# Long article to summarize
article = """
Artificial intelligence has made remarkable strides in recent years, transforming 
industries from healthcare to finance. The development of large language models 
has been particularly significant, with systems like GPT-4 and Claude demonstrating 
unprecedented capabilities in understanding and generating human-like text.

One of the most important developments has been the democratization of AI technology. 
Platforms like Hugging Face have made it possible for developers worldwide to access 
and deploy state-of-the-art models without needing massive computational resources 
or specialized expertise. This has led to an explosion of AI applications across 
every sector of the economy.

However, challenges remain. The environmental impact of training large models is 
substantial, with some estimates suggesting a single training run can produce as 
much carbon as five cars over their lifetime. Additionally, concerns about bias, 
misinformation, and job displacement continue to fuel debates about how AI should 
be developed and regulated.

Looking ahead, experts predict that AI will become increasingly integrated into 
daily life. From personal assistants that truly understand context to medical 
diagnostic systems that can detect diseases earlier than human doctors, the 
potential applications are vast. The key challenge will be ensuring these 
technologies are developed responsibly and benefit all of humanity.
"""

print("Summarization Demo\n")
print(f"Original article length: {len(article.split())} words\n")

# Generate different length summaries
short_summary = summarizer(
    article, 
    max_length=50, 
    min_length=20,
    do_sample=False
)

medium_summary = summarizer(
    article,
    max_length=100,
    min_length=50,
    do_sample=False
)

print("Short Summary (20-50 tokens):")
print(f"  {short_summary[0]['summary_text']}")
print(f"  ({len(short_summary[0]['summary_text'].split())} words)\n")

print("Medium Summary (50-100 tokens):")
print(f"  {medium_summary[0]['summary_text']}")
print(f"  ({len(medium_summary[0]['summary_text'].split())} words)")

---

## Part 3: Advanced Pipeline Features

### Specifying Custom Models

In [None]:
# You can use ANY compatible model from the Hub
# Clean up first
del summarizer
torch.cuda.empty_cache()

# Use a specific model for sentiment
financial_sentiment = pipeline(
    "sentiment-analysis",
    model="ProsusAI/finbert",  # Specialized for financial text!
    device=device
)

financial_texts = [
    "The stock price surged 15% after the earnings report.",
    "The company filed for bankruptcy following poor quarterly results.",
    "Revenue remained stable compared to last year.",
    "Investors are optimistic about the merger announcement."
]

print("Financial Sentiment Analysis (using FinBERT)\n")
results = financial_sentiment(financial_texts)
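for text, result in zip(financial_texts, results):
    # Add an ellipsis only when the text was actually truncated
    preview = text if len(text) <= 50 else text[:50] + "..."
    print(f"'{preview}'")
    print(f"  → {result['label']} ({result['score']:.2%})\n")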

### Batch Processing for Efficiency

In [None]:
import time

# Create sample texts (kept small so the demo runs quickly)
sample_texts = [f"Sample text number {i} for batch processing test." for i in range(20)]

# Warm-up call so one-time setup cost does not skew the first measurement
_ = financial_sentiment(sample_texts[0])

# Process one at a time
start = time.perf_counter()
for text in sample_texts:
    _ = financial_sentiment(text)
sequential_time = time.perf_counter() - start

# Process in batches
start = time.perf_counter()
_ = financial_sentiment(sample_texts, batch_size=8)
batch_time = time.perf_counter() - start

print("Batch Processing Performance Comparison\n")
print(f"Sequential (one at a time): {sequential_time:.2f}s")
print(f"Batched (batch_size=8):      {batch_time:.2f}s")
# Batching usually helps on GPU but is not guaranteed to; report the ratio as measured
print(f"Sequential / batched time ratio: {sequential_time/batch_time:.1f}x")
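
A single timed run can be noisy, especially on a GPU where work is queued asynchronously. The following sketch shows one way to get steadier numbers: a warm-up call, several repeats, and an optional synchronization hook. The helper name `best_time` and its `sync` parameter are assumptions made for this example.

```python
import time

def best_time(fn, repeats=3, sync=None):
    """Return the fastest of `repeats` timed calls to fn(), after one warm-up."""
    fn()  # warm-up call, not timed
    timings = []
    for _ in range(repeats):
        if sync is not None:
            sync()  # wait for pending device work before starting the clock
        start = time.perf_counter()
        fn()
        if sync is not None:
            sync()  # wait for the timed work to finish
        timings.append(time.perf_counter() - start)
    return min(timings)

# Stand-in workload so the example runs without a model
calls = []
elapsed = best_time(lambda: calls.append(sum(range(10_000))), repeats=3)
print(f"Best of 3: {elapsed:.6f}s over {len(calls)} calls (including warm-up)")
```

In this notebook the call might look like `best_time(lambda: financial_sentiment(sample_texts[:20], batch_size=8), sync=torch.cuda.synchronize)` when running on a GPU.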

### Pipeline on GPU with Specific Device

In [None]:
# You can specify GPU device in several ways
print("Device specification options:\n")

# Method 1: device index
# pipe = pipeline("task", device=0)  # First GPU

# Method 2: device string  
# pipe = pipeline("task", device="cuda:0")  # Explicit

# Method 3: device_map for large models
# pipe = pipeline("task", device_map="auto")  # Auto-distribute

# Method 4: Stay on CPU
# pipe = pipeline("task", device=-1)  # Force CPU

print("device=0          → First GPU")
print("device='cuda:0'   → Explicit GPU selection")
print("device_map='auto' → Auto-distribute (for huge models)")
print("device=-1         → Force CPU")
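
Because the integer and string forms describe the same devices, it can help to convert between them in one place. Below is a small sketch under that assumption. `to_pipeline_index` and `choose_device` are hypothetical helpers, not part of `transformers`.

```python
def to_pipeline_index(device_spec):
    """Map 'cpu', 'cuda', or 'cuda:N' to the integer index form (-1 for CPU)."""
    spec = device_spec.strip().lower()
    if spec == "cpu":
        return -1
    if spec == "cuda":
        return 0
    prefix, _, index = spec.partition(":")
    if prefix == "cuda" and index.isdigit():
        return int(index)
    raise ValueError(f"Unrecognized device spec: {device_spec!r}")

def choose_device(cuda_available, preferred="cuda:0"):
    """Use the preferred GPU when one is available, otherwise fall back to CPU."""
    return to_pipeline_index(preferred) if cuda_available else -1

examples = {spec: to_pipeline_index(spec) for spec in ("cpu", "cuda", "cuda:1")}
print(examples)
print(choose_device(cuda_available=False))
```

In the notebook, `choose_device(torch.cuda.is_available())` would give the same value as the `device` variable defined in Part 1.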

---

## Part 4: Building a Multi-Task Demo

Let's build a unified demo that showcases all five pipelines!

In [None]:
# Clean up
del financial_sentiment
torch.cuda.empty_cache()

class MultiTaskNLP:
    """
    A multi-task NLP system using Hugging Face pipelines.
    Demonstrates all five required pipeline types.
    """
    
    def __init__(self, device=0):
        self.device = device
        self.pipelines = {}
        print("Initializing Multi-Task NLP System...")
    
    def load_pipeline(self, task, model=None):
        """Load a specific pipeline on demand."""
        if task in self.pipelines:
            return self.pipelines[task]
        
        print(f"  Loading {task} pipeline...")
        
        task_configs = {
            "sentiment": ("sentiment-analysis", "distilbert-base-uncased-finetuned-sst-2-english"),
            "ner": ("ner", "dslim/bert-base-NER"),
            "qa": ("question-answering", "distilbert-base-cased-distilled-squad"),
            "summarization": ("summarization", "sshleifer/distilbart-cnn-12-6"),
            "generation": ("text-generation", "distilgpt2")
        }
        
        task_name, default_model = task_configs.get(task, (task, None))
        model = model or default_model
        
        kwargs = {"device": self.device}
        if task == "ner":
            kwargs["aggregation_strategy"] = "simple"
        
        self.pipelines[task] = pipeline(task_name, model=model, **kwargs)
        return self.pipelines[task]
    
    def analyze_text(self, text, context=None):
        """Run all analyses on a piece of text."""
        results = {"original_text": text}
        
        # Sentiment
        pipe = self.load_pipeline("sentiment")
        sentiment_result = pipe(text)[0]
        results["sentiment"] = {
            "label": sentiment_result["label"],
            "confidence": sentiment_result["score"]
        }
        
        # NER
        pipe = self.load_pipeline("ner")
        ner_result = pipe(text)
        results["entities"] = [
            {"text": e["word"], "type": e["entity_group"], "confidence": e["score"]}
            for e in ner_result
        ]
        
        # QA (if context provided)
        if context:
            pipe = self.load_pipeline("qa")
            qa_result = pipe(question=text, context=context)
            results["qa_answer"] = {
                "answer": qa_result["answer"],
                "confidence": qa_result["score"]
            }
        
        return results
    
    def summarize(self, text, max_length=100):
        """Summarize a piece of text."""
        pipe = self.load_pipeline("summarization")
        result = pipe(text, max_length=max_length, min_length=30)
        return result[0]["summary_text"]
    
    def generate(self, prompt, max_tokens=50):
        """Generate text continuation."""
        pipe = self.load_pipeline("generation")
        result = pipe(
            prompt, 
            max_new_tokens=max_tokens,
            do_sample=True,
            temperature=0.7
        )
        return result[0]["generated_text"]
    
    def cleanup(self):
        """Free GPU memory."""
        self.pipelines.clear()
        torch.cuda.empty_cache()
        print("Cleaned up pipelines.")


# Create instance
nlp = MultiTaskNLP(device=device)

In [None]:
# Demo the multi-task system
print("=" * 70)
print("MULTI-TASK NLP DEMO")
print("=" * 70)

# Test text
test_text = "Apple CEO Tim Cook announced that the company will invest $1 billion in AI research."

print(f"\nInput: '{test_text}'\n")

# Analyze
analysis = nlp.analyze_text(test_text)

print("--- Sentiment Analysis ---")
print(f"  {analysis['sentiment']['label']} (confidence: {analysis['sentiment']['confidence']:.2%})")

print("\n--- Named Entities ---")
for entity in analysis['entities']:
    print(f"  {entity['text']}: {entity['type']} ({entity['confidence']:.2%})")

In [None]:
# Test summarization
long_text = """
The field of natural language processing has seen tremendous advances in recent years. 
Large language models, trained on vast amounts of text data, can now perform a wide 
range of tasks from translation to creative writing. Companies like OpenAI, Google, 
and Anthropic have developed increasingly capable systems. However, these advances 
come with challenges including computational costs, potential biases, and concerns 
about misuse. Researchers are working on making these models more efficient and safer.
"""

print("\n--- Summarization ---")
print(f"Original ({len(long_text.split())} words):")
print(f"  {long_text[:150]}...\n")

summary = nlp.summarize(long_text)
print(f"Summary ({len(summary.split())} words):")
print(f"  {summary}")

In [None]:
# Test generation
print("\n--- Text Generation ---")
prompt = "The future of AI is"
print(f"Prompt: '{prompt}'")

generated = nlp.generate(prompt, max_tokens=40)
print(f"Generated: {generated}")

In [None]:
# Test QA
print("\n--- Question Answering ---")
context = """The DGX Spark is NVIDIA's desktop AI supercomputer. It has 128GB of 
unified memory and is powered by the Blackwell GB10 chip. It was announced at CES 2025."""

question = "How much memory does the DGX Spark have?"
print(f"Context: {context[:80]}...")
print(f"Question: {question}")

analysis = nlp.analyze_text(question, context=context)
print(f"Answer: {analysis['qa_answer']['answer']} (confidence: {analysis['qa_answer']['confidence']:.2%})")

In [None]:
# Cleanup
nlp.cleanup()
show_memory()
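
`MultiTaskNLP` keeps every pipeline it has loaded until `cleanup()` is called. When memory is tight, one possible variation is to cap how many pipelines stay loaded and evict the least recently used one. The sketch below illustrates this with a hypothetical `PipelineCache` class and a fake loader, so it runs without downloading models.

```python
from collections import OrderedDict

class PipelineCache:
    """Keep at most `max_loaded` pipelines, evicting the least recently used."""

    def __init__(self, loader, max_loaded=2, on_evict=None):
        self.loader = loader          # callable: task name -> pipeline object
        self.max_loaded = max_loaded
        self.on_evict = on_evict      # optional callback, e.g. to free GPU cache
        self._loaded = OrderedDict()

    def get(self, task):
        if task in self._loaded:
            self._loaded.move_to_end(task)  # mark as most recently used
            return self._loaded[task]
        while len(self._loaded) >= self.max_loaded:
            oldest = next(iter(self._loaded))
            del self._loaded[oldest]        # drop the cache's reference
            if self.on_evict is not None:
                self.on_evict(oldest)
        self._loaded[task] = self.loader(task)
        return self._loaded[task]

    def loaded_tasks(self):
        return list(self._loaded)

# Fake loader that records what was loaded (stand-in for pipeline(...))
load_log, evicted = [], []

def fake_loader(task):
    load_log.append(task)
    return lambda text: f"{task} result for {text!r}"

cache = PipelineCache(fake_loader, max_loaded=2, on_evict=evicted.append)
cache.get("sentiment")
cache.get("ner")
cache.get("sentiment")   # cache hit, refreshes its position
cache.get("qa")          # evicts "ner", the least recently used
print(cache.loaded_tasks(), load_log, evicted)
```

With real pipelines, the loader could wrap the `load_pipeline` logic above, and `on_evict` could call `torch.cuda.empty_cache()`.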

---

## Try It Yourself: Customer Support Bot

Build a simple customer support analyzer that:
1. Detects if the customer is angry (sentiment)
2. Extracts product names mentioned (NER)
3. Generates a helpful response (generation)

<details>
<summary>Hint</summary>

```python
# Build the pipelines once and reuse them; creating them inside the
# function would reload every model on each call.
sentiment_pipe = pipeline("sentiment-analysis", device=device)
ner_pipe = pipeline("ner", aggregation_strategy="simple", device=device)
gen_pipe = pipeline("text-generation", device=device)

def analyze_support_ticket(ticket_text):
    # 1. Check sentiment
    sentiment = sentiment_pipe(ticket_text)[0]
    
    # 2. Extract entities
    entities = ner_pipe(ticket_text)
    
    # 3. Generate response based on sentiment
    if sentiment['label'] == 'NEGATIVE':
        prompt = "Dear valued customer, we apologize for"
    else:
        prompt = "Dear valued customer, thank you for"
    
    response = gen_pipe(prompt, max_new_tokens=50)[0]['generated_text']
    
    return sentiment, entities, response
```
</details>

In [None]:
# YOUR CODE HERE
# Build your customer support analyzer!

def analyze_support_ticket(ticket_text):
    """Analyze a customer support ticket."""
    # TODO: Implement the analyzer
    pass

# Test with sample tickets
sample_tickets = [
    "I've been waiting 3 weeks for my MacBook to arrive! This is unacceptable!",
    "Just wanted to say your customer service team was amazing. Thanks!",
    "The iPhone screen is cracked and it's only been a week since I bought it."
]

# for ticket in sample_tickets:
#     result = analyze_support_ticket(ticket)
#     print(f"Ticket: {ticket}")
#     print(f"Analysis: {result}\n")

---

## Common Mistakes

### Mistake 1: Not Specifying Device

In [None]:
# RISKY: Relying on the default device
# (depending on the transformers version and system, the model may stay on CPU)
# pipe = pipeline("sentiment-analysis")

# BETTER: Explicitly specify the device
# pipe = pipeline("sentiment-analysis", device=device)  # 0 for the first GPU, -1 for CPU

print("Always specify device for predictable performance!")

### Mistake 2: Wrong Pipeline for Task

In [None]:
# WRONG: Using text-classification pipeline for NER
# The model might load but give wrong outputs!

# CORRECT: Match pipeline type to model type
print("Pipeline-Task Matching:\n")
print("sentiment-analysis  → Text classification models")
print("ner                 → Token classification models")
print("question-answering  → QA models (extractive)")
print("summarization       → Seq2Seq models (BART, T5)")
print("text-generation     → Causal LM models (GPT-2, etc.)")
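
A rough way to catch a mismatch before loading a model is to compare the task with the architecture name listed in the model's `config.json` (its `architectures` field). The sketch below uses a hand-written suffix table as a heuristic. The table and the `looks_compatible` helper are assumptions for illustration, and they do not cover every architecture on the Hub.

```python
# Heuristic mapping from pipeline task to expected architecture-name suffixes
EXPECTED_SUFFIXES = {
    "sentiment-analysis": ("ForSequenceClassification",),
    "text-classification": ("ForSequenceClassification",),
    "ner": ("ForTokenClassification",),
    "question-answering": ("ForQuestionAnswering",),
    "summarization": ("ForConditionalGeneration",),
    "text-generation": ("ForCausalLM", "LMHeadModel"),
}

def looks_compatible(task, architecture):
    """Return True if the architecture name ends with a suffix expected for the task."""
    suffixes = EXPECTED_SUFFIXES.get(task)
    if suffixes is None:
        raise KeyError(f"No heuristic defined for task {task!r}")
    return architecture.endswith(suffixes)

checks = {
    ("ner", "BertForTokenClassification"): None,
    ("ner", "DistilBertForSequenceClassification"): None,
    ("text-generation", "GPT2LMHeadModel"): None,
    ("summarization", "BartForConditionalGeneration"): None,
}
for task, arch in checks:
    checks[(task, arch)] = looks_compatible(task, arch)
    print(f"{task:<20} {arch:<40} {checks[(task, arch)]}")
```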

### Mistake 3: Not Cleaning Up Between Pipelines

In [None]:
# WRONG: Creating many pipelines without cleanup
# p1 = pipeline("sentiment-analysis")
# p2 = pipeline("ner")
# p3 = pipeline("summarization")
# All three are in memory!

# CORRECT: Delete and clear cache between pipelines
# p1 = pipeline("sentiment-analysis")
# ... use p1 ...
# del p1
# torch.cuda.empty_cache()
# p2 = pipeline("ner")

print("Memory tip: delete pipelines and call torch.cuda.empty_cache()!")
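
The load, use, delete pattern can be wrapped in a helper so the cleanup is not forgotten. This is a minimal sketch: `run_with_pipeline` is a hypothetical name, `factory` is any zero-argument callable that builds a pipeline, and `release` is an optional callback such as `torch.cuda.empty_cache`. A fake pipeline is used here so the example runs without a model.

```python
import gc
import weakref

def run_with_pipeline(factory, work, release=None):
    """Build a pipeline, run `work(pipe)`, then drop it and free memory."""
    pipe = factory()
    try:
        return work(pipe)
    finally:
        del pipe        # drop the only reference held by this helper
        gc.collect()    # collect any reference cycles
        if release is not None:
            release()   # e.g. torch.cuda.empty_cache on a GPU machine

class FakePipe:
    """Stand-in for a real pipeline object."""
    def __call__(self, text):
        return [{"label": "POSITIVE", "score": 1.0}]

refs, released = [], []

def fake_factory():
    pipe = FakePipe()
    refs.append(weakref.ref(pipe))  # lets us check the object is gone later
    return pipe

output = run_with_pipeline(
    fake_factory,
    lambda pipe: pipe("great product"),
    release=lambda: released.append(True),
)
print(output, "still alive:", refs[0]() is not None)
```

The helper only frees memory if `work` does not keep its own reference to the pipeline.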

---

## Checkpoint

You've learned:
- ✅ How to create pipelines for instant NLP capabilities
- ✅ How to use 5 different pipeline types (generation, sentiment, NER, QA, summarization)
- ✅ How to customize pipeline behavior with parameters
- ✅ How to build multi-task NLP systems
- ✅ Best practices for device management and memory

---

## Challenge: Build a Content Moderation System

Create a content moderation pipeline that:
1. Detects toxic language (sentiment or specialized toxicity model)
2. Extracts mentioned entities (NER)
3. Summarizes the content for human review
4. Generates a moderation decision explanation

In [None]:
# YOUR CHALLENGE CODE HERE
# Hint: Look for toxicity detection models on the Hub!
# e.g., "unitary/toxic-bert"


---

## Further Reading

- [Pipeline Documentation](https://huggingface.co/docs/transformers/main_classes/pipelines)
- [Available Pipeline Tasks](https://huggingface.co/docs/transformers/task_summary)
- [Custom Pipelines](https://huggingface.co/docs/transformers/add_new_pipeline)

---

## Cleanup

In [None]:
# Final cleanup
import gc

# Collect unreachable Python objects first, then release cached GPU memory
gc.collect()
torch.cuda.empty_cache()

print("Cleanup complete!")
show_memory()

---

## Next Steps

In the next notebook, **03-dataset-processing.ipynb**, we'll learn how to load and process datasets efficiently using the Hugging Face `datasets` library - essential for fine-tuning models!

Great job completing Lab 2.4.2! You now have a powerful toolkit of NLP pipelines at your fingertips!