# 6. Zero-Shot Classification

**Estimated Time**: ~2 hours

**Prerequisites**: Notebooks 1-5 (understanding of model flexibility from text generation, basic NLP concepts)

---

## Learning Objectives

By the end of this notebook, you will be able to:

1. **Understand** how zero-shot classification works without task-specific training
2. **Explain** how Natural Language Inference (NLI) enables flexible classification
3. **Design** effective label sets for different classification tasks
4. **Handle** multi-label classification scenarios
5. **Build** a custom content tagger for real-world use cases

## Setup

Run this cell first. If you completed previous notebooks, you already have the core packages ready.

In [None]:
# Core imports
from transformers import pipeline

# Suppress warnings for cleaner output
import warnings
warnings.filterwarnings('ignore')

print("Setup complete!")

---

# Part 1: Conceptual Foundation

## What is Zero-Shot Classification?

**In plain English**: Zero-shot classification lets a model sort text into categories it was never explicitly trained on. You tell it which categories exist, and it estimates which one fits best.

**Technical definition**: Zero-shot classification uses a model trained on Natural Language Inference (NLI) to determine if a text "entails" (implies) a given label hypothesis, without requiring task-specific fine-tuning.

### Why "Zero-Shot"?

```
TRADITIONAL CLASSIFICATION:
┌─────────────────────────────────────────────────────────────┐
│  Step 1: Collect thousands of labeled examples              │
│          "This is great!" → positive                        │
│          "I hate this" → negative                           │
│          ... (thousands more) ...                           │
│                                                             │
│  Step 2: Train a model specifically for this task           │
│                                                             │
│  Step 3: Model can ONLY classify into [positive, negative]  │
│          Want new categories? Start over at Step 1!         │
└─────────────────────────────────────────────────────────────┘

ZERO-SHOT CLASSIFICATION:
┌─────────────────────────────────────────────────────────────┐
│  Step 1: Load pre-trained model (already done!)             │
│                                                             │
│  Step 2: Give it ANY labels you want                        │
│          ["positive", "negative", "neutral"]                │
│          ["sports", "politics", "entertainment"]            │
│          ["urgent", "normal", "spam"]                       │
│                                                             │
│  Step 3: Classify! Change labels anytime!                   │
└─────────────────────────────────────────────────────────────┘
```

**"Zero-shot"** = zero training examples needed for your specific task

### How It Works: Natural Language Inference (NLI)

The secret is **NLI** - a task where models learn to determine relationships between sentences:

```
NLI TASK:
Given a PREMISE and a HYPOTHESIS, determine their relationship:

┌────────────────────────────────────────────────────────────┐
│  Premise:    "A man is playing guitar on stage."           │
│  Hypothesis: "A musician is performing."                   │
│  Relationship: ENTAILMENT ✓ (premise implies hypothesis)   │
└────────────────────────────────────────────────────────────┘

┌────────────────────────────────────────────────────────────┐
│  Premise:    "A man is playing guitar on stage."           │
│  Hypothesis: "The stage is empty."                         │
│  Relationship: CONTRADICTION ✗ (premise contradicts hyp.)  │
└────────────────────────────────────────────────────────────┘

┌────────────────────────────────────────────────────────────┐
│  Premise:    "A man is playing guitar on stage."           │
│  Hypothesis: "The guitar is red."                          │
│  Relationship: NEUTRAL ○ (can't tell from premise)         │
└────────────────────────────────────────────────────────────┘
```

### Turning NLI into Classification

```
ZERO-SHOT CLASSIFICATION TRICK:

Text to classify: "Apple announces new iPhone with better camera"
Labels: ["technology", "sports", "politics"]

Convert to NLI format:
┌────────────────────────────────────────────────────────────────┐
│  Premise: "Apple announces new iPhone with better camera"      │
│                                                                │
│  Test each label as hypothesis:                                │
│                                                                │
│  Hypothesis: "This text is about technology."                  │
│  → ENTAILMENT score: 0.92 ✓ HIGH                               │
│                                                                │
│  Hypothesis: "This text is about sports."                      │
│  → ENTAILMENT score: 0.03 ✗ LOW                                │
│                                                                │
│  Hypothesis: "This text is about politics."                    │
│  → ENTAILMENT score: 0.05 ✗ LOW                                │
│                                                                │
│  Result: "technology" wins!                                    │
└────────────────────────────────────────────────────────────────┘
```
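
The trick above is a small loop: fill each label into a hypothesis, score it against the premise, and rank. The sketch below shows only that control flow. The names `rank_labels` and `replay_scorer` are hypothetical, and the scorer is not an NLI model; it simply replays the illustrative entailment scores from the diagram.

```python
from typing import Callable, List, Tuple


def rank_labels(
    premise: str,
    labels: List[str],
    entailment_score: Callable[[str, str], float],
    template: str = "This text is about {}.",
) -> List[Tuple[str, float]]:
    """Score each label's hypothesis against the premise, best first."""
    scored = [
        (label, entailment_score(premise, template.format(label)))
        for label in labels
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Stand-in for a real NLI model (assumption for illustration only):
# it replays the example entailment scores from the diagram.
DIAGRAM_SCORES = {
    "This text is about technology.": 0.92,
    "This text is about sports.": 0.03,
    "This text is about politics.": 0.05,
}


def replay_scorer(premise: str, hypothesis: str) -> float:
    return DIAGRAM_SCORES[hypothesis]


ranking = rank_labels(
    "Apple announces new iPhone with better camera",
    ["technology", "sports", "politics"],
    replay_scorer,
)
for label, score in ranking:
    print(f"{label:12s} {score:.2f}")
```

In a real setup, `entailment_score` would be backed by an NLI model, which is the work the `zero-shot-classification` pipeline does for you in Part 2.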

### Connection to Previous Notebooks

| Notebook | Model Flexibility | Training Required |
|----------|-------------------|------------------|
| 1-4 (MLM, NER, QA, Summarization) | Fixed tasks | Task-specific training |
| 5 (Text Generation) | Flexible prompts | No fine-tuning needed |
| **6 (Zero-Shot Classification)** | **Any labels** | **No training needed** |

Like text generation, zero-shot classification demonstrates the flexibility of modern language models - they can perform tasks they weren't explicitly trained for!

### The Hypothesis Template

The pipeline converts each label into a hypothesis by filling it into a template:

```
Default template: "This example is {}."

Label "technology" → "This example is technology."
Label "sports"     → "This example is sports."

Custom templates can improve accuracy:
"This text is about {}."      → Better for topics
"The sentiment is {}."        → Better for sentiment
"This email is {}."           → Better for email classification
```
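
Template filling is plain string formatting, so it is easy to preview the hypotheses a label set will produce before running a model. This is a minimal sketch: `build_hypotheses` is a hypothetical helper, not a library function, and it assumes the bare `{}` placeholder that the pipeline's `hypothesis_template` parameter uses later in this notebook.

```python
from typing import List


def build_hypotheses(labels: List[str], template: str = "This example is {}.") -> List[str]:
    """Fill each candidate label into the template's {} placeholder."""
    # A template without a placeholder would yield the same hypothesis
    # for every label, so reject it early.
    if template.format("x") == template:
        raise ValueError("template needs a {} placeholder for the label")
    return [template.format(label) for label in labels]


demo_labels = ["technology", "sports"]
default_hyps = build_hypotheses(demo_labels)
topic_hyps = build_hypotheses(demo_labels, "This text is about {}.")

print(default_hyps)
print(topic_hyps)
```

Reading the generated sentences aloud is a quick sanity check: if a hypothesis sounds ungrammatical to you, a different template or label phrasing is probably worth trying.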

### Real-World Applications

Zero-shot classification is powerful for:

- **Content Moderation**: Classify user content into categories (safe, NSFW, spam)
- **Customer Support**: Route tickets to departments without pre-labeling
- **News Categorization**: Tag articles with topics dynamically
- **Sentiment Analysis**: Classify into any sentiment scale you define
- **Intent Detection**: Understand user intent in chatbots
- **Rapid Prototyping**: Test classification ideas without collecting labeled data

### Key Terminology

| Term | Definition |
|------|------------|
| **Zero-shot** | Classification without task-specific training examples |
| **NLI** | Natural Language Inference - determining logical relationships |
| **Entailment** | When one statement logically implies another |
| **Hypothesis Template** | Pattern for converting labels into test sentences |
| **Multi-label** | When text can belong to multiple categories |
| **Candidate Labels** | The possible categories for classification |

### Check Your Understanding

Before moving on, try to answer these questions (answers at the end):

1. Why is it called "zero-shot" classification?
   - A) Because it takes zero seconds to run
   - B) Because it needs zero task-specific training examples
   - C) Because it has zero accuracy

2. What NLI relationship does zero-shot classification look for?
   - A) Contradiction
   - B) Entailment
   - C) Neutral

3. What is the purpose of the hypothesis template?
   - A) To generate new text
   - B) To convert labels into testable sentences
   - C) To train the model

4. Which is a valid use case for zero-shot classification?
   - A) Classifying text into categories you just invented
   - B) Generating text continuations
   - C) Translating between languages

---

# Part 2: Basic Implementation

## Your First Zero-Shot Classification

Let's classify some text into categories we define on the fly:

In [None]:
# Create a zero-shot classification pipeline
classifier = pipeline("zero-shot-classification")

# Text to classify
text = "The new smartphone features a revolutionary camera system with 200MP resolution."

# Define candidate labels - you can use ANY labels!
candidate_labels = ["technology", "sports", "politics", "entertainment"]

# Classify
result = classifier(text, candidate_labels)

print("Zero-Shot Classification Result:")
print("="*60)
print(f"Text: \"{text}\"")
print(f"\nScores:")
for label, score in zip(result['labels'], result['scores']):
    bar = '*' * int(score * 40)
    print(f"  {label:15s} {score:.1%} {bar}")

### Understanding the Output

The pipeline returns a dictionary with:
- `sequence`: The input text
- `labels`: Labels sorted by score (highest first)
- `scores`: One score per label, in the same order as `labels`; in the default single-label mode they sum to 1

In [None]:
# Examine the full output structure
print("Full Output Structure:")
print("="*50)
for key, value in result.items():
    if key == 'sequence':
        print(f"{key}: \"{value[:50]}...\"")
    else:
        print(f"{key}: {value}")
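
Because `labels` and `scores` are parallel lists, a couple of small helpers make the result easier to query. The sketch below works on a hand-written dictionary with the same three keys; the numbers are illustrative rather than real model output, and `top_k` and `score_of` are hypothetical names.

```python
from typing import Dict, List, Tuple

# Hand-written stand-in with the same keys as a pipeline result.
# The scores are made up for illustration.
example_result = {
    "sequence": "The new smartphone features a revolutionary camera system.",
    "labels": ["technology", "entertainment", "politics", "sports"],
    "scores": [0.91, 0.05, 0.02, 0.02],
}


def top_k(result: Dict, k: int = 1) -> List[Tuple[str, float]]:
    """Return the k best (label, score) pairs; labels arrive sorted by score."""
    return list(zip(result["labels"], result["scores"]))[:k]


def score_of(result: Dict, label: str) -> float:
    """Look up one label's score by name rather than by position."""
    return dict(zip(result["labels"], result["scores"]))[label]


print(top_k(example_result, 2))
print(score_of(example_result, "sports"))
```

Looking scores up by name matters when you compare runs with different label sets, since the position of a label changes with its rank.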

### Trying Different Label Sets

The magic of zero-shot: just change the labels!

In [None]:
# Same text, different label sets
text = "I can't believe how terrible the service was at this restaurant. Never going back!"

label_sets = {
    "Sentiment": ["positive", "negative", "neutral"],
    "Emotion": ["anger", "joy", "sadness", "surprise"],
    "Category": ["complaint", "compliment", "question", "suggestion"],
    "Urgency": ["urgent", "normal", "low priority"],
}

print(f"Text: \"{text}\"")
print("="*70)

for set_name, labels in label_sets.items():
    result = classifier(text, labels)
    top_label = result['labels'][0]
    top_score = result['scores'][0]
    
    print(f"\n[{set_name}]")
    print(f"  Labels: {labels}")
    print(f"  Prediction: {top_label} ({top_score:.1%})")

### Classifying Multiple Texts

In [None]:
# Classify multiple texts at once
texts = [
    "The Lakers dominated the fourth quarter to secure the championship.",
    "Congress passes new legislation on climate change initiatives.",
    "The movie broke box office records in its opening weekend.",
    "Scientists discover new exoplanet in habitable zone.",
    "Stock market reaches all-time high amid economic optimism.",
]

labels = ["sports", "politics", "entertainment", "science", "business"]

print("Multi-Text Classification:")
print("="*70)

for text in texts:
    result = classifier(text, labels)
    top = result['labels'][0]
    score = result['scores'][0]
    
    print(f"\n\"{text}\"")
    print(f"  → {top} ({score:.1%})")

---

## Exercise 1: News Article Classifier (Guided)

**Difficulty**: Basic | **Time**: 10-15 minutes

**Your task**: Build a news article classifier that can categorize headlines into different topics.

### Step 1: Create a news classifier function

In [None]:
def classify_news(headline, categories=None):
    """
    Classify a news headline into categories.
    
    Args:
        headline: The news headline text
        categories: List of possible categories (optional)
        
    Returns:
        dict with top category and all scores
    """
    if categories is None:
        categories = [
            "world news", 
            "business", 
            "technology", 
            "sports", 
            "entertainment",
            "science",
            "health"
        ]
    
    result = classifier(headline, categories)
    
    return {
        'headline': headline,
        'top_category': result['labels'][0],
        'confidence': result['scores'][0],
        'all_scores': dict(zip(result['labels'], result['scores'])),
    }


# Test with sample headlines
test_headlines = [
    "Tesla Stock Surges 15% After Record Q4 Deliveries",
    "WHO Declares End of Global Health Emergency",
    "SpaceX Successfully Launches 50 More Starlink Satellites",
    "Olympic Committee Announces New Host City for 2036 Games",
]

print("News Article Classification:")
print("="*70)

for headline in test_headlines:
    result = classify_news(headline)
    print(f"\n\"{result['headline']}\"")
    print(f"  Category: {result['top_category']} ({result['confidence']:.1%})")

### Step 2: Show full score breakdown

In [None]:
# Detailed analysis of one headline
headline = "Major Tech Companies Report Mixed Earnings Amid AI Investment Surge"
result = classify_news(headline)

print(f"Detailed Analysis: \"{headline}\"")
print("="*60)

# Sort by score for display
sorted_scores = sorted(result['all_scores'].items(), key=lambda x: x[1], reverse=True)

for category, score in sorted_scores:
    bar = '*' * int(score * 40)
    marker = " ← TOP" if category == result['top_category'] else ""
    print(f"  {category:15s} {score:6.1%} {bar}{marker}")

### Step 3: Try your own headlines

In [None]:
# YOUR CODE HERE
# Write your own headlines and classify them

my_headline = "Your headline here"

# Uncomment to run:
# result = classify_news(my_headline)
# print(f"Category: {result['top_category']} ({result['confidence']:.1%})")

---

# Part 3: Intermediate Exploration

## Multi-Label Classification

Sometimes text belongs to multiple categories. Zero-shot handles this too!

In [None]:
# Multi-label example: a text can belong to multiple categories
text = "The tech billionaire's foundation announced a $500 million donation to climate research."

labels = ["technology", "business", "philanthropy", "environment", "politics"]

# Single-label (default): scores are normalized across labels and sum to 1
single_result = classifier(text, labels, multi_label=False)

# Multi-label: each label is scored independently
multi_result = classifier(text, labels, multi_label=True)

print(f"Text: \"{text}\"")
print("="*70)

print("\n[SINGLE-LABEL] (scores sum to ~1)")
for label, score in zip(single_result['labels'], single_result['scores']):
    bar = '*' * int(score * 30)
    print(f"  {label:15s} {score:6.1%} {bar}")
print(f"  Sum: {sum(single_result['scores']):.2f}")

print("\n[MULTI-LABEL] (independent scores)")
for label, score in zip(multi_result['labels'], multi_result['scores']):
    bar = '*' * int(score * 30)
    applicable = " ← APPLIES" if score > 0.5 else ""
    print(f"  {label:15s} {score:6.1%} {bar}{applicable}")
print(f"  Sum: {sum(multi_result['scores']):.2f} (can be >1)")

### When to Use Multi-Label

| Scenario | Use | Example |
|----------|-----|--------|
| Mutually exclusive categories | `multi_label=False` | Sentiment: positive/negative |
| Overlapping categories | `multi_label=True` | Article topics: tech + business |
| Tag assignment | `multi_label=True` | Social media hashtag suggestions |
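
The difference between the two modes is in how entailment logits are normalized. As a rough sketch of the usual approach: single-label mode applies a softmax across the entailment logits of all labels, so the scores compete and sum to 1, while multi-label mode applies a softmax over contradiction versus entailment for each label on its own (the neutral logit is ignored). The logits below are hypothetical, and the function names are illustrative.

```python
import math
from typing import Dict, List, Tuple


def softmax(values: List[float]) -> List[float]:
    """Numerically stable softmax."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical (contradiction, entailment) logits, one pair per label.
demo_logits: Dict[str, Tuple[float, float]] = {
    "technology": (-1.0, 2.0),
    "philanthropy": (-2.0, 3.0),
    "sports": (3.0, -2.0),
}


def single_label_scores(logits: Dict[str, Tuple[float, float]]) -> Dict[str, float]:
    """Labels compete: softmax over every label's entailment logit."""
    labels = list(logits)
    probs = softmax([logits[label][1] for label in labels])
    return dict(zip(labels, probs))


def multi_label_scores(logits: Dict[str, Tuple[float, float]]) -> Dict[str, float]:
    """Labels are independent: entailment vs. contradiction per label."""
    return {
        label: softmax([contra, entail])[1]
        for label, (contra, entail) in logits.items()
    }


single = single_label_scores(demo_logits)
multi = multi_label_scores(demo_logits)

print("single:", {k: round(v, 3) for k, v in single.items()})
print("multi: ", {k: round(v, 3) for k, v in multi.items()})
```

With these numbers, two labels score above 0.5 in multi-label mode even though only one of them can win in single-label mode, which is why a threshold is the natural way to read multi-label output.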

In [None]:
# Multi-label for content tagging
post = """
Just finished my morning run along the beach. The sunrise was incredible! 
Stopped for a healthy smoothie afterward. Feeling energized for the day. 
#mondaymotivation
"""

tags = ["fitness", "nature", "food", "travel", "lifestyle", "motivation"]

result = classifier(post, tags, multi_label=True)

print("Content Tagging (Multi-Label):")
print("="*60)
print(f"Post: {post.strip()[:80]}...")
print("\nSuggested Tags:")

# Show tags above threshold
threshold = 0.3
for label, score in zip(result['labels'], result['scores']):
    if score > threshold:
        print(f"  #{label} ({score:.1%})")

### Label Phrasing Matters

The way you phrase labels significantly affects results:

In [None]:
# Label phrasing comparison
text = "I waited 45 minutes for my food and it arrived cold."

label_variations = {
    "Simple": ["positive", "negative"],
    "Descriptive": ["customer is satisfied", "customer is dissatisfied"],
    "Emotional": ["happy experience", "frustrating experience"],
    "Action-oriented": ["would recommend", "would not recommend"],
}

print(f"Text: \"{text}\"")
print("="*70)

for style, labels in label_variations.items():
    result = classifier(text, labels)
    print(f"\n[{style}]")
    for label, score in zip(result['labels'], result['scores']):
        bar = '*' * int(score * 30)
        print(f"  {label:25s} {score:.1%} {bar}")

### Using Hypothesis Templates

You can customize how labels are converted to hypotheses:

In [None]:
# Custom hypothesis templates
text = "Breaking: Earthquake magnitude 6.5 strikes coastal region, tsunami warning issued."

labels = ["natural disaster", "crime", "politics", "sports"]

# Default template
result_default = classifier(text, labels)

# Custom template with hypothesis_template parameter
result_custom = classifier(
    text, 
    labels,
    hypothesis_template="This news article is about {}."
)

print(f"Text: \"{text}\"")
print("="*70)

print("\n[Default Template]")
for label, score in zip(result_default['labels'][:3], result_default['scores'][:3]):
    print(f"  {label:20s} {score:.1%}")

print("\n[Custom Template: 'This news article is about {}.']")
for label, score in zip(result_custom['labels'][:3], result_custom['scores'][:3]):
    print(f"  {label:20s} {score:.1%}")

---

## Exercise 2: Multi-Label Scenario (Semi-guided)

**Difficulty**: Intermediate | **Time**: 15-20 minutes

**Your task**: Build a function that assigns multiple relevant tags to content and filters by a confidence threshold.

**Hints**:
1. Use `multi_label=True` for independent scoring
2. Filter results by a confidence threshold
3. Return tags sorted by relevance

In [None]:
# YOUR CODE HERE

def auto_tag_content(text, available_tags, threshold=0.4, max_tags=5):
    """
    Automatically assign relevant tags to content.
    
    Args:
        text: The content to tag
        available_tags: List of possible tags
        threshold: Minimum confidence to include a tag
        max_tags: Maximum number of tags to return
        
    Returns:
        dict with selected tags and all scores
    """
    # Classify with multi-label mode
    result = classifier(text, available_tags, multi_label=True)
    
    # Filter by threshold and limit
    selected_tags = []
    for label, score in zip(result['labels'], result['scores']):
        if score >= threshold and len(selected_tags) < max_tags:
            selected_tags.append({'tag': label, 'confidence': score})
    
    return {
        'text_preview': text[:100] + '...' if len(text) > 100 else text,
        'selected_tags': selected_tags,
        'all_scores': dict(zip(result['labels'], result['scores'])),
    }


# Test with various content
test_contents = [
    {
        'text': """Just launched my new app! It uses AI to help people learn 
                   languages through conversation practice. Available on iOS and Android.""",
        'tags': ["technology", "education", "mobile apps", "artificial intelligence", 
                 "entrepreneurship", "marketing", "social media"]
    },
    {
        'text': """Recipe: Healthy quinoa bowl with roasted vegetables and tahini dressing. 
                   Perfect for meal prep! High in protein and takes only 30 minutes.""",
        'tags': ["food", "health", "recipes", "vegetarian", "meal prep", 
                 "quick meals", "nutrition"]
    },
]

print("Auto-Tagging Content:")
print("="*70)

for content in test_contents:
    result = auto_tag_content(content['text'], content['tags'], threshold=0.35)
    
    print(f"\nContent: \"{result['text_preview']}\"")
    print("\nSelected Tags:")
    for tag_info in result['selected_tags']:
        print(f"  #{tag_info['tag']} ({tag_info['confidence']:.1%})")
    print()

In [None]:
# Visualize all scores for one piece of content
content = test_contents[0]
result = auto_tag_content(content['text'], content['tags'], threshold=0.3)

# Mark exactly the tags the function selected (respects threshold and max_tags)
selected_names = {tag_info['tag'] for tag_info in result['selected_tags']}

print("Full Score Breakdown:")
print("="*60)

sorted_scores = sorted(result['all_scores'].items(), key=lambda x: x[1], reverse=True)

for tag, score in sorted_scores:
    bar = '*' * int(score * 40)
    selected = " [SELECTED]" if tag in selected_names else ""
    print(f"  {tag:25s} {score:6.1%} {bar}{selected}")

---

# Part 4: Advanced Topics

## Comparing Different Models

Different zero-shot models have different strengths:

In [None]:
# The default model
print("Default zero-shot model:")
print(f"  {classifier.model.name_or_path}")

# You can load different models
# Popular options:
# - facebook/bart-large-mnli (default, good all-around)
# - MoritzLaurer/DeBERTa-v3-large-mnli-fever-anli-ling-wanli (higher accuracy)
# - cross-encoder/nli-deberta-v3-base (faster)

### Label Design Best Practices

The quality of labels significantly affects results:

In [None]:
# Label design examples
email_text = """
Hi Team,

The server is down and customers can't access their accounts. 
We need to fix this immediately - it's affecting thousands of users.

Please escalate to the on-call engineer ASAP.

Thanks,
Support Team
"""

# Poor labels: too vague or overlapping
poor_labels = ["important", "email", "message", "text"]

# Good labels: specific and distinct
good_labels = ["urgent technical issue", "meeting request", "general inquiry", "spam"]

# Better labels: action-oriented
best_labels = ["requires immediate action", "can wait until tomorrow", "informational only", "no response needed"]

print("Label Design Comparison:")
print("="*70)

for label_type, labels in [("Poor", poor_labels), ("Good", good_labels), ("Best", best_labels)]:
    result = classifier(email_text, labels)
    print(f"\n[{label_type} Labels]")
    print(f"  Labels: {labels}")
    print(f"  Top: {result['labels'][0]} ({result['scores'][0]:.1%})")

### Label Design Guidelines

| Guideline | Bad Example | Good Example |
|-----------|-------------|-------------|
| Be specific | "good", "bad" | "positive review", "negative review" |
| Avoid overlap | "urgent", "important" | "urgent action needed", "routine matter" |
| Use natural language | "cat1", "cat2" | "customer complaint", "product inquiry" |
| Match the domain | "happy", "sad" (for business) | "satisfied customer", "dissatisfied customer" |
| Be exhaustive | 2 labels for complex tasks | Cover all possible categories |

### Handling Edge Cases

In [None]:
# Edge case: Ambiguous text
ambiguous_texts = [
    "It was okay.",  # Neutral/unclear sentiment
    "Apple.",  # Missing context
    "🎉🎊🥳",  # Only emojis
    "The bank was steep.",  # Word sense ambiguity
]

labels = ["positive", "negative", "neutral"]

print("Handling Ambiguous Text:")
print("="*60)

for text in ambiguous_texts:
    result = classifier(text, labels)
    
    # Check if classification is confident
    top_score = result['scores'][0]
    confidence_level = "High" if top_score > 0.7 else "Medium" if top_score > 0.5 else "Low"
    
    print(f"\n\"{text}\"")
    print(f"  Prediction: {result['labels'][0]} ({top_score:.1%})")
    print(f"  Confidence: {confidence_level}")
    if confidence_level == "Low":
        print(f"  ⚠️ Warning: Low confidence - consider manual review")
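
An absolute threshold on the top score can miss near-ties: a prediction at 52% looks acceptable until you notice the runner-up sits at 45%. One possible refinement, sketched here with hypothetical thresholds and a hypothetical `needs_review` helper, is to also check the margin between the top two scores.

```python
from typing import List


def needs_review(scores: List[float], min_top: float = 0.5, min_margin: float = 0.15) -> bool:
    """Flag a prediction when the top score is low or barely beats the runner-up."""
    ranked = sorted(scores, reverse=True)
    top = ranked[0]
    runner_up = ranked[1] if len(ranked) > 1 else 0.0
    return top < min_top or (top - runner_up) < min_margin


# Illustrative score lists, not real model output.
clear_winner = needs_review([0.90, 0.06, 0.04])
near_tie = needs_review([0.52, 0.45, 0.03])
low_top = needs_review([0.40, 0.35, 0.25])

print(clear_winner, near_tie, low_top)
```

The default values of `min_top` and `min_margin` are placeholders; suitable values depend on the label set and are best chosen by inspecting a sample of real predictions.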

### Limitations of Zero-Shot Classification

| Limitation | Description | Mitigation |
|------------|-------------|------------|
| **Accuracy** | May be less accurate than fine-tuned models | Use for prototyping, fine-tune for production |
| **Speed** | Slower than traditional classifiers, since the model runs once per text-label pair | Batch processing, model optimization, fewer candidate labels |
| **Label sensitivity** | Results depend on label phrasing | Test multiple phrasings, use descriptive labels |
| **Complex reasoning** | Struggles with nuanced distinctions | Use more specific labels, combine with other methods |
| **Domain-specific** | General models may miss domain nuances | Consider domain-specific fine-tuning |
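
The speed limitation is mostly arithmetic: every candidate label becomes its own premise-hypothesis pair, so the number of model passes grows with texts times labels. The helper below is a hypothetical back-of-the-envelope estimate; it assumes pairs are packed into fixed-size batches and ignores sequence length and hardware.

```python
import math
from typing import Dict


def estimate_nli_passes(n_texts: int, n_labels: int, batch_size: int = 1) -> Dict[str, int]:
    """Estimate the NLI workload for zero-shot classification."""
    pairs = n_texts * n_labels  # one premise-hypothesis pair per text-label combination
    batches = math.ceil(pairs / batch_size)
    return {"pairs": pairs, "batches": batches}


small_job = estimate_nli_passes(n_texts=5, n_labels=5)
large_job = estimate_nli_passes(n_texts=10_000, n_labels=20, batch_size=32)

print(small_job)
print(large_job)
```

A conventional fine-tuned classifier needs one pass per text regardless of the number of classes, which is why trimming the label set is often the cheapest speed-up for zero-shot work.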

---

## Exercise 3: Label Phrasing Comparison (Independent)

**Difficulty**: Advanced | **Time**: 15-20 minutes

**Your task**: Build a class that tests different label phrasings and finds the most effective ones.

**Requirements**:
1. Test multiple phrasings for the same concept
2. Compare results across different test texts
3. Recommend the best phrasing based on consistency and confidence

In [None]:
# YOUR CODE HERE

class LabelOptimizer:
    """
    Tests different label phrasings to find the most effective ones.
    """
    
    def __init__(self):
        self.classifier = pipeline("zero-shot-classification")
    
    def compare_phrasings(self, test_texts, label_variations, expected_labels=None):
        """
        Compare different label phrasings across test texts.
        
        Args:
            test_texts: List of texts to classify
            label_variations: Dict of {concept: [phrasing1, phrasing2, ...]}
            expected_labels: Optional dict of {text: expected_concept}
            
        Returns:
            Comparison results with recommendations
        """
        results = []
        
        # Test each phrasing
        for concept, phrasings in label_variations.items():
            for phrasing in phrasings:
                phrasing_results = {
                    'concept': concept,
                    'phrasing': phrasing,
                    'scores': [],
                    'correct': 0,
                    'total': 0,
                }
                
                # Baseline label set: the first phrasing of every concept
                all_labels = [variants[0] for variants in label_variations.values()]
                # Swap in the phrasing under test for the current concept
                concept_idx = list(label_variations.keys()).index(concept)
                all_labels[concept_idx] = phrasing
                
                for text in test_texts:
                    result = self.classifier(text, all_labels)
                    # Find score for our phrasing
                    score_idx = result['labels'].index(phrasing)
                    score = result['scores'][score_idx]
                    phrasing_results['scores'].append(score)
                    
                    # Check accuracy on texts whose expected concept is this one
                    if expected_labels and expected_labels.get(text) == concept:
                        phrasing_results['total'] += 1
                        if result['labels'][0] == phrasing:
                            phrasing_results['correct'] += 1
                
                # Calculate statistics
                phrasing_results['avg_score'] = sum(phrasing_results['scores']) / len(phrasing_results['scores'])
                phrasing_results['consistency'] = 1 - (max(phrasing_results['scores']) - min(phrasing_results['scores']))
                
                results.append(phrasing_results)
        
        return results
    
    def get_recommendations(self, comparison_results):
        """
        Recommend the best phrasing for each concept.
        """
        recommendations = {}
        
        # Group by concept
        concepts = set(r['concept'] for r in comparison_results)
        
        for concept in concepts:
            concept_results = [r for r in comparison_results if r['concept'] == concept]
            # Pick the phrasing with the highest average score
            best = max(concept_results, key=lambda x: x['avg_score'])
            recommendations[concept] = {
                'best_phrasing': best['phrasing'],
                'avg_score': best['avg_score'],
                'consistency': best['consistency'],
            }
        
        return recommendations


# Create optimizer
optimizer = LabelOptimizer()

# Test texts
test_texts = [
    "This product exceeded my expectations! Highly recommend.",
    "Terrible experience. The item broke after one day.",
    "It works fine. Nothing special but gets the job done.",
    "Absolutely love it! Best purchase I've made this year.",
    "Very disappointed. Would not buy again.",
]

# Different phrasings for the same concepts
label_variations = {
    'positive': ["positive", "satisfied", "happy customer", "positive review"],
    'negative': ["negative", "dissatisfied", "unhappy customer", "negative review"],
    'neutral': ["neutral", "mixed feelings", "average opinion", "neutral review"],
}

# Expected labels for accuracy checking
expected = {
    "This product exceeded my expectations! Highly recommend.": "positive",
    "Terrible experience. The item broke after one day.": "negative",
    "It works fine. Nothing special but gets the job done.": "neutral",
    "Absolutely love it! Best purchase I've made this year.": "positive",
    "Very disappointed. Would not buy again.": "negative",
}

print("Testing label phrasings...")
results = optimizer.compare_phrasings(test_texts, label_variations, expected)
recommendations = optimizer.get_recommendations(results)

print("\nLabel Phrasing Recommendations:")
print("="*60)

for concept, rec in recommendations.items():
    print(f"\n[{concept.upper()}]")
    print(f"  Best phrasing: \"{rec['best_phrasing']}\"")
    print(f"  Average score: {rec['avg_score']:.1%}")
    print(f"  Consistency: {rec['consistency']:.1%}")

In [None]:
# Show detailed comparison for one concept
print("\nDetailed Comparison for 'positive' concept:")
print("="*60)

positive_results = [r for r in results if r['concept'] == 'positive']
positive_results.sort(key=lambda x: x['avg_score'], reverse=True)

for r in positive_results:
    bar = '*' * int(r['avg_score'] * 40)
    print(f"  \"{r['phrasing']:20s}\" avg: {r['avg_score']:6.1%} {bar}")
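
Each phrasing is scored on every test text, including texts that belong to other concepts, so a high average alone does not show that a phrasing discriminates well. One alternative measure, which is not part of the exercise above, is the gap between the mean score on texts where the concept is expected and the mean score on all other texts. The sketch below uses hypothetical scores, and `separation` is an illustrative name.

```python
from statistics import mean
from typing import Dict, List


def separation(scores: List[float], texts: List[str],
               expected: Dict[str, str], concept: str) -> float:
    """Mean score on matching texts minus mean score on the others."""
    on_target = [s for s, t in zip(scores, texts) if expected.get(t) == concept]
    off_target = [s for s, t in zip(scores, texts) if expected.get(t) != concept]
    if not on_target or not off_target:
        raise ValueError("need at least one matching and one non-matching text")
    return mean(on_target) - mean(off_target)


# Hypothetical data: three texts and the scores two phrasings received on them.
demo_texts = ["great", "awful", "fine"]
demo_expected = {"great": "positive", "awful": "negative", "fine": "neutral"}
scores_phrasing_a = [0.90, 0.05, 0.30]  # high only where "positive" is expected
scores_phrasing_b = [0.60, 0.40, 0.50]  # similar score everywhere

sep_a = separation(scores_phrasing_a, demo_texts, demo_expected, "positive")
sep_b = separation(scores_phrasing_b, demo_texts, demo_expected, "positive")

print(f"phrasing A separation: {sep_a:.3f}")
print(f"phrasing B separation: {sep_b:.3f}")
```

Here phrasing B has the narrower score range, yet phrasing A separates matching from non-matching texts far more clearly, which is the property a classifier label actually needs.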

---

# Part 5: Mini-Project

## Project: Custom Content Tagger

**Scenario**: You're building a social media management tool that automatically tags posts with relevant categories for organization and analytics.

**Your goal**: Build a `CustomContentTagger` class that:
1. Supports user-defined tag categories
2. Handles multi-label tagging
3. Provides confidence scores and filtering
4. Suggests hashtags based on classifications

In [None]:
# MINI-PROJECT: Custom Content Tagger
# ====================================

class CustomContentTagger:
    """
    Automatically tags social media content with user-defined categories.
    """
    
    # Default category presets
    PRESETS = {
        'social_media': {
            'categories': [
                'lifestyle', 'food', 'travel', 'fitness', 'fashion',
                'technology', 'business', 'education', 'entertainment', 'news'
            ],
            'hashtag_map': {
                'lifestyle': ['#lifestyle', '#dailylife', '#life'],
                'food': ['#foodie', '#food', '#yummy'],
                'travel': ['#travel', '#wanderlust', '#explore'],
                'fitness': ['#fitness', '#workout', '#health'],
                'fashion': ['#fashion', '#style', '#ootd'],
                'technology': ['#tech', '#innovation', '#digital'],
                'business': ['#business', '#entrepreneur', '#success'],
                'education': ['#learning', '#education', '#knowledge'],
                'entertainment': ['#entertainment', '#fun', '#music'],
                'news': ['#news', '#breaking', '#current'],
            }
        },
        'customer_support': {
            'categories': [
                'complaint', 'question', 'feedback', 'compliment',
                'bug report', 'feature request', 'billing issue'
            ],
            'priority_map': {
                'complaint': 'high',
                'bug report': 'high',
                'billing issue': 'high',
                'question': 'medium',
                'feature request': 'low',
                'feedback': 'low',
                'compliment': 'low',
            }
        }
    }
    
    def __init__(self, preset='social_media', custom_categories=None):
        """
        Initialize the content tagger.
        
        Args:
            preset: Use a preset configuration ('social_media', 'customer_support')
            custom_categories: Override with custom categories
        """
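        # No model is specified, so the pipeline loads its default NLI checkpoint;
        # pass model=... to pin a specific one.
        self.classifier = pipeline("zero-shot-classification")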
        
        if custom_categories:
            self.categories = custom_categories
            self.preset_config = None
        elif preset in self.PRESETS:
            self.categories = self.PRESETS[preset]['categories']
            self.preset_config = self.PRESETS[preset]
        else:
            raise ValueError(f"Unknown preset: {preset}")
    
    def tag(self, content, threshold=0.3, max_tags=3, multi_label=True):
        """
        Tag content with relevant categories.
        
        Args:
            content: The text content to tag
            threshold: Minimum confidence for a tag
            max_tags: Maximum number of tags to assign
            multi_label: Allow multiple tags
            
        Returns:
            dict with 'content_preview', 'tags', 'hashtags', 'priority',
            and 'all_scores' keys
        """
        result = self.classifier(
            content, 
            self.categories, 
            multi_label=multi_label
        )
        
        # Filter and limit tags
        selected_tags = []
        for label, score in zip(result['labels'], result['scores']):
            if score >= threshold and len(selected_tags) < max_tags:
                selected_tags.append({'tag': label, 'confidence': score})
        
        # Generate hashtag suggestions if available
        hashtags = []
        if self.preset_config and 'hashtag_map' in self.preset_config:
            for tag_info in selected_tags:
                if tag_info['tag'] in self.preset_config['hashtag_map']:
                    hashtags.extend(self.preset_config['hashtag_map'][tag_info['tag']][:2])
        
        # Get priority if available: keep the highest priority among the selected tags
        priority = None
        if self.preset_config and 'priority_map' in self.preset_config:
            priority_order = ['low', 'medium', 'high']
            for tag_info in selected_tags:
                tag_priority = self.preset_config['priority_map'].get(tag_info['tag'])
                if tag_priority is None:
                    continue
                if priority is None or \
                   priority_order.index(tag_priority) > priority_order.index(priority):
                    priority = tag_priority
        
        return {
            'content_preview': (content[:100] + '...') if len(content) > 100 else content,
            'tags': selected_tags,
            # dict.fromkeys de-duplicates while keeping first-seen order
            'hashtags': list(dict.fromkeys(hashtags))[:5],
            'priority': priority,
            'all_scores': dict(zip(result['labels'], result['scores'])),
        }
    
    def batch_tag(self, contents, **kwargs):
        """
        Tag multiple pieces of content.
        """
        return [self.tag(content, **kwargs) for content in contents]
    
    def format_result(self, result):
        """
        Format a tag result for display.
        """
        lines = []
        lines.append(f"Content: \"{result['content_preview']}\"")
        lines.append("")
        lines.append("Tags:")
        for tag_info in result['tags']:
            lines.append(f"  - {tag_info['tag']} ({tag_info['confidence']:.1%})")
        
        if result['hashtags']:
            lines.append("")
            lines.append(f"Suggested Hashtags: {' '.join(result['hashtags'])}")
        
        if result['priority']:
            lines.append(f"Priority: {result['priority'].upper()}")
        
        return '\n'.join(lines)


# Create a social media tagger
tagger = CustomContentTagger(preset='social_media')

print("Custom Content Tagger - Social Media Mode")
print(f"Categories: {tagger.categories}")

In [None]:
# Test with social media posts
posts = [
    "Just got back from an amazing week in Bali! The beaches were incredible and the food was even better. Highly recommend the local warung restaurants! 🌴🍜",

    "Completed my first marathon today! 26.2 miles of pure determination. Training for 6 months was worth every early morning run. 🏃‍♀️💪",

    "Excited to announce that our startup just closed a $5M Series A! Thanks to our incredible team and investors who believed in our vision. 🚀",

    "New recipe alert! Made the most delicious vegan pasta with homemade cashew cream sauce. Simple ingredients, amazing taste. Recipe in comments! 🍝",
]

print("\nTagging Social Media Posts:")
print("="*70)

for post in posts:
    result = tagger.tag(post, threshold=0.25, max_tags=3)
    print("\n" + "-"*60)
    print(tagger.format_result(result))

In [None]:
# Try customer support mode
support_tagger = CustomContentTagger(preset='customer_support')

support_messages = [
    "I've been charged twice for my subscription this month. Please fix this immediately.",
    
    "The app keeps crashing whenever I try to upload photos. Using iPhone 14 with latest iOS.",
    
    "Would be great if you could add dark mode to the app. It would make late night browsing much easier!",
    
    "Your customer service team was incredibly helpful! Shoutout to Sarah who resolved my issue in minutes.",
]

print("Customer Support Message Classification:")
print("="*70)

for message in support_messages:
    result = support_tagger.tag(message, threshold=0.3, max_tags=2)
    print("\n" + "-"*60)
    print(support_tagger.format_result(result))

In [None]:
# Try with custom categories
custom_tagger = CustomContentTagger(
    custom_categories=[
        "product review",
        "how-to tutorial",
        "personal story",
        "promotional content",
        "opinion piece"
    ]
)

custom_content = [
    "After using this laptop for 3 months, here's my honest opinion: the battery life is incredible but the keyboard could be better.",
    
    "Step 1: Open the settings menu. Step 2: Click on Privacy. Step 3: Toggle off location services.",
    
    "Limited time offer! Use code SAVE50 for 50% off all products. Don't miss out!",
]

print("Custom Category Classification:")
print("="*70)

for content in custom_content:
    result = custom_tagger.tag(content, threshold=0.3)
    print(f"\n\"{content[:70]}...\"")
    for tag_info in result['tags']:
        print(f"  → {tag_info['tag']} ({tag_info['confidence']:.1%})")

In [None]:
# Try your own content and categories
# Uncomment and modify:

# my_tagger = CustomContentTagger(custom_categories=["your", "categories", "here"])
# my_result = my_tagger.tag("Your content here")
# print(my_tagger.format_result(my_result))

### Extension Ideas

If you want to extend this project further:

1. **Sentiment overlay**: Add sentiment analysis to each tagged post
2. **Trending detection**: Track tag frequency over time to identify trends
3. **Auto-routing**: Route content to different teams based on tags
4. **Custom templates**: Allow users to specify hypothesis templates
5. **Hierarchical tagging**: Support parent/child category relationships
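
As one possible starting point for the auto-routing idea, here is a minimal sketch that maps a tag result to a team. The routing table, the team names, and the `route_result` helper are all hypothetical; the only thing taken from the project is the shape of the dictionary that `tag()` returns. The example uses a hand-written result, so it runs without loading a model.

```python
# Hypothetical routing table: tag -> team name. Both the teams and the
# fallback queue are illustrative assumptions, not part of the notebook.
ROUTES = {
    "billing issue": "finance",
    "bug report": "engineering",
    "feature request": "product",
    "complaint": "support-escalations",
}
DEFAULT_TEAM = "general-support"


def route_result(result, routes=ROUTES, default=DEFAULT_TEAM):
    """Pick a team from the highest-confidence tag that has a route."""
    # Sort explicitly rather than relying on the order the tags arrived in.
    ranked = sorted(result["tags"], key=lambda t: t["confidence"], reverse=True)
    for tag_info in ranked:
        team = routes.get(tag_info["tag"])
        if team is not None:
            return team
    return default


# Hand-written result in the same shape tag() returns (no model call needed).
example = {
    "tags": [
        {"tag": "complaint", "confidence": 0.81},
        {"tag": "billing issue", "confidence": 0.93},
    ],
    "priority": "high",
}
print(route_result(example))  # finance
```

Routing on the single best routable tag keeps the behavior easy to reason about; you could instead combine it with the `priority` field if high-priority items should jump the queue.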

---

# Part 6: Wrap-Up

## Key Takeaways

1. **Zero-shot classification** lets you classify text without task-specific training

2. **NLI-based approach** converts classification into entailment testing:
   - Text becomes the premise
   - Labels become hypotheses
   - In single-label mode, the label with the highest entailment score wins

3. **Label design matters**:
   - Use specific, descriptive labels
   - Avoid overlapping categories
   - Test multiple phrasings

4. **Multi-label mode** (`multi_label=True`):
   - Scores each label independently
   - Use when categories can overlap
   - Set appropriate thresholds

5. **Hypothesis templates** can improve accuracy for specific domains
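
To make takeaways 2 and 4 concrete, here is a minimal sketch of how entailment logits are typically turned into scores in the two modes. The logits below are invented for illustration, and the helper names (`single_label_scores`, `multi_label_scores`) are hypothetical. In a real run the logits come from the NLI model, and the pipeline's internals may differ in detail.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def single_label_scores(entailment_logits):
    """Softmax over the entailment logits of all labels: scores sum to 1."""
    return softmax(entailment_logits)


def multi_label_scores(logit_pairs):
    """Per label, softmax over (contradiction, entailment); keep entailment."""
    return [softmax([contra, entail])[1] for contra, entail in logit_pairs]


# Made-up logits for three labels, as (contradiction, entailment) pairs.
logits = {
    "travel": (-2.0, 3.0),
    "food": (-1.0, 2.5),
    "business": (2.0, -1.5),
}

single = single_label_scores([entail for _, entail in logits.values()])
multi = multi_label_scores(list(logits.values()))

for label, s, m in zip(logits, single, multi):
    print(f"{label:10s} single={s:.2f}  multi={m:.2f}")
```

In single-label mode the labels compete, so two good matches have to share the probability mass. In multi-label mode each label is judged on its own, which is why both "travel" and "food" can score high at the same time and why a threshold is needed to decide which tags to keep.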

## Common Mistakes to Avoid

| Mistake | Why It's a Problem |
|---------|-------------------|
| Vague labels like "good" or "bad" | They give the model too little context to separate the labels |
| Too many overlapping categories | Similar labels compete for the same text, so scores become unreliable |
| Ignoring confidence scores | The top label is returned even when its score is low, so it may be wrong |
| Using single-label for overlapping concepts | Scores are forced to sum to 1, so valid secondary categories are missed |
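
One way to avoid the "ignoring confidence scores" mistake in single-label mode is to abstain when the top prediction is weak or barely ahead of the runner-up. The sketch below assumes inputs in the same shape as the pipeline's `labels` and `scores` lists (sorted highest first). The function name and both cutoffs are hypothetical and should be tuned on your own data.

```python
def top_label_or_abstain(labels, scores, min_score=0.5, min_margin=0.1):
    """Return the top label, or None when the prediction looks too uncertain.

    labels/scores are parallel lists sorted by score, highest first.
    The default cutoffs are illustrative assumptions, not recommended values.
    """
    if not labels:
        return None
    top_score = scores[0]
    runner_up = scores[1] if len(scores) > 1 else 0.0
    # Abstain if the best score is low or too close to the second best.
    if top_score < min_score or (top_score - runner_up) < min_margin:
        return None
    return labels[0]


print(top_label_or_abstain(["complaint", "question"], [0.82, 0.11]))  # complaint
print(top_label_or_abstain(["complaint", "question"], [0.46, 0.41]))  # None
```

Items that come back as `None` can be sent to a human reviewer or to a fallback category instead of being silently mislabeled.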

## What's Next?

In **Notebook 7: Translation**, you'll learn:
- How encoder-decoder models handle translation
- Working with multiple language pairs
- Evaluating translation quality

Translation uses an encoder-decoder architecture similar to the one used for summarization, but applies it to cross-lingual tasks.

---

## Solutions

### Check Your Understanding (Quiz Answers)

1. **B) Because it needs zero task-specific training examples** - You don't need labeled data for your specific task
2. **B) Entailment** - The model checks if the text "entails" (implies) each label
3. **B) To convert labels into testable sentences** - Labels become hypotheses for NLI testing
4. **A) Classifying text into categories you just invented** - This is the power of zero-shot classification

### Exercise 2: Multi-Label Tagging (Key Insights)

In [None]:
# Key insights from multi-label tagging:

# 1. Use multi_label=True when:
#    - Categories can overlap (e.g., "tech" AND "business")
#    - You want to assign multiple tags
#    - Content can belong to several categories

# 2. Set appropriate thresholds:
#    - 0.5+ for high confidence only
#    - 0.3-0.5 for balanced coverage
#    - 0.2-0.3 for broad tagging

# 3. Limit max_tags to avoid noise:
#    - 3-5 tags usually sufficient
#    - Too many tags reduce signal

threshold_guide = {
    'strict': {'threshold': 0.6, 'use_case': 'High-stakes classification'},
    'balanced': {'threshold': 0.4, 'use_case': 'General content tagging'},
    'broad': {'threshold': 0.25, 'use_case': 'Exploratory tagging, suggestions'},
}

print("Threshold Selection Guide:")
print("="*60)
for level, config in threshold_guide.items():
    print(f"  {level:10s}: threshold={config['threshold']} - {config['use_case']}")

---

## Additional Resources

- [Hugging Face Zero-Shot Classification](https://huggingface.co/tasks/zero-shot-classification)
- [BART Paper](https://arxiv.org/abs/1910.13461) - Base model for many zero-shot classifiers
- [Natural Language Inference Explained](https://nlp.stanford.edu/projects/snli/) - Stanford NLI dataset
- [Zero-Shot Text Classification Blog](https://joeddav.github.io/blog/2020/05/29/ZSL.html) - In-depth tutorial
- [Multi-NLI Dataset](https://cims.nyu.edu/~sbowman/multinli/) - Training data for NLI models