# Week 5: Exploring Hugging Face Models
## The App Store for AI

**Today's Goals:**
1. Navigate the Hugging Face Model Hub
2. Understand different AI tasks (text, image, audio)
3. Try 5+ different model types
4. Learn to find models that work on free Colab

---

## Part 1: The Hugging Face Model Hub

**Hugging Face** hosts over 500,000 AI models. Think of it as an **app store** for AI models:
- Each model is trained to do a specific task
- Most are free to use!

**Go to:** [huggingface.co/models](https://huggingface.co/models)

### How to Find Models:
1. **By Task** - What do you want to do? (classify text, generate images, etc.)
2. **By Size** - Smaller models work better on free Colab
3. **By Downloads** - Popular models are usually reliable

## Setup: Install Libraries

Run this first:

In [None]:
# Install required libraries
!pip install transformers -q
!pip install sentencepiece -q  # Needed for some models

print("Libraries installed!")

---
## Part 2: The Pipeline Magic

The `pipeline()` function makes using models super easy:

```python
from transformers import pipeline

# Just tell it what task you want!
my_model = pipeline("task-name")
result = my_model("your input")
```

Let's explore different tasks!

---
## Task 1: Sentiment Analysis (Review)

We did this in Week 2: the model decides whether a piece of text is positive or negative.

In [None]:
from transformers import pipeline

# Create sentiment analyzer
sentiment = pipeline("sentiment-analysis")

# Try it
texts = [
    "I love this course!",
    "This is so frustrating.",
    "The weather is nice today."
]

for text in texts:
    result = sentiment(text)[0]
    print(f"{text}")
    print(f"  → {result['label']} ({result['score']:.1%})\n")
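
The default sentiment model (assuming it is the usual two-label one) has to pick POSITIVE or NEGATIVE even for neutral sentences like the weather example. One possible way to handle that is to treat low scores as "uncertain". The sketch below is only an illustration: `label_with_uncertainty` and the `min_score` cutoff of 0.9 are made-up names and values, not part of the `transformers` library.

```python
def label_with_uncertainty(result, min_score=0.9):
    """Return the model's label, or 'UNCERTAIN' when its score is below min_score."""
    if result["score"] < min_score:
        return "UNCERTAIN"
    return result["label"]

# Hand-written results in the same shape as sentiment(text)[0].
# The scores are invented for illustration, not real model output.
examples = [
    {"label": "POSITIVE", "score": 0.99},
    {"label": "NEGATIVE", "score": 0.62},
]

for example in examples:
    print(label_with_uncertainty(example))
```

With a real pipeline you would call it as `label_with_uncertainty(sentiment(text)[0])`. Try different cutoffs and see which sentences flip to UNCERTAIN.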

---
## Task 2: Text Classification

Classify text into categories. We can even define our own categories!

In [None]:
# Zero-shot classification - define your own categories!
classifier = pipeline("zero-shot-classification")

text = "I need to finish my homework before the game tonight."
categories = ["school", "sports", "entertainment", "work"]

result = classifier(text, categories)

print(f"Text: {text}\n")
print("Categories:")
for label, score in zip(result['labels'], result['scores']):
    print(f"  {label}: {score:.1%}")
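
The classifier returns a dictionary with a `labels` list and a matching `scores` list. Two small helpers can make that easier to work with. This is a sketch: `scores_by_label` and `labels_above` are hypothetical helper names, and the sample result is hand-written, not real model output.

```python
def scores_by_label(result):
    """Turn a zero-shot result into a {label: score} dictionary."""
    return dict(zip(result["labels"], result["scores"]))

def labels_above(result, threshold=0.5):
    """Return every label whose score is at least `threshold`."""
    return [
        label
        for label, score in zip(result["labels"], result["scores"])
        if score >= threshold
    ]

# Hand-written result in the same shape the classifier returns.
# The scores are invented for illustration only.
sample = {"labels": ["school", "sports", "work"], "scores": [0.6, 0.3, 0.1]}

print(scores_by_label(sample)["sports"])
print(labels_above(sample, threshold=0.25))
```

By default the scores are shared out across your categories. If a text could belong to several categories at once, you can call `classifier(text, categories, multi_label=True)` so each category is scored on its own, and then `labels_above` may return more than one label.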

### Try It Yourself!

Change the text and categories to classify something interesting:

In [None]:
# Your turn! Change the text and categories
my_text = "The new iPhone has an amazing camera."
my_categories = ["technology", "fashion", "food", "travel"]

result = classifier(my_text, my_categories)

print(f"Text: {my_text}\n")
for label, score in zip(result['labels'], result['scores']):
    print(f"  {label}: {score:.1%}")

---
## Task 3: Question Answering

Give the model some text, then ask questions about it!

In [None]:
# Question answering
qa = pipeline("question-answering")

# Give it some context
context = """
The Eiffel Tower is a famous landmark in Paris, France. 
It was built in 1889 and is 330 meters tall. 
The tower was designed by Gustave Eiffel and took about 2 years to build.
Every year, about 7 million people visit the tower.
"""

# Ask questions
questions = [
    "Where is the Eiffel Tower?",
    "When was it built?",
    "How tall is it?",
    "Who designed it?"
]

print("Context: (Eiffel Tower facts)\n")
for q in questions:
    result = qa(question=q, context=context)
    print(f"Q: {q}")
    print(f"A: {result['answer']} (confidence: {result['score']:.1%})\n")
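
Besides `answer` and `score`, the question-answering result also includes `start` and `end`, the character positions of the answer inside the context. That means the model is copying a span of your text rather than writing a new answer. The sketch below marks that span; `highlight_answer` is a hypothetical helper, and the sample result is built by hand instead of by the model.

```python
def highlight_answer(context, result):
    """Wrap the answer span in ** ** using the start/end character positions."""
    start, end = result["start"], result["end"]
    return context[:start] + "**" + context[start:end] + "**" + context[end:]

# Build a result by hand in the same shape the pipeline returns
# (a real result would also contain a 'score').
sample_context = "The Eiffel Tower was designed by Gustave Eiffel."
answer = "Gustave Eiffel"
start = sample_context.find(answer)
sample_result = {"answer": answer, "start": start, "end": start + len(answer)}

print(highlight_answer(sample_context, sample_result))
```

With the real pipeline you could try `highlight_answer(context, qa(question=q, context=context))`.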

### Try It Yourself!

Write your own context (maybe about a topic you know well) and ask questions:

In [None]:
# Your turn!
my_context = """
Write some facts about something here.
Maybe your favorite game, movie, or sport.
Include specific details like names, dates, and numbers.
"""

my_question = "Your question here?"

result = qa(question=my_question, context=my_context)
print(f"Q: {my_question}")
print(f"A: {result['answer']}")

---
## Task 4: Summarization

Make long text shorter while keeping the main points!

In [None]:
# Summarization (using a smaller model that works on free Colab)
summarizer = pipeline("summarization", model="sshleifer/distilbart-cnn-12-6")

long_text = """
Artificial intelligence has made remarkable progress in recent years, transforming 
numerous industries and aspects of daily life. Machine learning algorithms now power 
everything from recommendation systems on streaming platforms to medical diagnosis tools. 
Self-driving cars use AI to navigate roads, while virtual assistants like Siri and Alexa 
help millions of people with daily tasks. In education, AI tutoring systems can adapt to 
individual student needs, providing personalized learning experiences. However, these 
advances also raise important questions about privacy, job displacement, and the need 
for ethical guidelines in AI development. As AI continues to evolve, society must 
carefully consider both its benefits and potential risks.
"""

# max_length and min_length are measured in tokens (word pieces), not words
summary = summarizer(long_text, max_length=50, min_length=20)

print("Original text:")
print(long_text[:200] + "...\n")
print("Summary:")
print(summary[0]['summary_text'])
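
Summarization models can only read a limited amount of text at once, so a very long article may need to be split up first. The sketch below shows one rough way to do that by counting words. `split_into_chunks` is a hypothetical helper, and the default of 300 words is a cautious guess rather than an official limit for this model.

```python
def split_into_chunks(text, max_words=300):
    """Split text into pieces of at most `max_words` words each."""
    words = text.split()
    return [
        " ".join(words[i:i + max_words])
        for i in range(0, len(words), max_words)
    ]

# Tiny demonstration with a very small chunk size
print(split_into_chunks("one two three four five", max_words=2))

# With the summarizer from the cell above, you could then summarize each piece:
# summaries = [
#     summarizer(chunk, max_length=50, min_length=20)[0]["summary_text"]
#     for chunk in split_into_chunks(very_long_text)
# ]
```

Splitting on words can cut a sentence in half, so the summaries of each piece may read a little oddly. Splitting on paragraphs is a possible improvement to try.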

---
## Task 5: Translation

Translate text between languages!

In [None]:
# English to French translation
translator = pipeline("translation_en_to_fr")

english_texts = [
    "Hello, how are you?",
    "I love learning about artificial intelligence.",
    "The weather is beautiful today."
]

print("English → French:\n")
for text in english_texts:
    result = translator(text)
    print(f"EN: {text}")
    print(f"FR: {result[0]['translation_text']}\n")

In [None]:
# English to German
translator_de = pipeline("translation_en_to_de")

text = "Artificial intelligence is changing the world."
result = translator_de(text)

print(f"EN: {text}")
print(f"DE: {result[0]['translation_text']}")
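
For other language pairs, the Helsinki-NLP translation models on the Hub follow a naming pattern like `Helsinki-NLP/opus-mt-en-fr`. The sketch below builds such a name from two language codes. `opus_mt_model_name` is a hypothetical helper, and not every language pair has a model, so check the Model Hub before relying on a name.

```python
def opus_mt_model_name(source_lang, target_lang):
    """Build a Helsinki-NLP model name from two language codes, e.g. 'en' and 'fr'."""
    return f"Helsinki-NLP/opus-mt-{source_lang}-{target_lang}"

print(opus_mt_model_name("en", "fr"))

# To use it with a pipeline (this downloads the model):
# translator_es = pipeline("translation", model=opus_mt_model_name("en", "es"))
# print(translator_es("Hello, how are you?")[0]["translation_text"])
```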

---
## Task 6: Named Entity Recognition (NER)

Find names, places, organizations, and other entities in text!

In [None]:
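# Named Entity Recognition
# aggregation_strategy="simple" merges word pieces into whole entities
# (it replaces the older, deprecated grouped_entities=True)
ner = pipeline("ner", aggregation_strategy="simple")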

text = "Apple CEO Tim Cook announced a new iPhone at the event in California last Tuesday."

entities = ner(text)

print(f"Text: {text}\n")
print("Entities found:")
for entity in entities:
    print(f"  {entity['word']}: {entity['entity_group']} ({entity['score']:.1%})")

### Entity Types:
- **PER** - Person names
- **ORG** - Organizations
- **LOC** - Locations
- **MISC** - Miscellaneous
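
Once you have a list of entities, it is often handy to group them by type. The sketch below does that with a plain dictionary. `group_by_type` is a hypothetical helper, and the sample list is hand-written in the same shape the grouped NER output uses (real output also includes a `score` for each entity).

```python
from collections import defaultdict

def group_by_type(entities):
    """Group entity words by their entity type (PER, ORG, LOC, MISC)."""
    grouped = defaultdict(list)
    for entity in entities:
        grouped[entity["entity_group"]].append(entity["word"])
    return dict(grouped)

# Hand-written sample, not real model output
sample_entities = [
    {"word": "Apple", "entity_group": "ORG"},
    {"word": "Tim Cook", "entity_group": "PER"},
    {"word": "California", "entity_group": "LOC"},
]

print(group_by_type(sample_entities))
```

With the real pipeline you could call `group_by_type(ner(text))`.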

---
## Part 3: Finding the Right Model

### How to Browse Hugging Face:

1. Go to [huggingface.co/models](https://huggingface.co/models)
2. Use the **Tasks** filter on the left
3. Sort by **Most Downloads** for reliable models
4. Look for model size (smaller = faster, works on free Colab)

### Free Colab Friendly Models:

| Task | Good Model | Approx. Download Size |
|------|-----------|------|
| Sentiment | default pipeline | ~250MB |
| Zero-shot | facebook/bart-large-mnli | ~1.6GB |
| Q&A | distilbert-base-cased-distilled-squad | ~250MB |
| Summarization | sshleifer/distilbart-cnn-12-6 | ~1.2GB |
| Translation | Helsinki-NLP models | ~300MB |
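
If a model card lists the number of parameters instead of a file size, you can estimate the download yourself. The sketch below assumes each parameter is stored as a 32-bit float (4 bytes), which is common but not universal. `estimate_size_mb` is a hypothetical helper, and the result is only a rough estimate.

```python
def estimate_size_mb(num_parameters, bytes_per_parameter=4):
    """Rough model size in megabytes: parameters x bytes per parameter."""
    return num_parameters * bytes_per_parameter / 1_000_000

# A DistilBERT-sized model has roughly 66 million parameters
print(f"{estimate_size_mb(66_000_000):.0f} MB")

# A model with about 400 million parameters is much bigger
print(f"{estimate_size_mb(400_000_000):.0f} MB")
```

Note that "400M" on a model card usually means 400 million parameters, not 400 megabytes.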

---
## Part 4: The Input → Model → Output Pattern

Every AI model follows this pattern:

```
INPUT          →    MODEL           →    OUTPUT
(your data)         (the AI brain)       (results)
```

**Examples:**

| Task | Input | Output |
|------|-------|--------|
| Sentiment | "I love this!" | POSITIVE (95%) |
| Translation | "Hello" | "Bonjour" |
| Summarization | Long article | Short summary |
| Q&A | Question + Context | Answer |
| NER | Sentence | List of entities |
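
To see the pattern without downloading anything, here is a toy stand-in for a model. It is NOT a real AI model: it just counts words from two tiny hand-picked lists. It still takes an input and returns an output in a shape similar to what the sentiment pipeline returns. All names here (`toy_sentiment_model`, `POSITIVE_WORDS`, `NEGATIVE_WORDS`) are made up for illustration.

```python
# Tiny hand-picked word lists (a real model learns from data instead)
POSITIVE_WORDS = {"love", "great", "nice", "amazing"}
NEGATIVE_WORDS = {"hate", "bad", "frustrating", "terrible"}

def toy_sentiment_model(text):
    """INPUT: a string. OUTPUT: a list with one {'label': ...} dictionary."""
    words = [word.strip(".,!?").lower() for word in text.split()]
    positive_count = sum(word in POSITIVE_WORDS for word in words)
    negative_count = sum(word in NEGATIVE_WORDS for word in words)
    label = "POSITIVE" if positive_count >= negative_count else "NEGATIVE"
    return [{"label": label}]

# INPUT -> MODEL -> OUTPUT
output = toy_sentiment_model("I love this!")
print(output[0]["label"])
```

A real model replaces the word lists with millions of learned numbers, but the way you call it stays the same.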

---
## Challenge: Build a Text Analysis Tool

Combine several of today's models into one analysis tool! Feel free to ask an AI assistant to help you extend it.

In [None]:
# Mini project: Analyze any text with multiple models
# (uses the sentiment, ner, and classifier pipelines created in the cells above)

def analyze_text(text):
    """Analyze text using multiple AI models."""
    print(f"Analyzing: \"{text}\"\n")
    print("=" * 50)
    
    # Sentiment
    result = sentiment(text)[0]
    print(f"\n1. SENTIMENT: {result['label']} ({result['score']:.1%})")
    
    # Entities
    entities = ner(text)
    if entities:
        print("\n2. ENTITIES FOUND:")
        for e in entities:
            print(f"   - {e['word']}: {e['entity_group']}")
    else:
        print("\n2. ENTITIES: None found")
    
    # Classification
    categories = ["news", "opinion", "question", "statement"]
    result = classifier(text, categories)
    print(f"\n3. TYPE: {result['labels'][0]} ({result['scores'][0]:.1%})")
    
    print("\n" + "=" * 50)

# Try it!
analyze_text("Elon Musk announced that Tesla will build a new factory in Texas next year.")

In [None]:
# Try analyzing your own text!
my_text = "Type your own text here to analyze!"
analyze_text(my_text)
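
One possible way to expand the analyzer is to return the results as a dictionary instead of printing them, so you can save or compare them later. The sketch below passes the models in as arguments, which also makes the function easy to try with any models you like. `analyze_text_to_dict` and its parameter names are hypothetical, not part of any library.

```python
def analyze_text_to_dict(text, sentiment_fn, ner_fn, classifier_fn, categories):
    """Run three models on `text` and collect the results in one dictionary."""
    sentiment_result = sentiment_fn(text)[0]
    entities = ner_fn(text)
    classification = classifier_fn(text, categories)
    return {
        "text": text,
        "sentiment": sentiment_result["label"],
        # Keep each entity as a (word, type) pair
        "entities": [(e["word"], e["entity_group"]) for e in entities],
        # Use the first label as the best-scoring category
        "type": classification["labels"][0],
    }

# With the pipelines from earlier cells you could call:
# report = analyze_text_to_dict(
#     "Tesla will build a new factory in Texas.",
#     sentiment, ner, classifier,
#     ["news", "opinion", "question", "statement"],
# )
# print(report)
```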

---
## Checklist: What You Learned Today

- [ ] How to navigate the Hugging Face Model Hub
- [ ] The `pipeline()` function for easy model use
- [ ] Different AI tasks: sentiment, classification, Q&A, summarization, translation, NER
- [ ] The Input → Model → Output pattern
- [ ] How to find models that work on free Colab

---

## Pipeline Quick Reference

```python
from transformers import pipeline

# Tasks covered today:
sentiment = pipeline("sentiment-analysis")
classifier = pipeline("zero-shot-classification")
qa = pipeline("question-answering")
summarizer = pipeline("summarization")
translator = pipeline("translation_en_to_fr")
ner = pipeline("ner", aggregation_strategy="simple")  # merge word pieces into whole entities
```

---
## Looking Ahead: Next Week

Next week we'll go deeper:
- Load models manually (more control!)
- Understand tokenizers and model components
- Customize model behavior

**Homework (optional):**
- Try a model we didn't cover (check the Model Hub!)
- Expand the text analyzer with more features
- Save your work to GitHub!

---

*Youth Horizons AI Researcher Program - Level 2*