# Weeks 5-6: Images & Vision Track
## Deep Dive into Visual AI

**Track Focus:** Build tools that understand, classify, and work with images.

This notebook covers TWO weeks. Work through it at your own pace, but aim to:
- Complete Parts 1-3 by the end of Week 5
- Complete Parts 4-5 and start your project by the end of Week 6

---

## Setup

In [None]:
# Install everything we need
!pip install transformers torch pillow requests timm -q
print("Ready!")

In [None]:
from transformers import pipeline
from PIL import Image
import requests
from io import BytesIO

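# Helper function to load images from URLs
def load_image(url):
    # Some hosts (Wikimedia included) may reject requests without a descriptive User-Agent
    headers = {"User-Agent": "vision-track-notebook/1.0"}
    # Time out and raise on HTTP errors instead of hanging or decoding an error page
    response = requests.get(url, headers=headers, timeout=30)
    response.raise_for_status()
    # Convert to RGB so transparent PNGs and grayscale images work with every model
    return Image.open(BytesIO(response.content)).convert("RGB")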

print("Imports complete!")

---

# Part 1: Image Classification Deep Dive (45 min)

## Understanding What AI Sees

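An image classifier assigns a label to the whole image, chosen from a fixed list of categories it learned during training. The model used here, `google/vit-base-patch16-224`, was fine-tuned on ImageNet's 1,000 categories.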
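
The `score` values printed in the cells that follow are probabilities. A classifier like this typically produces one raw number (a logit) per label and converts them with a softmax so that they sum to 1. Here is a minimal pure-Python sketch of that step, using made-up logits and labels rather than real model output:

```python
import math

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    # Subtracting the max keeps exp() stable and does not change the result
    biggest = max(logits)
    exps = [math.exp(x - biggest) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for three labels (not real model output)
labels = ["tabby cat", "tiger cat", "Egyptian cat"]
logits = [4.0, 2.5, 1.0]

probs = softmax(logits)
for label, p in zip(labels, probs):
    print(f"{label:15} {p:.1%}")
```

A high top score means the model prefers that label over the others it knows about. It does not guarantee the label is right, especially for images outside the model's training categories.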

In [None]:
# Load a general image classifier
classifier = pipeline("image-classification", model="google/vit-base-patch16-224")
print("Classifier loaded!")

In [None]:
# Test with an image
image_url = "https://upload.wikimedia.org/wikipedia/commons/thumb/4/4d/Cat_November_2010-1a.jpg/1200px-Cat_November_2010-1a.jpg"
image = load_image(image_url)

# Show the image
display(image.resize((300, 300)))

# Classify it
results = classifier(image)

print("\nTop 5 predictions:")
for r in results[:5]:
    confidence_bar = "*" * int(r['score'] * 20)
    print(f"  {r['label'][:30]:30} {confidence_bar} ({r['score']:.1%})")

## Exercise: Test the Classifier's Limits

In [None]:
# Test with various images - find what confuses it!
test_images = {
    "dog": "https://upload.wikimedia.org/wikipedia/commons/thumb/2/26/YellowLabradorLooking_new.jpg/1200px-YellowLabradorLooking_new.jpg",
    "car": "https://upload.wikimedia.org/wikipedia/commons/thumb/1/1b/2019_Honda_Civic_sedan_%28facelift%29%2C_front_8.27.19.jpg/1200px-2019_Honda_Civic_sedan_%28facelift%29%2C_front_8.27.19.jpg",
    "food": "https://upload.wikimedia.org/wikipedia/commons/thumb/6/6d/Good_Food_Display_-_NCI_Visuals_Online.jpg/800px-Good_Food_Display_-_NCI_Visuals_Online.jpg",
    "building": "https://upload.wikimedia.org/wikipedia/commons/thumb/a/a9/Empire_State_Building_by_David_Shankbone_crop.jpg/800px-Empire_State_Building_by_David_Shankbone_crop.jpg"
}

for name, url in test_images.items():
    print(f"\n{'='*40}")
    print(f"Testing: {name}")
    try:
        image = load_image(url)
        results = classifier(image)
        top = results[0]
        print(f"Predicted: {top['label']} ({top['score']:.1%})")
        # Simple word check: a specific label like "Labrador retriever" shows "Maybe?" for "dog"
        print(f"Correct? {'Yes' if name.lower() in top['label'].lower() else 'Maybe?'}")
    except Exception as e:
        print(f"Error: {e}")
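
The `Correct?` check above looks for the dictionary key inside the predicted label, so a correct but specific prediction such as `Labrador retriever` is reported as `Maybe?` for the key `dog`. One possible refinement is to list a few keywords per test name. The keyword lists and the `label_matches` helper below are illustrative assumptions, not part of the model or the `transformers` library:

```python
# Hypothetical keyword lists: extend them as you discover new labels
EXPECTED_KEYWORDS = {
    "dog": ["dog", "retriever", "terrier", "hound", "spaniel"],
    "car": ["car", "sedan", "convertible", "cab", "wagon"],
    "food": ["plate", "pizza", "burger", "fruit", "vegetable"],
    "building": ["building", "tower", "palace", "church", "skyscraper"],
}

def label_matches(name, predicted_label):
    """Return True if any keyword for `name` appears in the predicted label."""
    # Fall back to the name itself when no keyword list exists
    keywords = EXPECTED_KEYWORDS.get(name, [name])
    label = predicted_label.lower()
    return any(keyword in label for keyword in keywords)

print(label_matches("dog", "Labrador retriever"))     # True
print(label_matches("building", "golden retriever"))  # False
```

This is still plain substring matching, so it can be fooled (for example, `car` also matches `carousel`). Treat it as a rough aid for the exercise, not a real accuracy measure.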

In [None]:
# YOUR TURN: Test with your own image URLs!
# Find images online and paste the URLs here

my_image_url = ""  # Paste an image URL here

if my_image_url:
    image = load_image(my_image_url)
    display(image.resize((300, 300)))
    results = classifier(image)
    print("\nTop predictions:")
    for r in results[:3]:
        print(f"  {r['label']}: {r['score']:.1%}")

---

# Part 2: Zero-Shot Image Classification (45 min)

Just like with text, you can classify images into ANY categories you define, without retraining the model!
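
Under the hood, a CLIP-style model turns the image and each candidate label into vectors, scores each label by how similar its vector is to the image's, and applies a softmax across the labels. The sketch below imitates that idea with tiny made-up vectors. The numbers, the `scale` value, and the function names are assumptions for illustration, not CLIP's real embeddings or API:

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def zero_shot_scores(image_vec, label_vecs, scale=10.0):
    """Softmax over scaled similarities, so scores sum to 1 across labels."""
    sims = {label: scale * cosine_similarity(image_vec, vec)
            for label, vec in label_vecs.items()}
    biggest = max(sims.values())
    exps = {label: math.exp(s - biggest) for label, s in sims.items()}
    total = sum(exps.values())
    return {label: e / total for label, e in exps.items()}

# Made-up 3-number "embeddings" (real ones have hundreds of dimensions)
image_vec = [0.9, 0.1, 0.2]
label_vecs = {
    "a photo of a cat": [0.8, 0.2, 0.1],
    "a photo of a car": [0.1, 0.9, 0.3],
    "a photo of a tree": [0.2, 0.3, 0.9],
}

scores = zero_shot_scores(image_vec, label_vecs)
for label, score in sorted(scores.items(), key=lambda item: item[1], reverse=True):
    print(f"{label:20} {score:.1%}")
```

Because the scores are relative to the labels you supply, adding or removing a label changes every score, and a label can win simply because it is the least bad option.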

In [None]:
# Load zero-shot image classifier
zero_shot = pipeline("zero-shot-image-classification", model="openai/clip-vit-base-patch32")
print("Zero-shot classifier loaded!")

In [None]:
# Example: Classify a pet photo with custom categories
image_url = "https://upload.wikimedia.org/wikipedia/commons/thumb/4/4d/Cat_November_2010-1a.jpg/1200px-Cat_November_2010-1a.jpg"
image = load_image(image_url)
display(image.resize((200, 200)))

# YOUR categories!
categories = ["happy pet", "sleepy pet", "curious pet", "scared pet", "playful pet"]

results = zero_shot(image, candidate_labels=categories)

print("\nMood analysis:")
for r in results:
    bar = "*" * int(r['score'] * 20)
    print(f"  {r['label']:15} {bar} ({r['score']:.1%})")

In [None]:
# Another example: Classify by style
style_categories = ["photograph", "painting", "drawing", "cartoon", "digital art"]

# Test with a painting
painting_url = "https://upload.wikimedia.org/wikipedia/commons/thumb/e/ec/Mona_Lisa%2C_by_Leonardo_da_Vinci%2C_from_C2RMF_retouched.jpg/800px-Mona_Lisa%2C_by_Leonardo_da_Vinci%2C_from_C2RMF_retouched.jpg"
painting = load_image(painting_url)
display(painting.resize((200, 200)))

results = zero_shot(painting, candidate_labels=style_categories)

print("\nStyle analysis:")
for r in results:
    print(f"  {r['label']}: {r['score']:.1%}")

## Exercise: Build an Image Mood Detector

In [None]:
def analyze_image_mood(image_url):
    """
    Analyze an image for mood, style, and content.
    """
    image = load_image(image_url)
    display(image.resize((250, 250)))
    
    print("\n" + "="*40)
    print("IMAGE ANALYSIS")
    print("="*40)
    
    # Mood analysis
    moods = ["happy", "sad", "peaceful", "dramatic", "energetic", "mysterious"]
    mood_result = zero_shot(image, candidate_labels=moods)
    print(f"\nMood: {mood_result[0]['label']} ({mood_result[0]['score']:.1%})")
    
    # Color palette
    colors = ["warm colors", "cool colors", "neutral colors", "vibrant colors", "muted colors"]
    color_result = zero_shot(image, candidate_labels=colors)
    print(f"Colors: {color_result[0]['label']} ({color_result[0]['score']:.1%})")
    
    # Setting
    settings = ["indoor", "outdoor", "urban", "nature", "abstract"]
    setting_result = zero_shot(image, candidate_labels=settings)
    print(f"Setting: {setting_result[0]['label']} ({setting_result[0]['score']:.1%})")
    
    # Standard classification for content
    content_result = classifier(image)
    print(f"Content: {content_result[0]['label']} ({content_result[0]['score']:.1%})")
    
    return {
        'mood': mood_result[0]['label'],
        'colors': color_result[0]['label'],
        'setting': setting_result[0]['label'],
        'content': content_result[0]['label']
    }

# Test it!
analyze_image_mood("https://upload.wikimedia.org/wikipedia/commons/thumb/b/b6/Image_created_with_a_mobile_phone.png/1200px-Image_created_with_a_mobile_phone.png")

In [None]:
# YOUR TURN: Analyze different types of images
# Try: sunset photo, sports photo, food photo, artwork

your_image_url = ""  # Paste URL here

if your_image_url:
    analyze_image_mood(your_image_url)

---

# Part 3: Image Captioning (45 min)

Captioning models generate a short text description of what they see in an image.

In [None]:
# Load image captioning model
captioner = pipeline("image-to-text", model="Salesforce/blip-image-captioning-base")
print("Caption model loaded!")

In [None]:
# Generate a caption
image_url = "https://upload.wikimedia.org/wikipedia/commons/thumb/3/3a/Cat03.jpg/1200px-Cat03.jpg"
image = load_image(image_url)
display(image.resize((300, 300)))

caption = captioner(image)[0]['generated_text']
print(f"\nCaption: {caption}")

In [None]:
# Caption multiple images
images_to_caption = [
    "https://upload.wikimedia.org/wikipedia/commons/thumb/2/26/YellowLabradorLooking_new.jpg/1200px-YellowLabradorLooking_new.jpg",
    "https://upload.wikimedia.org/wikipedia/commons/thumb/a/a3/Eq_it-na_pizza-margherita_sep2005_sml.jpg/1200px-Eq_it-na_pizza-margherita_sep2005_sml.jpg",
    "https://upload.wikimedia.org/wikipedia/commons/thumb/1/1b/2019_Honda_Civic_sedan_%28facelift%29%2C_front_8.27.19.jpg/1200px-2019_Honda_Civic_sedan_%28facelift%29%2C_front_8.27.19.jpg"
]

for url in images_to_caption:
    print("\n" + "="*40)
    image = load_image(url)
    display(image.resize((200, 200)))
    caption = captioner(image)[0]['generated_text']
    print(f"Caption: {caption}")

## Exercise: Build an Automatic Photo Album Organizer

In [None]:
def organize_photos(image_urls):
    """
    Organize photos by automatically categorizing and captioning them.
    """
    print("PHOTO ALBUM ORGANIZER")
    print("="*50)
    
    categories = {
        'people': [],
        'animals': [],
        'food': [],
        'nature': [],
        'objects': [],
        'other': []
    }
    
    for i, url in enumerate(image_urls, 1):
        print(f"\n--- Photo {i} ---")
        try:
            image = load_image(url)
            
            # Get caption
            caption = captioner(image)[0]['generated_text']
            print(f"Caption: {caption}")
            
            # Categorize using zero-shot
            cat_labels = list(categories.keys())[:-1]  # Exclude 'other'
            result = zero_shot(image, candidate_labels=cat_labels)
            best_category = result[0]['label']
            
            if result[0]['score'] > 0.3:
                categories[best_category].append({'url': url, 'caption': caption})
                print(f"Category: {best_category}")
            else:
                categories['other'].append({'url': url, 'caption': caption})
                print("Category: other (low confidence)")
                
        except Exception as e:
            print(f"Error: {e}")
    
    # Summary
    print("\n" + "="*50)
    print("ALBUM SUMMARY")
    for cat, photos in categories.items():
        if photos:
            print(f"\n{cat.upper()}: {len(photos)} photo(s)")
            for p in photos:
                print(f"  - {p['caption']}")
    
    return categories

# Test it!
test_photos = [
    "https://upload.wikimedia.org/wikipedia/commons/thumb/4/4d/Cat_November_2010-1a.jpg/1200px-Cat_November_2010-1a.jpg",
    "https://upload.wikimedia.org/wikipedia/commons/thumb/a/a3/Eq_it-na_pizza-margherita_sep2005_sml.jpg/1200px-Eq_it-na_pizza-margherita_sep2005_sml.jpg",
    "https://upload.wikimedia.org/wikipedia/commons/thumb/b/b6/Image_created_with_a_mobile_phone.png/1200px-Image_created_with_a_mobile_phone.png"
]

organize_photos(test_photos)
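
The `0.3` cutoff in `organize_photos` only makes sense relative to the number of labels. Zero-shot scores are shared across the candidate labels, so with five labels an even split already gives each one 0.2. One way to make the cutoff adapt to the label count is to compare the top score with that even-split baseline. The helper name and the `margin` parameter below are assumptions, not part of the `transformers` API:

```python
def is_confident(top_score, num_labels, margin=1.5):
    """Treat a prediction as confident if it beats an even split by `margin` times."""
    if num_labels < 1:
        raise ValueError("num_labels must be at least 1")
    baseline = 1.0 / num_labels  # score each label gets if all are equally likely
    # Cap at 1.0 so the threshold stays reachable when there are few labels
    threshold = min(1.0, baseline * margin)
    return top_score >= threshold

print(is_confident(0.45, num_labels=5))   # True  (threshold is about 0.30)
print(is_confident(0.25, num_labels=5))   # False (threshold is about 0.30)
print(is_confident(0.25, num_labels=10))  # True  (threshold is about 0.15)
```

With the default `margin=1.5` and five labels this reproduces a cutoff of roughly 0.3, and it loosens automatically if you add more categories.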

---

# Part 4: Combining Vision Capabilities (45 min)

Combine captioning, classification, and zero-shot models to build a richer analysis than any single model gives on its own.
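
One design question when chaining models is what happens if a single step fails. The sketch below runs each analysis step independently and records the error instead of stopping. The step functions are stand-ins (plain Python functions returning fixed strings) so the pattern can be shown without loading any models. In this notebook they would wrap `captioner`, `classifier`, or `zero_shot`:

```python
def run_steps(image, steps):
    """Run each named step on the image; keep going if one step raises."""
    results = {}
    for name, step in steps.items():
        try:
            results[name] = step(image)
        except Exception as error:
            # Record the failure so the other steps still report their output
            results[name] = f"failed: {error}"
    return results

# Stand-in steps (hypothetical): real ones would call the loaded pipelines
def fake_caption(image):
    return "a cat sitting on a sofa"

def fake_mood(image):
    return "peaceful"

def broken_step(image):
    raise RuntimeError("model not loaded")

steps = {"caption": fake_caption, "mood": fake_mood, "style": broken_step}
report = run_steps(image=None, steps=steps)
for name, value in report.items():
    print(f"{name}: {value}")
```

Keeping the steps in a dictionary also makes it easy to add or remove an analysis without touching the loop.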

In [None]:
def comprehensive_image_analysis(image_url):
    """
    Complete analysis of an image using multiple AI capabilities.
    """
    print("COMPREHENSIVE IMAGE ANALYSIS")
    print("="*60)
    
    image = load_image(image_url)
    display(image.resize((300, 300)))
    
    results = {}
    
    # 1. Caption
    print("\n1. DESCRIPTION")
    caption = captioner(image)[0]['generated_text']
    results['caption'] = caption
    print(f"   {caption}")
    
    # 2. Classification
    print("\n2. CONTENT CLASSIFICATION")
    classification = classifier(image)[:3]
    results['classification'] = classification
    for c in classification:
        print(f"   {c['label']}: {c['score']:.1%}")
    
    # 3. Mood/Atmosphere
    print("\n3. MOOD & ATMOSPHERE")
    moods = ["happy", "peaceful", "dramatic", "mysterious", "energetic", "melancholic"]
    mood_result = zero_shot(image, candidate_labels=moods)
    results['mood'] = mood_result[0]['label']
    print(f"   Mood: {mood_result[0]['label']} ({mood_result[0]['score']:.1%})")
    
    # 4. Color palette
    print("\n4. COLOR ANALYSIS")
    colors = ["warm tones", "cool tones", "vibrant colors", "muted colors", "black and white"]
    color_result = zero_shot(image, candidate_labels=colors)
    results['colors'] = color_result[0]['label']
    print(f"   Dominant: {color_result[0]['label']}")
    
    # 5. Style/Type
    print("\n5. IMAGE STYLE")
    styles = ["photograph", "painting", "illustration", "digital art", "screenshot"]
    style_result = zero_shot(image, candidate_labels=styles)
    results['style'] = style_result[0]['label']
    print(f"   Type: {style_result[0]['label']}")
    
    # 6. Use case suggestions
    print("\n6. SUGGESTED USES")
    uses = ["social media post", "professional presentation", "personal album", "art collection", "educational material"]
    use_result = zero_shot(image, candidate_labels=uses)
    results['suggested_use'] = use_result[0]['label']
    print(f"   Best for: {use_result[0]['label']}")
    
    return results

# Test it!
comprehensive_image_analysis("https://upload.wikimedia.org/wikipedia/commons/thumb/b/b6/Image_created_with_a_mobile_phone.png/1200px-Image_created_with_a_mobile_phone.png")

In [None]:
# YOUR TURN: Analyze your own images!
your_image = ""  # Paste URL here

if your_image:
    comprehensive_image_analysis(your_image)

---

# Part 5: Your Track Project (Week 6 Focus)

## Project Ideas for Images & Vision Track

### Idea 1: Smart Photo Organizer
- Input: Collection of photo URLs or uploaded images
- Output: Photos sorted by category, mood, and/or date
- Features: Auto-generates captions, suggests albums

### Idea 2: Art Style Analyzer
- Input: Artwork images
- Output: Analysis of style, period, mood, colors
- Features: Compares to famous art movements

### Idea 3: Photo Accessibility Tool
- Input: Any image
- Output: Detailed description for visually impaired users
- Features: Multiple detail levels, key element identification
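
For the "multiple detail levels" feature, one possible starting point is a function that assembles a description from pieces the earlier parts already produce (a caption, a few labels, a mood). The function name, the level names, and the sample values below are hypothetical:

```python
def describe_image(caption, labels, mood, level="short"):
    """Build a text description at one of three detail levels."""
    caption = caption.strip().rstrip(".")
    # Capitalize the first letter; fall back to a generic word if empty
    caption = caption[0].upper() + caption[1:] if caption else "Image"
    if level == "short":
        return f"{caption}."
    if level == "medium":
        return f"{caption}. Mood: {mood}."
    if level == "long":
        label_text = ", ".join(labels) if labels else "none detected"
        return f"{caption}. Mood: {mood}. Key elements: {label_text}."
    raise ValueError(f"Unknown level: {level}")

# Sample values standing in for real model output
sample_labels = ["dog", "sand", "sea"]
for level in ("short", "medium", "long"):
    print(describe_image("a dog running on a beach", sample_labels, "energetic", level))
```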

### Idea 4: Social Media Post Analyzer
- Input: Image + optional context
- Output: Suggested captions, hashtags, best platform
- Features: Mood matching, audience suggestions
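
As a small building block for the hashtag output, the labels returned by a classifier or zero-shot model could be converted into hashtags. This is only a sketch: the helper name and the `max_tags` parameter are assumptions, and the input labels are examples rather than real predictions:

```python
import re

def labels_to_hashtags(labels, max_tags=5):
    """Turn labels like 'golden retriever' into hashtags like '#GoldenRetriever'."""
    tags = []
    for label in labels:
        # Keep only letters and digits, split into words
        words = re.findall(r"[A-Za-z0-9]+", label)
        if not words:
            continue
        tag = "#" + "".join(word.capitalize() for word in words)
        if tag not in tags:  # skip duplicates while keeping order
            tags.append(tag)
    return tags[:max_tags]

print(labels_to_hashtags(["golden retriever", "beach", "happy pet", "beach"]))
# ['#GoldenRetriever', '#Beach', '#HappyPet']
```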

### Idea 5: Your Own Idea!
What visual AI tool would be useful to you?

## Project Planning Template

**My Project:** 

**What it does:** 

**Who would use it:** 

**AI capabilities I'll use:**
- [ ] Image classification
- [ ] Zero-shot image classification
- [ ] Image captioning
- [ ] Mood/style analysis
- [ ] Other: ___________

**What the input will look like:**

**What the output will look like:**

**Stretch goals (if I have time):**

In [None]:
# START YOUR PROJECT HERE!
# Use AI (ChatGPT/Claude) to help you build it.
# Remember the CLEAR framework for prompting.



In [None]:
# PROJECT CODE CELL 2



In [None]:
# PROJECT CODE CELL 3



In [None]:
# PROJECT TESTING



---

## Checklist: Weeks 5-6

**Week 5:**
- [ ] Completed Part 1: Image Classification
- [ ] Completed Part 2: Zero-Shot Image Classification
- [ ] Completed Part 3: Image Captioning
- [ ] Chose a project idea

**Week 6:**
- [ ] Completed Part 4: Combining Vision Capabilities
- [ ] Started building project
- [ ] Got something working (even if basic)
- [ ] Saved to GitHub

---

## Looking Ahead: Week 7

Next week begins the **Project Phase**. You'll:
- Define your project clearly
- Build a working prototype
- Get feedback and iterate

Come prepared with your project started!

---

*Youth Horizons AI Researcher Program - Level 2 | Images & Vision Track*