# Session 8: Image Story Pipeline
[![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/buildLittleWorlds/level-2-course-material/blob/main/session-08/notebook.ipynb)

Chain two models together: image in, caption out, mood judged.

In [None]:
# Setup — run this cell first! (this may take a minute)
!pip install -q transformers torch Pillow

from transformers import pipeline, BlipProcessor, BlipForConditionalGeneration
from PIL import Image
import io

print("Loading BLIP captioner (this is a big model, be patient)...")
processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
caption_model = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-base")
print("BLIP loaded!")

print("Loading sentiment model...")
sentiment = pipeline("sentiment-analysis",
    model="distilbert-base-uncased-finetuned-sst-2-english")
print("All models loaded!")

## What We Built Tonight

We built an **Image Story Pipeline** — two models chained together:
1. **BLIP** looks at an image and writes a caption
2. **DistilBERT** reads that caption and labels its mood as POSITIVE or NEGATIVE

Neither model knows the other exists: the caption text is the only thing passed from BLIP to DistilBERT.

Check out the live Space: [Image Story Pipeline on Hugging Face](https://huggingface.co/spaces/profplate/image-story-pipeline)
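
The chaining idea in the two steps above can be sketched without loading any models. In this minimal sketch, `fake_captioner`, `fake_sentiment`, and `run_chain` are hypothetical stand-ins (not BLIP or DistilBERT); the only point is that each step receives the previous step's output and nothing else.

```python
# Toy stand-ins for the two models; both functions are hypothetical.
def fake_captioner(image_name):
    # Pretend to "look" at an image and return a caption string.
    return f"a photo of {image_name}"

def fake_sentiment(text):
    # Pretend to judge mood with a single keyword check.
    label = "POSITIVE" if "puppy" in text else "NEGATIVE"
    return {"label": label, "text_seen": text}

def run_chain(first_input, steps):
    # Feed each step's output into the next step and keep every intermediate value.
    outputs = []
    value = first_input
    for step in steps:
        value = step(value)
        outputs.append(value)
    return outputs

caption, mood = run_chain("puppy", [fake_captioner, fake_sentiment])
print(caption)
print(mood["label"])
```

Keeping every intermediate value is what later lets you check each step separately when something goes wrong.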

In [None]:
# The pipeline function: image -> caption -> sentiment
def analyze_image(image):
    # Step 1: Generate caption
    inputs = processor(image, return_tensors="pt")
    out = caption_model.generate(**inputs, max_length=50)
    caption = processor.decode(out[0], skip_special_tokens=True)

    # Step 2: Analyze caption sentiment
    result = sentiment(caption)[0]

    # Show the full pipeline
    print(f"Caption: {caption}")
    print(f"Sentiment: {result['label']} ({result['score']:.1%})")
    print("\nFull pipeline:")
    print(f"  IMAGE -> BLIP -> \"{caption}\"")
    print(f"  \"{caption}\" -> DistilBERT -> {result['label']} ({result['score']:.1%})")
    return caption, result

## How to Upload Images in Colab

Run the cell below — it will open a **file picker**. Choose an image from your computer.

This is a new Colab skill: working with files!
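
`files.upload()` returns a dictionary that maps each filename to that file's raw bytes. The upload cell simply takes the first entry, so a non-image file picked by mistake only fails later at `Image.open`. As a hedged sketch of one way to check earlier, the helper names below (`sniff_image_format`, `first_image`) and the example dictionary are hypothetical; the byte signatures for PNG, JPEG, and GIF are standard.

```python
# Known file signatures ("magic bytes") for three common image formats.
IMAGE_SIGNATURES = {
    "png": b"\x89PNG\r\n\x1a\n",
    "jpeg": b"\xff\xd8\xff",
    "gif": b"GIF8",
}

def sniff_image_format(data):
    # Return the format name if the bytes start with a known signature, else None.
    for name, signature in IMAGE_SIGNATURES.items():
        if data.startswith(signature):
            return name
    return None

def first_image(uploaded):
    # Return (filename, bytes) for the first upload that looks like an image.
    for filename, data in uploaded.items():
        if sniff_image_format(data) is not None:
            return filename, data
    raise ValueError("None of the uploaded files looks like a PNG, JPEG, or GIF.")

# Hypothetical upload result: one text file and one (truncated) PNG.
example_upload = {
    "notes.txt": b"hello",
    "cat.png": b"\x89PNG\r\n\x1a\n" + b"\x00" * 8,
}
name, data = first_image(example_upload)
print(name)
```

This only inspects the first few bytes, so it catches obvious mix-ups but does not prove the file is a valid image.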

In [None]:
# Upload an image from your computer
from google.colab import files

print("Click 'Choose Files' to upload an image...")
uploaded = files.upload()

# Open the uploaded image
filename = list(uploaded.keys())[0]
image = Image.open(io.BytesIO(uploaded[filename]))
print(f"\nUploaded: {filename}")
image  # Display the image

In [None]:
# Run the pipeline on your uploaded image
caption, result = analyze_image(image)

## Try a URL Image (No Upload Needed)

You can also load images from the web. This is handy for quick tests.

In [None]:
# Load an image from a URL
import requests

url = "https://upload.wikimedia.org/wikipedia/commons/thumb/2/26/YellowLabradorLooking_new.jpg/1200px-YellowLabradorLooking_new.jpg"
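# Wikimedia asks scripts to identify themselves; this User-Agent string is just an example
headers = {"User-Agent": "image-story-pipeline-notebook/0.1 (course exercise)"}
response = requests.get(url, headers=headers, timeout=30)
response.raise_for_status()  # Stop here with a clear error if the download failed
web_image = Image.open(io.BytesIO(response.content))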

print("Image from the web:")
display(web_image)
print()
web_caption, web_result = analyze_image(web_image)  # Assign the result so the cell doesn't echo it a second time

## Experiments

### Experiment 1: Find an Image Where the Caption Is Wrong

Upload different images. Find one where BLIP describes it incorrectly.

In [None]:
# Experiment 1: Upload another image
from google.colab import files

print("Upload an image that might confuse the captioner...")
uploaded2 = files.upload()
filename2 = list(uploaded2.keys())[0]
image2 = Image.open(io.BytesIO(uploaded2[filename2]))
display(image2)
print()
caption2, result2 = analyze_image(image2)  # Assign the result so the cell doesn't echo it a second time

### Experiment 2: Caption Right, Sentiment Wrong?

Find an image where the caption is accurate but the sentiment model gets the mood wrong.

In [None]:
# Experiment 2: Upload another image
from google.colab import files

print("Upload an image where the mood might be tricky...")
uploaded3 = files.upload()
filename3 = list(uploaded3.keys())[0]
image3 = Image.open(io.BytesIO(uploaded3[filename3]))
display(image3)
print()
caption3, result3 = analyze_image(image3)  # Assign the result so the cell doesn't echo it a second time

### Experiment 3: Where Does the Error Start?

For each image you tested, fill in this table:

| Image | What It Actually Shows | BLIP Caption | Caption Correct? | Sentiment | Sentiment Correct? | Which Step Failed? |
|-------|----------------------|-------------|-----------------|-----------|-------------------|-------------------|
| Image 1 | | | | | | |
| Image 2 | | | | | | |
| Image 3 | | | | | | |
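
Rows for the table above can also be generated in code rather than typed by hand. This is a minimal sketch: `table_row` is a hypothetical helper, the example values are made up, and the two correctness flags are still your own judgment calls.

```python
def table_row(image_name, actual, caption, caption_ok, label, sentiment_ok):
    # Build one markdown row in the same column order as the table above.
    # The "Which Step Failed?" cell is left blank to fill in by hand.
    def yes_no(flag):
        return "Yes" if flag else "No"
    cells = [image_name, actual, caption, yes_no(caption_ok), label, yes_no(sentiment_ok), ""]
    return "| " + " | ".join(cells) + " |"

# Made-up example values, not real model output.
row = table_row("Image 1", "a cat on a bed", "a dog sitting on a couch", False, "POSITIVE", True)
print(row)
```

In the notebook, you could pass `caption` and `result['label']` from `analyze_image` as the third and fifth arguments.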

In [None]:
# Experiment 3: Test the sentiment model directly on a corrected caption
# If the caption was wrong, what SHOULD it have said?

wrong_caption = "a dog sitting on a couch"  # <-- What BLIP actually said
correct_caption = "a cat sleeping on a bed"  # <-- What it SHOULD have said

print("Wrong caption:")
r1 = sentiment(wrong_caption)[0]
print(f"  \"{wrong_caption}\" -> {r1['label']} ({r1['score']:.1%})")

print("\nCorrected caption:")
r2 = sentiment(correct_caption)[0]
print(f"  \"{correct_caption}\" -> {r2['label']} ({r2['score']:.1%})")

print("\nDoes fixing the caption change the sentiment?")
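
The closing question of the cell above can also be answered in code. The sketch below assumes each result is a dict with a `label` of `POSITIVE` or `NEGATIVE` and a `score` between 0 and 1, like `r1` and `r2`. `compare_sentiment` is a hypothetical helper, and the example numbers are invented for illustration.

```python
def compare_sentiment(before, after):
    # before/after are dicts shaped like the sentiment pipeline's results:
    # {"label": "POSITIVE" or "NEGATIVE", "score": float between 0 and 1}.
    def signed(result):
        # Map a result onto one axis: +score for POSITIVE, -score for NEGATIVE.
        return result["score"] if result["label"] == "POSITIVE" else -result["score"]
    return {
        "label_flipped": before["label"] != after["label"],
        "shift": signed(after) - signed(before),
    }

# Hypothetical results, not real model output.
r_wrong = {"label": "NEGATIVE", "score": 0.75}
r_fixed = {"label": "POSITIVE", "score": 0.5}
diff = compare_sentiment(r_wrong, r_fixed)
print(diff)
```

A flipped label means the caption error changed the final verdict. A small shift with no flip suggests the mistake barely mattered to the sentiment step.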

## Challenge

Find an image that **breaks the pipeline**. Figure out which step failed:
- Did the captioner describe it incorrectly? (Step 1 error)
- Did the sentiment model misread a correct caption? (Step 2 error)
- Or did both steps fail?
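
The three cases above can be written as a small decision function. This sketch rests on one assumption that the exercise does not spell out: Step 2 is judged on an accurate caption (BLIP's own if it was right, otherwise one you corrected by hand), because judging it on a wrong caption cannot separate its own mistake from one it inherited. The function name and return strings are hypothetical.

```python
def which_step_failed(caption_correct, sentiment_correct_on_true_caption):
    # caption_correct: did BLIP's caption match the image?
    # sentiment_correct_on_true_caption: did the sentiment model judge an
    #   accurate caption (BLIP's own, or a hand-corrected one) reasonably?
    if caption_correct and sentiment_correct_on_true_caption:
        return "Neither step failed"
    if not caption_correct and sentiment_correct_on_true_caption:
        return "Step 1 (captioner)"
    if caption_correct:
        return "Step 2 (sentiment)"
    return "Both steps"

print(which_step_failed(False, True))
```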

Bring your most interesting broken result to next session.

**GitHub:** If you haven't uploaded a notebook yet, try it this week!

## Vocabulary

| Term | Meaning |
|------|---------|
| **Pipeline (multi-model)** | Connecting models so the output of one feeds the input of the next |
| **Error cascade** | When one model's mistake is passed downstream, so every later model starts from bad input and is likely to be wrong too |
| **Captioning** | Generating a text description of an image |
| **Chain** | Linking multiple steps together where each depends on the previous one |
| **Image-to-text** | A task in which a model takes an image as input and produces text as output |