# Homework: Module 6

In this homework, we'll explore two building blocks of modern generative AI: CLIP, which relates images to text, and diffusion models, which generate images from text. We'll use pre-trained models from the Hugging Face Hub to complete the tasks.

## Setup

First, let's install the necessary libraries.

In [None]:
!pip install -q transformers torch torchvision diffusers

In [None]:
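import requests  # to download example images by URL
import torch
import torchvision
import matplotlib.pyplot as plt
from PIL import Image  # to open downloaded images for the CLIP processor
from transformers import CLIPProcessor, CLIPModel
from diffusers import StableDiffusionPipeline
from torchvision.utils import make_grid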

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Using device: {device}")

## Part 1: Exploring CLIP

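CLIP (Contrastive Language-Image Pre-training) is a model from OpenAI that maps images and text into a shared embedding space. It is trained contrastively on image-caption pairs, so that an image and a caption describing the same content end up with similar embeddings. This lets us score how well any piece of text describes a given image. Let's use it to perform image-text similarity tasks.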
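
The "contrastive" part of the name refers to the training objective: in a batch of matched image-caption pairs, each image should score its own caption higher than every other caption in the batch, and vice versa. The plain-Python sketch below illustrates that symmetric loss. It is not the library's implementation: it assumes toy 2-D vectors in place of real encoder outputs and fixes the temperature at 0.07 (CLIP's starting value; the real model learns it).

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length so that dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cross_entropy(logits, target):
    """Cross-entropy of one row of logits against the index of the correct entry."""
    m = max(logits)  # subtract the max before exponentiating, for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - logits[target]


def clip_contrastive_loss(image_embs, text_embs, temperature=0.07):
    """Symmetric contrastive (InfoNCE) loss over N matched (image, text) pairs.

    Pair i is the positive for row i; the other N - 1 captions (or images) in
    the batch act as negatives. temperature=0.07 is an assumed fixed value;
    CLIP learns this parameter during training.
    """
    imgs = [l2_normalize(v) for v in image_embs]
    txts = [l2_normalize(v) for v in text_embs]
    n = len(imgs)
    # logits[i][j] = cos(image_i, text_j) / temperature
    logits = [
        [sum(a * b for a, b in zip(imgs[i], txts[j])) / temperature for j in range(n)]
        for i in range(n)
    ]
    # Image -> text: each image (row) should pick out its own caption.
    loss_i2t = sum(cross_entropy(logits[i], i) for i in range(n)) / n
    # Text -> image: each caption (column) should pick out its own image.
    loss_t2i = sum(cross_entropy([logits[i][j] for i in range(n)], j) for j in range(n)) / n
    return (loss_i2t + loss_t2i) / 2


# Toy 2-D "embeddings": matched pairs point the same way, mismatched ones do not.
images = [[1.0, 0.0], [0.0, 1.0]]
print(clip_contrastive_loss(images, [[1.0, 0.0], [0.0, 1.0]]))  # near 0: captions line up
print(clip_contrastive_loss(images, [[0.0, 1.0], [1.0, 0.0]]))  # large: captions swapped
```

Minimizing this loss over millions of real pairs is what pulls matching images and captions together in the shared embedding space.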

In [None]:
def setup_clip_model():
    model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
    processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
    return model.to(device), processor

clip_model, clip_processor = setup_clip_model()

### Task 1: Image-Text Similarity

Complete the following function to compute the similarity between an image and a list of text prompts using CLIP.

In [None]:
def compute_image_text_similarity(model, processor, image, texts):
    # TODO: Implement the function to compute similarity scores
    # Hint: Use the processor to prepare inputs and the model to get logits
    # Return the similarity scores
    pass

# Test your function
image_url = "http://images.cocodataset.org/val2017/000000039769.jpg"
texts = ["a photo of a cat", "a photo of a dog", "a photo of a giraffe"]

# TODO: Load the image (e.g. download it with requests and open it with PIL) and compute similarities
# Print the results
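
Under the hood, a CLIP similarity score is a scaled cosine similarity between the image embedding and each prompt embedding, normalized with a softmax across the prompts. The sketch below shows only that arithmetic in plain Python. It assumes made-up 2-D embeddings in place of real encoder outputs and a fixed `logit_scale` of 100; with the real model, the encoders produce the embeddings and the model output's `logits_per_image` holds the scaled similarities.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def score_prompts(image_emb, text_embs, logit_scale=100.0):
    """Scaled cosine similarity per prompt, plus a softmax over the prompts.

    logit_scale=100.0 is an assumption standing in for the learned scale
    stored in a trained CLIP checkpoint.
    """
    logits = [logit_scale * cosine_similarity(image_emb, t) for t in text_embs]
    return logits, softmax(logits)


# Hypothetical embeddings for one image and three prompts (cat, dog, giraffe).
image_emb = [1.0, 0.2]
text_embs = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
logits, probs = score_prompts(image_emb, text_embs)
best = max(range(len(probs)), key=probs.__getitem__)
print(best, [round(p, 3) for p in probs])
```

Because the softmax is taken across the prompts, these probabilities only rank the prompts you supplied; they do not tell you whether any single prompt is a good description on its own.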

## Part 2: Exploring Diffusion Models

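Diffusion models are a class of generative models that learn to reverse a gradual noising process: during training, noise is added to data step by step and a network learns to predict that noise; at generation time, the model starts from pure noise and removes it step by step. Stable Diffusion runs this process in a compressed latent space and conditions each denoising step on the text prompt, encoded by a CLIP text encoder. We'll use a pre-trained Stable Diffusion model to generate images from text prompts.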
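
The forward (noising) half of the process has a closed form, $x_t = \sqrt{\bar\alpha_t}\,x_0 + \sqrt{1-\bar\alpha_t}\,\epsilon$, where $\epsilon$ is Gaussian noise and $\bar\alpha_t$ is the running product of $(1-\beta_s)$ up to step $t$. The network is trained to predict $\epsilon$ from $x_t$ and $t$. The sketch below assumes a "scaled linear" $\beta$ schedule over 1,000 steps with $\beta$ running from 0.00085 to 0.012, which we believe matches Stable Diffusion v1.x (check `diffusion_pipe.scheduler.config` to confirm), and uses a plain list of floats as a stand-in for the latent that Stable Diffusion actually noises.

```python
import math
import random


def scaled_linear_alpha_bars(num_steps=1000, beta_start=0.00085, beta_end=0.012):
    """Return alpha_bar_t = prod_{s<=t} (1 - beta_s) for every step t.

    Betas are spaced linearly in sqrt-space and then squared ("scaled linear").
    The default values are assumptions meant to mirror Stable Diffusion v1.x.
    """
    lo, hi = math.sqrt(beta_start), math.sqrt(beta_end)
    betas = [(lo + (hi - lo) * t / (num_steps - 1)) ** 2 for t in range(num_steps)]
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def add_noise(x0, t, alpha_bars, rng):
    """Sample x_t ~ N(sqrt(alpha_bar_t) * x0, (1 - alpha_bar_t) * I) in one shot."""
    a = alpha_bars[t]
    noise = [rng.gauss(0.0, 1.0) for _ in x0]
    xt = [math.sqrt(a) * x + math.sqrt(1.0 - a) * e for x, e in zip(x0, noise)]
    return xt, noise  # the network's training target is `noise`


alpha_bars = scaled_linear_alpha_bars()
for t in (0, 250, 500, 999):
    # Signal weight sqrt(alpha_bar_t) shrinks while the noise std sqrt(1 - alpha_bar_t) grows.
    print(t, round(math.sqrt(alpha_bars[t]), 3), round(math.sqrt(1 - alpha_bars[t]), 3))
x_t, eps = add_noise([0.5, -0.2, 0.8], 500, alpha_bars, random.Random(0))
```

By the last step almost no signal remains, which is why generation can start from pure Gaussian noise.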

In [None]:
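def setup_diffusion_model():
    # If this repo is no longer available on the Hub, the community mirror
    # "stable-diffusion-v1-5/stable-diffusion-v1-5" hosts the v1.5 weights.
    model_id = "runwayml/stable-diffusion-v1-5"
    # float16 halves GPU memory, but many CPU ops don't support it: use float32 on CPU
    dtype = torch.float16 if device.type == "cuda" else torch.float32
    pipe = StableDiffusionPipeline.from_pretrained(model_id, torch_dtype=dtype)
    return pipe.to(device)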

diffusion_pipe = setup_diffusion_model()

### Task 2: Text-to-Image Generation

Complete the following function to generate images from text prompts using the Stable Diffusion model.

In [None]:
def generate_image_from_text(pipe, prompt, num_images=1):
    # TODO: Implement the function to generate images from text
    # Hint: Call the pipe with the prompt; num_images_per_prompt sets how many images come back in .images
    # Return the generated images
    pass

# Test your function
prompt = "A serene landscape with mountains and a lake at sunset"

# TODO: Generate images and display them
# Hint: Use plt.imshow() to display the images

### Task 3: Exploring Guidance Scale

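The guidance scale in Stable Diffusion controls how strongly each denoising step is steered toward the text prompt. Higher values make images follow the prompt more closely but reduce variety and can introduce oversaturated colors or artifacts; lower values give more varied but less prompt-faithful results. Implement a function to compare images generated at different guidance scales.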

In [None]:
def compare_guidance_scales(pipe, prompt, scales=(5.0, 7.5, 10.0)):
    # TODO: Implement the function to generate images with different guidance scales
    # Hint: Use the guidance_scale parameter in the pipe, and pass a torch.Generator
    #       with the same seed for every scale so that only the guidance changes
    # Generate and display images side by side
    pass

# Test your function
prompt = "A futuristic city skyline at night"
compare_guidance_scales(diffusion_pipe, prompt)
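
The Stable Diffusion pipeline in diffusers implements the guidance scale as classifier-free guidance: at each denoising step the model predicts the noise twice, once with the prompt and once with an empty prompt, and extrapolates from the unconditional prediction toward the conditional one. The sketch below shows only that combination step on made-up prediction vectors; the real pipeline applies it to full latent tensors and, as far as we know, skips the unconditional pass when the scale is 1 or below.

```python
def classifier_free_guidance(eps_uncond, eps_cond, guidance_scale):
    """Blend unconditional and text-conditioned noise predictions.

    eps_hat = eps_uncond + guidance_scale * (eps_cond - eps_uncond)
    A scale of 1 returns the conditional prediction unchanged; larger scales
    extrapolate past it, in the direction the prompt pulls the image.
    """
    return [u + guidance_scale * (c - u) for u, c in zip(eps_uncond, eps_cond)]


# Hypothetical noise predictions for a 2-element latent at one denoising step.
eps_uncond = [0.1, -0.3]
eps_cond = [0.4, -0.1]
for w in (1.0, 5.0, 7.5, 10.0):
    print(w, [round(v, 3) for v in classifier_free_guidance(eps_uncond, eps_cond, w)])
```

The guided prediction moves linearly with the scale, so very high scales over-amplify the prompt's direction, which may explain the oversaturated look you can see at 10.0.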

## Part 3: Combining CLIP and Diffusion Models

Now let's combine the two models: the diffusion model generates an image, and CLIP checks how well that image matches the prompt.

### Task 4: Image Generation and Evaluation

Implement a function that generates an image using the diffusion model and then evaluates how well it matches the original prompt using CLIP.

In [None]:
def generate_and_evaluate(diffusion_pipe, clip_model, clip_processor, prompt):
    # TODO: Implement the function to generate an image and evaluate it
    # 1. Generate an image using the diffusion model
    # 2. Use CLIP to compute the similarity between the generated image and the original prompt
    #    (report the raw logit or cosine similarity: a softmax over a single prompt is always 1.0)
    # 3. Display the image and print the similarity score
    pass

# Test your function
prompt = "A colorful hot air balloon floating over a field of flowers"
generate_and_evaluate(diffusion_pipe, clip_model, clip_processor, prompt)
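
When CLIP scores a generated image against a single prompt, the softmax step from Task 1 stops being informative: normalizing over one prompt always yields 1.0. A more useful number is the cosine similarity itself, optionally rescaled in the style of the CLIPScore metric as 2.5 × max(cos, 0). The constant 2.5 follows the CLIPScore definition as we recall it; treat it as an assumption. The sketch below demonstrates both points with hypothetical 2-D embeddings standing in for the outputs of CLIP's `get_image_features` and `get_text_features`.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def clip_score(image_emb, text_emb, weight=2.5):
    """CLIPScore-style metric: weight * max(cosine similarity, 0).

    weight=2.5 is an assumed constant taken from the CLIPScore definition.
    """
    return weight * max(cosine_similarity(image_emb, text_emb), 0.0)


# Hypothetical embeddings: one prompt, one image that matches it and one that does not.
prompt_emb = [1.0, 0.0]
good_image = [1.0, 0.1]
bad_image = [0.0, 1.0]

# Pitfall: a softmax over a single prompt is always [1.0], whatever the image.
print(softmax([100.0 * cosine_similarity(good_image, prompt_emb)]))
print(softmax([100.0 * cosine_similarity(bad_image, prompt_emb)]))

# The clipped, rescaled cosine similarity does separate the two images.
print(round(clip_score(good_image, prompt_emb), 3), clip_score(bad_image, prompt_emb))
```

Clipping at zero keeps the score non-negative, so unrelated images all score 0 rather than being ranked by how strongly they disagree with the prompt.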

## Conclusion

In this homework, you've explored key concepts in generative AI, including:

1. Using CLIP for image-text similarity tasks
2. Generating images with diffusion models
3. Understanding the impact of guidance scale in image generation
4. Combining CLIP and diffusion models for image generation and evaluation

These techniques are fundamental to many advanced applications in computer vision and natural language processing. Keep exploring and experimenting with these powerful models!