# Multimodal Image Generation with Diffusion Models

In this notebook, you'll explore how multimodal image generation works using a text-to-image pipeline powered by diffusion models. You'll interact with a pre-trained model and reflect on how prompts shape outputs.

## Learning Objectives
By the end of this notebook, you will:
- Understand the concept of multimodal learning and diffusion-based image generation.
- Generate images using Stable Diffusion with Hugging Face’s `diffusers` library.
- Reflect on how prompts and model parameters influence the generated outputs.


In [None]:
# Install required libraries
!pip install diffusers transformers accelerate scipy --quiet

In [None]:
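from diffusers import StableDiffusionPipeline
import torch

# Use the GPU when one is available; generation on CPU works but is much slower
device = "cuda" if torch.cuda.is_available() else "cpu"

# Load the pre-trained weights (downloaded from the Hugging Face Hub on first run)
pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4")
pipe = pipe.to(device)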

In [None]:
# Generate an image from a text prompt
prompt = "A futuristic city at sunset with flying cars"
image = pipe(prompt).images[0]
image.show()

In [None]:
# guidance_scale controls how closely the image follows the prompt.
# 7.5 is the pipeline default; try lower and higher values (e.g. 3 or 12) and compare.
prompt = "A dragon flying over a medieval castle"
image = pipe(prompt, guidance_scale=7.5).images[0]
image.show()
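
To build some intuition for what `guidance_scale` does, the sketch below shows the arithmetic usually described as classifier-free guidance: at each denoising step the model makes one noise prediction with an empty prompt and one with your prompt, and the scale decides how far to push from the first toward (and past) the second. The function name `apply_guidance` and the toy numbers are illustrative assumptions, not part of `diffusers`; the real pipeline applies the same kind of blend to large tensors internally.

```python
def apply_guidance(noise_uncond, noise_text, guidance_scale):
    """Blend two noise predictions in the style of classifier-free guidance.

    noise_uncond: prediction made with an empty prompt
    noise_text:   prediction made with the actual prompt
    (hypothetical helper for illustration; inputs are plain Python lists)
    """
    return [
        u + guidance_scale * (t - u)
        for u, t in zip(noise_uncond, noise_text)
    ]


# Toy three-element "noise predictions" (made-up numbers, not model output)
noise_uncond = [0.0, 1.0, -1.0]
noise_text = [1.0, 1.0, 0.0]

for scale in (0.0, 1.0, 7.5):
    print(scale, apply_guidance(noise_uncond, noise_text, scale))
```

A scale of 0 returns the unconditional prediction, a scale of 1 returns the text-conditioned one, and larger values extrapolate beyond it, which is why high settings tend to follow the prompt more literally.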

## Reflection Questions

1. What is **multimodal learning**, and how is it used in this notebook?
2. How does a **text encoder** contribute to image generation?
3. What happens when you change the **prompt**? Try both surreal and realistic descriptions.
4. What is the effect of modifying `guidance_scale`?
5. What are some potential **ethical considerations** when generating synthetic images from text?

Take notes in your learning journal or class workspace.

## Optional: Try Your Own Prompts

Change the prompt in either generation cell above to anything you like. Some ideas:
- "A classroom in the year 3000"
- "An astronaut having coffee on Mars"
- "A forest made of crystals under a purple sky"

Observe how the model interprets different elements and styles.
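
If you want to compare prompts and settings side by side, one option is to list every combination up front and then loop over it. The sketch below is only a suggestion: `build_sweep` and the `seed` field are hypothetical names introduced here, and the commented-out lines assume the `pipe` object loaded earlier in the notebook. Reusing one seed for every run means that differences between images come from the prompt and `guidance_scale` rather than from the random starting noise.

```python
from itertools import product


def build_sweep(prompts, guidance_scales, seed=0):
    """Return one settings dict per (prompt, guidance_scale) pair."""
    return [
        {"prompt": p, "guidance_scale": g, "seed": seed}
        for p, g in product(prompts, guidance_scales)
    ]


prompts = [
    "A classroom in the year 3000",
    "An astronaut having coffee on Mars",
]
sweep = build_sweep(prompts, guidance_scales=[3.0, 7.5, 12.0])

for settings in sweep:
    print(settings)
    # With the pipeline loaded, each entry could be rendered like this:
    # generator = torch.Generator("cpu").manual_seed(settings["seed"])
    # image = pipe(settings["prompt"],
    #              guidance_scale=settings["guidance_scale"],
    #              generator=generator).images[0]
```

Two prompts and three scales give six runs, so keep the lists short if you are generating on CPU.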