# Lab 4.1.2: Image Generation with Diffusion Models

**Module:** 4.1 - Multimodal AI  
**Time:** 2 hours  
**Difficulty:** ⭐⭐⭐

---

## Learning Objectives

By the end of this notebook, you will:
- [ ] Understand how diffusion models generate images
- [ ] Use SDXL to generate high-quality images from text
- [ ] Apply ControlNet for guided image generation
- [ ] Explore Flux for state-of-the-art image quality
- [ ] Customize generation with different samplers and parameters

---

## Prerequisites

- Completed: Lab 4.1.1 (Vision-Language Demo)
- Knowledge of: Neural networks, basic image processing
- Running in: NGC PyTorch container with diffusers installed

---

## Real-World Context

AI image generation is already used in production across creative industries:

**Industry Applications:**
- **Marketing**: Generate product mockups and advertisements
- **Gaming**: Create concept art and textures
- **Architecture**: Visualize building designs
- **Fashion**: Design clothing patterns and styles
- **Film**: Storyboarding and concept visualization

**Why DGX Spark?**
- SDXL: ~8GB VRAM - generates in 5-8 seconds
- Flux.1-dev: ~24GB - highest quality, ~15-20 seconds
- No cloud API costs - generate unlimited images locally!
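
The memory figures above follow mostly from parameter count times bytes per parameter. Here is a rough sketch, assuming about 2.6 billion parameters for the SDXL U-Net and about 12 billion for the Flux transformer. Both counts are assumptions taken from published figures, and `weights_gb` is a hypothetical helper. Text encoders, the VAE, and activations add more on top.

```python
def weights_gb(num_params: float, bytes_per_param: int = 2) -> float:
    """Approximate weight memory in GB (1 GB = 1e9 bytes); bf16/fp16 use 2 bytes."""
    return num_params * bytes_per_param / 1e9

# Assumed parameter counts for the denoiser only (not the full pipeline).
print(f"SDXL U-Net:       ~{weights_gb(2.6e9):.1f} GB")
print(f"Flux transformer: ~{weights_gb(12e9):.1f} GB")
```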

---

## ELI5: How Do Diffusion Models Work?

> **Imagine you're playing a game of "guess the drawing" but in reverse!**
>
> Normally, someone draws a cat, and you guess "cat!" But diffusion models learned to play it backwards:
>
> 1. **Training**: You show the AI a picture of a cat, then slowly add TV static (noise) until it's ALL static. The AI learns: "To get from this static to a cat, I need to remove THIS specific pattern of noise."
>
> 2. **Generation**: When you say "draw me a cat!", the AI starts with pure static (random noise), then removes noise step by step, guided by the word "cat", until a cat appears!
>
> Think of it like a sculptor: you start with a block of marble (random noise) and chip away (denoise) until a statue (image) emerges.
>
> **In AI terms:**
> - **Forward process**: Add Gaussian noise to images gradually (training data)
> - **Reverse process**: Learn to predict and remove noise step by step
> - **Conditioning**: Text embeddings guide which noise patterns to remove
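
The forward and reverse processes can be sketched on a toy four-value "image" in plain Python. This is a minimal illustration, not how SDXL is implemented. The linear beta schedule (1e-4 to 0.02 over 1000 steps) is taken from the original DDPM formulation as an assumption, and the helper names are hypothetical. The true noise is passed to `remove_noise`, whereas a trained model has to predict it.

```python
import math
import random

def alpha_bar_schedule(num_steps: int = 1000, beta_start: float = 1e-4,
                       beta_end: float = 0.02) -> list:
    """Cumulative product of (1 - beta_t) for a linear beta schedule."""
    alpha_bars, running = [], 1.0
    for t in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * t / (num_steps - 1)
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars

def add_noise(x0: list, t: int, alpha_bars: list, rng: random.Random):
    """Forward process: x_t = sqrt(a_bar_t) * x0 + sqrt(1 - a_bar_t) * eps."""
    a = alpha_bars[t]
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    xt = [math.sqrt(a) * x + math.sqrt(1.0 - a) * e for x, e in zip(x0, eps)]
    return xt, eps

def remove_noise(xt: list, eps: list, t: int, alpha_bars: list) -> list:
    """Invert the forward step given the noise (a trained model predicts eps)."""
    a = alpha_bars[t]
    return [(x - math.sqrt(1.0 - a) * e) / math.sqrt(a) for x, e in zip(xt, eps)]

alpha_bars = alpha_bar_schedule()
rng = random.Random(0)
x0 = [0.5, -0.25, 1.0, 0.0]                      # toy "image"
xt, eps = add_noise(x0, t=500, alpha_bars=alpha_bars, rng=rng)
recovered = remove_noise(xt, eps, 500, alpha_bars)

print("noisy:    ", [round(v, 3) for v in xt])
print("recovered:", [round(v, 3) for v in recovered])
```

The fraction of the original signal that survives, `sqrt(alpha_bars[t])`, shrinks toward zero as `t` grows, which is why generation can start from pure noise.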

---

## Part 1: Environment Setup

In [None]:
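# Install required packages (run once)
# Quote the version specifier so the shell does not treat ">" as a redirect.
# FluxPipeline (Part 7) needs diffusers 0.30.0 or newer; its T5 tokenizer needs sentencepiece.
# !pip install "diffusers>=0.30.0" transformers accelerate matplotlib sentencepiece protobuf -q
# !pip install controlnet_aux opencv-python -q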

In [None]:
import torch
import gc
from PIL import Image
import numpy as np
import time
from typing import Optional, List
import warnings
from IPython.display import display
import matplotlib.pyplot as plt  # For displaying image grids
warnings.filterwarnings('ignore')

# Check GPU
print("=" * 50)
print("GPU Configuration")
print("=" * 50)

if torch.cuda.is_available():
    gpu_name = torch.cuda.get_device_name(0)
    total_memory = torch.cuda.get_device_properties(0).total_memory / 1e9
    print(f"GPU: {gpu_name}")
    print(f"Total Memory: {total_memory:.1f} GB")
    print(f"\nDGX Spark has plenty of room for image generation!")
else:
    print("No GPU available - generation will be very slow!")

---

## Key Libraries for This Lab

### Matplotlib for Image Display

We'll use matplotlib to display multiple images in grids:

```python
import matplotlib.pyplot as plt

# Create a grid of subplots (rows x cols)
fig, axes = plt.subplots(2, 3, figsize=(15, 10))  # 2 rows, 3 columns

# axes is a 2D array - flatten for easy iteration
for ax, img in zip(axes.flatten(), images):
    ax.imshow(img)           # Display image
    ax.axis('off')           # Hide axis ticks
    ax.set_title("Title")    # Add title

plt.tight_layout()  # Adjust spacing
plt.show()
```

### OpenCV for Image Processing

OpenCV (cv2) provides powerful image processing functions:

```python
import cv2
import numpy as np
from PIL import Image

# Convert PIL Image to numpy array for OpenCV
img_array = np.array(pil_image)

# Convert RGB to grayscale
gray = cv2.cvtColor(img_array, cv2.COLOR_RGB2GRAY)

# Canny edge detection - finds edges in images
# Arguments after the image are threshold1 (low) and threshold2 (high):
#   gradients below the low threshold are discarded,
#   gradients above the high threshold are always kept,
#   values in between are kept only if connected to a strong edge
edges = cv2.Canny(gray, 100, 200)

# Convert back to RGB for display (edges are single channel)
edges_rgb = np.stack([edges, edges, edges], axis=-1)
result = Image.fromarray(edges_rgb)
```

### PIL ImageDraw for Creating Images

We'll also use PIL's ImageDraw (covered in Lab 4.1.1) to create control images:

```python
from PIL import Image, ImageDraw

img = Image.new('RGB', (1024, 1024), color='white')
draw = ImageDraw.Draw(img)
draw.rectangle([x1, y1, x2, y2], outline='black', width=5)
draw.polygon([(x1,y1), (x2,y2), (x3,y3)], outline='black')
draw.ellipse([x1, y1, x2, y2], outline='black')
```

In [None]:
def clear_gpu_memory():
    """Clear GPU memory cache."""
    gc.collect()
    if torch.cuda.is_available():
        torch.cuda.empty_cache()
        torch.cuda.synchronize()
    print("GPU memory cleared!")

def get_memory_usage():
    """Get current GPU memory usage."""
    if torch.cuda.is_available():
        allocated = torch.cuda.memory_allocated(0) / 1e9
        reserved = torch.cuda.memory_reserved(0) / 1e9
        return f"Allocated: {allocated:.2f}GB, Reserved: {reserved:.2f}GB"
    return "No GPU"

def display_images_grid(images: List[Image.Image], titles: Optional[List[str]] = None, cols: int = 2):
    """Display multiple images in a grid."""
    from IPython.display import display
    import matplotlib.pyplot as plt
    
    rows = (len(images) + cols - 1) // cols
    fig, axes = plt.subplots(rows, cols, figsize=(5*cols, 5*rows))
    axes = axes.flatten() if hasattr(axes, 'flatten') else [axes]
    
    for idx, (ax, img) in enumerate(zip(axes, images)):
        ax.imshow(img)
        ax.axis('off')
        if titles and idx < len(titles):
            ax.set_title(titles[idx], fontsize=10)
    
    # Hide empty subplots
    for idx in range(len(images), len(axes)):
        axes[idx].axis('off')
    
    plt.tight_layout()
    plt.show()

print("Utility functions loaded!")

---

## Part 2: Understanding the Diffusion Pipeline

### The Components

```
┌──────────────┐    ┌───────────────┐    ┌─────────────┐    ┌──────────────┐
│ Text Prompt  │───▶│ Text Encoder  │───▶│   U-Net     │───▶│    VAE       │
│              │    │  (CLIP/T5)    │    │  Denoiser   │    │   Decoder    │
└──────────────┘    └───────────────┘    └─────────────┘    └──────────────┘
                                              ▲
                                              │
                                    ┌─────────────────┐
                                    │  Noise/Latents  │
                                    │   (Random)      │
                                    └─────────────────┘
```

### Key Parameters

| Parameter | What It Does | Typical Range |
|-----------|--------------|---------------|
| `num_inference_steps` | How many denoising steps | 20-50 |
| `guidance_scale` | How closely to follow the prompt | 5.0-15.0 |
| `negative_prompt` | What to avoid in the image | Text string |
| `seed` | Controls randomness for reproducibility | Any integer |
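
To make `num_inference_steps` concrete: the model is trained over many noise levels, and at inference the scheduler visits only a subset of them. The sketch below uses one simple, evenly spaced selection and assumes 1000 training timesteps. Real schedulers offer several spacing variants, and `inference_timesteps` is a hypothetical helper, not a diffusers function.

```python
def inference_timesteps(num_inference_steps: int,
                        num_train_timesteps: int = 1000) -> list:
    """Evenly spaced training timesteps, ordered from noisiest to cleanest."""
    stride = num_train_timesteps // num_inference_steps
    return [t * stride for t in range(num_inference_steps)][::-1]

print(inference_timesteps(5))        # [800, 600, 400, 200, 0]
print(len(inference_timesteps(25)))  # 25
```

Fewer steps mean larger jumps between noise levels, which is faster but gives the denoiser less opportunity to refine details.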

---

## Part 3: Stable Diffusion XL (SDXL)

SDXL is a powerful open-source model that generates high-quality 1024x1024 images.

### Why SDXL?
- **High resolution**: Native 1024x1024 output
- **Two-stage**: Base + Refiner for extra detail
- **Fast**: 5-8 seconds on DGX Spark
- **Flexible**: Works with many LoRAs and ControlNets

In [None]:
from diffusers import StableDiffusionXLPipeline, DPMSolverMultistepScheduler

clear_gpu_memory()

print("Loading SDXL Base model...")
print(f"Memory before: {get_memory_usage()}")
start_time = time.time()

# Load SDXL pipeline
sdxl_pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.bfloat16,  # bfloat16 for Blackwell
    use_safetensors=True,
    variant="fp16"  # Use fp16 weights (converted to bf16)
)

# Use a faster scheduler
sdxl_pipe.scheduler = DPMSolverMultistepScheduler.from_config(sdxl_pipe.scheduler.config)

# Move to GPU
sdxl_pipe = sdxl_pipe.to("cuda")

load_time = time.time() - start_time
print(f"\nSDXL loaded in {load_time:.1f} seconds!")
print(f"Memory after: {get_memory_usage()}")

In [None]:
def generate_image(
    prompt: str,
    negative_prompt: str = "ugly, blurry, low quality, distorted",
    num_steps: int = 30,
    guidance_scale: float = 7.5,
    seed: Optional[int] = None,
    width: int = 1024,
    height: int = 1024
) -> Image.Image:
    """
    Generate an image using SDXL.
    
    Args:
        prompt: Text description of desired image
        negative_prompt: What to avoid in the image
        num_steps: Number of denoising steps (more = higher quality but slower)
        guidance_scale: How closely to follow the prompt (higher = more literal)
        seed: Random seed for reproducibility
        width: Output width in pixels
        height: Output height in pixels
        
    Returns:
        Generated PIL Image
    """
    # Pick a random seed when none is given, so the printed seed
    # always reproduces the image
    if seed is None:
        seed = torch.randint(0, 2**32, (1,)).item()
    generator = torch.Generator(device="cuda").manual_seed(seed)
    
    print(f"Generating with seed: {seed}")
    start_time = time.time()
    
    # Generate image
    with torch.inference_mode():
        result = sdxl_pipe(
            prompt=prompt,
            negative_prompt=negative_prompt,
            num_inference_steps=num_steps,
            guidance_scale=guidance_scale,
            generator=generator,
            width=width,
            height=height
        )
    
    generation_time = time.time() - start_time
    print(f"Generated in {generation_time:.1f} seconds")
    
    return result.images[0]

print("generate_image() function ready!")

In [None]:
# Let's generate our first image!
prompt = "A majestic mountain landscape at sunset, golden hour lighting, \
photorealistic, 8k resolution, dramatic clouds, peaceful lake reflection"

image1 = generate_image(prompt, seed=42)
display(image1)

In [None]:
# Generate with different styles
prompts = [
    "A cyberpunk city at night, neon lights, rain, blade runner style, cinematic",
    "A serene Japanese garden with cherry blossoms, koi pond, peaceful, watercolor style",
    "Portrait of a wise wizard, long white beard, magical staff, fantasy art, detailed",
    "Futuristic spaceship interior, sci-fi, clean design, blue holographic displays"
]

print("Generating 4 images with different styles...")
print("This will take about 30 seconds...\n")

images = []
for i, prompt in enumerate(prompts):
    print(f"\n[{i+1}/4] {prompt[:50]}...")
    img = generate_image(prompt, seed=42+i, num_steps=25)
    images.append(img)

# Display in grid
display_images_grid(images, [p[:30] + "..." for p in prompts])

### What Just Happened?

For each image:
1. **Text Encoding**: SDXL's two CLIP text encoders turned your prompt into embeddings (what the image should look like)
2. **Noise Generation**: Random latent noise was created (starting point)
3. **Iterative Denoising**: The U-Net predicted and removed noise 25 times, guided by text embeddings
4. **VAE Decoding**: The final latents were decoded to a full-resolution image

The `seed` parameter controls the initial noise, so the same seed, prompt, and settings (steps, guidance scale, resolution, scheduler) produce the same image!
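
As a small illustration of steps 2 and 4, the sketch below computes the latent shape for a given output size and shows that seeded noise is repeatable. It assumes SDXL's VAE downsamples by a factor of 8 into 4 latent channels. Python's `random.Random` stands in for `torch.Generator`, and both helper names are hypothetical.

```python
import random

def latent_shape(width: int, height: int, channels: int = 4,
                 vae_scale: int = 8) -> tuple:
    """Latent shape for one image: (channels, height / 8, width / 8)."""
    if width % vae_scale or height % vae_scale:
        raise ValueError("width and height must be divisible by the VAE scale factor")
    return (channels, height // vae_scale, width // vae_scale)

def initial_noise(seed: int, count: int = 5) -> list:
    """First few values of seeded Gaussian noise (stand-in for torch.Generator)."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(count)]

print(latent_shape(1024, 1024))                # (4, 128, 128)
print(initial_noise(42) == initial_noise(42))  # True: same seed, same noise
print(initial_noise(42) == initial_noise(43))  # False: different starting point
```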

---

## Part 4: Prompt Engineering for Better Results

### The Art of Prompting

Great prompts follow a structure:

```
[Subject] + [Style] + [Details] + [Quality Boosters]
```

**Examples:**
- Subject: "A red fox in a snowy forest"
- Style: "oil painting, impressionist"
- Details: "morning light, misty atmosphere"
- Quality: "highly detailed, 4k, artstation"
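
The structure above can be captured in a small helper. This is a sketch: `build_prompt` is a hypothetical convenience function, not part of diffusers, and comma-joining is just one common convention.

```python
def build_prompt(subject: str, style: str = "", details: str = "",
                 quality: str = "") -> str:
    """Join the non-empty prompt parts with commas, in the order shown above."""
    parts = [subject, style, details, quality]
    return ", ".join(p.strip() for p in parts if p and p.strip())

prompt = build_prompt(
    subject="A red fox in a snowy forest",
    style="oil painting, impressionist",
    details="morning light, misty atmosphere",
    quality="highly detailed, 4k, artstation",
)
print(prompt)
```

Keeping the parts separate makes it easy to hold the subject fixed while swapping styles or quality terms.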

### Quality Boosters That Work
- `highly detailed`, `intricate details`
- `8k`, `4k resolution`
- `professional photography`, `cinematic`
- `trending on artstation`, `award winning`
- `golden hour`, `dramatic lighting`

In [None]:
# Let's compare basic vs detailed prompts

basic_prompt = "A cat sitting on a couch"

detailed_prompt = """A fluffy orange tabby cat sitting regally on a vintage velvet couch, 
soft afternoon sunlight streaming through lace curtains, cozy living room, 
photorealistic, shallow depth of field, professional pet photography, 
warm color palette, 8k resolution"""

print("Comparing basic vs detailed prompts...\n")

print("Basic prompt:")
img_basic = generate_image(basic_prompt, seed=123, num_steps=25)

print("\nDetailed prompt:")
img_detailed = generate_image(detailed_prompt, seed=123, num_steps=25)

display_images_grid([img_basic, img_detailed], ["Basic Prompt", "Detailed Prompt"])

In [None]:
# The power of negative prompts

prompt = "Portrait of a young woman, natural beauty, soft lighting"

# Standard negative prompt
negative_basic = "ugly, blurry, low quality"

# Detailed negative prompt for portraits
negative_detailed = """ugly, deformed, noisy, blurry, distorted, grainy, 
bad anatomy, bad proportions, extra limbs, cloned face, disfigured, 
gross proportions, malformed limbs, missing arms, missing legs, 
extra arms, extra legs, mutated hands, fused fingers, too many fingers, 
long neck, mutation, mutilated, morbid, ugly, poorly drawn hands, 
poorly drawn face, extra fingers, mutated, dehydrated, bad quality, 
worst quality, watermark, text, signature"""

print("Basic negative prompt:")
img_neg_basic = generate_image(prompt, negative_prompt=negative_basic, seed=456, num_steps=25)

print("\nDetailed negative prompt:")
img_neg_detailed = generate_image(prompt, negative_prompt=negative_detailed, seed=456, num_steps=25)

display_images_grid([img_neg_basic, img_neg_detailed], ["Basic Negative", "Detailed Negative"])

---

## Part 5: Guidance Scale - Finding the Sweet Spot

Guidance scale controls how closely the image follows your prompt:
- **Low (1-5)**: More creative, may ignore parts of prompt
- **Medium (7-10)**: Good balance
- **High (12-20)**: Very literal, but may look "overcooked"
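
Under the hood this is classifier-free guidance: at each step the denoiser makes one prediction with the prompt and one without it (or with the negative prompt), and the guidance scale extrapolates from the second toward the first. Here is a toy sketch on plain lists, with made-up numbers and a hypothetical `apply_guidance` helper:

```python
def apply_guidance(noise_uncond: list, noise_text: list,
                   guidance_scale: float) -> list:
    """Classifier-free guidance: uncond + scale * (text - uncond), element-wise."""
    return [u + guidance_scale * (c - u) for u, c in zip(noise_uncond, noise_text)]

uncond = [0.10, -0.20, 0.05]   # toy prediction without the prompt
text = [0.30, -0.10, 0.00]     # toy prediction with the prompt

for scale in (1.0, 7.5, 20.0):
    guided = apply_guidance(uncond, text, scale)
    print(scale, [round(v, 3) for v in guided])
```

At a scale of 1.0 the result is just the prompt-conditioned prediction. Larger scales push further along the same direction, which is why very high values can look oversaturated.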

Let's experiment!

In [None]:
# Compare different guidance scales
prompt = "A castle floating in the clouds, fantasy art, magical atmosphere"

guidance_values = [3.0, 7.5, 12.0, 20.0]
images = []

print("Generating with different guidance scales...\n")

for guidance in guidance_values:
    print(f"Guidance scale: {guidance}")
    img = generate_image(prompt, guidance_scale=guidance, seed=789, num_steps=25)
    images.append(img)

display_images_grid(images, [f"Guidance: {g}" for g in guidance_values])

---

## Part 6: ControlNet - Guided Generation

ControlNet allows you to guide image generation using:
- **Canny edges**: Outline of shapes
- **Depth maps**: 3D structure
- **Pose**: Human body positions
- **Segmentation**: Region definitions

### ELI5: What is ControlNet?

> **Imagine you're giving directions to an artist:**
>
> Without ControlNet: "Draw me a house" - the artist uses imagination
>
> With ControlNet: "Draw me a house, but follow these lines" - you give them a rough sketch
>
> The artist still adds their own style and details, but the basic structure follows your guide!

In [None]:
# First, let's clean up SDXL to make room for ControlNet
del sdxl_pipe
clear_gpu_memory()
print(f"Memory after cleanup: {get_memory_usage()}")

In [None]:
from diffusers import StableDiffusionXLControlNetPipeline, ControlNetModel
from diffusers.utils import load_image
import cv2

print("Loading SDXL with Canny ControlNet...")
start_time = time.time()

# Load ControlNet
controlnet = ControlNetModel.from_pretrained(
    "diffusers/controlnet-canny-sdxl-1.0",
    torch_dtype=torch.bfloat16,
    use_safetensors=True
)

# Load pipeline with ControlNet
controlnet_pipe = StableDiffusionXLControlNetPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    controlnet=controlnet,
    torch_dtype=torch.bfloat16,
    use_safetensors=True,
    variant="fp16"
)

controlnet_pipe = controlnet_pipe.to("cuda")

load_time = time.time() - start_time
print(f"\nControlNet pipeline loaded in {load_time:.1f} seconds!")
print(f"Memory: {get_memory_usage()}")

In [None]:
def get_canny_edges(image: Image.Image, low_threshold: int = 100, high_threshold: int = 200) -> Image.Image:
    """Extract Canny edges from an image."""
    # Convert to numpy and grayscale
    img_array = np.array(image)
    gray = cv2.cvtColor(img_array, cv2.COLOR_RGB2GRAY)
    
    # Apply Canny edge detection
    edges = cv2.Canny(gray, low_threshold, high_threshold)
    
    # Convert back to RGB PIL Image
    edges_rgb = np.stack([edges, edges, edges], axis=-1)
    return Image.fromarray(edges_rgb)

print("get_canny_edges() function ready!")

In [None]:
# Create a simple test image with clear edges
# We'll draw some geometric shapes

from PIL import ImageDraw

# Create a simple composition
control_source = Image.new('RGB', (1024, 1024), color='white')
draw = ImageDraw.Draw(control_source)

# Draw a simple house shape
draw.rectangle([300, 400, 700, 800], outline='black', width=5)  # House body
draw.polygon([(250, 400), (500, 150), (750, 400)], outline='black', width=5)  # Roof
draw.rectangle([420, 550, 580, 800], outline='black', width=5)  # Door
draw.rectangle([350, 500, 400, 580], outline='black', width=5)  # Window 1
draw.rectangle([600, 500, 650, 580], outline='black', width=5)  # Window 2
draw.ellipse([800, 100, 950, 250], outline='black', width=5)  # Sun

# Get edges
canny_image = get_canny_edges(control_source)

display_images_grid([control_source, canny_image], ["Source Drawing", "Canny Edges"])

In [None]:
# Generate images guided by the edges
prompts = [
    "A cozy cottage in a meadow, wildflowers, sunny day, photorealistic",
    "A haunted house at night, spooky atmosphere, full moon, gothic style",
    "A gingerbread house, candy decorations, whimsical, children's book illustration"
]

print("Generating images with ControlNet guidance...\n")

controlled_images = []
for i, prompt in enumerate(prompts):
    print(f"\n[{i+1}/3] {prompt[:40]}...")
    
    generator = torch.Generator(device="cuda").manual_seed(42 + i)
    
    with torch.inference_mode():
        result = controlnet_pipe(
            prompt=prompt,
            negative_prompt="ugly, blurry, low quality",
            image=canny_image,
            num_inference_steps=25,
            guidance_scale=7.5,
            controlnet_conditioning_scale=0.7,  # How much to follow the control image
            generator=generator
        )
    
    controlled_images.append(result.images[0])
    print(f"Generated!")

# Display results
all_images = [canny_image] + controlled_images
titles = ["Control (Edges)"] + [p[:25] + "..." for p in prompts]
display_images_grid(all_images, titles)

### What Just Happened?

ControlNet added an extra condition to the generation process:

1. **Normal SDXL**: Text embedding → Guide denoising
2. **With ControlNet**: Text embedding + Edge features → Guide denoising

The `controlnet_conditioning_scale` (typically 0.0-1.0) controls how strictly to follow the edges:
- 0.0 = Ignore edges completely
- 0.5 = Soft guidance
- 1.0 = Strict adherence to edges

Notice how all three images maintain the house structure but with completely different styles!
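
Conceptually, ControlNet produces extra feature maps from the control image, and these are added to the denoiser's own features after being multiplied by the conditioning scale. Here is a toy sketch of that weighting on plain lists. The numbers are made up and `add_control` is a hypothetical helper, not the diffusers implementation.

```python
def add_control(features: list, control_residual: list,
                conditioning_scale: float) -> list:
    """Add the scaled ControlNet residual to the denoiser's features."""
    return [f + conditioning_scale * r for f, r in zip(features, control_residual)]

features = [1.0, 2.0, 3.0]     # toy U-Net activations
residual = [0.5, -1.0, 0.25]   # toy ControlNet output for the edge map

for scale in (0.0, 0.5, 1.0):
    print(scale, add_control(features, residual, scale))
```

With a scale of 0.0 the features are unchanged, so the edges have no effect. Raising the scale gives the control image more influence.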

---

## Try It Yourself: Custom ControlNet

Create your own control image and generate variations!

1. Draw a simple scene (a robot, a car, or anything with clear edges)
2. Extract edges with `get_canny_edges()`
3. Generate different styles with the same structure

<details>
<summary>Hint: Creating Control Images</summary>

```python
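# You can also load an existing image (convert to RGB so get_canny_edges works)
my_image = Image.open("/path/to/image.jpg").convert("RGB").resize((1024, 1024))
my_edges = get_canny_edges(my_image)

# Or download from URL
import requests
from io import BytesIO
response = requests.get("https://example.com/image.jpg", timeout=30)
response.raise_for_status()  # Stop early if the download failed
my_image = Image.open(BytesIO(response.content)).convert("RGB").resize((1024, 1024))
my_edges = get_canny_edges(my_image)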
```
</details>

In [None]:
# YOUR CODE HERE
# Try creating your own control image and generating variations!



---

## Part 7: Flux - State of the Art (Optional)

Flux is a newer model that produces incredibly detailed and coherent images. It requires more memory (~24GB) but DGX Spark handles it easily!

**Note**: Flux.1-dev requires accepting the license on Hugging Face.

For this demo, we'll use the schnell (fast) version which is permissively licensed.

In [None]:
# Clean up ControlNet first
del controlnet_pipe
del controlnet
clear_gpu_memory()
print(f"Memory after cleanup: {get_memory_usage()}")

In [None]:
from diffusers import FluxPipeline

print("Loading Flux.1-schnell...")
print("This model is larger and may take a minute to load...")
print(f"Memory before: {get_memory_usage()}")

start_time = time.time()

flux_pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-schnell",
    torch_dtype=torch.bfloat16
)

flux_pipe = flux_pipe.to("cuda")

load_time = time.time() - start_time
print(f"\nFlux loaded in {load_time:.1f} seconds!")
print(f"Memory after: {get_memory_usage()}")

In [None]:
def generate_flux(
    prompt: str,
    num_steps: int = 4,  # Flux.schnell is optimized for 4 steps
    seed: Optional[int] = None,
    width: int = 1024,
    height: int = 1024
) -> Image.Image:
    """
    Generate an image using Flux.
    
    Args:
        prompt: Text description
        num_steps: Inference steps (4 for schnell)
        seed: Random seed
        width: Image width
        height: Image height
        
    Returns:
        Generated image
    """
    if seed is not None:
        generator = torch.Generator(device="cuda").manual_seed(seed)
    else:
        generator = None
    
    print(f"Generating with Flux...")
    start_time = time.time()
    
    with torch.inference_mode():
        result = flux_pipe(
            prompt=prompt,
            num_inference_steps=num_steps,
            generator=generator,
            width=width,
            height=height,
            guidance_scale=0.0  # Flux.schnell doesn't use guidance
        )
    
    generation_time = time.time() - start_time
    print(f"Generated in {generation_time:.1f} seconds")
    
    return result.images[0]

print("generate_flux() function ready!")

In [None]:
# Test Flux with the same prompt we used earlier
flux_prompt = "A majestic mountain landscape at sunset, golden hour lighting, \
photorealistic, 8k resolution, dramatic clouds, peaceful lake reflection"

flux_image = generate_flux(flux_prompt, seed=42)
display(flux_image)

In [None]:
# Flux excels at complex scenes and text rendering
complex_prompts = [
    "A bookshelf filled with colorful books, each spine has readable titles like 'Adventure' and 'Mystery', cozy reading nook",
    "A futuristic city with flying cars, holographic advertisements showing 'WELCOME 2050', neon lights"
]

print("Testing Flux with complex prompts...\n")

flux_images = []
for i, prompt in enumerate(complex_prompts):
    print(f"\n[{i+1}/2] {prompt[:50]}...")
    img = generate_flux(prompt, seed=42+i)
    flux_images.append(img)

display_images_grid(flux_images, ["Bookshelf", "Future City"])

---

## Common Mistakes

### Mistake 1: Using Wrong Image Dimensions

```python
# Wrong - SDXL trained on 1024x1024
image = generate_image(prompt, width=512, height=512)  # Lower quality

# Right - use native resolution
image = generate_image(prompt, width=1024, height=1024)  # Best quality

# Also good - SDXL supports these aspect ratios:
# 1024x1024, 1152x896, 896x1152, 1216x832, 832x1216, 1344x768, 768x1344
```
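
If your target size is not on that list, one option is to snap to the listed resolution with the closest aspect ratio and resize afterwards. This is a minimal sketch. `closest_sdxl_resolution` is a hypothetical helper, and comparing plain width/height ratios is a simplifying assumption.

```python
SDXL_RESOLUTIONS = [
    (1024, 1024), (1152, 896), (896, 1152), (1216, 832),
    (832, 1216), (1344, 768), (768, 1344),
]

def closest_sdxl_resolution(width: int, height: int) -> tuple:
    """Pick the listed resolution whose aspect ratio is closest to width/height."""
    target = width / height
    return min(SDXL_RESOLUTIONS, key=lambda wh: abs(wh[0] / wh[1] - target))

print(closest_sdxl_resolution(1920, 1080))  # (1344, 768)
print(closest_sdxl_resolution(512, 512))    # (1024, 1024)
print(closest_sdxl_resolution(1080, 1920))  # (768, 1344)
```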

### Mistake 2: Guidance Scale Too High

```python
# Wrong - oversaturated, artifacts
image = generate_image(prompt, guidance_scale=30)  # Too high!

# Right - balanced generation
image = generate_image(prompt, guidance_scale=7.5)  # Good default
```

### Mistake 3: Not Using Negative Prompts

```python
# Wrong - relies only on generate_image()'s generic default negative prompt
image = generate_image("portrait of a person")

# Right - explicitly avoid problems
image = generate_image(
    "portrait of a person",
    negative_prompt="ugly, deformed, bad anatomy, extra limbs"
)
```

### Mistake 4: Forgetting to Set Seed for Reproducibility

```python
# Wrong - can't reproduce the result
image = generate_image(prompt)  # Random seed each time

# Right - reproducible results
image = generate_image(prompt, seed=42)  # Same image every time
```

---

## Checkpoint

You've learned:
- How diffusion models generate images from noise
- How to use SDXL for high-quality text-to-image generation
- How to craft effective prompts and negative prompts
- How to use ControlNet for guided generation
- How to use Flux for state-of-the-art results

### Key Takeaways

1. **Diffusion = Denoising**: Models learn to remove noise step by step
2. **Prompts matter**: More descriptive = better results
3. **Guidance scale**: 7-10 is usually the sweet spot
4. **ControlNet**: Add structure while keeping creative freedom
5. **Seeds**: Use them for reproducibility!

---

## Challenge (Optional)

### Build an Image Variation Generator

Create a function that:
1. Takes a base prompt and a list of style modifiers
2. Generates the same scene in multiple styles
3. Uses the same seed for each to maintain consistency
4. Returns a grid of all variations

Example:
```python
base = "A dragon flying over a castle"
styles = ["oil painting", "anime style", "photorealistic", "pixel art"]
variations = generate_style_variations(base, styles)
```

In [None]:
# YOUR CHALLENGE CODE HERE

def generate_style_variations(base_prompt: str, styles: List[str], seed: int = 42) -> List[Image.Image]:
    """
    Generate the same scene in multiple artistic styles.
    
    Args:
        base_prompt: The core scene description
        styles: List of style modifiers
        seed: Random seed for consistency
        
    Returns:
        List of generated images
    """
    # TODO: Implement this function
    pass

---

## Further Reading

- [SDXL Paper: High-Resolution Image Synthesis](https://arxiv.org/abs/2307.01952)
- [Diffusers Library Documentation](https://huggingface.co/docs/diffusers)
- [ControlNet Paper](https://arxiv.org/abs/2302.05543)
- [Black Forest Labs (Flux developers)](https://blackforestlabs.ai/)
- [Prompt Engineering Guide for SD](https://stable-diffusion-art.com/prompt-guide/)

---

## Cleanup

In [None]:
# Clean up GPU memory
if 'flux_pipe' in dir():
    del flux_pipe
if 'sdxl_pipe' in dir():
    del sdxl_pipe
if 'controlnet_pipe' in dir():
    del controlnet_pipe

clear_gpu_memory()
print(f"Final memory state: {get_memory_usage()}")
print("\nNotebook complete! Ready for the next task.")