# Lab 2.6.3: ControlNet Workshop - Guided Image Generation

**Module:** 2.6 - Diffusion Models  
**Time:** 2 hours  
**Difficulty:** ⭐⭐⭐ (Intermediate)

---

## 🎯 Learning Objectives

By the end of this notebook, you will:
- [ ] Understand how ControlNet enables structural control
- [ ] Use Canny edge detection for outline-guided generation
- [ ] Apply depth maps for spatial composition
- [ ] Control character poses using OpenPose (optional challenge)
- [ ] Combine multiple control methods (optional challenge)
- [ ] Create consistent characters across images

---

## 📚 Prerequisites

- Completed: Lab 2.6.2 (Stable Diffusion Generation)
- Knowledge of: Basic SDXL usage, prompt engineering
- **Required packages:**
  - `diffusers>=0.27.0`
  - `opencv-python>=4.8.0` (for edge detection)
  - `controlnet_aux` (optional, for advanced preprocessors)

**Install if needed:**
```bash
pip install "opencv-python>=4.8.0"  # Quotes stop the shell from treating >= as a redirect
pip install controlnet_aux  # Optional: for pose, depth, etc.
```

---

## 🌍 Real-World Context

**ControlNet solves the "I know what I want but can't describe it" problem:**

- **Architects** sketch a building outline → get photorealistic renders
- **Game developers** use character poses → generate concept art
- **Interior designers** provide room layouts → visualize different styles
- **Fashion designers** sketch silhouettes → see clothing designs

It transforms AI from "creative assistant" to "precise tool"!

---

## 🧒 ELI5: What is ControlNet?

> **Imagine you're teaching someone to color in a coloring book:**
>
> Without ControlNet:
> - "Draw a cat" → They draw whatever cat they imagine
>
> With ControlNet:
> - You give them an outline of a specific cat
> - "Color this cat" → They follow YOUR lines!
>
> **ControlNet adds "training wheels" to diffusion models:**
> - **Canny edges**: "Follow these outlines"
> - **Depth map**: "Near things here, far things there"
> - **Pose skeleton**: "The person should stand like this"
> - **Segmentation**: "Sky here, ground there, tree here"

### How It Works

```
Control Image (edges/depth/pose)    Text Prompt
         │                             │
         ▼                             ▼
  ┌─────────────┐               ┌─────────────┐
  │  ControlNet │               │    CLIP     │
  │  (Encoder)  │               │  (Encoder)  │
  └──────┬──────┘               └──────┬──────┘
         │                             │
         └──────────┬──────────────────┘
                    ▼
              ┌──────────┐
              │   U-Net  │  ◄── Noise
              │(Denoiser)│
              └─────┬────┘
                    │
                    ▼
            Generated Image
       (follows structure + prompt)
```

---

## Part 1: Setting Up

In [None]:
# Core imports
import torch
import gc
import time
import numpy as np
from pathlib import Path

# Diffusers
from diffusers import (
    StableDiffusionXLControlNetPipeline,
    ControlNetModel,
    AutoencoderKL,
)
from diffusers.utils import load_image

# Image processing
from PIL import Image
import cv2

# Visualization
import matplotlib.pyplot as plt

# Device setup
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Device: {device}")

if torch.cuda.is_available():
    print(f"GPU: {torch.cuda.get_device_name()}")
    total_mem = torch.cuda.get_device_properties(0).total_memory / 1e9
    print(f"Total Memory: {total_mem:.1f} GB")
    print("\nDGX Spark's 128GB unified memory leaves ample headroom for SDXL + ControlNet! 🚀")

In [None]:
# Helper functions
def show_images_side_by_side(images, titles, figsize=(15, 5)):
    """Display images side by side for comparison."""
    n = len(images)
    fig, axes = plt.subplots(1, n, figsize=figsize)
    if n == 1:
        axes = [axes]
    
    for ax, img, title in zip(axes, images, titles):
        if isinstance(img, np.ndarray):
            if len(img.shape) == 2:  # Grayscale
                ax.imshow(img, cmap='gray')
            else:
                ax.imshow(img)
        else:
            ax.imshow(img)
        ax.set_title(title, fontsize=11)
        ax.axis('off')
    
    plt.tight_layout()
    plt.show()

def get_memory_usage():
    """Get current GPU memory usage."""
    if torch.cuda.is_available():
        allocated = torch.cuda.memory_allocated() / 1e9
        reserved = torch.cuda.memory_reserved() / 1e9
        return f"Allocated: {allocated:.2f}GB, Reserved: {reserved:.2f}GB"
    return "No GPU"

def create_sample_image(size=(1024, 1024)):
    """Create a simple sample image for testing. `size` is (width, height)."""
    width, height = size

    # Light gray background (NumPy arrays are indexed rows first: height, width)
    img = np.full((height, width, 3), 200, dtype=np.uint8)

    # Add some shapes (OpenCV points are (x, y))
    center = (width // 2, height // 2)
    cv2.circle(img, center, min(width, height) // 4, (100, 100, 100), -1)
    cv2.rectangle(img, (width // 4, height // 4), (3 * width // 4, 3 * height // 4), (50, 50, 50), 3)

    return Image.fromarray(img)

print("Helper functions ready!")

---

## Part 2: Canny Edge ControlNet

### 🧒 ELI5: Canny Edges

> **Canny edge detection is like tracing the outlines in a picture:**
>
> Take a photo of your room → Canny finds all the edges of furniture, walls, windows
> → Now you can "color in" those outlines in any style!
>
> Perfect for:
> - Keeping the structure of a scene while changing the style
> - Using sketches as input
> - Maintaining architectural details
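
Which outlines survive depends on the two thresholds passed to the edge detector. As a rough starting point, a common heuristic places both thresholds in a band around the median pixel intensity. This is an assumption added for illustration, not part of this lab's code: the function name `auto_canny_thresholds` and the `sigma` parameter are hypothetical.

```python
import numpy as np

def auto_canny_thresholds(gray, sigma=0.33):
    """Suggest (low, high) Canny thresholds from the median intensity.

    Heuristic only: `gray` is assumed to be a 2-D uint8 array, and `sigma`
    sets how wide the band around the median is.
    """
    median = float(np.median(gray))
    low = int(round(max(0.0, (1.0 - sigma) * median)))
    high = int(round(min(255.0, (1.0 + sigma) * median)))
    return low, high

# A flat mid-gray image gives thresholds on either side of its median.
gray = np.full((64, 64), 100, dtype=np.uint8)
low, high = auto_canny_thresholds(gray)
print(low, high)
```

The result can be passed to `get_canny_edges(image, low_threshold=low, high_threshold=high)`, defined in the next cell. Lower values keep more fine detail and higher values keep only strong outlines, so always inspect the edge map before generating.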

In [None]:
# Load ControlNet for Canny edges
print("Loading Canny ControlNet...")
print(f"Memory before: {get_memory_usage()}")

controlnet_canny = ControlNetModel.from_pretrained(
    "diffusers/controlnet-canny-sdxl-1.0",
    torch_dtype=torch.bfloat16,
    use_safetensors=True,
)

# Load the pipeline with ControlNet
pipe_canny = StableDiffusionXLControlNetPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    controlnet=controlnet_canny,
    torch_dtype=torch.bfloat16,
    use_safetensors=True,
    variant="fp16",
)
pipe_canny = pipe_canny.to(device)

print("\n✅ Canny ControlNet loaded!")
print(f"Memory after: {get_memory_usage()}")

In [None]:
def get_canny_edges(image, low_threshold=100, high_threshold=200):
    """
    Extract Canny edges from an image.
    
    Args:
        image: PIL Image or numpy array
        low_threshold: Lower threshold for edge detection
        high_threshold: Upper threshold for edge detection
    
    Returns:
        PIL Image with edges (white on black)
    """
    if isinstance(image, Image.Image):
        image = np.array(image)
    
    # Convert to grayscale if needed
    if len(image.shape) == 3:
        gray = cv2.cvtColor(image, cv2.COLOR_RGB2GRAY)
    else:
        gray = image
    
    # Apply Canny edge detection
    edges = cv2.Canny(gray, low_threshold, high_threshold)
    
    # Convert to 3-channel for ControlNet
    edges_3ch = np.stack([edges, edges, edges], axis=-1)
    
    return Image.fromarray(edges_3ch)

# Test with a sample image
# You can replace this with any image URL
sample_url = "https://upload.wikimedia.org/wikipedia/commons/thumb/a/a7/Camponotus_flavomarginatus_ant.jpg/640px-Camponotus_flavomarginatus_ant.jpg"

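try:
    original_image = load_image(sample_url)
    original_image = original_image.resize((1024, 1024))  # Square resize; changes the aspect ratio
except Exception as exc:  # Network errors, bad URLs, unreadable files
    print(f"Could not load sample image ({exc}); creating a synthetic one...")
    original_image = create_sample_image()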

# Get edges
canny_edges = get_canny_edges(original_image, low_threshold=100, high_threshold=200)

# Display
show_images_side_by_side(
    [original_image, canny_edges],
    ["Original Image", "Canny Edges"]
)

In [None]:
# Generate images with different styles using the same edges
prompts = [
    "A detailed pencil sketch, artistic drawing, fine lines, on paper",
    "A neon cyberpunk scene, glowing edges, futuristic, dark background",
    "An oil painting, impressionist style, vibrant colors, artistic",
    "A watercolor painting, soft colors, artistic, delicate brushstrokes",
]

negative_prompt = "blurry, low quality, distorted, ugly"

images = []
for i, prompt in enumerate(prompts):
    print(f"Generating {i+1}/{len(prompts)}: {prompt[:40]}...")
    
    generator = torch.Generator(device=device).manual_seed(42)
    
    image = pipe_canny(
        prompt=prompt,
        negative_prompt=negative_prompt,
        image=canny_edges,
        num_inference_steps=25,
        guidance_scale=7.5,
        controlnet_conditioning_scale=0.5,  # How strongly to follow edges
        generator=generator,
    ).images[0]
    
    images.append(image)

# Display all results
fig, axes = plt.subplots(2, 3, figsize=(15, 10))
axes = axes.flatten()

# Show original and edges first
axes[0].imshow(original_image)
axes[0].set_title("Original")
axes[0].axis('off')

axes[1].imshow(canny_edges)
axes[1].set_title("Canny Edges (Control)")
axes[1].axis('off')

# Show generated images
for i, (img, prompt) in enumerate(zip(images, prompts)):
    axes[i+2].imshow(img)
    axes[i+2].set_title(prompt[:30] + "...", fontsize=9)
    axes[i+2].axis('off')

plt.tight_layout()
plt.show()

print("\n💡 Notice how all images follow the same edge structure!")

### Controlling the Strength

The `controlnet_conditioning_scale` parameter controls how strictly the model follows the control image:

- **0.0**: Ignore control completely (just text prompt)
- **0.3-0.5**: Loose guidance (inspired by edges, not strict)
- **0.7-0.8**: Moderate guidance (balanced)
- **1.0**: Strong guidance (strict edge following)
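
Conceptually, the ControlNet branch produces residual feature maps that are multiplied by `controlnet_conditioning_scale` and added to the U-Net's own features, which is why 0.0 reduces to plain text-to-image. The NumPy sketch below illustrates only that arithmetic with made-up arrays; `apply_control`, `unet_features`, and `control_residual` are hypothetical names, not diffusers APIs.

```python
import numpy as np

def apply_control(unet_features, control_residual, scale):
    """Add a scaled ControlNet residual to the U-Net features (toy model)."""
    return unet_features + scale * control_residual

rng = np.random.default_rng(0)
unet_features = rng.normal(size=(4, 8, 8))     # Stand-in for one U-Net feature map
control_residual = rng.normal(size=(4, 8, 8))  # Stand-in for the ControlNet output

for scale in [0.0, 0.5, 1.0]:
    mixed = apply_control(unet_features, control_residual, scale)
    # Measure how far the control signal moved the features
    shift = np.abs(mixed - unet_features).mean()
    print(f"scale={scale}: mean feature shift = {shift:.3f}")
```

The shift grows linearly with the scale in this toy model; in the real pipeline the effect on the final image is not linear, which is why comparing several scales visually is still worthwhile.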

In [None]:
# Compare different conditioning scales
prompt = "A fantasy castle with magical towers, digital art, highly detailed"
scales = [0.0, 0.3, 0.5, 0.7, 1.0]

images = []
for scale in scales:
    print(f"Generating with scale={scale}...")
    generator = torch.Generator(device=device).manual_seed(42)
    
    image = pipe_canny(
        prompt=prompt,
        image=canny_edges,
        num_inference_steps=20,
        guidance_scale=7.5,
        controlnet_conditioning_scale=scale,
        generator=generator,
    ).images[0]
    
    images.append(image)

# Display
fig, axes = plt.subplots(1, len(scales), figsize=(20, 4))
for ax, img, scale in zip(axes, images, scales):
    ax.imshow(img)
    ax.set_title(f"Scale: {scale}")
    ax.axis('off')

plt.suptitle("ControlNet Conditioning Scale Comparison", fontsize=14, y=1.02)
plt.tight_layout()
plt.show()

print("\n📊 Observations:")
print("  - 0.0: Pure text-to-image (no edge guidance)")
print("  - 0.3: Loose inspiration from edges")
print("  - 0.5: Noticeable guidance (a good starting point)")
print("  - 0.7: Moderate, balanced guidance")
print("  - 1.0: Strict edge following")

---

## Part 3: Depth Map ControlNet

### 🧒 ELI5: Depth Maps

> **A depth map shows how far away things are:**
>
> - White = Close to camera
> - Black = Far from camera
> - Shades of gray = In between
>
> This lets you control the 3D composition:
> - "Put the subject in front, background far away"
> - "Make a scene with proper perspective"
> - "Control spatial relationships between objects"

In [None]:
# Clean up Canny pipeline to free memory
del pipe_canny, controlnet_canny
gc.collect()
torch.cuda.empty_cache()
print(f"Memory after cleanup: {get_memory_usage()}")

In [None]:
# Load Depth ControlNet
print("Loading Depth ControlNet...")

controlnet_depth = ControlNetModel.from_pretrained(
    "diffusers/controlnet-depth-sdxl-1.0",
    torch_dtype=torch.bfloat16,
    use_safetensors=True,
)

pipe_depth = StableDiffusionXLControlNetPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    controlnet=controlnet_depth,
    torch_dtype=torch.bfloat16,
    use_safetensors=True,
    variant="fp16",
)
pipe_depth = pipe_depth.to(device)

print("\n✅ Depth ControlNet loaded!")
print(f"Memory: {get_memory_usage()}")

### Understanding `torch.hub.load()` for Pre-trained Models

Before we create our depth estimation function, let's understand `torch.hub.load()`:

```python
# torch.hub.load() downloads and loads pre-trained models from GitHub repos
# Syntax: torch.hub.load('repo_owner/repo_name', 'model_name', trust_repo=True)

# Example: Load MiDaS depth estimation model
midas = torch.hub.load("intel-isl/MiDaS", "MiDaS_small", trust_repo=True)
```

**Key parameters:**
- `'intel-isl/MiDaS'`: GitHub repository (owner/repo)
- `'MiDaS_small'`: Specific model or function to load
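- `trust_repo=True`: Confirms that you trust the repository and skips the interactive confirmation prompt; `torch.hub.load()` runs Python code from the repo, so only use it with sources you trust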

**Why use it?**
- Access pre-trained models without manual download
- Models are cached locally after first download
- Many popular models available (MiDaS, YOLO, etc.)

**Note:** This is OPTIONAL for this lab - we provide `create_synthetic_depth()` as a faster alternative.
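
Whichever route you take, raw depth values have to be rescaled to 8-bit gray before they can serve as a control image. A minimal sketch, assuming the input is inverse depth (larger = closer, as MiDaS produces) so that near objects come out white; `depth_to_control_array` is a hypothetical helper name.

```python
import numpy as np

def depth_to_control_array(raw_depth):
    """Rescale raw depth values to a 3-channel uint8 array (white = near)."""
    raw = np.asarray(raw_depth, dtype=np.float32)
    value_range = float(raw.max() - raw.min())
    if value_range == 0.0:
        # A completely flat map carries no depth cue; return black
        gray = np.zeros(raw.shape, dtype=np.uint8)
    else:
        gray = ((raw - raw.min()) / value_range * 255.0).astype(np.uint8)
    # ControlNet expects an RGB-shaped image, so repeat the gray channel
    return np.stack([gray, gray, gray], axis=-1)

raw = np.array([[0.2, 1.0], [3.0, 5.0]])  # Made-up inverse-depth values
control = depth_to_control_array(raw)
print(control.shape, control.dtype, control[..., 0].min(), control[..., 0].max())
```

The explicit check for a zero value range avoids a division by zero when the estimator returns a constant map, for example on a blank input image.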

In [None]:
def estimate_depth(image, model_type="MiDaS_small"):
    """
    Estimate depth from an image using MiDaS.
    
    Args:
        image: PIL Image
        model_type: "MiDaS_small", "DPT_Hybrid", or "DPT_Large"
    
    Returns:
        PIL Image with depth map
    """
    import torch.nn.functional as F
    
    # Load MiDaS model
    midas = torch.hub.load("intel-isl/MiDaS", model_type, trust_repo=True)
    midas = midas.to(device).eval()
    
    # Load transforms
    midas_transforms = torch.hub.load("intel-isl/MiDaS", "transforms", trust_repo=True)
    if model_type in ["DPT_Large", "DPT_Hybrid"]:
        transform = midas_transforms.dpt_transform
    else:
        transform = midas_transforms.small_transform
    
    # Prepare input
    np_image = np.array(image)
    input_batch = transform(np_image).to(device)
    
    # Inference
    with torch.no_grad():
        prediction = midas(input_batch)
        prediction = F.interpolate(
            prediction.unsqueeze(1),
            size=np_image.shape[:2],
            mode="bicubic",
            align_corners=False,
        ).squeeze()
    
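    # Normalize to 0-255 (MiDaS predicts inverse depth, so near = bright)
    depth = prediction.cpu().numpy()
    depth_range = max(depth.max() - depth.min(), 1e-8)  # Avoid division by zero on flat maps
    depth = (depth - depth.min()) / depth_range * 255
    depth = depth.astype(np.uint8)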
    
    # Convert to 3-channel
    depth_3ch = np.stack([depth, depth, depth], axis=-1)
    
    # Cleanup
    del midas
    gc.collect()
    torch.cuda.empty_cache()
    
    return Image.fromarray(depth_3ch)

# Or create a synthetic depth map for testing
def create_synthetic_depth(size=(1024, 1024)):
    """Create a synthetic depth map with gradient."""
    h, w = size
    # Create radial gradient (center bright, edges dark)
    y, x = np.ogrid[:h, :w]
    center = (h // 2, w // 2)
    dist = np.sqrt((x - center[1])**2 + (y - center[0])**2)
    max_dist = np.sqrt(center[0]**2 + center[1]**2)
    depth = 255 - (dist / max_dist * 255).astype(np.uint8)
    depth_3ch = np.stack([depth, depth, depth], axis=-1)
    return Image.fromarray(depth_3ch)

print("Depth estimation functions ready!")

In [None]:
# Create or estimate depth map
# Using synthetic for speed - you can use estimate_depth() for real images

# Option 1: Synthetic depth (fast)
depth_map = create_synthetic_depth((1024, 1024))

# Option 2: From real image (slower but more realistic)
# depth_map = estimate_depth(original_image)

# Generate images with different prompts using depth control
prompts = [
    "A mystical forest with ancient trees, foggy atmosphere, fantasy art",
    "An underwater scene with coral reef, tropical fish, sunlight rays",
    "A space scene with nebula and stars, cosmic, ethereal",
]

images = []
for prompt in prompts:
    print(f"Generating: {prompt[:40]}...")
    generator = torch.Generator(device=device).manual_seed(42)
    
    image = pipe_depth(
        prompt=prompt,
        image=depth_map,
        num_inference_steps=25,
        guidance_scale=7.5,
        controlnet_conditioning_scale=0.5,
        generator=generator,
    ).images[0]
    
    images.append(image)

# Display
fig, axes = plt.subplots(1, 4, figsize=(20, 5))

axes[0].imshow(depth_map)
axes[0].set_title("Depth Map (Control)")
axes[0].axis('off')

for i, (img, prompt) in enumerate(zip(images, prompts)):
    axes[i+1].imshow(img)
    axes[i+1].set_title(prompt[:25] + "...", fontsize=10)
    axes[i+1].axis('off')

plt.suptitle("Depth-Controlled Generation", fontsize=14, y=1.02)
plt.tight_layout()
plt.show()

print("\n💡 Notice how the depth composition is preserved across different styles!")

---

## Part 4: Creating Custom Control Images

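You can also draw your own control images. White lines on a black canvas mimic the output of Canny edge detection, so a hand-drawn sketch can be passed straight to the Canny ControlNet.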

In [None]:
def draw_simple_scene():
    """
    Draw a simple scene with lines for edge control.
    This simulates a hand-drawn sketch.
    """
    # Create blank canvas
    img = np.zeros((1024, 1024, 3), dtype=np.uint8)
    
    # Draw horizon line
    cv2.line(img, (0, 512), (1024, 512), (255, 255, 255), 2)
    
    # Draw mountains
    pts = np.array([[0, 512], [200, 300], [400, 450], [600, 250], [800, 400], [1024, 350], [1024, 512]], np.int32)
    cv2.polylines(img, [pts], False, (255, 255, 255), 2)
    
    # Draw sun circle
    cv2.circle(img, (800, 150), 80, (255, 255, 255), 2)
    
    # Draw tree
    cv2.line(img, (150, 512), (150, 400), (255, 255, 255), 3)  # Trunk
    cv2.ellipse(img, (150, 350), (60, 80), 0, 0, 360, (255, 255, 255), 2)  # Foliage
    
    # Draw house
    cv2.rectangle(img, (400, 450), (550, 512), (255, 255, 255), 2)  # Body
    pts_roof = np.array([[380, 450], [475, 380], [570, 450]], np.int32)
    cv2.polylines(img, [pts_roof], True, (255, 255, 255), 2)  # Roof
    cv2.rectangle(img, (450, 480), (500, 512), (255, 255, 255), 2)  # Door
    
    return Image.fromarray(img)

# Create custom sketch
custom_sketch = draw_simple_scene()

plt.figure(figsize=(8, 8))
plt.imshow(custom_sketch)
plt.title("Custom Sketch for ControlNet")
plt.axis('off')
plt.show()

In [None]:
# Clean up depth pipeline
del pipe_depth, controlnet_depth
gc.collect()
torch.cuda.empty_cache()

# Reload Canny for sketch-to-image
print("Reloading Canny ControlNet for sketch-to-image...")

controlnet_canny = ControlNetModel.from_pretrained(
    "diffusers/controlnet-canny-sdxl-1.0",
    torch_dtype=torch.bfloat16,
)

pipe_canny = StableDiffusionXLControlNetPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    controlnet=controlnet_canny,
    torch_dtype=torch.bfloat16,
    variant="fp16",
)
pipe_canny = pipe_canny.to(device)

In [None]:
# Generate from custom sketch
prompts = [
    "A peaceful countryside landscape at sunset, oil painting, warm colors",
    "A fantasy village in a magical land, digital art, vibrant",
    "A winter scene with snow-covered mountains, photorealistic",
]

images = []
for prompt in prompts:
    print(f"Generating: {prompt[:40]}...")
    generator = torch.Generator(device=device).manual_seed(42)
    
    image = pipe_canny(
        prompt=prompt,
        negative_prompt="blurry, low quality, ugly",
        image=custom_sketch,
        num_inference_steps=25,
        guidance_scale=7.5,
        controlnet_conditioning_scale=0.7,
        generator=generator,
    ).images[0]
    
    images.append(image)

# Display
fig, axes = plt.subplots(2, 2, figsize=(14, 14))
axes = axes.flatten()

axes[0].imshow(custom_sketch)
axes[0].set_title("Your Sketch")
axes[0].axis('off')

for i, (img, prompt) in enumerate(zip(images, prompts)):
    axes[i+1].imshow(img)
    axes[i+1].set_title(prompt[:35] + "...", fontsize=10)
    axes[i+1].axis('off')

plt.suptitle("From Sketch to Art with ControlNet", fontsize=14, y=1.02)
plt.tight_layout()
plt.show()

print("\n🎨 Your simple sketch became beautiful artwork!")

---

## Part 5: Creating Consistent Characters

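One powerful use of ControlNet is maintaining character consistency across multiple images. Reusing the edge map of a base portrait as the control image keeps the character's pose, outline, and facial layout fixed while the prompt changes the scene; colors and fine details can still drift between images.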
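
To check how much structure two images actually share, one option is to compare their edge maps. The sketch below computes an intersection-over-union score on binary edge arrays; `edge_overlap` is a hypothetical helper, and the score is an illustrative proxy for shared structure rather than a measure of character identity.

```python
import numpy as np

def edge_overlap(edges_a, edges_b):
    """Intersection-over-union of two edge maps (nonzero pixels count as edges)."""
    a = np.asarray(edges_a) > 0
    b = np.asarray(edges_b) > 0
    union = np.logical_or(a, b).sum()
    if union == 0:
        return 1.0  # Two empty maps are trivially identical
    return float(np.logical_and(a, b).sum() / union)

# Toy 4x4 edge maps: the second one has one extra edge pixel
base = np.zeros((4, 4), dtype=np.uint8)
base[1, :] = 255
variant = base.copy()
variant[2, 0] = 255
print(edge_overlap(base, variant))
```

With real images, pass `np.array(get_canny_edges(img))` for the base character and for each variation. Canny edges are thin, so even small shifts lower the score; treat the values as relative comparisons rather than absolute thresholds.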

In [None]:
# Generate a base character with plain SDXL (no ControlNet)
from diffusers import StableDiffusionXLPipeline

character_prompt = "A portrait of a young woman with red hair and green eyes, fantasy style, detailed face, looking at camera"

# Load the base pipeline for the first generation
pipe_base = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.bfloat16,
    variant="fp16",
)
pipe_base = pipe_base.to(device)

generator = torch.Generator(device=device).manual_seed(12345)
base_character = pipe_base(
    prompt=character_prompt,
    negative_prompt="blurry, low quality, deformed",
    num_inference_steps=25,
    generator=generator,
).images[0]

plt.figure(figsize=(6, 6))
plt.imshow(base_character)
plt.title("Base Character")
plt.axis('off')
plt.show()

In [None]:
# Extract edges from the character
character_edges = get_canny_edges(base_character, low_threshold=50, high_threshold=150)

# Generate variations with the same structure but different contexts
variation_prompts = [
    "The same red-haired woman in a medieval castle, fantasy portrait",
    "The same red-haired woman in a cyberpunk city, neon lights, futuristic",
    "The same red-haired woman in an enchanted forest, magical atmosphere",
    "The same red-haired woman in an art studio, painting, warm light",
]

variations = []
for prompt in variation_prompts:
    print(f"Generating: {prompt[:40]}...")
    generator = torch.Generator(device=device).manual_seed(42)
    
    image = pipe_canny(
        prompt=prompt,
        negative_prompt="blurry, low quality, different person, different face",
        image=character_edges,
        num_inference_steps=25,
        guidance_scale=7.5,
        controlnet_conditioning_scale=0.6,
        generator=generator,
    ).images[0]
    
    variations.append(image)

# Display
fig, axes = plt.subplots(2, 3, figsize=(15, 10))
axes = axes.flatten()

axes[0].imshow(base_character)
axes[0].set_title("Base Character")
axes[0].axis('off')

axes[1].imshow(character_edges)
axes[1].set_title("Edge Control")
axes[1].axis('off')

for i, (img, prompt) in enumerate(zip(variations, variation_prompts)):
    axes[i+2].imshow(img)
    axes[i+2].set_title(prompt[20:50] + "...", fontsize=9)
    axes[i+2].axis('off')

plt.suptitle("Consistent Character Across Scenes", fontsize=14, y=1.02)
plt.tight_layout()
plt.show()

print("\n💡 The character maintains consistency while the scene changes!")

---

## ⚠️ Common Mistakes

### Mistake 1: Control Image Wrong Size

```python
# ❌ Wrong: Control image is a different size than the output
control = Image.open("control.png")  # 512x512
image = pipe(prompt=prompt, image=control, width=1024, height=1024).images[0]  # Mismatch!

# ✅ Right: Match sizes
control = Image.open("control.png").resize((1024, 1024))
image = pipe(prompt=prompt, image=control, width=1024, height=1024).images[0]
```
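
A quick pre-flight check can catch size problems before a long generation run. This sketch assumes the control image has already been converted to a NumPy array, and it treats sizes divisible by 8 as the safe choice because SDXL's VAE downsamples by a factor of 8; `check_control_size` is a hypothetical helper, not part of diffusers.

```python
import numpy as np

def check_control_size(control, width, height):
    """Return a list of problems with a control array; an empty list means OK."""
    if control.ndim != 3 or control.shape[2] != 3:
        return [f"expected shape (H, W, 3), got {control.shape}"]
    problems = []
    if control.shape[:2] != (height, width):
        problems.append(
            f"control is {control.shape[1]}x{control.shape[0]}, "
            f"output is {width}x{height}"
        )
    if width % 8 or height % 8:
        problems.append("width and height should both be divisible by 8")
    return problems

small = np.zeros((512, 512, 3), dtype=np.uint8)
matching = np.zeros((1024, 1024, 3), dtype=np.uint8)
print(check_control_size(small, 1024, 1024))
print(check_control_size(matching, 1024, 1024))
```

For a PIL image, `np.array(control)` gives the array to check.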

### Mistake 2: Conditioning Scale Too High

```python
# ❌ Wrong: Too strict, loses creativity
controlnet_conditioning_scale = 1.5

# ✅ Right: Balanced control
controlnet_conditioning_scale = 0.5  # Good starting point
```

### Mistake 3: Wrong Control Type

```python
# ❌ Wrong: Giving edges to a depth ControlNet
depth_controlnet = ControlNetModel.from_pretrained("diffusers/controlnet-depth-sdxl-1.0")
pipe(prompt=prompt, image=canny_edges)  # Wrong input type! (pipe was built with depth_controlnet)

# ✅ Right: Match the control image to the model type
depth_controlnet = ControlNetModel.from_pretrained("diffusers/controlnet-depth-sdxl-1.0")
pipe(prompt=prompt, image=depth_map)  # Correct!
```
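
Because a mismatched control image is still a valid RGB image, the pipeline will not raise an error, so a rough sanity check can help. The heuristic below assumes Canny edge maps are almost entirely pure black or pure white, whereas depth maps contain many intermediate grays; `looks_like_edge_map` and its 0.95 cutoff are hypothetical choices.

```python
import numpy as np

def looks_like_edge_map(control, binary_fraction=0.95):
    """Guess whether an array is an edge map: nearly all pixels are 0 or 255."""
    arr = np.asarray(control)
    is_binary = (arr == 0) | (arr == 255)
    return bool(is_binary.mean() >= binary_fraction)

# Toy inputs: a white line on black versus a smooth horizontal gradient
edge_like = np.zeros((32, 32), dtype=np.uint8)
edge_like[16, :] = 255
depth_like = np.tile(np.linspace(0, 255, 32).astype(np.uint8), (32, 1))

print(looks_like_edge_map(edge_like), looks_like_edge_map(depth_like))
```

This is only a guess based on pixel statistics: a depth map with large saturated regions could be misclassified, so it complements a visual check and does not replace one.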

---

## 🎉 Checkpoint

You've learned:
- ✅ How ControlNet adds structural control to diffusion
- ✅ Using Canny edge detection for outline-guided generation
- ✅ Controlling conditioning strength
- ✅ Applying depth maps for spatial composition
- ✅ Creating custom control images from sketches
- ✅ Maintaining character consistency across scenes

---

## 🚀 Challenge (Optional)

1. **Architecture Control**: Draw a simple building outline and generate it in 5 different architectural styles
2. **Pose Control**: Find an OpenPose model and control character poses
3. **Multi-ControlNet**: Combine edges + depth for even more control
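
As a starting point for the Multi-ControlNet challenge: diffusers pipelines accept a list of ControlNets, together with one control image and one conditioning scale per ControlNet, in the same order. The sketch below has not been run in this lab's environment; both helper names are hypothetical, and the model IDs are the ones used earlier in this notebook.

```python
def load_multi_controlnet_pipeline(device="cuda"):
    """Load SDXL with Canny and depth ControlNets (downloads weights when called)."""
    import torch
    from diffusers import ControlNetModel, StableDiffusionXLControlNetPipeline

    controlnets = [
        ControlNetModel.from_pretrained(
            "diffusers/controlnet-canny-sdxl-1.0", torch_dtype=torch.bfloat16
        ),
        ControlNetModel.from_pretrained(
            "diffusers/controlnet-depth-sdxl-1.0", torch_dtype=torch.bfloat16
        ),
    ]
    pipe = StableDiffusionXLControlNetPipeline.from_pretrained(
        "stabilityai/stable-diffusion-xl-base-1.0",
        controlnet=controlnets,  # A list enables multi-ControlNet
        torch_dtype=torch.bfloat16,
        variant="fp16",
    )
    return pipe.to(device)

def multi_control_kwargs(control_images, scales):
    """Pair each control image with its conditioning scale, in ControlNet order."""
    if len(control_images) != len(scales):
        raise ValueError("Need exactly one scale per control image")
    return {
        "image": list(control_images),
        "controlnet_conditioning_scale": list(scales),
    }

# Usage (not run here because it downloads several GB of weights):
# pipe = load_multi_controlnet_pipeline()
# result = pipe(
#     prompt="A cozy cabin in the mountains, golden hour",
#     **multi_control_kwargs([canny_edges, depth_map], [0.5, 0.5]),
# ).images[0]
```

When two controls are active, lower per-control scales are a reasonable first try, since the guidance from both ControlNets is combined.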

---

## 🧹 Cleanup

In [None]:
# Clean up all pipelines
del pipe_canny, controlnet_canny, pipe_base
gc.collect()
torch.cuda.empty_cache()
print("GPU memory cleared!")

---

## Next Steps

Proceed to **Lab 2.6.4: Flux Exploration** to explore the Flux architecture and compare it with SDXL!