# The Speed Landscape

**Module 7.3, Lesson 3** | CourseAI

You have learned six ways to generate images faster across three modules. Each was taught as a response to a limitation of the previous approach. This notebook puts them head-to-head so you can see the tradeoffs for yourself.

**What you will do:**
- Generate the same image with DPM-Solver++ at different step counts and time each one—see where the "free speedup" ends and quality loss begins
- Compare DPM-Solver++ (Level 1) against LCM-LoRA (Level 3a) on the same model and prompt—a head-to-head acceleration showdown
- Compose LCM-LoRA with a style LoRA and verify that speed + style combine as the framework predicts
- Build your own decision cheat sheet by analyzing scenarios and verifying one choice with code

**For each exercise, PREDICT the output before running the cell.**

Every concept in this notebook comes from the lesson. The four-dimensional framework (speed, quality, flexibility, composability), the quality-speed curve, the composability rules. No new theory—just hands-on comparison of approaches you already understand.

**Estimated time:** 35-50 minutes. All exercises use pre-trained models (no training). Requires a GPU runtime for reasonable generation times.

## Setup

Run this cell to install dependencies and configure the environment.

**Important:** Switch to a GPU runtime in Colab (Runtime > Change runtime type > T4 GPU). Generation works on CPU but is very slow.

In [None]:
!pip install -q diffusers transformers accelerate safetensors peft

In [None]:
import torch
import time
import matplotlib.pyplot as plt
from diffusers import (
    StableDiffusionPipeline,
    DDPMScheduler,
    LCMScheduler,
    DPMSolverMultistepScheduler,
)
from IPython.display import display

# Reproducible results
SEED = 42

# Device
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
dtype = torch.float16 if torch.cuda.is_available() else torch.float32

# Nice plots
plt.style.use('dark_background')
plt.rcParams['figure.figsize'] = [14, 5]
plt.rcParams['figure.dpi'] = 100

print(f'Device: {device}')
print(f'Dtype: {dtype}')
if device.type == 'cpu':
    print('WARNING: No GPU detected. Generation will be very slow.')
    print('Switch to a GPU runtime: Runtime > Change runtime type > T4 GPU')
print()
print('Setup complete.')

## Shared Helpers

Utility functions for timing generation and displaying image comparisons. Run this cell now—these are used across all four exercises.

In [None]:
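def generate_timed(pipe, prompt, num_inference_steps, guidance_scale, seed=SEED):
    """Generate an image and return (image, elapsed_seconds)."""
    # Re-seed on every call so that runs differing only in scheduler or
    # step count start from the same initial noise.
    generator = torch.Generator(device=device).manual_seed(seed)
    # perf_counter is a monotonic, high-resolution clock intended for timing.
    start = time.perf_counter()
    result = pipe(
        prompt,
        num_inference_steps=num_inference_steps,
        guidance_scale=guidance_scale,
        generator=generator,
    )
    # The pipeline returns PIL images, so GPU work has finished by this point.
    elapsed = time.perf_counter() - start
    return result.images[0], elapsed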


def show_image_row(images, titles, suptitle=None, figsize=None):
    """Display a row of PIL images with titles."""
    n = len(images)
    fig_w = figsize[0] if figsize else max(4 * n, 12)
    fig_h = figsize[1] if figsize else 5
    fig, axes = plt.subplots(1, n, figsize=(fig_w, fig_h))
    if n == 1:
        axes = [axes]
    for ax, img, title in zip(axes, images, titles):
        ax.imshow(img)
        ax.set_title(title, fontsize=10)
        ax.axis('off')
    if suptitle:
        plt.suptitle(suptitle, fontsize=13, y=1.02)
    plt.tight_layout()
    plt.show()


def print_timing_table(labels, times):
    """Print a formatted timing comparison table."""
    max_label = max(len(label) for label in labels)
    print("Timing comparison:")
    for label, t in zip(labels, times):
        print(f"  {label:<{max_label}}  {t:.2f}s")
    if len(times) >= 2:
        print(f"  Speedup (first vs last): {times[0] / times[-1]:.1f}x")


print('Helpers defined: generate_timed, show_image_row, print_timing_table')

---

## Exercise 1: The Sampler Swap `[Guided]`

Level 1 acceleration is the simplest: swap the scheduler to DPM-Solver++, a higher-order ODE solver that reads trajectory curvature. No model changes, no adapter, no retraining. One line of code.

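From the lesson: the quality-speed curve is **nonlinear**. The first 50x speedup (1000 to 20 steps) costs almost nothing in visible quality because it only removes redundant computation. The next reduction (20 to 10 steps) starts showing quality loss. This notebook uses a 50-step DDPM run as its baseline rather than the full 1000 steps.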
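
To make "higher-order solver" concrete, here is a toy sketch. It is not DPM-Solver++ itself: it integrates the simple ODE $dx/dt = -x$ (chosen only because its exact solution is known) with first-order Euler and second-order Heun. The function names and the test ODE are illustrative assumptions.

```python
import math


def euler(f, x0, t0, t1, n):
    """First-order solver: one derivative evaluation per step."""
    h = (t1 - t0) / n
    x, t = x0, t0
    for _ in range(n):
        x += h * f(t, x)
        t += h
    return x


def heun(f, x0, t0, t1, n):
    """Second-order solver: averages the slope at both ends of each step."""
    h = (t1 - t0) / n
    x, t = x0, t0
    for _ in range(n):
        k1 = f(t, x)
        k2 = f(t + h, x + h * k1)
        x += 0.5 * h * (k1 + k2)
        t += h
    return x


def decay(t, x):
    # Toy ODE dx/dt = -x, with exact solution x(t) = exp(-t) for x(0) = 1.
    return -x


exact = math.exp(-1.0)
err_euler_1000 = abs(euler(decay, 1.0, 0.0, 1.0, 1000) - exact)
err_heun_20 = abs(heun(decay, 1.0, 0.0, 1.0, 20) - exact)
err_heun_10 = abs(heun(decay, 1.0, 0.0, 1.0, 10) - exact)

print(f"Euler, 1000 steps: error {err_euler_1000:.2e}")
print(f"Heun,    20 steps: error {err_heun_20:.2e}")
print(f"Heun,    10 steps: error {err_heun_10:.2e}")
```

On this toy problem the second-order method at 20 steps is at least as accurate as the first-order method at 1000 steps, and halving to 10 steps makes the error grow noticeably. Heun evaluates the derivative twice per step, so 20 steps cost 40 evaluations; multistep solvers such as DPM-Solver++ instead reuse evaluations from earlier steps.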

We will generate the same image at three step counts to see this curve in action:
1. **DDPM at 50 steps**—the baseline (many small steps, Markov chain)
2. **DPM-Solver++ at 20 steps**—Level 1 acceleration (higher-order ODE solver)
3. **DPM-Solver++ at 10 steps**—pushing Level 1 further

**Before running, predict:**
- How much quality difference will you see between DDPM at 50 steps and DPM-Solver++ at 20 steps? (Hint: from the lesson, the first speedup is "free.")
- What about DPM-Solver++ at 10 steps—will you start to see degradation?

In [None]:
# ============================================================
# Exercise 1: Load SD v1.5 and compare sampler step counts
# ============================================================

PROMPT = "a small cottage on a hillside at sunset, warm golden light, detailed"

# --- Step 1: Load SD v1.5 ---
print("Loading SD v1.5...")
pipe = StableDiffusionPipeline.from_pretrained(
    "stable-diffusion-v1-5/stable-diffusion-v1-5",
    torch_dtype=dtype,
    safety_checker=None,
).to(device)

# Save the original scheduler config so we can swap schedulers
original_scheduler_config = pipe.scheduler.config
print("SD v1.5 loaded.")

In [None]:
# --- Step 2: Generate with DDPM at 50 steps (baseline) ---
pipe.scheduler = DDPMScheduler.from_config(original_scheduler_config)

print("Generating: DDPM at 50 steps (baseline)...")
img_ddpm_50, time_ddpm_50 = generate_timed(
    pipe, PROMPT, num_inference_steps=50, guidance_scale=7.5
)
print(f"  Done in {time_ddpm_50:.2f}s")

# --- Step 3: Generate with DPM-Solver++ at 20 steps ---
pipe.scheduler = DPMSolverMultistepScheduler.from_config(original_scheduler_config)

print("Generating: DPM-Solver++ at 20 steps...")
img_dpm_20, time_dpm_20 = generate_timed(
    pipe, PROMPT, num_inference_steps=20, guidance_scale=7.5
)
print(f"  Done in {time_dpm_20:.2f}s")

# --- Step 4: Generate with DPM-Solver++ at 10 steps ---
print("Generating: DPM-Solver++ at 10 steps...")
img_dpm_10, time_dpm_10 = generate_timed(
    pipe, PROMPT, num_inference_steps=10, guidance_scale=7.5
)
print(f"  Done in {time_dpm_10:.2f}s")

In [None]:
# --- Step 5: Compare all three side by side ---
show_image_row(
    [img_ddpm_50, img_dpm_20, img_dpm_10],
    [
        f"DDPM | 50 steps\n{time_ddpm_50:.1f}s | guidance=7.5",
        f"DPM-Solver++ | 20 steps\n{time_dpm_20:.1f}s | guidance=7.5",
        f"DPM-Solver++ | 10 steps\n{time_dpm_10:.1f}s | guidance=7.5",
    ],
    suptitle=f'Level 1 Acceleration: "{PROMPT}"',
)

print_timing_table(
    ["DDPM (50 steps):", "DPM-Solver++ (20 steps):", "DPM-Solver++ (10 steps):"],
    [time_ddpm_50, time_dpm_20, time_dpm_10],
)
print()
print("What changed between images:")
print("  - DDPM to DPM-Solver++ at 20: scheduler swap only. Same model, same weights.")
print("    Quality should be nearly identical. This is the 'free speedup' from the lesson.")
print("  - DPM-Solver++ 20 to 10: same scheduler, fewer steps.")
print("    You may see softening in textures or loss of fine detail.")
print("    This is where Level 1 starts hitting its floor.")

### What Just Happened

You saw Level 1 acceleration in action—the simplest and most accessible speedup:

- **DDPM at 50 steps vs DPM-Solver++ at 20 steps: nearly identical quality.** The DPM-Solver++ higher-order ODE solver reads trajectory curvature to take larger, more accurate steps. It removes the *redundancy* in DDPM's many small steps—this is wasted computation eliminated, not quality sacrificed. This is the "free speedup" region from the lesson's quality-speed curve.

- **DPM-Solver++ at 10 steps: quality starts to degrade.** You may notice softening in fine textures, loss of small details, or slightly less coherent composition. The solver is now taking steps large enough that even its higher-order corrections cannot fully compensate. This is the boundary of the "free" region—beyond this, Level 1 needs help from Level 2 or Level 3.

- **Nothing changed about the model.** Same U-Net weights, same VAE, same text encoder. The only change was the scheduler class and step count. This is the defining property of Level 1: **it is an inference-time choice that works with any diffusion model**.

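- **The timing confirms the speedup is roughly proportional to step count.** Each step costs about the same amount of compute: one U-Net call, which under classifier-free guidance evaluates the conditional and unconditional inputs together as a single batch. Fewer steps means proportionally less time, apart from fixed per-image costs such as text encoding and VAE decoding.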
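
The "roughly proportional" relationship in the last bullet can be sketched with a simple linear cost model: a fixed per-image overhead plus a constant cost per step. The function names and the numbers below are hypothetical, chosen for illustration rather than measured.

```python
def predicted_time(steps, per_step_s, overhead_s):
    """Linear cost model: fixed overhead plus a constant cost per step."""
    return overhead_s + steps * per_step_s


def predicted_speedup(steps_slow, steps_fast, per_step_s, overhead_s):
    """Ratio of predicted times for two step counts."""
    slow = predicted_time(steps_slow, per_step_s, overhead_s)
    fast = predicted_time(steps_fast, per_step_s, overhead_s)
    return slow / fast


# Hypothetical values, not measurements from this notebook.
PER_STEP_S = 0.10
OVERHEAD_S = 0.30

ideal = 50 / 20  # pure step-count ratio
with_overhead = predicted_speedup(50, 20, PER_STEP_S, OVERHEAD_S)

print(f"Step-count ratio:        {ideal:.2f}x")
print(f"With fixed overhead:     {with_overhead:.2f}x")
```

Under this model the measured speedup always falls a little short of the step-count ratio, and the gap widens as the step count shrinks and the fixed overhead becomes a larger share of the total.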

---

## Exercise 2: LCM-LoRA vs DPM-Solver++ `[Guided]`

Level 1 (DPM-Solver++ at 20 steps) gave us a free speedup. But what if we want to go faster—say, 4 steps? That is where Level 3a comes in.

From the lesson: LCM-LoRA captures the consistency distillation as a LoRA adapter (roughly 100 MB for SD v1.5, far smaller than a full multi-gigabyte checkpoint). It is a different *kind* of acceleration: not a better ODE solver, but a direct mapping that bypasses the trajectory entirely.

We will compare them head-to-head on the same model and prompt:
1. **DPM-Solver++ at 20 steps**—Level 1 (our best result from Exercise 1)
2. **LCM-LoRA at 8 steps**—Level 3a (moderate step count)
3. **LCM-LoRA at 4 steps**—Level 3a (the typical sweet spot)

**Before running, predict:**
- LCM-LoRA at 4 steps uses the consistency objective—it was trained to produce good results in few steps. Will it match DPM-Solver++ at 20 steps in quality?
- What about at 8 steps? More steps with LCM-LoRA means more consistency refinement cycles. Will 8 steps look better than 4?
- Which approach will be faster: DPM-Solver++ at 20 steps or LCM-LoRA at 4 steps?

In [None]:
# ============================================================
# Exercise 2: Head-to-head—DPM-Solver++ vs LCM-LoRA
# ============================================================

# We reuse the pipeline from Exercise 1.
# If you restarted the runtime, re-run Exercise 1 first.

# --- Step 1: DPM-Solver++ at 20 steps (Level 1 best) ---
# We already have this from Exercise 1, but let's regenerate for a clean comparison.
pipe.scheduler = DPMSolverMultistepScheduler.from_config(original_scheduler_config)

print("Generating: DPM-Solver++ at 20 steps (Level 1)...")
img_level1, time_level1 = generate_timed(
    pipe, PROMPT, num_inference_steps=20, guidance_scale=7.5
)
print(f"  Done in {time_level1:.2f}s")

In [None]:
# --- Step 2: Load LCM-LoRA and generate at 8 steps and 4 steps ---
print("Loading LCM-LoRA adapter...")
pipe.load_lora_weights("latent-consistency/lcm-lora-sdv1-5")
pipe.scheduler = LCMScheduler.from_config(original_scheduler_config)
print("LCM-LoRA loaded. Scheduler swapped to LCMScheduler.")

# Note: LCM uses a reduced guidance scale (1.5 instead of 7.5).
# The augmented PF-ODE already incorporates guidance into the trajectory.

print("Generating: LCM-LoRA at 8 steps (Level 3a)...")
img_lcm_8, time_lcm_8 = generate_timed(
    pipe, PROMPT, num_inference_steps=8, guidance_scale=1.5
)
print(f"  Done in {time_lcm_8:.2f}s")

print("Generating: LCM-LoRA at 4 steps (Level 3a)...")
img_lcm_4, time_lcm_4 = generate_timed(
    pipe, PROMPT, num_inference_steps=4, guidance_scale=1.5
)
print(f"  Done in {time_lcm_4:.2f}s")

In [None]:
# --- Step 3: Compare all three side by side ---
show_image_row(
    [img_level1, img_lcm_8, img_lcm_4],
    [
        f"Level 1: DPM-Solver++\n20 steps | {time_level1:.1f}s | cfg=7.5",
        f"Level 3a: LCM-LoRA\n8 steps | {time_lcm_8:.1f}s | cfg=1.5",
        f"Level 3a: LCM-LoRA\n4 steps | {time_lcm_4:.1f}s | cfg=1.5",
    ],
    suptitle=f'Level 1 vs Level 3a: "{PROMPT}"',
)

print_timing_table(
    ["DPM-Solver++ (20 steps):", "LCM-LoRA (8 steps):", "LCM-LoRA (4 steps):"],
    [time_level1, time_lcm_8, time_lcm_4],
)
print()
print("These are two fundamentally different acceleration strategies:")
print("  - DPM-Solver++ (Level 1): a better ODE solver stepping along the trajectory.")
print("    Works with any model. No adapter needed. 20 steps, high quality.")
print("  - LCM-LoRA (Level 3a): a consistency model that bypasses the trajectory.")
print("    Needs the LCM-LoRA adapter + LCMScheduler. 4 steps, good quality.")
print()
print("The framework from the lesson evaluates this across four dimensions:")
print("  Speed:         LCM-LoRA wins (4 steps vs 20)")
print("  Quality:       DPM-Solver++ has an edge (20 ODE steps vs 4 consistency steps)")
print("  Flexibility:   DPM-Solver++ wins (no adapter needed, works on anything)")
print("  Composability: LCM-LoRA wins (composable with style LoRAs and ControlNet)")

### What Just Happened

You saw Level 1 and Level 3a compared directly—the same model, same prompt, same seed, different acceleration strategies:

- **DPM-Solver++ at 20 steps produces the highest quality.** It is a mathematical ODE solver that takes carefully computed steps along the diffusion trajectory. At 20 steps, it has enough steps to capture the trajectory curvature accurately. But it takes 20 forward passes through the U-Net.

- **LCM-LoRA at 4 steps is faster with a small quality tradeoff.** The consistency model does not follow the trajectory step-by-step—it maps directly from noisy input to clean output. The quality is good but may show slight softening compared to 20-step DPM-Solver++. This is the "cheap speedup" region of the quality-speed curve.

- **LCM-LoRA at 8 steps shows diminishing returns.** The improvement from 4 to 8 steps is modest because the consistency model was optimized for very few steps. Going beyond 4 steps provides marginal gains, unlike DPM-Solver++, where the difference between 10 and 20 steps was visible in Exercise 1.

- **Neither approach is universally better.** This is the core insight from the lesson: the decision has four dimensions. If you want maximum quality with no setup, DPM-Solver++ at 20 steps wins. If you want fast previews and plan to compose with other LoRAs, LCM-LoRA at 4 steps wins. The "right" answer depends on your constraints.
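
One way to keep the four-dimension comparison honest is to write it down as data. The sketch below copies the per-dimension winners from the Exercise 2 summary printout; the `recommend` helper and its "count the wins" rule are hypothetical simplifications, not part of the lesson's framework.

```python
from collections import Counter

# Winner per dimension, as printed in the Exercise 2 summary.
WINNERS = {
    "speed": "LCM-LoRA",
    "quality": "DPM-Solver++",
    "flexibility": "DPM-Solver++",
    "composability": "LCM-LoRA",
}


def recommend(priorities):
    """Return the approach(es) winning the most of the dimensions you care about.

    Hypothetical helper: every listed dimension counts equally, and ties are
    returned together as a sorted list.
    """
    if not priorities:
        raise ValueError("List at least one dimension.")
    unknown = set(priorities) - set(WINNERS)
    if unknown:
        raise ValueError(f"Unknown dimensions: {sorted(unknown)}")
    counts = Counter(WINNERS[dim] for dim in priorities)
    best = max(counts.values())
    return sorted(name for name, count in counts.items() if count == best)


print(recommend(["quality", "flexibility"]))
print(recommend(["speed", "composability"]))
print(recommend(list(WINNERS)))
```

With all four dimensions weighted equally the result is a tie, which matches the point above: the choice only resolves once your constraints say which dimensions matter.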

---

In [None]:
# --- Cleanup: unload LCM-LoRA so we start fresh for Exercise 3 ---
pipe.unload_lora_weights()
pipe.scheduler = DPMSolverMultistepScheduler.from_config(original_scheduler_config)
print("LCM-LoRA unloaded. Scheduler reset to DPM-Solver++.")

## Exercise 3: The Composability Test `[Supported]`

From the lesson: approaches **compose across levels** but **conflict within levels**. LCM-LoRA (Level 3a) and a style LoRA are both LoRA adapters, but they target different behaviors—speed and style respectively. The framework predicts they should compose cleanly.

Your task: verify this prediction by generating three comparison images:
1. **Style LoRA alone** at 20 steps with DPM-Solver++—the style baseline
2. **LCM-LoRA alone** at 4 steps—the speed baseline (from Exercise 2)
3. **LCM-LoRA + style LoRA** at 4 steps—the composition

If the composition works, image 3 should look like image 1's style but generated in image 2's time.

Fill in the TODO markers to complete the comparison.

In [None]:
# ============================================================
# Exercise 3: Composability—LCM-LoRA + style LoRA
# ============================================================

STYLE_LORA = "artificialguybr/pixelartredmond-1-5v-pixel-art-loras-for-sd-1-5"
PROMPT_3 = "a lighthouse on a rocky coast during a storm, waves crashing, PixArFK"

# --- Step 1: Style LoRA alone at 20 steps (style baseline) ---
# Load the style LoRA with a named adapter so we can manage multiple LoRAs.
print("Loading style LoRA...")
pipe.load_lora_weights(STYLE_LORA, adapter_name="style")
pipe.set_adapters(["style"], adapter_weights=[1.0])
pipe.scheduler = DPMSolverMultistepScheduler.from_config(original_scheduler_config)

print("Generating: style LoRA alone, DPM-Solver++ at 20 steps...")
img_style_only, time_style_only = generate_timed(
    pipe, PROMPT_3, num_inference_steps=20, guidance_scale=7.5
)
print(f"  Done in {time_style_only:.2f}s")

In [None]:
# --- Step 2: LCM-LoRA alone at 4 steps (speed baseline) ---

# TODO: Load the LCM-LoRA as a second named adapter (adapter_name="lcm").
# Use pipe.load_lora_weights("latent-consistency/lcm-lora-sdv1-5", adapter_name="lcm")
raise NotImplementedError(
    "TODO: Load the LCM-LoRA with adapter_name='lcm'. See the hint above."
)

# TODO: Activate ONLY the LCM adapter (deactivate the style adapter).
# Use pipe.set_adapters(["lcm"], adapter_weights=[1.0])
raise NotImplementedError(
    "TODO: Set only the 'lcm' adapter active with weight 1.0."
)

# TODO: Swap the scheduler to LCMScheduler.
# Use pipe.scheduler = LCMScheduler.from_config(original_scheduler_config)
raise NotImplementedError(
    "TODO: Swap scheduler to LCMScheduler."
)

print("Generating: LCM-LoRA alone at 4 steps...")
img_lcm_only, time_lcm_only = generate_timed(
    pipe, PROMPT_3, num_inference_steps=4, guidance_scale=1.5
)
print(f"  Done in {time_lcm_only:.2f}s")

In [None]:
# --- Step 3: LCM-LoRA + style LoRA composed at 4 steps ---

# TODO: Activate BOTH adapters with appropriate weights.
# Use pipe.set_adapters(["lcm", "style"], adapter_weights=[1.0, 1.0])
# Keep the LCMScheduler from Step 2—we are generating with LCM.
raise NotImplementedError(
    "TODO: Activate both 'lcm' and 'style' adapters. See the hint above."
)

print("Generating: LCM-LoRA + style LoRA composed at 4 steps...")
img_composed, time_composed = generate_timed(
    pipe, PROMPT_3, num_inference_steps=4, guidance_scale=1.5
)
print(f"  Done in {time_composed:.2f}s")

In [None]:
# --- Step 4: Compare all three ---
show_image_row(
    [img_style_only, img_lcm_only, img_composed],
    [
        f"Style LoRA only\n20 steps | {time_style_only:.1f}s | cfg=7.5",
        f"LCM-LoRA only\n4 steps | {time_lcm_only:.1f}s | cfg=1.5",
        f"LCM + Style composed\n4 steps | {time_composed:.1f}s | cfg=1.5",
    ],
    suptitle=f'Composability Test: "{PROMPT_3}"',
)

print_timing_table(
    ["Style LoRA alone (20 steps):", "LCM-LoRA alone (4 steps):", "LCM + Style (4 steps):"],
    [time_style_only, time_lcm_only, time_composed],
)
print()
print("Compare the three images:")
print("  - Left (style only): the style target. Should show pixel art aesthetics.")
print("  - Middle (LCM only): the speed target. Fast generation, no style adapter.")
print("  - Right (composed): should combine pixel art style with 4-step speed.")
print()
print("If the composition works, the right image has the style of the left")
print("and the speed of the middle. This confirms the lesson's prediction:")
print("LoRA bypasses are additive, and speed and style are orthogonal skills.")

<details>
<summary>Solution</summary>

The key insight is that LoRA bypasses are additive. LCM-LoRA modifies the denoising dynamics ("generate faster") while the style LoRA modifies the output aesthetics ("generate in pixel art style"). They target different aspects of the model's behavior, so they compose without significant interference.

```python
# Step 2: LCM-LoRA alone
pipe.load_lora_weights(
    "latent-consistency/lcm-lora-sdv1-5",
    adapter_name="lcm",
)
pipe.set_adapters(["lcm"], adapter_weights=[1.0])
pipe.scheduler = LCMScheduler.from_config(original_scheduler_config)

# Step 3: Both adapters composed
pipe.set_adapters(["lcm", "style"], adapter_weights=[1.0, 1.0])
```

**Why it works:** Both LoRAs modify the U-Net weights via low-rank bypasses: $W_{\text{effective}} = W_{\text{base}} + \alpha_1 B_1 A_1 + \alpha_2 B_2 A_2$. The LCM-LoRA was trained to collapse the denoising trajectory into 4 steps. The style LoRA was trained to produce pixel art aesthetics. Because they target different "axes" of behavior, the sum of both bypasses achieves both goals simultaneously.
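
The formula above can be checked on a tiny example. This is a minimal sketch using 2×2 matrices as nested lists; the rank-1 adapter values are made up, and the helper names are hypothetical rather than anything from `diffusers` or `peft`.

```python
def matmul(a, b):
    """Multiply two matrices stored as nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def effective_weight(w_base, adapters):
    """Compute W_base + sum(alpha_i * B_i @ A_i) for (alpha, B, A) triples."""
    w = [row[:] for row in w_base]
    for alpha, b, a in adapters:
        delta = matmul(b, a)
        w = [[wx + alpha * dx for wx, dx in zip(wr, dr)] for wr, dr in zip(w, delta)]
    return w


W_BASE = [[1.0, 0.0], [0.0, 1.0]]

# Toy rank-1 adapters with made-up values (B is 2x1, A is 1x2).
B_LCM, A_LCM = [[1.0], [0.0]], [[0.0, 2.0]]
B_STYLE, A_STYLE = [[0.0], [1.0]], [[3.0, 0.0]]

both = effective_weight(W_BASE, [(1.0, B_LCM, A_LCM), (1.0, B_STYLE, A_STYLE)])
swapped = effective_weight(W_BASE, [(1.0, B_STYLE, A_STYLE), (1.0, B_LCM, A_LCM)])
lcm_only = effective_weight(W_BASE, [(1.0, B_LCM, A_LCM), (0.0, B_STYLE, A_STYLE)])

print(both)      # [[1.0, 2.0], [3.0, 1.0]]
print(lcm_only)  # [[1.0, 2.0], [0.0, 1.0]]
```

The order of the adapters does not matter, and a weight of zero removes that adapter's contribution. Additive weights do not by themselves guarantee that the two behaviors stay independent; that is the empirical claim this exercise checks.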

**Common mistakes:**
- Forgetting to swap to `LCMScheduler` when using LCM-LoRA (produces garbage at 4 steps)
- Using `guidance_scale=7.5` with LCM-LoRA (causes oversaturation—the augmented PF-ODE already incorporates guidance)
- Loading LoRAs without `adapter_name` (makes it impossible to control weights independently)
- Not including the trigger word (`PixArFK`) in the prompt—the style LoRA may not activate without it

</details>

### What Just Happened

You verified the composability prediction from the lesson's framework:

- **LCM-LoRA + style LoRA compose.** The composed image should show the pixel art style from the style LoRA at the 4-step speed of LCM-LoRA. Two LoRA adapters, targeting different behaviors (speed and style), combined successfully.

- **This works because the two adapters do not compete within the same level of the framework.** LCM-LoRA is a Level 3a acceleration technique. The style LoRA is not an acceleration technique at all; it is a content/style modification. They do not conflict because they address different aspects of generation.

- **The scheduler must match the acceleration strategy.** When using LCM-LoRA, you need LCMScheduler and low guidance (1.5). When using only the style LoRA with DPM-Solver++, you use standard guidance (7.5). Mixing these up produces poor results—not because the approaches conflict, but because the configuration is wrong.

- **This is the "speed as a skill" insight from Latent Consistency & Turbo.** The LCM-LoRA adds the skill of fast generation. The style LoRA adds the skill of pixel art. Skills are composable when they target different behaviors.
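
The guidance values in the scheduler bullet above refer to classifier-free guidance, which extrapolates from the unconditional noise prediction toward the conditional one. The sketch below applies that formula to two-element lists standing in for noise-prediction tensors; the values are made up for illustration.

```python
def apply_cfg(eps_uncond, eps_cond, guidance_scale):
    """Classifier-free guidance: uncond + scale * (cond - uncond), elementwise."""
    return [u + guidance_scale * (c - u) for u, c in zip(eps_uncond, eps_cond)]


# Toy stand-ins for the two noise predictions (made-up values).
EPS_UNCOND = [0.0, 1.0]
EPS_COND = [1.0, 3.0]

standard = apply_cfg(EPS_UNCOND, EPS_COND, 7.5)
lcm = apply_cfg(EPS_UNCOND, EPS_COND, 1.5)
no_guidance = apply_cfg(EPS_UNCOND, EPS_COND, 1.0)

print(standard)     # [7.5, 16.0]
print(lcm)          # [1.5, 4.0]
print(no_guidance)  # [1.0, 3.0], identical to EPS_COND
```

A scale of 7.5 pushes the prediction far past the conditional output, while 1.5 stays close to it. Since the lesson describes LCM-LoRA as already incorporating guidance into its trajectory, stacking a large scale on top is a plausible reading of why `guidance_scale=7.5` oversaturates.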

---

## Exercise 4: Build Your Decision Cheat Sheet `[Independent]`

The lesson gave you a four-dimensional framework: **speed, quality, flexibility, composability**. This exercise asks you to apply it.

Below are three generation scenarios. For each one:
1. **Write** which acceleration approach you would choose
2. **Explain** why, evaluating across all four dimensions
3. **Pick one** scenario and verify your choice by implementing it and comparing with an alternative

The reasoning matters more than the code. The framework should guide your choice.

### Scenario A: The Concept Artist

A concept artist wants to rapidly explore compositions for a client pitch. They use a **vanilla SD v1.5 model** (no custom fine-tune), need to generate **50+ variations** in an afternoon, and do not need production-quality output—they just need to see if a composition idea works.

### Scenario B: The Game Studio

A small game studio has a **custom SD 1.5 fine-tune** trained on their game's art style. They want to generate **character portraits** for their wiki at **good quality**. They also use a **ControlNet pose model** for consistent character poses. Speed matters but quality matters more.

### Scenario C: The A/B Tester

A product team wants to **A/B test** generated hero images on their landing page. They need to generate images **on every page load** (real-time), using a **fixed prompt template** with no custom models or adapters. Latency must be **under 1 second**. Slight quality reduction is acceptable.

In [None]:
# ============================================================
# Exercise 4: Your decision cheat sheet
# ============================================================
#
# Step 1: For each scenario, write your choice and reasoning
#          in the markdown cell below (or as comments here).
#
# Step 2: Pick ONE scenario and verify your choice.
#          - Implement the approach you chose
#          - Implement an alternative approach
#          - Compare quality, speed, and any other relevant dimension
#
# You have access to the pipeline from earlier exercises.
# Clean up LoRA adapters first:

pipe.unload_lora_weights()
pipe.scheduler = DPMSolverMultistepScheduler.from_config(original_scheduler_config)
print("Pipeline reset. Ready for your implementation.")
print()
print("Available tools:")
print("  - pipe: SD v1.5 pipeline (currently no LoRA, DPM-Solver++ scheduler)")
print("  - generate_timed(pipe, prompt, num_inference_steps, guidance_scale): returns (image, elapsed_seconds)")
print("  - show_image_row(images, titles): display comparison")
print("  - LCMScheduler, DPMSolverMultistepScheduler: scheduler classes")
print("  - LCM-LoRA: 'latent-consistency/lcm-lora-sdv1-5'")
print()
print("Write your scenario analysis in the next markdown cell,")
print("then implement your verification in the code cell after that.")

### Your Analysis

**Scenario A—The Concept Artist:**

*Your choice:* [Which approach?]

*Your reasoning (evaluate all 4 dimensions):*

- Speed:
- Quality:
- Flexibility:
- Composability:

---

**Scenario B—The Game Studio:**

*Your choice:* [Which approach?]

*Your reasoning (evaluate all 4 dimensions):*

- Speed:
- Quality:
- Flexibility:
- Composability:

---

**Scenario C—The A/B Tester:**

*Your choice:* [Which approach?]

*Your reasoning (evaluate all 4 dimensions):*

- Speed:
- Quality:
- Flexibility:
- Composability:

In [None]:
# ============================================================
# Your verification: pick one scenario and implement it
# ============================================================
#
# Generate with your chosen approach AND an alternative.
# Compare the results.
#
# Your code here:


<details>
<summary>Solution</summary>

There is no single correct answer—the point is applying the framework. Here is how each scenario evaluates:

**Scenario A—The Concept Artist:**

**Choose: DPM-Solver++ at 15-20 steps (Level 1).**
- Speed: 15-20 steps is fast enough for batch exploration. That is roughly 2.5x to 3.3x fewer steps than the default 50.
- Quality: No quality loss at 20 steps—"free speedup" region.
- Flexibility: Works immediately with vanilla SD 1.5. No downloading adapters.
- Composability: Not relevant—no custom adapters needed.

Why not LCM-LoRA? It would be faster (4 steps vs 20), but requires downloading the adapter. For quick exploration with a vanilla model, Level 1 has zero friction.

**Scenario B—The Game Studio:**

**Choose: LCM-LoRA at 4-8 steps (Level 3a).**
- Speed: 4-8 steps is fast. Good for batch generation.
- Quality: At 8 steps, quality is good. "Cheap speedup" region.
- Flexibility: LCM-LoRA works with their custom SD 1.5 fine-tune—same architecture.
- Composability: LCM-LoRA composes with ControlNet for pose control. This is the decisive factor.

Why not DPM-Solver++ at 20 steps? It would work too, but the studio wants speed AND ControlNet. LCM-LoRA at 8 steps with ControlNet gives both.

**Scenario C—The A/B Tester:**

**Choose: LCM-LoRA at 4 steps (Level 3a) or SDXL Turbo at 1-4 steps (Level 3b) if latency is critical.**
- Speed: Must be under 1 second. LCM-LoRA at 4 steps on a GPU should hit this. SDXL Turbo at 1 step is even faster.
- Quality: Slight reduction acceptable. Both approaches deliver at 4 steps.
- Flexibility: Fixed prompt template, no custom models—no flexibility needed.
- Composability: No adapters needed—composability is not a factor.

For this scenario, if sub-500ms is truly needed, SDXL Turbo's 1-step generation is the strongest option. The locked-model tradeoff does not matter because they do not need custom models or adapters. (We cannot test SDXL Turbo in this notebook due to VRAM constraints, but we can verify LCM-LoRA at 4 steps.)
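
A latency budget like Scenario C's can be turned into a maximum step count. The sketch below works in integer milliseconds to avoid floating-point rounding; the function name, the per-step cost, and the overhead are hypothetical and should be replaced with timings measured on your own hardware.

```python
def max_steps_within_budget(budget_ms, per_step_ms, overhead_ms):
    """Largest step count whose predicted latency fits within the budget."""
    if per_step_ms <= 0:
        raise ValueError("per_step_ms must be positive.")
    remaining = budget_ms - overhead_ms
    if remaining < 0:
        return 0  # the fixed overhead alone already exceeds the budget
    return remaining // per_step_ms


# Hypothetical costs, not measurements: 100 ms per step, 200 ms fixed overhead.
BUDGET_MS = 1000
PER_STEP_MS = 100
OVERHEAD_MS = 200

limit = max_steps_within_budget(BUDGET_MS, PER_STEP_MS, OVERHEAD_MS)
print(f"Max steps within {BUDGET_MS} ms: {limit}")
print(f"4-step LCM-LoRA fits:      {4 <= limit}")
print(f"20-step DPM-Solver++ fits: {20 <= limit}")
```

With these assumed costs, a 4-step run fits the 1-second budget and a 20-step run does not. The conclusion depends entirely on the measured per-step time, so time your own GPU before committing.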

**Example verification for Scenario A:**

```python
PROMPT_VERIFY = "a futuristic cityscape with flying cars, neon lights, wide angle"

# Approach 1: DPM-Solver++ at 20 steps (our choice)
pipe.scheduler = DPMSolverMultistepScheduler.from_config(original_scheduler_config)
img_choice, time_choice = generate_timed(
    pipe, PROMPT_VERIFY, num_inference_steps=20, guidance_scale=7.5
)

# Alternative: DPM-Solver++ at 50 steps (slower baseline)
img_alt, time_alt = generate_timed(
    pipe, PROMPT_VERIFY, num_inference_steps=50, guidance_scale=7.5
)

show_image_row(
    [img_choice, img_alt],
    [
        f"Our choice: DPM-Solver++ 20 steps\n{time_choice:.1f}s",
        f"Alternative: DPM-Solver++ 50 steps\n{time_alt:.1f}s",
    ],
    suptitle="Scenario A Verification",
)

print(f"Speedup: {time_alt / time_choice:.1f}x faster")
print("Quality comparison: nearly identical. The 'free speedup' confirmed.")
print("For 50+ variations in an afternoon, this time savings adds up.")
```

The important thing is not which specific code you wrote, but whether your analysis correctly evaluated the scenario across all four dimensions and reached a well-reasoned conclusion.

</details>

---

## Key Takeaways

1. **The first speedup is free.** Going from DDPM at 50 steps to DPM-Solver++ at 20 steps costs essentially no visible quality. This is the "free speedup" region: redundant computation removed by a better ODE solver. Every diffusion user should take this speedup.

2. **Level 1 and Level 3a are different tools, not different versions.** DPM-Solver++ (Level 1) is a better solver you swap in instantly. LCM-LoRA (Level 3a) is a consistency model adapter that bypasses the trajectory. Neither replaces the other—they serve different scenarios.

3. **Composability is a real dimension.** LCM-LoRA composes with style LoRAs because they target orthogonal behaviors (speed and style). This is not theoretical—you verified it produces a styled image at 4-step speed.

4. **The decision framework has four dimensions, not one.** Speed alone does not determine the best approach. Quality, flexibility (works with your model?), and composability (works with your adapters?) often matter more in practice.

5. **The right approach depends on your constraints.** A concept artist exploring compositions needs Level 1 (zero friction). A game studio with a custom model and ControlNet needs LCM-LoRA (composability). A real-time app needs maximum speed (Level 3a or 3b). The framework tells you which.