# Latent Consistency & Turbo

**Module 7.3, Lesson 2** | CourseAI

You know the theory from the lesson: Latent Consistency Models (LCM) apply consistency distillation to Stable Diffusion's latent space, and LCM-LoRA packages that distillation as a small plug-and-play adapter. This notebook makes that concrete.

**What you will do:**
- Load LCM-LoRA onto SD v1.5 and compare 50-step standard, 4-step standard (bad), and 4-step LCM-LoRA (good) generation with timing
- Apply the same LCM-LoRA to a community fine-tune and verify that it generalizes — the "one adapter, many models" insight
- Sweep step counts (1, 2, 4, 8) and guidance scales (1.0, 1.5, 2.0, 4.0) to find the practical sweet spots for LCM-LoRA
- Compose LCM-LoRA with a style LoRA and compare to the style LoRA alone at 50 steps

**For each exercise, PREDICT the output before running the cell.**

Every concept in this notebook comes from the lesson: LCM as consistency distillation on latent diffusion, LCM-LoRA as a speed adapter, the reduced guidance scale, and LoRA composability. No new theory, just hands-on practice with production-ready tools.

**Estimated time:** 35–50 minutes. All exercises use pre-trained models (no training). Requires a GPU runtime for reasonable generation times.

## Setup

Run this cell to install dependencies and configure the environment.

**Important:** Switch to a GPU runtime in Colab (Runtime → Change runtime type → T4 GPU). Generation works on CPU but is very slow.

In [None]:
!pip install -q diffusers transformers accelerate safetensors peft

In [None]:
import torch
import time
import matplotlib.pyplot as plt
from diffusers import (
    StableDiffusionPipeline,
    LCMScheduler,
    DPMSolverMultistepScheduler,
)
from IPython.display import display

# Reproducible results
SEED = 42

# Device
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
dtype = torch.float16 if torch.cuda.is_available() else torch.float32

# Nice plots
plt.style.use('dark_background')
plt.rcParams['figure.figsize'] = [14, 5]
plt.rcParams['figure.dpi'] = 100

print(f'Device: {device}')
print(f'Dtype: {dtype}')
if device.type == 'cpu':
    print('WARNING: No GPU detected. Generation will be very slow.')
    print('Switch to a GPU runtime: Runtime → Change runtime type → T4 GPU')
print()
print('Setup complete.')

## Shared Helpers

Utility functions for timing generation, displaying image grids, and managing pipelines. Run this cell now — these are used across all four exercises.

In [None]:
def generate_timed(pipe, prompt, num_inference_steps, guidance_scale, seed=SEED):
    """Generate an image and return (image, elapsed_seconds)."""
    # Fresh generator per call so the same seed always gives the same starting noise
    generator = torch.Generator(device=device).manual_seed(seed)
    # perf_counter is a monotonic clock intended for measuring intervals
    start = time.perf_counter()
    result = pipe(
        prompt,
        num_inference_steps=num_inference_steps,
        guidance_scale=guidance_scale,
        generator=generator,
    )
    elapsed = time.perf_counter() - start
    return result.images[0], elapsed


def show_image_row(images, titles, suptitle=None, figsize=None):
    """Display a row of PIL images with titles."""
    n = len(images)
    fig_w = figsize[0] if figsize else max(4 * n, 12)
    fig_h = figsize[1] if figsize else 5
    fig, axes = plt.subplots(1, n, figsize=(fig_w, fig_h))
    if n == 1:
        axes = [axes]
    for ax, img, title in zip(axes, images, titles):
        ax.imshow(img)
        ax.set_title(title, fontsize=10)
        ax.axis('off')
    if suptitle:
        plt.suptitle(suptitle, fontsize=13, y=1.02)
    plt.tight_layout()
    plt.show()


def show_image_grid(images, titles, nrows, ncols, suptitle=None):
    """Display a grid of PIL images with titles."""
    # squeeze=False keeps `axes` 2D even when nrows or ncols is 1
    fig, axes = plt.subplots(
        nrows, ncols, figsize=(4 * ncols, 4.5 * nrows), squeeze=False
    )
    flat_axes = axes.flatten()  # row-major order: left to right, top to bottom
    for ax, img, title in zip(flat_axes, images, titles):
        ax.imshow(img)
        ax.set_title(title, fontsize=9)
    # Hide axis decorations everywhere, including unused cells
    for ax in flat_axes:
        ax.axis('off')
    if suptitle:
        plt.suptitle(suptitle, fontsize=13, y=1.02)
    plt.tight_layout()
    plt.show()


print('Helpers defined: generate_timed, show_image_row, show_image_grid')

---

## Exercise 1: LCM-LoRA — 4-Step Generation `[Guided]`

From the lesson: LCM-LoRA is a 4 MB LoRA adapter that turns any compatible SD model into a few-step model. It captures "how to generate in 4 steps" as a low-rank bypass — the same LoRA mechanism you know from style and subject adaptation, but capturing a *skill* instead of a *style*.

We will generate three images from the same prompt and seed:
1. **Standard SD v1.5 at 50 steps** — the baseline (good quality, slow)
2. **Standard SD v1.5 at 4 steps** — same model, fewer steps (bad quality, fast)
3. **SD v1.5 + LCM-LoRA at 4 steps** — same model with the speed adapter (good quality, fast)

We time each generation so the speedup is concrete.

**Before running, predict:**
- What will the standard model at 4 steps look like? (Hint: the model was not trained for this — it expects 20–50 steps to refine details.)
- The LCM-LoRA version also runs 4 steps. Will it be closer in quality to the 4-step standard or the 50-step standard?

In [None]:
# ============================================================
# Exercise 1: Load SD v1.5 and compare generation modes
# ============================================================

PROMPT = "a small cottage on a hillside at sunset, warm golden light, detailed"

# --- Step 1: Load SD v1.5 with standard scheduler ---
print("Loading SD v1.5...")
pipe = StableDiffusionPipeline.from_pretrained(
    "stable-diffusion-v1-5/stable-diffusion-v1-5",
    torch_dtype=dtype,
    safety_checker=None,
).to(device)

# Save the original scheduler config so we can swap back and forth
original_scheduler_config = pipe.scheduler.config
print("SD v1.5 loaded.")

In [None]:
# --- Step 2: Generate at 50 steps (standard baseline) ---
pipe.scheduler = DPMSolverMultistepScheduler.from_config(original_scheduler_config)

print("Generating: 50 steps, standard scheduler, guidance_scale=7.5...")
img_50step, time_50step = generate_timed(
    pipe, PROMPT, num_inference_steps=50, guidance_scale=7.5
)
print(f"  Done in {time_50step:.2f}s")

# --- Step 3: Generate at 4 steps (standard scheduler — bad quality) ---
print("Generating: 4 steps, standard scheduler, guidance_scale=7.5...")
img_4step_bad, time_4step_bad = generate_timed(
    pipe, PROMPT, num_inference_steps=4, guidance_scale=7.5
)
print(f"  Done in {time_4step_bad:.2f}s")

In [None]:
# --- Step 4: Load LCM-LoRA and generate at 4 steps ---
# Three lines change: load the LoRA, swap the scheduler, change steps + guidance

print("Loading LCM-LoRA adapter...")
pipe.load_lora_weights("latent-consistency/lcm-lora-sdv1-5")
pipe.scheduler = LCMScheduler.from_config(original_scheduler_config)
print("LCM-LoRA loaded. Scheduler swapped to LCMScheduler.")

# Note the reduced guidance scale: 1.5 instead of 7.5
# The augmented PF-ODE already incorporates guidance into the trajectory,
# so additional guidance is redundant and causes oversaturation.
print("Generating: 4 steps, LCM scheduler, guidance_scale=1.5...")
img_4step_lcm, time_4step_lcm = generate_timed(
    pipe, PROMPT, num_inference_steps=4, guidance_scale=1.5
)
print(f"  Done in {time_4step_lcm:.2f}s")

In [None]:
# --- Step 5: Compare all three side by side ---
show_image_row(
    [img_50step, img_4step_bad, img_4step_lcm],
    [
        f"50 Steps (standard)\n{time_50step:.1f}s | guidance=7.5",
        f"4 Steps (standard)\n{time_4step_bad:.1f}s | guidance=7.5",
        f"4 Steps + LCM-LoRA\n{time_4step_lcm:.1f}s | guidance=1.5",
    ],
    suptitle=f'SD v1.5: "{PROMPT}"',
)

print("Timing comparison:")
print(f"  50-step standard:   {time_50step:.2f}s (baseline quality)")
print(f"  4-step standard:    {time_4step_bad:.2f}s (model out of comfort zone)")
print(f"  4-step + LCM-LoRA:  {time_4step_lcm:.2f}s (near-baseline quality)")
print(f"  Speedup: {time_50step / time_4step_lcm:.1f}x")
print()
print("The middle and right images share the base model, prompt, seed, and step count.")
print("What differs: the LCM-LoRA adapter, the scheduler, and the guidance scale.")
print("The LoRA changed WHAT the model does at each step, so 4 steps produce")
print("good results. The standard model at 4 steps produces poor results because")
print("it was not trained for so few steps.")

### What Just Happened

You saw the LCM-LoRA drop-in acceleration pattern in action:

- **The standard model at 4 steps is poor.** It expects 20–50 steps to refine details. At 4 steps, the denoising process is incomplete — the image is blurry, incoherent, or has obvious artifacts. This is the "model out of its comfort zone" from the lesson.

- **LCM-LoRA at 4 steps is near-baseline quality.** The LoRA modified the U-Net weights (via a low-rank bypass) so that each step covers more ground. The model was *trained* to produce good results in 4 steps via consistency distillation.

- **Three lines changed.** Load the LoRA (`load_lora_weights`), swap the scheduler (`LCMScheduler`), reduce the guidance scale (1.5 instead of 7.5). Everything else — the pipeline, the prompt, the VAE, the text encoder — is identical. This is the modularity principle from **Stable Diffusion Architecture** in action.

- **The reduced guidance scale matters.** LCM uses the augmented PF-ODE, which incorporates classifier-free guidance *into the trajectory*. Additional guidance on top of that is redundant and causes oversaturation. That is why we dropped from 7.5 to 1.5.
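
To make the guidance-scale point concrete, here is a minimal NumPy sketch of the standard classifier-free guidance combination that `guidance_scale` controls. The arrays stand in for the U-Net's unconditional and text-conditioned noise predictions. The function name `apply_cfg` and the toy values are illustrative assumptions, not diffusers internals.

```python
import numpy as np

def apply_cfg(eps_uncond, eps_cond, guidance_scale):
    """Classifier-free guidance: start at the unconditional prediction and
    move toward (and, for scales above 1, past) the conditional one."""
    return eps_uncond + guidance_scale * (eps_cond - eps_uncond)

# Toy stand-ins for two U-Net noise predictions (illustrative values only).
eps_uncond = np.array([0.0, 0.2, -0.1])
eps_cond = np.array([0.1, 0.0, 0.3])

for w in (1.0, 1.5, 7.5):
    guided = apply_cfg(eps_uncond, eps_cond, w)
    # How far the guided prediction overshoots the plain conditional one.
    overshoot = np.linalg.norm(guided - eps_cond)
    print(f"guidance_scale={w}: overshoot={overshoot:.3f}")
```

At scale 1.0 the result is exactly the conditional prediction, and the overshoot grows linearly with `guidance_scale - 1`. Because the LCM distillation target already includes guidance, a large scale at inference stacks a second push on top of the first, which is one way to read the oversaturation you see at high scales.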

---

## Exercise 2: LCM-LoRA Universality `[Guided]`

From the lesson: the LCM-LoRA trained on the base SD v1.5 checkpoint can be applied to Dreamshaper, Realistic Vision, Anything V5, or any other SD v1.5-compatible fine-tune. One 4 MB file turns all of them into 4-step models.

Why does this work? The LCM-LoRA captures a general behavior change: "collapse the denoising trajectory into 4 steps." This is about the *dynamics of denoising*, not about specific content or style.

We will:
1. Load a community fine-tune compatible with SD v1.5
2. Apply the SAME LCM-LoRA from Exercise 1
3. Generate at 4 steps and compare to the base model's 4-step LCM-LoRA output

**Before running, predict:**
- Will the same LCM-LoRA work on a different fine-tune? Why or why not?
- The LCM-LoRA was trained on the base SD v1.5 checkpoint. The fine-tune has different weights. Should the LoRA still be effective?

In [None]:
# ============================================================
# Exercise 2: Apply LCM-LoRA to a community fine-tune
# ============================================================

# Clean up the previous pipeline to free GPU memory
del pipe
if torch.cuda.is_available():
    torch.cuda.empty_cache()

PROMPT_2 = "a medieval castle on a cliff overlooking the sea, dramatic lighting"

# --- Step 1: Load a community fine-tune ---
# Dreamshaper is a popular SD v1.5-compatible fine-tune with a
# distinctive artistic style.
print("Loading Dreamshaper (SD v1.5-compatible fine-tune)...")
pipe_finetune = StableDiffusionPipeline.from_pretrained(
    "Lykon/dreamshaper-8",
    torch_dtype=dtype,
    safety_checker=None,
).to(device)

original_ft_scheduler_config = pipe_finetune.scheduler.config
print("Dreamshaper loaded.")

In [None]:
# --- Step 2: Generate at 50 steps (standard) for baseline ---
pipe_finetune.scheduler = DPMSolverMultistepScheduler.from_config(
    original_ft_scheduler_config
)

print("Generating: Dreamshaper at 50 steps (baseline)...")
img_ft_50, time_ft_50 = generate_timed(
    pipe_finetune, PROMPT_2, num_inference_steps=50, guidance_scale=7.5
)
print(f"  Done in {time_ft_50:.2f}s")

# --- Step 3: Load the SAME LCM-LoRA and generate at 4 steps ---
print("Loading LCM-LoRA onto Dreamshaper...")
pipe_finetune.load_lora_weights("latent-consistency/lcm-lora-sdv1-5")
pipe_finetune.scheduler = LCMScheduler.from_config(original_ft_scheduler_config)
print("LCM-LoRA loaded onto fine-tune.")

print("Generating: Dreamshaper + LCM-LoRA at 4 steps...")
img_ft_4_lcm, time_ft_4_lcm = generate_timed(
    pipe_finetune, PROMPT_2, num_inference_steps=4, guidance_scale=1.5
)
print(f"  Done in {time_ft_4_lcm:.2f}s")

In [None]:
# --- Step 4: Compare the fine-tune at 50 steps vs 4 steps with LCM-LoRA ---
show_image_row(
    [img_ft_50, img_ft_4_lcm],
    [
        f"Dreamshaper — 50 Steps (standard)\n{time_ft_50:.1f}s | guidance=7.5",
        f"Dreamshaper + LCM-LoRA — 4 Steps\n{time_ft_4_lcm:.1f}s | guidance=1.5",
    ],
    suptitle=f'Universality Test: "{PROMPT_2}"',
)

print("The SAME LCM-LoRA file that worked on base SD v1.5 works on Dreamshaper.")
print()
print("Why this works:")
print("- The LCM-LoRA captures 'how to generate in 4 steps' — the dynamics")
print("  of fast denoising, not a specific visual style.")
print("- Dreamshaper is SD v1.5-compatible: same U-Net architecture, same")
print("  layer dimensions, same attention heads. The LoRA matrices fit.")
print("- The fine-tune follows the same ODE trajectory structure — it just")
print("  arrives at a different endpoint (its distinctive style).")
print("- One 4 MB adapter → every SD v1.5-compatible model becomes a 4-step model.")
print()
print("Note: the 4-step output may look slightly different in character from")
print("the 50-step output. The LCM-LoRA was distilled from the base model,")
print("not from Dreamshaper specifically. Minor quality variations are expected.")

### What Just Happened

You verified the "one adapter, many models" universality insight:

- **The same LCM-LoRA works on a different fine-tune.** The adapter was trained on base SD v1.5, but it transfers to Dreamshaper because both share the same U-Net architecture. The LoRA captures the *skill* of fast generation, not model-specific content.

- **This is the same swappability pattern from LoRA Fine-Tuning and ControlNet.** A small adapter file works across all compatible base models. LCM-LoRA extends this pattern to speed: one speed adapter for an entire model family.

- **Universality has limits.** This LCM-LoRA is architecture-specific. It works on any SD v1.5-compatible model but does NOT work on SDXL (different U-Net dimensions, different attention heads). Separate LCM-LoRA checkpoints exist for SD v1.5 and SDXL.
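
The architecture constraint comes down to matrix shapes, which the following NumPy sketch illustrates under stated assumptions. The widths 768 and 2048 are the text-context widths commonly cited for SD v1.5 and SDXL; the remaining layer sizes, the rank, and the helper names `lora_delta` and `lora_fits` are hypothetical and chosen only for illustration.

```python
import numpy as np

def lora_delta(B, A, scale=1.0):
    """Low-rank weight update scale * (B @ A), with shape (d_out, d_in)."""
    return scale * (B @ A)

def lora_fits(weight, B, A):
    """True if the LoRA update has the same shape as the target weight."""
    return B.shape[1] == A.shape[0] and (B.shape[0], A.shape[1]) == weight.shape

rank = 4  # hypothetical rank, for illustration only
rng = np.random.default_rng(0)

# A cross-attention key projection reading a 768-wide text context (SD v1.5 style).
w_sd15 = rng.normal(size=(320, 768))
# A comparable projection reading a 2048-wide text context (SDXL style).
w_sdxl = rng.normal(size=(640, 2048))

# LoRA factors sized for the SD v1.5-style layer.
B = rng.normal(size=(320, rank))
A = rng.normal(size=(rank, 768))

print("fits SD v1.5-style layer:", lora_fits(w_sd15, B, A))
print("fits SDXL-style layer:", lora_fits(w_sdxl, B, A))
```

A fine-tune such as Dreamshaper changes the values inside each weight matrix but not its shape, so the same factors can still be added. A different architecture changes the shapes, and the addition is no longer defined.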

---

## Exercise 3: Step Count and Guidance Scale Exploration `[Supported]`

From the lesson: LCM typically uses a reduced guidance scale (1.0–2.0) compared to standard SD (7.0–7.5), and quality plateaus at around 4 steps. But where exactly are the sweet spots?

Your task: generate a grid of images varying step count and guidance scale, then identify:
- At which step count does quality plateau?
- At which guidance scale does oversaturation appear?
- What is the practical sweet spot for LCM-LoRA?

Fill in the TODO markers to complete the sweep.

In [None]:
# ============================================================
# Exercise 3: Sweep step counts and guidance scales
# ============================================================

# We reuse the pipeline from Exercise 2 (Dreamshaper + LCM-LoRA already loaded).
# If you restarted the runtime, re-run Exercises 1–2 first.

PROMPT_3 = "a red fox sitting in a snowy forest, soft morning light, highly detailed"

step_counts = [1, 2, 4, 8]
guidance_scales = [1.0, 1.5, 2.0, 4.0]

# TODO: Generate an image for each (step_count, guidance_scale) combination.
# Store the results in `grid_images` (a list of PIL images) and
# `grid_titles` (a list of strings describing each image).
#
# Use the `generate_timed` helper. You need:
#   - pipe_finetune as the pipeline (already has LCM-LoRA and LCMScheduler)
#   - PROMPT_3 as the prompt
#   - The same seed (SEED) for all images so differences come from
#     step count and guidance scale, not random noise
#
# Loop structure:
#   for steps in step_counts:
#       for gs in guidance_scales:
#           image, elapsed = generate_timed(pipe, prompt, steps, gs)
#           append image and a descriptive title

grid_images = []
grid_titles = []

print(f"Generating {len(step_counts) * len(guidance_scales)} images...")
print(f"Step counts: {step_counts}")
print(f"Guidance scales: {guidance_scales}")
print()

for steps in step_counts:
    for gs in guidance_scales:
        # TODO: Generate the image and append to grid_images and grid_titles
        # Title format suggestion: f"{steps} steps | cfg={gs}\n{elapsed:.1f}s"
        raise NotImplementedError(
            "TODO: Generate the image with generate_timed() and append to "
            "grid_images and grid_titles. See the loop structure hint above."
        )

In [None]:
# Display the grid: 4 rows (step counts) × 4 columns (guidance scales)
show_image_grid(
    grid_images,
    grid_titles,
    nrows=len(step_counts),
    ncols=len(guidance_scales),
    suptitle=(
        f'LCM-LoRA Parameter Sweep: "{PROMPT_3}"\n'
        f'Rows: step count ({step_counts}) | Columns: guidance scale ({guidance_scales})'
    ),
)

print("What to look for:")
print()
print("STEP COUNT (rows, top to bottom):")
print("- 1 step: recognizable subject and composition, but soft textures.")
print("  Adequate for previews and rapid iteration.")
print("- 2 steps: significant quality jump. Most textures resolved.")
print("- 4 steps: the practical sweet spot. Near-baseline quality.")
print("- 8 steps: diminishing returns. Hard to distinguish from 4 steps.")
print()
print("GUIDANCE SCALE (columns, left to right):")
print("- 1.0: minimal guidance. May look slightly less prompt-faithful.")
print("- 1.5: the typical LCM sweet spot. Good balance.")
print("- 2.0: slightly more saturated. Still acceptable.")
print("- 4.0: likely oversaturated. Colors pushed too far.")
print("  The augmented PF-ODE already incorporates guidance, so adding more")
print("  is redundant. This is unlike standard SD where 7.5 is typical.")

<details>
<summary>Solution</summary>

The key insight is that LCM-LoRA has a different sweet spot than standard SD: fewer steps (4 instead of 50) and lower guidance (1.0–2.0 instead of 7.0–7.5). The augmented PF-ODE bakes guidance into the trajectory, making additional guidance redundant.

```python
grid_images = []
grid_titles = []

print(f"Generating {len(step_counts) * len(guidance_scales)} images...")
print(f"Step counts: {step_counts}")
print(f"Guidance scales: {guidance_scales}")
print()

for steps in step_counts:
    for gs in guidance_scales:
        img, elapsed = generate_timed(
            pipe_finetune, PROMPT_3,
            num_inference_steps=steps,
            guidance_scale=gs,
        )
        grid_images.append(img)
        grid_titles.append(f"{steps} steps | cfg={gs}\n{elapsed:.1f}s")
        print(f"  {steps} steps, cfg={gs}: {elapsed:.1f}s")
```

**What you should observe:**
- Quality improves significantly from 1 to 2 steps, moderately from 2 to 4, and barely from 4 to 8
- Guidance scale 1.0–1.5 looks natural; 4.0 likely shows oversaturation (overly vivid colors, harsh contrasts)
- The practical sweet spot is 4 steps at guidance 1.5

**Common mistakes:**
- Forgetting to pass the same `seed` to all calls (makes the comparison unfair — differences would come from random noise, not parameters)
- Using standard guidance scale (7.5) — this causes severe oversaturation with LCM because the augmented PF-ODE already incorporates guidance

</details>

### What Just Happened

You mapped the practical parameter space of LCM-LoRA:

- **Quality plateaus at 4 steps.** The jump from 1 to 2 steps is dramatic (soft to mostly-resolved). From 2 to 4 is noticeable. From 4 to 8 shows diminishing returns. The consistency model was optimized for low step counts.

- **Guidance scale should be low (1.0–2.0).** The augmented PF-ODE already incorporates CFG guidance into the trajectory. Adding more guidance on top is redundant and causes oversaturation — visible as unnaturally vivid colors and harsh contrasts at guidance 4.0.

- **The practical sweet spot is 4 steps at guidance 1.5.** This is the combination that balances quality, speed, and prompt fidelity. Most LCM-LoRA workflows should start here.
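
The step-count behavior is easier to reason about with the sampling loop in view. Below is a sketch of multistep consistency sampling on a DDPM-style noise schedule: predict a clean sample, re-noise it to a lower noise level, and repeat. It is a toy under stated assumptions, not the `LCMScheduler` source. The schedule, the timestep list, and the single-point "dataset" are all illustrative.

```python
import numpy as np

def multistep_consistency_sample(f, alphas_cumprod, timesteps, shape, rng):
    """Multistep consistency sampling.

    f(x_t, t) returns an estimate of the clean sample x_0.
    alphas_cumprod[t] is the cumulative signal level at timestep t.
    timesteps is a decreasing list with one entry per sampling step.
    """
    x = rng.normal(size=shape)  # start from pure noise
    x0 = None
    for i, t in enumerate(timesteps):
        x0 = f(x, t)  # jump straight to a clean estimate
        if i + 1 < len(timesteps):
            a = alphas_cumprod[timesteps[i + 1]]
            noise = rng.normal(size=shape)
            # Re-noise the estimate to the next (lower) noise level.
            x = np.sqrt(a) * x0 + np.sqrt(1.0 - a) * noise
    return x0

# Toy setup: the "dataset" is a single point, so a perfect consistency
# function maps any noisy input straight to that point.
target = np.array([1.0, -2.0, 0.5])
calls = []

def toy_f(x_t, t):
    calls.append(t)  # record one "network evaluation" per step
    return target.copy()

alphas_cumprod = np.linspace(0.9999, 0.0001, 1000)  # signal falls as t grows
sample = multistep_consistency_sample(
    toy_f, alphas_cumprod, timesteps=[999, 759, 499, 259],
    shape=target.shape, rng=np.random.default_rng(0),
)
print(sample, len(calls))
```

Each step costs one evaluation of `f`, so 4 steps means 4 network calls. With a perfect `f`, one step would already be exact. A real distilled model is imperfect, and the extra denoise and re-noise rounds give it a chance to correct its own estimate, which is consistent with the large gain from 1 to 2 steps and the small gain from 4 to 8.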

---

## Exercise 4: LCM-LoRA + Style LoRA Composition `[Independent]`

From the lesson: LoRA weights are additive. You can load an LCM-LoRA *and* a style LoRA at the same time — the LCM-LoRA captures "generate in 4 steps" and the style LoRA captures a visual style. Both bypasses are summed.

**Your task:**

1. Load a fresh SD v1.5 pipeline
2. Load a style LoRA and generate at 50 steps — this is your style baseline
3. Load the LCM-LoRA *alongside* the style LoRA, swap to LCMScheduler, and generate at 4 steps
4. Compare: does the 4-step LCM + style result preserve the style while being fast?
5. Experiment with different LoRA adapter weight scales to balance the two influences

**Suggested style LoRA:** `"artificialguybr/pixelartredmond-1-5v-pixel-art-loras-for-sd-1-5"` (SD v1.5-compatible pixel art LoRA, trigger word: `PixArFK`). Any SD v1.5-compatible style LoRA from the HuggingFace Hub will work.

**Hints:**
- Use `pipe.load_lora_weights(lcm_repo_id, adapter_name="lcm")` and `pipe.load_lora_weights(style_repo_id, adapter_name="style")` to load multiple LoRAs under separate names (the first argument is the Hub repo ID or local path, not the adapter name)
- Use `pipe.set_adapters(["lcm", "style"], adapter_weights=[1.0, 0.8])` to control the weight of each
- Generate with the same prompt and seed for fair comparison
- Try varying the style LoRA weight (0.5, 0.8, 1.0) while keeping LCM weight at 1.0
- Include the trigger word (`PixArFK`) in your prompt if using the suggested LoRA

**Before running, predict:**
- Will the composed result look like the style LoRA output, but generated faster?
- What happens if you set the LCM weight to 0.5? Will the style get stronger while generation quality drops?

In [None]:
# ============================================================
# Exercise 4: Your code here
# ============================================================
#
# Suggested structure:
#
# 1. Clean up previous pipeline:
#    del pipe_finetune
#    torch.cuda.empty_cache()
#
# 2. Load a fresh SD v1.5 pipeline
#
# 3. Generate a style baseline:
#    a. Load a style LoRA
#    b. Generate at 50 steps with standard scheduler and guidance 7.5
#    c. Save this image as your "style baseline"
#
# 4. Add LCM-LoRA alongside the style LoRA:
#    a. Load LCM-LoRA as a second adapter
#    b. Use pipe.set_adapters() to activate both with chosen weights
#    c. Swap to LCMScheduler
#    d. Generate at 4 steps with guidance 1.5
#
# 5. Experiment with weight scales:
#    - Try style_weight = 0.5, 0.8, 1.0 with lcm_weight = 1.0
#    - Observe how the balance shifts between style fidelity and
#      generation quality
#
# 6. Display comparison:
#    - Style LoRA alone at 50 steps (baseline)
#    - LCM + Style at 4 steps (various weight combinations)
#
# Remember:
# - Same PROMPT and SEED for all generations
# - LCM needs guidance_scale around 1.5, not 7.5
# - The style baseline at 50 steps uses guidance_scale 7.5
# - pipe.set_adapters(["lcm", "style"], adapter_weights=[lcm_w, style_w])
#   controls the balance


<details>
<summary>Solution</summary>

The key insight is that LoRA bypasses are additive — you can sum multiple low-rank updates and the model applies all of them. The LCM-LoRA modifies the denoising dynamics ("generate faster") while the style LoRA modifies the output aesthetics ("generate in this style"). These are largely orthogonal modifications, so they compose well.

```python
# Clean up
del pipe_finetune
if torch.cuda.is_available():
    torch.cuda.empty_cache()

PROMPT_4 = "a lighthouse on a rocky coast during a storm, waves crashing, PixArFK"

# --- Load a fresh SD v1.5 pipeline ---
pipe_comp = StableDiffusionPipeline.from_pretrained(
    "stable-diffusion-v1-5/stable-diffusion-v1-5",
    torch_dtype=dtype,
    safety_checker=None,
).to(device)

comp_scheduler_config = pipe_comp.scheduler.config

# --- Style baseline: style LoRA alone at 50 steps ---
STYLE_LORA = "artificialguybr/pixelartredmond-1-5v-pixel-art-loras-for-sd-1-5"

pipe_comp.load_lora_weights(STYLE_LORA, adapter_name="style")
pipe_comp.set_adapters(["style"], adapter_weights=[1.0])
pipe_comp.scheduler = DPMSolverMultistepScheduler.from_config(comp_scheduler_config)

print("Generating: style LoRA alone at 50 steps (baseline)...")
img_style_baseline, t_style = generate_timed(
    pipe_comp, PROMPT_4, num_inference_steps=50, guidance_scale=7.5
)
print(f"  Done in {t_style:.1f}s")

# --- Load LCM-LoRA alongside the style LoRA ---
pipe_comp.load_lora_weights(
    "latent-consistency/lcm-lora-sdv1-5",
    adapter_name="lcm",
)
pipe_comp.scheduler = LCMScheduler.from_config(comp_scheduler_config)

# --- Generate with different style weight combinations ---
images = [img_style_baseline]
titles = [f"Style LoRA only — 50 steps\n{t_style:.1f}s | guidance=7.5"]

for style_w in [0.5, 0.8, 1.0]:
    pipe_comp.set_adapters(
        ["lcm", "style"],
        adapter_weights=[1.0, style_w],
    )
    img, elapsed = generate_timed(
        pipe_comp, PROMPT_4, num_inference_steps=4, guidance_scale=1.5
    )
    images.append(img)
    titles.append(f"LCM + style={style_w} — 4 steps\n{elapsed:.1f}s")
    print(f"  LCM=1.0, style={style_w}: {elapsed:.1f}s")

show_image_row(
    images,
    titles,
    suptitle=f'LoRA Composition: "{PROMPT_4}"',
)
```

**Why LoRA composition works:** LoRA bypasses are additive to the base weights: $W_{\text{effective}} = W_{\text{base}} + \alpha_1 B_1 A_1 + \alpha_2 B_2 A_2$, where $B_i A_i$ is the low-rank update of adapter $i$ and $\alpha_i$ is its adapter weight (the value passed to `set_adapters`). Each LoRA modifies a different "axis" of behavior. The LCM-LoRA modifies the denoising dynamics; the style LoRA modifies the output aesthetics. Because they target different aspects of the model's behavior, they compose without significant interference.
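
The additive picture can be checked numerically with a minimal NumPy sketch. It assumes the simplest possible model of composition (one weight matrix, a plain weighted sum of low-rank updates). The sizes, the random factors, and the helper name `compose` are hypothetical, and real adapters also carry their own internal scaling.

```python
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in, rank = 8, 6, 2  # tiny illustrative sizes

W_base = rng.normal(size=(d_out, d_in))
B_lcm, A_lcm = rng.normal(size=(d_out, rank)), rng.normal(size=(rank, d_in))
B_style, A_style = rng.normal(size=(d_out, rank)), rng.normal(size=(rank, d_in))

def compose(W, adapters):
    """Return W plus the weighted sum of B @ A over (weight, B, A) triples."""
    W_eff = W.copy()
    for weight, B, A in adapters:
        W_eff += weight * (B @ A)
    return W_eff

both = compose(W_base, [(1.0, B_lcm, A_lcm), (0.8, B_style, A_style)])
swapped = compose(W_base, [(0.8, B_style, A_style), (1.0, B_lcm, A_lcm)])
lcm_only = compose(W_base, [(1.0, B_lcm, A_lcm), (0.0, B_style, A_style)])

# The order in which adapters are added does not matter.
print(np.allclose(both, swapped))
# A weight of 0 removes that adapter's contribution entirely.
print(np.allclose(lcm_only, W_base + B_lcm @ A_lcm))
```

In this simplified view, lowering the style weight shrinks only the style term and leaves the LCM term untouched, which is why the two influences can be balanced independently.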

**Weight scale effects:**
- LCM weight below 1.0: the model partially reverts to multi-step behavior. At 4 steps, quality may degrade because the model is not fully in "LCM mode."
- Style weight below 1.0: the style effect is reduced. At 0.5, you see a faint pixel art influence; at 1.0, the style dominates.
- The sweet spot is typically LCM weight=1.0 and style weight=0.7–1.0.

**Common mistakes:**
- Forgetting to swap to LCMScheduler when using LCM-LoRA (the output is typically noisy or broken)
- Using guidance_scale=7.5 with LCM-LoRA (causes oversaturation)
- Loading LoRAs without adapter names (makes it impossible to control weights independently)
- Not installing the `peft` library (diffusers relies on it for named adapters and `set_adapters`)
- Forgetting the trigger word (`PixArFK`) in the prompt — the style LoRA may not activate without it

</details>

---

## Key Takeaways

1. **LCM-LoRA is a drop-in accelerator.** Three lines of code — load the LoRA, swap the scheduler, reduce the guidance scale — turn any SD v1.5 pipeline from 50-step to 4-step generation with near-baseline quality. The model, the prompt, the VAE, the text encoder all stay the same.

2. **One adapter, many models.** The same LCM-LoRA trained on base SD v1.5 works on Dreamshaper, Realistic Vision, and other compatible fine-tunes. It captures the *skill* of fast generation (denoising dynamics), not a specific visual style. This is universality within an architecture family.

3. **LCM-LoRA has different sweet spots than standard SD.** Quality plateaus at 4 steps (not 50). Guidance scale should be 1.0–2.0 (not 7.0–7.5). The augmented PF-ODE bakes guidance into the trajectory, making additional guidance redundant and harmful.

4. **LoRA composition works.** LCM-LoRA + style LoRA can be loaded simultaneously because LoRA bypasses are additive. The LCM-LoRA handles speed while the style LoRA handles aesthetics. Adjust adapter weights to balance the two influences.

5. **Same recipe, bigger kitchen.** Everything you saw here is the consistency distillation from the previous lesson — applied to Stable Diffusion's latent space instead of 2D point clouds. The training procedure is structurally identical. The LoRA variant captures the weight change as a portable adapter.