<a href="https://colab.research.google.com/github/Evil-Tux/Diffusion-Models/blob/main/Article_3_Fine_Tune_Pretrained.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
!pip install diffusers==0.16.1 accelerate open_clip_torch transformers

In [None]:
import torch
import torch.nn as nn
from torch.utils.data import DataLoader

import torchvision
from torchvision.transforms import Compose, Resize, ToTensor, ToPILImage

from diffusers import DDPMScheduler, DDIMScheduler, DDPMPipeline, DDIMPipeline

from matplotlib import pyplot as plt
from PIL import Image
from tqdm import tqdm
import numpy as np

def plot_images(images, n=8, axs=None):
    if axs is None:
        fig, axs = plt.subplots(1, n, figsize=(10, 3))
    assert len(axs) == len(images)
    for i, img in enumerate(images):
        axs[i].axis('off')
        if isinstance(img, torch.Tensor):
            img = ToPILImage()((img/2+0.5).clamp(0, 1))
        axs[i].imshow(img.resize((64, 64), resample=Image.NEAREST), cmap='gray_r', vmin=0, vmax=255)

## Fine-Tuning Pretrained Models

It is the 2020s, and hardly anyone trains deep learning models from scratch anymore. Instead, most practitioners fine-tune an existing pretrained model to fit their own purposes. Fine-tuning lets individuals and companies alike produce a usable model with far fewer resources. The [Hugging Face Model Hub](https://huggingface.co/models) hosts many pretrained models to choose from, with different weights and architectures. In this post, we’re going to fine-tune a pretrained model and upload it to Hugging Face.

Let's take a pretrained pipeline, Google's `ddpm-cifar10-32`, a diffusion model trained on the CIFAR10 dataset, and fine-tune it on our own MNIST digits.

### Loading Pretrained Pipeline

Let's load the pretrained pipeline directly from the Hugging Face Hub:

In [None]:
device = 'cuda' if torch.cuda.is_available() else 'cpu'
image_pipe = DDPMPipeline.from_pretrained("google/ddpm-cifar10-32")
image_pipe.to(device)

In [None]:
image_pipe.unet.num_parameters()/1e6

It’s now time to generate some images, so we know what we're dealing with here:

In [None]:
images = image_pipe(batch_size=8).images

The cell above works, but it is slow: by default, the DDPM pipeline runs 1,000 denoising steps, and each step is a full forward pass through the model.

Instead, let's create a much faster DDIM scheduler and assign it to the pipeline:


In [None]:
scheduler = DDIMScheduler.from_pretrained('google/ddpm-cifar10-32')
image_pipe.scheduler = scheduler
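
To see why a few dozen DDIM steps are so much cheaper than the DDPM default, here is a minimal sketch of how inference timesteps can be spread over the training range. The helper `ddim_timesteps` is hypothetical (it is not part of diffusers). It assumes 1,000 training timesteps and evenly spaced steps with no offset, which is my understanding of the scheduler's default behavior rather than a copy of its implementation.

```python
def ddim_timesteps(num_train_timesteps: int, num_inference_steps: int) -> list:
    """Evenly spaced timesteps, ordered from noisiest to cleanest.

    Hypothetical helper for illustration only; assumes no step offset.
    """
    step_ratio = num_train_timesteps // num_inference_steps
    ascending = [i * step_ratio for i in range(num_inference_steps)]
    return ascending[::-1]


timesteps = ddim_timesteps(num_train_timesteps=1000, num_inference_steps=40)
print(len(timesteps), timesteps[:3], timesteps[-3:])
# 40 [975, 950, 925] [50, 25, 0]
```

With 40 steps, the model is called 40 times per batch instead of 1,000, which is where the speed-up comes from.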

In [None]:
images = image_pipe(batch_size=8, num_inference_steps=40).images

Much better! Let's take a look at the images.

In [None]:
plot_images(images)

They are indeed CIFAR-ish, as expected. Notice that, unlike MNIST images, they are colored RGB images. Therefore, we need to adjust our own images to the pipeline's expected input: 3-channel 32x32 pixel images. We can use Torchvision's Lambda transform to replicate MNIST's single channel three times.

In [None]:
from torchvision.transforms import Lambda

composed = Compose([Resize(32), ToTensor(), Lambda(lambda x: x.repeat(3, 1, 1))])
dataset = torchvision.datasets.MNIST(root="mnist/", train=True, download=True, transform=composed)
train_dataloader = DataLoader(dataset, batch_size=64, shuffle=True)
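
If the `repeat(3, 1, 1)` call looks cryptic, the sketch below tracks shapes only, without any tensors. The helper `repeat_shape` is hypothetical and written just for this illustration; it assumes one repeat factor per dimension, which is the case in our transform.

```python
def repeat_shape(shape, repeats):
    """Shape of the result of tensor.repeat(*repeats) when there is
    exactly one repeat factor per dimension (illustrative helper)."""
    assert len(shape) == len(repeats)
    return tuple(size * factor for size, factor in zip(shape, repeats))


mnist_shape = (1, 32, 32)  # one channel, after Resize(32) and ToTensor()
rgb_like_shape = repeat_shape(mnist_shape, (3, 1, 1))
full_batch_shape = (64, *rgb_like_shape)  # batch_size=64 in the DataLoader

print(rgb_like_shape, full_batch_shape)
# (3, 32, 32) (64, 3, 32, 32)
```

Only the channel dimension is tiled, so the three channels hold identical copies of the same grayscale digit.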

In [None]:
images = next(iter(train_dataloader))[0][:8]

In [None]:
plot_images(images)

### Training Loop

The training loop is essentially the same as the one we used to train a model from scratch. There are a few small differences, since we're taking both the scheduler and the model directly from the pipeline (`image_pipe.scheduler` and `image_pipe.unet`, respectively).

It goes over the same steps (1-6; see below) and typical PyTorch training stuff (computing gradients, updating parameters, zeroing gradients).

This code was adapted from [Unit 2 of Hugging Face's Diffusion Models class](https://colab.research.google.com/github/huggingface/diffusion-models-class/blob/main/unit2/01_finetuning_and_guidance.ipynb). Let's take a look at the code below.

In [None]:
device = 'cuda' if torch.cuda.is_available() else 'cpu'

# Sending pipeline to device
image_pipe.to(device)

# AdamW over the UNet's parameters, with a small learning rate for fine-tuning
optimizer = torch.optim.AdamW(image_pipe.unet.parameters(), lr=1e-5)
loss_fn = nn.MSELoss()

# Fetching scheduler from the pipeline
num_train_timesteps = image_pipe.scheduler.config.num_train_timesteps

In [None]:
losses = []

for epoch in tqdm(range(3)):
    for step, batch in tqdm(enumerate(train_dataloader), total=len(train_dataloader)):
        # Step 1: Fetch Clean Images
        clean_images = batch[0].to(device)

        # Step 2: Generate (Full) Noise
        noise = torch.randn_like(clean_images).to(device)

        # Step 3: Random Timesteps
        bs = clean_images.shape[0]
        t = torch.randint(0, num_train_timesteps, (bs,), device=device).long()

        # Step 4: Add Noise to Clean Images
        # Fetching scheduler from the pipeline
        noisy_images = image_pipe.scheduler.add_noise(clean_images, noise, t)

        # Step 5: Predict (Full) Noise from (Partially) Noisy Images
        # Fetching model from the pipeline
        noise_pred = image_pipe.unet(noisy_images, t, return_dict=False)[0]

        # Step 6: Compute Loss
        loss = loss_fn(noise_pred, noise)

        # Regular PyTorch training loop stuff
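        # backward() needs no argument for a scalar loss; passing the loss
        # itself would scale every gradient by the loss value
        loss.backward()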
        losses.append(loss.item())
        optimizer.step()
        optimizer.zero_grad()

    loss_last_epoch = sum(losses[-len(train_dataloader) :]) / len(train_dataloader)
    print(f"Epoch:{epoch+1}, loss: {loss_last_epoch}")
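
Step 4 in the loop above hides the forward diffusion formula inside `add_noise`: the noisy image is a weighted mix of the clean image and the noise, with weights `sqrt(alpha_bar_t)` and `sqrt(1 - alpha_bar_t)`. Here is a minimal scalar sketch of that mix. The linear beta schedule (from 1e-4 to 0.02 over 1,000 steps) is an assumption about the checkpoint's configuration, and both function names are made up for this example.

```python
import math


def alphas_cumprod(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative product of (1 - beta_t) for a linear beta schedule.

    The schedule values are assumed, not read from the checkpoint.
    """
    result, running = [], 1.0
    for i in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * i / (num_steps - 1)
        running *= 1.0 - beta
        result.append(running)
    return result


def add_noise_scalar(x0, eps, alpha_bar_t):
    """Forward diffusion for a single pixel value."""
    return math.sqrt(alpha_bar_t) * x0 + math.sqrt(1.0 - alpha_bar_t) * eps


alpha_bar = alphas_cumprod()
for t in (0, 500, 999):
    signal_weight = math.sqrt(alpha_bar[t])
    noise_weight = math.sqrt(1.0 - alpha_bar[t])
    print(f"t={t}: signal weight {signal_weight:.4f}, noise weight {noise_weight:.4f}")
```

At `t=0` the image is almost untouched, while at the last timestep it is almost pure noise, which is why sampling random timesteps exposes the model to every noise level.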

In [None]:
plt.plot(losses)

Given the size of the model, fine-tuning it may take around 20 minutes on Google Colab. If you'd rather not wait, you can skip the training: we'll load the already fine-tuned model from the Hub a couple of sections below.
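
For a rough feel of where that time goes, we can count the optimizer steps. This is simple arithmetic, assuming the 60,000 images in MNIST's training split, our batch size of 64, three epochs, and the `DataLoader` keeping the last, smaller batch (its default behavior).

```python
import math

num_images = 60_000  # size of MNIST's training split
batch_size = 64
num_epochs = 3

# The last batch of each epoch is smaller, but it still counts as a step
steps_per_epoch = math.ceil(num_images / batch_size)
total_steps = steps_per_epoch * num_epochs

print(steps_per_epoch, total_steps)
# 938 2814
```

Every one of those steps is a forward and a backward pass through the whole UNet.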

But, if you're running it and waiting for it to finish, you should expect to see losses like this:

![](https://github.com/dvgodoy/DiffusionModels101_ODSC_Europe2023/blob/main/images/diffusion_finetuning_mnist_losses.png?raw=true)
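
Per-step losses tend to be noisy, since every batch draws different random timesteps and noise. If you'd like a smoother curve, a moving average is a simple option. The sketch below uses a made-up `moving_average` helper and toy loss values, not numbers from the actual run.

```python
def moving_average(values, window):
    """Average of each consecutive run of `window` values."""
    if window <= 0:
        raise ValueError("window must be positive")
    return [
        sum(values[i : i + window]) / window
        for i in range(len(values) - window + 1)
    ]


toy_losses = [0.9, 0.5, 0.7, 0.3, 0.5, 0.1]  # made-up values
smoothed = moving_average(toy_losses, window=3)
print([round(value, 3) for value in smoothed])
# [0.7, 0.5, 0.5, 0.3]
```

In the notebook, you could pass `losses` with a larger window (say, 50) and plot the result with `plt.plot`.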

### Pushing to Hub

You don't need to run these cells - I've kept them here so you can see how to push a fine-tuned pipeline to the Hugging Face Hub, in case you'd like to share it with others.

In [None]:
from huggingface_hub import notebook_login

notebook_login()

In [None]:
from huggingface_hub import HfApi, create_repo

# Save the fine-tuned pipeline locally first, so the folders uploaded below exist
image_pipe.save_pretrained("ddpm-cifar10-32-mnist")

hub_model_id = "dvgodoy/ddpm-cifar10-32-mnist"
create_repo(hub_model_id)
api = HfApi()
# Keep the subfolder names in the repo: that's the layout from_pretrained() looks for
api.upload_folder(
    folder_path="ddpm-cifar10-32-mnist/scheduler", path_in_repo="scheduler", repo_id=hub_model_id
)
api.upload_folder(folder_path="ddpm-cifar10-32-mnist/unet", path_in_repo="unet", repo_id=hub_model_id)
api.upload_file(
    path_or_fileobj="ddpm-cifar10-32-mnist/model_index.json",
    path_in_repo="model_index.json",
    repo_id=hub_model_id,
)
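
The upload calls above expect a local folder containing a `model_index.json` file plus `scheduler` and `unet` subfolders. Before pushing anything, it may be worth checking that those files are actually there. The sketch below uses only the standard library; `missing_pipeline_files` is a hypothetical helper, and the list of expected files is my assumption about the minimum a saved pipeline of this kind contains (weights files are not checked).

```python
import tempfile
from pathlib import Path

# Assumed minimal layout of a saved pipeline folder
EXPECTED_FILES = (
    "model_index.json",
    "scheduler/scheduler_config.json",
    "unet/config.json",
)


def missing_pipeline_files(folder):
    """Return the expected files that are absent from `folder`."""
    root = Path(folder)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


# Demo on a throwaway folder that lacks the scheduler config
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "unet").mkdir()
    (root / "unet" / "config.json").write_text("{}")
    (root / "model_index.json").write_text("{}")
    missing = missing_pipeline_files(root)

print(missing)
# ['scheduler/scheduler_config.json']
```

An empty list means the folder has the pieces this check looks for.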

In [None]:
from huggingface_hub import ModelCard

content = f"""
---
license: mit
tags:
- pytorch
- diffusers
- unconditional-image-generation
- diffusion-models-class
---

# Diffusion Models 101

This model is a diffusion model for unconditional image generation of MNIST digits, fine-tuned from Google's ddpm-cifar10-32 model.

## Usage

```python
from diffusers import DDPMPipeline

pipeline = DDPMPipeline.from_pretrained('{hub_model_id}')
image = pipeline().images[0]
image
```
"""

card = ModelCard(content)
card.push_to_hub(hub_model_id)

### Loading From Hub

Once the model is pushed to the Hub, you can load it like any other model from there.

To save you time, let's just load the resulting model and pipeline instead of running the training loop above.

In [None]:
from diffusers import DDPMPipeline, DDIMScheduler
image_pipe = DDPMPipeline.from_pretrained('dvgodoy/ddpm-cifar10-32-mnist')

Then we can generate images in the usual way:

In [None]:
image_pipe.to(device)
images = image_pipe(batch_size=8, num_inference_steps=40).images

In [None]:
plot_images(images)

On the upside, these images are not CIFAR-ish at all! On the downside, they do not quite look like MNIST digits yet...

Well, it actually depends on whom you ask :-)

![](https://github.com/dvgodoy/DiffusionModels101_ODSC_Europe2023/blob/main/images/paracetamol.png?raw=true)

### Generating Images

Please bear with me as we bring back the manual loop for image generation. You'll see why we're doing this shortly. Just like in the training loop, we're taking both scheduler and model directly from the pipeline.

In [None]:
noise_scheduler = image_pipe.scheduler
model = image_pipe.unet

torch.manual_seed(33)
sample = torch.randn(8, 3, 32, 32).to(device)

for i, t in tqdm(enumerate(noise_scheduler.timesteps)):
    # Ensures schedulers are interchangeable
    model_input = noise_scheduler.scale_model_input(sample, t)

    with torch.no_grad():
        # Feed the scaled input (not the raw sample) to the model
        epsilon = model(model_input, t).sample

    sample = noise_scheduler.step(epsilon, t, sample).prev_sample
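
What happens inside `noise_scheduler.step`? For noise-predicting models like ours, schedulers typically start by turning the predicted noise into an estimate of the clean image, and then use it to move the sample to the previous timestep. The first part is just the forward diffusion formula solved for the clean image. Here is a scalar sketch of it; `predict_x0` and the toy numbers are made up for illustration, and this is not the library's code.

```python
import math


def predict_x0(x_t, eps_pred, alpha_bar_t):
    """Estimate the clean value from a noisy one and the predicted noise."""
    return (x_t - math.sqrt(1.0 - alpha_bar_t) * eps_pred) / math.sqrt(alpha_bar_t)


# Toy round trip: add noise to a clean pixel, then undo it with the true noise
x0, eps, alpha_bar_t = 0.8, -1.3, 0.25
x_t = math.sqrt(alpha_bar_t) * x0 + math.sqrt(1.0 - alpha_bar_t) * eps
recovered = predict_x0(x_t, eps, alpha_bar_t)

print(round(recovered, 6))
# 0.8
```

With a perfect noise prediction the clean value is recovered exactly. In practice the prediction is imperfect, which is why sampling takes many small steps instead of a single jump.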

Then, let's plot the generated images as a sanity check:

In [None]:
plot_images(sample)

Cool, they still look like handwritten digits from a doctor :-)

Or, perhaps, they look like falling characters from "The Matrix"?

We loaded a pretrained model and fine-tuned it to generate MNIST digit images. However, we can adjust the model further: in the next post in the Diffusion Model series, we’ll learn how to guide the model to generate images with a specific characteristic.