# Diffusion Model Tutorial

This notebook contains example code for training a diffusion model.
Download it and run it in your own Google Colab session.

Instructions:
- Open this notebook in Colab
- Make sure you are running on a GPU (top-right corner --> Change runtime type)
- Follow along with the snippets below
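
Before going further, it can help to confirm that the runtime actually has a GPU, since the later cells call `.cuda()` and will fail on a CPU-only runtime. This is a minimal sketch, assuming PyTorch is preinstalled (as it typically is on Colab); the helper name `gpu_available` is made up for this example.

```python
def gpu_available() -> bool:
    """Return True if PyTorch is installed and can see a CUDA GPU."""
    try:
        import torch
    except ImportError:
        # PyTorch is missing, so no GPU can be used from it.
        return False
    return torch.cuda.is_available()


if gpu_available():
    print("GPU runtime detected.")
else:
    print("No GPU detected: switch the runtime type to GPU before training.")
```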



Install the diffusion model package:

In [None]:
%pip install denoising_diffusion_pytorch

# Set up your data

You have two options for setting up your data folder:
1. Load it from your Google Drive.
2. Create a temporary folder in this Colab session. This folder is deleted when the session ends.

You can use some example data from:
1. COVID CT dataset: https://github.com/UCSD-AI4H/COVID-CT/tree/master/Images-processed
2. Brain MRI: https://www.kaggle.com/datasets/ashfakyeafi/brain-mri-images
3. Flowers dataset: https://www.kaggle.com/datasets/l3llff/flowers


Upload your images to the folder for whichever option you choose.

**Note:** You need a minimum of 100 images for this example code to work.

## Option 1
To load the data from your Google Drive, follow the instructions below:
- Edit the `%cd` line so it points to your folder; this will be your root directory.
- Edit the *data_dir* and *model_dir* locations. *model_dir* will be created for you under that root directory.
- Run the code snippet


In [None]:
from google.colab import drive
drive.mount('/content/drive')

%cd /content/drive/MyDrive/
data_dir = 'data_covid'
model_dir = 'model_covid'

## Option 2
If you choose this option, create a directory here in Colab (use the left panel).
- Create a directory to store your data
- Upload your images there
- Set *data_dir* to the path of that directory.
- Set *model_dir* to the name of the output folder. It will be created for you automatically; you only need to set the name.
- Run the code snippet


In [None]:
data_dir = 'data_covid'
model_dir = 'model_covid'
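
Whichever option you chose, it is worth checking that the data folder meets the 100-image minimum before training. The sketch below uses only the standard library. The helper name `count_images` and the extension list are assumptions for illustration; the package may accept a different set of formats.

```python
from pathlib import Path

# Assumed set of image extensions (case-sensitive); adjust to match your files.
IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".tiff"}


def count_images(folder: str) -> int:
    """Count image files under `folder`, including subfolders."""
    root = Path(folder)
    if not root.is_dir():
        return 0
    return sum(
        1 for p in root.rglob("*")
        if p.is_file() and p.suffix in IMAGE_EXTS
    )


n_images = count_images("data_covid")  # same name as data_dir above
print(f"Found {n_images} images")
if n_images < 100:
    print("Fewer than 100 images: add more data before training.")
```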

# Diffusion Model Training
Now, let's move on to the diffusion code.

Import the Denoising Diffusion Probabilistic Model (DDPM) library

In [None]:
from denoising_diffusion_pytorch import Unet, GaussianDiffusion, Trainer

## Create the UNet model
This is the network used in the reverse diffusion process: it is trained to denoise images step by step.


In [None]:
model = Unet(
    dim = 64,
    dim_mults = (1, 2, 4, 8)
).cuda()
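
To get a feel for what `dim` and `dim_mults` control, the sketch below computes a rough channel width and feature-map size for each UNet level. It assumes each level uses `dim * mult` channels and that the resolution is halved between consecutive levels. The package's exact layer layout may differ, and `unet_levels` is a hypothetical helper, not part of the package.

```python
def unet_levels(dim, dim_mults, image_size):
    """Return (channels, resolution) per level under the stated assumptions."""
    levels = []
    size = image_size
    for i, mult in enumerate(dim_mults):
        levels.append((dim * mult, size))
        # Halve the resolution between levels, but not after the last one.
        if i < len(dim_mults) - 1:
            size //= 2
    return levels


for channels, size in unet_levels(dim=64, dim_mults=(1, 2, 4, 8), image_size=32):
    print(f"{channels:4d} channels at {size}x{size}")
```

Under these assumptions, four levels mean three halvings, so the image size should be divisible by 8.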

## Create the Diffusion scheduler

This is where you define the noise schedule. In the default code below, noise is added gradually over 1000 timesteps.
Sampling uses 250 timesteps with Denoising Diffusion Implicit Models (DDIM) sampling.

Set the image size to suit your data.
In this example, images are resized to 32x32 (`img_size = 32`).

In [None]:
img_size = 32
diffusion = GaussianDiffusion(
    model,
    image_size = img_size,
    timesteps = 1000,             # number of forward noising steps
    sampling_timesteps = 250,     # number of steps used when sampling
).cuda()
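
To make the gradual noising concrete, the sketch below computes how much of the original image survives after a given number of forward steps. It assumes a linear beta schedule from 1e-4 to 0.02, as in the original DDPM paper; the package may use a different default schedule, so the numbers are illustrative only. At step t, the noisy image is a mix of the clean image weighted by `sqrt(alpha_bar[t])` and Gaussian noise weighted by `sqrt(1 - alpha_bar[t])`.

```python
import math


def linear_alpha_bar(timesteps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative product of (1 - beta_t) for a linear beta schedule."""
    alpha_bar = []
    running = 1.0
    for t in range(timesteps):
        beta = beta_start + (beta_end - beta_start) * t / (timesteps - 1)
        running *= 1.0 - beta
        alpha_bar.append(running)
    return alpha_bar


alpha_bar = linear_alpha_bar()
for t in (0, 249, 499, 999):
    signal = math.sqrt(alpha_bar[t])       # weight on the clean image
    noise = math.sqrt(1.0 - alpha_bar[t])  # weight on the Gaussian noise
    print(f"t={t:4d}  signal={signal:.4f}  noise={noise:.4f}")
```

The signal weight shrinks toward zero as t approaches the last timestep, which is why the final noised image is close to pure Gaussian noise.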

## Set up the trainer

The trainer holds the hyperparameters needed for training.
Typically you need around 10-30k iterations for a small dataset.
The model is saved and sampled every 1000 iterations.

Note: Every time a model is saved, it uses some of your disk space. Monitor your disk usage carefully.

In [None]:
trainer = Trainer(
    diffusion,
    data_dir,
    train_batch_size = 32,
    train_lr = 8e-5,
    train_num_steps = 10000,         # total training steps
    gradient_accumulate_every = 2,    # gradient accumulation steps
    ema_decay = 0.995,                # exponential moving average decay
    results_folder = model_dir,
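    num_samples = 16,                 # images generated at each sampling step
    calculate_fid = False,            # skip FID evaluation
    save_and_sample_every = 1000,     # checkpoint and sample interval, in steps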
)
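
The settings above imply a few quantities worth knowing before you start. The sketch below works them out with plain arithmetic and checks free disk space with the standard library. It assumes each training step accumulates gradients over `gradient_accumulate_every` batches and that one checkpoint is written every `save_and_sample_every` steps. The variable names mirror the trainer arguments but are defined locally here.

```python
import shutil

train_batch_size = 32
gradient_accumulate_every = 2
train_num_steps = 10000
save_and_sample_every = 1000

# Images contributing to each optimizer update.
effective_batch_size = train_batch_size * gradient_accumulate_every
# Total images drawn from the dataset over the whole run (with repetition).
images_seen = effective_batch_size * train_num_steps
# Number of checkpoints (and sample grids) written during training.
num_checkpoints = train_num_steps // save_and_sample_every

print(f"Effective batch size: {effective_batch_size}")
print(f"Images seen in total: {images_seen}")
print(f"Checkpoints written:  {num_checkpoints}")

# Free space on the disk holding the current working directory.
free_gb = shutil.disk_usage(".").free / 1e9
print(f"Free disk space: {free_gb:.1f} GB")
```

Multiplying the size of one saved `.pt` file by the number of checkpoints gives a rough estimate of the disk space a full run will need.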

## Run the training

You can find the saved models (.pt files) and samples in the *model_dir* you defined above.


In [None]:
trainer.train()
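
After training, you may want to find the most recent checkpoint. This is a minimal sketch, assuming checkpoints are saved as `model-<milestone>.pt` inside *model_dir* (check the actual file names in your folder); `latest_milestone` is a hypothetical helper, not part of the package.

```python
import re
from pathlib import Path


def latest_milestone(folder):
    """Return the highest milestone among model-<n>.pt files, or None."""
    root = Path(folder)
    if not root.is_dir():
        return None
    milestones = []
    for path in root.glob("model-*.pt"):
        match = re.fullmatch(r"model-(\d+)\.pt", path.name)
        if match:
            milestones.append(int(match.group(1)))
    return max(milestones) if milestones else None


milestone = latest_milestone("model_covid")  # same name as model_dir above
print(f"Latest checkpoint milestone: {milestone}")
```

If your installed version of the package provides `trainer.load(milestone)`, this number can be passed to it to restore the checkpoint before sampling.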