# Diffusion Models in PyTorch

While Diffusion Models are not yet as widely packaged as older Machine Learning architectures, implementations are available. The easiest way to use a Diffusion Model in PyTorch is the denoising-diffusion-pytorch package, which implements an image diffusion model like the one discussed in this article. To install the package, run the following command in a terminal:

In [1]:
pip install denoising_diffusion_pytorch


# Minimal Example

To train a model and generate images, we first import the necessary packages:

In [2]:
import torch
from denoising_diffusion_pytorch import Unet, GaussianDiffusion

Next, we define our network architecture, in this case a U-Net. The dim parameter specifies the number of feature maps before the first down-sampling, and the dim_mults parameter gives the multipliers applied to this value at each successive down-sampling stage:

In [3]:
model = Unet(
    dim = 64,
    dim_mults = (1, 2, 4, 8)
)
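
To make the relationship between dim and dim_mults concrete, the short sketch below computes the feature-map counts implied by the settings above. It is plain Python arithmetic based on the description in this section, not a call into the package, and the helper name `stage_widths` is hypothetical.

```python
def stage_widths(dim, dim_mults):
    """Return the feature-map count at each resolution stage (hypothetical helper)."""
    # Each multiplier scales the base width `dim` for one stage of the U-Net.
    return [dim * m for m in dim_mults]


widths = stage_widths(64, (1, 2, 4, 8))
print(widths)  # [64, 128, 256, 512]
```

Doubling the multiplier at each stage is a common choice: the spatial resolution shrinks as the number of feature maps grows.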

Now that our network architecture is defined, we need to define the Diffusion Model itself. We pass in the U-Net model that we just defined along with several parameters: the size of the images to generate, the number of timesteps in the diffusion process, and the norm used for the training loss (L1 or L2).

In [4]:
diffusion = GaussianDiffusion(
    model,
    image_size = 128,
    timesteps = 5,   # number of diffusion steps (very small; DDPM-style models typically use around 1000)
    loss_type = 'l1'    # loss norm: 'l1' or 'l2'
)
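
The effect of the timesteps parameter is easier to see with a small numerical sketch of the forward (noising) process. The example below assumes a linear variance schedule from 1e-4 to 0.02, as in the original DDPM paper; the package may scale or choose its schedule differently, and the helper names `linear_betas` and `alpha_bars` are hypothetical. It computes the cumulative product of (1 - beta_t), which is the fraction of the original image's variance still present after t noising steps.

```python
def linear_betas(timesteps, beta_start=1e-4, beta_end=0.02):
    """Evenly spaced noise variances (assumed schedule, not the package's API)."""
    if timesteps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (timesteps - 1)
    return [beta_start + i * step for i in range(timesteps)]


def alpha_bars(betas):
    """Running product of (1 - beta_t): the signal fraction left after each step."""
    out, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        out.append(running)
    return out


for T in (5, 1000):
    remaining = alpha_bars(linear_betas(T))[-1]
    print(f"T={T}: fraction of signal variance left = {remaining:.6f}")
```

Under these assumed values, five steps leave most of the signal intact, while a thousand steps reduce the image to almost pure noise. This is why a very small step count is suitable only for a quick demonstration.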

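Now that the Diffusion Model is defined, it's time to train. We generate random data to train on, then run the forward and backward pass of a single training step. A real training run would repeat this over many batches, with an optimizer updating the weights after each backward pass: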

In [5]:
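training_images = torch.rand(8, 3, 128, 128)  # random stand-in images; the package expects pixel values in [0, 1]
loss = diffusion(training_images)  # forward pass returns the scalar training loss
loss.backward()  # compute gradients; an optimizer step would follow in a full loop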
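
The cell above computes gradients but never updates the weights. As a minimal sketch of what a full loop might look like, the function below runs one Adam step per batch for any module whose forward pass returns a scalar loss. The names `train_steps` and `ToyLoss` are hypothetical; `ToyLoss` is only a stand-in so the sketch runs without the package, and the learning rate is an arbitrary choice for the toy problem.

```python
import torch
from torch import nn


def train_steps(loss_module, batches, lr=1e-4):
    """Run one optimizer step per batch; loss_module(batch) must return a scalar loss."""
    optimizer = torch.optim.Adam(loss_module.parameters(), lr=lr)
    losses = []
    for batch in batches:
        optimizer.zero_grad()      # clear gradients from the previous step
        loss = loss_module(batch)  # forward pass
        loss.backward()            # backward pass
        optimizer.step()           # weight update
        losses.append(loss.item())
    return losses


class ToyLoss(nn.Module):
    """Stand-in for the diffusion object: learns a scale that reproduces its input."""

    def __init__(self):
        super().__init__()
        self.scale = nn.Parameter(torch.tensor(0.0))

    def forward(self, x):
        return ((self.scale * x - x) ** 2).mean()


torch.manual_seed(0)
batches = [torch.rand(8, 3, 16, 16) for _ in range(20)]
losses = train_steps(ToyLoss(), batches, lr=0.1)
print(f"first loss {losses[0]:.4f}, last loss {losses[-1]:.4f}")
```

Because the diffusion object defined earlier is itself a PyTorch module that returns its loss from the forward pass, the same function could presumably be called as `train_steps(diffusion, [training_images])`.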

Once the model is trained, we can finally generate images by using the sample() method of the diffusion object. Here we generate 4 images, which are only noise given that our training data was random:

In [6]:
sampled_images = diffusion.sample(batch_size = 4)

sampling loop time step:   0%|          | 0/5 [00:00<?, ?it/s]
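
The sampled batch is a float tensor, so it needs converting before it can be viewed or saved as ordinary images. The sketch below assumes the samples have shape (batch, channels, height, width) with values in [0, 1], which matches the printed output in this article; the helper name `to_uint8_images` is hypothetical, and a random tensor stands in for `sampled_images` so the example runs without the package.

```python
import torch


def to_uint8_images(samples):
    """Map a float batch in [0, 1] with shape (N, C, H, W) to uint8 with shape (N, H, W, C)."""
    clamped = samples.clamp(0.0, 1.0)                   # guard against out-of-range values
    scaled = (clamped * 255.0).round().to(torch.uint8)  # 8-bit pixel values
    return scaled.permute(0, 2, 3, 1).contiguous()      # channels-last layout for image libraries


fake_samples = torch.rand(4, 3, 128, 128)  # stand-in for sampled_images
images = to_uint8_images(fake_samples)
print(images.shape, images.dtype)  # torch.Size([4, 128, 128, 3]) torch.uint8
```

Each `images[i]` can then be converted with `.numpy()` and handed to an image library of your choice.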

In [7]:
print(sampled_images)

tensor([[[[1.0000, 0.4834, 0.0000,  ..., 0.9119, 0.0000, 1.0000],
          [0.0184, 1.0000, 0.1645,  ..., 0.1238, 0.0000, 0.7808],
          [0.7471, 0.5532, 0.6766,  ..., 0.7086, 0.0000, 0.2469],
          ...,
          [0.9669, 0.9141, 0.5833,  ..., 1.0000, 0.3052, 0.0324],
          [0.5637, 0.0000, 0.0000,  ..., 0.4756, 1.0000, 1.0000],
          [0.9001, 0.0302, 0.9728,  ..., 0.0000, 0.3630, 0.0000]],

         [[0.3369, 0.7538, 0.7942,  ..., 0.3643, 0.0000, 0.4676],
          [1.0000, 0.0713, 0.0000,  ..., 0.3987, 0.0413, 0.0000],
          [0.3557, 0.7201, 1.0000,  ..., 1.0000, 1.0000, 0.9021],
          ...,
          [0.0568, 0.1472, 0.0142,  ..., 0.9298, 1.0000, 0.0000],
          [0.3364, 0.1858, 0.8880,  ..., 0.8333, 0.3876, 0.0000],
          [0.0000, 1.0000, 1.0000,  ..., 0.7697, 0.0000, 0.3056]],

         [[1.0000, 0.9562, 1.0000,  ..., 0.8870, 0.4773, 0.8323],
          [0.8999, 0.9249, 1.0000,  ..., 0.8293, 0.8894, 0.5286],
          [1.0000, 1.0000, 0.5996,  ..., 0