
Diffusion Model

A PyTorch implementation of the Diffusion Model for image generation.


The project implements a diffusion model that gradually converts random noise into meaningful images through an iterative denoising process.

Fig: Iterative denoising

The complete diffusion process can be divided into two phases: Forward and Reverse Diffusion.


🕸️ Forward Diffusion (Adding Noise to the Image)

This process adds random noise to the input image $x_0$, progressively corrupting it step by step for $t = 1, ..., T$, arriving at $x_t$. The process forms a Markov chain: the state at each step depends only on the previous step.

At each step $t$, we add a little Gaussian noise. $\beta_t$ is the small noise variance used at step $t$, and

$\alpha_t = 1 - \beta_t$

We keep adding noise over time, defining the cumulative product

$\overline{\alpha}_t = \prod_{s=1}^{t} \alpha_s$

so after $t$ steps only a fraction of the original signal survives.
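The schedule can be sketched in a few lines of NumPy. A linear $\beta$ schedule is assumed here for illustration; the repo's actual schedule may differ.

```python
import numpy as np

# Hypothetical linear beta schedule; the repo's actual schedule may differ.
T = 1000
betas = np.linspace(1e-4, 0.02, T)   # beta_t: small per-step noise variance
alphas = 1.0 - betas                 # alpha_t = 1 - beta_t
alpha_bars = np.cumprod(alphas)      # cumulative PRODUCT over s = 1..t

# alpha_bar_t decays toward 0, so less of the clean image survives as t grows.
```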

Thus, the direct formula for the noisy image can be expressed as:

$x_t = \sqrt{\overline{\alpha}_t}\, x_0 + \sqrt{1 - \overline{\alpha}_t}\, \epsilon$

where,

  • $x_t$ = noisy image at step t
  • $x_0$ = original image
  • $\epsilon \sim \mathcal{N}(0, I)$ = random Gaussian noise

In short,

  • $\sqrt{\overline{\alpha}_t}\, x_0$ = how much of the clean image survives
  • $\sqrt{1 - \overline{\alpha}_t}\, \epsilon$ = how much noise is mixed into the image.

This closed form is also referred to as the forward diffusion equation.
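The forward diffusion equation lets us jump straight from $x_0$ to any $x_t$ without looping through intermediate steps. A minimal NumPy sketch, again assuming a linear $\beta$ schedule (the schedule and image shape are illustrative):

```python
import numpy as np

# Assumed linear beta schedule, not necessarily the repo's.
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alpha_bars = np.cumprod(1.0 - betas)

def q_sample(x0, t, rng):
    """Return (x_t, eps): the noised image at step t and the noise used."""
    eps = rng.standard_normal(x0.shape)   # eps ~ N(0, I)
    a_bar = alpha_bars[t]
    return np.sqrt(a_bar) * x0 + np.sqrt(1.0 - a_bar) * eps, eps

rng = np.random.default_rng(0)
x0 = rng.standard_normal((3, 32, 32))     # stand-in for a normalized image
x_t, eps = q_sample(x0, t=500, rng=rng)
```

At $t = 0$ the output is nearly the clean image; at $t = T - 1$ it is close to pure Gaussian noise.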


Fig: Forward Diffusion

🦋 Reverse Diffusion (Predicting & removing noise)

In this process, the model learns to undo the forward diffusion, removing the noise step by step to reconstruct the original data.

The model starts with pure noise $x_T$ and learns to transform it into a coherent image $x_0$.

Here, a neural network such as a UNet (Ronneberger et al., 2015) or a Transformer learns to predict the noise added at each step of the forward process, i.e. what noise to remove at each step.


Fig: UNet Architecture

Iteratively, the model removes the predicted noise from the image at each time step, gradually refining pure noise into a clean output image.
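One reverse step can be sketched as follows (DDPM-style sampling). Here `eps_pred` is a stand-in for the UNet's noise prediction, and the linear $\beta$ schedule is again an assumption:

```python
import numpy as np

# Assumed linear beta schedule, matching the forward-process sketch.
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)

def p_sample_step(x_t, t, eps_pred, rng):
    """Compute x_{t-1} from x_t: remove predicted noise, then re-noise."""
    coef = (1.0 - alphas[t]) / np.sqrt(1.0 - alpha_bars[t])
    mean = (x_t - coef * eps_pred) / np.sqrt(alphas[t])
    if t == 0:                            # final step: no fresh noise added
        return mean
    z = rng.standard_normal(x_t.shape)
    return mean + np.sqrt(betas[t]) * z
```

Starting from $x_T \sim \mathcal{N}(0, I)$ and looping $t = T-1, ..., 0$, each call replaces a little noise with structure until an image remains.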

Metrics

Trained for only 50 epochs, with 1000 timesteps.

| Metric | Value  |
|--------|--------|
| FID    | 25.81  |
| IS     | 4.06   |
| KID    | 0.0009 |

Note: FID is reliable only with roughly 10,000 - 50,000 samples, which was not feasible under the available resources. Evaluation used 1000 samples, at which scale KID is the more reliable metric.
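KID holds up at small sample sizes because it is an unbiased MMD estimate with a polynomial kernel over Inception features. A minimal sketch of that estimator on placeholder feature vectors (the kernel degree and feature dimension here are illustrative, and real KID is computed on Inception embeddings):

```python
import numpy as np

def poly_kernel(x, y, degree=3):
    """Polynomial kernel (x.y / d + 1)^degree, as used by KID."""
    d = x.shape[1]
    return (x @ y.T / d + 1.0) ** degree

def kid_mmd2(feat_real, feat_fake):
    """Unbiased squared-MMD estimate between two feature sets."""
    m, n = feat_real.shape[0], feat_fake.shape[0]
    k_rr = poly_kernel(feat_real, feat_real)
    k_ff = poly_kernel(feat_fake, feat_fake)
    k_rf = poly_kernel(feat_real, feat_fake)
    # exclude diagonals so the within-set terms stay unbiased
    term_rr = (k_rr.sum() - np.trace(k_rr)) / (m * (m - 1))
    term_ff = (k_ff.sum() - np.trace(k_ff)) / (n * (n - 1))
    return term_rr + term_ff - 2.0 * k_rf.mean()
```

Similar feature distributions give a value near zero; mismatched ones give a clearly larger value, even with only hundreds of samples.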


License

See LICENSE file for details.

About

PyTorch implementation of Conditional Diffusion model with UNet
