
Diffusion Model

A PyTorch implementation of the Diffusion Model for image generation.


The project implements a diffusion model that gradually converts random noise into meaningful images through an iterative denoising process.

Fig: Iterative denoising

The complete diffusion process can be divided into two phases: Forward and Reverse Diffusion.


🕸️ Forward Diffusion (Adding Noise to the Image)

This process adds random noise to the input image $x_0$, progressively corrupting it step by step for $t = 1, ..., T$, arriving at $x_t$. The process forms a Markov chain: the state at each step depends only on the previous step.

At each step $t$, we add a little Gaussian noise. $\beta_t$ is the small noise variance used at step $t$, and

$\alpha_t = 1 - \beta_t$

We keep adding noise over time, defining the cumulative product

$\overline{\alpha}_t = \prod_{s=1}^{t} \alpha_s$

so after $t$ steps only a fraction of the original signal survives.
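The schedule can be sketched in a few lines of NumPy. A linear $\beta$ schedule is assumed here for illustration; the repo's actual schedule may differ.

```python
import numpy as np

# Hypothetical linear beta schedule; the repo's actual schedule may differ.
T = 1000
betas = np.linspace(1e-4, 0.02, T)   # beta_t: small per-step noise variance
alphas = 1.0 - betas                 # alpha_t = 1 - beta_t
alpha_bars = np.cumprod(alphas)      # cumulative PRODUCT over s = 1..t

# alpha_bar_t decays toward 0, so less of the clean image survives as t grows.
```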

Thus, the direct formula for the noisy image can be expressed as:

$x_t = \sqrt{\overline{\alpha}_t}\, x_0 + \sqrt{1 - \overline{\alpha}_t}\, \epsilon$

where,

  • $x_t$ = noisy image at step t
  • $x_0$ = original image
  • $\epsilon \sim \mathcal{N}(0, I)$ = random Gaussian noise

In short,

  • $\sqrt{\overline{\alpha}_t}\, x_0$ = how much of the clean image survives
  • $\sqrt{1 - \overline{\alpha}_t}\, \epsilon$ = how much noise is mixed into the image.

This closed form is also referred to as the forward diffusion equation.
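The forward diffusion equation lets us jump straight from $x_0$ to any $x_t$ without looping through intermediate steps. A minimal NumPy sketch, again assuming a linear $\beta$ schedule (the schedule and image shape are illustrative):

```python
import numpy as np

# Assumed linear beta schedule, not necessarily the repo's.
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alpha_bars = np.cumprod(1.0 - betas)

def q_sample(x0, t, rng):
    """Return (x_t, eps): the noised image at step t and the noise used."""
    eps = rng.standard_normal(x0.shape)   # eps ~ N(0, I)
    a_bar = alpha_bars[t]
    return np.sqrt(a_bar) * x0 + np.sqrt(1.0 - a_bar) * eps, eps

rng = np.random.default_rng(0)
x0 = rng.standard_normal((3, 32, 32))     # stand-in for a normalized image
x_t, eps = q_sample(x0, t=500, rng=rng)
```

At $t = 0$ the output is nearly the clean image; at $t = T - 1$ it is close to pure Gaussian noise.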


Fig: Forward Diffusion

🦋 Reverse Diffusion (Predicting & removing noise)

In this process, the model learns to undo the forward diffusion, removing the noise step by step to reconstruct the original data.

The model starts with pure noise $x_T$ and learns to transform it into a coherent image $x_0$.

Here, a neural network such as a UNet (Ronneberger et al., 2015) or a Transformer learns to predict the noise added at each step of the forward process, i.e. what noise to remove at each step.


Fig: UNet Architecture

Iteratively, the model removes the predicted noise from the image at each time step, gradually refining pure noise into a clean output image.
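One reverse step can be sketched as follows (DDPM-style sampling). Here `eps_pred` is a stand-in for the UNet's noise prediction, and the linear $\beta$ schedule is again an assumption:

```python
import numpy as np

# Assumed linear beta schedule, matching the forward-process sketch.
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)

def p_sample_step(x_t, t, eps_pred, rng):
    """Compute x_{t-1} from x_t: remove predicted noise, then re-noise."""
    coef = (1.0 - alphas[t]) / np.sqrt(1.0 - alpha_bars[t])
    mean = (x_t - coef * eps_pred) / np.sqrt(alphas[t])
    if t == 0:                            # final step: no fresh noise added
        return mean
    z = rng.standard_normal(x_t.shape)
    return mean + np.sqrt(betas[t]) * z
```

Starting from $x_T \sim \mathcal{N}(0, I)$ and looping $t = T-1, ..., 0$, each call replaces a little noise with structure until an image remains.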

Metrics

Trained for only 50 epochs, with 1000 timesteps.

| Metric | Value  |
|--------|--------|
| FID    | 25.81  |
| IS     | 4.06   |
| KID    | 0.0009 |

Note: FID is reliable only with roughly 10,000 - 50,000 samples, which was not feasible under the available resources. Evaluation used 1000 samples, at which scale KID is the more reliable metric.
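KID holds up at small sample sizes because it is an unbiased MMD estimate with a polynomial kernel over Inception features. A minimal sketch of that estimator on placeholder feature vectors (the kernel degree and feature dimension here are illustrative, and real KID is computed on Inception embeddings):

```python
import numpy as np

def poly_kernel(x, y, degree=3):
    """Polynomial kernel (x.y / d + 1)^degree, as used by KID."""
    d = x.shape[1]
    return (x @ y.T / d + 1.0) ** degree

def kid_mmd2(feat_real, feat_fake):
    """Unbiased squared-MMD estimate between two feature sets."""
    m, n = feat_real.shape[0], feat_fake.shape[0]
    k_rr = poly_kernel(feat_real, feat_real)
    k_ff = poly_kernel(feat_fake, feat_fake)
    k_rf = poly_kernel(feat_real, feat_fake)
    # exclude diagonals so the within-set terms stay unbiased
    term_rr = (k_rr.sum() - np.trace(k_rr)) / (m * (m - 1))
    term_ff = (k_ff.sum() - np.trace(k_ff)) / (n * (n - 1))
    return term_rr + term_ff - 2.0 * k_rf.mean()
```

Similar feature distributions give a value near zero; mismatched ones give a clearly larger value, even with only hundreds of samples.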


License

See LICENSE file for details.

About

PyTorch implementation of Conditional Diffusion model with UNet
