CooL-Legend/DDPM-Pytorch

DDPM from Scratch - Unconditional Image Generation

A minimal implementation of Denoising Diffusion Probabilistic Models (DDPM) built entirely from scratch in PyTorch. The model learns to generate images through an iterative denoising process. Currently supports MNIST (28x28 grayscale) and CIFAR-10 (32x32 RGB).

How It Works

DDPM works in two phases:

Forward process (training): Gradually add Gaussian noise to real images over T=1000 timesteps until they become pure noise.

Reverse process (inference): Starting from pure noise, the trained UNet predicts and removes the noise step-by-step, recovering a clean image.

Pure Noise (t=999) --> ... --> Partially Denoised (t=500) --> ... --> Clean Image (t=0)
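The effect of the schedule can be checked directly. A minimal sketch, assuming the repo's linear beta schedule (beta_start=0.0001, beta_end=0.02, T=1000); variable names here are illustrative:

```python
import torch

# Linear beta schedule as described in the Noise Scheduler section.
T = 1000
betas = torch.linspace(0.0001, 0.02, T)
alpha_bars = torch.cumprod(1.0 - betas, dim=0)  # cumulative signal retention

# sqrt(alpha_bar_t) is the fraction of the original image surviving at step t:
# close to 1 at t=0, essentially 0 at t=999 (pure noise).
for t in [0, 500, 999]:
    print(f"t={t:3d}  signal kept: {alpha_bars[t].sqrt().item():.4f}")
```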

Project Structure

.
├── configs/
│   ├── config_mnist.py         # Hyperparameters for MNIST
│   └── config_cifar10.py       # Hyperparameters for CIFAR-10
├── models/
│   └── unet.py                 # UNet architecture with time embeddings & attention
├── pipelines/
│   └── ddpm_scheduler.py       # Linear noise scheduler (forward + reverse diffusion)
├── utils/
│   ├── mnist_dataset.py        # CSV dataset loader for MNIST
│   ├── mnist_training.py       # MNIST training loop
│   ├── cifar10_dataset.py      # Folder-based image dataset loader for CIFAR-10
│   └── cifar10_training.py     # CIFAR-10 training loop
├── train.py                    # Entry point for training
├── sample_mnist.py             # Generate a single MNIST image
├── sample_cifar10.py           # Generate a single CIFAR-10 image
├── app.py                      # Gradio web app for interactive generation
├── environment.yml             # Conda environment file
└── README.md

Architecture

UNet

The denoising network follows the standard UNet encoder-decoder structure:

  • Encoder (DownBlocks): 3 blocks with channels [32, 64, 128, 256], each containing ResNet layers, self-attention, and optional spatial downsampling
  • Bottleneck (MidBlocks): 2 blocks with channels [256, 256, 128], each combining ResNet layers and self-attention
  • Decoder (UpBlocks): 3 blocks mirroring the encoder with skip connections, ResNet layers, self-attention, and upsampling

Time conditioning: Sinusoidal positional embeddings encode the diffusion timestep, projected through an MLP and injected into every ResNet block.
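Those embeddings can be sketched as follows; the exact frequency convention in models/unet.py may differ, this is the standard transformer-style form:

```python
import math
import torch

def sinusoidal_time_embedding(t, dim=128):
    # Map integer timesteps to a dim-dimensional sin/cos embedding
    # (time_emb_dim=128 in the configs). Illustrative, not the repo's exact code.
    half = dim // 2
    freqs = torch.exp(-math.log(10000.0) * torch.arange(half, dtype=torch.float32) / half)
    args = t.float()[:, None] * freqs[None, :]
    return torch.cat([torch.sin(args), torch.cos(args)], dim=-1)

emb = sinusoidal_time_embedding(torch.tensor([0, 500, 999]))  # shape (3, 128)
```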

Noise Scheduler

Uses a linear beta schedule from beta_start=0.0001 to beta_end=0.02 across 1000 timesteps.

  • add_noise(x0, noise, t) -- Forward process: x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * noise
  • sample_prev_timestep(xt, noise_pred, t) -- Reverse step using the DDPM mean formula with posterior variance
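A self-contained sketch of these two methods under the linear schedule above; the actual pipelines/ddpm_scheduler.py may differ in details such as clamping:

```python
import torch

T = 1000
betas = torch.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas
alpha_bars = torch.cumprod(alphas, dim=0)

def add_noise(x0, noise, t):
    # Forward process: x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * noise
    return alpha_bars[t].sqrt() * x0 + (1 - alpha_bars[t]).sqrt() * noise

def sample_prev_timestep(xt, noise_pred, t):
    # DDPM posterior mean: (x_t - beta_t / sqrt(1 - alpha_bar_t) * eps_hat) / sqrt(alpha_t)
    mean = (xt - betas[t] * noise_pred / (1 - alpha_bars[t]).sqrt()) / alphas[t].sqrt()
    if t == 0:
        return mean
    # Posterior variance: beta_t * (1 - alpha_bar_{t-1}) / (1 - alpha_bar_t)
    var = betas[t] * (1 - alpha_bars[t - 1]) / (1 - alpha_bars[t])
    return mean + var.sqrt() * torch.randn_like(xt)
```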

Setup

Prerequisites

  • Python 3.9+
  • PyTorch 2.0+ (with CUDA or MPS support recommended)

Install Dependencies

conda env create -f environment.yml
conda activate ddpm

Prepare MNIST Data

  1. Download the MNIST CSV files from Kaggle: MNIST in CSV
  2. Create a data/ directory in the project root and place the downloaded CSV files there:
mkdir -p data
# Move the downloaded files into the data directory
mv /path/to/mnist_train.csv data/train.csv
mv /path/to/mnist_test.csv data/test.csv   # optional, not used for training

Your project should look like:

UnconditionalDDPM/
├── data/
│   └── train.csv       <-- required
├── configs/
├── models/
└── ...

The expected CSV format is:

label,1x1,1x2,...,28x28
7,0,0,...,0
2,0,0,...,255

First column is the digit label (unused for unconditional training), remaining 784 columns are pixel values (0-255) for a 28x28 image. The config at configs/config_mnist.py points to data/train.csv by default -- update csv_path there if your file is named differently.
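For illustration, parsing one such row into a training tensor might look like this (a toy 2x2 stand-in for the 784-pixel rows; the [-1, 1] scaling mirrors the CIFAR-10 loader, and utils/mnist_dataset.py may handle it differently):

```python
import csv
import io
import torch

# Toy CSV in the same shape as the MNIST format, shrunk to 2x2 pixels.
sample_csv = "label,1x1,1x2,2x1,2x2\n7,0,128,255,64\n"
row = next(csv.DictReader(io.StringIO(sample_csv)))

# Drop the label, scale 0-255 pixel values to [-1, 1], reshape to (C, H, W).
pixels = torch.tensor([float(v) for k, v in row.items() if k != "label"])
img = (pixels / 127.5 - 1.0).reshape(1, 2, 2)
```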

Prepare CIFAR-10 Data

  1. Download the CIFAR-10 images dataset from Kaggle: CIFAR-10 Images (or any source that provides CIFAR-10 as individual PNG files).
  2. Place all training images (.png files) into a data/train/ directory in the project root:
mkdir -p data/train
# Move/copy all CIFAR-10 PNG images into data/train/
cp /path/to/cifar10-images/*.png data/train/

Your project should look like:

UnconditionalDDPM/
├── data/
│   └── train/
│       ├── 0.png
│       ├── 1.png
│       ├── 2.png
│       └── ...          <-- 32x32 RGB PNG images
├── configs/
├── models/
└── ...

The dataset loader reads every image file in the folder, converts it to RGB, and normalizes pixel values to [-1, 1]. The config at configs/config_cifar10.py points to data/train by default -- update folder_path there if your images are elsewhere.
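A minimal sketch of such a loader; class and attribute names here are illustrative, see utils/cifar10_dataset.py for the real one:

```python
import os
import tempfile
import torch
from PIL import Image
from torch.utils.data import Dataset

class FolderImageDataset(Dataset):
    # Folder-based loader sketch: read every image file, convert to RGB,
    # and scale pixel values from 0-255 to [-1, 1].
    def __init__(self, folder_path):
        self.paths = sorted(os.path.join(folder_path, f) for f in os.listdir(folder_path))

    def __len__(self):
        return len(self.paths)

    def __getitem__(self, idx):
        img = Image.open(self.paths[idx]).convert("RGB")
        x = torch.tensor(list(img.getdata()), dtype=torch.float32)
        x = x.view(img.height, img.width, 3).permute(2, 0, 1)  # (C, H, W)
        return x / 127.5 - 1.0

# Demo on a throwaway 32x32 solid-red image:
folder = tempfile.mkdtemp()
Image.new("RGB", (32, 32), color=(255, 0, 0)).save(os.path.join(folder, "0.png"))
sample = FolderImageDataset(folder)[0]
```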

Usage

Train

To train on a specific dataset, update the import in train.py to use the desired config and training module:

MNIST:

# train.py should import from configs.config_mnist and utils.mnist_training
python train.py

Trains the UNet for 40 epochs on 1000 randomly sampled images from the CSV. Checkpoints are saved to mnist/ddpm_ckpt.pth.

CIFAR-10:

# train.py should import from configs.config_cifar10 and utils.cifar10_training
python train.py

Trains the UNet for 20 epochs on 40,000 images from the image folder. Checkpoints are saved to cifar10/cifar10_ckpt.pth.

Training progress is printed per epoch:

Epoch 1/40: 100%|██████████| 15/15 [00:05<00:00]
Finished epoch:1 | Loss : 0.8226
Finished epoch:2 | Loss : 0.5288
...
Done Training ...

Generate a Single Image

MNIST:

python sample_mnist.py

Generates one digit image by running the full 1000-step reverse diffusion from pure noise. Output saved to default/sample/generated_sample.png.

CIFAR-10:

python sample_cifar10.py

Generates one 32x32 RGB image via 1000-step reverse diffusion. Output saved to cifar10/sample/cifar10_sample.png.
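Both scripts run the same reverse loop. A hedged sketch with a dummy noise predictor standing in for the trained UNet, assuming the linear schedule described earlier:

```python
import torch

T = 1000
betas = torch.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas
alpha_bars = torch.cumprod(alphas, dim=0)

@torch.no_grad()
def sample(model, shape=(1, 1, 28, 28)):
    xt = torch.randn(shape)  # start from pure noise at t = T-1
    for t in reversed(range(T)):
        noise_pred = model(xt, t)
        # Remove the predicted noise using the DDPM mean formula.
        mean = (xt - betas[t] * noise_pred / (1 - alpha_bars[t]).sqrt()) / alphas[t].sqrt()
        if t > 0:
            var = betas[t] * (1 - alpha_bars[t - 1]) / (1 - alpha_bars[t])
            xt = mean + var.sqrt() * torch.randn(shape)  # add posterior noise
        else:
            xt = mean
    return xt

# Dummy "model" that predicts zero noise, just to exercise the loop.
img = sample(lambda x, t: torch.zeros_like(x))
```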

Interactive Web App

python app.py

Opens a Gradio interface at http://127.0.0.1:7860 where you can:

  • Set a random seed (or -1 for random)
  • Generate a single digit and view the result
  • See the denoising progression as a horizontal strip (noise to clean image)

Configuration

Each dataset has its own config file under configs/.

MNIST (configs/config_mnist.py)

Parameter               Value               Description
num_timesteps           1000                Number of diffusion steps
beta_start / beta_end   0.0001 / 0.02       Linear noise schedule range
im_channels             1                   Grayscale
im_size                 28                  Image resolution (28x28)
down_channels           [32, 64, 128, 256]  Feature channels per encoder level
time_emb_dim            128                 Timestep embedding dimension
num_heads               2                   Attention heads per block
batch_size              64                  Training batch size
num_epochs              40                  Training epochs
lr                      0.0001              Adam learning rate
subset_size             1000                Number of images to sample from CSV
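In code, the MNIST config is a plain module of constants along these lines (illustrative; check configs/config_mnist.py for the exact names):

```python
# Sketch of configs/config_mnist.py based on the table above;
# actual variable names in the repo may differ.
num_timesteps = 1000
beta_start, beta_end = 0.0001, 0.02
im_channels = 1                      # grayscale
im_size = 28                         # 28x28 images
down_channels = [32, 64, 128, 256]   # encoder feature channels
time_emb_dim = 128
num_heads = 2
batch_size = 64
num_epochs = 40
lr = 0.0001
subset_size = 1000                   # images sampled from the CSV
csv_path = "data/train.csv"
```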

CIFAR-10 (configs/config_cifar10.py)

Parameter               Value               Description
num_timesteps           1000                Number of diffusion steps
beta_start / beta_end   0.0001 / 0.02       Linear noise schedule range
im_channels             3                   RGB
im_size                 32                  Image resolution (32x32)
down_channels           [32, 64, 128, 256]  Feature channels per encoder level
time_emb_dim            128                 Timestep embedding dimension
num_heads               2                   Attention heads per block
num_down_layers         2                   Layers per downsample block (deeper than MNIST)
batch_size              64                  Training batch size
num_epochs              20                  Training epochs
lr                      0.0001              Adam learning rate
subset_size             40000               Number of images to use from folder
