A minimal implementation of Denoising Diffusion Probabilistic Models (DDPM) built entirely from scratch in PyTorch. The model learns to generate images through an iterative denoising process. Currently supports MNIST (28x28 grayscale) and CIFAR-10 (32x32 RGB).
DDPM works in two phases:
- Forward process (training): Gradually add Gaussian noise to real images over T=1000 timesteps until they become pure noise.
- Reverse process (inference): Starting from pure noise, the trained UNet predicts and removes the noise step by step, recovering a clean image.
Pure Noise (t=999) --> ... --> Partially Denoised (t=500) --> ... --> Clean Image (t=0)
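To make the diagram concrete, here is a minimal plain-Python sketch (mirroring, but not copied from, `pipelines/ddpm_scheduler.py`) of the linear beta schedule and the cumulative signal fraction `alpha_bar_t`, which decays from nearly 1 at t=0 to nearly 0 at t=999:

```python
import math

def linear_beta_schedule(beta_start=0.0001, beta_end=0.02, num_timesteps=1000):
    # Evenly spaced noise variances from beta_start to beta_end
    step = (beta_end - beta_start) / (num_timesteps - 1)
    return [beta_start + i * step for i in range(num_timesteps)]

def alpha_bar(t, betas):
    # alpha_bar_t: cumulative product of (1 - beta) up to timestep t;
    # it measures how much of the original signal survives at step t
    prod = 1.0
    for beta in betas[:t + 1]:
        prod *= 1.0 - beta
    return prod

betas = linear_beta_schedule()
# Signal fraction at the start, middle, and end of the forward process:
# close to 1 at t=0 (clean image), close to 0 at t=999 (pure noise)
for t in (0, 500, 999):
    print(t, alpha_bar(t, betas))
```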
.
├── configs/
│ ├── config_mnist.py # Hyperparameters for MNIST
│ └── config_cifar10.py # Hyperparameters for CIFAR-10
├── models/
│ └── unet.py # UNet architecture with time embeddings & attention
├── pipelines/
│ └── ddpm_scheduler.py # Linear noise scheduler (forward + reverse diffusion)
├── utils/
│ ├── mnist_dataset.py # CSV dataset loader for MNIST
│ ├── mnist_training.py # MNIST training loop
│ ├── cifar10_dataset.py # Folder-based image dataset loader for CIFAR-10
│ └── cifar10_training.py # CIFAR-10 training loop
├── train.py # Entry point for training
├── sample_mnist.py # Generate a single MNIST image
├── sample_cifar10.py # Generate a single CIFAR-10 image
├── app.py # Gradio web app for interactive generation
├── environment.yml # Conda environment file
└── README.md
The denoising network follows the standard UNet encoder-decoder structure:
- Encoder (DownBlocks): 3 blocks with channels [32, 64, 128, 256], each containing ResNet layers, self-attention, and optional spatial downsampling
- Bottleneck (MidBlocks): 2 blocks with channels [256, 256, 128], each with ResNet layers and self-attention
- Decoder (UpBlocks): 3 blocks mirroring the encoder with skip connections, ResNet layers, self-attention, and upsampling
Time conditioning: Sinusoidal positional embeddings encode the diffusion timestep, projected through an MLP and injected into every ResNet block.
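A plain-Python sketch of the standard transformer-style sinusoidal embedding, with `dim=128` matching `time_emb_dim` in the configs; the exact frequency layout in `models/unet.py` may differ slightly:

```python
import math

def sinusoidal_time_embedding(t, dim=128):
    # Embed a scalar timestep t as a dim-length vector:
    # first half of the vector holds sines, second half cosines,
    # at geometrically decreasing frequencies
    half = dim // 2
    freqs = [math.exp(-math.log(10000.0) * i / half) for i in range(half)]
    return [math.sin(t * f) for f in freqs] + [math.cos(t * f) for f in freqs]

emb = sinusoidal_time_embedding(500, dim=128)
print(len(emb))  # 128
```

The resulting vector is then projected through an MLP before being added into each ResNet block's activations.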
Uses a linear beta schedule from beta_start=0.0001 to beta_end=0.02 across 1000 timesteps.
- `add_noise(x0, noise, t)` -- forward process: `x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * noise`
- `sample_prev_timestep(xt, noise_pred, t)` -- reverse step using the DDPM mean formula with posterior variance
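The two scheduler methods can be sketched on a single pixel value in plain Python (a simplified, self-contained analogue of the tensor versions in `pipelines/ddpm_scheduler.py`):

```python
import math
import random

def cumulative_alpha_bars(betas):
    # Running product of (1 - beta_t), i.e. alpha_bar for every timestep
    out, prod = [], 1.0
    for beta in betas:
        prod *= 1.0 - beta
        out.append(prod)
    return out

def add_noise(x0, noise, t, abars):
    # Forward process, closed form:
    # x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * noise
    return math.sqrt(abars[t]) * x0 + math.sqrt(1.0 - abars[t]) * noise

def sample_prev_timestep(xt, noise_pred, t, betas, abars):
    # Reverse step: DDPM posterior mean, plus Gaussian noise scaled by
    # the posterior variance for every step except the last (t == 0)
    beta_t = betas[t]
    alpha_t = 1.0 - beta_t
    mean = (xt - beta_t / math.sqrt(1.0 - abars[t]) * noise_pred) / math.sqrt(alpha_t)
    if t == 0:
        return mean
    var = beta_t * (1.0 - abars[t - 1]) / (1.0 - abars[t])
    return mean + math.sqrt(var) * random.gauss(0.0, 1.0)

betas = [0.0001 + i * (0.02 - 0.0001) / 999 for i in range(1000)]
abars = cumulative_alpha_bars(betas)
xt = add_noise(0.5, 1.0, 999, abars)  # almost pure noise at t=999
```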
- Python 3.9+
- PyTorch 2.0+ (with CUDA or MPS support recommended)
conda env create -f environment.yml
conda activate ddpm
- Download the MNIST CSV files from Kaggle: MNIST in CSV
- Create a `data/` directory in the project root and place the downloaded CSV files there:
mkdir -p data
# Move the downloaded files into the data directory
mv /path/to/mnist_train.csv data/train.csv
mv /path/to/mnist_test.csv data/test.csv  # optional, not used for training
Your project should look like:
UnconditionalDDPM/
├── data/
│ └── train.csv <-- required
├── configs/
├── models/
└── ...
The expected CSV format is:
label,1x1,1x2,...,28x28
7,0,0,...,0
2,0,0,...,255
First column is the digit label (unused for unconditional training), remaining 784 columns are pixel values (0-255) for a 28x28 image. The config at configs/config_mnist.py points to data/train.csv by default -- update csv_path there if your file is named differently.
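As a sanity check for this format, here is a small plain-Python parser for one CSV row; the actual loader in `utils/mnist_dataset.py` may differ in details, and the [-1, 1] scaling shown here is an assumption based on the common diffusion-training convention:

```python
def parse_mnist_row(line):
    # Split one CSV row: first field is the digit label (ignored by the
    # unconditional model), the remaining 784 fields are pixels in [0, 255]
    fields = line.strip().split(",")
    label = int(fields[0])
    pixels = [int(v) for v in fields[1:]]
    assert len(pixels) == 28 * 28, "expected a 28x28 image"
    # Scale to [-1, 1], the range diffusion models typically train on
    image = [p / 127.5 - 1.0 for p in pixels]
    return label, image

label, image = parse_mnist_row("7," + ",".join("0" for _ in range(784)))
print(label, image[0])  # 7 -1.0
```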
- Download the CIFAR-10 images dataset from Kaggle: CIFAR-10 Images (or any source that provides CIFAR-10 as individual PNG files).
- Place all training images (`.png` files) into a `data/train/` directory in the project root:
mkdir -p data/train
# Move/copy all CIFAR-10 PNG images into data/train/
cp /path/to/cifar10-images/*.png data/train/
Your project should look like:
UnconditionalDDPM/
├── data/
│ └── train/
│ ├── 0.png
│ ├── 1.png
│ ├── 2.png
│ └── ... <-- 32x32 RGB PNG images
├── configs/
├── models/
└── ...
The dataset loader reads every image file in the folder, converts it to RGB, and normalizes pixel values to [-1, 1]. The config at configs/config_cifar10.py points to data/train by default -- update folder_path there if your images are elsewhere.
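A hypothetical sketch of the two pieces of that behavior (the real loader in `utils/cifar10_dataset.py` additionally decodes each image, e.g. via PIL, which is omitted here):

```python
from pathlib import Path

def list_training_images(folder="data/train"):
    # Collect every .png file in the folder, in deterministic sorted order
    return sorted(Path(folder).glob("*.png"))

def to_model_range(pixel):
    # Map an 8-bit value (0-255) into the [-1, 1] range the model trains on
    return pixel / 127.5 - 1.0
```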
To train on a specific dataset, update the import in train.py to use the desired config and training module:
MNIST:
# train.py should import from configs.config_mnist and utils.mnist_training
python train.py
Trains the UNet for 40 epochs on 1000 randomly sampled images from the CSV. Checkpoints are saved to `mnist/ddpm_ckpt.pth`.
CIFAR-10:
# train.py should import from configs.config_cifar10 and utils.cifar10_training
python train.py
Trains the UNet for 20 epochs on 40,000 images from the image folder. Checkpoints are saved to `cifar10/cifar10_ckpt.pth`.
Training progress is printed per epoch:
Epoch 1/40: 100%|██████████| 15/15 [00:05<00:00]
Finished epoch:1 | Loss : 0.8226
Finished epoch:2 | Loss : 0.5288
...
Done Training ...
MNIST:
python sample_mnist.py
Generates one digit image by running the full 1000-step reverse diffusion from pure noise. Output is saved to `default/sample/generated_sample.png`.
CIFAR-10:
python sample_cifar10.py
Generates one 32x32 RGB image via 1000-step reverse diffusion. Output is saved to `cifar10/sample/cifar10_sample.png`.
python app.py
Opens a Gradio interface at http://127.0.0.1:7860 where you can:
- Set a random seed (or -1 for random)
- Generate a single digit and view the result
- See the denoising progression as a horizontal strip (noise to clean image)
Each dataset has its own config file under configs/.
MNIST (`configs/config_mnist.py`):

| Parameter | Value | Description |
|---|---|---|
| `num_timesteps` | 1000 | Number of diffusion steps |
| `beta_start` / `beta_end` | 0.0001 / 0.02 | Linear noise schedule range |
| `im_channels` | 1 | Grayscale |
| `im_size` | 28 | Image resolution (28x28) |
| `down_channels` | [32, 64, 128, 256] | Feature channels per encoder level |
| `time_emb_dim` | 128 | Timestep embedding dimension |
| `num_heads` | 2 | Attention heads per block |
| `batch_size` | 64 | Training batch size |
| `num_epochs` | 40 | Training epochs |
| `lr` | 0.0001 | Adam learning rate |
| `subset_size` | 1000 | Number of images to sample from the CSV |
CIFAR-10 (`configs/config_cifar10.py`):

| Parameter | Value | Description |
|---|---|---|
| `num_timesteps` | 1000 | Number of diffusion steps |
| `beta_start` / `beta_end` | 0.0001 / 0.02 | Linear noise schedule range |
| `im_channels` | 3 | RGB |
| `im_size` | 32 | Image resolution (32x32) |
| `down_channels` | [32, 64, 128, 256] | Feature channels per encoder level |
| `time_emb_dim` | 128 | Timestep embedding dimension |
| `num_heads` | 2 | Attention heads per block |
| `num_down_layers` | 2 | Layers per downsample block (deeper than MNIST) |
| `batch_size` | 64 | Training batch size |
| `num_epochs` | 20 | Training epochs |
| `lr` | 0.0001 | Adam learning rate |
| `subset_size` | 40000 | Number of images to use from the folder |
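A hypothetical sketch of what `configs/config_cifar10.py` might contain, assembled from the tables above; the real file may structure these values differently (e.g. as a dict or class):

```python
# Sketch of a CIFAR-10 config module; values taken from the tables above
num_timesteps = 1000                # diffusion steps
beta_start = 0.0001                 # linear schedule start
beta_end = 0.02                     # linear schedule end
im_channels = 3                     # RGB
im_size = 32                        # 32x32 images
down_channels = [32, 64, 128, 256]  # encoder feature channels
time_emb_dim = 128                  # timestep embedding dimension
num_heads = 2                       # attention heads per block
num_down_layers = 2                 # layers per downsample block
batch_size = 64
num_epochs = 20
lr = 0.0001                         # Adam learning rate
subset_size = 40000                 # images used from the folder
folder_path = "data/train"          # update if your images live elsewhere
```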
- Denoising Diffusion Probabilistic Models (Ho et al., 2020)
- The Annotated Diffusion Model (Hugging Face)