<a href="https://colab.research.google.com/github/ajavid34/guided-diffusion-sxela/blob/main/fine_tuning_openai_diffusion_model.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

A simple Colab notebook for fine-tuning OpenAI diffusion models.


Feel free to ask questions in this post's comments: https://www.patreon.com/posts/66246423

by [Alex Spirin](https://twitter.com/devdef)


## Setup (run once per session)

This mounts your Google Drive, so checkpoints and samples can be saved there.

In [1]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


This downloads the training code and installs it

In [2]:
%cd /content
!git clone https://github.com/ajavid34/guided-diffusion-sxela
%cd /content/guided-diffusion-sxela
!pip install -e .

/content
Cloning into 'guided-diffusion-sxela'...
remote: Enumerating objects: 157, done.
remote: Counting objects: 100% (22/22), done.
remote: Compressing objects: 100% (22/22), done.
remote: Total 157 (delta 10), reused 0 (delta 0), pack-reused 135 (from 1)
Receiving objects: 100% (157/157), 120.08 KiB | 17.15 MiB/s, done.
Resolving deltas: 100% (79/79), done.
/content/guided-diffusion-sxela
Obtaining file:///content/guided-diffusion-sxela
  Preparing metadata (setup.py) ... done
Collecting nvidia-cuda-nvrtc-cu12==12.4.127 (from torch->guided-diffusion==0.0.0)
  Downloading nvidia_cuda_nvrtc_cu12-12.4.127-py3-none-manylinux2014_x86_64.whl.metadata (1.5 kB)
Collecting nvidia-cuda-runtime-cu12==12.4.127 (from torch->guided-diffusion==0.0.0)
  Downloading nvidia_cuda_runtime_cu12-12.4.127-py3-none-manylinux2014_x86_64.whl.metadata (1.5 kB)
Collecting nvidia-cuda-cupti-cu12==12.4.127 (from torch->guided-diffusion==0.0.0)
  Downloading nvidia_cuda_cupti_cu12-12.4.1

# Train (tune) BEDROOM model :D
Needs a GPU with 16 GB of RAM.

Works in Colab Pro and on Kaggle.
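
Before downloading anything, it can help to check how much memory the assigned GPU actually has. Below is a minimal sketch that asks `nvidia-smi` for the total memory of each GPU. The helper names are hypothetical and not part of the training repo, and a nominal 16 GB card may report somewhat less than 16384 MiB.

```python
import subprocess


def parse_total_memory_mib(nvidia_smi_output):
    """Parse one integer (MiB) per non-empty line of nvidia-smi CSV output."""
    return [int(line.strip()) for line in nvidia_smi_output.splitlines() if line.strip()]


def gpu_total_memory_mib():
    """Return total memory in MiB for each visible GPU, or [] if none is found."""
    command = [
        "nvidia-smi",
        "--query-gpu=memory.total",
        "--format=csv,noheader,nounits",
    ]
    try:
        result = subprocess.run(command, capture_output=True, text=True, check=True)
    except (FileNotFoundError, subprocess.CalledProcessError):
        # No NVIDIA driver or no GPU attached to this runtime.
        return []
    return parse_total_memory_mib(result.stdout)
```

Calling `gpu_total_memory_mib()` in a cell returns an empty list on a CPU-only runtime, which is a hint to switch the runtime type before training.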

Download a pre-trained LSUN BEDROOM model that we will be tuning on our dataset

In [3]:
!wget https://openaipublic.blob.core.windows.net/diffusion/march-2021/lsun_uncond_100M_1200K_bs128.pt -P /content/

--2025-05-28 21:53:03--  https://openaipublic.blob.core.windows.net/diffusion/march-2021/lsun_uncond_100M_1200K_bs128.pt
Resolving openaipublic.blob.core.windows.net (openaipublic.blob.core.windows.net)... 57.150.97.129
Connecting to openaipublic.blob.core.windows.net (openaipublic.blob.core.windows.net)|57.150.97.129|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 454799209 (434M) [application/octet-stream]
Saving to: ‘/content/lsun_uncond_100M_1200K_bs128.pt’


2025-05-28 21:55:24 (3.09 MB/s) - ‘/content/lsun_uncond_100M_1200K_bs128.pt’ saved [454799209/454799209]



In [4]:
!echo "Downloading and preparing dataset..."
# Download the Oxford 102 Flowers dataset
!wget https://www.robots.ox.ac.uk/~vgg/data/flowers/102/102flowers.tgz

# Extract and organize images
!tar -xzf 102flowers.tgz
!mkdir -p your_images
!cp jpg/*.jpg your_images/

Downloading and preparing dataset...
--2025-05-28 21:55:24--  https://www.robots.ox.ac.uk/~vgg/data/flowers/102/102flowers.tgz
Resolving www.robots.ox.ac.uk (www.robots.ox.ac.uk)... 129.67.94.2
Connecting to www.robots.ox.ac.uk (www.robots.ox.ac.uk)|129.67.94.2|:443... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: https://thor.robots.ox.ac.uk/flowers/102/102flowers.tgz [following]
--2025-05-28 21:55:25--  https://thor.robots.ox.ac.uk/flowers/102/102flowers.tgz
Resolving thor.robots.ox.ac.uk (thor.robots.ox.ac.uk)... 129.67.95.98
Connecting to thor.robots.ox.ac.uk (thor.robots.ox.ac.uk)|129.67.95.98|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 344862509 (329M) [application/octet-stream]
Saving to: ‘102flowers.tgz’


2025-05-28 21:55:47 (16.0 MB/s) - ‘102flowers.tgz’ saved [344862509/344862509]



## Tune

For gigachads.
We're going to do what's called a pro-gamer move (or not): take a small model trained on bedrooms and tune it on our own dataset. Just because we can, and because it's much faster than training from scratch.

Don't forget to change the paths:
you need to change DATASET_PATH to point to your dataset images folder, and OUTPUT_PATH to point to the folder where you'd like to save progress.

For example, /content/drive/MyDrive/deep_learning/guided-diffusion-sxela/ is the location where all the training checkpoints will be saved,

and /content/YourDatasetHere/ is your dataset, i.e. a folder with images (no captions needed).
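
A wrong path is the easiest mistake to make here, so a quick sanity check before launching training may save some time. The sketch below counts image files in the dataset folder and creates the checkpoint folder if it is missing. The function names and the list of extensions are assumptions for illustration, not something taken from the training code.

```python
from pathlib import Path

# Assumed set of image extensions; adjust it to match your dataset.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".gif"}


def count_images(dataset_dir):
    """Count image files under dataset_dir, including subfolders."""
    root = Path(dataset_dir)
    if not root.is_dir():
        raise FileNotFoundError(f"Dataset folder not found: {root}")
    return sum(
        1
        for path in root.rglob("*")
        if path.is_file() and path.suffix.lower() in IMAGE_EXTENSIONS
    )


def prepare_output_dir(output_dir):
    """Create the checkpoint folder (and parents) if it does not exist yet."""
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    return out
```

For example, `count_images("./your_images/")` should return a non-zero number before you start the training cell.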




We will be using this model together with CLIP inside DiscoDiffusion, so we can train less, stop early and let CLIP do the heavy lifting.

This will run almost forever, but you should start checking your results at around 50k iterations. Good results begin to appear at 100-200k iterations, depending on your dataset.
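
To get a feel for what these iteration counts mean for your data, you can convert them into approximate passes over the dataset. This is a rough sketch assuming a single GPU, the batch size of 4 used in this notebook, and the 8189 images of the Oxford 102 Flowers dataset; the function name is hypothetical.

```python
def passes_over_dataset(iterations, batch_size, num_images):
    """Approximate number of full passes over the dataset after `iterations` steps."""
    if num_images <= 0:
        raise ValueError("num_images must be positive")
    return iterations * batch_size / num_images


# Batch size 4 as in TRAIN_FLAGS; 8189 images is an assumption about the dataset.
for iterations in (50_000, 100_000, 200_000):
    passes = passes_over_dataset(iterations, batch_size=4, num_images=8189)
    print(f"{iterations} iterations is about {passes:.1f} passes over the dataset")
```

A smaller dataset is revisited many more times in the same number of iterations, which is one reason the point where results look good depends on your dataset.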

Validating means opening your OUTPUT_PATH folder and taking the ema_0.9999_(some number of steps).pt file with the highest number (the latest one). Then open this version of DiscoDiffusion:
https://github.com/Sxela/DiscoDiffusion-Warp/blob/main/Disco_Diffusion_v5_2_Warp_custom_model.ipynb and set diffusion-model to custom, custom_path to the path of that ema file (if you saved it on Google Drive, just point it there), and width_height to 256x256. Then run DD as usual.
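
Picking the latest ema file by hand gets tedious once the folder fills up. Here is a small sketch that finds it for you, assuming the files follow the `ema_0.9999_(number of steps).pt` naming described above; the function name is hypothetical.

```python
import re
from pathlib import Path

# Matches names like ema_0.9999_012000.pt and captures the step number.
EMA_PATTERN = re.compile(r"^ema_0\.9999_(\d+)\.pt$")


def latest_ema_checkpoint(checkpoint_dir):
    """Return the ema checkpoint with the highest step number, or None."""
    best_step, best_path = -1, None
    for path in Path(checkpoint_dir).iterdir():
        match = EMA_PATTERN.match(path.name)
        if match and int(match.group(1)) > best_step:
            best_step, best_path = int(match.group(1)), path
    return best_path
```

The step number is compared as an integer rather than as text, so the result stays correct even if the file names are not zero-padded to the same width.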


In [8]:
MODEL_FLAGS="--image_size 256 --num_channels 128 --num_res_blocks 2 --num_heads 1 --learn_sigma True --use_scale_shift_norm False --attention_resolutions 16"
DIFFUSION_FLAGS="--diffusion_steps 1000 --noise_schedule linear --rescale_learned_sigmas False --rescale_timesteps False --use_scale_shift_norm False"
TRAIN_FLAGS="--lr 2e-5 --batch_size 4 --save_interval 2000 --log_interval 50 --resume_checkpoint /content/lsun_uncond_100M_1200K_bs128.pt"
DATASET_PATH="./your_images/" #change to point to your dataset path
OUTPUT_PATH="/content/drive/MyDrive/deep_learning/guided-diffusion-sxela/" # models will be saved here; change it to your own Drive folder or another location
%cd /content/guided-diffusion-sxela
!python scripts/image_train.py --data_dir $DATASET_PATH $MODEL_FLAGS $DIFFUSION_FLAGS $TRAIN_FLAGS --logdir $OUTPUT_PATH

# if you are using the vanilla openai repo, you will need to run this instead:
#!OPENAI_LOGDIR=$OUTPUT_PATH python scripts/image_train.py --data_dir $DATASET_PATH $MODEL_FLAGS $DIFFUSION_FLAGS $TRAIN_FLAGS

/content/guided-diffusion-sxela
  @th.cuda.amp.custom_fwd
  @th.cuda.amp.custom_bwd
set output to  /content/drive/MyDrive/deep_learning/guided-diffusion-sxela/
Logging to /content/drive/MyDrive/deep_learning/guided-diffusion-sxela/
creating model and diffusion...
creating data loader...
training...
loading model from checkpoint: /content/lsun_uncond_100M_1200K_bs128.pt...
-------------------------
| grad_norm  | 0.0195   |
| loss       | 0.00481  |
| loss_q1    | 0.00847  |
| loss_q2    | 0.00115  |
| mse        | 0.00477  |
| mse_q1     | 0.00841  |
| mse_q2     | 0.00114  |
| param_norm | 683      |
| samples    | 4        |
| step       | 0        |
| vb         | 3.6e-05  |
| vb_q1      | 6.12e-05 |
| vb_q2      | 1.08e-05 |
-------------------------
saving model 0...
saving model 0.9999...
Traceback (most recent call last):
  File "/content/guided-diffusion-sxela/scripts/image_train.py", line 86, in <module>
    main()
  File "/content/guided-diffusion-sxela/scripts/image_train.py

In [10]:
# Classifier architecture flags
CLASSIFIER_FLAGS="--image_size 256 --classifier_attention_resolutions 32,16,8 --classifier_depth 2 --classifier_width 128 --classifier_pool attention --classifier_use_fp16 True"

# Training flags
TRAIN_FLAGS="--lr 3e-4 --batch_size 16 --save_interval 500 --log_interval 100 --iterations 1500 --anneal_lr True --weight_decay 0.05"

# ECT (Entropy-Constraint Training) flags
ECT_FLAGS="--ect_weight 0.1 --ect_divergence JS --mi_weight 0.01 --mi_divergence JS"

# Entropy configuration flags
ENTROPY_FLAGS="--entropy_type renyi --entropy_alpha 2.0"

# Dataset and paths
DATASET_PATH="./your_images/"  # change to point to your dataset path
OUTPUT_PATH="/content/drive/MyDrive/deep_learning/guided-diffusion-ect/"  # Output directory; not passed to the command below, so logs go to a /tmp/openai-* folder

# For training the noise-aware classifier with ECT
%cd /content/guided-diffusion-sxela
!python scripts/classifier_train.py \
    --data_dir $DATASET_PATH \
    --noised True \
    $CLASSIFIER_FLAGS \
    $TRAIN_FLAGS \
    $ECT_FLAGS \
    $ENTROPY_FLAGS

/content/guided-diffusion-sxela
  @th.cuda.amp.custom_fwd
  @th.cuda.amp.custom_bwd
Logging to /tmp/openai-2025-05-28-22-11-48-458238
creating model and diffusion...
creating data loader...
creating optimizer...
training classifier model with ECT...
[rank0]: Traceback (most recent call last):
[rank0]:   File "/content/guided-diffusion-sxela/scripts/classifier_train.py", line 355, in <module>
[rank0]:     main()
[rank0]:   File "/content/guided-diffusion-sxela/scripts/classifier_train.py", line 261, in main
[rank0]:     forward_backward_log(data)
[rank0]:   File "/content/guided-diffusion-sxela/scripts/classifier_train.py", line 244, in forward_backward_log
[rank0]:     log_loss_dict(diffusion, sub_t, losses)
[rank0]:   File "/content/guided-diffusion-sxela/guided_diffusion/train_util.py", line 293, in log_loss_dict
[rank0]:     for sub_t, sub_loss in zip(ts.cpu().numpy(), values.detach().cpu().numpy()):
[rank0]:                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

In [5]:
!pip install mpi4py

Collecting mpi4py
  Downloading mpi4py-4.0.3.tar.gz (466 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 466.3/466.3 kB 32.7 MB/s eta 0:00:00
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Installing backend dependencies ... done
  Preparing metadata (pyproject.toml) ... done
Building wheels for collected packages: mpi4py
  Building wheel for mpi4py (pyproject.toml) ... done
  Created wheel for mpi4py: filename=mpi4py-4.0.3-cp311-cp311-linux_x86_64.whl size=4441922 sha256=63d173acfbce9d31b46990872502291690d26d76d34dda526c37d461aa638514
  Stored in directory: /root/.cache/pip/wheels/5c/56/17/bf6ba37aa971a191a8b9eaa188bf5ec855b8911c1c56fb1f84
Successfully built mpi4py
Installing collected packages: mpi4py
Successfully installed 

## Sampling
The best way to sample your model in real-life conditions is to plug it into DiscoDiffusion.


Grab your latest ema checkpoint and open this colab: https://github.com/Sxela/DiscoDiffusion-Warp/blob/main/Disco_Diffusion_v5_2_Warp_custom_model.ipynb

Then change model settings > custom model path to your ema checkpoint's location, as described in the previous cell.

You can still sample using the vanilla openai code: just plug your checkpoint into the cell below.

Don't forget to change all the paths.

In [None]:
checkpoint_path = 'input some checkpoint path here' #use ema checkpoint
OUTPUT_PATH="/content/drive/MyDrive/deep_learning/guided-diffusion-sxela/"
!python scripts/image_sample.py --num_samples 1 --model_path $checkpoint_path $MODEL_FLAGS $DIFFUSION_FLAGS --timestep_respacing ddim100 --logdir $OUTPUT_PATH

# if you are using the vanilla openai repo, you will need to run this instead:
#!OPENAI_LOGDIR=/content/drive/MyDrive/deep_learning/guided-diffusion-sxela/samples/  python scripts/image_sample.py --num_samples 1 --model_path $checkpoint_path $MODEL_FLAGS $DIFFUSION_FLAGS --timestep_respacing ddim100

In [None]:
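import numpy as np
from PIL import Image  # a bare `import PIL` does not reliably expose PIL.Image

sample_path = 'some sample path'  # path to the .npz file written by image_sample.py
samples = np.load(sample_path)['arr_0']  # array of generated images
Image.fromarray(samples[0])  # display the first sample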
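
If you raise `--num_samples`, looking at images one at a time gets slow. The sketch below tiles all samples into a single grid, assuming the array loaded from the .npz file has shape (N, H, W, C) with uint8 values. The function name and the default column count are hypothetical.

```python
import numpy as np


def samples_to_grid(samples, columns=4):
    """Tile images of shape (N, H, W, C) into one (rows*H, columns*W, C) array."""
    count, height, width, channels = samples.shape
    rows = -(-count // columns)  # ceiling division
    # Unused cells in the last row stay black.
    grid = np.zeros((rows * height, columns * width, channels), dtype=samples.dtype)
    for index, image in enumerate(samples):
        row, column = divmod(index, columns)
        grid[row * height:(row + 1) * height, column * width:(column + 1) * width] = image
    return grid
```

The result can be passed to `Image.fromarray(...)` from Pillow to display it in the notebook, the same way a single sample is shown.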

# Train (tune) 256x256 vanilla DD model
Only if you have a beefy GPU with more than 16 GB of RAM.

For lvl 50 AI bosses.
This will not fit into Colab Pro, only into Colab Pro+ with an A100 GPU.


Download a pre-trained openai 256x256 model (the one used in DiscoDiffusion) that we will be tuning on our dataset

In [None]:
#download model checkpoint
!wget https://openaipublic.blob.core.windows.net/diffusion/jul-2021/256x256_diffusion_uncond.pt -P /content/
#if you wish to tune the 512x512 finetuned model from DD, you need to download it and change image size and checkpoint path later here:
#!wget https://huggingface.co/lowlevelware/512x512_diffusion_unconditional_ImageNet/resolve/main/512x512_diffusion_uncond_finetune_008100.pt

## Tune

Don't forget to change the paths:
you need to change DATASET_PATH to point to your dataset images folder, and OUTPUT_PATH to point to the folder where you'd like to save progress.

For example, /content/drive/MyDrive/deep_learning/guided-diffusion-sxela/ is the location where all the training checkpoints will be saved,

and /content/YourDatasetHere/ is your dataset, i.e. a folder with images (no captions needed).




We will be using this model together with CLIP inside DiscoDiffusion, so we can train less, stop early and let CLIP do the heavy lifting.

This will run almost forever, but you should start checking your results at around 50k iterations. Good results begin to appear at 100-200k iterations, depending on your dataset.

Validating means opening your OUTPUT_PATH folder and taking the ema_0.9999_(some number of steps).pt file with the highest number (the latest one). Then open this version of DiscoDiffusion:
https://github.com/Sxela/DiscoDiffusion-Warp/blob/main/Disco_Diffusion_v5_2_Warp_custom_model.ipynb and set diffusion-model to custom and custom_path to the path of that ema file (if you saved it on Google Drive, just point it there).

You'll need to set the custom model settings to this:

    model_config.update({
        'attention_resolutions': '32, 16, 8',
        'class_cond': False,
        'diffusion_steps': diffusion_steps,
        'rescale_timesteps': True,
        'timestep_respacing': timestep_respacing,
        'image_size': 256,
        'learn_sigma': True,
        'noise_schedule': 'linear',
        'num_channels': 256,
        'num_head_channels': 64,
        'num_res_blocks': 2,
        'resblock_updown': True,
        'use_checkpoint': use_checkpoint,
        'use_fp16': True,
        'use_scale_shift_norm': True,
    })
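
Most of these settings mirror the `MODEL_FLAGS` string used for training in this section, so one way to avoid typos is to derive them from the flags. The sketch below turns a flag string into a dictionary. It is a hypothetical helper, not part of DiscoDiffusion or the training repo, and values such as `timestep_respacing` and `use_checkpoint` still come from DD itself.

```python
import shlex


def _parse_value(text):
    """Convert 'True'/'False' to bool and whole numbers to int; keep the rest as str."""
    if text in ("True", "False"):
        return text == "True"
    try:
        return int(text)
    except ValueError:
        return text


def flags_to_config(flag_string):
    """Turn '--name value --name2 value2' into {'name': value, 'name2': value2}."""
    tokens = shlex.split(flag_string)
    if len(tokens) % 2 != 0:
        raise ValueError("Expected '--name value' pairs")
    config = {}
    for name, value in zip(tokens[::2], tokens[1::2]):
        if not name.startswith("--"):
            raise ValueError(f"Not a flag: {name}")
        config[name[2:]] = _parse_value(value)
    return config
```

Comparing `flags_to_config(MODEL_FLAGS)` against the dictionary above, key by key, is a quick way to spot a mismatch such as a different `num_channels`.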

In [None]:
MODEL_FLAGS="--attention_resolutions 32,16,8 --class_cond False --diffusion_steps 1000 --image_size 256 --learn_sigma True --noise_schedule linear --num_channels 256 --num_head_channels 64  --num_res_blocks 2 --resblock_updown True --use_fp16 True --use_scale_shift_norm True"
TRAIN_FLAGS="--lr 2e-5 --batch_size 4 --save_interval 1000 --log_interval 50 --resume_checkpoint /content/256x256_diffusion_uncond.pt"
DATASET_PATH="/content/YourDatasetHere/" #change to point to your dataset path
OUTPUT_PATH="/content/drive/MyDrive/deep_learning/guided-diffusion/"
%cd /content/guided-diffusion-sxela
!python scripts/image_train.py --data_dir $DATASET_PATH $MODEL_FLAGS $TRAIN_FLAGS --logdir $OUTPUT_PATH

# if you are using the vanilla openai repo, you will need to run this instead:
# !OPENAI_LOGDIR=$OUTPUT_PATH python scripts/image_train.py --data_dir $DATASET_PATH $MODEL_FLAGS $TRAIN_FLAGS

Sample from model

## Sampling
The best way to sample your model in real-life conditions is to plug it into DiscoDiffusion.


Grab your latest ema checkpoint and open this colab: https://github.com/Sxela/DiscoDiffusion-Warp/blob/main/Disco_Diffusion_v5_2_Warp_custom_model.ipynb

Then change the settings as described in the previous cell.

You can still sample using the vanilla openai code: just plug your checkpoint into the cell below.

Don't forget to change all the paths.

In [None]:
checkpoint_path = 'input some checkpoint path here' #use ema checkpoint
OUTPUT_PATH="/content/drive/MyDrive/deep_learning/guided-diffusion-sxela/"
!python scripts/image_sample.py --num_samples 1 --model_path $checkpoint_path $MODEL_FLAGS --timestep_respacing ddim100 --logdir $OUTPUT_PATH

# if you are using the vanilla openai repo, you will need to run this instead:
#!OPENAI_LOGDIR=/content/drive/MyDrive/deep_learning/guided-diffusion-sxela/samples/  python scripts/image_sample.py --num_samples 1 --model_path $checkpoint_path $MODEL_FLAGS --timestep_respacing ddim100

Show results

In [None]:
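import numpy as np
from PIL import Image  # a bare `import PIL` does not reliably expose PIL.Image

sample_path = 'some sample path'  # path to the .npz file written by image_sample.py
samples = np.load(sample_path)['arr_0']  # array of generated images
Image.fromarray(samples[0])  # display the first sample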

# Train from scratch (smaller model than vanilla DD, but larger than LSUN)
For lvl 1 AI crooks like me. Should fit into Colab Pro.

Train a smaller model that will definitely fit into Colab Pro.

Don't forget to change the paths:
you need to change DATASET_PATH to point to your dataset images folder, and OUTPUT_PATH to point to the folder where you'd like to save progress.

For example, /content/drive/MyDrive/deep_learning/guided-diffusion-sxela/ is the location where all the training checkpoints will be saved,

and /content/YourDatasetHere/ is your dataset, i.e. a folder with images (no captions needed).




We will be using this model together with CLIP inside DiscoDiffusion, so we can train less, stop early and let CLIP do the heavy lifting.

This will run almost forever, but you should start checking your results at around 50k iterations. Good results begin to appear at 100-200k iterations, depending on your dataset.

Validating means opening your OUTPUT_PATH folder and taking the ema_0.9999_(some number of steps).pt file with the highest number (the latest one). Then open this version of DiscoDiffusion:
https://github.com/Sxela/DiscoDiffusion-Warp/blob/main/Disco_Diffusion_v5_2_Warp_custom_model.ipynb and set diffusion-model to custom and custom_path to the path of that ema file (if you saved it on Google Drive, just point it there).

You'll need to set the custom model settings to this:

    model_config.update({
        'attention_resolutions': '32, 16, 8',
        'class_cond': False,
        'diffusion_steps': diffusion_steps,
        'rescale_timesteps': True,
        'timestep_respacing': timestep_respacing,
        'image_size': 256,
        'learn_sigma': True,
        'noise_schedule': 'linear',
        'num_channels': 128,
        'num_heads': 4,
        'num_res_blocks': 2,
        'resblock_updown': True,
        'use_checkpoint': use_checkpoint,
        'use_fp16': True,
        'use_scale_shift_norm': True,
    })

In [None]:
MODEL_FLAGS="--attention_resolutions 32,16,8 --class_cond False --diffusion_steps 1000 --image_size 256 --learn_sigma True --noise_schedule linear --num_channels 128 --num_heads 4  --num_res_blocks 2 --resblock_updown True --use_fp16 True --use_scale_shift_norm True"
TRAIN_FLAGS="--lr 2e-5 --batch_size 4 --save_interval 1000 --log_interval 50"
DATASET_PATH="/content/YourDatasetHere/" #change to point to your dataset path
OUTPUT_PATH="/content/drive/MyDrive/deep_learning/guided-diffusion-sxela/"
%cd /content/guided-diffusion-sxela
!python scripts/image_train.py --data_dir $DATASET_PATH $MODEL_FLAGS $TRAIN_FLAGS --logdir $OUTPUT_PATH

# if you are using the vanilla openai repo, you will need to run this instead:
# !OPENAI_LOGDIR=$OUTPUT_PATH python scripts/image_train.py --data_dir $DATASET_PATH $MODEL_FLAGS $TRAIN_FLAGS

### Sampling
The best way to sample your model in real-life conditions is to plug it into DiscoDiffusion.


Grab your latest ema checkpoint and open this colab: https://github.com/Sxela/DiscoDiffusion-Warp/blob/main/Disco_Diffusion_v5_2_Warp_custom_model.ipynb

Then change the settings as described in the previous cell.

In [None]:
checkpoint_path = 'input some checkpoint path here' #use ema checkpoint
OUTPUT_PATH="/content/drive/MyDrive/deep_learning/guided-diffusion-sxela/"
!python scripts/image_sample.py --num_samples 1 --model_path $checkpoint_path $MODEL_FLAGS  --timestep_respacing ddim100 --logdir $OUTPUT_PATH

# if you are using the vanilla openai repo, you will need to run this instead:
#!OPENAI_LOGDIR=/content/drive/MyDrive/deep_learning/guided-diffusion-sxela/samples/  python scripts/image_sample.py --num_samples 1 --model_path $checkpoint_path $MODEL_FLAGS  --timestep_respacing ddim100

Show results

In [None]:
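import numpy as np
from PIL import Image  # a bare `import PIL` does not reliably expose PIL.Image

sample_path = 'some sample path'  # path to the .npz file written by image_sample.py
samples = np.load(sample_path)['arr_0']  # array of generated images
Image.fromarray(samples[0])  # display the first sample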