Label-Noise Robust Diffusion Models (TDSM) (ICLR 2024)

This repo contains the official PyTorch implementation of the paper "Label-Noise Robust Diffusion Models" (ICLR 2024).

Byeonghu Na, Yeongmin Kim, HeeSun Bae, Jung Hyun Lee, Se Jung Kwon, Wanmo Kang, and Il-Chul Moon

This paper proposes the Transition-aware weighted Denoising Score Matching (TDSM) objective for training conditional diffusion models with noisy labels.

The training procedure of the proposed approach. The solid black arrows indicate the forward propagation, and the dashed red arrows represent the gradient signal flow. The filled circle operation denotes the dot product operation, and the dashed operation represents the L2 loss. The noisy-label classifier $\tilde{\mathbf{h}}_{\boldsymbol{\phi}^*}$ can be obtained by the cross-entropy loss on the noisy labeled dataset $\tilde{D}$.

Requirements

The requirements for this code are the same as those outlined for EDM.

In our experiments, we trained on 8 NVIDIA Tesla P40 GPUs with CUDA 11.4 and PyTorch 1.12.

Datasets

Datasets follow the same format used in StyleGAN and EDM: they are stored as uncompressed ZIP archives containing uncompressed PNG files, accompanied by a metadata file dataset.json for label information.
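
To check which labels an archive actually carries, a short script can read dataset.json straight from the ZIP. This is a minimal sketch, not part of this repo: read_labels is a hypothetical helper, and it assumes the metadata has the form {"labels": [[filename, class_index], ...]}, which is how we understand the StyleGAN/EDM dataset tools to write it.

```python
import json
import zipfile


def read_labels(zip_path):
    """Return {image filename: class index} read from dataset.json in the archive.

    Assumes dataset.json looks like {"labels": [[filename, class_index], ...]}.
    Returns an empty dict when the archive stores no labels.
    """
    with zipfile.ZipFile(zip_path) as zf:
        with zf.open("dataset.json") as f:
            meta = json.load(f)
    labels = meta.get("labels")
    if labels is None:
        return {}
    return {fname: int(label) for fname, label in labels}
```

For example, read_labels("datasets/cifar10_sym_40-32x32.zip") would return the (possibly noisy) label of every image in that archive, which is handy for confirming the empirical noise rate against a clean copy.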

Noisy Labeled Dataset

For the benchmark datasets, we add arguments to set the noise type and noise rate: --noise_type ('sym' for symmetric or 'asym' for asymmetric) and --noise_rate (a value from 0 to 1).

For example, the command to construct the CIFAR-10 dataset under 40% symmetric noise is:

python dataset_tool.py --source=downloads/cifar10/cifar-10-python.tar.gz \
    --dest=datasets/cifar10_sym_40-32x32.zip --noise_type=sym --noise_rate=0.4
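
To make the meaning of the noise rate concrete, the sketch below injects symmetric noise into a list of labels. It is illustrative only and is not the code in dataset_tool.py: add_symmetric_noise is a hypothetical helper, and it assumes the common convention that each label is flipped with probability noise_rate to a different class chosen uniformly. Some works instead allow the flip to land on the original class, so the tool's exact convention may differ.

```python
import random


def add_symmetric_noise(labels, noise_rate, num_classes, seed=0):
    """Flip each label with probability noise_rate to a uniformly chosen other class."""
    rng = random.Random(seed)
    noisy = []
    for y in labels:
        if rng.random() < noise_rate:
            # An offset in [1, num_classes - 1] guarantees the new label differs from y.
            y = (y + rng.randrange(1, num_classes)) % num_classes
        noisy.append(y)
    return noisy
```

With noise_rate=0.4 and 10 classes, roughly 40% of the labels change, and each changed label is equally likely to be any of the 9 other classes.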

Additionally, we provide the noisy labeled datasets that we used at this link.

First download each dataset ZIP archive, then replace the dataset.json file in the ZIP archive with the corresponding JSON file.
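
Python's zipfile module cannot overwrite a member in place, so one way to do the replacement is to write a new archive that copies every entry except dataset.json and then adds the new metadata file. The following is a minimal sketch under that approach: replace_dataset_json and all paths passed to it are hypothetical, not part of this repo. It writes with ZIP_STORED so the result stays uncompressed, matching the dataset format described above.

```python
import zipfile


def replace_dataset_json(src_zip, new_json_path, dst_zip):
    """Write dst_zip as a copy of src_zip with dataset.json replaced by new_json_path."""
    with zipfile.ZipFile(src_zip) as src, \
         zipfile.ZipFile(dst_zip, "w", compression=zipfile.ZIP_STORED) as dst:
        for info in src.infolist():
            if info.filename == "dataset.json":
                continue  # skip the old metadata; the new file is added below
            dst.writestr(info.filename, src.read(info.filename))
        dst.write(new_json_path, arcname="dataset.json")
```

Writing to a separate destination file keeps the downloaded archive untouched, so the step can be repeated with a different JSON file (for example, another noise setting) without downloading again.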

Training

Classifiers

You can train new classifiers using train_classifier.py. For example:

torchrun --standalone --nproc_per_node=1 train_classifier.py --outdir=classifier-runs \
    --data=datasets/cifar10_sym_40-32x32.zip --cond=1 --arch=ddpmpp --batch 1024