AAAI 2021: Beyond Class-Conditional Assumption: A Primary Attempt to Combat Instance-Dependent Label Noise

chenpf1025/IDN

Beyond Class-Conditional Assumption: A Primary Attempt to Combat Instance-Dependent Label Noise

This is the official repository for the paper Beyond Class-Conditional Assumption: A Primary Attempt to Combat Instance-Dependent Label Noise (AAAI 2021). One contribution of the paper is to provide rigorous motivation for studying instance-dependent label noise.

@inproceedings{chen2021beyond,
  title={Beyond Class-Conditional Assumption: A Primary Attempt to Combat Instance-Dependent Label Noise.},
  author={Chen, Pengfei and Ye, Junjie and Chen, Guangyong and Zhao, Jingwei and Heng, Pheng-Ann},
  booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
  year={2021}
}

0. Requirements

  • python 3.6+
  • torch 1.2+

1. Instance-Dependent Noise (IDN)

1.1. Noisy labels used in this paper

In our experiments, we generated IDN noisy labels for MNIST and CIFAR-10. We release the label files here.

data/CIFAR10/label_noisy/dependent0.1.csv
data/CIFAR10/label_noisy/dependent0.2.csv
data/CIFAR10/label_noisy/dependent0.3.csv
data/CIFAR10/label_noisy/dependent0.4.csv
data/MNIST/label_noisy/dependent0.1.csv
data/MNIST/label_noisy/dependent0.2.csv
data/MNIST/label_noisy/dependent0.3.csv
data/MNIST/label_noisy/dependent0.4.csv

If you are developing novel methods, you are encouraged to use these files for a fair comparison with the results reported in our paper. The index in each .csv file is consistent with the sample order of the default torchvision dataset. For example, to get a CIFAR-10 dataset with 40% IDN, you can use the following snippet in your code.

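import pandas as pd
from torchvision import datasets

# `root` and `transform` are your own data directory and transform pipeline.
train_dataset_noisy = datasets.CIFAR10(root, train=True, download=True, transform=transform)
# Replace the clean targets (a list of ints for CIFAR-10) with the 40% IDN labels.
targets_noisy = list(pd.read_csv('./data/CIFAR10/label_noisy/dependent0.4.csv')['label_noisy'].values.astype(int))
train_dataset_noisy.targets = targets_noisy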

To get an MNIST dataset with 40% IDN, you can use the following snippet in your code.

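import pandas as pd
import torch
from torchvision import datasets

train_dataset_noisy = datasets.MNIST(root, train=True, download=True, transform=transform)
# torchvision stores MNIST targets as an integer tensor, so build a long tensor rather than a float one.
targets_noisy = torch.tensor(pd.read_csv('./data/MNIST/label_noisy/dependent0.4.csv')['label_noisy'].values.astype(int), dtype=torch.long)
train_dataset_noisy.targets = targets_noisy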
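
Before training on a replaced label set, it can be worth confirming that the loaded labels line up with the dataset and that the fraction of flipped labels is close to the rate in the file name. The following is a minimal sketch of such a check; the helper name `empirical_noise_rate` is hypothetical and not part of this repository, and it only assumes that both label sequences contain values that `int()` accepts.

```python
def empirical_noise_rate(clean_labels, noisy_labels):
    """Return the fraction of samples whose noisy label differs from the clean one."""
    clean = [int(y) for y in clean_labels]
    noisy = [int(y) for y in noisy_labels]
    if len(clean) != len(noisy):
        # A length mismatch means the label file does not belong to this split.
        raise ValueError(
            "label count mismatch: %d clean vs %d noisy" % (len(clean), len(noisy))
        )
    if not clean:
        raise ValueError("label sequences are empty")
    flipped = sum(c != n for c, n in zip(clean, noisy))
    return flipped / len(clean)


# Toy example: 2 of the 5 labels are flipped.
clean = [0, 1, 2, 3, 4]
noisy = [0, 1, 5, 3, 9]
print(empirical_noise_rate(clean, noisy))  # 0.4
```

With the snippets above, this would mean keeping a copy of `train_dataset_noisy.targets` before overwriting it and passing that copy as `clean_labels`.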

1.2. Synthesizing IDN

If you prefer to synthesize IDN yourself, e.g., 45% IDN for CIFAR-10, you can use the following command.

python cifar10_gen_dependent.py --noise_rate 0.45 --gen

The command trains a model on clean CIFAR-10, computes the averaged softmax output, and then synthesizes IDN from it. After you run the command for the first time, the averaged softmax output is saved, and you can generate IDN at any other noise rate directly by loading it, e.g.,

python cifar10_gen_dependent.py --noise_rate 0.35 --gen --load

If you need to write a script to synthesize IDN for a new dataset, you can refer to the files mnist_gen_dependent.py and cifar10_gen_dependent.py.
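
As a starting point for a new dataset, the step that turns averaged softmax outputs into noisy labels might look like the sketch below. The flipping rule here is an assumption made for illustration: each sample is scored by its highest averaged probability on a wrong class, and the top `noise_rate` fraction of samples is flipped to that class. The function name `synthesize_idn` is hypothetical; the two scripts named above remain the authoritative implementation.

```python
def synthesize_idn(avg_softmax, clean_labels, noise_rate):
    """Flip the most confusable samples to their most likely wrong class.

    avg_softmax:  per-sample lists of class probabilities (averaged over training).
    clean_labels: per-sample clean class indices.
    noise_rate:   fraction of samples to flip, between 0 and 1.
    """
    candidates = []
    for i, (probs, y) in enumerate(zip(avg_softmax, clean_labels)):
        # Assumed rule: the wrong class with the highest averaged probability.
        wrong = max((k for k in range(len(probs)) if k != y), key=lambda k: probs[k])
        candidates.append((probs[wrong], i, wrong))

    num_flips = int(round(noise_rate * len(clean_labels)))
    # Highest wrong-class score first; ties are broken by sample index.
    candidates.sort(key=lambda t: (-t[0], t[1]))

    noisy_labels = list(clean_labels)
    for _, i, wrong in candidates[:num_flips]:
        noisy_labels[i] = wrong
    return noisy_labels


# Toy example with 4 samples and 3 classes.
avg_softmax = [
    [0.90, 0.05, 0.05],
    [0.40, 0.50, 0.10],
    [0.20, 0.20, 0.60],
    [0.10, 0.45, 0.45],
]
clean_labels = [0, 1, 2, 1]
print(synthesize_idn(avg_softmax, clean_labels, noise_rate=0.5))  # [0, 0, 2, 2]
```

Because the flipped samples are chosen from the model's own confusion, the resulting noise depends on each instance rather than only on its class.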

2. Combating IDN using SEAL

2.1. MNIST

For SEAL, we use 10 iterations. We can run the commands one-by-one as follows.

python train_mnist.py --noise_rate 0.2 --SEAL 0 --save
python train_mnist.py --noise_rate 0.2 --SEAL 1 --save
...
python train_mnist.py --noise_rate 0.2 --SEAL 10 --save

The initial iteration (--SEAL 0) is equivalent to training with the cross-entropy (CE) loss. To run experiments at a different noise rate, choose --noise_rate from {0.1, 0.2, 0.3, 0.4}.
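
Running the iterations by hand is tedious, so a small driver script can issue them in order. The sketch below uses only the flags shown above (`--noise_rate`, `--SEAL`, `--save`); the helper names `seal_commands` and `run_seal` are hypothetical, and it assumes that each iteration must finish before the next one starts, which is why it stops at the first failing command.

```python
import subprocess
import sys


def seal_commands(script, last_iteration, noise_rate=None):
    """Build one command per SEAL iteration, from 0 up to last_iteration inclusive."""
    commands = []
    for iteration in range(last_iteration + 1):
        cmd = [sys.executable, script]
        if noise_rate is not None:
            # Some training scripts take no noise rate, so the flag is optional.
            cmd += ["--noise_rate", str(noise_rate)]
        cmd += ["--SEAL", str(iteration), "--save"]
        commands.append(cmd)
    return commands


def run_seal(script, last_iteration, noise_rate=None, dry_run=True):
    """Print each command and, unless dry_run is set, execute it in order."""
    for cmd in seal_commands(script, last_iteration, noise_rate):
        print(" ".join(cmd))
        if not dry_run:
            # check=True aborts the sequence if an iteration fails.
            subprocess.run(cmd, check=True)


# Dry run: only prints the 11 MNIST commands for a noise rate of 0.2.
run_seal("train_mnist.py", last_iteration=10, noise_rate=0.2)
```

Passing `dry_run=False` executes the commands. The same helper should cover the other training scripts by changing `script` and `last_iteration`, and by leaving `noise_rate` unset where the script takes no such flag.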

2.2. CIFAR-10

For SEAL, we use 3 iterations. We can run the commands one-by-one as follows.

python train_cifar10.py --noise_rate 0.2 --SEAL 0 --save
python train_cifar10.py --noise_rate 0.2 --SEAL 1 --save
python train_cifar10.py --noise_rate 0.2 --SEAL 2 --save
python train_cifar10.py --noise_rate 0.2 --SEAL 3 --save

The initial iteration (--SEAL 0) is equivalent to training with the cross-entropy (CE) loss. To run experiments at a different noise rate, choose --noise_rate from {0.1, 0.2, 0.3, 0.4}.

2.3. Clothing1M

By default, training requires 4 GPUs. For SEAL, we use 3 iterations. We can run the commands one-by-one as follows.

python train_clothing.py --SEAL 0 --save
python train_clothing.py --SEAL 1 --save
python train_clothing.py --SEAL 2 --save
python train_clothing.py --SEAL 3 --save

The initial iteration (--SEAL 0) is equivalent to training with the cross-entropy (CE) loss.

To run SEAL on top of DMI, we first use the official implementation of DMI to obtain a model, then run the following commands one-by-one.

python train_clothing_dmi.py --SEAL 1 --save
python train_clothing_dmi.py --SEAL 2 --save
python train_clothing_dmi.py --SEAL 3 --save
