InstanceGM: Instance-Dependent Noisy Label Learning via Graphical Modelling (IEEE/CVF WACV 2023 Round 1)
@InProceedings{Garg_2023_WACV,
author = {Garg, Arpit and Nguyen, Cuong and Felix, Rafael and Do, Thanh-Toan and Carneiro, Gustavo},
title = {Instance-Dependent Noisy Label Learning via Graphical Modelling},
booktitle = {Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)},
month = {January},
year = {2023},
pages = {2288-2298}
}
- Abstract
Noisy labels are unavoidable yet troublesome in the ecosystem of deep learning because models can easily overfit them. There are many types of label noise, such as symmetric, asymmetric and instance-dependent noise (IDN), with IDN being the only type that depends on image information. Such dependence on image information makes IDN a critical type of label noise to study, given that labelling mistakes are caused in large part by insufficient or ambiguous information about the visual classes present in images. Aiming to provide an effective technique to address IDN, we present a new graphical modelling approach called InstanceGM, which combines discriminative and generative models. The main contributions of InstanceGM are: i) the use of the continuous Bernoulli distribution to train the generative model, offering significant training advantages, and ii) the exploration of a state-of-the-art noisy-label discriminative classifier to generate clean labels from instance-dependent noisy-label samples. InstanceGM is competitive with current noisy-label learning approaches, particularly in IDN benchmarks using synthetic and real-world datasets, where our method shows better accuracy than the competitors in most experiments.
Figure 2 from InstanceGM. The proposed InstanceGM trains the classifiers to output clean labels for instance-dependent noisy-label samples. We first warm up the two classifiers (Classifier-{11,12}) using the classification loss, and then fit a GMM on that classification loss to separate clean and noisy samples, training with the semi-supervised model MixMatch from the DivideMix stage. Additionally, another set of encoders (Encoder-{1,2}) is used to generate the latent image features depicted in the graphical model from Fig. 1. Furthermore, for image reconstruction, the decoders (Decoder-{1,2}) use the continuous Bernoulli loss, and another set of classifiers (Classifier-{21,22}) helps to identify the original noisy labels using the standard cross-entropy loss
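For reference, the continuous Bernoulli distribution (introduced by Loaiza-Ganem and Cunningham, 2019) that underlies the decoders' reconstruction loss has the following log-density for pixel intensities $x \in [0, 1]$; this is the standard form of the distribution, stated here for convenience rather than copied from this repository:

```latex
\log p(x \mid \lambda)
  = x \log \lambda + (1 - x)\log(1 - \lambda) + \log C(\lambda),
\qquad
C(\lambda) =
\begin{cases}
  \dfrac{2\tanh^{-1}(1 - 2\lambda)}{1 - 2\lambda}, & \lambda \neq \tfrac{1}{2},\\[6pt]
  2, & \lambda = \tfrac{1}{2}.
\end{cases}
```

The extra normalising term $\log C(\lambda)$ is what distinguishes this loss from the ordinary binary cross-entropy, making it a proper likelihood for continuous-valued pixels.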
The graphical model above (left section) is adopted from CausalNL
Our code is heavily based on the two repositories mentioned above (CausalNL and DivideMix)
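The DivideMix stage referred to above fits a two-component Gaussian mixture to the per-sample classification loss and treats the low-mean component as "clean". The following is a minimal stdlib-only EM sketch of that idea, not the implementation used in this repository; all function names here are illustrative:

```python
import math
import random

def fit_two_gaussians(losses, iters=50):
    """Fit a 1-D two-component Gaussian mixture to per-sample losses via EM.

    DivideMix-style clean/noisy split: the component with the smaller mean
    models the 'clean' samples. Returns each sample's probability of being clean.
    """
    mu = [min(losses), max(losses)]      # initialise means at the extremes
    var = [1.0, 1.0]
    pi = [0.5, 0.5]
    for _ in range(iters):
        # E-step: responsibility of each Gaussian for each sample
        resp = []
        for x in losses:
            p = [pi[k] / math.sqrt(2 * math.pi * var[k])
                 * math.exp(-(x - mu[k]) ** 2 / (2 * var[k])) for k in range(2)]
            s = (p[0] + p[1]) or 1e-12
            resp.append([p[0] / s, p[1] / s])
        # M-step: update mixture weights, means, variances
        for k in range(2):
            nk = sum(r[k] for r in resp) + 1e-12
            pi[k] = nk / len(losses)
            mu[k] = sum(r[k] * x for r, x in zip(resp, losses)) / nk
            var[k] = max(sum(r[k] * (x - mu[k]) ** 2
                             for r, x in zip(resp, losses)) / nk, 1e-6)
    clean = 0 if mu[0] < mu[1] else 1    # low-loss component = clean
    return [r[clean] for r in resp]

# Toy demo: well-separated low/high losses should split cleanly
random.seed(0)
losses = [random.gauss(0.1, 0.05) for _ in range(100)] + \
         [random.gauss(2.0, 0.3) for _ in range(100)]
p_clean = fit_two_gaussians(losses)
```

In DivideMix (and hence here), samples whose clean probability exceeds a threshold go to the labelled set and the rest to the unlabelled set for MixMatch.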
- All the libraries used can be found in the requirements file
For adding artificial instance-dependent noise in CIFAR10/100, we use the code from Part-dependent Label Noise. Please check the tools.py file in our repository
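The actual noise-injection code lives in tools.py and follows the Part-dependent Label Noise repository. As a rough, simplified sketch of how instance-dependent (not part-dependent) noise is typically generated, the flip probabilities depend on each image's features through a random projection; everything below is an illustrative stdlib-only approximation, and all names are hypothetical:

```python
import math
import random

def instance_dependent_noise(features, labels, num_classes, rate, seed=0):
    """Flip labels with probabilities that depend on each instance's features.

    Simplified IDN generation: project each feature vector through a random
    matrix to get per-class flip scores, exclude the true class, take a
    softmax, and scale so the expected flip probability is near `rate`.
    """
    rng = random.Random(seed)
    dim = len(features[0])
    # random projection: one weight per (feature dimension, class) pair
    W = [[rng.gauss(0, 1) for _ in range(num_classes)] for _ in range(dim)]
    noisy = []
    for x, y in zip(features, labels):
        scores = [sum(x[d] * W[d][c] for d in range(dim))
                  for c in range(num_classes)]
        scores[y] = -math.inf             # never "flip" to the true class
        m = max(s for c, s in enumerate(scores) if c != y)
        exp = [0.0 if c == y else math.exp(s - m)
               for c, s in enumerate(scores)]
        z = sum(exp)
        # per-instance flip rate drawn around the target rate, clipped to [0, 1]
        q = min(max(rng.gauss(rate, 0.1), 0.0), 1.0)
        probs = [q * e / z for e in exp]
        probs[y] = 1.0 - q                # keep the true label w.p. 1 - q
        # sample the (possibly) noisy label from the categorical distribution
        u, acc = rng.random(), 0.0
        for c, p in enumerate(probs):
            acc += p
            if u < acc:
                noisy.append(c)
                break
        else:
            noisy.append(y)
    return noisy
```

Because the flip distribution is a function of the features, ambiguous images end up with systematically different noise than easy ones, which is what makes IDN harder than symmetric noise.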
To run this project, install the libraries listed in the requirements file:
pip install -r requirements.txt
bash cifar10.sh
python instanceGM.py --r 0.5
- --r is the noise rate
To install Docker on your system, please follow the official Docker documentation
- To run it on CIFAR-10 (this dataset is already inside docker image), run the following command from your terminal
docker run --gpus 1 -ti arpit2412/instancegm:cifar /bin/bash -c "cd /src && source activate instanceGM && python instanceGM.py --r 0.5"
The above command includes GPU support; it automatically pulls the Docker image from Docker Hub if it is not found locally, and runs it after activating the environment
To change the noise rate, change the --r argument; by default it is 0.5
- To run it on CIFAR-100 (this dataset is already inside docker image), run the following command from your terminal
docker run --gpus 1 -ti arpit2412/instancegm:cifar /bin/bash -c "cd /src && source activate instanceGM && python instanceGM.py --num_class 100 --data_path ./cifar-100 --dataset cifar100 --r 0.5"
- To change the noise rate, change the --r argument (0.5 by default); the remaining arguments switch the settings from CIFAR10 to CIFAR100
- To run Animal10N, you must have the dataset stored on your local machine; you can then mount that folder into the Docker image using the -v parameter while running InstanceGM
wandb docker run --gpus 1 -v absolute_path_of_animal10N/:/src/animal10N/ -ti instancegm /bin/bash -c "cd ./src && source activate instanceGM && python instanceGM_animal10N.py --saved False"
- Please replace absolute_path_of_animal10N with the absolute path of your Animal10N dataset
- To record the progress with all the loss curves, accuracy curves and sample images, we used wandb. If you are using it for the first time, it might ask for your wandb credentials
- When running Animal10N for the first time, the dataset labels are saved, which is why saved is False by default; on subsequent runs you can reuse the previously saved labels and data information by passing --saved True in the above command
- The CIFAR10/CIFAR100 configurations are followed to run this
- To run Red Mini-ImageNet, you must have the dataset stored on your local machine; you can then mount that folder into the Docker image using the -v parameter while running InstanceGM
- Dataset link: https://google.github.io/controlled-noisy-web-labels/download.html
- The directory structure of Red Mini-ImageNet is described in redMini.txt
wandb docker run --gpus 1 -v absolute_path_of_redMini/:/src/red_blue/ -ti instancegm /bin/bash -c "cd ./src && source activate instanceGM && python instanceGM_redMini.py"
- Please replace absolute_path_of_redMini with the absolute path of your Red Mini-ImageNet dataset
- To record the progress with all the loss curves, accuracy curves and sample images, we used wandb. If you are using it for the first time, it might ask for your wandb credentials
- Following the literature, the noise rates considered were 0.2, 0.4, 0.6 and 0.8 (default is 0.2). The rate can easily be changed by adding --r to the above command, e.g.
python instanceGM_redMini.py --r 0.4
- The CIFAR10/CIFAR100 configurations are followed to run this
- To run Clothing1M, you must have the dataset stored on your local machine; you can then mount that folder into the Docker image using the -v parameter while running InstanceGM
wandb docker run --gpus 1 -v absolute_path_of_clothing1M/clothing1M:/src/clothing1M/ -ti instancegm /bin/bash -c "cd ./src && source activate instanceGM && python instanceGM_clothing1M.py"
- Please replace absolute_path_of_clothing1M/clothing1M with the absolute path of your Clothing1M dataset
- To record the progress with all the loss curves, accuracy curves and sample images, we used wandb. If you are using it for the first time, it might ask for your wandb credentials
- Following the literature, a pretrained ResNet is used, so some pretrained weights might be downloaded automatically
- Pull image from docker hub
docker pull arpit2412/instancegm:cifar
- If the pull is successful, the following command should list the image
docker image ls
- All the files are present in src folder in docker image. To check all the files:
docker run -ti arpit2412/instancegm:cifar /bin/bash
cd src
ls
- If you want to build the image from the files provided in the GitHub repository
docker build -f Dockerfile_train -t docker_instancegm .
- Cifar100
- Red Mini-ImageNet
- Animal-10N
This work is licensed under a Custom License. Non-commercial use is permitted without restrictions, while commercial users must contact the copyright holder for licensing permission.