
Fixing Overconfidence in Dynamic Neural Networks

This repository is the official implementation of the methods in the publication:

  • Lassi Meronen, Martin Trapp, Andrea Pilzer, Le Yang, and Arno Solin (2024). Fixing overconfidence in dynamic neural networks. In IEEE Winter Conference on Applications of Computer Vision (WACV). [arXiv preprint]

Installing required Python packages in a virtual environment

  • Start by installing Python version 3.7.4
  • Create and activate a virtual environment named MSDNet:
python -m venv MSDNet
source MSDNet/bin/activate
  • Install the required packages into the newly created virtual environment:
python -m pip install -r requirements.txt

Obtaining the CIFAR-100, ImageNet, and Caltech-256 data sets

CIFAR-100

  • The CIFAR-100 and Caltech-256 data sets are downloaded automatically by the training script if you don't already have them (as is CIFAR-10, if you wish to experiment with it). Note that the downloaded Caltech-256 image folders may contain some additional non-image files, which must be removed manually for the training scripts to run; see the snippet below.
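  • One possible way to do this cleanup is sketched below. It assumes the data set was extracted to /path_to_caltech/256_ObjectCategories (the standard folder name; adjust the path to your setup) and that all valid images are .jpg files:
find /path_to_caltech/256_ObjectCategories -type f ! -name "*.jpg"           # preview the non-image files
find /path_to_caltech/256_ObjectCategories -type f ! -name "*.jpg" -delete   # remove them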

ImageNet

  • The ImageNet data set can be downloaded at image-net.org. You should download the Training images (Task 1 & 2) (138 GB) and Validation images (all tasks) (6.3 GB) from the ILSVRC2012 version.
  • After this you should have ILSVRC2012_img_train.tar and ILSVRC2012_img_val.tar. Extract ILSVRC2012_img_train.tar once to obtain a folder ILSVRC2012_img_train containing 1000 class-specific .tar files.
  • Make sure ILSVRC2012_img_train and ILSVRC2012_img_val.tar are in the same folder; we refer to this folder as /path_to_imagenet. Change into it to perform the following data set extraction.
  • The complete training set can be extracted from the .tar files for example by doing the following:
mkdir train && cd train
find /path_to_imagenet/ILSVRC2012_img_train -name "*.tar" | while read NAME ; do CLASS=$(basename "${NAME}" .tar) ; mkdir -p "${CLASS}" ; tar -xf "${NAME}" -C "${CLASS}" ; done
cd ..
  • The validation set can be extracted as follows:
mkdir val && cd val && tar -xf /path_to_imagenet/ILSVRC2012_img_val.tar -C /path_to_imagenet/val
wget -qO- https://raw.githubusercontent.com/soumith/imagenetloader.torch/master/valprep.sh | bash
cd ..
  • After these steps you should have folders /path_to_imagenet/train and /path_to_imagenet/val which both contain 1000 subfolders, each containing the sample images for one class.
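  • As an optional sanity check, each of the two folders should contain exactly 1000 entries:
ls /path_to_imagenet/train | wc -l    # expect 1000
ls /path_to_imagenet/val | wc -l      # expect 1000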

Training MSDNet backbone for CIFAR-100

  • To train the small model for CIFAR-100, run the following command:
python main.py --data-root /path_to_CIFAR100/ --data cifar100 \
    --save /savepath/MSDNet/cifar100_4 \
    --arch msdnet --batch-size 64 --epochs 300 --nBlocks 4 \
    --stepmode lin_grow --step 1 --base 1 --nChannels 16 --use-valid \
    -j 1 --var0 2.0 --laplace_temperature 1.0
  • To train the medium and large models, change the --nBlocks argument to 6 and 8, respectively. Note that you also need to change the path in --save to avoid overwriting previously trained models. For example, the medium model can be trained as shown below.
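  • A sketch for the medium model (the save path cifar100_6 is only an illustrative choice):
python main.py --data-root /path_to_CIFAR100/ --data cifar100 \
    --save /savepath/MSDNet/cifar100_6 \
    --arch msdnet --batch-size 64 --epochs 300 --nBlocks 6 \
    --stepmode lin_grow --step 1 --base 1 --nChannels 16 --use-valid \
    -j 1 --var0 2.0 --laplace_temperature 1.0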

Training MSDNet backbone for ImageNet

  • To train the small model for ImageNet, run the following command:
python main.py --data-root /path_to_imagenet --data ImageNet \
    --save /savepath/MSDNet/imagenet_base4 \
    --arch msdnet --batch-size 256 --epochs 90 --nBlocks 5 \
    --stepmode even --step 4 --base 4 --nChannels 32 --use-valid \
    --growthRate 16 --grFactor 1-2-4-4 --bnFactor 1-2-4-4 \
    -j 4 --gpu 0 --var0 2.0 --laplace_temperature 1.0
  • To train the medium and large models, change both the --step and --base arguments to 6 for the medium model and to 7 for the large model. Note that you also need to change the path in --save to avoid overwriting previously trained models. For example, the medium model can be trained as shown below.
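  • A sketch for the medium model (the save path imagenet_base6 is only an illustrative choice):
python main.py --data-root /path_to_imagenet --data ImageNet \
    --save /savepath/MSDNet/imagenet_base6 \
    --arch msdnet --batch-size 256 --epochs 90 --nBlocks 5 \
    --stepmode even --step 6 --base 6 --nChannels 32 --use-valid \
    --growthRate 16 --grFactor 1-2-4-4 --bnFactor 1-2-4-4 \
    -j 4 --gpu 0 --var0 2.0 --laplace_temperature 1.0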

Training MSDNet backbone for Caltech-256

  • To train the small model for Caltech-256, run the following command:
python main.py --data-root /path_to_caltech --data caltech256 \
    --save /savepath/MSDNet/caltech_base4 \
    --arch msdnet --batch-size 128 --epochs 180 --nBlocks 5 \
    --stepmode even --step 4 --base 4 --nChannels 32 --use-valid \
    --growthRate 16 --grFactor 1-2-4-4 --bnFactor 1-2-4-4 \
    -j 4 --gpu 0 --var0 2.0 --laplace_temperature 1.0
  • To train the medium and large models, change both the --step and --base arguments to 6 for the medium model and to 7 for the large model. Note that you also need to change the path in --save to avoid overwriting previously trained models. For example, the large model can be trained as shown below.
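  • A sketch for the large model (the save path caltech_base7 is only an illustrative choice):
python main.py --data-root /path_to_caltech --data caltech256 \
    --save /savepath/MSDNet/caltech_base7 \
    --arch msdnet --batch-size 128 --epochs 180 --nBlocks 5 \
    --stepmode even --step 7 --base 7 --nChannels 32 --use-valid \
    --growthRate 16 --grFactor 1-2-4-4 --bnFactor 1-2-4-4 \
    -j 4 --gpu 0 --var0 2.0 --laplace_temperature 1.0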

Compute Laplace approximation separately after training

  • The Laplace approximation is computed automatically at the end of training. However, if you wish to recompute it separately, you can do so as follows:
python main.py --data-root /path_to_CIFAR100/ --data cifar100 \
    --save /savepath/MSDNet/cifar100_4 \
    --arch msdnet --batch-size 64 --epochs 300 --nBlocks 4 \
    --stepmode lin_grow --step 1 --base 1 --nChannels 16 --use-valid \
    -j 1 --compute_only_laplace --resume /savepath/MSDNet/cifar100_4/save_models/model_best_acc.pth.tar \
    --var0 2.0
  • Note that you need to change the --save and --resume arguments to point to the directory containing the trained model for which you want to compute the Laplace approximation.
  • The example command is for CIFAR-100, but the same can be done for ImageNet or Caltech-256 by adding the --compute_only_laplace and --resume /path_to_saved_model/save_models/model_best_acc.pth.tar arguments to the corresponding training command; see the example below.
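  • For instance, recomputing the Laplace approximation for the small ImageNet model trained above might look as follows (assuming it was saved under /savepath/MSDNet/imagenet_base4):
python main.py --data-root /path_to_imagenet --data ImageNet \
    --save /savepath/MSDNet/imagenet_base4 \
    --arch msdnet --batch-size 256 --epochs 90 --nBlocks 5 \
    --stepmode even --step 4 --base 4 --nChannels 32 --use-valid \
    --growthRate 16 --grFactor 1-2-4-4 --bnFactor 1-2-4-4 \
    -j 4 --gpu 0 --var0 2.0 --laplace_temperature 1.0 \
    --compute_only_laplace \
    --resume /savepath/MSDNet/imagenet_base4/save_models/model_best_acc.pth.tar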

Test vanilla MSDNet on CIFAR-100

  • To test the small vanilla MSDNet model on CIFAR-100, run the following:
python main.py --data-root /path_to_CIFAR100/ --data cifar100 --save /savepath/MSDNet/cifar100_4 \
    --arch msdnet --batch-size 64 --epochs 300 --nBlocks 4 --stepmode lin_grow --step 1 --base 1 \
    --nChannels 16 --use-valid -j 1 --evalmode dynamic \
    --evaluate-from /savepath/MSDNet/cifar100_4/save_models/model_best_acc.pth.tar
  • Note that the --save and --evaluate-from arguments must point to the directory of the saved model.
  • For the medium and large models, you again need to change the --nBlocks argument accordingly, as well as the paths in --save and --evaluate-from.

Test vanilla MSDNet on ImageNet

  • To test the small vanilla MSDNet model on ImageNet, run the following:
python main.py --data-root /path_to_imagenet --data ImageNet --save /savepath/MSDNet/imagenet_base4 \
    --arch msdnet --batch-size 256 --epochs 90 --nBlocks 5 --stepmode even --step 4 --base 4 \
    --nChannels 32 --use-valid -j 4 --gpu 0 --evalmode dynamic \
    --growthRate 16 --grFactor 1-2-4-4 --bnFactor 1-2-4-4 \
    --evaluate-from /savepath/MSDNet/imagenet_base4/save_models/model_best_acc.pth.tar
  • As in training, the medium and large models require changing the --step and --base arguments, and the paths in --save and --evaluate-from must point to the saved model you want to evaluate.

Test vanilla MSDNet on Caltech-256

  • To test the small vanilla MSDNet model on Caltech-256, run the following:
python main.py --data-root /path_to_caltech --data caltech256 --save /savepath/MSDNet/caltech_base4 \
    --arch msdnet --batch-size 128 --epochs 180 --nBlocks 5 --stepmode even --step 4 --base 4 \
    --nChannels 32 --use-valid -j 4 --gpu 0 --evalmode dynamic \
    --growthRate 16 --grFactor 1-2-4-4 --bnFactor 1-2-4-4 \
    --evaluate-from /savepath/MSDNet/caltech_base4/save_models/model_best_acc.pth.tar
  • As in training, the medium and large models require changing the --step and --base arguments, and the paths in --save and --evaluate-from must point to the saved model you want to evaluate.

Test models that use Laplace and/or model-internal ensembling (MIE)

  • To use Laplace approximation in the evaluation of a model, add the following arguments to the model testing commands:
--laplace --laplace_temperature 1.0 --var0 2.0 --n_mc_samples 50 --optimize_temperature --optimize_var0
  • To use MIE in the evaluation of a model, add the argument --MIE to the model testing commands.
  • To test with both Laplace and MIE (our model), add both sets of arguments described above to the testing command, as in the example below.
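  • Putting this together, testing the small CIFAR-100 model with both Laplace and MIE might look as follows:
python main.py --data-root /path_to_CIFAR100/ --data cifar100 --save /savepath/MSDNet/cifar100_4 \
    --arch msdnet --batch-size 64 --epochs 300 --nBlocks 4 --stepmode lin_grow --step 1 --base 1 \
    --nChannels 16 --use-valid -j 1 --evalmode dynamic \
    --evaluate-from /savepath/MSDNet/cifar100_4/save_models/model_best_acc.pth.tar \
    --laplace --laplace_temperature 1.0 --var0 2.0 --n_mc_samples 50 \
    --optimize_temperature --optimize_var0 --MIE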

License

This software is provided under the MIT License.
