
The official PyTorch implementation of:
Pitfalls of In-Domain Uncertainty Estimation and Ensembling in Deep Learning, ICLR'20

OpenReview / arXiv / Poster video (5 mins) / Blog / bibtex

Poster video (5 mins)

ICLR 2020 Poster Presentation by Dmitry Molchanov

Environment Setup

The following commands create and activate a Python environment with all required dependencies using miniconda:

conda env create -f condaenv.yml
conda activate megabayes

Logs, Plots, Tables, Pre-trained weights

In the notebooks folder we provide:

  • Saved logs with all computed results
  • Examples of ipython notebooks to reproduce plots, tables, and compute the deep ensemble equivalent (DEE) score
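The deep ensemble equivalent (DEE) score maps a method's test log-likelihood to the number of independent deep-ensemble members needed to match it. A minimal sketch of that inversion (our paraphrase of the paper's definition, assuming the measured deep-ensemble log-likelihood curve is non-decreasing in ensemble size):

```python
import numpy as np

def dee(method_ll, de_ll):
    """Deep ensemble equivalent (DEE) score: the (interpolated) number of
    independent deep-ensemble members needed to match a given test
    log-likelihood. A sketch of the paper's definition, assuming
    de_ll[i] is the mean log-likelihood of an ensemble of i + 1 networks
    and that this curve is non-decreasing; values outside the measured
    range are clipped to [1, len(de_ll)].
    """
    sizes = np.arange(1, len(de_ll) + 1)
    # Invert the monotone size -> log-likelihood curve by piecewise-linear
    # interpolation (np.interp clips outside the grid).
    return float(np.interp(method_ll, de_ll, sizes))
```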

Pre-trained weights for some models are available: Deep Ensembles on ImageNet (~10 GB) and Deep Ensembles on CIFARs (~62 GB), among others. The weights can also be downloaded from the command line with yadisk-direct:

pip3 install wldhx.yadisk-direct

# ImageNet
curl -L $(yadisk-direct -o

curl -L $(yadisk-direct -o

Pre-trained weights for other models can be provided on request; open an issue if you need a specific model.


Evaluating ensembling methods

Ensembling methods are evaluated with the scripts in the ens folder, which contains a separate script for each method. The scripts share the following interface:

ipython -- ens/ens-<method>.py -h
usage: [-h] --dataset DATASET [--data_path PATH]
                     [--models_dir PATH] [--aug_test] [--batch_size N]
                     [--num_workers M] [--fname FNAME]

optional arguments:
  -h, --help         show this help message and exit
  --dataset DATASET  dataset name CIFAR10/CIFAR100/ImageNet (default: None)
  --data_path PATH   path to a data-folder (default: ~/data)
  --models_dir PATH  a dir that stores pre-trained models (default: ~/megares)
  --aug_test         enables test-time augmentation (default: False)
  --batch_size N     input batch size (default: 256)
  --num_workers M    number of workers (default: 10)
  --fname FNAME      comment to a log file name (default: unnamed)
  • All scripts assume that pytorch-ensembles is the current working directory (cd pytorch-ensembles).
  • The scripts write .csv logs to pytorch-ensembles/logs with the columns rowid, dataset, architecture, ensemble_method, n_samples, metric, value, info.
  • The notebooks folder contains ipython notebooks that reproduce the tables and plots from these logs.
  • The scripts also write the final log-probs of every method in .npy format to the pytorch-ensembles/.megacache folder.
  • The interface for K-FAC-Laplace differs and is described below.
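The .csv logs in the format above can be loaded with the standard library alone. A hedged sketch (it assumes the documented column order and no header row, which may differ from the actual log files):

```python
import csv
from collections import defaultdict

# Column order as documented for pytorch-ensembles/logs (an assumption:
# the logs carry no header row).
LOG_COLUMNS = ["rowid", "dataset", "architecture", "ensemble_method",
               "n_samples", "metric", "value", "info"]

def load_ens_log(path):
    """Read one pytorch-ensembles .csv log into a list of dicts."""
    rows = []
    with open(path, newline="") as f:
        for rec in csv.reader(f):
            row = dict(zip(LOG_COLUMNS, rec))
            row["n_samples"] = int(row["n_samples"])
            row["value"] = float(row["value"])
            rows.append(row)
    return rows

def metric_by_samples(rows, metric):
    """Group values of one metric by ensemble size, e.g. for a
    metric-vs-n_samples plot."""
    grouped = defaultdict(list)
    for row in rows:
        if row["metric"] == metric:
            grouped[row["n_samples"]].append(row["value"])
    return dict(grouped)
```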


ipython -- ens/  --dataset=CIFAR10/CIFAR100/ImageNet
ipython -- ens/ --dataset=CIFAR10/CIFAR100/ImageNet
ipython -- ens/     --dataset=CIFAR10/CIFAR100/ImageNet
ipython -- ens/   --dataset=CIFAR10/CIFAR100
ipython -- ens/     --dataset=CIFAR10/CIFAR100/ImageNet
ipython -- ens/ --dataset=CIFAR10/CIFAR100
ipython -- ens/      --dataset=CIFAR10/CIFAR100/ImageNet
ipython -- ens/    --dataset=CIFAR10/CIFAR100
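Since each script dumps per-object log-probs to .megacache, standard metrics can also be recomputed offline. A sketch, assuming an (n_objects, n_classes) array of log-probabilities and integer labels (the exact layout of the repo's .npy dumps is an assumption):

```python
import numpy as np

def nll_and_acc(log_probs, labels):
    """Test negative log-likelihood and accuracy from an array of
    per-object log-probabilities. Assumes shape (n_objects, n_classes)
    and integer class labels; the layout of the repo's .npy dumps is an
    assumption.
    """
    log_probs = np.asarray(log_probs)
    labels = np.asarray(labels)
    # NLL: mean negative log-prob assigned to the true class.
    nll = -log_probs[np.arange(len(labels)), labels].mean()
    # Accuracy: fraction of argmax predictions matching the labels.
    acc = (log_probs.argmax(axis=1) == labels).mean()
    return float(nll), float(acc)
```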

Training models

Deep Ensemble members, Variational Inference (CIFAR-10/100)

All models on CIFAR are trained on a single GPU. Examples of training commands:

bash train/ \
--dataset CIFAR10/CIFAR100 \
--arch VGG16BN/PreResNet110/PreResNet164/WideResNet28x10 \
--method regular/vi
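At test time, the vi method averages predictions over weight samples from the learned mean-field Gaussian posterior. A generic sketch of that procedure, not this repo's code (`forward` is a hypothetical function that runs the network with sampled weights and returns class probabilities):

```python
import numpy as np

def vi_predict(mu, log_sigma, forward, x, n_samples=20, rng=None):
    """Monte-Carlo test-time averaging for a mean-field Gaussian
    variational posterior: draw weights w ~ N(mu, sigma^2) and average
    the predictive class probabilities over the draws. A generic sketch;
    `forward(w, x)` is a hypothetical function, and the repo's actual
    sampling code may differ.
    """
    rng = rng or np.random.default_rng(0)
    mu = np.asarray(mu)
    sigma = np.exp(np.asarray(log_sigma))
    probs = [forward(mu + sigma * rng.standard_normal(mu.shape), x)
             for _ in range(n_samples)]
    return np.mean(probs, axis=0)
```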

Fast Geometric Ensembling, SWA-Gaussian, Snapshot Ensembles, and Cyclical SGLD (CIFAR-10/100)

Examples of training commands:

bash train/ CIFAR10 WideResNet28x10 1 ~/weights ~/datasets
bash train/ CIFAR10 WideResNet28x10 1 ~/weights ~/datasets
bash train/ CIFAR10 WideResNet28x10 1 ~/weights ~/datasets SSE
bash train/ CIFAR10 WideResNet28x10 1 ~/weights ~/datasets cSGLD

Script parameters (in order): dataset, architecture name, training run id, root directory for saving snapshots (created automatically), and root directory for datasets (downloaded automatically).
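Snapshot Ensembles and cyclical SGLD both rely on a cyclical learning-rate schedule with one snapshot collected at the end of each cycle. A common form is the cosine-annealed cycle of Huang et al.; whether this repo uses exactly this schedule and parameterization is an assumption:

```python
import math

def cyclic_cosine_lr(step, total_steps, n_cycles, lr_max):
    """Cosine-annealed cyclical learning rate in the style of Snapshot
    Ensembles (Huang et al.): the rate restarts at lr_max n_cycles times
    over training, and a snapshot is saved at each minimum. A sketch;
    the repo's exact schedule and hyperparameters may differ.
    """
    steps_per_cycle = total_steps // n_cycles
    t = step % steps_per_cycle  # position within the current cycle
    return lr_max / 2 * (math.cos(math.pi * t / steps_per_cycle) + 1)
```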

K-FAC Laplace (CIFAR-10/100, ImageNet)

Given a checkpoint, ens/ builds the Laplace approximation and reports the results of approximate posterior averaging. Use the flags --scale_search with --gs_low LOW --gs_high HIGH --gs_num NUM to find the optimal posterior noise scale on the validation set. We used the following noise scales (also listed in Table 3, Appendix D of the paper):

Architecture       CIFAR-10   CIFAR-10-aug   CIFAR-100   CIFAR-100-aug
VGG16BN              0.042       0.042         0.100        0.100
PreResNet110         0.213       0.141         0.478        0.401
PreResNet164         0.120       0.105         0.285        0.225
WideResNet28x10      0.022       0.018         0.022        0.004

For ResNet50 on ImageNet, the optimal scale was 2.0 with test-time augmentation and 6.8 without.
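The --scale_search procedure amounts to evaluating validation NLL on a uniform grid of noise scales and keeping the minimizer. A sketch of that loop (`evaluate_nll` is a hypothetical callback that runs K-FAC-Laplace posterior averaging at a given scale and returns the validation NLL; the script's actual internals may differ):

```python
import numpy as np

def search_scale(evaluate_nll, low, high, num):
    """Pick the posterior noise scale minimizing validation NLL on a
    uniform grid, mirroring --scale_search with --gs_low/--gs_high/
    --gs_num. `evaluate_nll` is a hypothetical callback.
    """
    grid = np.linspace(low, high, num)
    nlls = [evaluate_nll(scale) for scale in grid]
    best = int(np.argmin(nlls))
    return float(grid[best]), float(nlls[best])
```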

Refer to ens/ for the full list of arguments and default values. Example usage:

ipython -- ens/ --file CHECKPOINT --data_path DATA --dataset CIFAR10 --model PreResNet110 --scale 0.213

Deep Ensemble members, Variational Inference, Snapshot Ensembles (ImageNet)

Examples of training commands:

bash train/ --method regular/sse/fge/vi

We strongly recommend using multi-gpu training for Snapshot Ensembles.


Parts of this code are based on the following repositories:


If you find this code useful, please cite our paper:

@article{ashukha2020pitfalls,
  title={Pitfalls of In-Domain Uncertainty Estimation and Ensembling in Deep Learning},
  author={Ashukha, Arsenii and Lyzhov, Alexander and Molchanov, Dmitry and Vetrov, Dmitry},
  journal={arXiv preprint arXiv:2002.06470},
  year={2020}
}