The official source code for Better Uncertainty Calibration via Proper Scores for Classification and Beyond (NeurIPS'22).
Also available on OpenReview.
The classification experiments are implemented in Python, while the variance regression experiments are in Julia, because the regression case builds upon Wiedmann et al 2021. We therefore split the (install) descriptions of the figures into these two categories: classification (ECE simulation, CIFAR-10/100, ImageNet) and variance regression (extended Friedman 1, Residential Building). This way, you do not have to install Julia if you are only interested in the classification case, or vice versa.
After cloning this repository, create a conda environment via the provided yaml file.
For this, install Anaconda and run `conda env create -f condaenv.yml` (like here).
Then, activate the environment with `conda activate unc_cal`.
Open and run the jupyter notebook `ECE bias ground truth simulation.ipynb`.
The plot is displayed directly in the last cell.
The simulation is lightweight and can be finished quickly on a typical laptop; we ran it on an M1 MacBook Pro (2021) in minutes.
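The core idea behind the simulation can be sketched in a few lines: even for a perfectly calibrated model (true calibration error exactly zero), the standard binned ECE estimator returns a strictly positive value, i.e. it is biased upwards. Below is a minimal, self-contained illustration of this effect — not the notebook's actual code; the bin count and the confidence distribution are illustrative assumptions:

```python
import numpy as np

def binned_ece(conf, correct, n_bins=15):
    """Standard binned ECE estimator: bin-weighted |mean accuracy - mean confidence|."""
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    idx = np.clip(np.digitize(conf, bins) - 1, 0, n_bins - 1)
    ece = 0.0
    for b in range(n_bins):
        mask = idx == b
        if mask.any():
            ece += mask.mean() * abs(correct[mask].mean() - conf[mask].mean())
    return ece

rng = np.random.default_rng(0)
n = 10_000
conf = rng.uniform(0.5, 1.0, size=n)            # top-label confidences (illustrative)
correct = (rng.random(n) < conf).astype(float)  # perfectly calibrated by construction
print(f"estimated ECE: {binned_ece(conf, correct):.4f}")  # positive despite true ECE = 0
```

The estimate shrinks with larger sample sizes but never reaches the true value of zero for finite data, which is the bias the simulation quantifies.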
This section is about the real-world classification calibration experiments.
The pretrained logits stem from Kull et al 2019 and Rahimi et al 2020.
For quality of life and backup redundancy, we also provide them in this Google Drive folder. To download the folder, simply run `gdown https://drive.google.com/drive/u/1/folders/10XVg_anBCWmjzjh_Hb-A7GYcgjHLypax -O logits --folder`.
All results will be located in the `results/` folder.
To run all of the experiments, execute `bash run_experiments.sh`.
We used a CPU with 80 threads, which required several days to finish the experiments.
Since the runtime can be infeasible on some CPUs, we also provide our result files in this Google Drive folder. To download the folder, simply run `gdown https://drive.google.com/drive/folders/1pP3RhgIdTXpKLArmiyECTo94VcAf0zll -O results --folder`.
Alternatively, you can lower the parameter `start_rep` to reduce the runtime, though this increases the variance of the results.
The command can also be executed multiple times (results are appended rather than overwritten; we ran it twice).
The seeds are set according to the local time and differ for each rerun, but the sample size should be large enough that the concrete seed does not matter.
Setting seeds manually is also supported, and the seeds we used are stored in our result files.
There are two options for producing the plots:
To reproduce all the plots in the paper (and even more), execute `python plotting.py`.
This takes a while (3-6 minutes).
In contrast, running the notebooks allows producing and inspecting each plot individually.
This section is exclusively about the regression calibration experiments. Note that the DSS score is called PMCC in the code due to technical debt; ideally, this will be fixed in the future. Further, we used major parts of the code from Wiedmann et al 2021, which means the regression experiments are implemented in Julia and require additional dependencies compared to the classification experiments.
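For reference when reading the code: the DSS (Dawid-Sebastiani) score of a Gaussian prediction N(mu, sigma^2) for an observation y is, up to additive constants, log sigma^2 + (y - mu)^2 / sigma^2 (lower is better). A minimal sketch of this formula — the function below is a hypothetical helper for illustration, not part of the Julia code base, where the same quantity appears under the name PMCC:

```python
import math

def dss(y, mu, var):
    """Dawid-Sebastiani score of a Gaussian prediction N(mu, var) for observation y.

    Illustrative helper (not from the repository); this is the quantity
    labelled PMCC in the Julia code. Lower values are better.
    """
    return math.log(var) + (y - mu) ** 2 / var

print(dss(0.0, 0.0, 1.0))  # a perfect mean prediction with unit variance scores 0.0
```

Like the negative log-likelihood, the DSS is a proper score for the predicted mean and variance, which is what makes it usable for the recalibration approach evaluated here.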
We used the same dependencies as Wiedmann et al 2021 and the following instructions are copied from there.
(Start of copy)
The experiments were performed with Julia 1.5.3. You can download the official binaries from the Julia webpage.
For increased reproducibility, the `nix` environment in this folder provides a fixed Julia binary:
- Install nix.
- Navigate to this folder and activate the environment by running `nix-shell` in a terminal.
(End of copy)
Again, we ran the experiments on an M1 MacBook Pro 2021, where a run required up to an hour.
First, download the dataset from http://archive.ics.uci.edu/ml/datasets/Residential+Building+Data+Set and place the csv file into `data/`.
Then, open and run the jupyter notebook `recalibration_friedman1.ipynb`.
To plot the figures, open the jupyter notebook `plotting_recal_regr.ipynb`, set the variable `task` to `ResBuild`, and run all the cells.
The plots are shown directly in the notebook.
Open and run the jupyter notebook `recalibration_friedman1.ipynb`.
To plot the figures, open the jupyter notebook `plotting_recal_regr.ipynb`, set the variable `task` to `friedman1_var`, and run all the cells.
The plots are shown directly in the notebook.
If you found this code useful, please cite our paper:
```bibtex
@inproceedings{
  gruber2022better,
  title={Better Uncertainty Calibration via Proper Scores for Classification and Beyond},
  author={Sebastian Gregor Gruber and Florian Buettner},
  booktitle={Advances in Neural Information Processing Systems},
  editor={Alice H. Oh and Alekh Agarwal and Danielle Belgrave and Kyunghyun Cho},
  year={2022},
  url={https://openreview.net/forum?id=PikKk2lF6P}
}
```