# A review of uncertainty quantification in deep learning

[Paper](https://www.sciencedirect.com/science/article/pii/S1566253521001081)

[Slides](https://danjacobellis.github.io/FTML/uncertainty_quantification.slides.html)

<script>
    document.querySelector('head').innerHTML += '<style>.slides { zoom: 1.75 !important; }</style>';
</script>

<center> <h1>
Uncertainty quantification in deep learning
</h1> </center>

## Uncertainty quantification
* According to the [US Department of Energy (2009)](https://science.osti.gov/-/media/ascr/pdf/program-documents/docs/Nnsa_grand_challenges_report.pdf), uncertainty from many sources should be considered:
  * Stochastic measurement error
  * Limitations of theoretical models
  * Numerical representations of models
  * Approximations
  * Human error
  * Ignorance

## Predictive uncertainty

* Consider a statistical learning model that makes predictions $\hat{y}$ based on previously seen data $(x_{\text{train}},y_{\text{train}})$
* The model's predictions will have some error $e = y_{\text{GT}}-\hat{y}$
* We can never know the actual error $e$, since $y_{\text{GT}}$ is unknown at prediction time
* However, we can try to characterize our confidence in $\hat{y}$

## Aleatoric Uncertainty

* Variation that is consistent across repetitions of an experiment
* Often possible to characterize the distribution accurately

<p style="text-align:center;">
<img src="_images/aleatoric.png" width=600 height=600 class="center">
</p>

![](img/aleatoric.png)

## Epistemic Uncertainty

* Lack of knowledge
* Imperfect model or model parameters
* Difficult to characterize the distribution

<p style="text-align:center;">
<img src="_images/aleatoric_epistemic.jpg" width=500 height=500 class="center">
</p>

![](img/aleatoric_epistemic.jpg)
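A toy illustration of the distinction, under assumed values: repeated measurements from a hypothetical Gaussian process with noise standard deviation 0.5. The spread of individual measurements (aleatoric) stays near 0.5 no matter how many samples we take, while the standard error of the estimated mean, used here as a simple stand-in for epistemic uncertainty about the parameter, shrinks like $1/\sqrt{n}$. The function name and parameter values are illustrative assumptions.

```python
import numpy as np

def aleatoric_vs_epistemic(sample_sizes, true_mean=5.0, noise_std=0.5, seed=0):
    """Toy estimates of both kinds of uncertainty for repeated measurements.

    The measurement process is a hypothetical Gaussian with known parameters.
    """
    rng = np.random.default_rng(seed)
    results = []
    for n in sample_sizes:
        samples = rng.normal(true_mean, noise_std, size=n)
        aleatoric = samples.std(ddof=1)       # spread of individual measurements
        epistemic = aleatoric / np.sqrt(n)    # standard error of the estimated mean
        results.append((n, aleatoric, epistemic))
    return results

for n, alea, epi in aleatoric_vs_epistemic([10, 100, 10_000]):
    print(f"n={n:>6}  aleatoric={alea:.3f}  epistemic={epi:.4f}")
```

Collecting more data narrows the epistemic term but leaves the aleatoric term essentially unchanged.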

## Uncertainty propagation in forward problem

* Example: Determine the uncertainty in resistance from measurements of voltage and current.

$$R = h(V,I)= \frac{V}{I}$$
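
Because $h$ maps two variables to one, augment it with $I$ so that $(V,I)\mapsto(R,I)$ is invertible, with $h^{-1}(r,i)=(ri,\,i)$ and $\left|\text{det}(\mathbf J\{h^{-1}\})\right| = |i|$. Marginalizing over $i$:
$$f_R(r) = \int f_{V, I}\left(h^{-1}(r,i)\right) \left|\text{det}(\mathbf J\{h^{-1} \}) \right| di = \int f_{V,I}(ri,\,i)\,|i|\,di$$

To first order, assuming $V$ and $I$ are independent: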

$$\sigma_R=R\sqrt{\left(\frac{\sigma_V}{V}\right)^2 +\left(\frac{\sigma_I}{I} \right)^2}$$
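A quick numerical check of the first-order formula, as a sketch: push samples of $V$ and $I$ through $h$ and compare the Monte Carlo standard deviation of $R$ with the linearized expression. The measurement values (10 V ± 0.1 V, 2 A ± 0.04 A) and the independent Gaussian noise model are assumptions chosen for illustration.

```python
import numpy as np

def resistance_uncertainty(v, sigma_v, i, sigma_i, n_samples=200_000, seed=0):
    """Compare Monte Carlo and first-order estimates of sigma_R for R = V / I.

    Assumes V and I are independent Gaussians (an assumption, not a given).
    """
    rng = np.random.default_rng(seed)
    v_samples = rng.normal(v, sigma_v, n_samples)
    i_samples = rng.normal(i, sigma_i, n_samples)
    r_samples = v_samples / i_samples            # push samples through h(V, I)
    sigma_mc = r_samples.std(ddof=1)

    r = v / i
    sigma_linear = r * np.sqrt((sigma_v / v) ** 2 + (sigma_i / i) ** 2)
    return sigma_mc, sigma_linear

mc, lin = resistance_uncertainty(v=10.0, sigma_v=0.1, i=2.0, sigma_i=0.04)
print(f"Monte Carlo: {mc:.4f} ohm, first-order formula: {lin:.4f} ohm")
```

With small relative errors the two estimates agree closely; as $\sigma_I/I$ grows, $V/I$ becomes skewed and heavy-tailed and the first-order formula can understate the spread.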

## Uncertainty in inverse problem

* Consider an acoustic propagation model governed by the wave equation $c^2 \nabla^2 p = \frac{\partial^2 p}{\partial t^2}$
  * We can check if the parameters fit the data using the forward model
  * Many combinations of parameter values will fit the data

<p style="text-align:center;">
<img src="_images/measured_modeled.png" width=600 height=600 class="center">
</p>

![](img/measured_modeled.png)
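A minimal sketch of this non-uniqueness, assuming a drastically simplified forward model: for a pulse travelling at sound speed $c$ over range $d$, the arrival time is $t = d/c$. Rejection sampling (keep prior draws whose prediction matches the observation within a tolerance) stands in here for a full Bayesian inversion; the priors, tolerance, and observed time are hypothetical.

```python
import numpy as np

def forward_model(c, d):
    """Hypothetical forward model: travel time of a pulse over range d at sound speed c."""
    return d / c

def rejection_sample(t_observed, tol, n_draws=100_000, seed=0):
    """Keep prior draws of (c, d) whose predicted travel time matches the observation."""
    rng = np.random.default_rng(seed)
    c = rng.uniform(300.0, 1600.0, n_draws)   # assumed prior on sound speed [m/s]
    d = rng.uniform(10.0, 200.0, n_draws)     # assumed prior on range [m]
    keep = np.abs(forward_model(c, d) - t_observed) < tol
    return c[keep], d[keep]

c_fit, d_fit = rejection_sample(t_observed=0.1, tol=0.002)
print(f"{c_fit.size} accepted; c spans {c_fit.min():.0f}-{c_fit.max():.0f} m/s, "
      f"d spans {d_fit.min():.0f}-{d_fit.max():.0f} m")
```

Every accepted pair lies near the line $d \approx 0.1\,c$, so the data constrain only the ratio $d/c$, not $c$ or $d$ individually.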

## Sources of uncertainty in deep learning
* Epistemic
    * Limited computational resources
    * Limited training data
* Aleatoric
    * Data collection process
    * Accuracy of training data
    * Distribution drift

## Bayesian neural networks

<p style="text-align:center;">
<img src="_images/BNN.png" width=600 height=600 class="center">
</p>

![](img/BNN.png)

## Bayesian neural networks

[BNN](https://arxiv.org/pdf/2007.06823.pdf)

<p style="text-align:center;">
<img src="_images/bnn_arch.png" width=600 height=600 class="center">
</p>

![](img/bnn_arch.png)

## Monte Carlo dropout

* Monte Carlo sampling (e.g., MCMC) can be used to obtain the posterior of a BNN
  * Extremely expensive; in practice limited to shallow networks
* Dropout is a common regularization technique in NNs
  * Randomly drops units during training to prevent excessive co-adaptation
  * [Dropout training approximates Bayesian inference](https://arxiv.org/pdf/1506.02142.pdf)

<p style="text-align:center;">
<img src="_images/bayes_seg_net.png" width=600 height=600 class="center">
</p>

[Bayesian SegNet](https://arxiv.org/pdf/1511.02680.pdf)
![](img/bayes_seg_net.png)
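The idea can be sketched as follows: keep dropout active at prediction time, run several stochastic forward passes, and use the spread of the outputs as an uncertainty estimate. The two-layer network, its random (untrained) weights, the dropout rate, and the number of passes are all illustrative assumptions; in practice the weights would come from a network trained with dropout.

```python
import numpy as np

weight_rng = np.random.default_rng(0)
# Hypothetical two-layer MLP; random weights stand in for trained ones.
W1, b1 = weight_rng.normal(size=(1, 64)), np.zeros(64)
W2, b2 = weight_rng.normal(size=(64, 1)) / 8.0, np.zeros(1)

def forward(x, p_drop, rng):
    """One stochastic forward pass with dropout left on at prediction time."""
    h = np.maximum(x @ W1 + b1, 0.0)              # ReLU hidden layer
    keep = rng.random(h.shape) >= p_drop          # drop each unit with probability p_drop
    h = h * keep / (1.0 - p_drop)                 # inverted-dropout rescaling
    return h @ W2 + b2

def mc_dropout_predict(x, n_passes=200, p_drop=0.5, seed=1):
    """Mean and standard deviation over n_passes stochastic forward passes."""
    rng = np.random.default_rng(seed)
    preds = np.stack([forward(x, p_drop, rng) for _ in range(n_passes)])
    return preds.mean(axis=0), preds.std(axis=0)

x = np.linspace(-3.0, 3.0, 5).reshape(-1, 1)
mean, std = mc_dropout_predict(x)
print(np.hstack([x, mean, std]))
```

With `p_drop=0.0` every pass is identical and the spread collapses to zero, which is a quick sanity check.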

## Variational inference

* Frame the Bayesian inference problem as an optimization problem
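* Approximate the posterior distribution over the NN weights with a tractable variational distribution
* Minimize the KL divergence between the variational distribution and the true posterior (equivalently, maximize the evidence lower bound, ELBO)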

<p style="text-align:center;">
<img src="_images/vi_scal_uncer.png" width=600 height=600 class="center">
</p>

[Scalable Uncertainty](https://arxiv.org/pdf/2003.03396.pdf)
![](img/vi_scal_uncer.png)
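For intuition, here is variational inference on a model small enough to check by hand: one weight $w$ in $y = wx + \varepsilon$ with a Gaussian prior, and a Gaussian variational family $q(w) = \mathcal N(m, s^2)$. Maximizing the ELBO is equivalent to minimizing the KL divergence to the posterior; because the ELBO is closed-form here, plain gradient ascent suffices. The model, synthetic data, and step size are assumptions for illustration. The true posterior is Gaussian in this case, so the optimum should match it exactly.

```python
import numpy as np

def fit_gaussian_vi(x, y, noise_std=1.0, prior_std=1.0, lr=0.005, n_steps=5000):
    """Maximize the ELBO over q(w) = N(m, s^2) for y = w * x + eps,
    eps ~ N(0, noise_std^2), prior w ~ N(0, prior_std^2).

    The ELBO is closed-form for this model, so the gradients below are exact.
    The m update converges only if lr < 2 / precision."""
    sxx, sxy = np.sum(x * x), np.sum(x * y)
    precision = sxx / noise_std**2 + 1.0 / prior_std**2
    m, log_s = 0.0, 0.0
    for _ in range(n_steps):
        s2 = np.exp(2.0 * log_s)
        grad_m = (sxy - m * sxx) / noise_std**2 - m / prior_std**2   # likelihood + prior terms
        grad_log_s = 1.0 - s2 * precision                              # entropy term gives the +1
        m += lr * grad_m
        log_s += lr * grad_log_s
    return m, np.exp(log_s)

rng = np.random.default_rng(0)
x = rng.normal(size=50)
y = 0.8 * x + rng.normal(size=50)          # synthetic data, true w = 0.8

m, s = fit_gaussian_vi(x, y)
exact_precision = np.sum(x * x) + 1.0      # exact posterior precision (noise_std = prior_std = 1)
print(f"VI:    mean={m:.4f}, std={s:.4f}")
print(f"exact: mean={np.sum(x * y) / exact_precision:.4f}, std={1 / np.sqrt(exact_precision):.4f}")
```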

## Variational autoencoders

* Maps high-dimensional data to low-dimensional latent variables
* Provides a generative model that can be used for UQ

[UQ using generative models](https://arxiv.org/pdf/1910.10046.pdf)

<p style="text-align:center;">
<img src="_images/uq_gen_mnist.png" width=600 height=600 class="center">
</p>

![](img/uq_gen_mnist.png)
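A sketch of the two ingredients of the VAE objective (the negative ELBO), using hypothetical linear maps in place of trained encoder and decoder networks and random data in place of MNIST: the reparameterized latent sample and the closed-form KL divergence to a standard-normal prior. The last lines show one simple, assumed way to read uncertainty off the generative model (decode many samples from $q(z \mid x)$ and take the per-pixel spread); this is not necessarily the method of the linked paper.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def reparameterize(mu, log_var, rng):
    """z = mu + sigma * eps with eps ~ N(0, I); in an autodiff framework this
    lets gradients flow to mu and log_var."""
    return mu + np.exp(0.5 * log_var) * rng.standard_normal(mu.shape)

def kl_to_standard_normal(mu, log_var):
    """KL(N(mu, diag(exp(log_var))) || N(0, I)), summed over latent dimensions."""
    return 0.5 * np.sum(mu**2 + np.exp(log_var) - 1.0 - log_var, axis=-1)

rng = np.random.default_rng(0)
latent_dim, data_dim = 2, 784
# Hypothetical linear encoder/decoder standing in for trained networks.
W_mu = rng.normal(size=(data_dim, latent_dim)) * 0.01
W_logvar = rng.normal(size=(data_dim, latent_dim)) * 0.01
W_dec = rng.normal(size=(latent_dim, data_dim)) * 0.1

x = rng.random((8, data_dim))                    # fake batch of flattened 28x28 images
mu, log_var = x @ W_mu, x @ W_logvar             # encoder outputs for q(z | x)
x_hat = sigmoid(reparameterize(mu, log_var, rng) @ W_dec)   # decoder (Bernoulli means)
recon = -np.sum(x * np.log(x_hat) + (1 - x) * np.log(1 - x_hat), axis=-1)
neg_elbo = np.mean(recon + kl_to_standard_normal(mu, log_var))

# One simple (assumed) UQ readout: decode many latent samples for the first image
decoded = np.stack([sigmoid(reparameterize(mu[:1], log_var[:1], rng) @ W_dec)
                    for _ in range(100)])
pixel_std = decoded.std(axis=0)                  # per-pixel spread, shape (1, 784)
print(f"negative ELBO = {neg_elbo:.1f}, mean pixel std = {pixel_std.mean():.4f}")
```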

## Bayes by backprop

* Minimize the variational free energy (the negative ELBO) by backpropagating through sampled weights

![](img/bbb.png)

[weight uncertainty](https://arxiv.org/pdf/1505.05424.pdf)
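A minimal sketch of the Bayes-by-backprop update for a single weight, assuming a Gaussian likelihood and a Gaussian prior (the linked paper also considers a scale-mixture prior). It uses the paper's parameterization $\sigma = \log(1 + e^{\rho})$ and $w = \mu + \sigma\varepsilon$, averaging gradients over a few sampled $\varepsilon$ per step. The gradients are derived by hand rather than by backprop so the example runs with NumPy alone; all names and hyperparameters are illustrative.

```python
import numpy as np

def softplus(r):
    return np.log1p(np.exp(r))

def sigmoid(r):
    return 1.0 / (1.0 + np.exp(-r))

def bayes_by_backprop_1d(x, y, noise_std=1.0, prior_std=1.0, lr=0.005,
                         n_steps=3000, n_samples=32, seed=0):
    """Stochastic minimization of the variational free energy
        F(mu, rho) = E_q[log q(w) - log p(w) - log p(D | w)]
    for one weight in y = w * x + noise, with q(w) = N(mu, softplus(rho)^2)."""
    rng = np.random.default_rng(seed)
    sxx, sxy = np.sum(x * x), np.sum(x * y)
    mu, rho = 0.0, 0.0
    for _ in range(n_steps):
        sigma = softplus(rho)
        eps = rng.standard_normal(n_samples)
        w = mu + sigma * eps                     # reparameterized weight samples
        # d/dw of [-log p(w) - log p(D | w)] at each sample
        g_w = w / prior_std**2 - (sxy - w * sxx) / noise_std**2
        # Along w = mu + sigma * eps, log q(w) = -log(sigma) - eps^2 / 2 + const,
        # so it contributes only -1 / sigma to dF/dsigma.
        grad_mu = np.mean(g_w)
        grad_sigma = np.mean(g_w * eps) - 1.0 / sigma
        grad_rho = grad_sigma * sigmoid(rho)     # d softplus(rho) / d rho = sigmoid(rho)
        mu -= lr * grad_mu
        rho -= lr * grad_rho
    return mu, softplus(rho)

rng = np.random.default_rng(0)
x = rng.normal(size=50)
y = 0.8 * x + rng.normal(size=50)                # synthetic data, true w = 0.8
mu, sigma = bayes_by_backprop_1d(x, y)
print(f"q(w) = N({mu:.3f}, {sigma:.3f}^2)")
```

Because the exact posterior is Gaussian in this toy model, `mu` and `sigma` should land close to its mean and standard deviation, up to stochastic-gradient noise.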

## Laplace approximations

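* Approximate the posterior with a Gaussian centered at the MAP estimate
  * Use a second-order Taylor expansion of the log posterior around the MAP; the covariance is the inverse of the negative Hessian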

![](img/laplace_sd.png)

[laplacian](https://openreview.net/pdf?id=Skdvd2xAZ)
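A one-dimensional sketch, assuming logistic regression with a single weight and a Gaussian prior: Newton's method finds the MAP, and the curvature $H$ of the log posterior there gives the variance of the Gaussian approximation, $\sigma^2 = -1/H$. The model, prior, and synthetic data are assumptions for illustration; for a neural network $H$ is a large matrix and is typically approximated, for example by a diagonal or Kronecker-factored matrix.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def laplace_logistic_1d(x, y, prior_std=1.0, n_iter=50):
    """Laplace approximation for p(y = 1 | x, w) = sigmoid(w * x), prior w ~ N(0, prior_std^2).

    Returns (w_map, std) of the Gaussian N(w_map, -1 / H), where H is the second
    derivative of the log posterior at the MAP estimate."""
    def grad_and_hess(w):
        p = sigmoid(w * x)
        grad = np.sum((y - p) * x) - w / prior_std**2
        hess = -np.sum(p * (1.0 - p) * x**2) - 1.0 / prior_std**2
        return grad, hess

    w = 0.0
    for _ in range(n_iter):                  # Newton's method on the concave log posterior
        grad, hess = grad_and_hess(w)
        w -= grad / hess
    _, hess = grad_and_hess(w)
    return w, np.sqrt(-1.0 / hess)

rng = np.random.default_rng(0)
x = rng.normal(size=100)
y = (rng.random(100) < sigmoid(2.0 * x)).astype(float)   # synthetic labels, true w = 2
w_map, w_std = laplace_logistic_1d(x, y)
print(f"Laplace approximation: w ~ N({w_map:.3f}, {w_std:.3f}^2)")
```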

## Ensemble techniques

* An ensemble of models can enhance predictive performance
* How can we use an ensemble to generate uncertainty estimates?

<p style="text-align:center;">
<img src="_images/ensemble.png" width=600 height=600 class="center">
</p>

![](img/ensemble.png)

[brain](https://arxiv.org/pdf/1807.07356.pdf)

![](img/ensemble_brain.png)
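One common answer: fit several models, use the ensemble mean as the prediction and the disagreement (standard deviation) between members as the uncertainty. This sketch uses bootstrap resampling of a small polynomial regression to create diversity; deep ensembles typically rely on different random initializations instead, so treat the model, data, and ensemble size here as illustrative assumptions.

```python
import numpy as np

def bootstrap_poly_ensemble(x_train, y_train, x_query, n_members=20, degree=3, seed=0):
    """Ensemble of polynomial fits, each trained on a bootstrap resample.
    Returns the ensemble mean and the spread (std) across members at x_query."""
    rng = np.random.default_rng(seed)
    preds = []
    for _ in range(n_members):
        idx = rng.integers(0, len(x_train), len(x_train))   # resample with replacement
        coeffs = np.polyfit(x_train[idx], y_train[idx], degree)
        preds.append(np.polyval(coeffs, x_query))
    preds = np.stack(preds)
    return preds.mean(axis=0), preds.std(axis=0)

rng = np.random.default_rng(1)
x_train = rng.uniform(-1.0, 1.0, 40)
y_train = np.sin(2.0 * x_train) + rng.normal(0.0, 0.1, 40)  # synthetic data
x_query = np.array([0.0, 3.0])                               # inside vs outside training range
mean, std = bootstrap_poly_ensemble(x_train, y_train, x_query)
print(f"x=0: {mean[0]:.3f} +/- {std[0]:.3f}   x=3: {mean[1]:.3f} +/- {std[1]:.3f}")
```

Members agree near the training data and diverge when extrapolating, so `std` at `x = 3` should be much larger than at `x = 0`.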

## Ensemble techniques
![](img/michigan.png)

[air pollution](https://arxiv.org/pdf/1911.04061.pdf)