<h1 style="text-align: center;">Introduction to Bayesian Neural Network</h1>
<p style="text-align: center;"> A short version of <a href="https://towardsdatascience.com/why-you-should-use-bayesian-neural-network-aaf76732c150">Yeung WONG's blog post</a> with Son Hai Le's modifications </p>
<a href="https://towardsdatascience.com/why-you-should-use-bayesian-neural-network-aaf76732c150">
    <figure>
        <center>
            <img src="reference/comparison_snn_bnn.png" alt="comparison_snn_bnn">
        </center>
    </figure>
</a>

**Goal of this document**
- Explain different types of uncertainties
- Compare the standard neural network ([SNN](http://www.deeplearningbook.org)) and the Bayesian neural network ([BNN](https://ieeexplore.ieee.org/document/9756596))
- Pros and cons of BNN

**What is different from the original** [Yeung WONG's blog post](https://towardsdatascience.com/why-you-should-use-bayesian-neural-network-aaf76732c150)?
- Make the document more succinct
- Add more references/hyperlinks
- Add the BNN workflow explanation

*Note: follow [hyperlinks]() to see more details*

# 1 Sources of uncertainty

## 1.1 [Aleatory uncertainty](https://www.sciencedirect.com/science/article/abs/pii/S0167473008000556)
- Also known as statistical uncertainty
- Irreducible uncertainty
- Caused by the inherent randomness of the system
- Examples: 
    - Weather forecasting: inherent randomness of weather
    - Earthquakes: unpredictable due to inherent randomness
    - Genetics: individual variation is aleatory uncertainty
- In deep learning: uncertainty in the model outputs, which remains even when the weights are known exactly
- Figure 2:
    - The black line: the prediction
    - The orange area: the aleatory uncertainty

<figure>
    <center>
    <img src="reference/aleatory.png" alt="aleatory">
    <figcaption> 
    <a href="https://towardsdatascience.com/why-you-should-use-bayesian-neural-network-aaf76732c150"> Figure 2. Example of the aleatory uncertainty </a> </figcaption>
    </center>
</figure>
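
The "irreducible" property can be illustrated with a minimal sketch. It assumes a toy linear system `y = 2x + 1` with Gaussian noise of standard deviation `0.5`; these values and the helper names `simulate` and `residual_std` are made up for illustration and do not come from the original post. The spread of the residuals stays close to the noise level no matter how much data is collected.

```python
import numpy as np

def simulate(n, noise_std=0.5, seed=0):
    """Draw n noisy observations of the toy system y = 2x + 1."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(-1.0, 1.0, size=n)
    y = 2.0 * x + 1.0 + rng.normal(0.0, noise_std, size=n)
    return x, y

def residual_std(x, y):
    """Fit a line by least squares and return the spread of its residuals."""
    slope, intercept = np.polyfit(x, y, deg=1)  # highest degree first
    return float(np.std(y - (slope * x + intercept)))

# More data does not shrink the residual spread: it stays near noise_std.
for n in (100, 10_000):
    x, y = simulate(n)
    print(n, round(residual_std(x, y), 3))
```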

## 1.2 [Epistemic uncertainty](https://www.sciencedirect.com/science/article/abs/pii/S0167473008000556)
- Also known as systematic uncertainty
- Reducible uncertainty
- Caused by the lack of knowledge about the system
- Examples:
    - Medical diagnosis: limited knowledge can lead to uncertainty
    - Climate change: complexity of climate system leads to uncertainty
    - Legal proceedings: complexity of law and limited information lead to uncertainty
- In deep learning: uncertainty of the model weights
- Figure 3:
    - The model weights differ from one training run to the next

<figure>
    <center>
    <img src="reference/epistemics.png" alt="epistemic">
    <figcaption> 
    <a href="https://towardsdatascience.com/why-you-should-use-bayesian-neural-network-aaf76732c150"> Figure 3. Example of the epistemic uncertainty </a> </figcaption>
    </center>
</figure>
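
The "reducible" property can be sketched in the same spirit. The example below assumes a toy linear system `y = 2x + 1` with Gaussian noise, refits a least-squares line on many independently drawn training sets, and measures how much the fitted slope varies between fits. The function name `slope_spread` and all numeric values are hypothetical. The spread of the fitted weight shrinks as the training set grows.

```python
import numpy as np

def slope_spread(n, n_refits=200, noise_std=0.5, seed=0):
    """Std of the fitted slope across refits on fresh training sets of size n."""
    rng = np.random.default_rng(seed)
    slopes = []
    for _ in range(n_refits):
        x = rng.uniform(-1.0, 1.0, size=n)
        y = 2.0 * x + 1.0 + rng.normal(0.0, noise_std, size=n)
        slope, _intercept = np.polyfit(x, y, deg=1)
        slopes.append(slope)
    return float(np.std(slopes))

small = slope_spread(20)     # little data: the weight is poorly determined
large = slope_spread(2000)   # more data: the weight is pinned down
print("spread with n=20:  ", round(small, 3))
print("spread with n=2000:", round(large, 3))
```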


# 2 Comparison between SNN and BNN

## 2.1 What is SNN?
- A neural network whose weights are fixed values after training
- Deterministic: a given input always produces the same output
- Uncertainty in inputs

## 2.2 What is BNN?
- A combination of SNN and [Bayesian inference](https://ieeexplore.ieee.org/document/9756596): 
    * Treat the weights and outputs of a neural network as random variables
    * Find their marginal distributions that best fit the data
- The ultimate goal of BNN:
    * Quantify the uncertainty that the model introduces in its weights and outputs, so that the trustworthiness of a prediction can be assessed
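
The contrast between fixed weights and weights treated as random variables can be sketched with a one-unit "network". This is only an illustration: the function `forward`, the input value, and the Gaussian means and standard deviations are made-up stand-ins for a learned weight distribution, not values from any trained model.

```python
import numpy as np

def forward(x, w, b):
    """A one-unit toy network: tanh(w * x + b)."""
    return np.tanh(w * x + b)

x = 0.3

# SNN: one fixed value per weight, so the output is a single number.
snn_out = forward(x, w=1.5, b=-0.2)

# BNN (sketch): each weight is a random variable. Sampling the weights
# many times turns the output into a distribution as well.
rng = np.random.default_rng(0)
w_samples = rng.normal(1.5, 0.3, size=5000)   # assumed weight distribution
b_samples = rng.normal(-0.2, 0.1, size=5000)  # assumed bias distribution
bnn_out = forward(x, w_samples, b_samples)

print("SNN output:", float(snn_out))
print("BNN output mean and std:", float(bnn_out.mean()), float(bnn_out.std()))
```

The standard deviation of `bnn_out` is the kind of quantity a BNN reports alongside its prediction.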

## 2.3 Key differences between SNN and BNN

Table 1:
- Goal:
    * SNN: optimization (find a single optimal value for each weight)
    * BNN: marginalization (treat each weight as a random variable and find its distribution)
- Method:
    * SNN: differentiation ([gradient descent](http://www.deeplearningbook.org))
    * BNN: [Markov chain Monte Carlo (MCMC)](https://doi.org/10.1093/biomet/57.1.97), [variational inference (VI)](https://doi.org/10.1080/01621459.2017.1285773), [normalizing flows](https://arxiv.org/abs/1505.05770v6)
- Estimate:
    * SNN: [maximum likelihood estimators (MLE)](https://ieeexplore.ieee.org/document/9756596)
    * BNN: [maximum a posteriori (MAP)](https://ieeexplore.ieee.org/document/9756596), [full/approximate predictive distribution](https://mlg.eng.cam.ac.uk/zoubin/papers/icml05snelson.pdf)

<center>
<a href="https://towardsdatascience.com/why-you-should-use-bayesian-neural-network-aaf76732c150"> Table 1. Key differences between SNN and BNN </a>

|             | SNN         | BNN           |
| :---:       |    :----:   |      :---:    |
| Goal     | Optimization      | Marginalization   |
| Weight   | A single set        | Probabilistic distribution      |
| Method   | Differentiation <br> (Gradient descent)      | MCMC, VI, or normalizing flows      |
| Estimate  | MLE    | MAP <br> Full/approximate predictive distribution      |

</center>
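
The "Estimate" row can be made concrete with a model small enough to solve in closed form. The sketch below swaps the neural network for a single unknown mean with Gaussian noise and a Gaussian prior, a conjugate model chosen purely for illustration. The function name `mle_and_posterior`, the data, and the prior settings are all hypothetical.

```python
import numpy as np

def mle_and_posterior(y, noise_std, prior_mean, prior_std):
    """Return the MLE, the posterior mean (= MAP here), and the posterior std
    for the mean of a Gaussian with known noise_std and a Gaussian prior."""
    n = len(y)
    mle = float(np.mean(y))
    post_precision = 1.0 / prior_std**2 + n / noise_std**2
    post_mean = (prior_mean / prior_std**2 + np.sum(y) / noise_std**2) / post_precision
    post_std = post_precision ** -0.5
    return mle, float(post_mean), float(post_std)

y = np.array([2.1, 1.7, 2.6])
mle, map_estimate, post_std = mle_and_posterior(
    y, noise_std=1.0, prior_mean=0.0, prior_std=1.0
)
print("MLE (point estimate only):", mle)
print("MAP (prior pulls it toward 0):", map_estimate)
print("Posterior std (uncertainty on the parameter):", post_std)
```

The MLE is a single number, the MAP is a single number shifted by the prior, and the posterior standard deviation is the extra information that only the full distribution provides.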

## 2.4 Workflow of BNN
Figure 5:
- Design a BNN:
    * Choose NN architecture:
        * A functional model
    * Choose a stochastic model
        * A prior distribution over the possible model parametrization $p(\theta)$
        * A prior confidence in the predictive power of the model $p(y|x,\theta)$
        * The model parametrization: the hypothesis $H$
        * The training dataset: $D$
- Train a BNN: use [Bayesian inference](https://authors.library.caltech.edu/13793/1/MACnc92b.pdf) to obtain the posterior $p(\theta|D)$
- Use a BNN to quantify the uncertainty on its predictions: 
    * Given $p(\theta|D)$, use a Monte Carlo method to approximate the marginal probability distribution $p(y|x,D)$ as follows:
       * $p(y|x,D) = \int_\theta p(y|x,\theta')p(\theta'|D)d\theta'$

<figure>
    <center>
    <img src="reference/bnn_flow.png" alt="bnn_flow">
    <figcaption> <a href="https://ieeexplore.ieee.org/document/9756596"> Figure 5.  Workflow to design (a), train (b) and use a BNN for predictions (c) </a> </figcaption>
    </center>
</figure>
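
The prediction step can be sketched as follows: draw $\theta'$ from $p(\theta|D)$, draw $y$ from $p(y|x,\theta')$, and summarize the samples. The example assumes a one-weight linear model $y = \theta x + \text{noise}$ and a Gaussian posterior whose mean and standard deviation are made up. In a real BNN the posterior samples would come from MCMC or VI. The function name `sample_predictive` is hypothetical.

```python
import numpy as np

def sample_predictive(x, post_mean, post_std, noise_std, n_samples=20_000, seed=0):
    """Monte Carlo samples from p(y|x,D) for the toy model y = theta * x + noise."""
    rng = np.random.default_rng(seed)
    theta = rng.normal(post_mean, post_std, size=n_samples)  # theta' ~ p(theta|D)
    y = rng.normal(theta * x, noise_std)                     # y ~ p(y|x,theta')
    return theta, y

x_new = 2.0
theta, y = sample_predictive(x_new, post_mean=1.5, post_std=0.2, noise_std=0.5)

pred_mean = float(y.mean())
pred_std = float(y.std())
lower, upper = np.percentile(y, [2.5, 97.5])   # 95% prediction interval
epistemic_var = float(np.var(theta * x_new))   # spread caused by the weights
aleatory_var = 0.5**2                          # spread caused by the noise

print("predictive mean and std:", pred_mean, pred_std)
print("95% interval:", float(lower), float(upper))
print("epistemic / aleatory variance:", epistemic_var, aleatory_var)
```

For this toy model the predictive variance is the sum of the two parts, which matches the two sources of uncertainty described in Section 1.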

# 3 Pros and cons of BNN
## 3.1 Pros
- Get a more robust model:
    * Can find the distribution of the weights
    * Can reduce overfitting, because the prior over the weights acts as a regularizer
- Get a prediction interval:
    * Provide the uncertainty associated with each prediction, which helps when the true target is unknown
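
One way to see the regularization effect is the following short derivation. It is a sketch that assumes a zero-mean isotropic Gaussian prior $p(\theta) = \mathcal{N}(0, \tau^2 I)$, where the prior scale $\tau$ is introduced here only for illustration. Under that assumption the MAP estimate is the maximum-likelihood objective plus an L2 penalty on the weights:

$$
\theta_{MAP} = \arg\max_{\theta} \left[ \log p(D|\theta) + \log p(\theta) \right] = \arg\min_{\theta} \left[ -\log p(D|\theta) + \frac{1}{2\tau^2} \lVert \theta \rVert_2^2 \right]
$$

A smaller $\tau$ (a tighter prior) corresponds to a stronger penalty.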

## 3.2 Cons
- Demand maths and statistics knowledge:
    * Need a solid background in probability distributions to choose an appropriate prior and a suitable approximation of the posterior
- Require more computational resources:
    * Computing or approximating the posterior distribution is more expensive than training an SNN

# 4 Recommendations for further reading
- [[Journal paper] Bayesian Neural Networks: A Tutorial for Deep Learning Users](https://ieeexplore.ieee.org/document/9756596)
- [[Journal paper] Bayesian Neural Networks: A Probabilistic Perspective](https://arxiv.org/abs/1801.07710)
- [[Blog post] Why you should use Bayesian Neural Network?](https://towardsdatascience.com/why-you-should-use-bayesian-neural-network-aaf76732c150)
- [[Blog post] 8 Terms You Should Know about Bayesian Neural Network](https://towardsdatascience.com/8-terms-you-should-know-about-bayesian-neural-network-467a16266ea0)
- [[Blog post] Bayesian Neural Networks](https://www.cs.ox.ac.uk/people/yarin.gal/website/blog_3d801aa532c1ce.html)
- [[Blog post] Implementation of Bayesian Neural Networks with TensorFlow Probability](https://towardsdatascience.com/bayesian-neural-networks-with-tensorflow-probability-fbce27d6ef6)
- [[Blog post] Building probabilistic Bayesian neural network models with TensorFlow Probability](https://keras.io/examples/keras_recipes/bayesian_neural_networks/)
- [[Colab]  Building probabilistic Bayesian neural network models with TensorFlow Probability](https://colab.research.google.com/github/keras-team/keras-io/blob/master/examples/keras_recipes/ipynb/bayesian_neural_networks.ipynb)