# Risk and Uncertainty in Deep Learning

> see: https://gdmarmerola.github.io/risk-and-uncertainty-deep-learning/  
> see its code at: https://www.kaggle.com/code/gdmarmerola/risk-and-uncertainty-in-deep-learning

Neural networks keep pushing the limits of what is possible in many domains and are becoming a standard tool in industry. As they become a vital part of business decision making, methods that try to open the neural network "black box" are increasingly popular. [LIME](https://github.com/marcotcr/lime), [SHAP](https://github.com/slundberg/shap) and [Embeddings](https://distill.pub/2019/activation-atlas/) are nice ways to explain what the model learned and why it makes the decisions it makes. Instead of explaining what the model learned, we can also try to get insights about what the model **does not know**, which means estimating two different quantities: **risk** and **uncertainty**. [Variational Inference](https://arxiv.org/abs/1505.05424), [Monte Carlo Dropout](https://arxiv.org/abs/1506.02142) and [Bootstrapped Ensembles](https://arxiv.org/abs/1602.04621) are some examples of research in this area.

At first glance, risk and uncertainty may seem to be the same thing, but they are distinct and, in some cases, orthogonal concepts. **Risk** is the intrinsic volatility in the outcome of a decision: when we roll a die, for instance, we always **risk** getting a bad outcome, even if we know the possible outcomes precisely. **Uncertainty**, on the other hand, is our confusion about what the possible outcomes are: if someone hands us a strange die we have never used before, we'll have to roll it for a while before we can even **know** what to expect from it. Risk is a fixed property of our problem that can't be reduced by collecting more data, while uncertainty is a property of our beliefs and can be reduced with more data. In fact, we can have **uncertainty** over our belief of what the **risk** actually is!
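The die example can be simulated in a few lines. The sketch below is not from the original post: it assumes a fair six-sided die, uses the sample standard deviation of the rolls as a stand-in for risk, and uses the standard error of the estimated mean as a simple proxy for uncertainty. The function name `roll_summary` is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def roll_summary(n_rolls):
    """Roll a fair six-sided die n_rolls times and summarize what we learn."""
    rolls = rng.integers(1, 7, size=n_rolls)  # integers in 1..6
    # Risk proxy: spread of the individual outcomes
    risk = rolls.std(ddof=1)
    # Uncertainty proxy: standard error of our estimate of the mean outcome
    uncertainty = risk / np.sqrt(n_rolls)
    return risk, uncertainty

# Compare what we know after few rolls and after many rolls
summaries = {n: roll_summary(n) for n in (10, 1_000, 100_000)}
for n, (risk, uncertainty) in summaries.items():
    print(f"{n:>7} rolls | risk: {risk:.3f} | uncertainty: {uncertainty:.4f}")
```

Under these assumptions, the risk column should stay close to $\sqrt{35/12} \approx 1.71$ however many times we roll, while the uncertainty column shrinks roughly as $1/\sqrt{n}$.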

If this seems strange at first, don't worry: this topic has recently been the object of [heated discussions](https://twitter.com/ianosband/status/1014466510885216256) among experts in the area. The main question is whether a given model estimates the **risk** (*aleatoric uncertainty*) or the **uncertainty** (*epistemic uncertainty*). Some references make this discussion very interesting, and I list a few of them at the end of the post.

In our case, we'll focus on a simple example to illustrate how the two concepts differ and how to use a neural network to estimate both at the same time.

# Why is this relevant?

Risk measures the volatility in making a decision, while uncertainty measures the confidence in our belief about this volatility. Both measures are essential for decision making, particularly in decisions that put a lot of resources at stake. For instance, suppose you use a model to sell your house. A model with a good risk estimate will tell you the volatility in closing price given a specified tolerance of days on the market. A model with a good uncertainty estimate will then tell you how reliable that volatility estimate is, given the amount of data you have. In this post, we'll simulate a synthetic case so you can understand and run a model that produces both estimates at the same time.


# 1. Data

We'll use the same data generating process as in my [last post](https://gdmarmerola.github.io/intro-randomized-prior-functions/), borrowed from [Blundell et al. (2015)](https://arxiv.org/pdf/1505.05424.pdf). I add some heteroskedastic noise and use a Gaussian distribution to generate $x$, so that both risk and uncertainty grow as we move away from the origin. The process looks like this:

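$$ y = x + 0.3 \sin(2\pi(x + \epsilon)) + 0.3 \sin(4\pi(x + \epsilon)) + \epsilon $$

where $\epsilon \sim \mathcal{N}(0, 0.01 + 0.1 \cdot x^2)$ and $x \sim \mathcal{N}(0, 1)$. In the code below, the second parameter of the noise distribution is passed as the standard deviation, each $\epsilon$ is drawn independently, $x$ is clipped to $[-3, 3]$, and a constant offset of $0.5$ is subtracted from $y$: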

```python
# plotting inline
%matplotlib inline

# importing necessary modules
import keras
import random
import numpy as np
import pandas as pd
import scipy.stats as sp
import matplotlib.pyplot as plt
import tensorflow as tf
from tqdm import tqdm
from tensorflow.keras.models import Sequential, Model
from tensorflow.keras.layers import Dense, Activation, concatenate, Input, Embedding
from tensorflow.keras.layers import Reshape, Concatenate, BatchNormalization, Dropout, Add, Lambda
from tensorflow.keras.layers import add
from tensorflow.keras.optimizers import Adam, RMSprop
from keras.wrappers.scikit_learn import KerasClassifier, KerasRegressor
from sklearn.ensemble import BaggingRegressor
from copy import deepcopy
from keras_tqdm import TQDMNotebookCallback

# turning off automatic plot showing, and setting style
plt.ioff()
plt.style.use('bmh')
```


```python
# setting seed
np.random.seed(10)

# generating the dataset: 1000 points from a standard normal, clipped to [-3, 3]
X = np.clip(np.random.normal(0.0, 1.0, 1000).reshape(-1,1), -3, 3)

# let us generate a grid to check how models fit the data
x_grid = np.linspace(-5, 5, 1000).reshape(-1,1)

# defining the function - noisy
# the second argument of sp.norm is the scale (standard deviation), which grows with x**2
noise = lambda x: sp.norm(0.00, 0.01 + (x**2)/10)
# each noise term below is an independent draw
target_toy = lambda x: (x + 0.3*np.sin(2*np.pi*(x + noise(x).rvs(1)[0])) + 
                        0.3*np.sin(4*np.pi*(x + noise(x).rvs(1)[0])) + 
                        noise(x).rvs(1)[0] - 0.5)

# defining the function - no noise
target_toy_noiseless = lambda x: (x + 0.3*np.sin(2*np.pi*(x)) + 0.3*np.sin(4*np.pi*(x)) - 0.5)

# running the target
y = np.array([target_toy(e) for e in X])
y_noiseless = np.array([target_toy_noiseless(e) for e in x_grid])
```


This problem is good for measuring both risk and uncertainty. Risk gets bigger where the intrinsic noise from the data generating process is larger, which in this case is away from the origin, due to our choice of $\epsilon \sim \mathcal{N}(0, 0.01 + 0.1 \cdot x^2)$. Uncertainty gets bigger where there is less data, which is also away from the origin, because $x$ follows a normal distribution, $x \sim \mathcal{N}(0, 1)$.
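Both effects can be checked numerically. The sketch below is an illustration rather than part of the original notebook: it draws a fresh sample with NumPy's `default_rng` (so the points differ from the `X` generated above), counts how many points fall in unit-wide bins of $|x|$, and evaluates the noise scale $0.01 + x^2/10$ at each bin center. The helper `noise_scale` and the bin edges are assumptions made for this example.

```python
import numpy as np

rng = np.random.default_rng(10)

# Same input distribution as the post: standard normal, clipped to [-3, 3]
x = np.clip(rng.normal(0.0, 1.0, 1000), -3, 3)

def noise_scale(x):
    """Noise standard deviation at x, matching the scale passed to sp.norm above."""
    return 0.01 + (x ** 2) / 10

# Count points in unit-wide bins of |x| (the last bin includes |x| == 3)
bin_edges = np.array([0.0, 1.0, 2.0, 3.0])
counts, _ = np.histogram(np.abs(x), bins=bin_edges)
bin_centers = (bin_edges[:-1] + bin_edges[1:]) / 2
scales = noise_scale(bin_centers)

for lo, hi, n, s in zip(bin_edges[:-1], bin_edges[1:], counts, scales):
    print(f"|x| in [{lo:.0f}, {hi:.0f}]: {n:4d} points, noise std at center = {s:.3f}")
```

Moving away from the origin, the point counts drop (less data, so more uncertainty) while the noise standard deviation rises (more risk), which is exactly the combination the model will have to disentangle.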

So, let us start to build a risk and uncertainty estimating model for this data! The first step is to use a vanilla neural network to estimate expected values.