# Calculating Uncertainty Values from Reward Samples!

##### This notebook walks through the process of determining uncertainty values from reward samples. Uncertainty values are crucial for understanding the variability of, or our confidence in, the data we observe.

In [2]:
import torch
from utils import balanced_entropy, epistemic_uncertainty, aleatoric_uncertainty

## Getting Started with Our Reward Samples

##### Let's assume that we have a set of reward samples, specifically MC-dropout samples of the reward gap R1 - R2. These samples are our starting point for calculating uncertainty, and the rest of the notebook shows how they give insight into the uncertainty of our rewards.

In [2]:
reward_gap_samples = torch.tensor(
    [[-0.5071,  0.3939,  0.6309,  0.5251,  1.2876,  1.1699,  0.5775,  1.1834,
       0.1770,  0.1188, -0.5556,  0.1510,  1.7552,  1.0578,  0.8473,  0.0886,
       1.3317,  1.1074, -0.2550,  0.0272,  0.9689,  0.8330,  0.5621,  0.2605,
      -0.0145]]) # we use 25 samples
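##### For readers who want to see where such samples could come from, here is a minimal sketch of MC-dropout sampling. The toy `reward_model`, the feature size, the dropout rate, and the helper `mc_dropout_reward_gaps` are all hypothetical and stand in for whatever reward model actually produced the values above.

```python
import torch

torch.manual_seed(0)

# Hypothetical toy reward model; the real model behind the notebook's samples is not specified.
reward_model = torch.nn.Sequential(
    torch.nn.Linear(8, 32),
    torch.nn.ReLU(),
    torch.nn.Dropout(p=0.1),
    torch.nn.Linear(32, 1),
)

def mc_dropout_reward_gaps(model, features_1, features_2, num_samples=25):
    """Return a (batch, num_samples) tensor of sampled R1 - R2 values."""
    model.train()  # keep dropout active so every forward pass is stochastic
    gaps = []
    with torch.no_grad():
        for _ in range(num_samples):
            r1 = model(features_1).squeeze(-1)
            r2 = model(features_2).squeeze(-1)
            gaps.append(r1 - r2)
    return torch.stack(gaps, dim=1)

# Random stand-ins for the encoded responses being compared (assumed inputs).
features_1 = torch.randn(1, 8)
features_2 = torch.randn(1, 8)
toy_gap_samples = mc_dropout_reward_gaps(reward_model, features_1, features_2)
print(toy_gap_samples.shape)  # torch.Size([1, 25])
```

##### The result has the same layout as `reward_gap_samples`: one row per comparison and one column per dropout sample.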

## Balanced Entropy

#### As highlighted in the paper, we compute the Balanced Entropy in the sigmoid regime, which gives the following formula:

$$U_{\text{BalEnt}}\left(\mathbf{P}\right) :=
\frac{\mathbb{E} P_{y_c\succ y_r}h\left( P_{y_c\succ y_r}^+ \right) + \mathbb{E} P_{y_c\prec y_r}h\left( P_{y_c\prec y_r}^+ \right) + H\left(\mathbb{E}\mathbf{P}\right)}{H\left(\mathbb{E}\mathbf{P}\right) +\log 2 }$$

#### To evaluate it, we need the sample mean and standard deviation of the R1 - R2 samples.

In [None]:
sample_mean = torch.mean(reward_gap_samples, dim=1)
sample_std = torch.std(reward_gap_samples, dim=1)
sample_mean, sample_std

#### Applying the trapezoidal rule then completes the calculation of the Balanced Entropy.

In [4]:
balanced_entropy(sample_mean, sample_std)

tensor([-0.0047])
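#### To illustrate how the trapezoidal rule can turn a mean and a standard deviation into an expectation, the sketch below integrates the sigmoid against a Gaussian density. This is not the `balanced_entropy` implementation from `utils`. The Gaussian model of the reward gap, the helper name `gaussian_expectation_of_sigmoid`, and the grid settings are assumptions made for the example.

```python
import math
import torch

def gaussian_expectation_of_sigmoid(mean, std, num_points=2001, width=8.0):
    """Approximate E[sigmoid(X)] for X ~ N(mean, std^2) with the trapezoidal rule."""
    # Integration grid covering mean +/- width standard deviations.
    z = torch.linspace(-width, width, num_points, dtype=torch.float64)
    x = mean + std * z
    # Gaussian density evaluated on the grid.
    pdf = torch.exp(-0.5 * z ** 2) / (std * math.sqrt(2.0 * math.pi))
    return torch.trapezoid(torch.sigmoid(x) * pdf, x)

# Illustrative inputs only; these are not the notebook's computed statistics.
expected_probability = gaussian_expectation_of_sigmoid(0.55, 0.6)
print(expected_probability)
```

#### Other expectations over the preference probability can be approximated on the same grid by replacing `torch.sigmoid(x)` with a different integrand.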

## Epistemic Uncertainty and Aleatoric Uncertainty

#### Epistemic Uncertainty and Aleatoric Uncertainty can be calculated directly from logits. Therefore, we first apply a sigmoid transform to the reward gaps and stack the log-probabilities of the two possible outcomes, which the code below calls `logits`.

In [5]:
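gaps = reward_gap_samples.to(torch.float32)
# Log-probabilities of the two outcomes: log sigmoid(gap) and log(1 - sigmoid(gap)) = log sigmoid(-gap).
# Using logsigmoid(-gap) avoids log(0); the original 1e-128 offset underflows to zero in float32.
logits = torch.stack([torch.nn.functional.logsigmoid(gaps), torch.nn.functional.logsigmoid(-gaps)], dim=2)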

### Epistemic Uncertainty

#### Epistemic Uncertainty is expressed by the following equation:

$$U_{\text{Epistemic}}\left(\mathbf{P}\right):=H\left(\mathbb{E}\mathbf{P}\right)+\mathbb{E}\left(\sum_{i \in I}P_i\log P_i \right)$$

In [6]:
epistemic_uncertainty(logits)

tensor([0.0374])
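#### The equation above can be sketched directly in PyTorch. The function `epistemic_sketch` below is a hypothetical stand-in, not the `epistemic_uncertainty` function from `utils`. It assumes an input of log-probabilities with shape (batch, samples, classes) and uses natural logarithms.

```python
import torch

def epistemic_sketch(log_probs):
    """H(E[P]) + E[sum_i P_i log P_i] for log_probs of shape (batch, samples, classes)."""
    probs = log_probs.exp()
    mean_probs = probs.mean(dim=1)
    # Entropy of the averaged prediction, H(E[P]).
    entropy_of_mean = -(mean_probs * mean_probs.clamp_min(1e-12).log()).sum(dim=-1)
    # Average over samples of sum_i P_i log P_i (the negative per-sample entropy).
    expected_neg_entropy = (probs * log_probs).sum(dim=-1).mean(dim=1)
    return entropy_of_mean + expected_neg_entropy

# Two toy cases with two samples each (made-up probabilities).
agree = torch.log(torch.tensor([[[0.9, 0.1], [0.9, 0.1]]]))
disagree = torch.log(torch.tensor([[[0.9, 0.1], [0.1, 0.9]]]))
print(epistemic_sketch(agree))     # approximately 0: the samples agree
print(epistemic_sketch(disagree))  # approximately 0.368: the samples disagree
```

#### The toy cases show the intended behavior: the value is close to zero when every sample predicts the same distribution and grows as the samples disagree.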

### Aleatoric Uncertainty

#### Aleatoric Uncertainty is expressed by the following equation:

$$U_{\text{Aleatoric}}\left(\mathbf{P}\right):=-\mathbb{E}\left(\sum_{i \in I}P_i\log P_i \right)$$

In [7]:
aleatoric_uncertainty(logits)

tensor([0.6245])
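#### As a cross-check, the aleatoric term can be recomputed from the reward gap samples without `utils`. This is a sketch under two assumptions: natural logarithms, and the two-outcome probabilities sigmoid(gap) and 1 - sigmoid(gap). It also computes the entropy of the averaged prediction, whose difference from the aleatoric term is the epistemic term.

```python
import torch
import torch.nn.functional as F

reward_gap_samples = torch.tensor(
    [[-0.5071,  0.3939,  0.6309,  0.5251,  1.2876,  1.1699,  0.5775,  1.1834,
       0.1770,  0.1188, -0.5556,  0.1510,  1.7552,  1.0578,  0.8473,  0.0886,
       1.3317,  1.1074, -0.2550,  0.0272,  0.9689,  0.8330,  0.5621,  0.2605,
      -0.0145]])

# Log-probabilities of the two outcomes, shape (1, 25, 2).
log_probs = torch.stack(
    [F.logsigmoid(reward_gap_samples), F.logsigmoid(-reward_gap_samples)], dim=2)
probs = log_probs.exp()

# Aleatoric term: average per-sample entropy, -E[sum_i P_i log P_i].
aleatoric = -(probs * log_probs).sum(dim=2).mean(dim=1)

# Entropy of the averaged prediction, H(E[P]).
mean_probs = probs.mean(dim=1)
total = -(mean_probs * mean_probs.log()).sum(dim=1)

print(aleatoric)          # close to 0.6245
print(total - aleatoric)  # close to 0.0374
```

#### On these samples the sketch lands close to the two outputs shown above, which suggests that the aleatoric and epistemic values add up to the entropy of the averaged prediction.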