## Deep Evidential Regression (Amini et al., 2020) - Summary

### 1. Introduction: Why do we need uncertainty estimation in regression?

Neural networks are increasingly used in settings where mistakes can be dangerous or expensive, such as autonomous driving or medical applications.

In these settings, a prediction alone is not enough: to keep mistakes to a minimum, the model should also report how certain it is about that prediction.

Having reliable uncertainty estimates helps prevent wrong decisions, detect unusual data, and improve the safety of machine learning systems.

### 2. What is Deep Evidential Regression? 

Deep Evidential Regression is a method where a neural network predicts the parameters of a Normal-Inverse-Gamma distribution instead of predicting a single output. 

This higher-order distribution represents “evidence” for the prediction and allows the model to estimate both aleatoric and epistemic uncertainty in one forward pass, without sampling or ensembles.

### 3. Two types of uncertainties in regression models 

**Aleatoric uncertainty:** noise in the data. It is caused by randomness or measurement noise that even a perfect model cannot eliminate, so it does not shrink with more data. An example is a noisy sensor measuring distance.

**Epistemic uncertainty:** the model simply doesn’t know enough; it is caused by a lack of knowledge, which decreases with more data. This uncertainty is high when the model sees something unfamiliar or out-of-distribution. An example would be training the model on indoor-only images and then suddenly showing it a snowy mountain. 

These two uncertainties behave differently and matter in different ways, so a good system should be able to estimate both. 

### 4. Proposal of Amini et al. 

The main idea of the paper is to let a neural network predict not only a single output value, but the parameters of a Normal-Inverse-Gamma (NIG) distribution.
This distribution is a higher-order distribution that represents uncertainty about the parameters of a Gaussian likelihood (its mean and variance). By learning this distribution, the model can estimate both aleatoric and epistemic uncertainty in a principled way.

For each regression target, the network outputs four values:

- **γ**: predicted mean
- **υ**: strength of belief about the mean
- **α**: evidence related to the variance
- **β**: scale parameter for the variance

Using these parameters, the model can compute:

- the final prediction
- the aleatoric uncertainty
- the epistemic uncertainty
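
As a rough sketch, the three quantities listed above follow from the moments of the NIG distribution given in the paper: the prediction is E[μ] = γ, the aleatoric uncertainty is E[σ²] = β / (α − 1), and the epistemic uncertainty is Var[μ] = β / (υ(α − 1)). The helper name `nig_uncertainties` is made up for illustration. It uses plain arithmetic only, so it should work on Python floats and on PyTorch tensors alike.

```python
def nig_uncertainties(gamma, v, alpha, beta):
    """Prediction and uncertainties from NIG parameters (hypothetical helper).

    Assumes v > 0, alpha > 1, beta > 0.
    """
    prediction = gamma                    # E[mu]
    aleatoric = beta / (alpha - 1)        # E[sigma^2]
    epistemic = beta / (v * (alpha - 1))  # Var[mu]
    return prediction, aleatoric, epistemic


# Same alpha and beta, but more evidence for the mean (larger v)
low_evidence = nig_uncertainties(gamma=0.5, v=0.5, alpha=3.0, beta=2.0)
high_evidence = nig_uncertainties(gamma=0.5, v=4.0, alpha=3.0, beta=2.0)
print(low_evidence)   # (0.5, 1.0, 2.0)
print(high_evidence)  # (0.5, 1.0, 0.25)
```

In this example only the epistemic term changes: more evidence for the mean lowers Var[μ], while the estimated data noise stays the same.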

The key idea is that the NIG distribution captures uncertainty about the likelihood parameters themselves, not just about the data.
This allows the model to express how confident it is about its own prediction without using sampling, dropout, or model ensembles.
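
One way to picture "uncertainty about the likelihood parameters" is to sample from the hierarchy the paper assumes: σ² is drawn from an Inverse-Gamma(α, β), μ from a Normal(γ, σ²/υ), and the target y from a Normal(μ, σ²). The sketch below uses only the standard library. The function name and the parameter values are assumptions chosen for illustration, and sampling is not part of the method itself, which works in closed form.

```python
import random


def sample_nig_target(gamma, v, alpha, beta, rng):
    """Draw (mu, sigma2, y) from the NIG hierarchy (hypothetical helper)."""
    # sigma^2 ~ Inverse-Gamma(alpha, beta): invert a Gamma(shape=alpha, scale=1/beta) draw
    sigma2 = 1.0 / rng.gammavariate(alpha, 1.0 / beta)
    # mu ~ Normal(gamma, sigma^2 / v)
    mu = rng.gauss(gamma, (sigma2 / v) ** 0.5)
    # y ~ Normal(mu, sigma^2)
    y = rng.gauss(mu, sigma2 ** 0.5)
    return mu, sigma2, y


rng = random.Random(0)
draws = [
    sample_nig_target(gamma=1.0, v=2.0, alpha=5.0, beta=4.0, rng=rng)
    for _ in range(20000)
]
mean_mu = sum(d[0] for d in draws) / len(draws)
mean_sigma2 = sum(d[1] for d in draws) / len(draws)
# Both averages should be close to 1.0: gamma = 1.0 and beta / (alpha - 1) = 1.0
print(round(mean_mu, 2), round(mean_sigma2, 2))
```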

Because the model learns how much evidence it has for its predictions, it can increase uncertainty when it makes mistakes or when it sees out-of-distribution data.

### 5. Loss Functions

Because the network outputs the parameters of a Normal-Inverse-Gamma distribution rather than a single prediction, the loss function has to do two things at once: fit the prediction to the data, and control how much evidence (and therefore how much confidence) the model expresses.

Because the model outputs a Normal-Inverse-Gamma (NIG) distribution, marginalizing over the Gaussian mean and variance yields a Student-t distribution over the target. The Student-t distribution is similar to a Gaussian but has heavier tails, which makes it more tolerant of outliers. Its negative log-likelihood (NLL) is therefore used as the data-fit term of the loss.
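
For reference, the Student-t form stated in the paper can be written as follows, where υ is the second NIG parameter (written `v` in the code) and the three arguments after y are the location, the squared scale, and the degrees of freedom:

$$
p(y \mid \gamma, \upsilon, \alpha, \beta) = \text{St}\left(y;\ \gamma,\ \frac{\beta\,(1 + \upsilon)}{\upsilon\,\alpha},\ 2\alpha\right)
$$

Larger α means more degrees of freedom, so the predictive distribution moves closer to a Gaussian as evidence about the variance grows.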

The paper combines two different loss components into one: 

**1. Student-t Negative Log-Likelihood (NLL)**  

This measures how well the predicted distribution matches the true target value.
Because the NIG distribution induces a Student-t likelihood, this term naturally models the data noise and is responsible for learning aleatoric uncertainty.

**2. Evidence Regularizer**  
This penalizes the model when it assigns a large amount of evidence (high confidence) to a prediction that is far from the true value.
It encourages the network to reduce evidence and increase epistemic uncertainty whenever it encounters unfamiliar or difficult inputs.

The total loss combines both parts:

$$
L = L_{\text{NLL}} + \lambda \cdot L_R
$$

where the regularizer is:

$$
L_R = |y - \gamma| \cdot (2\upsilon + \alpha)
$$

This loss encourages the model to make accurate predictions while also expressing meaningful uncertainty.
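
To make the effect of the regularizer concrete, here is a small scalar sketch with made-up numbers. The function name `evidence_regularizer` is hypothetical; it only restates the formula for L_R above.

```python
def evidence_regularizer(y, gamma, v, alpha):
    """L_R = |y - gamma| * (2v + alpha) for scalar inputs."""
    return abs(y - gamma) * (2 * v + alpha)


# Same prediction error of 2.0, increasing evidence for the mean
print(evidence_regularizer(y=3.0, gamma=1.0, v=1.0, alpha=2.0))   # 8.0
print(evidence_regularizer(y=3.0, gamma=1.0, v=10.0, alpha=2.0))  # 44.0
# A perfect prediction is not penalized, however large the evidence
print(evidence_regularizer(y=1.0, gamma=1.0, v=10.0, alpha=2.0))  # 0.0
```

The penalty scales with both the error and the evidence, so the cheapest way for the model to reduce it on a wrong prediction is to claim less evidence.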

 

In [None]:
# imports needed
import matplotlib.pyplot as plt
import torch
from torch import nn

In [None]:
# Evidential Regression Loss (Amini et al., 2020)
# Implements the Student-t NLL + Evidence Regularizer


def evidential_loss(y, gamma, v, alpha, beta, lam=1.0):
    """Implements the loss from Deep Evidential Regression:
    L = LNLL + λ * LR

    Parameters:
    y      : ground truth values
    gamma  : predicted mean
    v      : evidence for mean
    alpha  : evidence for variance (> 1)
    beta   : scale parameter (> 0)
    lam    : regularization weight λ
    """
    # 1. Student-t negative log-likelihood (LNLL), Equation (8) in the paper
    # two_bv is the term 2 * beta * (1 + v) that appears twice in the formula
    two_bv = 2 * beta * (1 + v)

    LNLL = (
        0.5 * torch.log(torch.pi / v)
        - alpha * torch.log(two_bv)
        + (alpha + 0.5) * torch.log(v * (y - gamma) ** 2 + two_bv)
        + torch.lgamma(alpha)
        - torch.lgamma(alpha + 0.5)
    )

    # 2. Evidence regularizer LR, Equation (9) in the paper
    # LR = |y - γ| * (2v + α): the error is scaled by the total evidence

    evidence = 2 * v + alpha
    LR = torch.abs(y - gamma) * evidence

    # 3. Combined Loss (the one we need)

    loss = LNLL + lam * LR

    return loss.mean()
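
As a sanity check, the closed-form NLL used above can be compared against the negative log-density of a Student-t with location γ, squared scale β(1 + v) / (vα), and 2α degrees of freedom. The sketch below is a scalar, standard-library re-statement of the same terms. The function names and test values are assumptions for illustration, not part of the paper or of any library.

```python
import math


def nig_nll(y, gamma, v, alpha, beta):
    """Closed-form NLL with the same terms as the tensor version (scalar inputs)."""
    omega = 2 * beta * (1 + v)
    return (
        0.5 * math.log(math.pi / v)
        - alpha * math.log(omega)
        + (alpha + 0.5) * math.log(v * (y - gamma) ** 2 + omega)
        + math.lgamma(alpha)
        - math.lgamma(alpha + 0.5)
    )


def student_t_nll(y, loc, scale, df):
    """Negative log-density of a location-scale Student-t (standard formula)."""
    z = (y - loc) / scale
    return (
        math.lgamma(df / 2)
        - math.lgamma((df + 1) / 2)
        + 0.5 * math.log(df * math.pi)
        + math.log(scale)
        + 0.5 * (df + 1) * math.log1p(z * z / df)
    )


y, gamma, v, alpha, beta = 1.3, 0.4, 0.7, 2.5, 1.2
scale = math.sqrt(beta * (1 + v) / (v * alpha))
difference = nig_nll(y, gamma, v, alpha, beta) - student_t_nll(y, gamma, scale, 2 * alpha)
print(abs(difference) < 1e-9)  # True
```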

### 6. Comparison with Sensoy et al. (2018)

The Sensoy et al. paper introduces evidential learning for classification using a Dirichlet distribution and a KL-based evidence regularizer. Amini et al. extend the evidential idea to regression, using a Normal-Inverse-Gamma distribution and a new evidence penalty to capture both aleatoric and epistemic uncertainty.
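
For comparison, the classification side can be sketched in a few lines. In Sensoy et al., non-negative evidence e_k for each of K classes gives Dirichlet parameters α_k = e_k + 1, belief masses b_k = e_k / S, and an overall uncertainty mass u = K / S, with S the sum of all α_k. The function name below is an assumption for illustration.

```python
def dirichlet_belief_and_uncertainty(evidence):
    """Belief masses and uncertainty mass from per-class evidence (hypothetical helper)."""
    num_classes = len(evidence)
    strength = sum(e + 1 for e in evidence)     # S = sum of alpha_k, with alpha_k = e_k + 1
    beliefs = [e / strength for e in evidence]  # b_k = e_k / S
    uncertainty = num_classes / strength        # u = K / S
    return beliefs, uncertainty


# No evidence at all: all mass goes to uncertainty
print(dirichlet_belief_and_uncertainty([0.0, 0.0, 0.0]))  # ([0.0, 0.0, 0.0], 1.0)
# Strong evidence for the first class: low uncertainty
beliefs, u = dirichlet_belief_and_uncertainty([27.0, 0.0, 0.0])
print(beliefs[0], u)  # 0.9 0.1
```

The regression analogue of "total evidence" in Amini et al. is the term 2υ + α that appears in the evidence regularizer.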

Together, these two papers form the foundation of modern evidential deep learning for both classification and regression. 


### 7. Implementation Example (PyTorch)

Below is a minimal PyTorch example showing how the Deep Evidential Regression loss can be used to train a simple model on a toy regression dataset.

In [None]:
# tiny toy dataset
import torch.nn.functional as F

x = torch.linspace(-3, 3, 200).unsqueeze(1)
y = x**3 + 0.3 * torch.randn_like(x)


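# minimal model: one hidden layer, since a purely linear map cannot fit y = x^3
class EvidentialNet(nn.Module):
    def __init__(self, hidden=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(1, hidden), nn.ReLU(), nn.Linear(hidden, 4))

    def forward(self, x):
        out = self.net(x)
        gamma = out[:, 0:1]  # predicted mean, unconstrained
        v = F.softplus(out[:, 1:2])  # v > 0
        alpha = F.softplus(out[:, 2:3]) + 1  # alpha > 1
        beta = F.softplus(out[:, 3:4])  # beta > 0
        return gamma, v, alpha, beta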


model = EvidentialNet()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

# short training loop
for _ in range(1500):
    opt.zero_grad()
    gamma, v, alpha, beta = model(x)
    loss = evidential_loss(y, gamma, v, alpha, beta, lam=1e-2)
    loss.backward()
    opt.step()

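# visualize result (no gradients needed for plotting)
with torch.no_grad():
    gamma_pred = model(x)[0]
plt.scatter(x.squeeze(), y.squeeze(), s=10, label="data")
plt.plot(x.squeeze(), gamma_pred.squeeze(), color="red", label="predicted mean")
plt.legend()
plt.show()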

### 8. Project Insights 

Amini et al. (2020) provide the theoretical and practical foundation for evidential regression in probly. Their NIG-based uncertainty model, loss function, and evidence regularizer are the components our project needs to implement fast, sampling-free uncertainty estimation for continuous outputs.

### 9. Summary

Deep Evidential Regression gives neural networks a way to predict both a value and how certain they are about it. By predicting the parameters of a Normal-Inverse-Gamma distribution, the model estimates aleatoric and epistemic uncertainty in a single forward pass. The combination of the Student-t likelihood and the evidence regularizer encourages the model to be confident only where its predictions are supported by the data. Overall, this method provides a simple and efficient way to add uncertainty estimation to regression models without relying on sampling or ensembles.