#### Loss functions

PyTorch implementations of common neural-network loss functions for different tasks.

Reference: https://neptune.ai/blog/pytorch-loss-functions



### Negative Log-Likelihood Loss


A good introduction can be found [here](https://dasha.ai/en-us/blog/log-loss-function). In the multi-class setting, the log loss (the average negative log-likelihood) is defined as

$$
\text{Log Loss} = -\frac{1}{q} \sum_{i=1}^q \sum_{j=1}^l y_{ij} \log(a_{ij})
$$

Where:
- $q$ is the total number of samples
- $l$ is the total number of classes
- $a_{ij}$ is the probability assigned by the algorithm for the $i$-th sample belonging to the $j$-th class
- $y_{ij}$ is 1 if the $i$-th sample belongs to the $j$-th class, and 0 otherwise

This formulation is equivalent to multiplying the matrix of log-probabilities element-wise with a one-hot encoded label matrix, summing the result, and averaging over samples, where both matrices have:
- Number of rows = Number of samples
- Number of columns = Number of classes
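
The one-hot view of the formula can be sketched in plain Python as follows. This is a minimal, dependency-free illustration that reuses the same small example as the notebook cell that follows; the variable names (`probs`, `labels`, `one_hot`) are chosen here for illustration and are not part of any library.

```python
import math

# Predicted probabilities a_ij: one row per sample, one column per class.
probs = [[0.1, 0.9],
         [0.2, 0.8]]
# True class index for each sample.
labels = [0, 1]
num_classes = 2

# One-hot label matrix y_ij with the same shape as probs.
one_hot = [[1.0 if j == c else 0.0 for j in range(num_classes)] for c in labels]

# Log Loss = -(1/q) * sum_i sum_j y_ij * log(a_ij)
q = len(probs)
log_loss = -sum(
    y * math.log(a)
    for y_row, a_row in zip(one_hot, probs)
    for y, a in zip(y_row, a_row)
) / q

print(round(log_loss, 4))  # 1.2629
```

Only the entries where `one_hot` is 1 contribute, so the double sum picks out the log-probability of the true class for each sample.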

In [15]:
# Negative log-likelihood loss with nn.NLLLoss
import torch
from torch import nn

# predicted probabilities, shape (n_samples, n_classes) = (2, 2)
input_prob = torch.tensor([[0.1, 0.9], [0.2, 0.8]])
target = torch.tensor([0, 1])
loss = nn.NLLLoss()
# NLLLoss expects log-probabilities, so take the log first
output = loss(torch.log(input_prob), target)
print(output)


# same value computed by hand with NumPy
import numpy as np
np_input_prob = input_prob.numpy()
loss_np_sum = -np.log(np_input_prob[0, 0]) - np.log(np_input_prob[1, 1])
print(loss_np_sum/2)



tensor(1.2629)
1.262864351272583


### Cross Entropy Loss

Cross Entropy loss is closely related to minimizing negative log likelihood. It's a fundamental concept in information theory and machine learning, particularly useful for classification tasks.

The Cross Entropy between two probability distributions $p$ and $q$ is defined as:

$$
H(p,q) = -\sum_x p(x) \log q(x)
$$

Where:
- $p(x)$ is the ground truth probability distribution (typically a one-hot encoded representation of labels)
- $q(x)$ is the predicted probability distribution from the model

When $p$ is one-hot, as in standard classification, only the term for the true class $c$ survives the sum, so $H(p,q) = -\log q(c)$, which is the negative log-likelihood of the true label for that sample.

Key points:
1. Cross Entropy measures the dissimilarity between two probability distributions.
2. Minimizing Cross Entropy is equivalent to maximizing the likelihood of the true labels under the model's predictions.
3. For multi-class prediction with one-hot labels, minimizing Cross Entropy is the same as minimizing the negative log likelihood.
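
The key points above can be checked numerically with a small, framework-free sketch. The helper names `softmax` and `cross_entropy` are defined here for illustration only, and the logits are illustrative values; the sketch assumes a single sample with a one-hot ground truth.

```python
import math

def softmax(logits):
    # Subtract the max for numerical stability; the result is unchanged.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(p, q):
    # H(p, q) = -sum_x p(x) * log q(x); terms with p(x) == 0 contribute nothing.
    return -sum(px * math.log(qx) for px, qx in zip(p, q) if px > 0)

logits = [0.1, 0.9]      # illustrative raw scores for one sample
true_class = 0
p = [1.0, 0.0]           # one-hot ground truth for class 0
q = softmax(logits)      # predicted distribution

ce = cross_entropy(p, q)
nll = -math.log(q[true_class])
print(ce, nll)           # the two values coincide
```

With a one-hot `p`, the cross entropy collapses to the negative log-probability of the true class, which is why the two losses lead to the same optimum.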

In [29]:
import torch
from torch import nn
import numpy as np

# PyTorch implementation
loss_fn = nn.CrossEntropyLoss()
# Input should be raw scores (logits), not probabilities
input_logits = torch.tensor([[0.1, 0.9], [0.2, 0.8]])
target = torch.tensor([0, 1])
output = loss_fn(input_logits, target)
print("PyTorch CrossEntropyLoss:", output.item())


# NumPy implementation
np_input_prob = torch.softmax(input_logits, dim=1).numpy()
np_target = target.numpy()
selected_probs = []

# pick the predicted probability of the true class for each sample
# (equivalent to one-hot labels * probabilities, summed over classes)
for i in range(len(np_target)):
    selected_probs.append(np_input_prob[i, np_target[i]])
selected_probs = np.array(selected_probs)


loss_np = -np.log(selected_probs).mean()
print("NumPy Cross-Entropy:", loss_np)


# For comparison, let's also use NLLLoss with log probabilities
log_probs = torch.log_softmax(input_logits, dim=1)
nll_loss = nn.NLLLoss()
nll_output = nll_loss(log_probs, target)
print("PyTorch NLLLoss with log_softmax:", nll_output.item())

PyTorch CrossEntropyLoss: 0.8042942881584167
NumPy Cross-Entropy: 0.80429435
PyTorch NLLLoss with log_softmax: 0.8042942881584167


In [21]:
np_input_prob[np.arange(len(np_target)), np_target]

array([0.3100255, 0.6456563], dtype=float32)

In [28]:
np_target

array([0, 1])

### Dice score

* The Dice score measures the overlap between a predicted mask and the ground-truth mask, which makes it especially helpful for image segmentation

In [1]:
import torch
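
The notebook does not include a Dice implementation, so the following is only a minimal sketch under stated assumptions: binary masks flattened into plain Python lists, the standard definition $2|A \cap B| / (|A| + |B|)$, and a small smoothing term `eps` (an assumption made here) to avoid division by zero. The function name `dice_score` and the example masks are hypothetical.

```python
def dice_score(pred, target, eps=1e-7):
    """Dice = 2 * |pred AND target| / (|pred| + |target|) for flat binary masks."""
    intersection = sum(p * t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    # eps avoids 0/0 when both masks are empty (treated here as a perfect match).
    return (2.0 * intersection + eps) / (total + eps)

pred_mask = [1, 1, 0, 0]   # hypothetical flattened prediction
true_mask = [1, 0, 0, 0]   # hypothetical flattened ground truth

score = dice_score(pred_mask, true_mask)
dice_loss = 1.0 - score    # common way to turn the score into a loss
print(round(score, 4))     # 0.6667
```

A score of 1 means perfect overlap and 0 means no overlap, so `1 - score` is a common choice when the score is used as a training loss.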

### KL divergence

* KL divergence quantifies how much one probability distribution differs from another

Applications:
* Loss term in VAEs
* InfoGAN


$$
KL(p \| q) = \sum_x p(x) \log \frac{p(x)}{q(x)}
$$

In [None]:
# KL(p || q) = sum_x p(x) * log(p(x) / q(x))
from math import log2

events = ['red', 'green', 'blue']
p = [0.10, 0.40, 0.50]
q = [0.80, 0.15, 0.05]

# calculate the KL divergence
def kl_divergence(p, q):
    # log base 2, so the result is measured in bits
    return sum(p[i] * log2(p[i] / q[i]) for i in range(len(p)))

print(kl_divergence(p, q))
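
One property worth checking with the same numbers is that KL divergence is not symmetric. The sketch below is self-contained and reuses the `p` and `q` values from the cell above; the `if pi > 0` guard is an addition made here so that zero-probability events in `p` are skipped, following the usual convention that $0 \log 0 = 0$.

```python
from math import log2

def kl_divergence(p, q):
    # KL(p || q) = sum_x p(x) * log2(p(x) / q(x)), measured in bits.
    return sum(pi * log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

p = [0.10, 0.40, 0.50]
q = [0.80, 0.15, 0.05]

kl_pq = kl_divergence(p, q)
kl_qp = kl_divergence(q, p)
print(round(kl_pq, 3), round(kl_qp, 3))  # 1.927 2.022
print(kl_divergence(p, p))               # 0.0
```

Swapping the arguments changes the result, and the divergence of a distribution from itself is zero, so KL divergence behaves like a directed measure of mismatch and not like a distance.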