In [1]:
import numpy as np

In [2]:
import torch
import torch.nn as nn
import torch.nn.functional as F

## L1 Loss
[Doc](https://pytorch.org/docs/stable/nn.html?highlight=nllloss#torch.nn.L1Loss). Given two vectors $\hat{\mathbf{y}}$ and $\mathbf{y}$, this function calculates the element-wise loss as $\vert \hat{\mathbf{y}} - \mathbf{y} \vert$.

I can specify whether I want to average the element-wise losses or sum them up.

In [4]:
loss_fn = nn.L1Loss(reduction='mean')
y_hat = torch.tensor([1., 2., 3.])
y = torch.tensor([1.2, 2.2, 3.2])
loss = loss_fn(y_hat, y)
print(loss)

tensor(0.2000)
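The other reduction modes can be sketched the same way. This is a minimal example reusing the tensors from the cell above; the variable names `per_element`, `total`, `mean` and `by_hand` are my own and not part of the PyTorch API.

```python
import torch
import torch.nn as nn

y_hat = torch.tensor([1., 2., 3.])
y = torch.tensor([1.2, 2.2, 3.2])

# reduction='none' keeps one loss value per element
per_element = nn.L1Loss(reduction='none')(y_hat, y)
# reduction='sum' adds the element-wise losses
total = nn.L1Loss(reduction='sum')(y_hat, y)
# reduction='mean' averages them (this is the default)
mean = nn.L1Loss(reduction='mean')(y_hat, y)

# The same element-wise quantity computed by hand
by_hand = (y_hat - y).abs()

print(per_element)
print(total)
print(mean)
```

Here `by_hand.sum()` and `by_hand.mean()` should match the `'sum'` and `'mean'` results up to floating-point error.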


## MSE Loss
[Doc](https://pytorch.org/docs/stable/nn.html?highlight=nllloss#torch.nn.MSELoss). Given two vectors $\hat{\mathbf{y}}$ and $\mathbf{y}$, this function calculates the element-wise loss as $(\hat{\mathbf{y}} - \mathbf{y})^2$.

I can specify whether I want to average the element-wise losses or sum them up.

In [5]:
loss_fn = nn.MSELoss(reduction='mean')  # 'elementwise_mean' is deprecated in favor of 'mean'
y_hat = torch.tensor([1., 2., 3.])
y = torch.tensor([1.2, 2.2, 3.2])
loss = loss_fn(y_hat, y)
print(loss)

tensor(0.0400)
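To check the number above by hand, here is a minimal sketch that squares the differences and averages them. It assumes the same `y_hat` and `y` as the cell above; `squared_errors`, `mse_by_hand`, `mse_builtin` and `sse_builtin` are illustrative names of my own.

```python
import torch
import torch.nn as nn

y_hat = torch.tensor([1., 2., 3.])
y = torch.tensor([1.2, 2.2, 3.2])

# Element-wise squared error, then the mean over all elements
squared_errors = (y_hat - y) ** 2
mse_by_hand = squared_errors.mean()

# Built-in versions with the two aggregating reductions
mse_builtin = nn.MSELoss(reduction='mean')(y_hat, y)
sse_builtin = nn.MSELoss(reduction='sum')(y_hat, y)

print(mse_by_hand, mse_builtin, sse_builtin)
```

Each difference is 0.2, so every squared error is 0.04 and the mean is 0.04 as well, up to floating-point error.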


## NLLLoss
[Doc](). The Negative Log Likelihood Loss. The inputs to the loss function are -

  * A matrix of the log probabilities of size m x c where each element is $l_k^{(i)} = \log(p_k^{(i)})$
  * The target class
  
The target class is **not** encoded as a one-hot vector. It is supposed to be a vector of size m where each element $y^{(i)}$ is an integer from 0 to c-1. This function does not care how the probabilities (and therefore the log probabilities) were calculated. These are model dependent.

This function will then do the following for each instance (dropping the superscript (i) for clarity) -

  1. Create a one-hot-encoded vector from the target value.
  2. Calculate the negative log-likelihood $-\mathcal L = -\sum_{k=1}^c y_k l_k$. 

And finally, take the average across the entire mini-batch.
  
The first two steps are simply selecting the log probability of the target class for each example.  

In [6]:
# Minibatch size m = 2
# Number of classes c = 3
l = torch.tensor([[1., 2., 3.],
                  [4., 5., 6.]])
y = torch.tensor([0, 1])

# This selects 1 from the first row of l and 5 from its second row, negates both values,
# and returns their average: (-1 + -5) / 2 = -3
loss_fn = nn.NLLLoss(reduction='mean')
loss_fn(l, y)

tensor(-3.)
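The selection described in the steps above can be sketched by hand in two ways: by indexing directly with the targets, and by building the one-hot vectors explicitly. This is only an illustration under the same toy inputs as the cell above (which are not real log probabilities); the names `selected`, `one_hot`, `nll_by_hand` and `nll_one_hot` are my own.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

l = torch.tensor([[1., 2., 3.],
                  [4., 5., 6.]])
y = torch.tensor([0, 1])

# Direct selection: pick l[i, y[i]] for every row i
selected = l[torch.arange(l.shape[0]), y]   # tensor([1., 5.])
nll_by_hand = -selected.mean()

# Explicit one-hot version of steps 1 and 2
one_hot = F.one_hot(y, num_classes=l.shape[1]).float()
nll_one_hot = -(one_hot * l).sum(dim=1).mean()

nll_builtin = nn.NLLLoss(reduction='mean')(l, y)
print(nll_by_hand, nll_one_hot, nll_builtin)
```

All three values should agree, which is why the one-hot vector never needs to be passed to `NLLLoss`.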

## CrossEntropyLoss
[Doc](https://pytorch.org/docs/stable/nn.html?highlight=nllloss#torch.nn.CrossEntropyLoss). The inputs to the loss function are -

  * A matrix of the logits of size m x c where each element is $h_k^{(i)}$
  * The target class
  
The target class is **not** encoded as a one-hot vector. It is supposed to be a vector of size m where each element $y^{(i)}$ is an integer from 0 to c-1.

This function will then do the following for each instance (dropping the superscript (i) for clarity) -

  1. Create a one-hot encoded vector based on the target class.
  2. Calculate the softmax probabilities for each class.
$$
p_k = \frac {e^{h_k}}{\sum_{j=1}^c e^{h_j}}
$$
  3. Calculate the negative log likelihood $-\mathcal L = -\sum_{k=1}^c y_k \log(p_k)$.
  
And finally take the average across the entire mini-batch.

This function differs from most other loss functions in that it calculates the softmax probabilities itself. Most other loss functions take the probabilities as input and do not care how those probabilities were calculated.

In [7]:
# Minibatch size m = 2
# Number of classes c = 3
h = torch.tensor([[1., 2., 3.],
                  [4., 5., 6.]])
y = torch.tensor([0, 1])

loss_fn = nn.CrossEntropyLoss(reduction='mean')
loss_fn(h, y)

tensor(1.9076)

In [8]:
# Let's do this by hand
p = F.softmax(h, dim=1)
l = torch.log(p)
print(p)
print(l)

# Log likelihood of the first row is the 0th element because y[0] = 0
print(l[0, 0])

# Log likelihood of the second row is the 1st element because y[1] = 1
print(l[1, 1])

# The final loss is the average of the negative log likelihoods
loss = ((-l[0, 0]) + (-l[1, 1])) / 2
print(loss)

tensor([[0.0900, 0.2447, 0.6652],
        [0.0900, 0.2447, 0.6652]])
tensor([[-2.4076, -1.4076, -0.4076],
        [-2.4076, -1.4076, -0.4076]])
tensor(-2.4076)
tensor(-1.4076)
tensor(1.9076)
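Another way to see the same relationship is to chain the two pieces covered so far: apply `F.log_softmax` to the logits and feed the result to `NLLLoss`. A minimal sketch with the same `h` and `y` as above; `log_p`, `loss_two_step` and `loss_one_step` are illustrative names of my own.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

h = torch.tensor([[1., 2., 3.],
                  [4., 5., 6.]])
y = torch.tensor([0, 1])

# Step 1: turn the logits into log probabilities, row by row
log_p = F.log_softmax(h, dim=1)

# Step 2: NLLLoss picks the target log probability, negates it, and averages
loss_two_step = nn.NLLLoss(reduction='mean')(log_p, y)

# CrossEntropyLoss does both steps in a single call on the raw logits
loss_one_step = nn.CrossEntropyLoss(reduction='mean')(h, y)

print(loss_two_step, loss_one_step)
```

Both values should match the 1.9076 computed by hand above.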


## Binary Cross-Entropy
Binary cross-entropy can be used for both binary classification and multi-label classification. There are two modules for this, `BCELoss` and `BCEWithLogitsLoss`.

### Binary Classification with `BCELoss`

[Doc](https://pytorch.org/docs/stable/nn.html#torch.nn.BCELoss). The inputs to the loss function are -

  * A vector of probabilities of length m where each element is $p^{(i)}$
  * The target class

Mathematically speaking, it makes sense for the target class to be either $0$ or $1$, but this function does not really care whether this is true or not. The function also does not care how the probabilities were calculated. That is model dependent.

This function will calculate the negative log likelihood of each instance as $-\mathcal L^{(i)} = -\left[y^{(i)}\log(p^{(i)}) + (1-y^{(i)})\log(1-p^{(i)}) \right]$ and then aggregate across the mini-batch.

In [9]:
y = torch.tensor([1., 0., 1.])
p = torch.tensor([0.9, 0.7, 0.8])
loss_fn = nn.BCELoss(reduction='mean')
loss = loss_fn(p, y)
print(loss)

# By hand
l = [None, None, None]
l[0] = np.log(0.9)
l[1] = np.log(0.3)
l[2] = np.log(0.8)
loss = - sum(l)/3
print(loss)

tensor(0.5108)
0.5108256237659907


### Multi-label Classification with `BCELoss`
The inputs to the loss function are -

  * A matrix of probabilities of size m x c, where each element $p_k^{(i)}$ is the probability of the $i$th instance having the $k$th label.
  * A matrix of target classes of size m x c, where each element $y_k^{(i)}$ is $1$ if the $i$th instance has the $k$th label and $0$ otherwise.
 
Mathematically speaking, it makes sense for the target matrix to consist only of $0$s and $1$s, but this function does not really care about that. Further, it does not care how the probabilities were calculated. That is model dependent.

This function will calculate the negative log likelihood of each instance as (dropping the superscript (i)) -

$$
- \mathcal L = - \frac1c \sum_{k=1}^c \left[y_k\;\log(p_k) + (1-y_k)\;\log(1-p_k) \right]
$$

And finally take the average across the mini-batch.

In [10]:
y = torch.tensor([[1., 1., 0.],
                  [0., 1., 1.]])
p = torch.tensor([[0.9, 0.8, 0.2],
                  [0.7, 0.8, 0.6]])
loss_fn = nn.BCELoss(reduction='mean')
loss = loss_fn(p, y)
print(loss)

# By hand
l = [None, None]
l[0] = (np.log(0.9) + np.log(0.8) + np.log(0.8))/3
l[1] = (np.log(0.3) + np.log(0.8) + np.log(0.6))/3
loss = - sum(l) / 2
print(loss)

tensor(0.4149)
0.41493159961539705


### Binary Classification with `BCEWithLogitsLoss`

[Doc](https://pytorch.org/docs/stable/generated/torch.nn.BCEWithLogitsLoss.html). The inputs to the loss function are -

  * A vector of logits (the affine outputs of the model) of length m where each element is $h^{(i)}$
  * The target class

This function applies the sigmoid function to the logits and then calculates the familiar BCE loss as follows:

$$
\begin{aligned}
p &= \frac{1}{1 + e^{-h}} \\
- \mathcal L &= - \left[y\;\log(p) + (1-y)\;\log(1-p) \right]
\end{aligned}
$$

The loss is then aggregated across the mini-batch. The loss function can be initialized with a weight for the positive examples (the `pos_weight` argument). Let's say it is set to $3$; then each positive example contributes to the loss as if there were $3$ positive examples. This is not shown in the demo below.

In [38]:
y = torch.tensor([1., 0., 1.])
h = torch.tensor([2.21, 0.85, 1.4])
loss_fn = torch.nn.BCEWithLogitsLoss()
loss = loss_fn(h, y)
print(loss)

# By hand
p = torch.sigmoid(h)
print(p)
loss = torch.nn.BCELoss()(p, y)
print(loss)

tensor(0.5101)
tensor([0.9011, 0.7006, 0.8022])
tensor(0.5101)
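The demo above leaves out the weight for positive examples. Here is a minimal sketch of it using the `pos_weight` argument of `BCEWithLogitsLoss`, with the weight set to 3 and the same `h` and `y` as above. The by-hand formula is my reading of the weighting (the positive term is multiplied by the weight, and the mean still divides by m), so treat it as an assumption that the final comparison checks; `loss_weighted`, `per_example` and `loss_by_hand` are names of my own.

```python
import torch
import torch.nn as nn

y = torch.tensor([1., 0., 1.])
h = torch.tensor([2.21, 0.85, 1.4])

# Weight every positive example (y = 1) three times as heavily
pos_weight = torch.tensor([3.0])
loss_weighted = nn.BCEWithLogitsLoss(pos_weight=pos_weight)(h, y)
loss_unweighted = nn.BCEWithLogitsLoss()(h, y)

# By hand: only the y * log(p) term is scaled by the weight
p = torch.sigmoid(h)
per_example = -(3.0 * y * torch.log(p) + (1 - y) * torch.log(1 - p))
loss_by_hand = per_example.mean()

print(loss_unweighted, loss_weighted, loss_by_hand)
```

The weighted loss comes out larger than the unweighted one because the two positive examples now count three times each, while the negative example is unchanged.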


## KL Divergence Loss

In [None]:
y = torch.tensor([])