# Loss Functions Part 2

> "In this part of the multi-part series on loss functions we'll take a look at MSE, MAE, Hinge Loss, Triplet Loss and Huber Loss. We'll also look at the code for these loss functions in PyTorch and some examples of how to use them."

- toc: true
- branch: fastbook/lessons
- badges: true
- comments: true
- image: images/triplet_loss.png
- categories: [loss_functions]
- hide: false
- author: Akash Mehra

In [17]:
#collapse-hide
import torch
from torch import nn
from torch.autograd import Variable
import numpy as np
precision=4
np.set_printoptions(precision=precision)

In this post, I'd like to make sure that we can also code the loss classes ourselves. That means we need to be able to:

1. Translate the equations to Python code for the `forward` pass.
2. Calculate the gradients for a few loss functions, so that calling `backward` accumulates them.

Note that with `autodiff` or `autograd` available we wouldn't need to do any of this by hand; the exercise is purely for learning. Deep learning libraries such as `PyTorch` and `TensorFlow` offer better implementations.

So, we'll define a base class called `Module` and override its *abstract* `forward` and `backward` methods in each loss.

In [16]:
from abc import ABC, abstractmethod

class Module(ABC):
    def __init__(self, reduction='mean'):
        self.grad_input = None
        self.reduction = reduction

    def __call__(self, input, target):
        return self.forward(input, target)
    
    @abstractmethod
    def forward(self, input, target):
        pass

    @abstractmethod
    def backward(self, input, target):
        pass
    
    def update_grad_input(self, grad):
        if self.grad_input is not None:
            self.grad_input += grad
        else:
            self.grad_input = grad
    
    def zero_grad(self):
        if self.grad_input is not None:
            self.grad_input = np.zeros_like(self.grad_input)
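
To make the accumulation behaviour concrete, here is a minimal sketch. `GradBuffer` is a hypothetical stand-in that copies only the gradient bookkeeping of `Module` (it is not part of the post's code), so the snippet runs on its own.

```python
import numpy as np


class GradBuffer:
    """Hypothetical stand-in that mirrors only the gradient bookkeeping of `Module`."""

    def __init__(self):
        self.grad_input = None

    def update_grad_input(self, grad):
        # The first call stores the gradient, later calls add to it.
        if self.grad_input is not None:
            self.grad_input += grad
        else:
            self.grad_input = grad

    def zero_grad(self):
        # Reset to zeros while keeping the stored shape and dtype.
        if self.grad_input is not None:
            self.grad_input = np.zeros_like(self.grad_input)


buf = GradBuffer()
buf.update_grad_input(np.array([0.5, -1.0]))
buf.update_grad_input(np.array([0.5, -1.0]))
accumulated = buf.grad_input.copy()   # the two calls add up

buf.zero_grad()
buf.update_grad_input(np.array([0.5, -1.0]))
after_reset = buf.grad_input.copy()   # only the latest gradient remains

print(accumulated, after_reset)
```

Calling `backward` twice without `zero_grad` in between would therefore double the stored gradient, which is worth keeping in mind when comparing against a single `backward` call elsewhere.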

Having set up our *base* class, we can now look at individual loss functions and start coding them.

## Mean Squared Error

Mean Squared Error (MSE) measures the average of the squares of the errors of an estimator. The MSE of an estimator $\hat{\theta}$ with respect to an unknown parameter $\theta$ is defined as $MSE(\hat{\theta}) = E[(\hat{\theta} - \theta)^2]$. It can also be written as the sum of the *variance* and the squared *bias* of the estimator. For an *unbiased* estimator the *MSE* and the *variance* are equal. Below is a quick way to see how the MSE, the variance and the bias are related.

$$
E[X] = \mu
$$

$$
Var(X) = E[(X - \mu)^2] = E[X^2] - 2\mu E[X] + \mu^2 = E[X^2] - E[X]^2
$$

Using the following results, where $\mu$ is a constant:

$$
Var(X-Y) = Var(X) + Var(Y) - 2Cov(X,Y)
$$

$$
Var(\mu) = 0
$$

$$
Cov(X,\mu) = E[(X-E[X])(\mu-E[\mu])] = 0 \implies Var(X-\mu) = Var(X)
$$

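Applying $E[Y^2] = Var(Y) + E[Y]^2$ to $Y = \hat{\theta} - \theta$, and treating $\theta$ as a constant, we arrive at: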

$$
MSE(\hat{\theta}) = E[(\hat{\theta} - \theta)^2] = Var(\hat{\theta} - \theta) + E[\hat{\theta} - \theta]^2
$$

$$
MSE(\hat{\theta}) =  Var[\hat{\theta}] + Bias^2[\hat{\theta}]
$$
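
The decomposition can also be checked numerically. The sketch below is only an illustration: the true parameter, the noise level and the deliberately biased estimator (a sample mean shrunk by a factor of 0.9) are all assumptions chosen for the example.

```python
import numpy as np

rng = np.random.RandomState(0)
theta = 2.0                       # true parameter (chosen for illustration)
n_trials, n_samples = 10_000, 20

# Each row is one experiment; the estimator is a deliberately biased, shrunk sample mean.
samples = rng.normal(loc=theta, scale=1.0, size=(n_trials, n_samples))
estimates = 0.9 * samples.mean(axis=1)

mse = np.mean((estimates - theta) ** 2)
variance = np.var(estimates)      # population variance (ddof=0)
bias = np.mean(estimates) - theta

print(mse, variance + bias ** 2)  # the two numbers agree up to rounding
```

Because the empirical mean, the empirical variance (with `ddof=0`) and the empirical MSE are computed over the same set of estimates, the identity holds up to floating-point rounding and not just approximately.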


We can see from the above that, like *variance*, *MSE* weights outliers heavily: squaring each term makes large errors count far more than small ones. An alternative is the *Mean Absolute Error*, which is the topic of the next section.

### MSE Loss in numpy

We'll use our implementation of the `Module` class and define the `forward` and `backward` functions for *MSE* loss.
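
For reference, the gradient that `backward` has to produce follows directly from the definition of the loss with `reduction='mean'`. Here $x_i$ denotes the elements of `input`, $y_i$ the elements of `target` and $N$ the number of elements; this notation is introduced only for this derivation.

$$
L = \frac{1}{N}\sum_{i=1}^{N}(x_i - y_i)^2
$$

$$
\frac{\partial L}{\partial x_i} = \frac{2}{N}(x_i - y_i)
$$

With `reduction='sum'` the factor $\frac{1}{N}$ drops out of both expressions.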

In [18]:
class MSELoss(Module):
    def __init__(self, reduction='mean'):
        super(MSELoss, self).__init__(reduction)
    
    def forward(self, input, target):
        red = getattr(np, self.reduction)
        return np.asarray([red((target-input)**2),])
    
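    def backward(self, input, target):
        # d/d(input) of (input - target)**2 is 2 * (input - target);
        # only the 'mean' reduction divides by the number of elements N.
        N = input.size if self.reduction == 'mean' else 1
        grad = 2 / N * (input - target)
        self.update_grad_input(grad)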

In [4]:
reduction = 'mean'
SEED=42
rs = np.random.RandomState(SEED)
input = rs.randn(3,2,4,5)
target = rs.randn(3,2,4,5)

In [5]:
mse = MSELoss(reduction)
loss = mse(input, target)
loss

array([1.7061])

In [6]:
mse.backward(input, target)

Let's test our implementation against *PyTorch* and see if we've implemented things correctly or not.

### MSE Loss in PyTorch

We reuse the same `input` and `target` arrays, converted from `numpy.ndarray` to `torch.Tensor`.

In [19]:
criterion = nn.MSELoss(reduction=reduction)
# `Variable` is deprecated; a plain tensor with requires_grad=True tracks gradients.
inp = torch.from_numpy(input).requires_grad_(True)
loss = criterion(inp, torch.from_numpy(target))
loss.backward()

In [20]:
np.allclose(mse.grad_input, inp.grad.numpy())

True
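
The analytic gradient can also be checked without PyTorch by using central finite differences. The sketch below is self-contained; the helper names `mse` and `numerical_grad` are made up for this example, and the small array shape is chosen only to keep the loop fast.

```python
import numpy as np


def mse(x, y):
    return np.mean((x - y) ** 2)


def numerical_grad(f, x, eps=1e-6):
    """Central-difference estimate of df/dx, one element at a time."""
    grad = np.zeros_like(x)
    for idx in np.ndindex(x.shape):
        x_plus, x_minus = x.copy(), x.copy()
        x_plus[idx] += eps
        x_minus[idx] -= eps
        grad[idx] = (f(x_plus) - f(x_minus)) / (2 * eps)
    return grad


rs = np.random.RandomState(42)
x = rs.randn(2, 3)
y = rs.randn(2, 3)

analytic = 2 / x.size * (x - y)                   # gradient of the mean-reduced MSE
numeric = numerical_grad(lambda v: mse(v, y), x)
print(np.allclose(analytic, numeric, atol=1e-6))
```

A check like this is slow for large inputs, since it evaluates the loss twice per element, but it is a handy sanity test for any hand-written `backward`.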

## Mean Absolute Error Loss (MAE)

As pointed out earlier, the *MSE* loss is sensitive to outliers because it weights them heavily. *MAE*, on the other hand, is more robust in that scenario. It is defined as the average of the absolute differences between the predictions and the target values.

$$
MAE(\hat{\theta}) = E[|\hat{\theta} - \theta|]
$$

It is also known as *L1Loss*, since it measures the *L1* distance between two *vectors*.
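
A small sketch of the difference in outlier sensitivity, using made-up numbers: every prediction is off by 0.5, and then a single prediction is moved so that it is off by 10.

```python
import numpy as np

target = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
clean = target + 0.5                  # every prediction is off by 0.5
with_outlier = clean.copy()
with_outlier[-1] = target[-1] + 10.0  # one prediction is off by 10


def mse(pred, true):
    return np.mean((pred - true) ** 2)


def mae(pred, true):
    return np.mean(np.abs(pred - true))


print(mse(clean, target), mae(clean, target))
print(mse(with_outlier, target), mae(with_outlier, target))
```

With these numbers the MSE goes from 0.25 to 20.2, while the MAE goes from 0.5 to 2.4, so the single outlier dominates the squared loss far more than the absolute one.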

### MAE/L1 Loss in numpy

We'll use our `Module` implementation here as well and define the `forward` and `backward` functions for `L1Loss`.
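
As a reference for `backward`, here is the gradient for `reduction='mean'`, with $x_i$ the elements of `input`, $y_i$ the elements of `target` and $N$ the number of elements (notation introduced only for this derivation):

$$
L = \frac{1}{N}\sum_{i=1}^{N}|x_i - y_i|
$$

$$
\frac{\partial L}{\partial x_i} = \frac{1}{N}\,\mathrm{sign}(x_i - y_i)
$$

The absolute value is not differentiable at $x_i = y_i$; the implementation below uses $0$ there, which is one valid choice of subgradient.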

In [9]:
class L1Loss(Module):
    def __init__(self, reduction='mean'):
        super(L1Loss, self).__init__(reduction)
    
    def forward(self, input, target):
        red = getattr(np, self.reduction)
        return np.asarray([red((np.abs(target-input))),])
    
    def backward(self, input, target):
        # The derivative of |input - target| is sign(input - target): +1, -1, or 0
        # where the two are equal. Only 'mean' divides by the number of elements N.
        N = input.size if self.reduction == 'mean' else 1
        grad = np.sign(input - target) / N
        self.update_grad_input(grad)

In [21]:
reduction = 'mean'
SEED=42
rs = np.random.RandomState(SEED)
input = rs.randn(3,2,4,5)
target = rs.randn(3,2,4,5)

In [22]:
l1loss = L1Loss(reduction)
l1loss(input, target)

array([1.0519])

In [23]:
l1loss.backward(input, target)

As before, let's test our implementation against *PyTorch* and see if we've implemented things correctly or not.

### MAE/L1 Loss in PyTorch

In [24]:
criterion = nn.L1Loss(reduction='mean')
# `Variable` is deprecated; a plain tensor with requires_grad=True tracks gradients.
inp = torch.from_numpy(input).requires_grad_(True)
loss = criterion(inp, torch.from_numpy(target))
loss

tensor(1.0519, dtype=torch.float64, grad_fn=<L1LossBackward>)

In [25]:
loss.backward()

In [26]:
np.allclose(l1loss.grad_input, inp.grad.numpy())

True