
# Loss Function

In [33]:
!pip install mxnet



In [0]:
from mxnet.gluon import nn, loss as gloss
from mxnet import nd, autograd
from matplotlib import pyplot as plt
import mxnet as mx

## Loss functions for regression

Two common loss functions for regression tasks are:
1. $l_1$ loss, and
2. $l_2$ loss

### The $l_1$ loss function

The $l_1$ loss function calculates the mean absolute error between $label$ and $pred$.

> $L = \sum_{i}|label_{i} - pred_{i}|$

When using ```L1Loss``` in MXNet, we pass in two arguments, ```pred``` and ```label```, which must have the same size. The output of this loss function is a loss tensor of shape ```(batch_size,)```. Dimensions other than ```batch_axis``` are averaged out, so each entry is the mean absolute error of one sample.
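The averaging over non-batch dimensions can be sketched with NumPy. This is a minimal illustration, not MXNet's implementation: the function name `l1_loss_sketch` and the example arrays are hypothetical, and `batch_axis=0` is assumed as the default.

```python
import numpy as np

def l1_loss_sketch(pred, label, batch_axis=0):
    """Mean absolute error per sample: average over every axis except batch_axis."""
    pred = np.asarray(pred, dtype=np.float64)
    label = np.asarray(label, dtype=np.float64)
    # Axes to average over: all of them except the batch axis.
    reduce_axes = tuple(ax for ax in range(pred.ndim) if ax != batch_axis)
    return np.mean(np.abs(label - pred), axis=reduce_axes)

# Hypothetical batch of 2 samples, each a 3x4 grid of predictions.
pred = np.zeros((2, 3, 4))
label = np.ones((2, 3, 4))
label[1] *= 3.0  # the second sample is off by 3 everywhere

loss = l1_loss_sketch(pred, label)
print(loss.shape)  # (2,)
print(loss)        # [1. 3.]
```

Whatever the shape of a single sample, the result has one loss value per sample in the batch.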

In [0]:
l1loss = gloss.L1Loss()

In [0]:
import numpy as np
batch_size = 3

In [37]:
preds_array = [[1, 3, 2, 3], 
               [3, 1, 1, 3],
               [2, 3, 1, 2]]

labels_array = [[4, 4, 4, 4],
                [4, 4, 4, 4],
                [4, 4, 4, 4]]

preds = nd.array(preds_array)
labels = nd.array(labels_array)
print("Predictions:", preds)
print("Labels:", labels)

Predictions: 
[[1. 3. 2. 3.]
 [3. 1. 1. 3.]
 [2. 3. 1. 2.]]
<NDArray 3x4 @cpu(0)>
Labels: 
[[4. 4. 4. 4.]
 [4. 4. 4. 4.]
 [4. 4. 4. 4.]]
<NDArray 3x4 @cpu(0)>


In [38]:
output = l1loss(preds, labels)
output


[1.75 2.   2.  ]
<NDArray 3 @cpu(0)>

### Manual working out of $l_1$ loss function

Let's manually work out the example from above so that we have a solid understanding of the $l_1$ loss function and of how ```mxnet``` processes the input and generates the output.

Let's first have a look at the first row of the predictions and work out its $l_1$ loss. 

#### Row #1

<table>
    <tr>
        <td><strong>Preds</strong></td>
        <td>1</td>
        <td>3</td>
        <td>2</td>
        <td>3</td>
    </tr>
    <tr>
        <td><strong>Labels</strong></td>
        <td>4</td>
        <td>4</td>
        <td>4</td>
        <td>4</td>
        <td><strong>Total</strong></td>
        <td><strong>Average</strong></td>
    </tr>
    <tr>
        <td><strong>|Preds - Labels|</strong></td>
        <td>3</td>
        <td>1</td>
        <td>2</td>
        <td>1</td>
        <td>7</td>
        <td>1.75</td>
    </tr>
</table>

All of our labels have a value of 4. For each column in the table above, we find the absolute difference between our prediction and the true label. We then sum the absolute differences, which gives 7, and divide by 4 (the number of elements per row). The result, 1.75, equals the value produced by the function.

We can do the same working out for the remaining two rows.

#### Row #2

<table>
    <tr>
        <td><strong>Preds</strong></td>
        <td>3</td>
        <td>1</td>
        <td>1</td>
        <td>3</td>
    </tr>
    <tr>
        <td><strong>Labels</strong></td>
        <td>4</td>
        <td>4</td>
        <td>4</td>
        <td>4</td>
        <td><strong>Total</strong></td>
        <td><strong>Average</strong></td>
    </tr>
    <tr>
        <td><strong>|Preds - Labels|</strong></td>
        <td>1</td>
        <td>3</td>
        <td>3</td>
        <td>1</td>
        <td>8</td>
        <td>2</td>
    </tr>
</table>

#### Row #3

<table>
    <tr>
        <td><strong>Preds</strong></td>
        <td>2</td>
        <td>3</td>
        <td>1</td>
        <td>2</td>
    </tr>
    <tr>
        <td><strong>Labels</strong></td>
        <td>4</td>
        <td>4</td>
        <td>4</td>
        <td>4</td>
        <td><strong>Total</strong></td>
        <td><strong>Average</strong></td>
    </tr>
    <tr>
        <td><strong>|Preds - Labels|</strong></td>
        <td>2</td>
        <td>1</td>
        <td>3</td>
        <td>2</td>
        <td>8</td>
        <td>2</td>
    </tr>
</table>
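The three tables can be reproduced in a few lines of plain Python. This is a minimal sketch that does not call MXNet, so each step mirrors a table row. The helper name `l1_row` is hypothetical.

```python
preds = [[1, 3, 2, 3],
         [3, 1, 1, 3],
         [2, 3, 1, 2]]

labels = [[4, 4, 4, 4],
          [4, 4, 4, 4],
          [4, 4, 4, 4]]

def l1_row(pred_row, label_row):
    """Average absolute difference for one row (one sample)."""
    abs_diffs = [abs(label - pred) for pred, label in zip(pred_row, label_row)]
    return sum(abs_diffs) / len(abs_diffs)

l1_rows = [l1_row(p, l) for p, l in zip(preds, labels)]
print(l1_rows)  # [1.75, 2.0, 2.0]
```

These values match the output of ```l1loss(preds, labels)``` shown earlier.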


### The $l_2$ loss function

The $l_2$ loss function calculates half the mean squared error between $label$ and $pred$.

> $L = \frac{1}{2}\sum_{i}|label_{i} - pred_{i}|^{2}$

When using ```L2Loss``` in MXNet, we pass in two arguments, ```pred``` and ```label```, which must have the same size. The output of this loss function is a loss tensor of shape ```(batch_size,)```. Dimensions other than ```batch_axis``` are averaged out, so each entry is the average loss of one sample.

In [0]:
l2loss = gloss.L2Loss()

In [8]:
output = l2loss(preds, labels)
output


[1.875 2.5   2.25 ]
<NDArray 3 @cpu(0)>

### Manual working out of $l_2$ loss function

Again, let's manually work out the example from above so that we have a solid understanding of the $l_2$ loss function and of how ```mxnet``` processes the input and generates the output.

Let's first have a look at the first row of the predictions and work out its $l_2$ loss. 

#### Row #1

<table>
    <tr>
        <td><strong>Preds</strong></td>
        <td>1</td>
        <td>3</td>
        <td>2</td>
        <td>3</td>
    </tr>
    <tr>
        <td><strong>Labels</strong></td>
        <td>4</td>
        <td>4</td>
        <td>4</td>
        <td>4</td>
        <td><strong>Total</strong></td>
        <td><strong>Average</strong></td>
    </tr>
    <tr>
        <td><strong>|Preds - Labels|<sup>2</sup></strong></td>
        <td>9</td>
        <td>1</td>
        <td>4</td>
        <td>1</td>
        <td>15</td>
        <td>1.875</td>
    </tr>
</table>

We are using the same ```preds``` and ```labels``` as with $l_1$, so all of our labels have a value of 4. For each column in the table above, we find the square of the absolute difference between our prediction and the true label. We then sum the squared differences, which gives 15, divide by 2 (as per the equation), and divide again by 4 (the number of elements per row). The result, 1.875, equals the value produced by the function.
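Written out for the first row, and assuming the per-row loss is the $l_2$ equation divided by the 4 elements in the row, the arithmetic is:

> $L_{\text{row 1}} = \frac{1}{4} \cdot \frac{1}{2}\left(|4 - 1|^{2} + |4 - 3|^{2} + |4 - 2|^{2} + |4 - 3|^{2}\right) = \frac{1}{8}(9 + 1 + 4 + 1) = \frac{15}{8} = 1.875$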

We can do the same working out for the remaining two rows.

#### Row #2

<table>
    <tr>
        <td><strong>Preds</strong></td>
        <td>3</td>
        <td>1</td>
        <td>1</td>
        <td>3</td>
    </tr>
    <tr>
        <td><strong>Labels</strong></td>
        <td>4</td>
        <td>4</td>
        <td>4</td>
        <td>4</td>
        <td><strong>Total</strong></td>
        <td><strong>Average</strong></td>
    </tr>
    <tr>
        <td><strong>|Preds - Labels|<sup>2</sup></strong></td>
        <td>1</td>
        <td>9</td>
        <td>9</td>
        <td>1</td>
        <td>20</td>
        <td>2.5</td>
    </tr>
</table>

#### Row #3

<table>
    <tr>
        <td><strong>Preds</strong></td>
        <td>2</td>
        <td>3</td>
        <td>1</td>
        <td>2</td>
    </tr>
    <tr>
        <td><strong>Labels</strong></td>
        <td>4</td>
        <td>4</td>
        <td>4</td>
        <td>4</td>
        <td><strong>Total</strong></td>
        <td><strong>Average</strong></td>
    </tr>
    <tr>
        <td><strong>|Preds - Labels|<sup>2</sup></strong></td>
        <td>4</td>
        <td>1</td>
        <td>9</td>
        <td>4</td>
        <td>18</td>
        <td>2.25</td>
    </tr>
</table>
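As with $l_1$, the per-row totals and losses can be checked in plain Python. This is a minimal sketch that does not call MXNet. The helper name `l2_row` is hypothetical.

```python
preds = [[1, 3, 2, 3],
         [3, 1, 1, 3],
         [2, 3, 1, 2]]

labels = [[4, 4, 4, 4],
          [4, 4, 4, 4],
          [4, 4, 4, 4]]

def l2_row(pred_row, label_row):
    """Return (sum of squared differences, half of their mean) for one row."""
    squared = [(label - pred) ** 2 for pred, label in zip(pred_row, label_row)]
    total = sum(squared)
    # Divide by 2 (the 1/2 in the equation), then by the number of elements.
    return total, total / 2 / len(squared)

l2_results = [l2_row(p, l) for p, l in zip(preds, labels)]
for row_number, (total, loss) in enumerate(l2_results, start=1):
    print(f"Row #{row_number}: total = {total}, loss = {loss}")
# Row #1: total = 15, loss = 1.875
# Row #2: total = 20, loss = 2.5
# Row #3: total = 18, loss = 2.25
```

Because of the factor of $\frac{1}{2}$, each value is half of the ordinary mean squared error for that row, so doubling it recovers the mean squared error.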
