# Losses

- Regression losses
- Classification losses

## Environment setup

In [1]:
import platform
import sys

print(f"Python version: {platform.python_version()}")
# Compare integers: as strings, minor version "10" would sort before "6"
assert sys.version_info >= (3, 6)

import numpy as np

Python version: 3.7.5


In [2]:
import sklearn

print(f"scikit-learn version: {sklearn.__version__}")
# Compare (major, minor) as integers rather than as strings
assert tuple(int(part) for part in sklearn.__version__.split(".")[:2]) >= (0, 22)

from sklearn.metrics import log_loss

scikit-learn version: 0.22.1


## Regression losses

### Mean Absolute Error

Based on the *l1* (aka *Manhattan*) norm of the residuals.

$$\mathrm{MAE}(\boldsymbol{\pmb{\theta}}) = \frac{1}{m}\sum_{i=1}^m \left|h_\theta(\mathbf{x}^{(i)}) - y^{(i)}\right| = \frac{1}{m}{\lVert{h_\theta(\pmb{X}) - \pmb{y}}\rVert}_1$$
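As a minimal sketch of the formula above, the MAE can be computed with NumPy either as the mean of the absolute residuals or through the l1 norm. The arrays `y_true` and `y_pred` are made-up values chosen only for illustration; they are not part of the original notebook.

```python
import numpy as np

# Hypothetical targets and predictions, for illustration only
y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_pred = np.array([2.5, 0.0, 2.0, 8.0])

# Mean of the absolute residuals
mae = np.mean(np.abs(y_pred - y_true))

# Same quantity via the l1 norm divided by the number of samples m
mae_from_norm = np.linalg.norm(y_pred - y_true, ord=1) / len(y_true)

print(mae, mae_from_norm)
```

Both expressions should print the same value, since the l1 norm is the sum of absolute residuals.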

### Mean Squared Error

More sensitive to outliers than MAE, because each error is squared.

$$\mathrm{MSE}(\boldsymbol{\pmb{\theta}}) = \frac{1}{m}\sum_{i=1}^m \left(h_\theta(\mathbf{x}^{(i)}) - y^{(i)}\right)^2 = \frac{1}{m}{{\lVert{h_\theta(\pmb{X}) - \pmb{y}}\rVert}_2}^2$$
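The effect of squaring can be seen on a small example. The sketch below uses made-up values (not from the original notebook): it computes the MSE for a set of predictions, then replaces one prediction with a large miss and compares how much the MSE and the MAE each grow.

```python
import numpy as np

# Hypothetical targets and predictions, for illustration only
y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_pred = np.array([2.5, 0.0, 2.0, 8.0])

# Mean of the squared residuals
mse = np.mean((y_pred - y_true) ** 2)

# Same quantity via the squared l2 norm divided by m
mse_from_norm = np.linalg.norm(y_pred - y_true, ord=2) ** 2 / len(y_true)

# Same predictions, except the last one now misses its target by 10
y_outlier = np.array([2.5, 0.0, 2.0, 17.0])
mse_outlier = np.mean((y_outlier - y_true) ** 2)
mae_outlier = np.mean(np.abs(y_outlier - y_true))

print(mse, mse_from_norm)
print(mse_outlier, mae_outlier)
```

With these values, the single large residual dominates the MSE far more than it does the MAE.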

### Root Mean Squared Error

Based on the *l2* (aka *Euclidean*) norm of the residuals. The default choice.

$$\mathrm{RMSE}(\boldsymbol{\pmb{\theta}}) = \sqrt{\frac{1}{m}\sum_{i=1}^m \left(h_\theta(\mathbf{x}^{(i)}) - y^{(i)}\right)^2} = \frac{1}{\sqrt{m}}{\lVert{h_\theta(\pmb{X}) - \pmb{y}}\rVert}_2$$
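A short NumPy sketch of the RMSE, using made-up values that are not part of the original notebook. It also checks numerically that the norm form requires dividing the l2 norm by $\sqrt{m}$.

```python
import numpy as np

# Hypothetical targets and predictions, for illustration only
y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_pred = np.array([2.5, 0.0, 2.0, 8.0])
m = len(y_true)

# Square root of the mean squared residual
rmse = np.sqrt(np.mean((y_pred - y_true) ** 2))

# Same quantity via the l2 norm divided by sqrt(m)
rmse_from_norm = np.linalg.norm(y_pred - y_true, ord=2) / np.sqrt(m)

print(rmse, rmse_from_norm)
```

Unlike the MSE, the RMSE is expressed in the same unit as the target.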

## Classification losses

### Binary Crossentropy

Aka *logistic loss* or *negative log likelihood*. Here $y'^{(i)}$ denotes the predicted probability that sample $i$ belongs to the positive class.

$$\mathcal{L}(\boldsymbol{\pmb{\theta}}) = -\frac{1}{m}\sum_{i=1}^m \left(y^{(i)} \log_e(y'^{(i)}) + (1-y^{(i)}) \log_e(1-y'^{(i)})\right)$$

In [5]:
# Define expected results
y_true = [0, 0, 1, 1]

# Each row is [P(label 0), P(label 1)]: [0.9, 0.1] gives the first sample a 90% probability of label 0 and 10% of label 1
y_pred = [[0.9, 0.1], [0.8, 0.2], [0.3, 0.7], [0.01, 0.99]]
# -(ln(0.9) + ln(0.8) + ln(0.7) + ln(0.99))/4
print(log_loss(y_true, y_pred))

# Perfect prediction
y_pred = [[1, 0], [1, 0], [0, 1], [0, 1]]
print(log_loss(y_true, y_pred))

# Awful prediction
y_pred = [[0.1, 0.9], [0.15, 0.85], [0.83, 0.17], [0.95, 0.05]]
print(log_loss(y_true, y_pred))

0.1738073366910675
9.992007221626415e-16
2.241848548341448


### Categorical Crossentropy

$$\mathcal{L}(\boldsymbol{\pmb{\theta}}) = -\frac{1}{m}\sum_{i=1}^m\sum_{k=1}^K y^{(i)}_k \log_e(y'^{(i)}_k)$$

Equivalent to _Binary Crossentropy_ when $K = 2$.
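The equivalence for $K = 2$ can be checked numerically. The sketch below is a hand-written NumPy version of both formulas (it is not scikit-learn's implementation), reusing the first set of labels and probabilities from the binary crossentropy example; the variable names are chosen here for illustration.

```python
import numpy as np

# Labels and predicted probabilities, one row per sample: [P(label 0), P(label 1)]
y_true = np.array([0, 0, 1, 1])
y_prob = np.array([[0.9, 0.1], [0.8, 0.2], [0.3, 0.7], [0.01, 0.99]])

# One-hot encode the labels: row i has a 1 in column y_true[i]
K = y_prob.shape[1]
y_onehot = np.eye(K)[y_true]

# Categorical crossentropy: only the log-probability of the true class contributes
cce = -np.mean(np.sum(y_onehot * np.log(y_prob), axis=1))

# Binary crossentropy on the probability of the positive class
p1 = y_prob[:, 1]
bce = -np.mean(y_true * np.log(p1) + (1 - y_true) * np.log(1 - p1))

print(cce, bce)
```

Both values should agree with the first `log_loss` result printed earlier, up to floating-point rounding. Note that this sketch does not clip probabilities, so an exact 0 in `y_prob` would produce infinite or undefined values.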