# What Are Loss Functions?
In neural networks, loss functions guide the optimization of the model. A loss function measures the penalty the model incurs on its predictions, such as how far a prediction deviates from the ground-truth label. Loss functions are usually differentiable across their domain (the gradient may be undefined at a few isolated points, such as x = 0 for the absolute value, which is generally ignored in practice). In the training loop, the loss is differentiated with respect to the model parameters through backpropagation, and those gradients drive the gradient-descent steps that optimize the model on the training set.
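To make the training-loop description concrete, here is a minimal sketch in plain NumPy. It assumes a toy one-parameter model `y_hat = w * x` and an MSE loss; the data, learning rate, and step count are illustrative choices, and the gradient is derived by hand rather than computed by backpropagation as a framework such as TensorFlow would do.

```python
import numpy as np

# Toy data generated by y = 3x; the model is y_hat = w * x with a single parameter w
x = np.array([1.0, 2.0, 3.0, 4.0])
y = 3.0 * x

w = 0.0              # initial parameter value
learning_rate = 0.01

for step in range(200):
    y_hat = w * x
    loss = np.mean((y_hat - y) ** 2)        # MSE loss for the current w
    grad = np.mean(2.0 * (y_hat - y) * x)   # dLoss/dw, derived by hand
    w -= learning_rate * grad               # gradient-descent update

print(round(w, 4))  # 3.0
```

Each step moves `w` against the gradient of the loss, so the loss shrinks and `w` approaches the true slope of 3.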

Loss functions also differ from metrics. While a loss function tells you how well your model is doing, its value might not be of direct interest or easy for humans to interpret. This is where metrics come in. Metrics such as accuracy are much easier for humans to understand, even though they are often poor choices for loss functions because they are not differentiable (accuracy, for example, changes in discrete jumps).

# Types 

1. Probabilistic losses

- BinaryCrossentropy class (Log Loss)
- CategoricalCrossentropy class
- SparseCategoricalCrossentropy class
- Poisson class
- KLDivergence class
- kl_divergence function 

2.  Regression losses

- MeanSquaredError class
- MeanAbsoluteError class
- MeanAbsolutePercentageError class
- MeanSquaredLogarithmicError class
- CosineSimilarity class
- Huber class
- LogCosh class

3. Hinge losses for "maximum-margin" classification

-  Hinge class
-  SquaredHinge class
-  CategoricalHinge class


```python
# Imports used throughout this notebook
import tensorflow as tf
import numpy as np
```

# Regression Losses

1. Mean Squared Error (MSE):

MSE measures how close the predictions are to the true values: it takes the difference between each prediction and its target, squares it, and averages the squares. Squaring removes the sign, so positive and negative errors cannot cancel out, and it also penalizes large errors more heavily than small ones.

```python
# Input Labels
y_true = [[10., 10.],
          [0., 0.]]
# Predicted Labels
y_pred = [[10., 10.],
          [1., 0.]]
# Mean Squared Error Loss
mse = tf.keras.losses.MeanSquaredError()
mse(y_true, y_pred).numpy()
```

0.25
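As a cross-check of the Keras result, the definition can be written directly in NumPy. This is a minimal sketch; `mse_np` is a hypothetical helper name. Keras averages over the last axis and then over the batch, which equals a plain mean over all elements when every row has the same length, as here.

```python
import numpy as np

def mse_np(y_true, y_pred):
    """Mean squared error over all elements (a from-scratch sketch)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    # Square each difference so positive and negative errors cannot cancel
    return float(np.mean((y_true - y_pred) ** 2))

y_true = [[10., 10.], [0., 0.]]
y_pred = [[10., 10.], [1., 0.]]
print(mse_np(y_true, y_pred))  # 0.25
```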

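```python
# Implement from scratch.
# Note: these two functions compute the Mean Absolute Error (next section), not MSE.
# Every error in this data is 0 or 1, so MAE and MSE happen to give the same value here.
def mae(y_predicted, y_true):
    # Flatten so the loop visits individual values rather than whole rows
    flat_pred = np.ravel(y_predicted)
    flat_true = np.ravel(y_true)
    total_error = 0.0
    for yp, yt in zip(flat_pred, flat_true):
        total_error += abs(yp - yt)
    return float(total_error / len(flat_pred))

def mae_np(y_predicted, y_true):
    # Vectorised NumPy version of the same computation
    y_predicted = np.array(y_predicted)
    y_true = np.array(y_true)
    return float(np.mean(np.abs(y_predicted - y_true)))


mae_np(y_pred, y_true), mae(y_pred, y_true)
```

(0.25, 0.25)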

2. Mean Absolute Error:

MAE is calculated by taking the absolute difference between each prediction and its target and averaging those differences. Because the errors are not squared, MAE is less sensitive to outliers than MSE, which makes it a reasonable choice when the data may contain outliers.



```python
# Input Labels
y_true = [[10., 20.],
          [30., 40.]]
# Predicted Labels
y_pred = [[10., 20.],
          [30., 0.]]

# Named mae_loss so it does not shadow the from-scratch mae() function above
mae_loss = tf.keras.losses.MeanAbsoluteError()
mae_loss(y_true, y_pred).numpy()
```

10.0
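To see the difference in outlier sensitivity, the same data can be scored with both losses. This NumPy sketch reuses the arrays from the cell above; the single 40-unit error dominates MSE far more than MAE.

```python
import numpy as np

y_true = np.array([[10., 20.], [30., 40.]])
y_pred = np.array([[10., 20.], [30., 0.]])

errors = y_true - y_pred               # only one element is wrong, by 40
mae_value = np.mean(np.abs(errors))    # 40 / 4 = 10.0
mse_value = np.mean(errors ** 2)       # 40**2 / 4 = 400.0

print(mae_value, mse_value)  # 10.0 400.0
```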

3. Cosine Similarity Loss:

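Cosine similarity is a measure of similarity between two non-zero vectors: the cosine of the angle between them. This loss function computes the cosine similarity between labels and predictions along the given axis.

Cosine similarity itself is a number between -1 and 1. Keras's `CosineSimilarity` loss returns its negative, so that minimizing the loss maximizes similarity: the loss is -1 when the vectors point in the same direction, 0 when they are orthogonal, and 1 when they point in opposite directions.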


```python
# Input Labels
y_true = [[10., 20.],
          [30., 40.]]
# Predicted Labels
y_pred = [[10., 20.],
          [30., 40.]]

cosine_loss = tf.keras.losses.CosineSimilarity(axis=1)
cosine_loss(y_true, y_pred).numpy()
```

-1.0
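A NumPy sketch of the same convention shows the three reference cases. `cosine_similarity_loss` is a hypothetical helper that L2-normalizes each vector, takes the dot product, negates it, and averages over samples; the small floor on the norm is an assumption added to avoid division by zero.

```python
import numpy as np

def cosine_similarity_loss(y_true, y_pred, axis=-1):
    """Negative cosine similarity, averaged over samples (a sketch of the Keras convention)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    # L2-normalise each vector along `axis`; the floor avoids dividing by zero
    t = y_true / np.maximum(np.linalg.norm(y_true, axis=axis, keepdims=True), 1e-12)
    p = y_pred / np.maximum(np.linalg.norm(y_pred, axis=axis, keepdims=True), 1e-12)
    # Negate so that minimising the loss maximises similarity
    return float(np.mean(-np.sum(t * p, axis=axis)))

same = cosine_similarity_loss([[1., 0.]], [[2., 0.]])         # same direction: -1
orthogonal = cosine_similarity_loss([[1., 0.]], [[0., 3.]])   # orthogonal: 0
opposite = cosine_similarity_loss([[1., 0.]], [[-1., 0.]])    # opposite: 1
print(same, orthogonal, opposite)
```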

4. Huber Loss:

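The Huber loss function is quadratic for small errors and linear for large errors. For each element, let x = y_true - y_pred be the error and d the threshold (the `delta` argument, 1.0 by default in Keras):

$$
L_d(x) =
\begin{cases}
\frac{1}{2}x^2 & \text{if } |x| \le d \\
\frac{1}{2}d^2 + d\,(|x| - d) & \text{if } |x| > d
\end{cases}
$$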

```python
# Input Labels
y_true = [[10., 20.],
          [30., 40.]]
# Predicted Labels
y_pred = [[-10., 20.],
          [30., 0.]]

hub_loss = tf.keras.losses.Huber()
hub_loss(y_true, y_pred).numpy()
```

14.75
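The piecewise formula translates directly into NumPy. This sketch (`huber_np` is a hypothetical name) averages over the last axis and then over the batch, matching the Keras reduction for this data. The errors are 20, 0, 0 and 40, giving per-element losses of 19.5, 0, 0 and 39.5.

```python
import numpy as np

def huber_np(y_true, y_pred, delta=1.0):
    """Huber loss, mean over the last axis then over the batch (a sketch)."""
    x = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    abs_x = np.abs(x)
    quadratic = 0.5 * x ** 2                              # used where |x| <= delta
    linear = 0.5 * delta ** 2 + delta * (abs_x - delta)   # used where |x| > delta
    per_element = np.where(abs_x <= delta, quadratic, linear)
    return float(np.mean(np.mean(per_element, axis=-1)))

y_true = [[10., 20.], [30., 40.]]
y_pred = [[-10., 20.], [30., 0.]]
print(huber_np(y_true, y_pred))  # 14.75
```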

5. LogCosh Loss:

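The LogCosh loss computes the log of the hyperbolic cosine of the prediction error. It behaves like x²/2 for small errors and like |x| - log(2) for large ones, so, like Huber, it is less affected by outliers than MSE.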

```python
# Input Labels
y_true = [[0., 1.], [0., 1.]]
# Predicted Labels
y_pred = [[0., 1.], [1., 1.]]
logcosh_loss = tf.keras.losses.LogCosh()
logcosh_loss(y_true, y_pred).numpy()
```


0.1084452
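A NumPy sketch reproduces this value. Instead of calling `np.cosh` directly, it uses the algebraically equivalent form log(cosh(x)) = |x| + log1p(exp(-2|x|)) - log(2), an implementation choice made here to avoid overflow for large errors; `logcosh_np` is a hypothetical name.

```python
import numpy as np

def logcosh_np(y_true, y_pred):
    """log(cosh(error)), mean over the last axis then the batch (a sketch)."""
    x = np.asarray(y_pred, dtype=float) - np.asarray(y_true, dtype=float)
    # Overflow-safe form of log(cosh(x))
    ax = np.abs(x)
    per_element = ax + np.log1p(np.exp(-2.0 * ax)) - np.log(2.0)
    return float(np.mean(np.mean(per_element, axis=-1)))

y_true = [[0., 1.], [0., 1.]]
y_pred = [[0., 1.], [1., 1.]]
print(round(logcosh_np(y_true, y_pred), 7))  # 0.1084452
```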

# Probabilistic Loss Functions:


1. Binary Cross-Entropy Loss:

Binary cross-entropy computes the cross-entropy between the true labels and the predicted probabilities. It is used for two-class problems, such as cat-versus-dog classification, where each label is 1 or 0.

```python
y_true = [[0., 1.], [0., 0.]]
y_pred = [[0.5, 0.4], [0.6, 0.3]]
binary_cross_entropy = tf.keras.losses.BinaryCrossentropy()
binary_cross_entropy(y_true=y_true, y_pred=y_pred).numpy()
```


0.7206007

```python
# Building it from scratch
# loss = -mean(yt * log(yp) + (1 - yt) * log(1 - yp))
# log(0) = -inf, so a prediction of exactly 0 (or exactly 1, via log(1 - 1))
# would blow up; we therefore clip predictions into [epsilon, 1 - epsilon].

def log_loss(y_true, y_predicted):
    epsilon = 1e-15
    # 0 -> 1e-15 and 1 -> 1 - 1e-15; values already inside the range are unchanged
    y_predicted_new = np.clip(np.array(y_predicted, dtype=float), epsilon, 1 - epsilon)
    y_true = np.array(y_true)

    return -np.mean(y_true * np.log(y_predicted_new) + (1 - y_true) * np.log(1 - y_predicted_new))

y_true = [0, 1, 0, 0]
y_pred = [0.5, 0.4, 0.6, 0.3]

log_loss(y_true, y_pred)
```


0.7206008970617469

2. Categorical Crossentropy Loss:

The categorical crossentropy loss function computes the loss between true labels and predicted probabilities. It is mainly used for multiclass classification, with the labels provided in one-hot encoded form.

```python
# Input Labels (one-hot)
y_true = [[0, 1, 0],
          [0, 0, 1]]
# Predicted Labels
y_pred = [[0.05, 0.95, 0.56],
          [0.1, 0.4, 0.1]]

categorical_cross_entropy = tf.keras.losses.CategoricalCrossentropy()
categorical_cross_entropy(y_true=y_true, y_pred=y_pred).numpy()
```

1.1438693
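The predicted rows above do not sum to 1. For probability inputs, Keras first rescales each row to sum to 1 and clips the result before taking the log, and the sketch below mimics those steps in NumPy (`categorical_crossentropy_np` is a hypothetical name; the `eps=1e-7` clipping value is an assumption matching Keras's default fuzz factor).

```python
import numpy as np

def categorical_crossentropy_np(y_true, y_pred, eps=1e-7):
    """Categorical cross-entropy for one-hot labels and probability predictions (a sketch)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    # Rescale each row so the predicted probabilities sum to 1
    y_pred = y_pred / np.sum(y_pred, axis=-1, keepdims=True)
    # Clip to keep log() finite
    y_pred = np.clip(y_pred, eps, 1.0 - eps)
    per_sample = -np.sum(y_true * np.log(y_pred), axis=-1)
    return float(np.mean(per_sample))

y_true = [[0, 1, 0], [0, 0, 1]]
y_pred = [[0.05, 0.95, 0.56], [0.1, 0.4, 0.1]]
print(round(categorical_crossentropy_np(y_true, y_pred), 5))  # 1.14387
```

Without the rescaling step the result would instead be the mean of -log(0.95) and -log(0.1), roughly 1.177, which does not match the Keras output.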

3. Sparse Categorical Crossentropy Loss:

Sparse categorical crossentropy is used, like categorical crossentropy, when there are two or more classes in the classification task. The one difference is that sparse categorical crossentropy expects the labels as integer class indices rather than one-hot vectors.

Instead of using sparse categorical crossentropy, we could one-hot encode the labels and treat the problem as categorical crossentropy; both give the same loss.


```python
# Input Labels (integer class indices)
y_true = [1, 2]
# Predicted Labels
y_pred = [[0.05, 0.95, 0],
          [0.1, 0.8, 0.1]]
# Implementation of Sparse Categorical Crossentropy
tf.keras.losses.sparse_categorical_crossentropy(y_true, y_pred).numpy()
```



array([0.05129344, 2.3025851 ], dtype=float32)
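The equivalence with one-hot labels can be checked in NumPy. In this sketch the sparse version simply picks out the predicted probability of each integer label, while the dense version builds one-hot rows with `np.eye`; the helper name and the `1e-7` clipping value are assumptions. The rows of `y_pred` already sum to 1, so no rescaling step is needed here.

```python
import numpy as np

def sparse_categorical_crossentropy_np(y_true, y_pred, eps=1e-7):
    """Per-sample sparse CE: -log of the predicted probability of the integer label (a sketch)."""
    labels = np.asarray(y_true, dtype=int)
    probs = np.clip(np.asarray(y_pred, dtype=float), eps, 1.0 - eps)
    return -np.log(probs[np.arange(len(labels)), labels])

y_true = [1, 2]
y_pred = [[0.05, 0.95, 0.0],
          [0.1, 0.8, 0.1]]

sparse = sparse_categorical_crossentropy_np(y_true, y_pred)

# Same loss via one-hot labels and the categorical formula
one_hot = np.eye(3)[y_true]
dense = -np.sum(one_hot * np.log(np.clip(y_pred, 1e-7, 1 - 1e-7)), axis=-1)
print(sparse, dense)
```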

4. Poisson Loss:

The Poisson loss is the mean, over the elements of the tensor, of

* y_pred - y_true * log(y_pred)

(Keras adds a small epsilon inside the log to avoid log(0).)

```python
# Input Labels
y_true = [[0., 1.],
          [0., 0.]]
# Predicted Labels
y_pred = [[1., 1.],
          [1., 0.]]

# Using the default reduction ('auto' / 'sum_over_batch_size')
p = tf.keras.losses.Poisson()
p(y_true, y_pred).numpy()
```

0.75
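The formula above in NumPy, as a sketch (`poisson_np` is a hypothetical name, and `eps=1e-7` is assumed to match Keras's epsilon). The per-element losses here are 1, 1, 1 and 0 (up to the tiny epsilon term), so the per-sample means are 1 and 0.5.

```python
import numpy as np

def poisson_np(y_true, y_pred, eps=1e-7):
    """Mean of y_pred - y_true * log(y_pred + eps) over the last axis, then the batch (a sketch)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    per_element = y_pred - y_true * np.log(y_pred + eps)
    return float(np.mean(np.mean(per_element, axis=-1)))

y_true = [[0., 1.], [0., 0.]]
y_pred = [[1., 1.], [1., 0.]]
print(round(poisson_np(y_true, y_pred), 6))  # 0.75
```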

5. Kullback-Leibler Divergence Loss:

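Also called KL divergence, it measures how a predicted probability distribution Q differs from a reference distribution P (the true labels). It is the sum, over every event x, of the probability P(x) multiplied by the log of the ratio P(x) / Q(x):

$$
D_{KL}(P \parallel Q) = -\sum_{x \in X} P(x) \log\frac{Q(x)}{P(x)} = \sum_{x \in X} P(x) \log\frac{P(x)}{Q(x)}
$$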

```python
# Input Labels
y_true = [[0, 1], [0, 0]]
# Predicted Labels
y_pred = [[0.7, 0.8], [0.4, 0.8]]
# KL divergence loss
kl = tf.keras.losses.KLDivergence()
kl(y_true, y_pred).numpy()
```

0.11156943
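A NumPy sketch of the formula, assuming (as Keras does) that both inputs are clipped to [1e-7, 1] so that zero probabilities do not produce log(0); note that the terms are summed over the last axis rather than averaged. `kl_divergence_np` is a hypothetical name.

```python
import numpy as np

def kl_divergence_np(y_true, y_pred, eps=1e-7):
    """sum(P * log(P / Q)) over the last axis, mean over the batch (a sketch)."""
    p = np.clip(np.asarray(y_true, dtype=float), eps, 1.0)
    q = np.clip(np.asarray(y_pred, dtype=float), eps, 1.0)
    per_sample = np.sum(p * np.log(p / q), axis=-1)
    return float(np.mean(per_sample))

y_true = [[0, 1], [0, 0]]
y_pred = [[0.7, 0.8], [0.4, 0.8]]
print(round(kl_divergence_np(y_true, y_pred), 6))  # 0.111569
```

Almost all of the loss comes from the first sample's second entry, log(1 / 0.8) ≈ 0.223, which is then averaged over the two samples.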

# Hinge Losses for ‘Maximum-Margin’ Classification:

1. Hinge Loss:

Hinge loss is mainly used for maximum-margin classification, most notably in support vector machines. The labels are expected to be -1 or 1; if binary labels (0 or 1) are provided, they are converted to -1 and 1. The loss is mean(max(1 - y_true * y_pred, 0)).

```python
y_true = [[0., 1.], [0., 0.]]
y_pred = [[0.5, 0.4], [0.4, 0.5]]
h = tf.keras.losses.Hinge()
h(y_true, y_pred).numpy()
```

1.25
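The hinge computation, including the 0/1 to -1/1 label conversion, can be sketched in NumPy (`hinge_np` is a hypothetical name). After conversion the labels are [[-1, 1], [-1, -1]], the per-element hinge terms are [[1.5, 0.6], [1.4, 1.5]], and their average is 1.25.

```python
import numpy as np

def hinge_np(y_true, y_pred):
    """mean(max(1 - y_true * y_pred, 0)) over the last axis, then the batch (a sketch)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    # Map {0, 1} labels to {-1, 1}; labels already in {-1, 1} are left as they are
    if np.all(np.isin(y_true, [0.0, 1.0])):
        y_true = 2.0 * y_true - 1.0
    per_element = np.maximum(1.0 - y_true * y_pred, 0.0)
    return float(np.mean(np.mean(per_element, axis=-1)))

y_true = [[0., 1.], [0., 0.]]
y_pred = [[0.5, 0.4], [0.4, 0.5]]
print(round(hinge_np(y_true, y_pred), 6))  # 1.25
```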

2. Squared Hinge Loss:
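* The squared hinge loss squares each element's hinge term before averaging: mean(max(1 - y_true * y_pred, 0)^2). This is not the same as squaring the hinge loss itself.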


```python
y_true = [[0., 1.], [0., 0.]]
y_pred = [[0.5, 0.4], [0.4, 0.5]]
h = tf.keras.losses.SquaredHinge()
h(y_true, y_pred).numpy()
```

1.705
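To illustrate the difference between squaring each term and squaring the final average, this NumPy sketch computes both on the same data, with the 0/1 labels converted to -1/1 as in the hinge example.

```python
import numpy as np

y_true = 2.0 * np.array([[0., 1.], [0., 0.]]) - 1.0   # {0, 1} labels mapped to {-1, 1}
y_pred = np.array([[0.5, 0.4], [0.4, 0.5]])

hinge_terms = np.maximum(1.0 - y_true * y_pred, 0.0)
squared_hinge = np.mean(np.mean(hinge_terms ** 2, axis=-1))    # squares each term first
square_of_hinge = np.mean(np.mean(hinge_terms, axis=-1)) ** 2  # squares the final average
print(round(squared_hinge, 6), round(square_of_hinge, 6))  # 1.705 1.5625
```

Only the first value matches the Keras `SquaredHinge` output.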

3. Categorical Hinge Loss:
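* It computes the categorical hinge loss between y_true and y_pred. With pos = sum(y_true * y_pred) (the score of the true class) and neg = max((1 - y_true) * y_pred) (the highest score among the other classes), the loss is max(neg - pos + 1, 0).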


```python
y_true = [[0., 1.], [0., 0.]]
y_pred = [[0.5, 0.4], [0.4, 0.5]]
h = tf.keras.losses.CategoricalHinge()
h(y_true, y_pred).numpy()
```

1.3
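The pos/neg formulation can be sketched in NumPy (`categorical_hinge_np` is a hypothetical name). For the first sample pos = 0.4 and neg = 0.5, giving 1.1. For the second (all-zero labels) pos = 0 and neg = 0.5, giving 1.5. The average is 1.3.

```python
import numpy as np

def categorical_hinge_np(y_true, y_pred):
    """max(neg - pos + 1, 0) per sample, mean over the batch (a sketch)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    pos = np.sum(y_true * y_pred, axis=-1)          # score of the true class
    neg = np.max((1.0 - y_true) * y_pred, axis=-1)  # best score among the other classes
    return float(np.mean(np.maximum(neg - pos + 1.0, 0.0)))

y_true = [[0., 1.], [0., 0.]]
y_pred = [[0.5, 0.4], [0.4, 0.5]]
print(round(categorical_hinge_np(y_true, y_pred), 6))  # 1.3
```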