# Custom loss function

Keras allows us to use custom objects; one example is the ability to implement custom loss functions.

In [None]:
import tensorflow as tf
import numpy as np
from tensorflow import keras
from tensorflow.keras.layers import Dense, Input
from tensorflow.keras.models import Model
import matplotlib.pyplot as plt

## Huber loss function

Huber loss can be used for regression tasks, just like MSE.

It is less sensitive to outliers in the data because it is quadratic for small errors and linear for large ones.

https://en.wikipedia.org/wiki/Huber_loss

$L_\delta(a) = \begin{cases} \frac{1}{2}a^2, & \text{for } |a| \le \delta \\
\delta\left(|a| - \frac{1}{2}\delta\right), & \text{otherwise} \end{cases}$

It is important to choose a good value for the threshold $\delta$, because it decides which errors are treated quadratically and which linearly.
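
The robustness to outliers is easiest to see in the derivative of the loss with respect to the error $a$. Differentiating the two branches of the formula above gives:

$$\frac{dL_\delta}{da} = \begin{cases} a, & \text{for } |a| \le \delta \\ \delta \, \mathrm{sign}(a), & \text{otherwise} \end{cases}$$

So the gradient magnitude never exceeds $\delta$, whereas for the squared error $\frac{1}{2}a^2$ it keeps growing linearly with $|a|$.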

In [None]:
def plain_huber_loss(error, delta):
    # is an absolute value of an error smaller than threshold?
    is_small_error = np.abs(error) <= delta
    # quadratic branch for smaller errors
    small_error_loss = 0.5 * np.square(error)
    # linear branch for greater errors
    big_error_loss = delta * (np.abs(error) - (0.5*delta))
    # np.where(condition, value_if_true, value_if_false) picks the branch element-wise
    return np.where(is_small_error, small_error_loss, big_error_loss)
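
As a quick sanity check, here is a small sketch that compares the Huber loss with the plain half squared error on a few hand-picked errors. The helper `huber` is a hypothetical compact restatement of the function above (so the snippet runs on its own), and the error values and `delta=1.0` are chosen only for illustration.

```python
import numpy as np

def huber(error, delta):
    # hypothetical compact restatement of the Huber formula, for illustration only
    abs_error = np.abs(error)
    return np.where(abs_error <= delta,
                    0.5 * np.square(error),
                    delta * (abs_error - 0.5 * delta))

errors = np.array([0.5, 1.0, 3.0, 10.0])
huber_values = huber(errors, delta=1.0)
half_squared_values = 0.5 * np.square(errors)

print(huber_values)         # values: 0.125, 0.5, 2.5, 9.5
print(half_squared_values)  # values: 0.125, 0.5, 4.5, 50.0
```

Both losses agree up to the threshold; beyond it the Huber values grow much more slowly, which is what limits the influence of outliers.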

Exploring results with different delta thresholds.

In [None]:
# create an array of values in the half-open interval [-4, 4) with step 0.1
x = np.arange(-4, 4, 0.1)

# plot loss values
plt.title('Huber loss')
plt.plot(x, plain_huber_loss(x, 0.1), label='δ = 0.1')
plt.plot(x, plain_huber_loss(x, 0.5), label='δ = 0.5')
plt.plot(x, plain_huber_loss(x, 1), label='δ = 1')
plt.legend()
plt.grid()
plt.show()

## Testing data

To keep the example as simple as possible, I will create an array of numbers from -1 to 9 as **x** and define **y** by the relation $y = 2x-1$. So for `x=20` we should get `y=39`.

In [None]:
# inputs - arange takes start, stop, step; stop is exclusive, so this yields -1, 0, ..., 9
x = np.arange(-1, 10, 1).astype('float32')
print(f'x: {x}')
# label formula
y = x * 2 - 1
print(f'y: {y}')
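
Because the data follow an exact line, the target weights can be cross-checked without any neural network. The sketch below uses an ordinary least-squares fit with `np.polyfit` as a reference; this is a swapped-in technique for comparison only, not part of the Keras workflow in this notebook.

```python
import numpy as np

x = np.arange(-1, 10, 1).astype('float32')
y = x * 2 - 1

# degree-1 least-squares fit; coefficients are returned from highest degree down
slope, intercept = np.polyfit(x, y, 1)
prediction = slope * 20.0 + intercept
print(slope, intercept, prediction)  # approximately 2, -1 and 39
```

A well-trained model should end up with a weight close to 2 and a bias close to -1.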

## Model architecture

The relation is just a linear function, so a neural network with a single fully connected unit and linear activation is enough.

In [None]:
input_layer = Input(shape=(1,))
output_layer = Dense(1)(input_layer)

## Using a built-in loss function

To start, we set up the model with the mean squared error loss function and the stochastic gradient descent optimizer, and train it for 500 epochs.

In [None]:
%%time
model_mse_loss = Model(inputs=input_layer, outputs=output_layer)
model_mse_loss.summary()
model_mse_loss.compile(optimizer='sgd', loss='mse')
history = model_mse_loss.fit(x, y, epochs=500, verbose=0)
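
To make the training step less of a black box, here is a minimal NumPy sketch of gradient descent on MSE for the same one-unit linear model. It assumes a learning rate of 0.01 (the Keras default for `'sgd'`) and full-batch updates (the 11 samples fit into one default batch of 32). The zero initialization of `w` and `b` is my assumption for reproducibility; Keras initializes the weight randomly, so its numbers will differ slightly.

```python
import numpy as np

x = np.arange(-1, 10, 1).astype('float32')
y = x * 2 - 1

# assumptions: zero initial weight and bias, learning rate 0.01, full-batch updates
w, b = 0.0, 0.0
learning_rate = 0.01

for epoch in range(500):
    error = (w * x + b) - y
    # gradients of mean(error ** 2) with respect to w and b
    grad_w = 2.0 * np.mean(error * x)
    grad_b = 2.0 * np.mean(error)
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

mse = np.mean(((w * x + b) - y) ** 2)
print(w, b, mse)  # w approaches 2 and b approaches -1
```

After 500 steps the bias is still visibly short of -1, which is why the prediction for `x=20` tends to land near, but not exactly at, 39.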

Plotting the loss during training.

In [None]:
plt.title('MSE loss')
plt.plot(history.history['loss'], label='MSE loss')
plt.legend()
plt.ylim([0, 2])
plt.grid()
plt.show()

Predicting test value.

In [None]:
# pass a 2D array of shape (samples, features) to match Input(shape=(1,))
model_mse_loss.predict(np.array([[20.0]]))

Get coefficients for **weight** and **bias**.

In [None]:
model_mse_loss.get_weights()

## Define custom Huber loss

For the first custom loss, we will hard-code the threshold.

We will use TensorFlow functions in these cases because we are working with TF tensors. I will talk more about them later; for now it is enough to know that they are roughly equivalent to the corresponding NumPy operations.
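
To illustrate that correspondence, the following sketch applies the same element-wise expression once with NumPy and once with TensorFlow. The input values and the threshold of 1.0 are arbitrary and chosen only for the demonstration.

```python
import numpy as np
import tensorflow as tf

values = np.array([-2.0, -0.5, 0.0, 1.5], dtype='float32')
tensor = tf.constant(values)

# element-wise operations with matching names and semantics in both libraries
np_result = np.where(np.abs(values) <= 1.0, np.square(values), np.abs(values))
tf_result = tf.where(tf.abs(tensor) <= 1.0, tf.square(tensor), tf.abs(tensor))

# an eager tensor converts back to a NumPy array with .numpy()
print(np.allclose(np_result, tf_result.numpy()))  # True
```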

In [None]:
def huber_loss(y_true, y_pred):
    # hardcoded threshold value
    threshold = 1
    # get an error between label and prediction
    error = y_true - y_pred
    # is the error smaller than threshold delta?
    is_small_error = tf.abs(error) <= threshold
    # quadratic values for smaller errors
    small_error_loss = 0.5 * tf.square(error)
    # linear part for bigger than threshold
    big_error_loss = threshold * (tf.abs(error) - (0.5*threshold))
    # if it is a small error, return small error loss, big error loss otherwise
    return tf.where(is_small_error, small_error_loss, big_error_loss)

Model creation.

In [None]:
%%time
# define the layers again so the new model starts from fresh weights instead of reusing those trained with MSE
input_layer = Input(shape=(1,))
output_layer = Dense(1)(input_layer)
model_huber_loss = Model(inputs=input_layer, outputs=output_layer)
# pass the function object itself; a string name is resolved among Keras's own registered losses, not this function
model_huber_loss.compile(optimizer='sgd', loss=huber_loss)
history = model_huber_loss.fit(x, y, epochs=500, verbose=0)

In [None]:
plt.title('Huber loss with hard coded threshold')
plt.plot(history.history['loss'], label='Huber loss')
plt.legend()
plt.grid()
plt.show()

Predicting test value.

In [None]:
model_huber_loss.predict(np.array([[20.0]]))

Get coefficients for **weight** and **bias**.

In [None]:
model_huber_loss.get_weights()

## Hyperparameter for Huber loss

As we saw, an arbitrarily chosen value can produce a worse result than standard MSE. One thing we can do is let the user configure delta as a hyperparameter (a parameter of the network that is set by the user, not learned during training).

In [None]:
# wrapper function that accepts the hyperparameter
def my_huber_loss_with_threshold(threshold):
  
    # the same error loss as before in closure
    def my_huber_loss(y_true, y_pred):
        error = y_true - y_pred
        # using threshold value from wrapper
        is_small_error = tf.abs(error) <= threshold
        small_error_loss = 0.5 * tf.square(error)
        big_error_loss = threshold * (tf.abs(error) - (0.5*threshold))        
        return tf.where(is_small_error, small_error_loss, big_error_loss) 

    # return the inner function with set hyperparameter
    return my_huber_loss
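
One way to gain confidence in such a closure is to compare it with the built-in `tf.keras.losses.Huber`, whose `delta` argument plays the same role as our threshold. The sketch below uses `make_huber`, a hypothetical compact restatement of the wrapper above so that the snippet runs on its own; the sample values are arbitrary.

```python
import tensorflow as tf

def make_huber(threshold):
    # hypothetical compact restatement of the closure above, for illustration only
    def loss(y_true, y_pred):
        error = y_true - y_pred
        abs_error = tf.abs(error)
        return tf.where(abs_error <= threshold,
                        0.5 * tf.square(error),
                        threshold * (abs_error - 0.5 * threshold))
    return loss

y_true = tf.constant([[0.0], [0.0], [0.0]])
y_pred = tf.constant([[0.5], [1.2], [3.0]])

# the closure returns per-element losses, so average them before comparing
custom = tf.reduce_mean(make_huber(1.2)(y_true, y_pred))
builtin = tf.keras.losses.Huber(delta=1.2)(y_true, y_pred)
print(float(custom), float(builtin))  # both approximately 1.2417
```

The inner function keeps the `(y_true, y_pred)` signature that Keras expects, while the threshold is captured from the enclosing call.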

Defining the model.

In [None]:
%%time
input_layer = Input(shape=(1,))
output_layer = Dense(1)(input_layer)
model_huber_loss_threshold = Model(inputs=input_layer, outputs=output_layer)
# now I can set threshold value!
model_huber_loss_threshold.compile(optimizer='sgd', loss=my_huber_loss_with_threshold(threshold=1.2))
history = model_huber_loss_threshold.fit(x, y, epochs=500, verbose=0)

In [None]:
plt.title('Huber loss with threshold as a hyperparameter')
plt.plot(history.history['loss'], label='Huber loss')
plt.legend()
plt.grid()
plt.show()

Predicting test value.

In [None]:
model_huber_loss_threshold.predict(np.array([[20.0]]))

Get coefficients for **weight** and **bias**.

In [None]:
model_huber_loss_threshold.get_weights()

## Implement custom loss as a class

We can also implement a loss function as a class by inheriting from the Keras `Loss` class.

In that case we need to implement the `call` method, which computes the loss value.

In [None]:
from tensorflow.keras.losses import Loss

# inheriting from Loss class
class MyHuberLoss(Loss):  
    # we set threshold in constructor
    def __init__(self, threshold=1):
        super().__init__()        
        self.threshold = threshold

    # body of a function is the same as before
    def call(self, y_true, y_pred):
        error = y_true - y_pred
        is_small_error = tf.abs(error) <= self.threshold
        small_error_loss = 0.5 * tf.square(error)
        big_error_loss = self.threshold * (tf.abs(error) - (0.5 * self.threshold))        
        return tf.where(is_small_error, small_error_loss, big_error_loss) 
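
A loss object can also be called directly, which is handy for testing it outside of `fit`. The sketch below uses `HuberSketch`, a hypothetical compact restatement of the class above so that the snippet runs on its own. It relies on the default reduction of the base `Loss` class averaging the per-element values, which is my understanding of the default behaviour rather than something configured here.

```python
import tensorflow as tf
from tensorflow.keras.losses import Loss

class HuberSketch(Loss):
    # hypothetical compact restatement of the class above, for illustration only
    def __init__(self, threshold=1.0):
        super().__init__()
        self.threshold = threshold

    def call(self, y_true, y_pred):
        error = y_true - y_pred
        abs_error = tf.abs(error)
        return tf.where(abs_error <= self.threshold,
                        0.5 * tf.square(error),
                        self.threshold * (abs_error - 0.5 * self.threshold))

y_true = tf.constant([[0.0], [0.0]])
y_pred = tf.constant([[0.5], [2.0]])

loss_object = HuberSketch(threshold=0.7)
per_sample = loss_object.call(y_true, y_pred)  # shape (2, 1), no reduction applied
reduced = loss_object(y_true, y_pred)          # scalar, reduced by the base class
print(per_sample.numpy().ravel(), float(reduced))  # values 0.125 and 1.155, mean 0.64
```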

In [None]:
%%time
input_layer = Input(shape=(1,))
output_layer = Dense(1)(input_layer)
model_huber_class_loss = Model(inputs=input_layer, outputs=output_layer)
model_huber_class_loss.compile(optimizer='sgd', loss=MyHuberLoss(threshold=0.7))
history = model_huber_class_loss.fit(x, y, epochs=500, verbose=0)

In [None]:
plt.title('Huber loss in a class')
plt.plot(history.history['loss'], label='Huber loss')
plt.legend()
plt.grid()
plt.show()

Predicting test value.

In [None]:
model_huber_class_loss.predict(np.array([[20.0]]))

Get coefficients for **weight** and **bias**.

In [None]:
model_huber_class_loss.get_weights()