In [11]:
# reference https://medium.com/

<h2> 1. Categorical Crossentropy </h2>

<p>
This loss function works for multiclass, single-label classification, where exactly one category applies to each data point. It compares the distribution of predictions (the activations in the output layer, one for each class) with the actual distribution, in which the probability of the true class is 1 and that of every other class is 0. The true class is represented as a one-hot encoded vector, and the closer the model’s output is to that vector, the lower the loss. For this kind of problem, the last layer uses the “Softmax” activation function.
    
</p>

<p> 
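When to use: In the MNIST handwritten digit classification problem, each image shows exactly one digit. The model outputs a probability for every digit (a nine, a five, or any other number), and categorical cross-entropy measures how far those probabilities are from the true digit.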
</p>
<img src="1.png">
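<p>
As a rough illustration of the description above, the sketch below turns raw output-layer scores into probabilities with softmax and then computes the per-sample loss against one-hot targets. The names <code>softmax</code>, <code>logits</code> and <code>one_hot_targets</code>, and the score values, are assumptions made for this example only.
</p>

```python
import numpy as np

def softmax(logits):
    """Convert raw scores (logits) into probabilities that sum to 1 per row."""
    # Subtracting the row-wise maximum leaves the result unchanged
    # but keeps exp() from overflowing on large scores.
    shifted = logits - np.max(logits, axis=1, keepdims=True)
    exp_scores = np.exp(shifted)
    return exp_scores / np.sum(exp_scores, axis=1, keepdims=True)

# Hypothetical raw scores for two samples and four classes.
logits = np.array([[1.0, 1.0, 1.0, 1.0],
                   [0.0, 0.0, 0.0, 5.0]])
one_hot_targets = np.array([[0, 0, 0, 1],
                            [0, 0, 0, 1]])

probabilities = softmax(logits)
# Only the true class contributes: the loss is -log(probability of the true class).
per_sample_loss = -np.sum(one_hot_targets * np.log(probabilities), axis=1)

print(probabilities)
print(per_sample_loss)
```

<p>
The first sample spreads its probability evenly across the four classes, so its loss is higher than that of the second sample, which puts most of its probability on the true class.
</p>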

In [17]:
import numpy as np

predictions = np.array([[0.25,0.25,0.25,0.25],
                       [0.01,0.01,0.01,0.96]])

# predictions = np.array([[0.0,0.0,0.0,0.9],
#                         [0.01,0.01,0.0,0.96]])

targets = np.array([[0,0,0,1],
                   [0,0,0,1]])

def categorical_crossentropy(predictions, targets, epsilon=1e-10):
    # Clip so that log() never receives exactly 0.
    predictions = np.clip(predictions, epsilon, 1. - epsilon)
    N = predictions.shape[0]
    # Sum over the classes, then average over the N samples.
    ce_loss = -np.sum(targets * np.log(predictions)) / N
    return ce_loss

categorical_crossentropy_loss = categorical_crossentropy(predictions, targets)
print("Categorical cross-entropy loss is: %.4f" % categorical_crossentropy_loss)

Categorical cross-entropy loss is: 0.7136


<h2>2. Binary Crossentropy </h2>

<p> This loss function works for multiclass, multilabel classification. The loss tells us how wrong the model's predictions are. In multilabel problems, where an example can belong to multiple classes at the same time, the model decides for each category independently whether the sample belongs to it or not.
Binary cross-entropy measures how far each prediction is from the actual value (either 0 or 1) for each class and then averages these class-wise errors to obtain the final loss.
</p>
<p>When to use: Suppose we want to determine the mood of a piece of music. Every piece can have more than one mood; for instance, it can be both “Happy” and “Energetic” at the same time. Binary cross-entropy fits this problem because each mood is predicted as a separate yes/no decision. </p>


<img src="2.png">
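<p>
The music example can be sketched as follows. Each mood gets its own sigmoid output, so several moods can be active at once. The mood list, the raw scores, and the 0.5 decision threshold are all assumptions chosen for illustration.
</p>

```python
import numpy as np

def sigmoid(logits):
    """Squash each raw score independently into the range (0, 1)."""
    return 1.0 / (1.0 + np.exp(-logits))

# Hypothetical mood labels and raw scores for one piece of music.
moods = ["Happy", "Energetic", "Sad", "Calm"]
logits = np.array([2.0, 1.5, -3.0, -0.5])
true_labels = np.array([1, 1, 0, 0])

probabilities = sigmoid(logits)

# Each label is scored as its own yes/no decision:
# -[t * log(p) + (1 - t) * log(1 - p)]
per_label_loss = -(true_labels * np.log(probabilities)
                   + (1 - true_labels) * np.log(1 - probabilities))
loss = per_label_loss.mean()

# A mood is predicted when its probability exceeds 0.5 (assumed threshold).
predicted_moods = [m for m, p in zip(moods, probabilities) if p > 0.5]

print(predicted_moods)
print(loss)
```

<p>
Unlike softmax, the sigmoid outputs do not have to sum to 1, which is what allows “Happy” and “Energetic” to be predicted together.
</p>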

In [13]:
import numpy as np

predictions = np.array([[0.25, 0.25, 0.25, 0.25],
                        [0.01, 0.01, 0.01, 0.96]])
actual = np.array([[0, 0, 0, 1],
                   [0, 0, 0, 1]])

def binary_crossentropy(predictions, targets, epsilon=1e-10):
    # Clip so that neither log(p) nor log(1 - p) receives exactly 0.
    predictions = np.clip(predictions, epsilon, 1. - epsilon)
    # Score every class as its own yes/no decision.
    class_wise_loss = -(targets * np.log(predictions)
                        + (1 - targets) * np.log(1. - predictions))
    # Average the class-wise errors over all classes and samples.
    return class_wise_loss.mean()

binary_crossentropy_loss = binary_crossentropy(predictions, actual)
print("Binary cross-entropy loss is: %.4f" % binary_crossentropy_loss)

Binary cross-entropy loss is: 0.2900


<h2>3. Mean Absolute Error (MAE) </h2>
<p>It is the mean of the absolute differences between the actual and predicted values.
MAE is not sensitive to outliers: given several examples with the same input feature values, the optimal prediction is their median target value.
The disadvantage of MAE is that the gradient magnitude does not depend on the error size; it depends only on the sign of (y - y_hat). The gradient therefore stays large even when the error is small, which in turn can lead to convergence problems.
</p>
<p>When to use: Use MAE when you are doing regression and don’t want outliers to play a significant role. It can also be useful if you know that your target distribution is multimodal and it is desirable for predictions to sit near the median rather than the mean.</p>
<img src="3.png">
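<p>
Both properties mentioned above can be checked numerically. The sketch below searches a grid of constant predictions for the one with the lowest MAE and compares it with the median, then evaluates the gradient for a tiny and a large error. The target values, the search grid, and the helper <code>mae_gradient</code> are assumptions made for this example.
</p>

```python
import numpy as np

# Hypothetical targets that share the same input features; 100 is an outlier.
targets = np.array([1.0, 2.0, 3.0, 4.0, 100.0])

# Try many constant predictions and keep the one with the lowest MAE.
candidates = np.linspace(0.0, 100.0, 10001)
mae_per_candidate = np.abs(candidates[:, None] - targets[None, :]).mean(axis=1)
best_prediction = candidates[np.argmin(mae_per_candidate)]

print("best constant prediction:", best_prediction)
print("median of targets:", np.median(targets))
print("mean of targets:", np.mean(targets))

def mae_gradient(prediction, target):
    """Gradient of |prediction - target| with respect to the prediction."""
    return np.sign(prediction - target)

# The gradient magnitude is the same for a tiny error and a large one.
small_error_gradient = mae_gradient(3.001, 3.0)
large_error_gradient = mae_gradient(50.0, 3.0)
print(small_error_gradient, large_error_gradient)
```

<p>
The best constant prediction lands on the median and ignores how far away the outlier is, while the gradient is 1 in both cases, which is why a fixed learning rate can overshoot near the optimum.
</p>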

In [15]:
import numpy as np
predictions = np.array([2.5, 0.0, 2, 8])
actual = np.array([3, -0.5, 2, 7])

print("p is: " + str(["%.8f" % elem for elem in predictions]))
print("a is: " + str(["%.8f" % elem for elem in actual]))

def mae(predictions, actual):
    difference = predictions - actual
    absolute_difference = np.absolute(difference)
    mean_absolute_difference = absolute_difference.mean()
    return mean_absolute_difference
mae_val = mae(predictions, actual)
print ("mae error is: " + str(mae_val))

p is: ['2.50000000', '0.00000000', '2.00000000', '8.00000000']
a is: ['3.00000000', '-0.50000000', '2.00000000', '7.00000000']
mae error is: 0.5


<h2>4. Mean Squared Error (MSE)</h2>
<p>It is the mean of the squared differences between the actual and predicted values, and it is the most commonly used loss function for regression.
MSE is sensitive to outliers: given several examples with the same input feature values, the optimal prediction is their mean target value.</p>
<p>When to use: When doing regression, especially if large errors should be penalized much more heavily than small ones.</p>

<img src="4.png">
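<p>
To make the outlier sensitivity concrete, the sketch below computes MSE and MAE twice: once for a set of reasonable predictions and once after a single prediction is pushed far from its target. The array <code>outlier_predictions</code> is a hypothetical variation introduced only for this comparison.
</p>

```python
import numpy as np

actual = np.array([3.0, -0.5, 2.0, 7.0])
good_predictions = np.array([2.5, 0.0, 2.0, 8.0])
# Same predictions, except the last one is now off by 10 instead of 1 (hypothetical).
outlier_predictions = np.array([2.5, 0.0, 2.0, 17.0])

def mse(predicted, actual):
    """Mean of the squared differences."""
    return np.mean((predicted - actual) ** 2)

def mae(predicted, actual):
    """Mean of the absolute differences."""
    return np.mean(np.abs(predicted - actual))

print("MSE:", mse(good_predictions, actual), "->", mse(outlier_predictions, actual))
print("MAE:", mae(good_predictions, actual), "->", mae(outlier_predictions, actual))
```

<p>
With these numbers, MSE grows from 0.375 to 25.125 while MAE grows only from 0.5 to 2.75, because squaring makes the single large error dominate the average.
</p>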

In [16]:
import numpy as np

predictions = np.array([2.5, 0.0, 2, 8])
actual = np.array([3, -0.5, 2, 7])

def mse(predictions, actual):
    difference = predictions - actual
    difference_squared = difference ** 2
    mean_of_difference_squared = difference_squared.mean()
    return mean_of_difference_squared

def rmse(predictions, actual):
    # RMSE is the square root of MSE, which puts the error back in the units of the target.
    return np.sqrt(mse(predictions, actual))

print("p is: " + str(["%.8f" % elem for elem in predictions]))
print("a is: " + str(["%.8f" % elem for elem in actual]))
mse_val = mse(predictions, actual)
rmse_val = rmse(predictions, actual)
print("mse is: " + str(mse_val))
print("rms error is: " + str(rmse_val))

p is: ['2.50000000', '0.00000000', '2.00000000', '8.00000000']
a is: ['3.00000000', '-0.50000000', '2.00000000', '7.00000000']
mse is: 0.375
rms error is: 0.6123724356957945
