<h1>Chapter 5: Calculating Network Error with Loss</h1>


<h3>Categorical Cross-Entropy Loss</h3>


<img src="./lossFunc.png"/>

Here L_i denotes the loss value for sample i, where i indexes the samples in the set,
j is the label/output index, y denotes the target values, and y-hat denotes the predicted values.
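
For reference, the standard categorical cross-entropy written with these symbols is given below. This is a LaTeX transcription that assumes the image above shows the usual form; the logarithm is the natural logarithm, as in the code later in this chapter.

```latex
L_i = -\sum_{j} y_{i,j} \, \log\left(\hat{y}_{i,j}\right)
```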

A classification model like ours returns a probability distribution over all the outputs.<br/>
Cross-entropy compares two probability distributions: here, the predicted distribution and the target distribution.

In [120]:
# Consider the array below as the output of our model
softmax_output = [0.7, 0.1, 0.2]

# If the desired prediction is the first index, the target probability distribution is:
desired_output = [1, 0, 0]
# This is also called a one-hot vector: the index of the desired class is 1 and all others are 0.

<img src="./lossFunc2.png"/>

This is how the categorical cross-entropy loss function works.
<br/>
Let's implement this in Python.

In [121]:
import math

softmax_output = [0.7, 0.1, 0.2]
target_output = [1, 0, 0]

loss = -(
    math.log(softmax_output[0])*target_output[0] +
    math.log(softmax_output[1])*target_output[1] +
    math.log(softmax_output[2])*target_output[2] 
)

print("Loss: ", loss)

# This is the full categorical cross-entropy calculation.
# Since target_output[1] and target_output[2] are 0, those terms vanish and need not be computed.
# Also, target_output[0] is 1, so multiplying by it changes nothing.
# Therefore, the expression below gives the same result.

new_loss = -(math.log(softmax_output[0]))
print("New Loss: ", new_loss)

Loss:  0.35667494393873245
New Loss:  0.35667494393873245
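
The hand-written sum above can be generalized to any number of classes. The following is a minimal sketch, not part of the original notebook; the function name `categorical_cross_entropy` is a hypothetical helper introduced only for illustration.

```python
import math

def categorical_cross_entropy(predicted, target):
    """Return -sum(target_j * log(predicted_j)) for a single sample."""
    # Skip zero targets: they contribute nothing to the sum, and skipping them
    # also avoids log(0) for classes the model assigns zero probability to.
    return -sum(t * math.log(p) for p, t in zip(predicted, target) if t != 0)

sample_loss = categorical_cross_entropy([0.7, 0.1, 0.2], [1, 0, 0])
print("Loss: ", sample_loss)
```

For a one-hot target this reduces to the negative log of the confidence at the target index, so it should agree with the two values printed above.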


In [122]:
# An important property of the cross-entropy function
output_1 = [0.22, 0.6, 0.18]
output_2 = [0.32, 0.36, 0.32]

# In both cases, argmax (which returns the index of the highest value in the array)
# returns 1 (the index of 0.6 and of 0.36, the highest values).
# But the model is far less confident in the case of output_2.
# Categorical cross-entropy accounts for this and outputs a larger loss when the confidence is lower.
# Let's see how.

print(math.log(1.))
print(math.log(0.9))
print(math.log(0.8))
print(math.log(0.7))
print(math.log(0.6))
print(math.log(0.5))
print("...")
print(math.log(0.1))
print(math.log(0.05))
print(math.log(0.01))

# These log values, once negated, are the loss values for each confidence.
# log(1) = 0: when the model is 100% confident in the correct class, the loss is 0.
# As the confidence approaches 0, the loss grows without bound.


0.0
-0.10536051565782628
-0.2231435513142097
-0.35667494393873245
-0.5108256237659907
-0.6931471805599453
...
-2.3025850929940455
-2.995732273553991
-4.605170185988091


In [123]:
import numpy as np

# A batch of softmax outputs (one row per sample)
softmax_output = np.array([
    [0.7, 0.1, 0.2],
    [0.1, 0.5, 0.4],
    [0.02, 0.9, 0.08]
])

class_targets = [0, 1, 1]

# NumPy lets us index with a list of row indices and a list of column indices at once:
# for each row i, this picks the value in column class_targets[i]
correct_confidence = softmax_output[range(len(softmax_output)), class_targets]
print(correct_confidence)

# Apply the negative log to these values
neg_loss = -np.log(correct_confidence)
print(neg_loss)

# Avg loss
avg_loss = np.mean(neg_loss)
print(avg_loss)

[0.7 0.5 0.9]
[0.35667494 0.69314718 0.10536052]
0.38506088005216804
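
The indexing expression `softmax_output[range(len(softmax_output)), class_targets]` can look opaque at first. As a sketch of what it does, the block below compares it with an explicit loop; the names `picked_loop` and `picked_indexed` are introduced here only for illustration.

```python
import numpy as np

softmax_output = np.array([
    [0.7, 0.1, 0.2],
    [0.1, 0.5, 0.4],
    [0.02, 0.9, 0.08]
])
class_targets = [0, 1, 1]

# Explicit version: for row i, take the column given by class_targets[i]
picked_loop = np.array([softmax_output[i, c] for i, c in enumerate(class_targets)])

# Indexed version, as used in the cell above
picked_indexed = softmax_output[range(len(softmax_output)), class_targets]

print(picked_loop)
print(np.array_equal(picked_loop, picked_indexed))
```

Both approaches select the same elements; the indexed form simply avoids the Python-level loop.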


In [124]:
# The above handles sparse targets,
# meaning the array holds the correct class index for each sample.
# However, the targets can also be one-hot encoded, and we have to handle that case separately.

softmax_output = np.array([
    [0.7, 0.1, 0.2],
    [0.1, 0.5, 0.4],
    [0.02, 0.9, 0.08]
])

class_targets = np.array([
    [1, 0, 0],
    [0, 1, 0],
    [0, 1, 0]
])

correct_confidence = np.sum(softmax_output*class_targets, axis=1)
print(correct_confidence)

# The rest is the same as above
# Apply the negative log to these values
neg_loss = -np.log(correct_confidence)
print(neg_loss)

# Avg loss
avg_loss = np.mean(neg_loss)
print(avg_loss)


[0.7 0.5 0.9]
[0.35667494 0.69314718 0.10536052]
0.38506088005216804
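
The two target formats carry the same information, which is why both cells print identical results. A minimal sketch of converting between them follows; `n_classes` and the variable names are assumptions made for this example.

```python
import numpy as np

sparse_targets = np.array([0, 1, 1])
n_classes = 3  # assumed number of classes

# Sparse -> one-hot: row k of the identity matrix is the one-hot vector for class k
one_hot_targets = np.eye(n_classes, dtype=int)[sparse_targets]
print(one_hot_targets)

# One-hot -> sparse: the position of the 1 in each row
recovered_targets = np.argmax(one_hot_targets, axis=1)
print(recovered_targets)
```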


In [125]:
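# log(0) is -inf, so a predicted confidence of exactly 0 for the correct class
# would make that sample's loss infinite and, with it, the mean loss.
# Therefore we clip the predictions at both ends before taking the log:

# y_pred_clipped = np.clip(y_pred, 1e-7, 1-1e-7)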
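
The effect of clipping can be seen on a few hand-picked confidences. This is a small sketch with made-up values, not output from the model; `np.errstate` is used only to silence the divide-by-zero warning NumPy would otherwise emit.

```python
import numpy as np

# Made-up confidences for the correct class, including the two extremes
y_pred = np.array([0.0, 0.5, 1.0])

# Without clipping, -log(0) is infinite
with np.errstate(divide="ignore"):
    raw_loss = -np.log(y_pred)
print(raw_loss)

# With clipping, every loss value is finite
y_pred_clipped = np.clip(y_pred, 1e-7, 1 - 1e-7)
clipped_loss = -np.log(y_pred_clipped)
print(clipped_loss)
```

After clipping, the worst-case loss is capped at roughly 16.1 (the negative log of 1e-7) instead of infinity, so the mean over a batch stays meaningful.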


<h3>Loss Function</h3>

In [126]:
# Common loss base class, inherited by every specific loss function
# calculate() returns the mean of the per-sample losses
class Loss:

    # output => model's prediction
    # y => ground truth
    def calculate(self, output, y):
        # forward() is implemented by the specific loss function, e.g. cross-entropy
        sample_losses = self.forward(output, y)
        
        data_loss = np.mean(sample_losses)

        return data_loss

In [127]:
# Cross Entropy Loss:
class Loss_Categorical_Cross_Entropy(Loss):
    
    def forward(self, y_pred, y_true):
        
        y_pred_clipped = np.clip(y_pred, 1e-7, 1 - 1e-7)

        # check if y_true is sparse or one-hot-coded
        if len(y_true.shape) == 1:
            correct_confidence = y_pred_clipped[range(len(y_pred_clipped)), y_true]
        else:
            correct_confidence = np.sum(y_pred_clipped * y_true, axis=1)

        # Losses
        neg_log = -np.log(correct_confidence)
        return neg_log

In [128]:
softmax_output = np.array([
    [0.7, 0.1, 0.2],
    [0.1, 0.5, 0.4],
    [0.02, 0.9, 0.08]
])

class_targets = np.array([
    [1, 0, 0],
    [0, 1, 0],
    [0, 1, 0]
])

loss_function = Loss_Categorical_Cross_Entropy()
loss = loss_function.calculate(softmax_output, class_targets)

print("Loss: ", loss)

Loss:  0.38506088005216804


<h2>Code up to this point</h2>

In [129]:
import numpy as np
import nnfs
from nnfs.datasets import spiral_data

nnfs.init()

# Dense Layer
class Layer_Dense:

    # Layer init
    def __init__(self, n_inputs, n_neurons):
        self.weights = 0.01 * np.random.randn(n_inputs, n_neurons)
        self.biases = np.zeros((1, n_neurons))

    def forward(self, inputs):
        self.output = np.dot(inputs, self.weights) + self.biases

# ReLU activation function
class Activation_ReLU:

    def forward(self, inputs):
        self.output = np.maximum(0, inputs)

# Softmax Activation Function
class Activation_Softmax:

    def forward(self, inputs):
        expo_values = np.exp(inputs - np.max(inputs, axis=1, keepdims=True))
        norm_values = expo_values / np.sum(expo_values, axis=1, keepdims=True)
        self.output = norm_values

# Common Loss
class Loss:

    # output => model's prediction
    # y => ground truth
    def calculate(self, output, y):
        # forward() is implemented by the specific loss function, e.g. cross-entropy
        sample_losses = self.forward(output, y)
        
        data_loss = np.mean(sample_losses)

        return data_loss


# Cross Entropy Loss:
class Loss_Categorical_Cross_Entropy(Loss):
    
    def forward(self, y_pred, y_true):
        
        y_pred_clipped = np.clip(y_pred, 1e-7, 1 - 1e-7)

        # check if y_true is sparse or one-hot-coded
        if len(y_true.shape) == 1:
            correct_confidence = y_pred_clipped[range(len(y_pred_clipped)), y_true]
        else:
            correct_confidence = np.sum(y_pred_clipped * y_true, axis=1)

        # Losses
        neg_log = -np.log(correct_confidence)
        return neg_log


X, y = spiral_data(samples=100, classes=3)

# Initialization
dense1 = Layer_Dense(2, 3)
activation1 = Activation_ReLU()

dense2 = Layer_Dense(3, 3)
activation2 = Activation_Softmax()

loss_function = Loss_Categorical_Cross_Entropy()

# Forward pass
dense1.forward(X)
activation1.forward(dense1.output)

dense2.forward(activation1.output)
activation2.forward(dense2.output)

print(activation2.output[:5])

loss = loss_function.calculate(activation2.output, y)
print("Avg Loss: ", loss)

[[0.33333334 0.33333334 0.33333334]
 [0.33333316 0.3333332  0.33333364]
 [0.33333287 0.3333329  0.33333418]
 [0.3333326  0.33333263 0.33333477]
 [0.33333233 0.3333324  0.33333528]]
Avg Loss:  1.0986104
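
The average loss of about 1.0986 is not a coincidence. With weights initialized near zero, the untrained network predicts roughly 1/3 for every class, and the loss for a uniform prediction over three classes is -log(1/3). The short check below is a sketch of that arithmetic; the variable names are introduced for illustration.

```python
import math

n_classes = 3

# Loss when the model assigns equal probability to every class
uniform_loss = -math.log(1 / n_classes)
print(uniform_loss)
```

This value, ln(3), is a useful baseline: a loss close to it means the model is doing no better than guessing uniformly.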
