# Chapter 5: Calculating Network Error with Loss

A loss function (also called a cost function) quantifies how wrong a model is.

The output of a neural network is a set of confidences; the more confidence it places in the correct answer, the better.

During training, we try to increase confidence in the correct answer and decrease misplaced confidence.
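
To make this concrete, here is a minimal sketch of how the negative log of a confidence behaves as a penalty. The confidence values are illustrative assumptions, not outputs of a trained model. The penalty is close to 0 when the confidence in the correct class is high and grows quickly as that confidence falls.

```python
import math

# Illustrative confidences assigned to the correct class (assumed values)
confidences = [0.99, 0.7, 0.5, 0.1, 0.01]

# Negative log of the confidence: near 0 when confident, large when not
losses = [-math.log(c) for c in confidences]

for c, l in zip(confidences, losses):
    print(f"confidence={c:<5} loss={l:.4f}")
```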

## 5.1. Categorical Cross-Entropy Loss

In classification, the Softmax activation on the output layer returns a probability distribution. The predicted distribution ($\hat y$, or predictions) is compared with the ground-truth distribution ($y$, or targets) using categorical cross-entropy.

The categorical cross-entropy of $y$ (actual/desired distribution) and $\hat y$ (predicted distribution) is:

$$
L_i = - \sum_{j} y_{i,j} \ \text{log} \ (\hat{y}_{i,j})
$$

$L_i$ denotes the sample loss value, $i$ is the index of the sample in the set, $j$ is the label/output index, $y$ denotes the target values, and $\hat y$ denotes the predicted values.

In code, we simplify this further to

$$
L_i = - \ \text{log} \ \big(  \hat{y}_{i,k} \big) \qquad \text{where }  k \text{ is the index of the "true" probability}
$$

$L_i$ denotes the sample loss value, $i$ is the index of the sample in the set, $k$ is the index of the target (ground-truth) label, and $\hat y$ denotes the predicted values.
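
The reduction from the full sum to a single term can be sketched as follows, assuming the targets are one-hot (so $y_{i,k} = 1$ and $y_{i,j} = 0$ for every $j \neq k$):

$$
\begin{align}
L_i & = - \sum_{j} y_{i,j} \ \text{log} \ (\hat{y}_{i,j}) = - \Big( y_{i,k} \ \text{log} \ (\hat{y}_{i,k}) + \sum_{j \neq k} y_{i,j} \ \text{log} \ (\hat{y}_{i,j}) \Big) \\
& = - \Big( 1 \cdot \text{log} \ (\hat{y}_{i,k}) + 0 \Big) = - \text{log} \ (\hat{y}_{i,k})
\end{align}
$$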

Consider a Softmax output of `[0.7, 0.1, 0.2]` and targets of `[1, 0, 0]`. Taking log to be the natural logarithm, we have

$$
\begin{align}
L_i & = - \sum_{j} y_{i,j} \ \text{log} \ (\hat{y}_{i,j}) = - \Big( 1 \cdot \text{log} \ (0.7) + 0 \cdot \text{log} \ (0.1) + 0 \cdot \text{log} \ (0.2) \Big) \\
& = -\Big( -0.35667494393873245 + 0 + 0 \Big) = 0.35667494393873245
\end{align}
$$

In Python:

In [None]:
import math

# An example output from the output layer of the neural network
softmax_output = [0.7, 0.1, 0.2]

# Ground truth
target_output = [1, 0, 0]
loss = - (math.log(softmax_output[0]) * target_output[0] +
          math.log(softmax_output[1]) * target_output[1] +
          math.log(softmax_output[2]) * target_output[2])
print (loss)

loss = - math.log(softmax_output[0])
print (loss)

0.35667494393873245
0.35667494393873245


Next, we extend this to batches of softmax output distributions and select each sample's confidence dynamically, using its target index.

In [7]:
# Probabilities for 3 samples
softmax_outputs = [[0.7, 0.1, 0.2], [0.1, 0.5, 0.4], [0.02, 0.9, 0.08]]
class_targets = [0, 1, 1]    # numbers contained are the correct class numbers
for targ_idx, distribution in zip (class_targets, softmax_outputs):
    print (distribution[targ_idx])

0.7
0.5
0.9


This can be further simplified using NumPy.

In [None]:
import numpy as np

softmax_outputs = np.array([[0.7, 0.1, 0.2], [0.1, 0.5, 0.4], [0.02, 0.9, 0.08]])
class_targets = [0, 1, 1]
print (softmax_outputs[[0, 1, 2], class_targets])

[0.7 0.5 0.9]


This returns an array of the confidences at the target indices for each of the samples.

We can use `range()` instead of typing each row index:

In [None]:
print(softmax_outputs[range(len(softmax_outputs)), class_targets])

[0.7 0.5 0.9]
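
As a cross-check of the indexing used here, the sketch below spells out what the two index sequences mean: row index `i` is paired with column index `class_targets[i]`. It also shows `np.take_along_axis` as another way to express the same selection; that alternative is added for illustration only and is not something the rest of the chapter relies on.

```python
import numpy as np

softmax_outputs = np.array([[0.7, 0.1, 0.2],
                            [0.1, 0.5, 0.4],
                            [0.02, 0.9, 0.08]])
class_targets = np.array([0, 1, 1])

# Advanced indexing pairs the two index sequences element by element:
# (0, 0), (1, 1), (2, 1)
rows = np.arange(len(softmax_outputs))
paired = softmax_outputs[rows, class_targets]

# The same selection written as an explicit loop
looped = np.array([softmax_outputs[i, k] for i, k in zip(rows, class_targets)])

# Alternative: pick one column per row; the indices must have the same
# number of dimensions as the array, hence the added axis
along_axis = np.take_along_axis(
    softmax_outputs, class_targets[:, None], axis=1
).ravel()

print(paired, looped, along_axis)
```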


Now apply the negative log to this array.

In [None]:
print(-np.log(softmax_outputs[range(len(softmax_outputs)), class_targets]))

[0.35667494 0.69314718 0.10536052]


Finally, we average the per-sample losses over the batch to get a single number that indicates how the model is doing during training.

In [None]:
neg_log = - np.log(softmax_outputs[range(len(softmax_outputs)), class_targets])
average_loss = np.mean(neg_log)
print (average_loss)

0.38506088005216804


Targets can also be one-hot encoded, where the correct label’s position holds a 1 and all other positions hold 0. The code below handles both label formats.

In [None]:
softmax_outputs = np.array([[0.7, 0.1, 0.2], [0.1, 0.5, 0.4], [0.02, 0.9, 0.08]])
class_targets = np.array([[1, 0, 0], [0, 1, 0], [0, 1, 0]])

# Probabilities for target values - only if categorical labels
if len(class_targets.shape) == 1:   # for [0,1,1]
    correct_confidences = softmax_outputs[range(len(softmax_outputs)), class_targets]

# Mask values - only for one-hot encoded labels
elif len(class_targets.shape) == 2:
    correct_confidences = np.sum(softmax_outputs * class_targets, axis=1)

# Losses
neg_log = - np.log(correct_confidences)
average_losses = np.mean(neg_log)

print(average_losses)


0.38506088005216804
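
The two label formats carry the same information, so one can be converted into the other. A minimal sketch, assuming three classes (the names `n_classes` and `sparse_targets` are introduced here for illustration):

```python
import numpy as np

n_classes = 3  # assumed number of classes for this example
sparse_targets = np.array([0, 1, 1])

# Sparse -> one-hot: pick rows of the identity matrix
one_hot_targets = np.eye(n_classes, dtype=int)[sparse_targets]
print(one_hot_targets)

# One-hot -> sparse: the position of the 1 in each row
recovered = np.argmax(one_hot_targets, axis=1)
print(recovered)
```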


## 5.2. The Categorical Cross-Entropy Loss Class

In [None]:
# Common loss class
class Loss:
    # Calculates the data and regularization losses given model output and ground truth values
    def calculate(self, output, y):
        # Calculate sample losses
        sample_losses = self.forward(output, y)
        # Calculate mean loss
        data_loss = np.mean(sample_losses)
        # Return loss
        return data_loss

# Cross-entropy loss
class Loss_CategoricalCrossentropy(Loss):
    # Forward pass
    def forward(self, y_pred, y_true):
        # Number of samples in a batch
        samples = len(y_pred)
        # Clip data to prevent log(0), which would make the loss infinite
        # Clip both sides to not drag mean towards any value
        y_pred_clipped = np.clip(y_pred, 1e-7, 1 - 1e-7)
        # Probabilities for target values - only if categorical labels
        if len(y_true.shape) == 1:
            correct_confidences = y_pred_clipped[ range(samples), y_true]
        # Mask values - only for one-hot encoded labels
        elif len(y_true.shape) == 2:
            correct_confidences = np.sum( y_pred_clipped*y_true, axis=1)
        # Losses
        negative_log_likelihoods = -np.log(correct_confidences)
        return negative_log_likelihoods

loss_function = Loss_CategoricalCrossentropy()
loss = loss_function.calculate(softmax_outputs, class_targets)
print (loss)

0.38506088005216804
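
The clipping step in `forward` matters when the model assigns a confidence of exactly 0 to the correct class: `-np.log(0)` is infinite, and a single infinite sample loss makes the batch mean infinite as well. A small sketch of the effect, using made-up confidences:

```python
import numpy as np

# Made-up confidences for the correct class; the last one is exactly 0
correct_confidences = np.array([0.7, 0.5, 0.0])

# Without clipping: log(0) is -inf, so the mean loss is inf
with np.errstate(divide="ignore"):
    unclipped_loss = np.mean(-np.log(correct_confidences))

# With clipping to [1e-7, 1 - 1e-7], the worst-case sample loss is finite
clipped = np.clip(correct_confidences, 1e-7, 1 - 1e-7)
clipped_losses = -np.log(clipped)

print(unclipped_loss)        # inf
print(clipped_losses[-1])    # about 16.118
```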


## 5.3. Combining everything up to this point

In [None]:
import numpy as np
import nnfs
from nnfs.datasets import spiral_data
nnfs.init()

# Dense layer
class Layer_Dense:
    # Layer initialization
    def __init__(self, n_inputs, n_neurons):
        # Initialize weights and biases
        self.weights = 0.01 * np.random.randn(n_inputs, n_neurons)
        self.biases = np.zeros((1, n_neurons))
    # Forward pass
    def forward(self, inputs):
        # Calculate output values from inputs, weights and biases
        self.output = np.dot(inputs, self.weights) + self.biases

# ReLU activation
class Activation_ReLU:
    # Forward pass
    def forward(self, inputs):
        # Calculate output values from inputs
        self.output = np.maximum(0, inputs)

# Softmax activation
class Activation_Softmax:
    # Forward pass
    def forward(self, inputs):
        # Get unnormalized probabilities
        exp_values = np.exp(inputs - np.max(inputs, axis=1, keepdims=True))
        # Normalize them for each sample
        probabilities = exp_values / np.sum(exp_values, axis=1, keepdims=True)
        self.output = probabilities

# Common loss class
class Loss:

    # Calculates the data and regularization losses given model output and ground truth values
    def calculate(self, output, y):
        # Calculate sample losses
        sample_losses = self.forward(output, y)
        # Calculate mean loss
        data_loss = np.mean(sample_losses)
        # Return loss
        return data_loss

# Cross-entropy loss
class Loss_CategoricalCrossentropy(Loss):
    # Forward pass
    def forward(self, y_pred, y_true):
        # Number of samples in a batch
        samples = len(y_pred)
        # Clip data to prevent log(0), which would make the loss infinite
        # Clip both sides to not drag mean towards any value
        y_pred_clipped = np.clip(y_pred, 1e-7, 1 - 1e-7)
        # Probabilities for target values - only if categorical labels
        if len(y_true.shape) == 1:
            correct_confidences = y_pred_clipped[ range(samples), y_true]
        # Mask values - only for one-hot encoded labels
        elif len(y_true.shape) == 2:
            correct_confidences = np.sum( y_pred_clipped*y_true, axis=1)
        # Losses
        negative_log_likelihoods = -np.log(correct_confidences)
        return negative_log_likelihoods

# Create dataset
X, y = spiral_data(samples=100, classes=3)
# Create Dense layer with 2 input features and 3 output values
dense1 = Layer_Dense(2, 3)
# Create ReLU activation (to be used with Dense layer):
activation1 = Activation_ReLU()
# Create second Dense layer with 3 input features (as we take the output of the previous layer here) and 3 output values
dense2 = Layer_Dense(3, 3)
# Create Softmax activation (to be used with Dense layer):
activation2 = Activation_Softmax()
# Create loss function
loss_function = Loss_CategoricalCrossentropy()
# Perform a forward pass of our training data through the first Dense layer
dense1.forward(X)
# Perform a forward pass through the ReLU activation; it takes the output of the first Dense layer
activation1.forward(dense1.output)
# Perform a forward pass through the second Dense layer; it takes the output of the ReLU activation as input
dense2.forward(activation1.output)
# Perform a forward pass through the Softmax activation; it takes the output of the second Dense layer
activation2.forward(dense2.output)
# Let's see the output of the first few samples:
print(activation2.output[:5])
# Perform a forward pass through the loss function; it takes the Softmax output and the targets, and returns the loss
loss = loss_function.calculate(activation2.output, y)
# Print loss value
print('Loss:', loss)

# Calculate accuracy from output of activation2 and targets
predictions = np.argmax(activation2.output, axis = 1 )
if len (y.shape) == 2 :
    y = np.argmax(y, axis = 1 )
accuracy = np.mean(predictions == y)
# Print accuracy
print ('Acc:' , accuracy)


[[0.33333334 0.33333334 0.33333334]
 [0.3333332  0.3333332  0.33333364]
 [0.3333329  0.33333293 0.3333342 ]
 [0.3333326  0.33333263 0.33333477]
 [0.33333233 0.3333324  0.33333528]]
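
The softmax outputs printed above are all close to 1/3, which is what small random initial weights tend to produce. For a model that spreads its confidence evenly over $n$ classes, the cross-entropy loss of every sample is $-\text{log}(1/n) = \text{log}(n)$, so with 3 classes an untrained model should score close to $\text{log}(3) \approx 1.0986$. A quick check of that baseline (the names `n_classes` and `uniform_prediction` are introduced here for illustration):

```python
import numpy as np

n_classes = 3  # the spiral dataset above uses 3 classes

# A perfectly uniform prediction for one sample
uniform_prediction = np.full(n_classes, 1.0 / n_classes)

# The loss is the same whichever class is the correct one
baseline_loss = -np.log(uniform_prediction[0])
print(baseline_loss)  # about 1.0986
```

A loss that stays near this value during training suggests the model has not yet learned anything beyond guessing uniformly.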

## 5.4. Accuracy Calculation 

Loss is a useful metric for optimizing a model.

Accuracy is commonly used alongside it: it describes how often the largest confidence belongs to the correct class, expressed as a fraction.

In [17]:
import numpy as np

# Probabilities of 3 samples
softmax_outputs = np.array([[ 0.7 , 0.2 , 0.1 ],
                            [ 0.5 , 0.1 , 0.4 ],
                            [ 0.02 , 0.9 , 0.08 ]])

# Target (ground-truth) labels for 3 samples
class_targets = np.array([ 0 , 1 , 1 ])

# Calculate values along second axis (axis of index 1)
predictions = np.argmax(softmax_outputs, axis = 1 )

# If targets are one-hot encoded - convert them
if len (class_targets.shape) == 2 :
    class_targets = np.argmax(class_targets, axis = 1 )

# True evaluates to 1; False to 0
accuracy = np.mean(predictions == class_targets)
print ( 'acc:' , accuracy)
# print(predictions ==class_targets)

# print(np.mean([True, False, True]))

acc: 0.6666666666666666
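
Accuracy and loss answer different questions. As an illustration with made-up outputs, the two batches below make the same class predictions and so have the same accuracy, yet the less confident batch has a higher loss. The helper names `accuracy` and `mean_loss` are introduced for this sketch only.

```python
import numpy as np

class_targets = np.array([0, 1, 1])

# Two made-up batches with identical argmax predictions
confident = np.array([[0.9, 0.05, 0.05],
                      [0.6, 0.3, 0.1],
                      [0.05, 0.9, 0.05]])
hesitant = np.array([[0.4, 0.3, 0.3],
                     [0.6, 0.3, 0.1],
                     [0.3, 0.4, 0.3]])

def accuracy(outputs, targets):
    # Fraction of samples whose largest confidence is the correct class
    return np.mean(np.argmax(outputs, axis=1) == targets)

def mean_loss(outputs, targets):
    # Mean negative log of the confidence at the target index
    return np.mean(-np.log(outputs[range(len(outputs)), targets]))

print(accuracy(confident, class_targets), mean_loss(confident, class_targets))
print(accuracy(hesitant, class_targets), mean_loss(hesitant, class_targets))
```

Because loss responds to changes in confidence even when the predicted class stays the same, it gives a finer-grained signal than accuracy.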


We performed a forward pass through our network and calculated metrics that signal whether the model is performing poorly. We will embark on optimization in the next chapter.