## Calculating Network Error with Loss
To begin adjusting ***weights*** and ***biases*** to decrease error over time, our next step is to quantify how wrong the model is, using what is called a ***loss function***.
***Accuracy***, by contrast, is computed by applying argmax to each sample's output to find the index of the largest value, then measuring how often that index matches the target class.
We're not performing regression in this example; we're classifying, so we need a loss function suited to classification. The model uses a softmax activation function on the output layer, which means it outputs a probability distribution.
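To make "probability distribution" concrete, here is a minimal NumPy sketch of softmax applied to one row of raw outputs. The values in `layer_outputs` are made up for illustration. Every resulting value lies between 0 and 1, and the row sums to 1.

```python
import numpy as np

# Hypothetical raw (unnormalized) outputs for one sample and 3 classes
layer_outputs = np.array([[4.8, 1.21, 2.385]])

# Softmax: exponentiate (shifted by the row max for numerical stability), then normalize
exp_values = np.exp(layer_outputs - np.max(layer_outputs, axis=1, keepdims=True))
probabilities = exp_values / np.sum(exp_values, axis=1, keepdims=True)

print(probabilities)
print(np.sum(probabilities, axis=1))  # each row sums to 1 (up to floating-point error)
```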

##### Categorical Cross-Entropy Loss
Categorical cross-entropy compares a ground-truth probability distribution (y, or targets) with a predicted distribution (y-hat, or predictions), so it makes sense to use it here. It is also one of the most commonly used loss functions with a softmax activation on the output layer.
<div>
<img src="images/image5-6.1.png" width="400"/>
</div>
Where $L_i$ denotes the sample loss value, $i$ is the i-th sample in the set, $j$ is the label/output index, $y$ denotes the target values, and $\hat{y}$ denotes the predicted values.
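Written out, and assuming the figure shows the standard form of categorical cross-entropy, the sample loss is the negative sum of each target value times the log of the matching prediction. With one-hot targets, only the term for the correct class survives; here $k$ is a symbol introduced for the index of that correct class:

$$
L_i = -\sum_{j} y_{i,j}\,\log\left(\hat{y}_{i,j}\right) = -\log\left(\hat{y}_{i,k}\right)
$$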

```python
import math

# An example output from the output layer of the neural network
softmax_output = [0.7, 0.1, 0.2]
# Ground truth (one-hot encoded target)
target_output = [1, 0, 0]

# Categorical cross-entropy: negative sum of target * log(prediction)
loss = -(math.log(softmax_output[0]) * target_output[0] +
         math.log(softmax_output[1]) * target_output[1] +
         math.log(softmax_output[2]) * target_output[2])
print(loss)
```

```
0.35667494393873245
```


The equation can be simplified to `loss = -math.log(softmax_output[0])`, because `target_output[1]` and `target_output[2]` are 0. This works because the targets are one-hot encoded: all values except one are zeros, and the correct label's position holds a 1. Multiplying by 0 removes every term except the one for the correct class.
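A quick check of this simplification: the full one-hot sum and the single-term form give the same value. Because a one-hot vector only marks the correct index, that index alone is enough to pick out the confidence. The names `full_loss`, `simplified_loss`, and `correct_class` are illustrative.

```python
import math

softmax_output = [0.7, 0.1, 0.2]
target_output = [1, 0, 0]  # one-hot: the correct class is index 0

# Full form: every log-probability is weighted by its target value
full_loss = -sum(math.log(p) * t for p, t in zip(softmax_output, target_output))

# Simplified form: only the correct class's confidence matters
correct_class = target_output.index(1)
simplified_loss = -math.log(softmax_output[correct_class])

print(full_loss, simplified_loss)
```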

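Log is short for logarithm, defined as the solution for the x-term in an equation of the form $a^x = b$. Both `math.log` (with a single argument) and `np.log` compute the natural logarithm, whose base $a$ is Euler's number e (about 2.718).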
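A minimal sketch of what the natural log does. The log of b is the exponent x that reproduces b when used as a power of e. The loop also shows that the negative log grows quickly as a confidence approaches 0, which is why low confidence in the correct class is penalized heavily. The value `b = 5.2` and the confidences are arbitrary.

```python
import math

b = 5.2
# Natural log: solves e**x == b for x
x = math.log(b)
print(x)
print(math.exp(x))  # recovers b, up to floating-point rounding

# Negative log of a confidence: small when confident, large when not
for confidence in (0.99, 0.9, 0.5, 0.1, 0.01):
    print(confidence, -math.log(confidence))
```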

```python
# pip install numpy nnfs
import numpy as np
import nnfs
from nnfs.datasets import spiral_data

nnfs.init()

# Dense (fully connected) layer
class Layer_Dense:
    # Initialize weights and biases
    def __init__(self, n_inputs, n_neurons):
        self.weights = 0.01 * np.random.randn(n_inputs, n_neurons)
        self.biases = np.zeros((1, n_neurons))

    # Forward pass
    # Calculate output values from inputs, weights and biases
    def forward(self, inputs):
        self.output = np.dot(inputs, self.weights) + self.biases

# ReLU activation
class Activation_ReLU:
    # Forward pass
    def forward(self, inputs):
        # Calculate output values from inputs
        self.output = np.maximum(0, inputs)

# Softmax activation
class Activation_Softmax:
    # Forward pass
    def forward(self, inputs):
        # Get unnormalized probabilities (subtract the row max to avoid overflow)
        exp_values = np.exp(inputs - np.max(inputs, axis=1, keepdims=True))
        # Normalize them for each sample
        probabilities = exp_values / np.sum(exp_values, axis=1, keepdims=True)
        self.output = probabilities

# Common loss class
class Loss:
    # Calculates the data loss
    # given model output and ground truth values
    def calculate(self, output, y):
        # Calculate sample losses
        sample_losses = self.forward(output, y)
        # Calculate mean loss
        data_loss = np.mean(sample_losses)
        # Return loss
        return data_loss

# Cross-entropy loss
class Loss_CategoricalCrossentropy(Loss):
    # Forward pass
    def forward(self, y_pred, y_true):
        # Number of samples in a batch
        samples = len(y_pred)

        # Clip data to prevent log(0)
        # Clip both sides to not drag mean towards any value
        y_pred_clipped = np.clip(y_pred, 1e-7, 1 - 1e-7)

        # Probabilities for target values -
        # only if categorical labels
        if len(y_true.shape) == 1:
            correct_confidences = y_pred_clipped[range(samples), y_true]

        # Mask values - only for one-hot encoded labels
        elif len(y_true.shape) == 2:
            correct_confidences = np.sum(y_pred_clipped * y_true, axis=1)

        # Losses
        negative_log_likelihoods = -np.log(correct_confidences)
        return negative_log_likelihoods

# Create dataset
X, y = spiral_data(samples=100, classes=3)
# Create Dense layer with 2 input features and 3 output values
dense1 = Layer_Dense(2, 3)
# Create ReLU activation (to be used with Dense layer)
activation1 = Activation_ReLU()
# Create second Dense layer with 3 input features (as we take output
# of previous layer here) and 3 output values
dense2 = Layer_Dense(3, 3)
# Create Softmax activation (to be used with Dense layer)
activation2 = Activation_Softmax()

# Create loss function
loss_function = Loss_CategoricalCrossentropy()

# Make a forward pass of our training data through this layer
dense1.forward(X)

# Make a forward pass through activation function
activation1.forward(dense1.output)

# Make a forward pass through second Dense layer
# it takes outputs of activation function of first layer as inputs
dense2.forward(activation1.output)
# Make a forward pass through activation function
# it takes the output of second dense layer here
activation2.forward(dense2.output)

# Let's see output of the first few samples:
print(activation2.output[:5])

# Perform a forward pass through loss function
# it takes the output of the softmax activation and returns the mean loss
loss = loss_function.calculate(activation2.output, y)

# Print loss value
print('loss:', loss)
```

```
[[0.33333334 0.33333334 0.33333334]
 [0.33333316 0.3333332  0.33333364]
 [0.33333287 0.3333329  0.33333418]
 [0.3333326  0.33333263 0.33333477]
 [0.33333233 0.3333324  0.33333528]]
loss: 1.0986104
```
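Two details of this output can be checked by hand. First, the softmax outputs are all close to 1/3, so the mean loss is close to -ln(1/3) = ln 3 ≈ 1.0986, the loss of a uniform guess over three classes. Second, the `np.clip` call in `Loss_CategoricalCrossentropy` matters because log(0) is negative infinity. A minimal sketch, using a made-up batch `y_pred` in which one sample gives zero confidence to its correct class:

```python
import numpy as np

# Loss of a perfectly uniform prediction over 3 classes equals ln(3)
print(-np.log(1 / 3), np.log(3))

# Hypothetical batch: the correct class (index 0) gets zero confidence in sample 0
y_pred = np.array([[0.0, 0.6, 0.4],
                   [0.7, 0.2, 0.1]])
y_true = np.array([0, 0])

# Without clipping, log(0) is -inf, so the mean loss becomes inf
with np.errstate(divide='ignore'):
    unclipped = -np.log(y_pred[range(len(y_pred)), y_true])
print(np.mean(unclipped))

# With clipping to [1e-7, 1 - 1e-7], the loss stays finite
clipped = np.clip(y_pred, 1e-7, 1 - 1e-7)
clipped_losses = -np.log(clipped[range(len(clipped)), y_true])
print(np.mean(clipped_losses))
```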


## Accuracy Calculation
While loss is the metric used to optimize a model, the metric most commonly reported alongside it in practice is ***accuracy***: the fraction of samples for which the class with the largest confidence is the correct class.

```python
import numpy as np

# Probabilities of 3 samples
softmax_outputs = np.array([[0.7, 0.2, 0.1],
                            [0.5, 0.1, 0.4],
                            [0.02, 0.9, 0.08]])

# Target (ground-truth) labels for 3 samples
class_targets = np.array([0, 1, 1])

# Calculate values along second axis (axis of index 1)
predictions = np.argmax(softmax_outputs, axis=1)

# If targets are one-hot encoded - convert them
if len(class_targets.shape) == 2:
    class_targets = np.argmax(class_targets, axis=1)

# True evaluates to 1; False to 0
accuracy = np.mean(predictions == class_targets)
print('acc:', accuracy)
```

```
acc: 0.6666666666666666
```
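The `if` branch in that cell handles one-hot targets. A small check that reuses the same three predictions with the targets written in one-hot form (`one_hot_targets` is constructed here for illustration): converting back with `argmax` recovers the class indices, so the accuracy is still 2 correct out of 3.

```python
import numpy as np

softmax_outputs = np.array([[0.7, 0.2, 0.1],
                            [0.5, 0.1, 0.4],
                            [0.02, 0.9, 0.08]])
# Same targets as class indices [0, 1, 1], written as one-hot rows
one_hot_targets = np.array([[1, 0, 0],
                            [0, 1, 0],
                            [0, 1, 0]])

predictions = np.argmax(softmax_outputs, axis=1)
class_targets = np.argmax(one_hot_targets, axis=1)  # back to [0, 1, 1]

# Element-wise comparison gives booleans; their mean is the fraction correct
accuracy = np.mean(predictions == class_targets)
print('acc:', accuracy)
```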


```python
# Calculate accuracy from output of activation2 and targets
# Calculate values along second axis (axis of index 1)
predictions = np.argmax(activation2.output, axis=1)
# If targets are one-hot encoded - convert them
if len(y.shape) == 2:
    y = np.argmax(y, axis=1)
accuracy = np.mean(predictions == y)
# Print accuracy
print('acc:', accuracy)
```

```
acc: 0.34
```


With a forward pass through the network in place, along with metrics that signal when the model is performing poorly, we can move on to optimization.
## Optimization
Now that the neural network is built, able to have data passed through it, and capable of calculating loss, the next step is to determine how to adjust the weights and biases to decrease the loss.