# Chapter 5: Cross-Entropy Loss Function

- the **loss function** is the algorithm that quantifies how wrong a model is
- since we are performing a classification task, we will use the **cross-entropy** loss function

In [2]:
softmax_output = [0.7, 0.1, 0.2]

- in the code above, we have 3 class confidences; let's assume that the desired prediction is the 1st class (index 0)
- if that's the desired prediction, then the desired probability distribution is `[1, 0, 0]`
- arrays/vectors that look like this are called **one-hot**, which implies that only one of the values is "hot" (1), while the others are not (0)
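
A one-hot vector like `[1, 0, 0]` can be built from a class index in NumPy. One common way, sketched here as an illustration rather than something the chapter relies on, is to pick a row of the identity matrix from `np.eye()`. The names `num_classes` and `target_class` are assumptions for this example:

```python
import numpy as np

num_classes = 3
target_class = 0  # the desired prediction: the 1st class (index 0)

# Row `target_class` of the 3x3 identity matrix is the one-hot vector for that class
one_hot = np.eye(num_classes)[target_class]
print(one_hot)  # [1. 0. 0.]

# Indexing with a list of class indices builds one one-hot row per sample
batch_one_hot = np.eye(num_classes)[[0, 1, 1]]
print(batch_one_hot)
```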
---
- the code below computes the cross-entropy loss for a softmax output of `[0.7, 0.1, 0.2]` and a desired (target) output of `[1, 0, 0]`:

In [3]:
import math


softmax_output = [0.7, 0.1, 0.2]  # example output from the output layer of the neural network
target_output = [1, 0, 0]  # ground truth

loss = -(math.log(softmax_output[0])*target_output[0] + 
         math.log(softmax_output[1])*target_output[1] + 
         math.log(softmax_output[2])*target_output[2])

loss

0.35667494393873245

- that's the complete cross-entropy calculation; however, since the target values for classes 1 and 2 are 0, their terms drop out, so the first term alone yields the same result:

In [4]:
loss = -math.log(softmax_output[0])
loss

0.35667494393873245
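
Written out in general, this is the categorical cross-entropy formula. In the notation below (chosen for this sketch), $y_{j}$ is the target value for class $j$, $\hat{y}_{j}$ is the predicted (softmax) confidence for class $j$, and $k$ is the index of the target class. Because a one-hot target has $y_k = 1$ and $y_j = 0$ for every $j \neq k$, the sum collapses to a single term:

$$
L = -\sum_{j} y_{j} \log(\hat{y}_{j}) = -\log(\hat{y}_{k})
$$

With the numbers above: $L = -(1 \cdot \log 0.7 + 0 \cdot \log 0.1 + 0 \cdot \log 0.2) = -\log 0.7 \approx 0.3567$.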

- any mention of **log** refers to the **natural logarithm** (base $e$), which is what both `math.log()` and `np.log()` compute by default
- the natural log of $b$ is the value of $x$ that solves the equation $e^x = b$
- for example, $e^x = 5.2$ is solved by `np.log(5.2)`

In [5]:
import numpy as np

b = 5.2
np.log(b)

1.6486586255873816

In [10]:
math.e ** 1.6486586255873816

5.199999999999999
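
Since the loss for a one-hot target reduces to the negative log of the correct class's confidence, it helps to see how $-\log(x)$ behaves. This short sketch (the confidence values are arbitrary examples) shows that the loss shrinks toward 0 as the confidence approaches 1 and grows rapidly as the confidence approaches 0:

```python
import math

# Arbitrary example confidences for the correct class
confidences = [0.99, 0.9, 0.7, 0.5, 0.1, 0.01]

# Loss contributed by each confidence: -ln(confidence)
losses = [-math.log(c) for c in confidences]

for c, loss in zip(confidences, losses):
    print(f"confidence {c:<4} -> loss {loss:.4f}")
```

At exactly 0 the logarithm is undefined: `math.log(0)` raises a `ValueError`, while `np.log(0)` returns `-inf` and emits a `RuntimeWarning`.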

### Modifying the Output

In [11]:
# probabilities for 3 samples
softmax_outputs = np.array([[0.7, 0.1, 0.2],  # each row is one sample's distribution over 3 classes
                            [0.1, 0.5, 0.4],  # each row holds 3 class confidences
                            [0.02, 0.9, 0.08]])  # this is a batch of 3 samples

- in this example there are 3 classes; let’s say we’re trying to classify something as a “dog,” “cat,” or “human” 
- a dog is class 0 (index 0), a cat is class 1 (index 1), and a human is class 2 (index 2)
- let's assume the batch of three sample inputs to this neural network maps to the target classes dog, cat, and cat
  - so the targets are `[0, 1, 1]`
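
If the raw labels arrive as names rather than indices, they need to be mapped to class indices first. A minimal sketch, assuming the hypothetical `class_names` and `sample_labels` lists below:

```python
# Hypothetical class names in index order: dog = 0, cat = 1, human = 2
class_names = ["dog", "cat", "human"]
# Hypothetical labels for the batch of 3 samples
sample_labels = ["dog", "cat", "cat"]

# Build a name -> index lookup, then convert each label
name_to_index = {name: index for index, name in enumerate(class_names)}
class_targets = [name_to_index[label] for label in sample_labels]
print(class_targets)  # [0, 1, 1]
```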

In [12]:
softmax_outputs = np.array([[0.7, 0.1, 0.2],
                            [0.1, 0.5, 0.4],
                            [0.02, 0.9, 0.08]])

class_targets = [0, 1, 1]

In [13]:
softmax_outputs = np.array([[0.7, 0.1, 0.2], # each vector is a distribution 
                            [0.1, 0.5, 0.4], # this batch has 3 distributions
                            [0.02, 0.9, 0.08]])

class_targets = [0, 1, 1]

for targ_idx, distribution in zip(class_targets, softmax_outputs):
    print(distribution[targ_idx])

0.7
0.5
0.9


- again, the `zip()` function lets us iterate over multiple iterables at the same time in Python, pairing their elements position by position
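
To make the pairing concrete, the sketch below materializes what `zip()` yields for these targets and distributions (the lists are copies of the values above). Note that `zip()` stops at the shortest iterable, so a targets list shorter than the batch silently drops samples instead of raising an error:

```python
class_targets = [0, 1, 1]
distributions = [[0.7, 0.1, 0.2],
                 [0.1, 0.5, 0.4],
                 [0.02, 0.9, 0.08]]

# Each tuple pairs the i-th target with the i-th distribution
pairs = list(zip(class_targets, distributions))
print(pairs[0])  # (0, [0.7, 0.1, 0.2])

# Mismatched lengths: only 2 pairs are produced for 3 distributions
short_pairs = list(zip([0, 1], distributions))
print(len(short_pairs))  # 2
```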

In [19]:
print(softmax_outputs[[0, 1, 2], class_targets])  # integer-array indexing: row i paired with column class_targets[i]

[0.7 0.5 0.9]


- the `[0, 1, 2]` are the indices of each distribution in the batch
- we’re going to have as many indices as distributions in our entire batch, so we can use `range()`:

In [20]:
print(softmax_outputs[range(len(softmax_outputs)), class_targets])

[0.7 0.5 0.9]
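
The same selection can be written with `np.arange()` in place of `range()`, or with `np.take_along_axis()`. The sketch below is only a cross-check that these forms pick identical values; `np.take_along_axis()` expects an index array with the same number of dimensions as the input, hence the `[:, None]` reshape and the final `ravel()`:

```python
import numpy as np

softmax_outputs = np.array([[0.7, 0.1, 0.2],
                            [0.1, 0.5, 0.4],
                            [0.02, 0.9, 0.08]])
class_targets = np.array([0, 1, 1])

# Row indices 0..n-1 as a NumPy array instead of a Python range
rows = np.arange(len(softmax_outputs))
picked = softmax_outputs[rows, class_targets]

# take_along_axis needs a (3, 1) index array for a 2-D input, so add an axis,
# then flatten the (3, 1) result back to shape (3,)
picked_alt = np.take_along_axis(softmax_outputs, class_targets[:, None], axis=1).ravel()

print(picked)      # [0.7 0.5 0.9]
print(picked_alt)  # [0.7 0.5 0.9]
```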


- now we apply negative log to each value:

In [26]:
print(-np.log(softmax_outputs[range(len(softmax_outputs)), class_targets])) 

[0.35667494 0.69314718 0.10536052]


- finally, what we really want is the average loss per batch 
- NumPy's `np.mean()` function computes this average over an array

In [27]:
neg_log = -np.log(softmax_outputs[range(len(softmax_outputs)), class_targets])
average_loss = np.mean(neg_log)
average_loss  # average loss over the entire batch of 3 samples

0.38506088005216804
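
The mean is just the sum of the per-sample losses divided by the number of samples, which is why `np.sum(...) / samples` and `np.mean(...)` are interchangeable here. A quick check, reusing the three selected confidences from above:

```python
import numpy as np

# Per-sample losses for the correct-class confidences 0.7, 0.5 and 0.9
neg_log = -np.log(np.array([0.7, 0.5, 0.9]))

# Arithmetic mean written out: (sum of losses) / (number of samples)
manual_average = np.sum(neg_log) / len(neg_log)
print(manual_average)
print(np.mean(neg_log))
```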

- let’s convert our loss code into a class for convenience down the line:

In [30]:
# Cross-entropy loss
class Loss_CategoricalCrossentropy:

    def forward(self, y_pred, y_true):

        # Number of samples in a batch
        samples = len(y_pred)

        # Probabilities for target values
        y_pred = y_pred[range(samples), y_true]

        # Losses
        negative_log_likelihoods = -np.log(y_pred)

        # Overall loss
        data_loss = np.mean(negative_log_likelihoods)
        return data_loss

- `y_pred` holds the model outputs (softmax probabilities) and `y_true` holds the integer class targets
- this class performs all the loss calculations that we derived throughout this chapter; we create an instance and call its `forward()` method
- for example, using the manually created outputs and targets:

In [34]:
loss_function = Loss_CategoricalCrossentropy()
loss = loss_function.forward(softmax_outputs, class_targets)
loss

0.38506088005216804
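
The class above assumes integer class targets, although this chapter began with one-hot targets such as `[1, 0, 0]`. It also returns `inf` if the predicted confidence for a target class is exactly 0, because `np.log(0)` is `-inf`. Below is a sketch of one way to handle both cases; the class name `Loss_CategoricalCrossentropyFlexible` and the clipping bound `1e-7` are assumptions for illustration, and clipping is a common workaround rather than something derived in this chapter:

```python
import numpy as np


# Hypothetical variant of the chapter's Loss_CategoricalCrossentropy (not from the original):
# it accepts either integer class targets or one-hot target rows, and clips the
# predictions so that a confidence of exactly 0 cannot produce -log(0) = inf.
class Loss_CategoricalCrossentropyFlexible:

    def forward(self, y_pred, y_true):
        # Number of samples in a batch
        samples = len(y_pred)

        # Assumed clipping bound of 1e-7 keeps every confidence inside (0, 1)
        y_pred_clipped = np.clip(y_pred, 1e-7, 1 - 1e-7)
        y_true = np.asarray(y_true)

        if y_true.ndim == 1:
            # Integer class targets, e.g. [0, 1, 1]
            correct_confidences = y_pred_clipped[range(samples), y_true]
        else:
            # One-hot targets, e.g. [[1, 0, 0], [0, 1, 0], [0, 1, 0]]:
            # multiplying by the one-hot rows zeroes every non-target confidence
            correct_confidences = np.sum(y_pred_clipped * y_true, axis=1)

        # Average negative log-likelihood over the batch
        return np.mean(-np.log(correct_confidences))


softmax_outputs = np.array([[0.7, 0.1, 0.2],
                            [0.1, 0.5, 0.4],
                            [0.02, 0.9, 0.08]])

loss_function = Loss_CategoricalCrossentropyFlexible()
sparse_loss = loss_function.forward(softmax_outputs, [0, 1, 1])
one_hot_loss = loss_function.forward(softmax_outputs, np.eye(3)[[0, 1, 1]])
print(sparse_loss, one_hot_loss)  # both ≈ 0.38506

# A confidence of exactly 0 for the target class now gives a large but finite loss
zero_loss = loss_function.forward(np.array([[1.0, 0.0, 0.0]]), [1])
print(zero_loss)
```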

### Accuracy
- again, loss is a useful metric for optimizing a model
- **accuracy**, however, describes how often the class with the largest confidence matches the target class, expressed as a fraction of the samples
- to find this accuracy, we will use the `argmax()` values from the `softmax_outputs` and then compare these to the targets:

In [46]:
import numpy as np

softmax_outputs = [[0.7, 0.2, 0.1],  # probabilities for 3 samples
                   [0.5, 0.1, 0.4],  # values swapped here compared to examples above to create lower accuracy
                   [0.02, 0.9, 0.08]]
targets = [0, 1, 1]  # target (ground truth) labels for 3 samples

predictions = np.argmax(softmax_outputs, axis=1)  # index of the largest confidence in each row (axis=1)
accuracy = np.mean(predictions==targets) # True evaluates to 1; False to 0

print('acc:', accuracy)

acc: 0.6666666666666666
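
To see the intermediate steps behind that number, the sketch below prints the predictions and the element-wise comparison for the same data. It also shows, as an aside, that if the targets were one-hot rows (as at the start of the chapter), `np.argmax(..., axis=1)` would convert them back to class indices:

```python
import numpy as np

softmax_outputs = np.array([[0.7, 0.2, 0.1],
                            [0.5, 0.1, 0.4],
                            [0.02, 0.9, 0.08]])
targets = np.array([0, 1, 1])

# Index of the largest confidence in each row (one prediction per sample)
predictions = np.argmax(softmax_outputs, axis=1)
print(predictions)  # [0 0 1]

# Element-wise comparison: True where the prediction matches the target
correct = predictions == targets
print(correct)      # [ True False  True]

# Mean of booleans = fraction of correct predictions
accuracy = np.mean(correct)

# If the targets were one-hot rows instead, argmax along axis=1 recovers the indices
one_hot_targets = np.eye(3)[targets]
recovered_targets = np.argmax(one_hot_targets, axis=1)
print(recovered_targets)  # [0 1 1]
```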


- let's add this accuracy measurement to the end of our code

### Full Code

In [48]:
import numpy as np


np.random.seed(0)


def create_data(points, classes):
    X = np.zeros((points*classes, 2))
    y = np.zeros(points*classes, dtype='uint8')
    for class_number in range(classes):
        ix = range(points*class_number, points*(class_number+1))
        r = np.linspace(0.0, 1, points)  # radius
        t = np.linspace(class_number*4, (class_number+1)*4, points) + np.random.randn(points)*0.05
        X[ix] = np.c_[r*np.sin(t*2.5), r*np.cos(t*2.5)]
        y[ix] = class_number
    return X, y


# Dense layer
class Layer_Dense:

    # Layer initialization
    def __init__(self, inputs, neurons):
        # Initialize weights and biases
        self.weights = 0.01 * np.random.randn(inputs, neurons)
        self.biases = np.zeros((1, neurons))

    # Forward pass
    def forward(self, inputs):
        # Calculate output values from input ones, weights and biases
        self.output = np.dot(inputs, self.weights) + self.biases


# ReLU activation
class Activation_ReLU:

    # Forward pass
    def forward(self, inputs):
        self.output = np.maximum(0, inputs)


# Softmax activation
class Activation_Softmax:

    # Forward pass
    def forward(self, inputs):

        # get unnormalized probabilities
        exp_values = np.exp(inputs - np.max(inputs, axis=1, keepdims=True))
        # normalize them for each sample
        probabilities = exp_values / np.sum(exp_values, axis=1, keepdims=True)

        self.output = probabilities


# Cross-entropy loss
class Loss_CategoricalCrossentropy:

    def forward(self, y_pred, y_true):

        # Number of samples in a batch
        samples = len(y_pred)

        # Probabilities for target values
        y_pred = y_pred[range(samples), y_true]

        # Losses
        negative_log_likelihoods = -np.log(y_pred)

        # Average loss
        data_loss = np.sum(negative_log_likelihoods) / samples
        return data_loss

# Create dataset
X, y = create_data(100, 3)

# Create Dense layer with 2 input features and 3 output values
dense1 = Layer_Dense(2, 3)  # first dense layer, 2 inputs (each sample has 2 features), 3 outputs

# Create ReLU activation (to be used with Dense layer):
activation1 = Activation_ReLU()

# Create second Dense layer with 3 input features (as we take output of previous layer here) and 3 output values
dense2 = Layer_Dense(3, 3)  # second dense layer, 3 inputs, 3 outputs

# Create Softmax activation (to be used with Dense layer):
activation2 = Activation_Softmax()

# Create loss function
loss_function = Loss_CategoricalCrossentropy()

# Make a forward pass of our training data through this layer
dense1.forward(X)

# Make a forward pass through the activation function - we take output of previous layer here
activation1.forward(dense1.output)

# Make a forward pass through the second Dense layer - it takes outputs of activation function of first layer as inputs
dense2.forward(activation1.output)

# Make a forward pass through the activation function - we take output of previous layer here
activation2.forward(dense2.output)

# Let's see the output of the first few samples:
print(activation2.output[:5])

# Calculate loss from output of activation2 (softmax activation)
loss = loss_function.forward(activation2.output, y)

# Let's print loss value
print('loss:', loss)

# Calculate accuracy from output of activation2 and targets
predictions = np.argmax(activation2.output, axis=1)  # index of the largest confidence in each row (axis=1)
accuracy = np.mean(predictions==y)

# Print accuracy
print('accuracy:', accuracy) 

[[0.33333333 0.33333333 0.33333333]
 [0.3333332  0.33333321 0.33333358]
 [0.333333   0.33333302 0.33333397]
 [0.33333271 0.33333275 0.33333454]
 [0.33333247 0.33333252 0.33333501]]
loss: 1.0986095611088502
accuracy: 0.33
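
The printed softmax outputs are all close to 1/3, which is consistent with the small random initial weights (`0.01 * np.random.randn`). Assuming near-uniform outputs, the loss should sit near $-\log(1/3) = \log 3$ and the accuracy near chance level (1/3), which matches the values above. A quick check:

```python
import math

num_classes = 3

# If every softmax output is exactly 1/3, each sample's loss is -ln(1/3) = ln(3)
uniform_loss = -math.log(1 / num_classes)
print(uniform_loss)  # ≈ 1.0986

# Picking a class uniformly at random would be right about 1 time in 3
chance_accuracy = 1 / num_classes
print(chance_accuracy)
```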
