---
#### **Loss Function**
---
- The ***loss function***, also called the ***cost function***, quantifies how wrong a model’s predictions are.
- The value it returns is the ***loss***. Since loss measures the model’s error, we ideally want it to be 0.
- Mean squared error (or squared error) is commonly used in regression, but for classification we will use ***cross-entropy***.
- ***Categorical cross-entropy*** compares a “ground-truth” probability distribution (y, or “targets”) with a predicted distribution (y-hat, or “predictions”).
- It is one of the most commonly used loss functions with a softmax activation on the output layer.

$$L_{i} = -\sum_{j}y_{i,j}\log(\hat{y}_{i,j})\tag{1}$$

- Where **$L_{i}$** denotes the sample loss value, **$i$** is the index of the sample in the set, $j$ is the label/output index, $y$ denotes the target values, and $\hat{y}$ denotes the predicted values.

$$L_{i} = -\log(\hat{y}_{i,k})\tag{2}$$
- Equation (2) follows from (1) when the targets are one-hot encoded: every term except the one for the target class is multiplied by 0 and drops out.
- Where **$L_{i}$** denotes the sample loss value, **$i$** is the index of the sample in the set, $k$ is the index of the target (ground-truth) label, and $\hat{y}$ denotes the predicted values.
- **NB**: targets can be one-hot encoded vectors. They can also be sparse, which means that each target is simply the index of the correct class.
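
As a quick check of the two formulas, the sketch below evaluates Eq. (1) and Eq. (2) for one sample and confirms that they agree. The values and the names `y_hat`, `y`, `k`, `loss_full`, and `loss_short` are chosen only for illustration.

```python
import numpy as np

# Illustrative single-sample values (assumed, not from a trained model)
y_hat = np.array([0.7, 0.1, 0.2])   # predicted distribution (softmax output)
y = np.array([1, 0, 0])             # one-hot target
k = int(np.argmax(y))               # index of the target class

loss_full = -np.sum(y * np.log(y_hat))   # Eq. (1): sum over all classes
loss_short = -np.log(y_hat[k])           # Eq. (2): target class only

print(loss_full, loss_short)
print(np.isclose(loss_full, loss_short))
```

The zeros in the one-hot vector cancel every term except the one at index `k`, which is why the shorter form is enough when the target class is known.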

---
##### **Categorical Cross-entropy for a single training sample**
---

In [1]:
import math
# An example output from the output layer of the neural network
softmax_output = [0.7, 0.1, 0.2]
# Ground truth
target_output = [1, 0, 0]
loss = -(math.log(softmax_output[0]) * target_output[0] +
         math.log(softmax_output[1]) * target_output[1] +
         math.log(softmax_output[2]) * target_output[2])
print(loss)

0.35667494393873245


---
##### **Categorical Cross-entropy for multiple training samples**
---

In [2]:
softmax_outputs = [[0.7, 0.1, 0.2], # softmax outputs of three samples 
                   [0.1, 0.5, 0.4],
                   [0.02, 0.9, 0.08]]
class_targets = [0, 1, 1] # target class index per sample, with dog=0, cat=1 and human=2

In [3]:
for idx, soft in zip(class_targets, softmax_outputs):
    print(soft[idx])

0.7
0.5
0.9


---
- The first sample has class target 0, so the confidence at index 0 of its softmax output (0.7) is selected.
- The second and third samples both have class target 1, so the confidences at index 1 of their softmax outputs (0.5 and 0.9) are selected.
---

In [6]:
import numpy as np

softmax_outputs = np.array([[0.7, 0.1, 0.2], # softmax outputs of three samples
                            [0.1, 0.5, 0.4],
                            [0.02, 0.9, 0.08]])
# pair row indices 0, 1, 2 with the class targets to pick one confidence per sample
print(softmax_outputs[[0,1,2], class_targets])

[0.7 0.5 0.9]


In [7]:
len(softmax_outputs)

3

In [11]:
print(softmax_outputs[range(len(softmax_outputs)),class_targets])

[0.7 0.5 0.9]


In [12]:
print(-np.log(softmax_outputs[range(len(softmax_outputs)),class_targets])) # per-sample losses for the batch

[0.35667494 0.69314718 0.10536052]


In [14]:
# to find the average loss per batch
losses = -np.log(softmax_outputs[range(len(softmax_outputs)), class_targets])
average_loss = np.mean(losses)
print(average_loss)

0.38506088005216804


In [15]:
# what if the targets are one-hot encoded instead of sparse?
class_targets = np.array([[1, 0, 0],
                          [0, 1, 0],
                          [0, 1, 0]])

In [17]:
if len(class_targets.shape) == 1:
    confidences = softmax_outputs[range(len(softmax_outputs)), class_targets]
elif len(class_targets.shape) == 2:
    confidences = np.sum(softmax_outputs*class_targets, axis = 1)

In [18]:
lg = -np.log(confidences)
print(np.mean(lg))

0.38506088005216804


---
- The softmax output consists of numbers in the range from 0 to 1: a list of confidences.
- It is possible that the model will have full confidence for one label making all the remaining confidences zero. 
- Similarly, it is also possible that the model will assign full confidence to a value that wasn’t the target.
---
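- What happens when the confidence for the target class is 0? ***np.log(0)*** evaluates to negative infinity, so that sample’s loss becomes infinite and so does the mean loss of the batch. Clipping the predictions slightly away from 0 and 1 avoids this.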
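
The effect of a zero confidence can be seen in a small sketch. The confidences below are made-up values, and `eps = 1e-7` is an assumed clipping bound for illustration.

```python
import numpy as np

eps = 1e-7  # assumed clipping bound for this sketch

# Made-up target-class confidences; the first sample got 0.0 for its target
confidences = np.array([0.0, 1.0, 0.5])

with np.errstate(divide="ignore"):   # silence the divide-by-zero warning
    raw = -np.log(confidences)       # first entry is inf
print(raw, np.mean(raw))             # one inf makes the mean inf too

clipped = np.clip(confidences, eps, 1 - eps)
safe = -np.log(clipped)              # every entry is now finite
print(safe, np.mean(safe))
```

A single infinite sample loss makes the batch mean infinite, which is why the predictions are clipped before the logarithm is taken.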

In [19]:
class Loss:
    def calculate(self, output, y):
        sample_loss = self.forward(output, y)
        data_loss = np.mean(sample_loss)
        return data_loss

In [22]:
class CrossEntropy(Loss):
    def forward(self, y_pred, y_true):
        # number of samples in the batch
        samples = len(y_pred)
        # clip the predictions to prevent log(0)
        y_pred_clip = np.clip(y_pred, 1e-7, 1-1e-7)
        # sparse targets: 1-D array of class indices
        if len(y_true.shape) == 1:
            confidence = y_pred_clip[range(samples), y_true]
        # one-hot targets: 2-D array
        elif len(y_true.shape) == 2:
            confidence = np.sum(y_true*y_pred_clip, axis = 1)
        # negative log-likelihood of each sample
        negLL = -np.log(confidence)
        return negLL

In [24]:
loss = CrossEntropy()
los = loss.calculate(softmax_outputs, class_targets)
print(los)

0.38506088005216804
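
As a cross-check, the standalone sketch below computes the same mean loss for sparse and one-hot targets and confirms that the two formats agree. The helper name `categorical_cross_entropy`, its `eps` parameter, and the use of `np.eye` to build the one-hot rows are assumptions made for this example.

```python
import numpy as np

def categorical_cross_entropy(y_pred, y_true, eps=1e-7):
    """Mean categorical cross-entropy for sparse or one-hot targets (illustrative helper)."""
    y_pred = np.clip(np.asarray(y_pred, dtype=float), eps, 1 - eps)
    y_true = np.asarray(y_true)
    if y_true.ndim == 1:        # sparse targets: one class index per sample
        confidences = y_pred[np.arange(len(y_pred)), y_true]
    elif y_true.ndim == 2:      # one-hot targets: one row per sample
        confidences = np.sum(y_pred * y_true, axis=1)
    else:
        raise ValueError("y_true must be 1-D or 2-D")
    return np.mean(-np.log(confidences))

softmax_outputs = np.array([[0.7, 0.1, 0.2],
                            [0.1, 0.5, 0.4],
                            [0.02, 0.9, 0.08]])
sparse_targets = np.array([0, 1, 1])
# Indexing the identity matrix by class index yields the matching one-hot rows
one_hot_targets = np.eye(3, dtype=int)[sparse_targets]

loss_sparse = categorical_cross_entropy(softmax_outputs, sparse_targets)
loss_one_hot = categorical_cross_entropy(softmax_outputs, one_hot_targets)
print(loss_sparse, loss_one_hot)
```

The branch is chosen from the dimensionality of the targets, not the predictions, because the predictions are 2-D in both cases.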
