In [3]:
import math

In [4]:
softmax_output = [0.7, 0.1, 0.2]
target_output = [1, 0, 0]

In [5]:
# Full categorical cross-entropy: -sum_i y_i * log(p_i)
loss = -(math.log(softmax_output[0]) * target_output[0] +
         math.log(softmax_output[1]) * target_output[1] +
         math.log(softmax_output[2]) * target_output[2])
print(loss)

0.35667494393873245


In [6]:
# With a one-hot target, only the true-class term is non-zero
loss = -math.log(softmax_output[0])
print(loss)

0.35667494393873245


In [7]:
print(-math.log(0.9))
print(-math.log(0.75))
print(-math.log(0.5))
print(-math.log(0.2))

0.10536051565782628
0.2876820724517809
0.6931471805599453
1.6094379124341003


In [8]:
import numpy as np

In [11]:
softmax_outputs = np.array([[0.7, 0.1, 0.2],
                            [0.1, 0.5, 0.4],
                            [0.02, 0.9, 0.08]])
class_targets = [0, 1, 1]

In [14]:
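# Column 0 of every row
print(softmax_outputs[[0, 1, 2], [0, 0, 0]])
# Probability assigned to each sample's target class
print(softmax_outputs[[0, 1, 2], class_targets])
# Same selection without hard-coding the row indices
print(softmax_outputs[range(len(softmax_outputs)), class_targets])
# Per-sample cross-entropy loss
print(-np.log(softmax_outputs[range(len(softmax_outputs)), class_targets]))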

[0.7  0.1  0.02]
[0.7 0.5 0.9]
[0.7 0.5 0.9]
[0.35667494 0.69314718 0.10536052]


### Cross-Entropy Loss Function

The **cross-entropy loss** (also known as log loss) is commonly used in classification tasks to measure the difference between the true distribution of labels and the predicted distribution (output from softmax or sigmoid). It quantifies how well the model's predicted probabilities align with the actual labels.

For a single prediction, the formula for cross-entropy loss is:

$$
\text{Cross-Entropy Loss} = -\sum_{i=1}^{n} y_i \log(p_i)
$$

Where:
- $ y_i $ is the true label for class $i$ (1 if class $i$ is the correct class, 0 otherwise).
- $ p_i $ is the predicted probability for class $i$ (output from the softmax or sigmoid).
- $ n $ is the total number of classes.
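
As a rough sketch of how this formula applies to a whole batch, the snippet below evaluates the sum over classes with one-hot targets. The helper name `categorical_cross_entropy` and the `one_hot` array are illustrative assumptions, not part of NumPy or any other library.

```python
import numpy as np

def categorical_cross_entropy(probs, one_hot_targets):
    """Per-sample loss -sum_i y_i * log(p_i). Hypothetical helper for illustration."""
    probs = np.asarray(probs, dtype=float)
    one_hot_targets = np.asarray(one_hot_targets, dtype=float)
    # Multiply element-wise, then sum over the class axis (axis=1).
    return -np.sum(one_hot_targets * np.log(probs), axis=1)

softmax_outputs = np.array([[0.7, 0.1, 0.2],
                            [0.1, 0.5, 0.4],
                            [0.02, 0.9, 0.08]])
# One-hot encoding of the class indices [0, 1, 1]
one_hot = np.array([[1, 0, 0],
                    [0, 1, 0],
                    [0, 1, 0]])

per_sample = categorical_cross_entropy(softmax_outputs, one_hot)
print(per_sample)         # one loss value per sample
print(per_sample.mean())  # average loss over the batch
```

Because each one-hot row has a single 1, the sum keeps only the log of the probability assigned to the correct class, so the result is the same as indexing that probability directly.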

In **binary classification**, the cross-entropy loss simplifies to:

$$
\text{Loss} = -[y \log(p) + (1 - y) \log(1 - p)]
$$

Here:
- $ y $ is the true label (either 0 or 1).
- $ p $ is the predicted probability of the positive class.
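
A minimal sketch of the binary formula, assuming a single sample and a probability strictly between 0 and 1. The function name `binary_cross_entropy` is chosen here for illustration only.

```python
import math

def binary_cross_entropy(y, p):
    """Loss for one sample: -[y*log(p) + (1-y)*log(1-p)]. Assumes 0 < p < 1."""
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

# Positive label, model is confident and correct: small loss
print(binary_cross_entropy(1, 0.9))
# Negative label, model is confident and wrong: large loss
print(binary_cross_entropy(0, 0.9))
```

When `y` is 1 only the first term contributes, and when `y` is 0 only the second does, so each sample is scored on the probability the model gave to its actual label.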

#### Why Cross-Entropy is Effective:
- **Low loss for correct predictions**: If the predicted probability for the correct class is close to 1, the log of that value is close to 0, resulting in a small loss.
- **High loss for incorrect predictions**: If the model assigns a low probability to the correct class (close to 0), the log of that value is a large negative number, so its negation is a high loss. For example, $-\log(0.9) \approx 0.105$ while $-\log(0.2) \approx 1.609$.

### The Problem with Confidence Value 0

If the predicted probability $ p $ for the correct class is **exactly 0**, the cross-entropy loss becomes problematic. This is because:

$$
\log(0) = -\infty
$$

Therefore, if a model predicts a probability of **0** for the true class, the loss $-\log(0)$ is **infinite**. This causes two problems:
- **Numerical instability**: A single infinite sample loss makes the mean loss over the whole batch infinite, so the loss value no longer says anything useful about the model.
- **Undefined gradients**: The derivative of $-\log(p)$ with respect to $p$ is $-1/p$, which grows without bound as $p \to 0$, so the gradients needed to adjust the model's weights explode or become undefined.
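
The behaviour described above can be reproduced in a few lines. This is only a demonstration with made-up probabilities: NumPy returns `-inf` for `np.log(0.0)` (normally with a `RuntimeWarning`, silenced here via `np.errstate`), while the standard-library `math.log` raises an exception instead.

```python
import math
import numpy as np

# NumPy does not raise; it returns -inf, so the negated loss is +inf.
with np.errstate(divide="ignore"):
    loss = -np.log(0.0)
print(loss)

# The standard-library math.log raises a ValueError instead.
try:
    math.log(0)
except ValueError as exc:
    print("math.log(0) raised:", exc)

# One zero-probability sample makes the mean loss of the batch infinite too.
with np.errstate(divide="ignore"):
    batch_losses = -np.log(np.array([0.7, 0.5, 0.0]))
print(batch_losses)
print(batch_losses.mean())
```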

### How to Solve the Issue: **Clipping** the Predictions

To avoid taking the log of 0, we clip the predicted probabilities using a small constant $ \epsilon $, so that no probability is exactly 0 or 1. This is sometimes loosely called smoothing, but it is not the same as **label smoothing**, which softens the target labels rather than the predictions.

Concretely, each predicted probability is clipped to the range $ [\epsilon, 1 - \epsilon] $, with a very small positive $ \epsilon $ such as $ 10^{-15} $. The upper bound matters in the binary formula, where $ \log(1 - p) $ appears:

$$
p_i = \max(\epsilon, \min(1 - \epsilon, p_i))
$$

This ensures that:

- The log function never receives a 0 as input.
- The loss remains finite and the model can continue learning.

### Modified Cross-Entropy Formula with Clipping:
After clipping the probabilities to avoid $ \log(0) $, the cross-entropy formula becomes:

$$
\text{Cross-Entropy Loss} = -\sum_{i=1}^{n} y_i \log(\max(\epsilon, \min(1 - \epsilon, p_i)))
$$

Where $ \epsilon $ is a small positive constant (e.g., $ 10^{-15} $) that keeps the argument of the logarithm strictly between 0 and 1, so $ \log(0) $ never occurs.
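
One way to sketch the clipped loss in NumPy is with `np.clip`, assuming targets are given as class indices. The function name `clipped_cross_entropy`, the default `eps=1e-15`, and the sample probabilities are assumptions made for this example.

```python
import numpy as np

def clipped_cross_entropy(probs, class_targets, eps=1e-15):
    """Per-sample cross-entropy with predictions clipped to [eps, 1 - eps].

    Hypothetical helper for illustration; class_targets holds class indices.
    """
    probs = np.asarray(probs, dtype=float)
    clipped = np.clip(probs, eps, 1 - eps)
    # Pick the (clipped) probability assigned to each sample's true class.
    correct = clipped[np.arange(len(clipped)), class_targets]
    return -np.log(correct)

softmax_outputs = np.array([[0.7, 0.1, 0.2],
                            [0.0, 1.0, 0.0],   # true class 0 gets probability 0
                            [0.0, 1.0, 0.0]])  # true class 1 gets probability 1
class_targets = [0, 0, 1]

losses = clipped_cross_entropy(softmax_outputs, class_targets)
print(losses)         # every entry is finite
print(losses.mean())
```

With this choice of `eps`, the worst possible per-sample loss is capped at $-\log(10^{-15}) \approx 34.5$ instead of infinity, and a perfectly confident correct prediction gives a loss that is tiny but not exactly zero.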

By clipping the predicted probabilities in this way, we avoid the infinite loss problem and keep training numerically stable.
