## Categorical Cross-Entropy Loss

**Categorical cross-entropy** is a loss function commonly used in machine learning and deep learning algorithms for <u>multi-class classification problems</u>. 
It measures the dissimilarity between the predicted probability distribution and the true probability distribution of the classes.  

Categorical cross-entropy is often employed when the classes are mutually exclusive, meaning *each input belongs to only one class*.

The formula for categorical cross-entropy is as follows:

$$
\text{CategoricalCrossEntropy}(\mathbf{y}, \mathbf{p}) = -\sum_{i=1}^{n} y_i \cdot \log(p_i)
$$

In this formula, $n$ is the number of classes, $\mathbf{y} = [y_1, y_2, \ldots, y_n]$ is the true probability distribution over the classes (a one-hot vector for a hard label), and $\mathbf{p} = [p_1, p_2, \ldots, p_n]$ is the predicted probability distribution, which is usually **the output of a final layer with a softmax activation**. The sum $\sum$ runs over all classes, and $\log$ is the natural logarithm. Because every $p_i$ lies in $(0, 1]$, each $\log(p_i)$ is zero or negative. The leading minus sign therefore makes the loss non-negative, so that minimizing it drives the predicted distribution $\mathbf{p}$ toward the true distribution $\mathbf{y}$. With a one-hot $\mathbf{y}$, only the term for the true class $c$ survives, and the loss reduces to $-\log(p_c)$. For a batch of samples, the per-sample losses are usually averaged.
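As a small worked example of the formula, here is one sample with three classes and an illustrative prediction (the numbers are not from the text, and `cce_single` is a hypothetical helper name). The true class is index 1, so the loss should equal $-\log(0.7)$:

```python
import math

def cce_single(y, p):
    """-sum_i y_i * log(p_i) for a single sample."""
    # Terms with y_i == 0 contribute nothing, so they are skipped
    # (this also avoids evaluating log(0) when some p_i is exactly 0).
    return -sum(y_i * math.log(p_i) for y_i, p_i in zip(y, p) if y_i > 0)

y = [0.0, 1.0, 0.0]   # one-hot target: the true class is index 1
p = [0.2, 0.7, 0.1]   # illustrative predicted distribution (sums to 1)

loss = cce_single(y, p)
print(round(loss, 4))  # prints 0.3567, i.e. -log(0.7)
```

Moving probability away from the true class (for example predicting 0.1 for it) raises the loss, and a perfect prediction of 1.0 gives a loss of 0.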

### Other aliases of categorical cross-entropy

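Categorical cross-entropy goes by several other names, depending on the field and the library. Common alternatives include:

- Log loss (a name also widely used for the binary case)
- Multiclass cross-entropy
- Softmax cross-entropy (when the predictions come from a softmax layer)
- Negative log-likelihood loss (equivalent when the targets are one-hot, since the loss is then the negative log of the predicted probability of the true class)
- Cross-entropy loss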

### Application

When **training a neural network**, the goal is to <u>minimize the categorical cross-entropy loss</u> over the training data. This is typically done with gradient-based optimization algorithms, such as gradient descent (**GD**), stochastic gradient descent (**SGD**), adaptive moment estimation (**Adam**), or **Sophia** (a stochastic second-order optimizer introduced in 2023), which use the gradient of the loss to update the network's parameters iteratively.
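For softmax outputs, the gradient of the loss with respect to the logits $\mathbf{z}$ (the inputs to the softmax) has the simple form $\mathbf{p} - \mathbf{y}$. The sketch below assumes a single sample and applies plain gradient descent directly to the logits, as a stand-in for updating real network parameters; the starting logits, the learning rate, and the helper names are illustrative assumptions:

```python
import math

def softmax(z):
    m = max(z)  # shift by the max so exp() cannot overflow
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def cce_from_logits(z, target):
    # -log softmax(z)[target], computed stably as logsumexp(z) - z[target]
    m = max(z)
    return m + math.log(sum(math.exp(v - m) for v in z)) - z[target]

def grad_wrt_logits(z, target):
    # dL/dz_i = p_i - y_i, where y is the one-hot vector for `target`
    p = softmax(z)
    return [p_i - (1.0 if i == target else 0.0) for i, p_i in enumerate(p)]

z = [0.4, 1.5, 0.8]  # illustrative logits for one sample
target = 0           # true class index
lr = 0.5             # hypothetical learning rate

losses = []
for _ in range(20):
    losses.append(cce_from_logits(z, target))
    g = grad_wrt_logits(z, target)
    z = [z_i - lr * g_i for z_i, g_i in zip(z, g)]  # plain gradient-descent step

print(f"first loss: {losses[0]:.4f}, last loss: {losses[-1]:.4f}")
```

The loss falls at every step because the update moves probability toward the true class. Optimizers such as Adam use the same gradient but rescale it per parameter.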

### Additional insights on Categorical Cross-Entropy Loss

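1. **One-Hot Encoding**: In the formula above, the true class labels are one-hot encoded, where each class is represented as a binary vector with a single '1' and '0's elsewhere. Many implementations also accept integer class indices instead (PyTorch's `F.cross_entropy` does, and the NumPy function below converts indices to one-hot internally). The two forms are equivalent because only the true class contributes a non-zero term.
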
2. **Overfitting Detection**: A categorical cross-entropy loss that increases on the validation set while the training loss keeps decreasing is a common sign of overfitting, and suggests adding regularization or using early stopping.

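3. **Normalization**: The predictions must form a valid probability distribution (non-negative values that sum to 1), which is usually obtained by applying softmax to the logits. Some library functions, such as PyTorch's `F.cross_entropy`, expect raw logits and apply log-softmax internally, so softmax must not be applied beforehand.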

4. **Label Smoothing**: To improve model robustness, label smoothing is sometimes applied by replacing the '1' in the one-hot encoded labels with a value slightly less than 1 and distributing the remaining value across other classes.

5. **Class Imbalance Handling**: When some classes have significantly more samples than others, the average loss is dominated by the frequent classes. Techniques such as class weighting (scaling each sample's loss by a weight assigned to its true class) or data augmentation for the under-represented classes can reduce this effect.

6. **Early Stopping**: Categorical cross-entropy loss can be used as a criterion for early stopping during training. If the loss on a validation set starts to increase consistently after a certain number of epochs, training can be halted to prevent overfitting and select the best-performing model.

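7. **Information Gain**: Cross-entropy decomposes as $H(\mathbf{y}, \mathbf{p}) = H(\mathbf{y}) + D_{\mathrm{KL}}(\mathbf{y} \,\|\, \mathbf{p})$, where $H(\mathbf{y})$ is the entropy of the true distribution and $D_{\mathrm{KL}}$ is the Kullback-Leibler divergence (also called relative entropy, or information gain in some contexts). $H(\mathbf{y})$ does not depend on the model and is 0 for one-hot labels, so minimizing the loss is equivalent to minimizing this divergence, i.e., the extra information (in nats) needed to encode the true labels with a code built for $\mathbf{p}$ instead of $\mathbf{y}$.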
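Label smoothing and class weighting (items 4 and 5) can be sketched in a few lines of plain Python. This assumes the smoothing scheme used by the `label_smoothing` argument of PyTorch's `F.cross_entropy`, which mixes the one-hot target with a uniform distribution (the true class gets $1 - \varepsilon + \varepsilon/K$, every other class $\varepsilon/K$), and a weighted mean that divides by the summed weights of the true classes, as `F.cross_entropy` does with `weight` and `reduction='mean'`. The helper names, `eps`, and `class_weights` are hypothetical:

```python
import math

def smooth_labels(target, num_classes, eps=0.1):
    # Mix the one-hot target with a uniform distribution over all classes.
    return [(1.0 - eps) * (1.0 if i == target else 0.0) + eps / num_classes
            for i in range(num_classes)]

def cross_entropy(y, p):
    # -sum_i y_i * log(p_i); y may be any distribution, not only one-hot
    return -sum(y_i * math.log(p_i) for y_i, p_i in zip(y, p) if y_i > 0)

def weighted_mean_cce(targets, probs, class_weights):
    # Scale each sample's loss by the weight of its true class,
    # then divide by the sum of those weights.
    num = sum(class_weights[t] * -math.log(row[t]) for t, row in zip(targets, probs))
    den = sum(class_weights[t] for t in targets)
    return num / den

p = [0.2, 0.7, 0.1]
y_smooth = smooth_labels(1, num_classes=3, eps=0.1)
print([round(v, 4) for v in y_smooth])
print(round(cross_entropy(y_smooth, p), 4))

targets = [1, 0, 2]
probs = [[0.2, 0.7, 0.1], [0.5, 0.3, 0.2], [0.1, 0.1, 0.8]]
class_weights = [1.0, 1.0, 3.0]  # hypothetical: upweight the rare class 2
print(round(weighted_mean_cce(targets, probs, class_weights), 4))
```

With all weights equal to 1, `weighted_mean_cce` reduces to the ordinary mean loss.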

These insights provide a solid foundation for utilizing Categorical Cross-Entropy loss effectively in various machine learning tasks, particularly in multi-class classification problems.
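The information-theoretic reading of the loss rests on the identity $H(\mathbf{y}, \mathbf{p}) = H(\mathbf{y}) + D_{\mathrm{KL}}(\mathbf{y} \,\|\, \mathbf{p})$, which can be checked numerically. A minimal sketch with illustrative distributions, one one-hot and one soft target:

```python
import math

def entropy(y):
    # H(y) = -sum_i y_i * log(y_i), with 0 * log(0) taken as 0
    return -sum(v * math.log(v) for v in y if v > 0)

def cross_entropy(y, p):
    # H(y, p) = -sum_i y_i * log(p_i)
    return -sum(yi * math.log(pi) for yi, pi in zip(y, p) if yi > 0)

def kl_divergence(y, p):
    # D_KL(y || p) = sum_i y_i * log(y_i / p_i)
    return sum(yi * math.log(yi / pi) for yi, pi in zip(y, p) if yi > 0)

p = [0.2, 0.7, 0.1]
for y in ([0.0, 1.0, 0.0], [0.1, 0.8, 0.1]):
    h, kl, ce = entropy(y), kl_divergence(y, p), cross_entropy(y, p)
    print(f"H={h:.4f}  KL={kl:.4f}  CE={ce:.4f}  H+KL={h + kl:.4f}")
```

For the one-hot target, $H(\mathbf{y}) = 0$ and the cross-entropy equals the KL divergence exactly.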

**Below, we compute the categorical cross-entropy loss in two ways: with PyTorch and with NumPy.**

### Import libraries

```python
import numpy as np
import torch
import torch.nn.functional as F
```

### Compute Categorical Cross-Entropy with PyTorch

```python
# Define the ground truth labels and the predicted logits
true_labels_torch = torch.tensor([1, 0, 2])  # Ground truth labels (class indices)

predicted_logits_torch = torch.tensor([[0.4, 1.5, 0.8],
                                       [0.3, 0.9, 0.2],
                                       [0.6, 0.1, 1.5]])  # Raw, unnormalized logits

# F.cross_entropy applies log-softmax internally, so it takes the logits directly.
# Passing softmax probabilities here would apply softmax twice.
loss = F.cross_entropy(predicted_logits_torch, true_labels_torch)

# Equivalent two-step form: log-softmax, then negative log-likelihood
loss_nll = F.nll_loss(F.log_softmax(predicted_logits_torch, dim=1), true_labels_torch)

print('Categorical Cross-Entropy Loss with PyTorch:', loss.item())
print('Same loss via log_softmax + nll_loss:', loss_nll.item())
```

Both printed values are approximately 0.8074 (the trailing digits depend on float32 rounding), which matches the NumPy result below.
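`F.cross_entropy` expects raw logits and applies log-softmax internally, so feeding it softmax outputs applies softmax twice and yields a different, incorrect loss. The effect can be reproduced without PyTorch; this plain-Python sketch uses the same logits and labels as the PyTorch example, and the helper names are illustrative:

```python
import math

def softmax(row):
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def mean_cce_from_logits(logits, labels):
    # What F.cross_entropy computes by default: mean of -log softmax(row)[label]
    return sum(-math.log(softmax(row)[c]) for row, c in zip(logits, labels)) / len(labels)

logits = [[0.4, 1.5, 0.8],
          [0.3, 0.9, 0.2],
          [0.6, 0.1, 1.5]]
labels = [1, 0, 2]

correct = mean_cce_from_logits(logits, labels)
probs = [softmax(row) for row in logits]
double_softmax = mean_cce_from_logits(probs, labels)  # softmax applied twice

print(f"logits passed in:        {correct:.4f}")
print(f"probabilities passed in: {double_softmax:.4f}")
```

The first value agrees with the NumPy computation below. The second is what the softmax-then-`F.cross_entropy` pattern reports: the second softmax flattens the distribution toward uniform, so the loss is pulled toward $\log 3$.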


### Compute Categorical Cross-Entropy with Numpy

##### Custom functions for softmax and Categorical Cross-Entropy

```python
def softmax(x):
    """
    Compute the row-wise softmax of a 2-D array of logits.

    Args:
        x (numpy.ndarray): Logits. Shape (batch_size, num_classes).

    Returns:
        numpy.ndarray: Softmax probabilities, same shape as x; each row sums to 1.
    """
    # Subtract each row's maximum before exponentiating to avoid overflow
    e_x = np.exp(x - np.max(x, axis=1, keepdims=True))
    return e_x / np.sum(e_x, axis=1, keepdims=True)
```
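Subtracting the row maximum does not change the result, because softmax is invariant to adding the same constant to every logit, but it keeps `exp` from overflowing. A plain-Python sketch with deliberately large, illustrative logits (in plain Python `math.exp` raises `OverflowError`; NumPy's `np.exp` instead returns `inf` with a warning, which turns the probabilities into `nan`):

```python
import math

def naive_softmax(z):
    exps = [math.exp(v) for v in z]  # overflows for large logits
    total = sum(exps)
    return [e / total for e in exps]

def stable_softmax(z):
    m = max(z)
    exps = [math.exp(v - m) for v in z]  # largest term is exp(0) = 1
    total = sum(exps)
    return [e / total for e in exps]

small = [0.4, 1.5, 0.8]
large = [1000.0, 1001.0, 1002.0]

print(stable_softmax(large))
try:
    naive_softmax(large)
except OverflowError as err:
    print("naive softmax failed:", err)
```

For ordinary logits such as `small`, both versions give the same probabilities.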

```python
def categorical_cross_entropy(y_true, y_pred):
    """
    Compute categorical cross-entropy loss between y_true and y_pred.

    Args:
        y_true (numpy.ndarray): True labels in index format. Shape (batch_size,).
        y_pred (numpy.ndarray): Predicted probabilities for each class. Shape (batch_size, num_classes).

    Returns:
        float: Categorical cross-entropy loss, averaged over the batch.
    """
    epsilon = 1e-10  # small value that keeps log() away from log(0)

    # Convert true labels to one-hot encoded format
    num_classes = y_pred.shape[1]
    y_true_one_hot = np.eye(num_classes)[y_true]

    # Clip predicted values into [epsilon, 1 - epsilon] to prevent log(0)
    y_pred = np.clip(y_pred, epsilon, 1.0 - epsilon)

    # Compute the cross-entropy, averaged over the batch
    loss = -np.sum(y_true_one_hot * np.log(y_pred + epsilon)) / y_true.shape[0]

    return loss
```
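Because the one-hot vector zeroes out every term except the true class, the same loss can be computed by indexing each true label's probability directly, without building the one-hot matrix. Clipping matters whenever a predicted probability is exactly 0, since `log(0)` is undefined (plain Python's `math.log(0)` raises `ValueError`). A plain-Python sketch; the function names and the `eps` default are assumptions that mirror the cell above:

```python
import math

def cce_from_indices(labels, probs, eps=1e-10):
    # Pick p[label] for each sample instead of multiplying by a one-hot row,
    # clipping to [eps, 1 - eps] so log never sees 0.
    total = 0.0
    for c, row in zip(labels, probs):
        p_true = min(max(row[c], eps), 1.0 - eps)
        total += -math.log(p_true)
    return total / len(labels)

def cce_from_one_hot(labels, probs, eps=1e-10):
    # Literal -sum_i y_i * log(p_i) with an explicit one-hot target.
    num_classes = len(probs[0])
    total = 0.0
    for c, row in zip(labels, probs):
        y = [1.0 if i == c else 0.0 for i in range(num_classes)]
        total += -sum(y_i * math.log(min(max(p_i, eps), 1.0 - eps))
                      for y_i, p_i in zip(y, row))
    return total / len(labels)

labels = [1, 0, 2]
# The exact 0.0 belongs to a non-true class: harmless when indexing, but the
# one-hot product still evaluates log(p_i) for it, so it must be clipped.
probs = [[0.2, 0.7, 0.1], [0.5, 0.3, 0.2], [0.0, 0.1, 0.9]]

print(cce_from_indices(labels, probs), cce_from_one_hot(labels, probs))
print(cce_from_indices([0], [[0.0, 1.0]]))  # finite thanks to clipping: -log(1e-10) ≈ 23.03
```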

```python
# Define the ground truth labels and predicted logits
true_labels_numpy = np.array([1, 0, 2])  # Ground truth labels (indices)

predicted_logits_numpy = np.array([[0.4, 1.5, 0.8],
                                   [0.3, 0.9, 0.2],
                                   [0.6, 0.1, 1.5]])  # Predicted logits

# Apply softmax to convert logits into probabilities
predicted_probs_numpy = softmax(predicted_logits_numpy)

# Compute the categorical cross-entropy loss
loss_numpy = categorical_cross_entropy(true_labels_numpy, predicted_probs_numpy)

print('Categorical Cross-Entropy Loss with Numpy:', loss_numpy)
```

Categorical Cross-Entropy Loss with Numpy: 0.8074344512710289
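To put the value of about 0.807 in context, it helps to compare it with two reference points: a perfect prediction has loss 0, and a model that always outputs the uniform distribution over $K$ classes has loss $\log K$ (about 1.0986 for $K = 3$). Since the loss is a mean of $-\log(p_c)$ terms, $e^{-\text{loss}}$ is the geometric mean of the probabilities assigned to the true classes. A plain-Python sketch, with the observed value copied from the output above:

```python
import math

K = 3
uniform_loss = -math.log(1.0 / K)  # loss of always predicting 1/K for every class
perfect_loss = 0.0                 # -log(1): the true class gets probability 1

observed_loss = 0.8074344512710289           # value printed by the NumPy cell above
geo_mean_p_true = math.exp(-observed_loss)   # geometric mean of p(true class)

print(f"uniform baseline: {uniform_loss:.4f}")
print(f"perfect:          {perfect_loss:.4f}")
print(f"geometric-mean probability of the true class: {geo_mean_p_true:.4f}")
```

The example model does better than guessing uniformly, but on average it still assigns well under half of the probability mass to the correct class.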
