# Understanding Cross-Entropy Loss and Multi-Label Loss

Cross-Entropy Loss and Multi-Label Loss are two standard objectives for training classification models. This notebook explains what each one is for, how they differ, and how they are formulated mathematically, then checks the multi-label case numerically in PyTorch.

## Cross-Entropy Loss

Cross-Entropy Loss is primarily used in multi-class classification problems, where each observation belongs to exactly one class. It measures how closely the predicted probability distribution over the classes matches the actual distribution, which is typically one-hot. The further the prediction diverges from the actual distribution, the higher the loss, so a confident wrong prediction is penalized most heavily.

The Cross-Entropy Loss for a single observation and `C` classes is given by:

$$
L = -\sum_{c=1}^{C} y_{o,c} \log(p_{o,c})
$$

where:
- `L` is the loss for one observation,
- `C` is the number of classes,
- `y_{o,c}` is a binary indicator that equals 1 if class `c` is the correct classification for observation `o` and 0 otherwise,
- `p_{o,c}` is the predicted probability of observation `o` being of class `c`, with the probabilities summing to 1 over the `C` classes.
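
To make the formula concrete, here is a minimal sketch in plain Python that evaluates it for one observation. The probability vector is a made-up example rather than the output of any model, and `cross_entropy` is a hypothetical helper name.

```python
import math

def cross_entropy(y, p):
    """Return -sum_c y_c * log(p_c) for one observation."""
    # Skip classes with y_c == 0 so log(0) is never evaluated for them.
    return -sum(y_c * math.log(p_c) for y_c, p_c in zip(y, p) if y_c)

# Hypothetical predicted probabilities over C = 3 classes (they sum to 1).
p = [0.7, 0.2, 0.1]

loss_correct = cross_entropy([1, 0, 0], p)  # true class got probability 0.7
loss_wrong = cross_entropy([0, 0, 1], p)    # true class got probability 0.1

print(f"true class predicted with p=0.7: {loss_correct:.4f}")
print(f"true class predicted with p=0.1: {loss_wrong:.4f}")
```

With a one-hot target the sum collapses to the negative log-probability of the true class: about 0.357 when that class receives probability 0.7, and about 2.303 when it receives only 0.1.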

## Multi-Label Loss

For scenarios where an observation can be associated with multiple classes simultaneously, Multi-Label Loss is used. It differs from Cross-Entropy Loss, which assumes each observation is associated with a single class.
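
The practical difference shows up in how raw scores become probabilities. The sketch below, in plain Python with made-up scores, contrasts a softmax (the usual choice for single-label problems, where classes compete) with an element-wise sigmoid (the usual choice for multi-label problems, where each class is scored on its own). The helper names are illustrative.

```python
import math

def softmax(logits):
    """Map raw scores to probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(z):
    """Map one raw score to an independent probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical raw scores for one observation over 3 classes.
logits = [2.0, 1.5, -1.0]

single_label = softmax(logits)              # classes compete for one label
multi_label = [sigmoid(z) for z in logits]  # each class judged independently

print("softmax:", [round(v, 4) for v in single_label], "sum =", round(sum(single_label), 4))
print("sigmoid:", [round(v, 4) for v in multi_label], "sum =", round(sum(multi_label), 4))
```

The softmax outputs always sum to 1, so raising one class's probability lowers the others. The sigmoid outputs carry no such constraint, which is what allows several labels to be predicted as present at once.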

A common approach to Multi-Label Loss is Binary Cross-Entropy Loss, which treats each class as an independent binary classification problem. Averaged over `N` samples and `C` classes, which is the normalization PyTorch's `nn.MultiLabelSoftMarginLoss` uses, the formula is:

$$
L = -\frac{1}{NC} \sum_{n=1}^{N} \sum_{c=1}^{C} \left[ y_{n,c} \log(\sigma(x_{n,c})) + (1 - y_{n,c}) \log(1 - \sigma(x_{n,c})) \right]
$$

where:
- `N` is the number of samples,
- `C` is the number of classes,
- `y_{n,c}` is a binary indicator that equals 1 if label `c` is present for sample `n` and 0 otherwise,
- `x_{n,c}` is the raw output (logit) of the model for class `c` for sample `n`,
- `\sigma` denotes the sigmoid function, which converts the raw output into a probability.
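
The arithmetic can be checked by hand. The sketch below uses plain Python and the same logits and targets as the PyTorch cells further down. It computes each per-label term, then reports two normalizations: averaging over samples only, and averaging over both samples and classes. The second is the one PyTorch's `nn.MultiLabelSoftMarginLoss` and a mean-reduced `nn.BCELoss` report. The helper names are illustrative.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def bce_term(y, x):
    """Binary cross-entropy for one label, given target y and raw output x."""
    p = sigmoid(x)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

# 2 samples, 3 labels (same values as the PyTorch cells in this notebook).
logits = [[0.1, -0.2, 0.0], [0.4, 0.1, -0.3]]
targets = [[1, 0, 1], [0, 1, 0]]

N = len(logits)
C = len(logits[0])

# One loss term per (sample, label) pair.
per_element = [
    [bce_term(y, x) for y, x in zip(t_row, x_row)]
    for t_row, x_row in zip(targets, logits)
]

total = sum(sum(row) for row in per_element)
mean_over_samples = total / N               # sum over classes, mean over samples
mean_over_samples_and_classes = total / (N * C)

print(f"mean over samples only:        {mean_over_samples:.4f}")
print(f"mean over samples and classes: {mean_over_samples_and_classes:.4f}")
```

The second figure comes out at roughly 0.6746, in line with the PyTorch results in the cells that follow.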

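Choosing the loss that matches the label structure matters: Cross-Entropy Loss makes the classes compete for a single label, while the binary formulation scores each label independently. The cells below check numerically that PyTorch's `nn.MultiLabelSoftMarginLoss` gives the same value as applying a sigmoid and then averaging the Binary Cross-Entropy Loss over all samples and labels.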



In [2]:
import torch
import torch.nn as nn

# Simulated logits from a neural network (for 2 samples and 3 labels)
logits = torch.tensor([[0.1, -0.2, 0.0], [0.4, 0.1, -0.3]])

# Corresponding ground truth labels (1 indicates the label is present, 0 indicates the label is absent)
targets = torch.tensor([[1, 0, 1], [0, 1, 0]], dtype=torch.float)

In [3]:
# Define the MultiLabelSoftMarginLoss
loss_fn = nn.MultiLabelSoftMarginLoss()

# Compute the loss
loss = loss_fn(logits, targets)
print(f"MultiLabelSoftMarginLoss: {loss.item()}")

MultiLabelSoftMarginLoss: 0.6745749711990356


In [11]:
# Sigmoid saturates towards 1 for large inputs and is close to 0.5 near zero
torch.sigmoid(torch.tensor([50, 0.0006, 2.3]))

tensor([1.0000, 0.5002, 0.9089])

In [12]:
torch.sigmoid(torch.tensor([0.92, 0.35, 0.85]))

tensor([0.7150, 0.5866, 0.7006])

In [9]:
# Manually apply sigmoid to convert logits to probabilities
probabilities = torch.sigmoid(logits)
print(probabilities)
# Compute the per-element binary cross-entropy (reduction='none' keeps the 2x3 shape)
bce_loss_fn = nn.BCELoss(reduction='none')
bce_loss = bce_loss_fn(probabilities, targets)
print(bce_loss)
print(f"Binary Cross-Entropy Loss: {bce_loss.mean()}")


tensor([[0.5250, 0.4502, 0.5000],
        [0.5987, 0.5250, 0.4256]])
tensor([[0.6444, 0.5981, 0.6931],
        [0.9130, 0.6444, 0.5544]])
Binary Cross-Entropy Loss: 0.6745750308036804


In [8]:
bce_loss.mean()

tensor(0.6746)