# Introduction

Cross-Entropy Loss, also known as Log Loss, comes from information theory and is widely used in classification tasks, especially for neural networks. It measures the dissimilarity between two probability distributions: the true labels and the predicted probabilities from a model.

In practice, the loss scores each prediction by the probability the model assigned to the true class: the lower that probability, the larger the loss.

**Mathematically**, for binary classification the Cross-Entropy Loss for a single example is:

$$\mathcal{L}(y, \hat{y}) = -(y \log(\hat{y}) + (1 - y) \log(1 - \hat{y}))$$

where:
- $y$ is the true label (0 or 1)
- $\hat{y}$ is the predicted probability of the positive class (between 0 and 1)
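The binary formula can be sketched in a few lines of plain Python. This is a minimal illustration, not a library implementation: the function name `binary_cross_entropy` and the clipping constant `eps` are assumptions introduced here, and the natural logarithm is used.

```python
import math

def binary_cross_entropy(y, y_hat, eps=1e-12):
    """Binary cross-entropy for a single example (natural log).

    y     -- true label, 0 or 1
    y_hat -- predicted probability of the positive class
    eps   -- assumed clipping constant that keeps log() away from 0
    """
    # Clip the prediction into [eps, 1 - eps] so log(0) is never evaluated.
    y_hat = min(max(y_hat, eps), 1.0 - eps)
    return -(y * math.log(y_hat) + (1 - y) * math.log(1.0 - y_hat))

loss_correct = binary_cross_entropy(1, 0.9)  # true label 1, confident and right
loss_wrong = binary_cross_entropy(0, 0.9)    # true label 0, confident and wrong
```

Only one of the two terms is active for any given example: the first when $y = 1$, the second when $y = 0$.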

For multi-class classification, the formula is:

$$\mathcal{L}(y, \hat{y}) = -\sum_{i=1}^{C} y_i \log(\hat{y}_i)$$

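where:
- $C$ is the number of classes
- $y_i$ is 1 if class $i$ is the true class and 0 otherwise (a one-hot label)
- $\hat{y}_i$ is the predicted probability of class $i$, with the $\hat{y}_i$ summing to 1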
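As a rough sketch of the multi-class sum, assuming a one-hot true label and a list of predicted probabilities (the name `categorical_cross_entropy` and the constant `eps` are illustrative, not taken from any library):

```python
import math

def categorical_cross_entropy(y_true, y_pred, eps=1e-12):
    """Multi-class cross-entropy for a single example (natural log).

    y_true -- one-hot list of length C
    y_pred -- predicted probabilities of length C
    eps    -- assumed lower bound that keeps log() away from 0
    """
    # Sum y_i * log(y_hat_i) over all classes, then negate.
    return -sum(t * math.log(max(p, eps)) for t, p in zip(y_true, y_pred))

# Three classes; the true class is the second one.
loss = categorical_cross_entropy([0, 1, 0], [0.2, 0.7, 0.1])
```

With a one-hot label, every term except the true class vanishes, so the sum reduces to the negative log of the probability assigned to the true class (here $-\log(0.7)$).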

# How it works

Cross-Entropy Loss assigns a high penalty to predictions that are confident but wrong, and a low penalty to predictions that place most of the probability on the true label. The loss reaches its minimum of 0 when the model assigns probability 1 to the true class.

**Example:**

If the true label is $y = 1$ and the model predicts $\hat{y} = 0.9$, the Cross-Entropy Loss is:

$$\mathcal{L}(1, 0.9) = -(1 \cdot \log(0.9) + (1 - 1) \cdot \log(1 - 0.9)) = -\log(0.9) \approx 0.105$$

(using the natural logarithm). The model assigns a high probability to the correct class, so the loss is small.
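To see how quickly the penalty grows as the prediction moves away from the true label, the same calculation can be repeated for a few values of $\hat{y}$ with $y = 1$. This is a small sketch using the natural logarithm; the helper name `loss_when_positive` is made up for illustration.

```python
import math

def loss_when_positive(y_hat):
    """Cross-entropy when the true label is 1: only the -log(y_hat) term remains."""
    return -math.log(y_hat)

# Loss for increasingly poor predictions of a positive example.
losses = {p: round(loss_when_positive(p), 4) for p in (0.9, 0.5, 0.1, 0.01)}
print(losses)
# → {0.9: 0.1054, 0.5: 0.6931, 0.1: 2.3026, 0.01: 4.6052}
```

Dropping from $\hat{y} = 0.9$ to $\hat{y} = 0.01$ raises the loss roughly 44-fold, which is the "high penalty for confident mistakes" behavior.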

# Pros and Cons

Pros:
- Works well for both binary and multi-class classification.
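- Pairs naturally with probabilistic models: it is a proper scoring rule, so minimizing it rewards well-calibrated probabilities (although it does not guarantee calibration in practice).
- Differentiable, making it well suited to gradient-based optimization.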

Cons:
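- Sensitive to noisy labels: a mislabeled example that the model confidently classifies correctly is treated as a confident mistake.
- The penalty is unbounded: the loss grows without limit as the predicted probability of the true class approaches 0, so a few confidently wrong predictions can dominate the average loss.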

# Use Cases

- Classification problems, binary or multi-class.
- Models that output probability distributions, such as neural networks with a sigmoid (binary) or softmax (multi-class) output layer.
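When the model ends in a softmax layer, the loss is often computed directly from the raw scores (logits) rather than from the probabilities, which avoids overflow for large scores. The sketch below shows one common way to do this, the log-sum-exp trick, in plain Python; the function name `softmax_cross_entropy` and its arguments are assumptions for illustration, not the API of a particular framework.

```python
import math

def softmax_cross_entropy(logits, target_index):
    """Cross-entropy of softmax(logits) against the class at target_index.

    Uses -log(softmax(z)[t]) = logsumexp(z) - z[t], with the maximum
    logit subtracted before exponentiating so exp() cannot overflow.
    """
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target_index]

loss = softmax_cross_entropy([2.0, 1.0, 0.1], target_index=0)

# Very large logits would overflow a naive exp(), but are handled here.
loss_large = softmax_cross_entropy([1000.0, 0.0], target_index=1)
```

The result is identical to applying softmax first and then taking the negative log of the true-class probability; only the order of operations changes.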
