# Introduction

Focal loss was introduced to address the class imbalance problem in tasks like object detection. In highly imbalanced datasets, the many easy majority-class examples can dominate a standard loss function such as cross-entropy, biasing the model toward the majority class. Focal loss mitigates this by focusing more on hard-to-classify examples.

Focal Loss extends the cross-entropy loss by adding a **modulating factor** to down-weight easy examples and focus on hard misclassified cases.

**Mathematically**, for binary classification, the focal loss is defined as:

$$\mathrm{L}(y, \hat{y}) = -\alpha \, y \, (1 - \hat{y})^\gamma \log(\hat{y}) - (1 - \alpha) (1 - y) \, \hat{y}^\gamma \log(1 - \hat{y})$$

where:
- $y$ is the ground truth label (0 or 1)
- $\hat{y}$ is the predicted probability of the positive class ($y = 1$)
- $\alpha$ is the balancing factor (default: 0.25)
- $\gamma$ is the focusing parameter (default: 2)
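
The definition above can be sketched in a few lines of NumPy. This is a minimal illustration rather than a reference implementation: the function name `binary_focal_loss` and the `eps` clipping constant are assumptions introduced here for numerical safety, and the label $y$ is used to select which of the two terms applies to each example.

```python
import numpy as np

def binary_focal_loss(y_true, y_pred, alpha=0.25, gamma=2.0, eps=1e-7):
    """Per-example binary focal loss (illustrative sketch).

    y_true: labels in {0, 1}
    y_pred: predicted probability of the positive class
    eps:    hypothetical clipping constant to keep log() finite
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.clip(np.asarray(y_pred, dtype=float), eps, 1.0 - eps)

    # Term for positive examples: weighted by alpha and (1 - y_hat)^gamma.
    pos_term = -alpha * y_true * (1.0 - y_pred) ** gamma * np.log(y_pred)
    # Term for negative examples: weighted by (1 - alpha) and y_hat^gamma.
    neg_term = -(1.0 - alpha) * (1.0 - y_true) * y_pred ** gamma * np.log(1.0 - y_pred)

    return pos_term + neg_term

# Two positives (easy, hard) followed by two negatives (easy, hard).
losses = binary_focal_loss([1, 1, 0, 0], [0.9, 0.1, 0.1, 0.9])
print(losses)
```

The function returns one loss value per example; averaging or summing over a batch is left to the caller. With `gamma=0.0` the modulating factor disappears and the result reduces to $\alpha$-weighted cross-entropy.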

# How it works

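The modulating factor, $(1 - \hat{y})^\gamma$ for a positive example and $\hat{y}^\gamma$ for a negative one, shrinks toward $0$ as the prediction for the true class becomes confident, so well-classified examples contribute very little loss. For misclassified or difficult examples the factor stays close to $1$, so their loss is left almost unchanged relative to cross-entropy. Hard examples are therefore not amplified in absolute terms; they simply make up a larger share of the total loss, which lets the model focus on them. Setting $\gamma = 0$ removes the modulating factor and recovers $\alpha$-weighted cross-entropy.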

**Example**

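If the true label is $y = 1$ and the predicted probability is $\hat{y} = 0.1$, the modulating factor is $(1 - 0.1)^\gamma = 0.9^\gamma$, which is $0.81$ for $\gamma = 2$, so this hard example keeps most of its cross-entropy loss. For an easy example with $\hat{y} = 0.9$, the factor is $(1 - 0.9)^\gamma = 0.1^\gamma$, or $0.01$ for $\gamma = 2$, so its loss is cut by a factor of 100. Increasing $\gamma$ widens this gap, shifting even more of the total loss onto hard examples.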
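
To make the comparison concrete, the short script below evaluates the modulating factor and the resulting loss for a hard and an easy positive example. It is a sketch under stated assumptions: the helper name `modulating_factor` is hypothetical, the balancing factor $\alpha$ is left out so that only the effect of $\gamma$ is visible, and the probabilities 0.1 and 0.9 are illustrative values.

```python
import math

def modulating_factor(p_true_class, gamma):
    """(1 - p)^gamma, where p is the predicted probability of the true class."""
    return (1.0 - p_true_class) ** gamma

gamma = 2.0
# Illustrative positive examples: one hard (p = 0.1) and one easy (p = 0.9).
for name, p in [("hard", 0.1), ("easy", 0.9)]:
    cross_entropy = -math.log(p)               # unmodulated loss
    factor = modulating_factor(p, gamma)       # down-weighting term
    focal = factor * cross_entropy             # focal loss without alpha
    print(f"{name}: factor={factor:.2f}  CE={cross_entropy:.4f}  focal={focal:.4f}")
```

Running it shows that the hard example retains most of its cross-entropy loss while the easy example's loss shrinks by two orders of magnitude. Trying other values of `gamma` shows how quickly the gap between the two grows.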

# Pros and Cons

Pros:
- Excellent for class-imbalanced datasets where most examples are easy to classify, and only a few are hard.
- Helps improve model performance in tasks like object detection and rare event classification.

Cons:
- Requires careful tuning of the focusing parameter $\gamma$ and balancing factor $\alpha$.
- Can complicate training by focusing too much on hard examples, which may include noisy or mislabeled samples, possibly leading to overfitting.