# 📉 Loss Functions: MSE vs Binary Cross Entropy

This notebook demonstrates why **Binary Cross Entropy (BCE)** is preferred over **Mean Squared Error (MSE)** for classification tasks.

## Key Concepts
- **MSE**: Averages the squared difference between predictions and targets
- **BCE**: Measures the "surprise" of a prediction given the true label, heavily penalizing confident wrong predictions

In [None]:
import torch
import torch.nn as nn

## MSE Loss Example

MSE calculates `(prediction - target)²`, averaged over all samples.

Here we predict 0.1 when the target is 1.0 (a wrong prediction).

In [None]:
criterion = nn.MSELoss()

y_pred = torch.tensor([0.1])
y_true = torch.tensor([1.0], dtype=torch.float32)

loss = criterion(y_pred, y_true)
print(f"MSE Loss: {loss.item():.2f}")
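
The value printed above can be checked by hand. This is a minimal sketch in plain Python. The helper name `mse` is made up for illustration and is not a PyTorch function. It assumes the default `reduction='mean'` behaviour of `nn.MSELoss`, which averages the squared errors.

```python
def mse(preds, targets):
    """Mean of the squared differences (hypothetical helper, not a PyTorch API)."""
    squared_errors = [(p - t) ** 2 for p, t in zip(preds, targets)]
    return sum(squared_errors) / len(squared_errors)

# Same numbers as the cell above: predict 0.1 when the target is 1.0
single = mse([0.1], [1.0])
print(f"MSE by hand: {single:.2f}")
```

Since predictions and targets both lie between 0 and 1, a single squared error can never exceed 1, however wrong the prediction is.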

## Binary Cross Entropy Loss

BCE formula: `-[y × log(p) + (1-y) × log(1-p)]`
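
To see how the two terms of the formula behave, here is a small sketch in plain Python. The function name `bce` is a made-up helper, not a PyTorch API, and it assumes the natural logarithm, a prediction `p` strictly between 0 and 1, and a label `y` of 0 or 1.

```python
import math

def bce(p, y):
    """Binary cross entropy for one prediction p in (0, 1) and one label y in {0, 1}."""
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

loss_wrong = bce(0.9, 0)  # y = 0, so only the log(1 - p) term contributes
loss_right = bce(0.9, 1)  # y = 1, so only the log(p) term contributes
print(f"p=0.9, y=0: {loss_wrong:.4f}")
print(f"p=0.9, y=1: {loss_right:.4f}")
```

Only one of the two terms is active for any given label, so the loss is simply the negative log of the probability assigned to the true class.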

### Example 1: Confident Wrong Prediction
Predicting 0.9 (90% confident it's class 1) when the target is 0.

**Notice that BCE gives a loss of about 2.30 here, while MSE gives only 0.81 for an equally wrong prediction (off by 0.9)!**

This is the key advantage of BCE: it heavily penalizes confident wrong predictions.

In [None]:
criterion = nn.BCELoss()
y_pred = torch.tensor([0.9])
y_true = torch.tensor([0], dtype=torch.float32)

loss = criterion(y_pred, y_true)
print(f"Binary Cross Entropy Loss: {loss.item():.2f}")
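
The gap between the two losses widens as the wrong prediction becomes more confident. The sketch below, in plain Python, sweeps a few illustrative probabilities (chosen arbitrarily) for a target of 0 and computes both losses by hand.

```python
import math

target = 0  # the true class
rows = []
for p in [0.5, 0.9, 0.99, 0.999]:
    mse = (p - target) ** 2
    bce = -math.log(1 - p)  # BCE reduces to -log(1 - p) when the target is 0
    rows.append((p, mse, bce))
    print(f"p={p:<6} MSE={mse:.3f}  BCE={bce:.3f}")
```

MSE approaches 1 but never exceeds it, whereas BCE keeps growing without bound as the prediction approaches 1.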

### Example 2: Multiple Predictions

BCE averages the loss across all samples:

| Prediction | Target | Correct? |
|------------|--------|----------|
| 0.8 | 1 | ✅ Yes |
| 0.2 | 0 | ✅ Yes |
| 0.8 | 0 | ❌ No (confident wrong) |
| 0.9 | 1 | ✅ Yes |

The third prediction (0.8 when the target is 0) contributes most to the loss: `-log(0.2)` is about 1.61, while each of the other three contributes about 0.22 or less.

In [None]:
criterion = nn.BCELoss()

y_pred = torch.tensor([0.8, 0.2, 0.8, 0.9])
y_true = torch.tensor([1, 0, 0, 1], dtype=torch.float32)

loss = criterion(y_pred, y_true)
print(f"Binary Cross Entropy Loss: {loss.item():.2f}")
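
To inspect how much each sample contributes to that average, one option is to pass `reduction="none"` to `nn.BCELoss`, which returns one loss per sample instead of the mean. This is a sketch reusing the same four predictions. The variable name `per_sample` is chosen here for illustration.

```python
import torch
import torch.nn as nn

y_pred = torch.tensor([0.8, 0.2, 0.8, 0.9])
y_true = torch.tensor([1, 0, 0, 1], dtype=torch.float32)

# reduction="none" keeps the individual losses instead of averaging them
per_sample = nn.BCELoss(reduction="none")(y_pred, y_true)

for p, t, l in zip(y_pred.tolist(), y_true.tolist(), per_sample.tolist()):
    print(f"pred={p:.1f}  target={t:.0f}  loss={l:.4f}")

# Averaging the per-sample losses recovers the default (mean) result
print(f"Mean: {per_sample.mean().item():.4f}")
```

The third entry dominates the list, which is why a single confident mistake can move the averaged loss so much.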

## Key Takeaways

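1. **MSE** is a good fit for regression, but combined with a sigmoid output it produces a non-convex loss surface for classification
2. **BCE** with a sigmoid output is convex in the weights of a linear model (logistic regression), which makes optimization better behaved
3. BCE heavily penalizes confident wrong predictions, so the gradient stays large when the model is confidently wrong and learning is faster
4. Prefer BCE for binary classification tasks; `BCEWithLogitsLoss` takes raw logits, applies the sigmoid internally, and is more numerically stable than a separate sigmoid followed by `BCELoss`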
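
The third and fourth takeaways can be illustrated with autograd. The sketch below compares the gradient with respect to the raw logit under `BCEWithLogitsLoss` and under `MSELoss` applied to the sigmoid output. The logit value 5.0 is an arbitrary choice for illustration (sigmoid of 5.0 is about 0.993, a confidently wrong prediction for a target of 0).

```python
import torch
import torch.nn as nn

target = torch.tensor([0.0])

# BCEWithLogitsLoss takes the raw logit and applies the sigmoid internally
z_bce = torch.tensor([5.0], requires_grad=True)
nn.BCEWithLogitsLoss()(z_bce, target).backward()

# MSE on the sigmoid output, for comparison
z_mse = torch.tensor([5.0], requires_grad=True)
nn.MSELoss()(torch.sigmoid(z_mse), target).backward()

print(f"BCE gradient w.r.t. logit: {z_bce.grad.item():.4f}")
print(f"MSE gradient w.r.t. logit: {z_mse.grad.item():.4f}")
```

With BCE the gradient with respect to the logit is `p - y`, so it stays close to 1 here. With MSE it is `2(p - y) × p(1 - p)`, and the `p(1 - p)` factor from the saturated sigmoid shrinks it toward zero exactly when the model is most wrong.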