# Day 48 — "Maximum Likelihood Estimation (MLE): Why Loss Functions Look the Way They Do"

MLE reframes loss functions as negative log-likelihoods. Loss minimization is probability maximization in disguise.


In [1]:
# Ensure repo root is on sys.path for local imports
import sys
from pathlib import Path

repo_root = Path.cwd()
if not (repo_root / "days").exists():
    for parent in Path.cwd().resolve().parents:
        if (parent / "days").exists():
            repo_root = parent
            break

sys.path.insert(0, str(repo_root))
print(f"Using repo root: {repo_root}")


Using repo root: /media/abdul-aziz/sdb7/masters_research/math_course_dlcv


## 1. Core Intuition
MLE asks: *Which model parameters make the observed data most likely?*
Instead of punishing errors arbitrarily, we choose a probability model and maximize the likelihood.
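
As a minimal sketch of this question, consider a coin with unknown heads probability. The data (`flips`) and the grid of candidate values (`thetas`) below are made up for illustration; the point is only that the parameter that makes the observed flips most likely coincides with the sample mean.

```python
import numpy as np

# Hypothetical observations: 1 = heads, 0 = tails (7 heads out of 10).
flips = np.array([1, 1, 0, 1, 1, 0, 1, 1, 0, 1])

# Candidate values for theta = P(heads), kept away from 0 and 1.
thetas = np.linspace(0.01, 0.99, 99)

# Likelihood of the whole dataset under each candidate theta,
# assuming the flips are independent Bernoulli draws.
heads = flips.sum()
tails = flips.size - heads
likelihoods = thetas**heads * (1 - thetas) ** tails

# The grid point with the highest likelihood is the (approximate) MLE.
theta_hat = thetas[np.argmax(likelihoods)]
print("Most likely theta on the grid:", round(float(theta_hat), 2))
print("Sample mean of the flips:", flips.mean())
```

A grid search is used here only because it makes the "try every parameter, keep the most likely one" idea explicit; real models rely on gradient-based optimization instead.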


## 2. Likelihood and Log-Likelihood
For a dataset \(\{(x_i, y_i)\}_{i=1}^N\) of independent examples, the likelihood is the product of the per-example probabilities:

\[ L(\theta) = \prod_{i=1}^N p(y_i \mid x_i; \theta) \]

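Taking the logarithm turns the product into a sum. This avoids floating-point underflow and, because the logarithm is monotonic, leaves the maximizing \(\theta\) unchanged: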

\[ \log L(\theta) = \sum_{i=1}^N \log p(y_i \mid x_i; \theta) \]
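
The numerical motivation for the logarithm can be seen in a small sketch. The per-example probabilities here are hypothetical (2000 examples, each assigned probability 0.5); the comparison is between multiplying them directly and summing their logs.

```python
import numpy as np

# Hypothetical per-example probabilities: 2000 examples, each with p = 0.5.
probs = np.full(2000, 0.5)

# Direct product: 0.5**2000 is far smaller than any positive float64,
# so the result underflows to exactly 0.0.
likelihood = np.prod(probs)

# Sum of logs: the same quantity on a log scale, at an ordinary magnitude.
log_likelihood = np.sum(np.log(probs))

print("Product of probabilities:", likelihood)
print("Sum of log-probabilities:", log_likelihood)
```

Once the product has collapsed to zero, every parameter setting looks equally bad, so the raw likelihood cannot be optimized in practice; the log-likelihood does not have this problem.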


## 3. From Maximization to Loss Minimization
Optimizers in deep learning frameworks minimize by convention, so instead of maximizing the log-likelihood we minimize its negative, the negative log-likelihood (NLL):

\[ \mathcal{L}(\theta) = -\sum_{i=1}^N \log p(y_i \mid x_i; \theta) \]

Each standard loss corresponds to an assumed noise model.
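
To illustrate that minimizing the NLL lands on the maximum-likelihood parameter, here is a minimal sketch using plain gradient descent on a Bernoulli NLL. The data, the logit parameterization (`z`), the learning rate, and the step count are all assumptions chosen for the example, not part of the course module.

```python
import numpy as np

def sigmoid(z):
    """Map an unconstrained logit z to a probability in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def mean_nll(z, y):
    """Mean Bernoulli negative log-likelihood for a single shared logit z."""
    p = sigmoid(z)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

# Hypothetical binary labels (7 ones out of 10).
y = np.array([1, 1, 0, 1, 1, 0, 1, 1, 0, 1], dtype=float)

z = 0.0               # start at p = 0.5
learning_rate = 1.0   # assumed step size
initial_loss = mean_nll(z, y)

for _ in range(200):
    # Gradient of the mean NLL with respect to z is sigmoid(z) - mean(y).
    grad = sigmoid(z) - y.mean()
    z -= learning_rate * grad

theta_hat = sigmoid(z)
print("Initial mean NLL:", initial_loss)
print("Final mean NLL:", mean_nll(z, y))
print("Estimated probability:", theta_hat)
print("Sample mean:", y.mean())
```

The loss goes down while the estimated probability moves to the sample mean, which is the closed-form MLE for a Bernoulli parameter.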


## 4. Losses as Likelihoods
- **MSE** assumes Gaussian noise: \(y = f_\theta(x) + \varepsilon,\ \varepsilon \sim \mathcal{N}(0,\sigma^2)\).
- **Binary cross-entropy** assumes Bernoulli labels.
- **Softmax cross-entropy** assumes categorical labels.

Loss choice = modeling assumption, not a heuristic.
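
The correspondences in the list above can be checked numerically. The sketch below uses synthetic data and a fixed noise scale `sigma` (both assumptions for illustration). It shows that the Gaussian NLL is the MSE scaled by \(N / (2\sigma^2)\) plus a constant that does not depend on the predictions, and that softmax cross-entropy is the negative log-probability of the true class under a categorical model.

```python
import numpy as np

rng = np.random.default_rng(0)

# --- Gaussian noise model vs. MSE (synthetic targets and predictions) ---
y_true = rng.normal(size=50)
y_pred = y_true + rng.normal(scale=0.3, size=50)
sigma = 1.0  # assumed fixed noise standard deviation

def gaussian_nll(y, mu, sigma):
    """Summed negative log-density of y under N(mu, sigma^2)."""
    return np.sum(0.5 * np.log(2 * np.pi * sigma**2) + (y - mu) ** 2 / (2 * sigma**2))

n = y_true.size
mse = np.mean((y_true - y_pred) ** 2)
nll = gaussian_nll(y_true, y_pred, sigma)

# Rebuild the NLL from the MSE: scale factor plus a prediction-independent constant.
constant = 0.5 * n * np.log(2 * np.pi * sigma**2)
nll_from_mse = n / (2 * sigma**2) * mse + constant

print("Gaussian NLL:", nll)
print("Scaled MSE + constant:", nll_from_mse)

# --- Categorical model vs. softmax cross-entropy (hypothetical logits) ---
logits = np.array([2.0, 0.5, -1.0])
label = 0  # index of the true class

# Log-softmax: log-probabilities of each class under the categorical model.
log_probs = logits - np.log(np.sum(np.exp(logits)))
cross_entropy = -log_probs[label]

print("Softmax cross-entropy:", cross_entropy)
```

Because the scale factor and the constant do not depend on the predictions when `sigma` is fixed, minimizing the MSE and minimizing the Gaussian NLL select the same parameters.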


In [2]:
import numpy as np

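# Bernoulli likelihood of one observed label y when the model predicts P(y = 1) = p
y = 1
p = 0.8
log_likelihood = y * np.log(p) + (1 - y) * np.log(1 - p)
loss = -log_likelihood  # negative log-likelihood, i.e. binary cross-entropy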
print("Log-likelihood:", log_likelihood)
print("Loss:", loss)


Log-likelihood: -0.2231435513142097
Loss: 0.2231435513142097


## 5. Code Demo (Reusable Module)
Run the module to see simple Bernoulli and Gaussian NLL values:

```bash
python -m days.day48.code.mle_demo
```


## 6. Visualization — NLL Curves
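The Bernoulli NLL curves show that confident wrong predictions are penalized most heavily: the loss grows without bound as the predicted probability of the true label approaches 0.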
Run the visualization script to generate plots in `days/day48/outputs/`.


In [3]:
# from days.day48.code.visualizations import main
# main()
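
The shape of the Bernoulli NLL curves can also be previewed without the plotting script. This is a minimal standalone sketch, not the code in `days/day48/code/visualizations`; the probability values in `p` are arbitrary sample points.

```python
import numpy as np

# Predicted probabilities P(y = 1) at a few arbitrary sample points.
p = np.array([0.01, 0.1, 0.5, 0.9, 0.99])

# Bernoulli NLL for each possible true label.
nll_if_y1 = -np.log(p)      # true label is 1: low p is a confident mistake
nll_if_y0 = -np.log(1 - p)  # true label is 0: high p is a confident mistake

print(" p      NLL(y=1)  NLL(y=0)")
for prob, loss_1, loss_0 in zip(p, nll_if_y1, nll_if_y0):
    print(f"{prob:5.2f}  {loss_1:8.3f}  {loss_0:8.3f}")
```

The two columns mirror each other: a prediction that is confident and correct costs almost nothing, while the same confidence on the wrong side costs far more than an uncertain guess at 0.5.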


## 7. Key Takeaways
- MLE explains why loss functions look the way they do.
- Negative log-likelihood is the standard training objective.
- MSE assumes Gaussian noise; cross-entropy assumes Bernoulli or categorical labels.
- Loss design encodes beliefs about how data is generated.
