
# 📌 Core Loss Functions in ML

---

### **1. Mean Squared Error (MSE)**

* **Definition:** Average of squared differences between actual and predicted values.
* **Formula:**

$$
MSE = \frac{1}{n}\sum_{i=1}^n (y_i - \hat{y}_i)^2
$$

* **Intuition:** Large errors penalized more (quadratic effect).
* **Use Case:** Regression (Linear Regression, baseline models).

👉 **Python Example:**

```python
import numpy as np
y_true = np.array([3, -0.5, 2, 7])
y_pred = np.array([2.5, 0.0, 2, 8])

mse = np.mean((y_true - y_pred)**2)
print("MSE:", mse)
```

✅ Output: `MSE = 0.375`
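
👉 **Sketch (quadratic effect):** A minimal illustration of why large errors dominate MSE, using the same sample data. The `share` variable is a hypothetical helper name, not part of any library.

```python
import numpy as np

y_true = np.array([3, -0.5, 2, 7])
y_pred = np.array([2.5, 0.0, 2, 8])

errors = y_true - y_pred           # [0.5, -0.5, 0.0, -1.0]
squared = errors ** 2              # [0.25, 0.25, 0.0, 1.0]
share = squared / squared.sum()    # fraction of the total squared error per sample

print("Squared errors:", squared)
print("Share of total:", share)    # the single error of size 1 carries 2/3 of the total
```

An error twice as large contributes four times as much, so one bad prediction can account for most of the loss.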

---

### **2. Root Mean Squared Error (RMSE)**

* **Definition:** Square root of MSE, giving error in same units as target.
* **Formula:**

$$
RMSE = \sqrt{\frac{1}{n}\sum_{i=1}^n (y_i - \hat{y}_i)^2}
$$

* **Intuition:** Easier to interpret (same scale as y).
* **Use Case:** Regression evaluation metrics (house price prediction).

👉 **Python Example:**

```python
# reuses mse from the MSE example above
rmse = np.sqrt(mse)
print("RMSE:", rmse)
```

✅ Output: `RMSE ≈ 0.612`
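
👉 **Sketch (same scale as y):** One way to see the "same units" claim is to rescale the targets and watch how each metric responds. The `scale` factor and the helper names `mse` and `rmse` are assumptions made for this illustration.

```python
import numpy as np

y_true = np.array([3, -0.5, 2, 7])
y_pred = np.array([2.5, 0.0, 2, 8])

def mse(y, y_hat):
    return np.mean((y - y_hat) ** 2)

def rmse(y, y_hat):
    return np.sqrt(mse(y, y_hat))

scale = 1000.0  # hypothetical unit change, e.g. thousands of dollars -> dollars

mse_ratio = mse(scale * y_true, scale * y_pred) / mse(y_true, y_pred)
rmse_ratio = rmse(scale * y_true, scale * y_pred) / rmse(y_true, y_pred)

print("MSE ratio: ", mse_ratio)   # grows with scale squared
print("RMSE ratio:", rmse_ratio)  # grows with scale, like the target itself
```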

---

### **3. Mean Absolute Error (MAE)**

* **Definition:** Average of absolute differences between actual and predicted values.
* **Formula:**

$$
MAE = \frac{1}{n}\sum_{i=1}^n |y_i - \hat{y}_i|
$$

* **Intuition:** Linear penalty, so a single large error influences the result far less than under MSE (more robust to outliers).
* **Use Case:** Regression when robustness is needed (finance, demand forecasting).

👉 **Python Example:**

```python
# reuses y_true and y_pred from the MSE example above
mae = np.mean(np.abs(y_true - y_pred))
print("MAE:", mae)
```

✅ Output: `MAE = 0.5`
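
👉 **Sketch (outlier sensitivity):** A rough comparison of how MSE and MAE react when one badly wrong prediction is appended to the sample data. The extra point (true value 20, prediction 10) is invented for this illustration.

```python
import numpy as np

y_true = np.array([3, -0.5, 2, 7])
y_pred = np.array([2.5, 0.0, 2, 8])

# Hypothetical outlier: one extra sample with an error of 10
y_true_out = np.append(y_true, 20.0)
y_pred_out = np.append(y_pred, 10.0)

def mse(y, y_hat):
    return np.mean((y - y_hat) ** 2)

def mae(y, y_hat):
    return np.mean(np.abs(y - y_hat))

mse_growth = mse(y_true_out, y_pred_out) / mse(y_true, y_pred)
mae_growth = mae(y_true_out, y_pred_out) / mae(y_true, y_pred)

print("MSE:", mse(y_true, y_pred), "->", mse(y_true_out, y_pred_out))
print("MAE:", mae(y_true, y_pred), "->", mae(y_true_out, y_pred_out))
print("Growth factors (MSE, MAE):", mse_growth, mae_growth)
```

On this data the outlier inflates MSE by roughly 54x but MAE by under 5x.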

---

### **4. Huber Loss**

* **Definition:** Hybrid of MSE & MAE. Quadratic for small errors, linear for large ones.
* **Formula:**

$$
L_\delta(y,\hat{y}) = 
\begin{cases} 
\frac{1}{2}(y-\hat{y})^2 & \text{if } |y-\hat{y}| \leq \delta \\
\delta |y-\hat{y}| - \frac{1}{2}\delta^2 & \text{otherwise}
\end{cases}
$$

* **Intuition:** Robust to outliers like MAE, but differentiable at zero error, so gradients are smoother near the optimum.
* **Use Case:** Robust regression, computer vision.

👉 **Python Example:**

```python
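# reuses y_true and y_pred from the MSE example
delta = 1.0  # threshold between the quadratic and linear regimes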
errors = y_true - y_pred
huber_loss = np.where(np.abs(errors) <= delta,
                      0.5 * errors**2,
                      delta * np.abs(errors) - 0.5 * delta**2).mean()
print("Huber Loss:", huber_loss)
```
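
👉 **Sketch (effect of delta):** A small experiment, under the assumption of one invented outlier, showing how `delta` moves Huber loss between MAE-like and MSE-like behaviour. The function name `huber` is a hypothetical helper.

```python
import numpy as np

def huber(y, y_hat, delta=1.0):
    """Mean Huber loss: quadratic within delta, linear beyond it."""
    err = y - y_hat
    quadratic = 0.5 * err ** 2
    linear = delta * np.abs(err) - 0.5 * delta ** 2
    return np.mean(np.where(np.abs(err) <= delta, quadratic, linear))

# Sample data plus one hypothetical outlier (error of 10)
y_true = np.array([3, -0.5, 2, 7, 20.0])
y_pred = np.array([2.5, 0.0, 2, 8, 10.0])

half_mse = 0.5 * np.mean((y_true - y_pred) ** 2)
huber_small = huber(y_true, y_pred, delta=1.0)    # outlier falls in the linear regime
huber_large = huber(y_true, y_pred, delta=100.0)  # every error falls in the quadratic regime

print("0.5 * MSE:        ", half_mse)
print("Huber (delta=1):  ", huber_small)
print("Huber (delta=100):", huber_large)
```

With a small `delta` the outlier is penalized linearly; once `delta` exceeds every error, Huber loss coincides with half the MSE.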

---

### **5. Binary Cross-Entropy (Log Loss)**

* **Definition:** Negative log-likelihood of the true binary labels (0 or 1) under the predicted probabilities.
* **Formula:**

$$
L = -\frac{1}{n}\sum_{i=1}^n \big[y_i \log(\hat{y}_i) + (1-y_i)\log(1-\hat{y}_i)\big]
$$

* **Intuition:** Penalizes confident wrong predictions heavily: the loss grows without bound as the probability assigned to the true class approaches 0.
* **Use Case:** Logistic Regression, Neural Networks (binary classification).

👉 **Python Example:**

```python
from sklearn.metrics import log_loss
y_true = [1, 0, 1, 1]
y_prob = [0.9, 0.1, 0.8, 0.6]

print("Binary Cross-Entropy:", log_loss(y_true, y_prob))
```
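
👉 **Sketch (from the formula):** A minimal NumPy version of the formula above, mainly to show where the penalty for confident mistakes comes from. The function name `binary_cross_entropy` and the clipping constant `eps` are assumptions for this sketch, not a library API.

```python
import numpy as np

def binary_cross_entropy(y_true, y_prob, eps=1e-15):
    """Mean binary cross-entropy; probabilities are clipped to avoid log(0)."""
    y_true = np.asarray(y_true, dtype=float)
    y_prob = np.clip(np.asarray(y_prob, dtype=float), eps, 1 - eps)
    return -np.mean(y_true * np.log(y_prob) + (1 - y_true) * np.log(1 - y_prob))

bce = binary_cross_entropy([1, 0, 1, 1], [0.9, 0.1, 0.8, 0.6])
print("Binary Cross-Entropy:", bce)  # about 0.236

# True label is 1 in both cases; only the confidence of the wrong answer differs
unsure_wrong = binary_cross_entropy([1], [0.4])
confident_wrong = binary_cross_entropy([1], [0.01])
print("Unsure and wrong:   ", unsure_wrong)
print("Confident and wrong:", confident_wrong)
```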

---

### **6. Categorical Cross-Entropy**

* **Definition:** Extension of log loss for multi-class classification.
* **Formula:**

$$
L = -\frac{1}{n}\sum_{i=1}^n \sum_{c=1}^C y_{i,c}\log(\hat{y}_{i,c})
$$

* **Intuition:** With one-hot true labels, only the log-probability assigned to the correct class contributes to each sample's loss.
* **Use Case:** Softmax classifiers, deep learning models.

👉 **Python Example:**

```python
from sklearn.metrics import log_loss

# one-hot true labels, one row per sample
y_true = [[1,0,0], [0,1,0], [0,0,1]]
y_pred = [[0.7,0.2,0.1], [0.1,0.8,0.1], [0.2,0.2,0.6]]

print("Categorical Cross-Entropy:", log_loss(y_true, y_pred))
```
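
👉 **Sketch (from the formula):** A NumPy version of the double sum, assuming one-hot labels and rows of probabilities that each sum to 1. This sketch averages over samples; drop the `.mean()` in favour of `.sum()` for the summed form. The function name is hypothetical.

```python
import numpy as np

def categorical_cross_entropy(y_onehot, y_prob, eps=1e-15):
    """Mean cross-entropy between one-hot labels and predicted class probabilities."""
    y_onehot = np.asarray(y_onehot, dtype=float)
    y_prob = np.clip(np.asarray(y_prob, dtype=float), eps, 1.0)  # avoid log(0)
    per_sample = -np.sum(y_onehot * np.log(y_prob), axis=1)      # inner sum over classes
    return per_sample.mean()                                     # average over samples

y_true = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
y_pred = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.2, 0.2, 0.6]]

cce = categorical_cross_entropy(y_true, y_pred)
print("Categorical Cross-Entropy:", cce)  # about 0.364
```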

---

### **7. Sparse Categorical Cross-Entropy**

* **Definition:** Same loss as categorical cross-entropy, computed from integer class labels instead of one-hot vectors.
* **Intuition:** Avoids building one-hot vectors, which saves memory when the number of classes is large (e.g., NLP vocabularies).
* **Use Case:** Multi-class classification with large vocab (text, NLP).

👉 **Python Example:**

```python
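from sklearn.metrics import log_loss

# true labels as class indices; every class must appear here,
# otherwise pass labels=[0, 1, 2] to log_loss
y_true = [0, 1, 2]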
y_pred = [[0.7,0.2,0.1], [0.1,0.8,0.1], [0.2,0.2,0.6]]

print("Sparse Categorical Cross-Entropy:", log_loss(y_true, y_pred))
```
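
👉 **Sketch (index instead of one-hot):** A minimal NumPy check that picking the true-class probability by integer index gives the same value as the one-hot computation. The variable names `labels`, `probs`, and `picked` are assumptions for this illustration.

```python
import numpy as np

labels = np.array([0, 1, 2])                       # integer class indices
probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.8, 0.1],
                  [0.2, 0.2, 0.6]])

# Sparse form: look up the probability of the true class for each row
picked = probs[np.arange(len(labels)), labels]     # [0.7, 0.8, 0.6]
sparse_cce = -np.mean(np.log(picked))

# Dense form: build one-hot labels, then apply the categorical formula
one_hot = np.eye(probs.shape[1])[labels]
dense_cce = -np.mean(np.sum(one_hot * np.log(probs), axis=1))

print("Sparse:", sparse_cce)
print("Dense: ", dense_cce)
```

Both forms give the same number; the sparse one never allocates the `n x C` one-hot matrix.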

---

### **8. Hinge Loss (SVM)**

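* **Definition:** Margin-based loss used in Support Vector Machines (SVMs). Labels are encoded as -1 or +1, and the prediction is the raw decision score, not a probability.
* **Formula:**

$$
L = \frac{1}{n}\sum_{i=1}^n \max(0, 1 - y_i \hat{y}_i)
$$

* **Intuition:** Loss is zero only when a sample lies on the correct side with a margin of at least 1; samples inside the margin or misclassified are penalized linearly.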
* **Use Case:** SVM classification, max-margin classifiers.

👉 **Python Example:**

```python
import numpy as np

y_true = np.array([1, -1, 1])  # labels must be -1 or +1
y_pred = np.array([0.8, -0.5, 0.3])  # decision function scores

hinge_loss = np.mean(np.maximum(0, 1 - y_true * y_pred))
print("Hinge Loss:", hinge_loss)
```
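
👉 **Sketch (per-sample view):** A breakdown of hinge loss by sample, assuming labels in {-1, +1} and made-up decision scores, to show which points contribute and which do not.

```python
import numpy as np

y_true = np.array([1, 1, 1, -1])
scores = np.array([2.0, 0.3, -0.5, -1.0])  # hypothetical decision function scores

margins = y_true * scores                  # [2.0, 0.3, -0.5, 1.0]
per_sample = np.maximum(0, 1 - margins)    # [0.0, 0.7, 1.5, 0.0]

print("Margins:   ", margins)
print("Per-sample:", per_sample)
print("Mean hinge:", per_sample.mean())
```

Samples with a margin of 1 or more cost nothing, a correct but low-margin sample (0.3) still pays 0.7, and the misclassified sample pays more than 1.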

---

# ⚡ Interview Rapid Fire

* **Which loss in regression?** → MSE, MAE, Huber.
* **Which loss in binary classification?** → Binary Cross-Entropy.
* **Which loss in multi-class classification?** → Categorical / Sparse Categorical Cross-Entropy.
* **Which loss in SVMs?** → Hinge Loss.
* **Which is robust to outliers?** → MAE, Huber.

