# 🧠 What is a Loss Function?

> A **Loss Function** tells your model **how wrong** it is by comparing the **predicted value** with the **actual value**.

🔁 Training adjusts the model's parameters to **reduce the loss**, which brings its predictions closer to the actual values.
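
As a rough illustration of what "reducing the loss" means in practice, here is a minimal sketch of gradient descent on a one-parameter model `y = w * x` with a squared-error loss. The function name `fit_slope`, the toy data, and the learning rate are all assumptions made up for this example.

```python
def fit_slope(xs, ys, lr=0.05, steps=50):
    """Fit y = w * x by gradient descent on the mean squared error."""
    w = 0.0
    losses = []
    n = len(xs)
    for _ in range(steps):
        preds = [w * x for x in xs]
        # Loss: how wrong the current predictions are.
        loss = sum((p - y) ** 2 for p, y in zip(preds, ys)) / n
        losses.append(loss)
        # Gradient of the loss with respect to w.
        grad = sum(2.0 * (p - y) * x for p, y, x in zip(preds, ys, xs)) / n
        # Step against the gradient to reduce the loss.
        w -= lr * grad
    return w, losses

# Toy data generated from y = 2x (illustrative only).
w, losses = fit_slope([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
print(round(w, 4))             # close to 2.0
print(losses[0] > losses[-1])  # True: the loss went down during training
```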

---

# 🤖 MACHINE LEARNING LOSS FUNCTIONS

---

## 🟢 ML REGRESSION LOSS FUNCTIONS

Used when the output is a **continuous value** (e.g., price, age, temperature).

### 🔹 1. Mean Squared Error (MSE)
- 📌 Measures average of **squared** prediction errors.
- 🔍 Penalizes **large errors more** than small ones.

\[
\text{MSE} = \frac{1}{n} \sum (y - \hat{y})^2
\]

✅ Use When:
- Large errors are much worse than small ones for your application.
- You want a smooth loss that is easy to optimize with gradient-based methods.
- Data is **clean** (no extreme outliers).
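
A minimal pure-Python sketch of the MSE formula above. The function name `mse` and the sample values are illustrative assumptions, not taken from any particular library or dataset.

```python
def mse(y_true, y_pred):
    """Mean of the squared differences between targets and predictions."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / len(y_true)

y_true = [3.0, 5.0, 2.0, 7.0]
y_pred = [2.0, 5.0, 4.0, 7.0]
# Errors are 1, 0, -2, 0, so the squares are 1, 0, 4, 0.
print(mse(y_true, y_pred))  # 1.25
```

Note how the single error of size 2 contributes four times as much as the error of size 1.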

---

### 🔹 2. Mean Absolute Error (MAE)
- 📌 Measures average of **absolute differences**.
- 🔍 Each error counts in **proportion to its size**; there is no squaring, so large errors are not amplified.

\[
\text{MAE} = \frac{1}{n} \sum |y - \hat{y}|
\]

✅ Use When:
- Your data has **outliers**.
- You want an error measure that is easy to interpret, in the same units as the target.
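
To see why MAE is the usual choice when outliers are present, the sketch below compares it with MSE on the same predictions, once without and once with a single large miss. The helper names and numbers are assumptions chosen for illustration.

```python
def mae(y_true, y_pred):
    """Mean of the absolute differences between targets and predictions."""
    return sum(abs(y - p) for y, p in zip(y_true, y_pred)) / len(y_true)

def mse(y_true, y_pred):
    """Mean of the squared differences, for comparison."""
    return sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / len(y_true)

y_true = [10.0, 12.0, 11.0, 13.0]
clean_pred = [11.0, 11.0, 12.0, 12.0]    # every error has size 1
outlier_pred = [11.0, 11.0, 12.0, 33.0]  # last error has size 20

print(mae(y_true, clean_pred), mse(y_true, clean_pred))      # 1.0 1.0
print(mae(y_true, outlier_pred), mse(y_true, outlier_pred))  # 5.75 100.75
```

One bad prediction raises MAE by a factor of about 6 but MSE by a factor of about 100.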

---

### 🔹 3. Huber Loss
- 📌 Combines **MSE and MAE**.
- 🔍 Quadratic (MSE-like) for errors up to a threshold **δ**, linear (MAE-like) for errors beyond it.

\[
\text{Huber} =
\begin{cases}
\frac{1}{2}(y - \hat{y})^2 & \text{if } |y - \hat{y}| \leq \delta \\
\delta \cdot (|y - \hat{y}| - \frac{1}{2} \delta) & \text{otherwise}
\end{cases}
\]

✅ Use When:
- You want the **benefits of both MSE and MAE**.
- Data is **noisy** or contains some **outliers**.
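
A minimal sketch of the piecewise Huber formula above. The formula gives the per-sample loss; averaging over samples and the default `delta=1.0` are assumptions made here for illustration.

```python
def huber(y_true, y_pred, delta=1.0):
    """Average Huber loss: quadratic within delta, linear beyond it."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        err = abs(y - p)
        if err <= delta:
            total += 0.5 * err ** 2              # MSE-like branch
        else:
            total += delta * (err - 0.5 * delta)  # MAE-like branch
    return total / len(y_true)

# One small error (0.5) and one large error (3.0), with delta = 1.
# Per-sample losses: 0.5 * 0.25 = 0.125 and 1 * (3 - 0.5) = 2.5.
print(huber([0.0, 0.0], [0.5, 3.0]))  # 1.3125
```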

---

## 🔵 ML CLASSIFICATION LOSS FUNCTIONS

Used when the output is a **class label** (e.g., spam/not spam, cat/dog).

### 🔹 1. Log Loss (Binary Cross Entropy)
- 📌 Measures the error between the actual class **y** (0 or 1) and the predicted probability **p** of class 1.
- 🔍 Used in **logistic regression**.

\[
\text{Log Loss} = - \left[ y \log(p) + (1 - y) \log(1 - p) \right]
\]

✅ Use When:
- You are doing **binary classification** (2 classes).
- You want the model to output **probabilities** (0 to 1).
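
The following sketch averages the log loss formula above over several samples. Clipping probabilities with a small `eps` to avoid `log(0)` is a common practical safeguard; it is an assumption added here, not part of the formula.

```python
import math

def log_loss(y_true, p_pred, eps=1e-12):
    """Average binary cross entropy for labels in {0, 1}."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1.0 - eps)  # keep log() finite
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)

confident_right = log_loss([1, 0], [0.9, 0.1])
confident_wrong = log_loss([1, 0], [0.1, 0.9])
print(round(confident_right, 4))  # 0.1054
print(round(confident_wrong, 4))  # 2.3026
```

Confident but wrong predictions are punished far more than confident and correct ones are rewarded.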

---

### 🔹 2. Hinge Loss
- 📌 Used in **Support Vector Machines (SVM)**.
- 🔍 Labels are **-1 or +1** and ŷ is the raw model score; the loss is zero only when an example is classified correctly **with a margin** of at least 1.

\[
\text{Hinge} = \max(0, 1 - y \cdot \hat{y})
\]

✅ Use When:
- You're using **SVM** for classification.
- You want a model that learns **confidence margins**.
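
A small sketch of the hinge formula above, assuming labels encoded as -1 or +1 and raw (unsquashed) decision scores. The function name and values are illustrative.

```python
def hinge_per_sample(y_true, scores):
    """Hinge loss for each sample; labels must be -1 or +1."""
    return [max(0.0, 1.0 - y * s) for y, s in zip(y_true, scores)]

y_true = [1, -1, 1]
scores = [2.0, -0.5, -1.0]
losses = hinge_per_sample(y_true, scores)
print(losses)  # [0.0, 0.5, 2.0]
```

The first sample is correct with a margin above 1, so its loss is zero. The second is correct but inside the margin, so it is still penalized. The third is on the wrong side and gets the largest loss.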

---

# 🤖 DEEP LEARNING LOSS FUNCTIONS

---

## 🟢 DL REGRESSION LOSS FUNCTIONS

Used when neural networks predict **continuous values**.

### 🔹 1. Mean Squared Error (MSE)
- Same as ML.
- 🔍 Smooth and differentiable everywhere, so it works well with **backpropagation**.

### ✅ Use When:
- You want smooth training.
- You’re predicting numbers like prices, ages, or scores.

### 🧱 Architecture:

- Fully Connected Neural Network (FCNN)

### ⚡ Activation Function:

- Hidden Layers: **ReLU**

- Output Layer: **Linear**

---

### 🔹 2. Mean Absolute Error (MAE)
- Same as ML.
- 🔍 More **robust** when training with **noisy data**.

### ✅ Use When:
- You want stable updates whose size does not grow with the error.
- The dataset contains **outliers**.

### 🧱 Architecture:

- Deep Feedforward Network or FCNN

### ⚡ Activation Function:

- Hidden Layers: ReLU

- Output Layer: Linear

---

### 🔹 3. Huber Loss
- Same as ML.
- 🔍 Combines smoothness of MSE and robustness of MAE.

### ✅ Use When:
- You’re using **deep regression models**.
- Data is **real-world** and may have outliers.

### 🧱 Architecture:

- Deep Regression Network / FCNN

### ⚡ Activation Function:

- Hidden Layers: ReLU

- Output Layer: Linear
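
In a neural network, the practical difference between these three regression losses is the gradient each one sends back during backpropagation. The sketch below gives the per-sample gradient with respect to the prediction, where `err = prediction - target`. The function names and `delta=1.0` are assumptions for illustration.

```python
def sign(x):
    """Return -1.0, 0.0, or 1.0 depending on the sign of x."""
    return float((x > 0) - (x < 0))

def grad_mse(err):
    """d/d(pred) of err**2: grows linearly with the error."""
    return 2.0 * err

def grad_mae(err):
    """d/d(pred) of |err|: constant size (0 is used as the subgradient at err == 0)."""
    return sign(err)

def grad_huber(err, delta=1.0):
    """d/d(pred) of Huber: like 0.5 * err**2 near zero, capped at +/- delta."""
    return err if abs(err) <= delta else delta * sign(err)

for err in (0.5, 10.0):
    print(err, grad_mse(err), grad_mae(err), grad_huber(err))
# 0.5 1.0 1.0 0.5
# 10.0 20.0 1.0 1.0
```

An outlier with an error of 10 produces a gradient of 20 under MSE but only 1 under MAE or Huber, which is why the latter two are less sensitive to outliers.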

  
---

## 🔵 DL CLASSIFICATION LOSS FUNCTIONS

Used when output is a **class label** (binary or multi-class).

### 🔹 1. Binary Cross Entropy
- 📌 Same as Log Loss.
- 🔍 Works with **Sigmoid activation** in the final layer.

\[
\text{BCE} = - [y \log(p) + (1 - y) \log(1 - p)]
\]

### ✅ Use When:
- You’re predicting **2 classes** (e.g., positive/negative).
- Final layer = **Sigmoid**.

### 🧱 Architecture:

- Binary Classification Neural Network

### ⚡ Activation Function:

- Hidden Layers: ReLU

- Output Layer: Sigmoid
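
The sketch below shows how a sigmoid output feeds into BCE, and a numerically stable way to compute the same loss directly from the raw pre-sigmoid value (the "logit"). Deep learning frameworks commonly offer a fused variant of this kind; the function names here are hypothetical and not any framework's API.

```python
import math

def sigmoid(z):
    """Numerically stable sigmoid."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def bce_from_prob(y, p):
    """BCE from a probability p strictly between 0 and 1."""
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))

def bce_from_logit(y, z):
    """Same loss computed from the logit z, without forming p explicitly."""
    return max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))

z = 2.0  # raw network output before the sigmoid
print(abs(bce_from_prob(1, sigmoid(z)) - bce_from_logit(1, z)) < 1e-12)  # True
print(bce_from_logit(0, 1000.0))  # 1000.0, where the probability form would hit log(0)
```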

---

### 🔹 2. Categorical Cross Entropy
- 📌 Used in **multi-class** classification.
- 🔍 Works with **Softmax** activation.
- Labels must be **one-hot encoded**.

\[
\text{CCE} = - \sum y_i \log(p_i)
\]

### ✅ Use When:
- You have **3+ classes**.
- Labels are one-hot encoded.
- Final layer = **Softmax**.

### 🧱 Architecture:

- CNN / FCNN / RNN for classification

### ⚡ Activation Function:

- Hidden Layers: ReLU

- Output Layer: Softmax
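
A minimal sketch of softmax followed by categorical cross entropy for a single sample. The logits are made-up values, and the `eps` floor inside the log is an assumption added for numerical safety.

```python
import math

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def categorical_cross_entropy(one_hot, probs, eps=1e-12):
    """CCE for one sample: -sum(y_i * log(p_i))."""
    return -sum(y * math.log(max(p, eps)) for y, p in zip(one_hot, probs))

logits = [2.0, 1.0, 0.1]  # raw outputs for 3 classes
probs = softmax(logits)   # roughly [0.659, 0.242, 0.099]
one_hot = [1, 0, 0]       # true class is index 0
print(round(categorical_cross_entropy(one_hot, probs), 3))  # 0.417
```

Because the label is one-hot, only the probability assigned to the true class affects the loss.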

---

### 🔹 3. Sparse Categorical Cross Entropy
- 📌 Same as CCE, but labels are **integers**, not one-hot.
- 🔍 Avoids building one-hot vectors, which saves **memory** and computation when there are many classes.

### ✅ Use When:
- Labels are integers like 0, 1, 2.
- You have **many classes**.
- Final layer = **Softmax**.

### 🧱 Architecture:

- Deep Classification Networks (CNN, RNN, Transformer)

### ⚡ Activation Function:

- Hidden Layers: ReLU

- Output Layer: Softmax
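
The sketch below illustrates that the sparse variant gives the same value as ordinary categorical cross entropy; it simply indexes the predicted probability of the true class instead of multiplying by a one-hot vector. Function names and values are illustrative assumptions.

```python
import math

def sparse_cce(label, probs, eps=1e-12):
    """Loss from an integer class label: -log(p[label])."""
    return -math.log(max(probs[label], eps))

def cce(one_hot, probs, eps=1e-12):
    """Loss from a one-hot label: -sum(y_i * log(p_i))."""
    return -sum(y * math.log(max(p, eps)) for y, p in zip(one_hot, probs))

probs = [0.1, 0.7, 0.2]  # softmax output for 3 classes
label = 1                # integer label
one_hot = [0, 1, 0]      # the same label, one-hot encoded

print(round(sparse_cce(label, probs), 4))                            # 0.3567
print(abs(sparse_cce(label, probs) - cce(one_hot, probs)) < 1e-12)   # True
```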

---

# ✅ Final Summary

| Task Type                 | ML Loss Functions             | DL Loss Functions                              |
|---------------------------|-------------------------------|------------------------------------------------|
| **Regression**            | MSE, MAE, Huber               | MSE, MAE, Huber                                |
| **Binary Classification** | Log Loss, Hinge Loss          | Binary Cross Entropy                           |
| **Multi-Class**           | –                             | Categorical Cross Entropy, Sparse Categorical Cross Entropy |

