## Machine Learning 14: Loss and Cost Function

In [1]:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split

In [2]:
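# Load the salary data and drop the unused 'Date' column
df = pd.read_csv("salary data.csv")
df = df.drop(columns=['Date'])
# Remove stray whitespace from the column names
df.columns = df.columns.str.strip()
df.head(3)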

Unnamed: 0,Starting Salary,Ending Salary
0,16800,16500
1,18800,17000
2,17800,17500


In [3]:
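# Feature matrix: every column except the target
x = df.drop("Ending Salary", axis=1)
# Target: dropping the feature column leaves only "Ending Salary" (a one-column DataFrame)
y = df.drop("Starting Salary", axis=1)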

## Linear Regression

In [4]:
from sklearn.linear_model import LinearRegression

In [5]:
reg = LinearRegression()
reg.fit(x, y)

In [6]:
df["Predicted_y"] = reg.predict(x)
df.head()

Unnamed: 0,Starting Salary,Ending Salary,Predicted_y
0,16800,16500,16201.143025
1,18800,17000,17087.426729
2,17800,17500,16644.284877
3,16800,16500,16201.143025
4,15800,16500,15758.001172


## Loss and Cost Function

### 🔹 **Loss Function**

* A **loss function** measures the error **for a single training example**.
* It quantifies how far the model's prediction is from the actual target.
* Common examples:

  * **Squared Error** (for regression; averaging it over the dataset gives the Mean Squared Error, MSE):

    $$
    \text{Loss} = (y_{\text{true}} - y_{\text{pred}})^2
    $$
  * **Binary Cross Entropy** (for binary classification):

    $$
    \text{Loss} = -[y \log(p) + (1 - y) \log(1 - p)]
    $$
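
The two per-example formulas above can be sketched in plain Python. This is a minimal illustration, not the notebook's own code: the function names and the `eps` clipping constant are assumptions added here to keep `log(0)` from occurring.

```python
import math

def squared_error(y_true, y_pred):
    """Squared-error loss for a single example."""
    return (y_true - y_pred) ** 2

def binary_cross_entropy(y, p, eps=1e-12):
    """Binary cross-entropy for a single example.

    y is the true label (0 or 1); p is the predicted probability of class 1.
    """
    p = min(max(p, eps), 1 - eps)  # clip so that log(0) never occurs
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

# Row 0 of the salary table: actual 16500, predicted 16201.143025
print(squared_error(16500, 16201.143025))

# A confident correct prediction costs little; a confident wrong one costs a lot
print(binary_cross_entropy(1, 0.9))
print(binary_cross_entropy(1, 0.1))
```

Both functions return one number per example, which is what distinguishes a loss from a cost.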

---

### 🔹 **Cost Function**

* A **cost function** is the **average loss** over the entire training dataset.
* It's the function that an optimization algorithm (like gradient descent) minimizes.
* If we have `n` samples:

  $$
  \text{Cost} = \frac{1}{n} \sum_{i=1}^{n} \text{Loss}(y_i, \hat{y}_i)
  $$
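
As a rough sketch of the averaging step, the snippet below applies a squared-error loss to the five rows shown in the `df.head()` output and then takes the mean. Only those five rows are used, so the result will not match the cost over the full dataset; the helper name `squared_error` is an assumption for illustration.

```python
# First five rows from the notebook's df.head() output (not the full dataset)
actual = [16500, 17000, 17500, 16500, 16500]
predicted = [16201.143025, 17087.426729, 16644.284877, 16201.143025, 15758.001172]

def squared_error(y_true, y_pred):
    """Loss for one example."""
    return (y_true - y_pred) ** 2

# One loss value per training example
losses = [squared_error(y, y_hat) for y, y_hat in zip(actual, predicted)]

# The cost is the average of those per-example losses
cost = sum(losses) / len(losses)

print(losses)
print(cost)
```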

---

### ✅ Summary

| Term              | Definition                               | Applies To      |
| ----------------- | ---------------------------------------- | --------------- |
| **Loss Function** | Error for a **single** data point        | One prediction  |
| **Cost Function** | **Average** loss over **entire dataset** | All predictions |

You can think of it like this:

> **Loss is individual**, **Cost is collective**.

In [7]:
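# Signed residual (actual - predicted) per row; squaring it or taking its absolute value gives a per-sample loss
df['Loss'] = df["Ending Salary"] - df['Predicted_y']

In [8]:
df.head()

Unnamed: 0,Starting Salary,Ending Salary,Predicted_y,Loss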
0,16800,16500,16201.143025,298.856975
1,18800,17000,17087.426729,-87.426729
2,17800,17500,16644.284877,855.715123
3,16800,16500,16201.143025,298.856975
4,15800,16500,15758.001172,741.998828


## MAE and MSE

### 📏 **MAE (Mean Absolute Error)**

* **Formula:**

  $$
  \text{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| y_i - \hat{y}_i \right|
  $$
* **Meaning:**
  Average of the **absolute** differences between actual and predicted values.
* **Use when:**
  You want every error to count in proportion to its size, with no extra penalty for large errors.
* **Pros:**

  * Easy to understand.
  * Less sensitive to outliers.
* **Cons:**

  * Not differentiable at zero error, which makes gradient-based optimization less convenient.
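
To make the MAE formula concrete, here is a small hand-rolled version applied to the five rows shown in the `df.head()` output. The function name is hypothetical, and because only five rows are used the value differs from the MAE of the full dataset.

```python
def mean_absolute_error_manual(y_true, y_pred):
    """Average of |y_i - y_hat_i| over all examples."""
    return sum(abs(y - y_hat) for y, y_hat in zip(y_true, y_pred)) / len(y_true)

# First five rows from the notebook's df.head() output (not the full dataset)
actual = [16500, 17000, 17500, 16500, 16500]
predicted = [16201.143025, 17087.426729, 16644.284877, 16201.143025, 15758.001172]

mae_five_rows = mean_absolute_error_manual(actual, predicted)
print(round(mae_five_rows, 2))  # about 456.57 for these five rows
```

Note that the negative residual in row 1 contributes its magnitude, so positive and negative errors cannot cancel each other out.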

---

### 🔁 **MSE (Mean Squared Error)**

* **Formula:**

  $$
  \text{MSE} = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2
  $$
* **Meaning:**
  Average of the **squared** differences between actual and predicted values.
* **Use when:**
  You want to **penalize larger errors more heavily**.
* **Pros:**

  * Smooth for gradient-based optimization (used in deep learning).
* **Cons:**

  * More sensitive to outliers.
  * Units are squared (harder to interpret).
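
The difference in outlier sensitivity is easiest to see on a toy example. The error values below are hypothetical, picked only so the arithmetic is easy to check by hand; the helper names `mae` and `mse` are likewise assumptions.

```python
def mae(errors):
    """Mean absolute error from a list of residuals."""
    return sum(abs(e) for e in errors) / len(errors)

def mse(errors):
    """Mean squared error from a list of residuals."""
    return sum(e ** 2 for e in errors) / len(errors)

# Hypothetical residuals: identical except for one large error
clean = [1.0, -1.0, 1.0, -1.0]
with_outlier = [1.0, -1.0, 1.0, -10.0]

print(mae(clean), mse(clean))                # 1.0 1.0
print(mae(with_outlier), mse(with_outlier))  # 3.25 25.75
```

A single residual of 10 roughly triples the MAE but multiplies the MSE by more than 25, which is the "penalize larger errors more heavily" behavior described above.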

---

If you want MSE's emphasis on large errors but in the same units as the target, use **RMSE** (Root Mean Squared Error):

$$
\text{RMSE} = \sqrt{\text{MSE}}
$$
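
As a quick check of the formula, the sketch below takes the MSE and MAE values that this notebook reports for the salary model and converts the MSE to an RMSE. The variable names are assumptions for illustration.

```python
import math

# Values reported by this notebook for the salary model (full dataset)
mse = 1322148.5443532642
mae = 831.6187964048459

# RMSE brings the error back to the target's units (salary)
rmse = math.sqrt(mse)

print(f"RMSE = {rmse:.2f}")
print(f"MAE  = {mae:.2f}")
```

The RMSE comes out at roughly 1,150, noticeably above the MAE of about 832. RMSE is never smaller than MAE on the same data, and a wide gap like this suggests that some errors are much larger than the typical one.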



In [9]:
from sklearn.metrics import mean_squared_error, mean_absolute_error

In [10]:
MAE = mean_absolute_error(df["Ending Salary"], df['Predicted_y'])
MAE

831.6187964048459

In [11]:
MSE = mean_squared_error(df["Ending Salary"], df['Predicted_y'])
MSE

1322148.5443532642

In [12]:
reg.score(x, y)

0.06575695379130464

## R² (R-squared) Value / Accuracy for Regression

In regression, **R² (R-squared)**, also called the **coefficient of determination**, is a common metric for evaluating the **goodness of fit** of a regression model.

---

### 🔷 **What is R² (R-squared)?**

R² represents the proportion of the **variance in the dependent variable** (target) that is **predictable from the independent variables** (features).

---

### 🔹 **Formula:**

$$
R^2 = 1 - \frac{SS_{res}}{SS_{tot}}
$$

Where:

* $SS_{res} = \sum (y_i - \hat{y}_i)^2$ → Residual Sum of Squares
* $SS_{tot} = \sum (y_i - \bar{y})^2$ → Total Sum of Squares

---

### 🔹 **Interpretation:**

| R² Value  | Interpretation                        |
| --------- | ------------------------------------- |
| 1         | Perfect fit (predictions are perfect) |
| 0         | Model explains none of the variance   |
| < 0       | Worse than predicting the mean        |
| \~0.7-0.9 | Good fit (depends on context)         |
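
The landmark values in the table can be reproduced with three sets of predictions for the same targets. The data and the helper `r2` are hypothetical, written only to illustrate the 1, 0, and negative cases.

```python
def r2(y_true, y_pred):
    """Coefficient of determination computed from its definition."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    ss_tot = sum((y - mean) ** 2 for y in y_true)
    return 1 - ss_res / ss_tot

y = [1.0, 2.0, 3.0, 4.0, 5.0]      # hypothetical targets

perfect = list(y)                  # predictions equal the targets
always_mean = [3.0] * len(y)       # always predict the mean of y
reversed_pred = y[::-1]            # predictions trend the wrong way

print(r2(y, perfect))        # 1.0
print(r2(y, always_mean))    # 0.0
print(r2(y, reversed_pred))  # -3.0
```

Seen this way, the salary model's R² of about 0.066 sits only slightly above the "always predict the mean" baseline.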

---

### ✅ **Is R² = Accuracy?**

**Not exactly**. R² is often informally called "accuracy" in regression, but it's **not accuracy in the classification sense**.

* **In classification:** Accuracy = % of correct predictions.
* **In regression:** Accuracy is **not directly defined**; we use metrics like R², MAE, and RMSE instead.

---

### 🔹 **Other Regression Metrics (for comparison):**

| Metric | Description                      |
| ------ | -------------------------------- |
| MAE    | Mean Absolute Error              |
| MSE    | Mean Squared Error               |
| RMSE   | Root Mean Squared Error          |
| R²     | Proportion of explained variance |

---

In [13]:
reg.score(x, y)

0.06575695379130464

In [14]:
from sklearn.metrics import r2_score
r2_score(y, reg.predict(x))

0.06575695379130464