In [3]:
import numpy as np

In [6]:
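# Read the dataset with NumPy; skiprows=1 skips the header row.
# The functions below treat column 0 as the actual values and column 1 as the predictions.
data = np.loadtxt('Data/Random_regression.csv', delimiter=',', skiprows=1)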
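The column layout matters for every metric that follows. As a minimal sketch, assuming a two-column file with actual values first and predictions second (the header names and numbers here are hypothetical stand-ins, not the contents of `Random_regression.csv`), the two columns can be separated like this:

```python
import io
import numpy as np

# Hypothetical file contents standing in for the real CSV.
csv_text = "actual,predicted\n3.0,2.5\n5.0,5.5\n7.0,6.0\n"

# np.loadtxt accepts a file-like object; skiprows=1 drops the header line.
sample = np.loadtxt(io.StringIO(csv_text), delimiter=',', skiprows=1)

y_true = sample[:, 0]  # assumed: actual values in column 0
y_pred = sample[:, 1]  # assumed: predictions in column 1
print(sample.shape)
```

Note that `sample[:, 0]` selects a column, whereas `sample[:][0]` would return the first row.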

### Summary Table of Regression Metrics

| **Metric**     | **Penalizes Large Errors** | **Same Units as Target** | **Robust to Outliers** | **Good for % Errors** | **When to Avoid**                          |
|----------------|----------------------------|--------------------------|------------------------|-----------------------|---------------------------------------------|
| MAE            | No                         | Yes                      | Yes                    | No                    | When large errors matter                    |
| MSE            | Yes                        | No                       | No                     | No                    | When error units matter                     |
| RMSE           | Yes                        | Yes                      | No                     | No                    | When outliers are extreme                   |
| R²             | No                         | No                       | No                     | No                    | Non-linear models                           |
| Adjusted R²    | No                         | No                       | No                     | No                    | Same as R² + fewer predictors               |
| MAPE           | No                         | No                       | No                     | Yes                   | Targets near zero                           |


# Checking the model performance

## 1. Mean Absolute Error (MAE)

**Equation:**
$$
\text{MAE} = \frac{1}{n} \sum_{i=1}^{n} |y_i - \hat{y}_i|
$$

**✅ When to Use:**
- You need a simple average of absolute errors.
- Outliers should have limited influence.

**❌ When *Not* to Use:**
- You want to penalize large errors more severely.

**📌 Example:**  
Predicting house prices where an average error of \$10,000 is acceptable.

In [7]:
def MAE(data):
    # Column 0: actual values, column 1: predicted values.
    mae = 0
    for i in range(len(data)):
        mae = mae + np.abs(data[i][0] - data[i][1])
    mae = mae / len(data)
    return mae
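The same quantity can be computed without an explicit loop. The following is a minimal sketch, not the notebook's own implementation: `mae_vectorized` is a hypothetical helper that takes the actual and predicted values as separate arrays, and the toy numbers are made up for illustration.

```python
import numpy as np

def mae_vectorized(y_true, y_pred):
    """Mean absolute error using array operations instead of a loop."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.mean(np.abs(y_true - y_pred))

# Hypothetical toy values, not taken from the dataset.
y_true = np.array([3.0, 5.0, 2.0, 7.0])
y_pred = np.array([2.0, 5.0, 4.0, 8.0])

# Absolute errors are 1, 0, 2, 1, so the mean is 1.0.
mae_value = mae_vectorized(y_true, y_pred)
print(mae_value)
```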

## 2. Mean Squared Error (MSE)

**Equation:**
$$
\text{MSE} = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2
$$

**✅ When to Use:**
- As a loss function during model training, where sensitivity to large errors is useful.
- You want large prediction errors to count more than small ones.

**❌ When *Not* to Use:**
- When you need results in the same units as the target variable.

**📌 Example:**  
Used for optimizing models via gradient descent.

In [11]:
def MSE(data):
    # Column 0: actual values, column 1: predicted values.
    mse = 0
    for i in range(len(data)):
        mse = mse + np.square(data[i][0] - data[i][1])
    mse = mse / len(data)
    return mse
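To see what "penalizing large errors" means in practice, the sketch below compares two hypothetical sets of predictions with the same MAE: one with four errors of size 1, the other with a single error of size 4. The helper name `mse_vectorized` and all numbers are assumptions made for illustration.

```python
import numpy as np

def mse_vectorized(y_true, y_pred):
    """Mean squared error using array operations."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.mean(np.square(y_true - y_pred))

# Hypothetical toy values.
y_true = np.array([10.0, 10.0, 10.0, 10.0])
y_pred_spread = np.array([11.0, 9.0, 11.0, 9.0])     # four errors of size 1
y_pred_outlier = np.array([10.0, 10.0, 10.0, 14.0])  # one error of size 4

# Both prediction sets have MAE = 1.0 ...
mae_spread = np.mean(np.abs(y_true - y_pred_spread))
mae_outlier = np.mean(np.abs(y_true - y_pred_outlier))

# ... but squaring makes the single large error dominate: 1.0 vs 4.0.
mse_spread = mse_vectorized(y_true, y_pred_spread)
mse_outlier = mse_vectorized(y_true, y_pred_outlier)
print(mse_spread, mse_outlier)
```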

## 3. Root Mean Squared Error (RMSE)

**Equation:**
$$
\text{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2}
$$

**✅ When to Use:**
- You need an interpretable metric in the same unit as the output.
- Large errors need to be penalized more.

**❌ When *Not* to Use:**
- In the presence of many extreme outliers.

**📌 Example:**  
Forecasting temperatures in °C or °F.

In [16]:
def RMSE(data):
    # RMSE is the square root of MSE, which restores the target's units.
    mse = MSE(data)
    rmse = np.sqrt(mse)
    return rmse
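A small worked case shows how the square root brings the error back to the target's units. This is a sketch with made-up temperature readings in °C; `rmse_vectorized` is a hypothetical helper, not part of the notebook.

```python
import numpy as np

def rmse_vectorized(y_true, y_pred):
    """Root mean squared error using array operations."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.sqrt(np.mean(np.square(y_true - y_pred)))

# Hypothetical temperatures in degrees Celsius.
y_true = np.array([20.0, 22.0, 19.0, 25.0])
y_pred = np.array([18.0, 22.0, 21.0, 24.0])

# Squared errors are 4, 0, 4, 1, so MSE = 2.25 (in squared degrees)
# and RMSE = 1.5 (in degrees).
rmse_value = rmse_vectorized(y_true, y_pred)
print(rmse_value)
```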

## 4. R-squared (R²)

**Equation:**
$$
R^2 = 1 - \frac{\sum_{i=1}^{n}(y_i - \hat{y}_i)^2}{\sum_{i=1}^{n}(y_i - \bar{y})^2}
$$

**✅ When to Use:**
- To evaluate the proportion of variance explained.
- Comparing models on the same dataset.

**❌ When *Not* to Use:**
- For non-linear models, where R² may no longer be interpretable as the proportion of variance explained.
- Comparing models across different datasets.

**📌 Example:**  
Evaluating how well education predicts income.

In [22]:
def R_Square(data):
    rss = 0  # residual sum of squares
    tss = 0  # total sum of squares
    # Mean of the actual values (column 0); data[:][0] would return the first row instead.
    avg = np.mean(data[:, 0])
    for i in range(len(data)):
        rss = rss + np.square(data[i][0] - data[i][1])
        tss = tss + np.square(data[i][0] - avg)
    r_square = 1 - (rss / tss)
    return r_square
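The formula can be checked against two reference cases: predictions close to the actual values should give R² near 1, and predicting the mean of the actual values for every sample should give exactly 0. The sketch below uses a hypothetical helper `r2_vectorized` and made-up numbers.

```python
import numpy as np

def r2_vectorized(y_true, y_pred):
    """R-squared as 1 - RSS/TSS, using array operations."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    rss = np.sum(np.square(y_true - y_pred))           # residual sum of squares
    tss = np.sum(np.square(y_true - np.mean(y_true)))  # total sum of squares
    return 1.0 - rss / tss

# Hypothetical toy values.
y_true = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y_pred_close = np.array([1.1, 1.9, 3.2, 3.8, 5.0])
y_pred_mean = np.full_like(y_true, np.mean(y_true))  # always predict the mean

r2_close = r2_vectorized(y_true, y_pred_close)  # RSS = 0.10, TSS = 10, so about 0.99
r2_mean = r2_vectorized(y_true, y_pred_mean)    # RSS equals TSS, so 0.0
print(r2_close, r2_mean)
```

A model that does worse than the mean baseline produces a negative value, since RSS then exceeds TSS.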

## 5. Adjusted R-squared

**Equation:**
$$
\text{Adjusted } R^2 = 1 - \left( \frac{(1 - R^2)(n - 1)}{n - k - 1} \right)
$$

Here $n$ is the number of observations and $k$ is the number of predictors.

**✅ When to Use:**
- Comparing models with different numbers of predictors.
- You want to penalize predictors that add little explanatory power, as a guard against overfitting.

**❌ When *Not* to Use:**
- Same limitations as R² (assumes linearity).

**📌 Example:**  
Choosing between models with different feature sets in linear regression.

In [30]:
def Adj_R_Square(data, k):
    # k is the number of predictors; n is the number of observations.
    r_square = R_Square(data)
    n = len(data)
    adj_r_square = 1 - ((1 - r_square) * (n - 1) / (n - k - 1))
    return adj_r_square
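The effect of the penalty is easiest to see by holding R² fixed and varying the number of predictors. The sketch below assumes a hypothetical helper `adjusted_r2` that takes R² directly, along with made-up values for `n` and `k`; the guard against `n - k - 1 <= 0` is an added assumption, not something the notebook does.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R-squared from R-squared, sample size n, and predictor count k."""
    if n - k - 1 <= 0:
        raise ValueError("need n > k + 1 for adjusted R-squared to be defined")
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)

# Hypothetical setting: the same R-squared of 0.90 on 50 observations.
adj_few = adjusted_r2(0.90, n=50, k=5)    # about 0.889
adj_many = adjusted_r2(0.90, n=50, k=20)  # about 0.831
print(adj_few, adj_many)
```

With the fit held equal, the model that uses more predictors receives the lower adjusted score.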

## 6. Mean Absolute Percentage Error (MAPE)

**Equation:**
$$
\text{MAPE} = \frac{100\%}{n} \sum_{i=1}^{n} \left| \frac{y_i - \hat{y}_i}{y_i} \right|
$$

**✅ When to Use:**
- You need errors as percentages.
- Targets are not zero or near-zero.

**❌ When *Not* to Use:**
- Target values can be zero or close to zero (division by zero, or very large percentage errors).

**📌 Example:**  
Reporting the error of monthly sales forecasts in percentage terms.

In [24]:
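def MAPE(data):
    # Column 0: actual values (must be non-zero), column 1: predicted values.
    mape = 0
    for i in range(len(data)):
        # Error relative to the actual value, matching the equation above.
        mape = mape + abs((data[i][0] - data[i][1]) / data[i][0])
    mape = mape * 100 / len(data)
    return mape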
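Because MAPE divides by the actual values, it helps to check for zeros before computing it. The following is a minimal sketch under that assumption; `mape_checked` is a hypothetical helper, the choice to raise an error is one option among several, and the sales figures are made up.

```python
import numpy as np

def mape_checked(y_true, y_pred):
    """MAPE in percent; refuses to run if any actual value is zero."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    if np.any(y_true == 0):
        raise ValueError("MAPE is undefined when an actual value is zero")
    return 100.0 * np.mean(np.abs((y_true - y_pred) / y_true))

# Hypothetical monthly sales figures.
y_true = np.array([100.0, 200.0, 50.0, 400.0])
y_pred = np.array([110.0, 180.0, 55.0, 400.0])

# Relative errors are 10%, 10%, 10%, and 0%, so the mean is about 7.5%.
mape_value = mape_checked(y_true, y_pred)
print(mape_value)
```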