# Error Evaluation for Regression Models

# Problem
### Employees' years of experience and salary information are given.
### Create the linear regression model equation according to the given bias and weight.
### Bias = 275
### Weight = 90
### Equation: y' = b+wx
### Estimate the salary for all years of experience in the table according to the model equation you created.
### Calculate the MSE, RMSE, MAE, and R2 scores to measure the performance of the model.
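
For reference, the four scores can be written out using the standard textbook definitions. The symbols here are assumptions introduced for illustration and are not part of the original problem statement: $n$ is the number of observations, $y_i$ the actual salary, $y'_i$ the predicted salary, and $\bar{y}$ the mean of the actual salaries.

$$\text{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - y'_i\right)^2$$

$$\text{RMSE} = \sqrt{\text{MSE}}$$

$$\text{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - y'_i\right|$$

$$R^2 = 1 - \frac{\sum_{i=1}^{n}\left(y_i - y'_i\right)^2}{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2}$$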

In [1]:
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import warnings
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
warnings.filterwarnings('ignore')

In [2]:
df = pd.read_excel('/kaggle/input/evaluation-regression-model/evaluation_regression_model.xlsx')
df

Unnamed: 0,experience_year,salary,salary_prediction,errors,squared_error,absolute_error
0,5,600,,,,
1,7,900,,,,
2,3,550,,,,
3,3,500,,,,
4,2,400,,,,
5,7,950,,,,
6,3,540,,,,
7,10,1200,,,,
8,6,900,,,,
9,4,550,,,,


## Let's write the linear model equation and predict the salary

### salary_prediction = bias + weight * experience_year
### salary_prediction = 275 + 90 * experience_year

In [3]:
bias = 275
weight = 90
df['salary_prediction'] = bias + weight*df['experience_year']
df

Unnamed: 0,experience_year,salary,salary_prediction,errors,squared_error,absolute_error
0,5,600,725,,,
1,7,900,905,,,
2,3,550,545,,,
3,3,500,545,,,
4,2,400,455,,,
5,7,950,905,,,
6,3,540,545,,,
7,10,1200,1175,,,
8,6,900,815,,,
9,4,550,635,,,
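
As a quick cross-check of the prediction step, the same equation can be evaluated in plain Python without pandas. This is only a minimal sketch: the helper name `predict_salary` is hypothetical, and the input list contains just the ten `experience_year` values displayed in the table above.

```python
# Model parameters given in the problem statement.
BIAS = 275
WEIGHT = 90


def predict_salary(experience_years):
    """Return the model estimate: bias + weight * years of experience."""
    return BIAS + WEIGHT * experience_years


# experience_year values of the ten rows displayed in the table above.
experience = [5, 7, 3, 3, 2, 7, 3, 10, 6, 4]
predictions = [predict_salary(x) for x in experience]
print(predictions)
# [725, 905, 545, 545, 455, 905, 545, 1175, 815, 635]
```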


### We have now predicted the salaries using the given model equation. Next, let's calculate the errors (actual salary minus predicted salary)

In [4]:
df['errors'] = df['salary'] - df['salary_prediction']
df

Unnamed: 0,experience_year,salary,salary_prediction,errors,squared_error,absolute_error
0,5,600,725,-125,,
1,7,900,905,-5,,
2,3,550,545,5,,
3,3,500,545,-45,,
4,2,400,455,-55,,
5,7,950,905,45,,
6,3,540,545,-5,,
7,10,1200,1175,25,,
8,6,900,815,85,,
9,4,550,635,-85,,


### The next step is to calculate the squared errors

In [5]:
df['squared_error'] = np.power(df['errors'].values, 2)
df

Unnamed: 0,experience_year,salary,salary_prediction,errors,squared_error,absolute_error
0,5,600,725,-125,15625,
1,7,900,905,-5,25,
2,3,550,545,5,25,
3,3,500,545,-45,2025,
4,2,400,455,-55,3025,
5,7,950,905,45,2025,
6,3,540,545,-5,25,
7,10,1200,1175,25,625,
8,6,900,815,85,7225,
9,4,550,635,-85,7225,


### Then, let's compute the absolute errors

In [6]:
df['absolute_error'] = np.abs(df['errors'].values)
df

Unnamed: 0,experience_year,salary,salary_prediction,errors,squared_error,absolute_error
0,5,600,725,-125,15625,125
1,7,900,905,-5,25,5
2,3,550,545,5,25,5
3,3,500,545,-45,2025,45
4,2,400,455,-55,3025,55
5,7,950,905,45,2025,45
6,3,540,545,-5,25,5
7,10,1200,1175,25,625,25
8,6,900,815,85,7225,85
9,4,550,635,-85,7225,85
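
The three derived columns can be reproduced by hand for the rows shown above. The sketch below assumes only the ten displayed (salary, salary_prediction) pairs; the variable names are illustrative. Note that the notebook later reports 15 observations, so the displayed rows are presumably a subset of the full dataset, and the sums computed here are not expected to equal the totals reported in the following sections.

```python
# (salary, salary_prediction) pairs for the ten rows displayed above.
rows = [(600, 725), (900, 905), (550, 545), (500, 545), (400, 455),
        (950, 905), (540, 545), (1200, 1175), (900, 815), (550, 635)]

# Error = actual minus predicted, matching the 'errors' column.
errors = [actual - predicted for actual, predicted in rows]
squared_errors = [e ** 2 for e in errors]
absolute_errors = [abs(e) for e in errors]

print(errors)                # [-125, -5, 5, -45, -55, 45, -5, 25, 85, -85]
print(sum(squared_errors))   # 37850 (displayed rows only)
print(sum(absolute_errors))  # 480 (displayed rows only)
```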


# Mean Squared Error (MSE)

### Let's calculate the MSE using the values computed in the table above

### First, we calculate the sum of squared errors (SSE) as follows

In [7]:
SSE = df['squared_error'].sum()
print('The sum of squared errors (SSE) is', SSE)

The sum of squared errors (SSE) is 66575


### Now, let's get the number of observations (n)

In [8]:
N_OBS = df.shape[0]
print('The number of observations is', N_OBS)

The number of observations is 15


### Now it is easy: let's calculate the MSE by dividing the SSE by the number of observations

In [9]:
MSE = round(SSE/N_OBS, 1)
print('The MSE is', MSE)

The MSE is 4438.3


# Root Mean Square Error (RMSE)

### Let's calculate the RMSE by using the calculated MSE value

In [10]:
RMSE = round(np.sqrt(MSE), 2)
print('The root mean squared error (RMSE) value is', RMSE)

The root mean squared error (RMSE) value is 66.62
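
The arithmetic behind these two values can be checked without the DataFrame, starting from the totals the notebook reports (SSE = 66575, n = 15). This is a minimal sketch; the lowercase names `mse` and `rmse` are illustrative. It takes the square root of the unrounded MSE, which for these numbers gives the same two-decimal RMSE as the rounded version used above.

```python
import math

# Totals reported in the cells above.
SSE = 66575   # sum of squared errors
N_OBS = 15    # number of observations

mse = SSE / N_OBS      # mean squared error
rmse = math.sqrt(mse)  # root of the unrounded MSE

print(round(mse, 2))   # 4438.33
print(round(rmse, 2))  # 66.62
```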


# Mean Absolute Error (MAE)

### To calculate the MAE value, we need the sum of absolute errors and the number of observations.

### Let's get the sum of absolute errors (SAE)

In [11]:
SAE = df['absolute_error'].sum()
print('The sum of absolute errors (SAE) is', SAE)

The sum of absolute errors (SAE) is 815


### Now, let's get the MAE value

In [12]:
MAE = SAE/N_OBS
print('The mean absolute error (MAE) is', round(MAE, 2))

The mean absolute error (MAE) is 54.33
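
The MAE can also be verified from the reported totals alone, and compared with the RMSE. The sketch below assumes the SSE, SAE, and n values printed above. The MAE can never exceed the RMSE on the same errors, and a gap between the two suggests that a few large errors (such as the -125 in the first row) carry extra weight in the squared metric.

```python
import math

# Totals reported in the cells above.
SSE = 66575   # sum of squared errors
SAE = 815     # sum of absolute errors
N_OBS = 15    # number of observations

mae = SAE / N_OBS
rmse = math.sqrt(SSE / N_OBS)

print(round(mae, 2))  # 54.33
print(mae <= rmse)    # True
```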


# Let's calculate them all using sklearn metrics and NumPy

## MSE

In [13]:
# Store the result under the name MSE (not MAE) so the variable matches the metric
MSE = round(mean_squared_error(df['salary'], df['salary_prediction']), 2)
print('The mean squared error (MSE) is', MSE)

The mean squared error (MSE) is 4438.33


## RMSE

In [14]:
RMSE = round(np.sqrt(MSE), 2)
print('The root mean squared error (RMSE) is', RMSE)

The root mean squared error (RMSE) is 66.62


## MAE

In [15]:
MAE = round(mean_absolute_error(df['salary'], df['salary_prediction']), 2)
print('The mean absolute error (MAE) is', MAE)

The mean absolute error (MAE) is 54.33


# Let's calculate the R2 value using the actual and predicted salary values

In [16]:
R2_Score = round(r2_score(df['salary'], df['salary_prediction']), 2)
print('The R2 score is', R2_Score)

The R2 score is 0.94
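
To see what `r2_score` computes, the score can be written out as one minus the ratio of the residual sum of squares to the total sum of squares. The sketch below is an illustration only: `r2_manual` is a hypothetical helper, and it is applied to the ten rows displayed in the tables above rather than to the full 15-row dataset, so its result is not guaranteed to equal the notebook's value.

```python
def r2_manual(actual, predicted):
    """Return R2 = 1 - SS_res / SS_tot for two equal-length sequences."""
    mean_actual = sum(actual) / len(actual)
    # Residual sum of squares: squared gaps between actual and predicted.
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    # Total sum of squares: squared gaps between actual and its mean.
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


# The ten rows displayed in the tables above (a subset of the data).
salary = [600, 900, 550, 500, 400, 950, 540, 1200, 900, 550]
prediction = [725, 905, 545, 545, 455, 905, 545, 1175, 815, 635]

r2_visible = r2_manual(salary, prediction)
print(round(r2_visible, 2))  # 0.94 on the displayed rows
```

A score of 1 would mean the predictions match the actual salaries exactly, while a score of 0 would mean the model does no better than always predicting the mean salary.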


# **Thank you for checking my notebook!**