<h2>Regression Model Selection</h2>
<h4>Mean Squared Error (MSE)</h4>

$$\text{MSE} = \frac{1}{n}  \sum_{i=1}^n (y_i - \hat{y_i})^2$$ 

- ***$n$*** is the number of data points;
- ***$y_i$***  represents the actual value of the dependent variable for the ***i***-th observation;
- ***$\hat{y_i}$***  represents the predicted value of the dependent variable for the ***i***-th observation;

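**MSE** is widely used to assess the accuracy of a regression model. Because it squares the difference between each predicted and actual value, large errors weigh more heavily than small ones, and the result is expressed in squared units of the dependent variable. A lower MSE indicates better performance. A value of 0 means every prediction matches the actual value exactly, which is rare in practice and, when it occurs on training data, usually signals overfitting.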
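
As a small worked example of the formula, the sketch below computes MSE by hand with NumPy. The values in `y_true` and `y_hat` are made up for illustration and do not come from the dataset used later.

```python
import numpy as np

# Hypothetical values, chosen only to keep the arithmetic easy to follow
y_true = np.array([3.0, 5.0, 2.0, 7.0])
y_hat = np.array([2.0, 5.0, 4.0, 8.0])

errors = y_true - y_hat        # [ 1.,  0., -2., -1.]
mse = np.mean(errors ** 2)     # (1 + 0 + 4 + 1) / 4
print(mse)                     # 1.5
```

The error of 2 contributes 4 to the sum, as much as four errors of size 1 would, which is the extra weight on large errors described above.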

<h4>Mean Absolute Error (MAE)</h4>

$$\text{MAE} = \frac{1}{n}  \sum_{i=1}^n \lvert y_i - \hat{y_i} \rvert$$

- ***$n$*** is the number of data points;
- ***$y_i$***  represents the actual value of the dependent variable for the ***i***-th observation;
- ***$\hat{y_i}$***  represents the predicted value of the dependent variable for the ***i***-th observation;

**MAE** averages the absolute differences between predicted and actual values, so each error contributes in proportion to its size rather than its square. This makes it less sensitive to outliers than MSE. Because it is expressed in the same units as the dependent variable, it can also be read directly as the typical size of a prediction error.
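
To illustrate the difference in outlier sensitivity, the sketch below compares MAE and MSE on two sets of made-up predictions that differ in a single value. The arrays and the helper functions `mae` and `mse` are hypothetical and exist only for this example.

```python
import numpy as np

def mae(y, y_hat):
    """Mean of the absolute errors."""
    return np.mean(np.abs(y - y_hat))

def mse(y, y_hat):
    """Mean of the squared errors."""
    return np.mean((y - y_hat) ** 2)

# Hypothetical values: the second prediction set has one large miss
y_true = np.array([10.0, 12.0, 11.0, 13.0])
y_hat_clean = np.array([11.0, 11.0, 12.0, 12.0])    # every error has size 1
y_hat_outlier = np.array([11.0, 11.0, 12.0, 3.0])   # last error has size 10

print(mae(y_true, y_hat_clean), mse(y_true, y_hat_clean))      # 1.0 1.0
print(mae(y_true, y_hat_outlier), mse(y_true, y_hat_outlier))  # 3.25 25.75
```

A single large miss roughly triples MAE here but multiplies MSE by more than 25.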

<h4>R-squared (R²)</h4>

$$R^2 = 1 - \frac{
\sum_{i=1}^n (y_i - \hat{y_i})^2
}{
\sum_{i=1}^n (y_i - \bar{y})^2
}$$

- ***$n$*** is the number of data points;
- ***$y_i$***  represents the actual value of the dependent variable for the ***i***-th observation;
- ***$\hat{y_i}$***  represents the predicted value of the dependent variable for the ***i***-th observation;
- ***$\bar{y}$*** is the mean of the actual values $y_i$;

**R-squared** measures the share of the variability of the dependent variable around its mean that the model explains. A value of 1 indicates a perfect fit, and a value of 0 means the model does no better than always predicting the mean $\bar{y}$. On held-out data the value can even be negative, which means the model performs worse than that constant baseline. It is a widely used metric to evaluate the overall fit of regression models.
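
The sketch below evaluates the standard definition, one minus the ratio of the residual sum of squares to the total sum of squares, on three made-up prediction sets. The helper `r_squared` and the values are illustrative assumptions, not part of the notebook's pipeline.

```python
import numpy as np

def r_squared(y, y_hat):
    """R^2 = 1 - SS_res / SS_tot."""
    ss_res = np.sum((y - y_hat) ** 2)        # residual sum of squares
    ss_tot = np.sum((y - np.mean(y)) ** 2)   # total sum of squares around the mean
    return 1.0 - ss_res / ss_tot

# Hypothetical values
y_true = np.array([1.0, 2.0, 3.0, 4.0])

r2_perfect = r_squared(y_true, y_true)               # predictions equal the targets
r2_mean = r_squared(y_true, np.full(4, 2.5))         # always predict the mean
r2_reversed = r_squared(y_true, y_true[::-1])        # predictions in reverse order

print(r2_perfect, r2_mean, r2_reversed)  # 1.0 0.0 -3.0
```

The last case shows that predictions worse than the mean baseline produce a negative score.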


<h4>Importing libraries</h4>

In [1]:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

<h4>Loading & preparing datasets</h4>

In [2]:
df = pd.read_csv("./Data.csv")
X = df.iloc[:, :-1].values
y = df.iloc[:, -1].values

In [3]:
print(X)

[[  14.96   41.76 1024.07   73.17]
 [  25.18   62.96 1020.04   59.08]
 [   5.11   39.4  1012.16   92.14]
 ...
 [  31.32   74.33 1012.92   36.48]
 [  24.48   69.45 1013.86   62.39]
 [  21.6    62.52 1017.23   67.87]]


In [4]:
print(y)

[463.26 444.37 488.56 ... 429.57 435.74 453.28]


In [5]:
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=23)

In [6]:
print(X_train)

[[  27.35   77.95 1012.14   74.14]
 [  28.11   70.98 1007.76   85.6 ]
 [  18.68   43.69 1016.68   48.88]
 ...
 [  24.22   49.82 1014.61   66.82]
 [  25.59   61.5  1009.12   68.  ]
 [  20.54   49.15 1021.02   56.  ]]


In [7]:
print(X_test)

[[  24.22   68.51 1013.23   74.96]
 [  33.59   79.05 1007.79   63.55]
 [  14.43   35.85 1021.99   78.25]
 ...
 [  33.72   74.33 1011.4    37.51]
 [  17.47   58.59 1014.03   97.13]
 [  31.63   70.17  999.4    59.94]]


In [8]:
print(y_train)

[431.72 431.83 463.02 ... 452.2  439.14 458.6 ]


In [9]:
print(y_test)

[440.01 436.51 464.6  ... 428.96 449.41 432.07]


<h4>Collecting regression models</h4>

In [10]:
regressors = []

<h4>Polynomial Regression</h4>

In [11]:
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

poly_feats = PolynomialFeatures()
X_train_poly = poly_feats.fit_transform(X_train)
X_test_poly = poly_feats.transform(X_test)

poly_regressor = LinearRegression()
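
An alternative arrangement, sketched here as an assumption rather than what this notebook does, is to chain the feature expansion and the linear model in a scikit-learn pipeline. The pipeline then accepts the untransformed features, so it could be handled like any other regressor. The synthetic `X_demo` and `y_demo` below stand in for the real data, and `degree=2` is spelled out even though it is the default of `PolynomialFeatures`.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Synthetic stand-in data: an exact quadratic function of two features
rng = np.random.default_rng(0)
X_demo = rng.uniform(-2, 2, size=(200, 2))
y_demo = 1.0 + 2.0 * X_demo[:, 0] ** 2 - X_demo[:, 0] * X_demo[:, 1]

# The pipeline expands the features and fits the linear model in one step
poly_pipeline = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
poly_pipeline.fit(X_demo, y_demo)

# score() returns R^2; a degree-2 model can represent this target exactly
score = poly_pipeline.score(X_demo, y_demo)
print(score)
```

With this design, `transform` on the test set cannot be forgotten, because the pipeline applies the same fitted expansion inside `predict`.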

<h4>Linear Regression</h4>

In [12]:
regressors.append(LinearRegression())

<h4>Support Vector Regression</h4>

In [13]:
from sklearn.svm import SVR

regressors.append(SVR())
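
`SVR` with its default RBF kernel is sensitive to feature scale, so fitting it on raw features whose ranges differ by orders of magnitude can hurt its score. The sketch below uses synthetic data rather than `Data.csv` and wraps `SVR` in a pipeline with `StandardScaler`. The names `X_demo`, `y_demo`, `scaled_svr`, and `plain_svr` are illustrative assumptions; the notebook itself does not scale its features.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic stand-in data: two features on very different scales
rng = np.random.default_rng(0)
X_demo = np.column_stack([
    rng.uniform(0, 1, 300),         # small-range feature
    rng.uniform(1000, 1020, 300),   # large-offset feature
])
y_demo = 5.0 * X_demo[:, 0] + 0.5 * (X_demo[:, 1] - 1010) + rng.normal(0, 0.1, 300)

X_tr, X_te = X_demo[:200], X_demo[200:]
y_tr, y_te = y_demo[:200], y_demo[200:]

# SVR on raw features versus SVR on standardized features
plain_svr = SVR().fit(X_tr, y_tr)
scaled_svr = make_pipeline(StandardScaler(), SVR()).fit(X_tr, y_tr)

r2_plain = plain_svr.score(X_te, y_te)
r2_scaled = scaled_svr.score(X_te, y_te)
print("R2 without scaling:", r2_plain)
print("R2 with scaling:   ", r2_scaled)
```

Whether scaling helps on a given dataset is an empirical question, so the comparison is worth running rather than assuming.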

<h4>Decision Tree Regression</h4>

In [14]:
from sklearn.tree import DecisionTreeRegressor

regressors.append(DecisionTreeRegressor())

<h4>Random Forest Regression</h4>

In [15]:
from sklearn.ensemble import RandomForestRegressor

regressors.append(RandomForestRegressor())

In [16]:
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score

poly_regressor.fit(X_train_poly, y_train)
y_pred = poly_regressor.predict(X_test_poly)

print("Polynomial Regressor")
print("Mean Squared Error:", mean_squared_error(y_test, y_pred))
print("Mean Absolute Error:", mean_absolute_error(y_test, y_pred))
print("R Squared Score:", r2_score(y_test, y_pred))

for model in regressors:
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)

    print("\n", model.__class__)
    print("Mean Squared Error:", mean_squared_error(y_test, y_pred))
    print("Mean Absolute Error:", mean_absolute_error(y_test, y_pred))
    print("R Squared Score:", r2_score(y_test, y_pred))

Polynomial Regressor
Mean Squared Error: 17.83196882590042
Mean Absolute Error: 3.2977543121500807
R Squared Score: 0.9397368002721032

 <class 'sklearn.linear_model._base.LinearRegression'>
Mean Squared Error: 20.036867448832787
Mean Absolute Error: 3.547314467257839
R Squared Score: 0.9322853378233498

 <class 'sklearn.svm._classes.SVR'>
Mean Squared Error: 181.3664882353528
Mean Absolute Error: 11.250795756309916
R Squared Score: 0.38707133176460007

 <class 'sklearn.tree._classes.DecisionTreeRegressor'>
Mean Squared Error: 18.888693207941486
Mean Absolute Error: 2.9552873563218385
R Squared Score: 0.9361655966033426

 <class 'sklearn.ensemble._forest.RandomForestRegressor'>
Mean Squared Error: 10.905018933636407
Mean Absolute Error: 2.2749188087774326
R Squared Score: 0.9631464511602497
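
Printed metrics are hard to compare at a glance, so one option is to collect them in a pandas DataFrame and sort by MSE. The following is a minimal sketch on synthetic data with only two models. The names `X_demo`, `y_demo`, `rows`, and `results` are assumptions introduced for this example, and the numbers it produces are unrelated to the results above.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in data: a noisy linear target with four features
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(300, 4))
y_demo = X_demo @ np.array([1.5, -2.0, 0.5, 3.0]) + rng.normal(0, 0.1, 300)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=23
)

# Fit each model and record its test-set metrics as one table row
rows = []
for model in [LinearRegression(), DecisionTreeRegressor(random_state=0)]:
    y_hat = model.fit(X_tr, y_tr).predict(X_te)
    rows.append({
        "model": type(model).__name__,
        "MSE": mean_squared_error(y_te, y_hat),
        "MAE": mean_absolute_error(y_te, y_hat),
        "R2": r2_score(y_te, y_hat),
    })

# Lowest MSE first
results = pd.DataFrame(rows).sort_values("MSE").reset_index(drop=True)
print(results)
```

Sorting by a single metric is a design choice; when MSE and MAE disagree on the ranking, the disagreement itself says something about how each model handles large errors.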
