<img src="../../../../images/r2.png" style="background:white; display: block; margin-left: auto;margin-right: auto; width:60%"/>

<p style="font-size:13px">
We compare the actual and predicted values to measure the accuracy of a regression model. Evaluation metrics play a key role in model development because they show where the model needs improvement.

There are several model evaluation metrics. Here we compute each of the following on the test set:
<ul style="font-size:13px">
    <li><i><strong>Mean Absolute Error (MAE)</strong></i>: The mean of the absolute values of the errors. This is the easiest of the metrics to understand since it’s just the average error.</li>
    <li><i><strong>Mean Squared Error (MSE)</strong></i>: The mean of the squared errors. It is often preferred over MAE because it weights large errors more heavily: squaring makes the penalty grow quadratically with the size of the error.</li>
    <li><i><strong>Root Mean Squared Error (RMSE)</strong></i>: The square root of the MSE, which puts the error back in the same units as the target.</li>
    <li><i><strong>R-squared (R2)</strong></i>: Not an error, but a popular measure of how well the model fits. It represents the proportion of the variance in the target that the model explains, i.e. how close the data are to the fitted regression line. The higher the R-squared, the better the model fits your data. The best possible score is 1.0, and it can be negative (because the model can be arbitrarily worse than always predicting the mean).</li>
</ul>
</p>
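To make the definitions above concrete, here is a minimal sketch that computes the four metrics directly from their formulas with NumPy. The helper `regression_metrics` and the toy arrays are illustrative assumptions, not part of the notebook's data. It shows that always predicting the mean gives an R2 of 0, and that a model worse than that gives a negative R2.

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """Return MAE, MSE, RMSE and R2 computed directly from their definitions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    errors = y_true - y_pred
    mae = np.mean(np.abs(errors))            # average absolute error
    mse = np.mean(errors ** 2)               # average squared error
    rmse = np.sqrt(mse)                      # same units as the target
    ss_res = np.sum(errors ** 2)             # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    return {"MAE": mae, "MSE": mse, "RMSE": rmse, "R2": r2}

# Hypothetical toy target values (mean = 6).
y_true_toy = [3.0, 5.0, 7.0, 9.0]

perfect = regression_metrics(y_true_toy, [3.0, 5.0, 7.0, 9.0])    # exact predictions
mean_only = regression_metrics(y_true_toy, [6.0, 6.0, 6.0, 6.0])  # always predict the mean
reversed_ = regression_metrics(y_true_toy, [9.0, 7.0, 5.0, 3.0])  # worse than the mean

for name, scores in [("perfect", perfect), ("mean_only", mean_only), ("reversed", reversed_)]:
    print(name, {k: round(float(v), 3) for k, v in scores.items()})
```

In the reversed case the errors double relative to the mean-only case (MAE 2 to 4) while the MSE quadruples (5 to 20), which is the quadratic penalty described above.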

In [None]:
import numpy as np
from sklearn.metrics import r2_score

print("Mean Absolute Error (MAE): %.2f" % np.mean(np.absolute(y_test - y_pred)))
print("Mean Squared Error (MSE): %.2f" % np.mean((y_test - y_pred) ** 2))
print("Root MSE (RMSE): %.2f" % (np.mean((y_test - y_pred) ** 2))**0.5)
print("R2-score: %.2f" % r2_score(y_test , y_pred) )

### Explained variance regression score:
#### If $\hat{y}$ is the estimated target output, $y$ the corresponding (correct) target output, and $Var$ is the variance (the square of the standard deviation), then the explained variance is estimated as follows:

### $\texttt{explainedVariance}(y, \hat{y}) = 1 - \frac{Var\{ y - \hat{y}\}}{Var\{y\}}$
### The best possible score is 1.0; lower values are worse.
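As a small worked example of the formula above, the sketch below computes explained variance and R2 by hand with NumPy. The function names and the toy arrays are hypothetical, chosen only for illustration. The two scores differ whenever the residuals have a non-zero mean: a constant offset in the predictions leaves the explained variance at 1.0 but lowers R2.

```python
import numpy as np

def explained_variance(y_true, y_pred):
    """1 - Var(y - y_hat) / Var(y), as in the formula above."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return 1.0 - np.var(y_true - y_pred) / np.var(y_true)

def r_squared(y_true, y_pred):
    """1 - SS_res / SS_tot; unlike explained variance, this penalises bias."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

# Hypothetical toy data: every prediction is off by exactly +1.
y_toy = np.array([1.0, 2.0, 3.0, 4.0])
y_toy_pred = y_toy + 1.0

ev_toy = explained_variance(y_toy, y_toy_pred)  # residuals are constant, so their variance is 0
r2_toy = r_squared(y_toy, y_toy_pred)           # SS_res = 4, SS_tot = 5

print("Explained variance: %.2f" % ev_toy)
print("R2-score: %.2f" % r2_toy)
```

scikit-learn provides the same two quantities as `explained_variance_score` and `r2_score` in `sklearn.metrics`.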

In [None]:
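from sklearn.linear_model import LinearRegression
from sklearn.metrics import explained_variance_score

lr = LinearRegression().fit(x, y)  # the model must be fitted before it can be scored
# lr.score(x, y) would return R2; explained_variance_score matches the formula above
print('Variance score: %.2f' % explained_variance_score(y, lr.predict(x)))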

---
<h1>REGRESSION MODEL SELECTION</h1>

In [1]:
import pandas as pd
import numpy as np

df = pd.read_csv('../../../../data/clean/Data.csv')
print("AT : Engine temperature\nV  : Exhaust vacuum\nAP : Pressure\nRH : Relative humidity\nPE : Energy output\n")
display(df.head())
# Features: every column except the first and the last (the target).
# If column 0 is a real feature rather than a saved index, use df.iloc[:, :-1] instead.
x = df.iloc[:, 1:-1].values
y = df.iloc[:, -1].values
y_transpose = y.reshape(-1, 1)  # 2D column vector, as StandardScaler expects

AT : Engine temperature
V  : Exhaust vacuum
AP : Pressure
RH : Relative humidity
PE : Energy output



      AT      V       AP     RH      PE
0  14.96  41.76  1024.07  73.17  463.26
1  25.18  62.96  1020.04  59.08  444.37
2   5.11  39.40  1012.16  92.14  488.56
3  20.86  57.32  1010.24  76.64  446.48
4  10.82  37.50  1009.23  96.62  473.90


In [2]:
from sklearn.model_selection import train_test_split

x_train, x_test, y_train, y_test = train_test_split(x, y, test_size = 0.2, random_state = 0)
x_transpose_train, x_transpose_test, y_transpose_train, y_transpose_test = train_test_split(x, y_transpose, test_size = 0.2, random_state = 0)

In [3]:
from sklearn.preprocessing import PolynomialFeatures

pf = PolynomialFeatures(degree=4, include_bias=True, order='C').fit(X=x_train, y=None)
x_train_poly = pf.transform(X=x_train)

In [4]:
from sklearn.preprocessing import StandardScaler

stand_x = StandardScaler().fit(x_transpose_train)
stand_y = StandardScaler().fit(y_transpose_train)
x_ss = stand_x.transform(x_transpose_train)
y_ss = stand_y.transform(y_transpose_train)

In [5]:
from sklearn.linear_model import LinearRegression
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor

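# The `normalize` argument was removed in scikit-learn 1.2; False was its default anyway.
mlr = LinearRegression(fit_intercept=True, copy_X=True, n_jobs=-1)
mlr.fit(x_train, y_train)

# Polynomial regression: a linear model fitted on the degree-4 polynomial features
pr = LinearRegression(fit_intercept=True, copy_X=True, n_jobs=-1)
pr.fit(x_train_poly, y_train)

svr_reg = SVR(kernel='rbf', degree=3, gamma="scale",
              coef0=0.0, tol=0.001, C=1.0, epsilon=0.1)
svr_reg.fit(x_ss, y_ss.ravel())  # SVR expects a 1D target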


dtr = DecisionTreeRegressor(random_state=0)
dtr.fit(x_train, y_train)


rfr = RandomForestRegressor(n_estimators=10, random_state=0)
rfr.fit(x_train, y_train)

RandomForestRegressor(n_estimators=10, random_state=0)

In [6]:
from sklearn.metrics import r2_score

all_scores = {"Model":[], "R2_Score":[]}

y_pred_mlr = mlr.predict(x_test)
r2_mlr = r2_score(y_test, y_pred_mlr)
all_scores["Model"].append("MultiLinear")
all_scores["R2_Score"].append(r2_mlr)

y_pred_pr = pr.predict(pf.transform(X=x_test))
r2_pr = r2_score(y_test, y_pred_pr)
all_scores["Model"].append("Polynomial")
all_scores["R2_Score"].append(r2_pr)

y_pred_svr = svr_reg.predict(stand_x.transform(x_transpose_test))
# inverse_transform expects a 2D array, so reshape the 1D predictions first
r2_svr = r2_score(y_transpose_test, stand_y.inverse_transform(y_pred_svr.reshape(-1, 1)))
all_scores["Model"].append("SupportVector")
all_scores["R2_Score"].append(r2_svr)

y_pred_dtr = dtr.predict(x_test)
r2_dtr = r2_score(y_test, y_pred_dtr)
all_scores["Model"].append("DecisionTree")
all_scores["R2_Score"].append(r2_dtr)

y_pred_rfr = rfr.predict(x_test)
r2_rfr = r2_score(y_test, y_pred_rfr)
all_scores["Model"].append("RandomForest")
all_scores["R2_Score"].append(r2_rfr)

display(pd.DataFrame(all_scores))

           Model  R2_Score
0    MultiLinear  0.803329
1     Polynomial  0.841872
2  SupportVector  0.849425
3   DecisionTree  0.879267
4   RandomForest  0.935001
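The SVR step above needs manual scaling of both the features and the target, followed by an inverse transform of the predictions. As an alternative sketch (not the approach used in this notebook), scikit-learn's `make_pipeline` and `TransformedTargetRegressor` can do that bookkeeping automatically. The data here is synthetic, generated with `make_regression` as a stand-in for `Data.csv`, and the `*_demo` names are assumptions, so the score it prints is not comparable to the table above.

```python
from sklearn.compose import TransformedTargetRegressor
from sklearn.datasets import make_regression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic stand-in data (an assumption; not the notebook's Data.csv).
X_demo, y_demo = make_regression(n_samples=200, n_features=4, noise=10.0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.2, random_state=0)

# Features are scaled inside the pipeline; the target is scaled by the wrapper,
# which also applies inverse_transform to the predictions.
svr_model = TransformedTargetRegressor(
    regressor=make_pipeline(StandardScaler(), SVR(kernel="rbf")),
    transformer=StandardScaler(),
)
svr_model.fit(X_tr, y_tr)

y_pred_demo = svr_model.predict(X_te)  # already in the original target units
r2_demo = r2_score(y_te, y_pred_demo)
print("R2 (SVR with scaled features and target): %.3f" % r2_demo)
```

Because the scalers are fitted inside `fit`, they only ever see the training split, which matches what the manual version does with `stand_x` and `stand_y`.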
