# Project Description
A combined cycle power plant is an arrangement of multiple heat engines (typically gas and steam turbines) that all draw from the same heat source to produce energy. Hourly average ambient variables (temperature, exhaust vacuum, ambient pressure, and relative humidity) are used to predict the net hourly electrical energy output of the power plant. Two versions of the data, the raw data and a copy scaled using feature scaling, are each fit to a multiple regression model and an SVM regression (SVR) model, and the outputs are compared.


# Data Preprocessing
## Importing the Libraries

In [46]:
import numpy as np
import pandas as pd

## Importing Dataset

In [47]:
dataset = pd.read_csv('Power Plant Data.csv')

## Showing the Dataset in a Table

In [48]:
dataset

Unnamed: 0,Ambient Temperature (C),Exhaust Vacuum (cm Hg),Ambient Pressure (milibar),Relative Humidity (%),Hourly Electrical Energy output (MW)
0,14.96,41.76,1024.07,73.17,463.26
1,25.18,62.96,1020.04,59.08,444.37
2,5.11,39.40,1012.16,92.14,488.56
3,20.86,57.32,1010.24,76.64,446.48
4,10.82,37.50,1009.23,96.62,473.90
...,...,...,...,...,...
9563,16.65,49.69,1014.01,91.00,460.03
9564,13.19,39.18,1023.67,66.78,469.62
9565,31.32,74.33,1012.92,36.48,429.57
9566,24.48,69.45,1013.86,62.39,435.74


## Separate the Input and Output
Here, we put the independent variables in X and the dependent variable in y. 

In [49]:
X = dataset.iloc[:, :-1].values
y = dataset.iloc[:, -1].values
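
As a minimal sketch of what the two `iloc` slices return, the snippet below rebuilds the first five rows of the table above in a small DataFrame. The names `sample`, `X_sample`, and `y_sample` are illustrative and not part of the notebook.

```python
import pandas as pd

# First five rows of the table above; `sample` is an illustrative name.
sample = pd.DataFrame({
    'Ambient Temperature (C)': [14.96, 25.18, 5.11, 20.86, 10.82],
    'Exhaust Vacuum (cm Hg)': [41.76, 62.96, 39.40, 57.32, 37.50],
    'Ambient Pressure (milibar)': [1024.07, 1020.04, 1012.16, 1010.24, 1009.23],
    'Relative Humidity (%)': [73.17, 59.08, 92.14, 76.64, 96.62],
    'Hourly Electrical Energy output (MW)': [463.26, 444.37, 488.56, 446.48, 473.90],
})

# Every column except the last one: a 2D array of inputs.
X_sample = sample.iloc[:, :-1].values
# Only the last column: a 1D array of outputs.
y_sample = sample.iloc[:, -1].values

print(X_sample.shape)  # (5, 4)
print(y_sample.shape)  # (5,)
```

The 1D shape of the output array is why it has to be reshaped into a single column before it can be passed to a scaler.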

## Feature Scaling the Independent Variables

In [50]:
from sklearn.preprocessing import StandardScaler
# instantiate StandardScaler objects
scx = StandardScaler()
scy = StandardScaler()
# scale independent data
X = scx.fit_transform(X)
# reshape and scale dependent data
y = scy.fit_transform(y.reshape(len(y),1))
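
To make the scaling step concrete, here is a small NumPy sketch of the standardization that `StandardScaler` applies to each column: subtract the mean and divide by the population standard deviation. It uses only the five output values from the table above, so the mean and standard deviation here are illustrative and will not match those of the full data set.

```python
import numpy as np

# Energy output (MW) for the first five rows of the table above.
y_sample = np.array([463.26, 444.37, 488.56, 446.48, 473.90])

# Standardize: subtract the mean, divide by the population standard deviation (ddof=0).
mean, std = y_sample.mean(), y_sample.std()
y_scaled = (y_sample - mean) / std

# Undo the scaling to get back to MW, which is what inverse_transform does.
y_restored = y_scaled * std + mean

print(np.isclose(y_scaled.mean(), 0.0), np.isclose(y_scaled.std(), 1.0))
print(np.allclose(y_restored, y_sample))
```

After scaling, each variable has mean 0 and standard deviation 1, which keeps variables measured in large units (such as pressure in millibar) from dominating distance-based models like SVR.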

## Splitting the Dataset into the Training set and Test set

In [51]:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 1)
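
The notebook scales the full data set and then splits it. A common alternative is to split first and fit the scalers on the training rows only, so that no statistics from the test rows reach the model. The sketch below shows that ordering on synthetic data. It is not what the notebook does, and `X_raw`, `y_raw`, `scaler_X`, and `scaler_y` are hypothetical names.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the plant data: 100 rows, 4 inputs, 1 output.
rng = np.random.default_rng(0)
X_raw = rng.normal(size=(100, 4))
y_raw = X_raw @ np.array([1.0, -2.0, 0.5, 0.0]) + rng.normal(scale=0.1, size=100)

# Split first, so the test rows stay unseen while the scalers are fit.
X_tr, X_te, y_tr, y_te = train_test_split(X_raw, y_raw, test_size=0.2, random_state=1)

# Fit the scalers on the training rows only, then apply them to both splits.
scaler_X = StandardScaler().fit(X_tr)
scaler_y = StandardScaler().fit(y_tr.reshape(-1, 1))
X_tr_s, X_te_s = scaler_X.transform(X_tr), scaler_X.transform(X_te)
y_tr_s = scaler_y.transform(y_tr.reshape(-1, 1))
y_te_s = scaler_y.transform(y_te.reshape(-1, 1))

print(X_tr_s.shape, X_te_s.shape)  # (80, 4) (20, 4)
```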

# Regression Model Analysis
## Multiple Regression
Train the multiple regression model.

In [52]:
from sklearn import linear_model
# instantiate linear regression object
linear = linear_model.LinearRegression()
# train models
linear = linear.fit(X_train, y_train)
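
`LinearRegression` fits an ordinary least squares model. As a rough sketch of the same idea, the snippet below solves a least squares problem directly with NumPy on synthetic, noise-free data. The coefficients in `true_coef` and `true_intercept` are made up for illustration and have nothing to do with the power plant data.

```python
import numpy as np

# Synthetic data: 200 rows, 4 inputs, generated from known (made-up) coefficients.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 4))
true_coef = np.array([-0.9, -0.2, 0.02, -0.1])
true_intercept = 0.5
y_demo = X_demo @ true_coef + true_intercept  # no noise added

# Append a column of ones so the intercept is estimated with the coefficients.
A = np.column_stack([X_demo, np.ones(len(X_demo))])
solution, *_ = np.linalg.lstsq(A, y_demo, rcond=None)
coef, intercept = solution[:-1], solution[-1]

print(np.allclose(coef, true_coef), np.isclose(intercept, true_intercept))
```

Because the synthetic output has no noise, the solver recovers the generating coefficients to within floating-point precision.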

## SVR 
Train the SVR model, trying 4 different kernels.

In [53]:
from sklearn.svm import SVR
# kernels to try
kernels = ['linear', 'poly', 'rbf', 'sigmoid']
# list to store models
svr_models = []
# train a model with each kernel and save the results in the list as tuples of the form (kernel, model)
for kernel in kernels:
    regressor = SVR(kernel = kernel)
    svr_models.append((kernel, regressor.fit(X_train, np.ravel(y_train))))
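
The four kernel names correspond to four similarity functions between a pair of input rows. The sketch below writes them out in NumPy following the definitions in the scikit-learn documentation. The function names, the two sample vectors, and the fixed `gamma = 0.25` are assumptions for illustration. `SVR` itself defaults to `gamma='scale'`, `degree=3`, and `coef0=0.0`.

```python
import numpy as np

def k_linear(a, b):
    # Plain dot product.
    return a @ b

def k_poly(a, b, gamma, coef0=0.0, degree=3):
    # Scaled and shifted dot product raised to a power.
    return (gamma * (a @ b) + coef0) ** degree

def k_rbf(a, b, gamma):
    # Decays with the squared Euclidean distance between the rows.
    return np.exp(-gamma * np.sum((a - b) ** 2))

def k_sigmoid(a, b, gamma, coef0=0.0):
    # Hyperbolic tangent of the scaled and shifted dot product.
    return np.tanh(gamma * (a @ b) + coef0)

# Two made-up, already standardized input rows.
a = np.array([1.0, 0.0, -1.0, 0.5])
b = np.array([0.5, 1.0, -1.0, 0.0])
gamma = 0.25  # illustrative value

for name, value in [('linear', k_linear(a, b)),
                    ('poly', k_poly(a, b, gamma)),
                    ('rbf', k_rbf(a, b, gamma)),
                    ('sigmoid', k_sigmoid(a, b, gamma))]:
    print(name, value)
```

The rbf kernel equals 1 for identical rows and falls toward 0 as rows move apart, while the sigmoid kernel is bounded between -1 and 1.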

# Checking Models with Test Set
## Multiple Regression Models

In [85]:
# make prediction
y_pred_linear = linear.predict(X_test)
# show predictions in table, converted back to MW
pd.DataFrame([scy.inverse_transform(y_pred_linear).tolist(), scy.inverse_transform(y_test).tolist()], index=['Predict', 'Test']).T.head()

Unnamed: 0,Predict,Test
0,[457.25522107775566],[458.96]
1,[466.7192736632424],[463.29]
2,[440.3669491129304],[435.27]
3,[482.57800980008994],[484.31]
4,[474.88054717752976],[473.55]


## SVR Models

In [88]:
# list to hold predictions of the 4 models as tuples (kernel, prediction)
svr_predictions = []
# make predictions for all 4 models
for model in svr_models:
    svr_predictions.append((model[0], model[1].predict(X_test)))
# show predictions in tables, converted back to MW
for prediction in svr_predictions:
    # inverse_transform expects a 2D array, so reshape the 1D SVR predictions first
    y_pred_mw = scy.inverse_transform(prediction[1].reshape(-1, 1)).ravel()
    print(pd.DataFrame([y_pred_mw, scy.inverse_transform(y_test)], index=['Predict ' + prediction[0], 'Test']).T.head())
    print()

  Predict linear      Test
0        457.401  [458.96]
1        466.691  [463.29]
2        440.029  [435.27]
3        482.924  [484.31]
4        475.507  [473.55]

  Predict poly      Test
0      456.033  [458.96]
1      462.224  [463.29]
2      441.581  [435.27]
3      483.013  [484.31]
4      484.023  [473.55]

  Predict rbf      Test
0     457.066  [458.96]
1     463.702  [463.29]
2     437.547  [435.27]
3     485.537  [484.31]
4      478.09  [473.55]

  Predict sigmoid      Test
0         503.352  [458.96]
1        -3131.87  [463.29]
2         3138.53  [435.27]
3        -2824.38  [484.31]
4         4113.05  [473.55]



# Measuring the Model Performances
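The performance of regression models is typically measured using the root mean square error (RMSE). I will calculate the RMSE of each model on the test set. Because y was standardized before training, the RMSE values below are in units of standard deviations of the energy output, not MW.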
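
As a sketch of how the RMSE is computed and how its units depend on scaling, the snippet below uses the first five test values and multiple regression predictions from the table shown earlier, rounded to three decimals. The helper `rmse` and the names `mu` and `sigma` are illustrative. In the notebook, the corresponding values for the full data set are stored in `scy.mean_` and `scy.scale_`.

```python
import numpy as np

def rmse(y_true, y_pred):
    # Square root of the mean squared difference.
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.sqrt(np.mean((y_true - y_pred) ** 2))

# First five test values and multiple regression predictions, in MW.
y_true_mw = np.array([458.96, 463.29, 435.27, 484.31, 473.55])
y_pred_mw = np.array([457.255, 466.719, 440.367, 482.578, 474.881])

# Illustrative mean and standard deviation, taken from these five values only.
mu, sigma = y_true_mw.mean(), y_true_mw.std()

rmse_mw = rmse(y_true_mw, y_pred_mw)
rmse_scaled = rmse((y_true_mw - mu) / sigma, (y_pred_mw - mu) / sigma)

# An RMSE on standardized values converts to MW by multiplying by the standard deviation.
print(np.isclose(rmse_scaled * sigma, rmse_mw))
```

The same conversion applies to a standardized RMSE from the full test set: multiplying it by the standard deviation stored in the output scaler expresses the error in MW.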

## Multiple Regression Model

In [89]:
from sklearn.metrics import mean_squared_error
rmse = (mean_squared_error(y_pred_linear, y_test))**0.5
print(f'RMSE: {rmse}\n')

RMSE: 0.26420086441266966



## SVR Models

In [92]:
for kernel, y_pred in svr_predictions:
    rmse = (mean_squared_error(y_test, y_pred))**0.5
    print(f'SVR using kernel {kernel}\nRMSE: {rmse}\n')

SVR using kernel linear
RMSE: 0.2642502598784891

SVR using kernel poly
RMSE: 0.46374319155186755

SVR using kernel rbf
RMSE: 0.23136443481901295

SVR using kernel sigmoid
RMSE: 156.3870383804716



# Conclusion
From my analysis of both the scaled and non-scaled data using a multiple regression model and SVR models with the linear, poly, rbf, and sigmoid kernels, I determined that the best model for this data set is an SVR model using the rbf kernel. This model had an RMSE of approximately 0.23 on the standardized output.
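
The comparison can be sketched in a few lines by collecting the RMSE values reported above, rounded to four decimals, and selecting the smallest. The dictionary `rmse_by_model` and its labels are illustrative names.

```python
# RMSE values reported above (standardized units), rounded to four decimals.
rmse_by_model = {
    'Multiple regression': 0.2642,
    'SVR (linear)': 0.2643,
    'SVR (poly)': 0.4637,
    'SVR (rbf)': 0.2314,
    'SVR (sigmoid)': 156.3870,
}

# Print the models from lowest to highest RMSE.
for name, value in sorted(rmse_by_model.items(), key=lambda item: item[1]):
    print(f'{name:<20} {value:.4f}')

# The model with the lowest RMSE.
best_model = min(rmse_by_model, key=rmse_by_model.get)
print(best_model)  # SVR (rbf)
```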