In this project, we will build a multiple linear regression model and a support vector machine (SVM) regression model to predict the electrical energy output (EP) of a power plant from four features:

- Temperature (T)
- Ambient Pressure (AP)
- Relative Humidity (RH)
- Exhaust Vacuum (V)

In [1]:
import numpy as np 
import pandas as pd 
from matplotlib import pyplot as plt

In [2]:
dataset = pd.read_csv('Power Plant Data.csv')

In [3]:
dataset  # read_csv already returns a DataFrame; displaying it shows the first and last rows

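|      | Ambient Temperature (C) | Exhaust Vacuum (cm Hg) | Ambient Pressure (milibar) | Relative Humidity (%) | Hourly Electrical Energy output (MW) |
|------|------------------------:|-----------------------:|---------------------------:|----------------------:|-------------------------------------:|
| 0    | 14.96 | 41.76 | 1024.07 | 73.17 | 463.26 |
| 1    | 25.18 | 62.96 | 1020.04 | 59.08 | 444.37 |
| 2    | 5.11  | 39.40 | 1012.16 | 92.14 | 488.56 |
| 3    | 20.86 | 57.32 | 1010.24 | 76.64 | 446.48 |
| 4    | 10.82 | 37.50 | 1009.23 | 96.62 | 473.90 |
| ...  | ...   | ...   | ...     | ...   | ...    |
| 9563 | 16.65 | 49.69 | 1014.01 | 91.00 | 460.03 |
| 9564 | 13.19 | 39.18 | 1023.67 | 66.78 | 469.62 |
| 9565 | 31.32 | 74.33 | 1012.92 | 36.48 | 429.57 |
| 9566 | 24.48 | 69.45 | 1013.86 | 62.39 | 435.74 |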


In [4]:
dataset.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 9568 entries, 0 to 9567
Data columns (total 5 columns):
 #   Column                                Non-Null Count  Dtype  
---  ------                                --------------  -----  
 0   Ambient Temperature (C)               9568 non-null   float64
 1   Exhaust Vacuum (cm Hg)                9568 non-null   float64
 2   Ambient Pressure (milibar)            9568 non-null   float64
 3   Relative Humidity (%)                 9568 non-null   float64
 4   Hourly Electrical Energy output (MW)  9568 non-null   float64
dtypes: float64(5)
memory usage: 373.9 KB


In [5]:
X = dataset.iloc[:, :-1].values  # features: every column except the last
y = dataset.iloc[:, -1].values   # target: Hourly Electrical Energy output (MW)

In [6]:
dataset.describe()

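|       | Ambient Temperature (C) | Exhaust Vacuum (cm Hg) | Ambient Pressure (milibar) | Relative Humidity (%) | Hourly Electrical Energy output (MW) |
|-------|------------------------:|-----------------------:|---------------------------:|----------------------:|-------------------------------------:|
| count | 9568.0    | 9568.0    | 9568.0      | 9568.0    | 9568.0     |
| mean  | 19.651231 | 54.305804 | 1013.259078 | 73.308978 | 454.365009 |
| std   | 7.452473  | 12.707893 | 5.938784    | 14.600269 | 17.066995  |
| min   | 1.81      | 25.36     | 992.89      | 25.56     | 420.26     |
| 25%   | 13.51     | 41.74     | 1009.1      | 63.3275   | 439.75     |
| 50%   | 20.345    | 52.08     | 1012.94     | 74.975    | 451.55     |
| 75%   | 25.72     | 66.54     | 1017.26     | 84.83     | 468.43     |
| max   | 37.11     | 81.56     | 1033.3      | 100.16    | 495.76     |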


## Feature Scaling



In [7]:
from sklearn.preprocessing import StandardScaler
my_sc = StandardScaler()
X = my_sc.fit_transform(X)  # standardize each feature to mean 0, std 1 (fitted on all rows, before the split)
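StandardScaler rescales each column to z = (x - mean) / std, where std is the population standard deviation (ddof=0). The sketch below checks that formula on the first four rows of the temperature and exhaust-vacuum columns from the table above; it is an illustration only, not part of the notebook's pipeline.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# First four rows of two columns from the table above (temperature, exhaust vacuum)
X_demo = np.array([[14.96, 41.76],
                   [25.18, 62.96],
                   [5.11, 39.40],
                   [20.86, 57.32]])

# StandardScaler learns each column's mean and population std (ddof=0)
scaler = StandardScaler().fit(X_demo)
X_scaled = scaler.transform(X_demo)

# The same transformation written out by hand: z = (x - mean) / std
X_manual = (X_demo - X_demo.mean(axis=0)) / X_demo.std(axis=0)

print(np.allclose(X_scaled, X_manual))  # True
print(X_scaled.mean(axis=0))            # each column now has mean close to 0
print(X_scaled.std(axis=0))             # and standard deviation 1
```

pandas' `describe()` reports the sample standard deviation (ddof=1), so its `std` row differs very slightly from `my_sc.scale_`. Scaling matters most for the SVR model, whose RBF kernel depends on distances between samples; ordinary least-squares predictions are unchanged by this kind of per-column rescaling.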

## Split the data


In [8]:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
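In [7] fits the scaler on all rows before this split, so the test rows contribute to the mean and standard deviation used for scaling. A common alternative is to split first and fit the scaler on the training rows only. The sketch below shows that ordering on synthetic data; `X_raw` and `y_raw` are hypothetical stand-ins, loosely shaped like the `describe()` summary, not the real dataset.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Hypothetical synthetic data standing in for the unscaled features and target
rng = np.random.default_rng(0)
X_raw = rng.normal(loc=[19.7, 54.3, 1013.3, 73.3],
                   scale=[7.5, 12.7, 5.9, 14.6], size=(100, 4))
y_raw = rng.normal(454.4, 17.1, size=100)

# Split first...
X_tr, X_te, y_tr, y_te = train_test_split(X_raw, y_raw, test_size=0.2, random_state=0)

# ...then learn scaling statistics from the training rows only
scaler = StandardScaler().fit(X_tr)
X_tr_s = scaler.transform(X_tr)
X_te_s = scaler.transform(X_te)  # test rows reuse the training mean and std
```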

In [9]:
X_test.shape

(1914, 4)

In [10]:
X_train.shape

(7654, 4)
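These shapes follow from `test_size=0.2`: for a fractional test size, scikit-learn rounds the number of test rows up, so 0.2 × 9568 = 1913.6 becomes 1914 test rows and the remaining 7654 rows go to training. A quick check using a dummy index array of the same length:

```python
import math
import numpy as np
from sklearn.model_selection import train_test_split

n_samples = 9568  # rows reported by dataset.info()
test_size = 0.2

# For a float test_size, the test count is rounded up
n_test = math.ceil(test_size * n_samples)
n_train = n_samples - n_test
print(n_test, n_train)  # 1914 7654

# Cross-check against train_test_split on a dummy array of the same length
idx = np.arange(n_samples)
idx_train, idx_test = train_test_split(idx, test_size=test_size, random_state=0)
print(len(idx_test), len(idx_train))  # 1914 7654
```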

## Multiple Regression


In [11]:
from sklearn.linear_model import LinearRegression
my_model = LinearRegression()
my_model.fit(X_train, y_train)

LinearRegression()
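Multiple linear regression fits y_hat = b0 + w1*x1 + w2*x2 + w3*x3 + w4*x4 by ordinary least squares. The sketch below uses synthetic data with made-up coefficients (`true_w` is hypothetical) to check that `LinearRegression` returns the same intercept and weights as solving the least-squares problem directly with NumPy. On the notebook's model the fitted values are `my_model.intercept_` and `my_model.coef_`; because the features were standardized, each weight is the change in predicted MW for a one-standard-deviation increase in that feature.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
# Synthetic standardized features and a noisy linear target (hypothetical data)
X_demo = rng.standard_normal((200, 4))
true_w = np.array([-14.0, -3.0, 0.5, -2.0])  # made-up coefficients
y_demo = 454.0 + X_demo @ true_w + rng.normal(scale=4.0, size=200)

model = LinearRegression().fit(X_demo, y_demo)

# Same fit via least squares on [1, X]; the column of ones carries the intercept
A = np.column_stack([np.ones(len(X_demo)), X_demo])
coef, *_ = np.linalg.lstsq(A, y_demo, rcond=None)

print(model.intercept_, model.coef_)
print(coef[0], coef[1:])  # should match the scikit-learn fit
```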

In [12]:
y_pred = my_model.predict(X_test)
from sklearn.metrics import r2_score
score = r2_score(y_test, y_pred)
print('R2 Score = %.3f' %score)

R2 Score = 0.933
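The R2 score is 1 - SS_res / SS_tot, where SS_res is the sum of squared prediction errors and SS_tot is the sum of squared deviations of the true values from their mean. A score of 1 means perfect predictions and 0 means no better than always predicting the mean. The sketch below computes it by hand and compares it with `r2_score`; the true values are the first five outputs from the table above, while the predictions are hypothetical.

```python
import numpy as np
from sklearn.metrics import r2_score

def r2_manual(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)         # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares
    return 1.0 - ss_res / ss_tot

y_true = np.array([463.26, 444.37, 488.56, 446.48, 473.90])  # from the table above
y_hat = np.array([460.0, 447.0, 485.0, 449.0, 471.0])        # hypothetical predictions

print(r2_manual(y_true, y_hat), r2_score(y_true, y_hat))     # same value
print(r2_manual(y_true, np.full_like(y_true, y_true.mean())))  # 0.0: predicting the mean
```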


## SVM Regression


In [13]:
from sklearn.svm import SVR
SVMregression = SVR(kernel = 'rbf')
SVMregression.fit(X_train, y_train)

SVR()
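`SVR(kernel='rbf')` keeps scikit-learn's other defaults (`C=1.0`, `epsilon=0.1`, `gamma='scale'`). The RBF kernel measures similarity as k(x, z) = exp(-gamma * ||x - z||^2), and `gamma='scale'` sets gamma = 1 / (n_features * X.var()). This kernel is what lets SVR fit non-linear relationships. The sketch below computes the kernel matrix by hand on a few hypothetical standardized samples and compares it with `sklearn.metrics.pairwise.rbf_kernel`.

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

rng = np.random.default_rng(0)
X_demo = rng.standard_normal((5, 4))  # hypothetical standardized samples

# gamma='scale' (SVR's default) uses 1 / (n_features * X.var())
gamma = 1.0 / (X_demo.shape[1] * X_demo.var())

# RBF kernel: k(x, z) = exp(-gamma * ||x - z||^2) for every pair of rows
sq_dists = ((X_demo[:, None, :] - X_demo[None, :, :]) ** 2).sum(axis=-1)
K_manual = np.exp(-gamma * sq_dists)

K_sklearn = rbf_kernel(X_demo, X_demo, gamma=gamma)
print(np.allclose(K_manual, K_sklearn))  # True
print(np.diag(K_manual))                  # each sample's similarity with itself is 1
```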

In [14]:
y_pred = SVMregression.predict(X_test)
from sklearn.metrics import r2_score
score = r2_score(y_test, y_pred)
print('R2 Score = %.3f' %score)

R2 Score = 0.943


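## Conclusion

Both models score well on the test set (R2 = 0.933 for multiple linear regression and 0.943 for SVM regression), with SVR slightly ahead. The R2 score is the fraction of the variance in the test-set energy output that a model's predictions explain, so on this split the SVR model predicts electrical output somewhat more accurately. A likely reason is that the RBF kernel lets SVR capture non-linear relationships between the features and the output, which a linear model cannot represent. Because the gap is small and comes from a single train/test split, it may change with a different split.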
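One way to check whether the SVR advantage holds beyond a single split is K-fold cross-validation, with scaling placed inside a pipeline so it is refit on each fold's training rows. The sketch below is an assumption-laden illustration: `compare_models` is a hypothetical helper, and the demo data is synthetic. On the notebook you would pass the unscaled features (for example `dataset.iloc[:, :-1].values`) and `y`.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

def compare_models(X, y, n_splits=5, seed=0):
    """Hypothetical helper: mean and std of R2 over K folds for each model."""
    cv = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    models = {
        # The scaler is fitted inside each fold, on that fold's training rows only
        "linear": make_pipeline(StandardScaler(), LinearRegression()),
        "svr_rbf": make_pipeline(StandardScaler(), SVR(kernel="rbf")),
    }
    results = {}
    for name, model in models.items():
        scores = cross_val_score(model, X, y, cv=cv, scoring="r2")
        results[name] = (scores.mean(), scores.std())
    return results

# Synthetic data with one non-linear term, for a quick run only
rng = np.random.default_rng(0)
X_demo = rng.uniform(-2, 2, size=(300, 4))
y_demo = 450 + 10 * np.sin(X_demo[:, 0]) - 5 * X_demo[:, 1] + rng.normal(scale=1.0, size=300)

for name, (mean, std) in compare_models(X_demo, y_demo).items():
    print(f"{name}: R2 = {mean:.3f} ± {std:.3f}")
```

Reporting the spread across folds alongside the mean makes it easier to judge whether a 0.01 difference in R2 is meaningful.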