# Multiple Linear Regression - Combined Cycle Power Plant

This example demonstrates how to use a multiple linear regression model to predict the energy output of a power plant.

# Dataset information

The dataset contains 9568 data points collected from a Combined Cycle Power Plant over 6 years (2006-2011), while the plant was running at full load. The features are the hourly average ambient variables Temperature (AT), Ambient Pressure (AP), Relative Humidity (RH) and Exhaust Vacuum (V), which are used to predict the net hourly electrical energy output (PE) of the plant.

## Attribute Information:

The features are hourly averages of ambient variables; the last item, PE, is the target:

- Temperature (AT) in the range 1.81-37.11 °C
- Exhaust Vacuum (V) in the range 25.36-81.56 cm Hg
- Ambient Pressure (AP) in the range 992.89-1033.30 millibar
- Relative Humidity (RH) in the range 25.56-100.16%
- Net hourly electrical energy output (PE) in the range 420.26-495.76 MW

The averages are taken from various sensors located around the plant that record the ambient variables every second. The variables are given without normalization.

# Libraries

In [1]:
import pandas as pd

from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score, mean_absolute_error

# Data

In [2]:
source = 'https://raw.githubusercontent.com/LucasKiraly/Datasets/main/CCPP.xlsx'

data = pd.read_excel(source)

data.head()

Unnamed: 0,AT,V,AP,RH,PE
0,14.96,41.76,1024.07,73.17,463.26
1,25.18,62.96,1020.04,59.08,444.37
2,5.11,39.4,1012.16,92.14,488.56
3,20.86,57.32,1010.24,76.64,446.48
4,10.82,37.5,1009.23,96.62,473.9
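A quick sanity check after loading is to compare the data against the ranges quoted in the attribute information. The sketch below does this on the five rows shown above, typed in by hand as a stand-in for the full dataset; the helper name `out_of_range_counts` and the `documented_ranges` dictionary are illustrative and not part of the original notebook.

```python
import pandas as pd

# The five rows shown by data.head(), used as a stand-in for the full dataset.
sample = pd.DataFrame({
    "AT": [14.96, 25.18, 5.11, 20.86, 10.82],
    "V": [41.76, 62.96, 39.40, 57.32, 37.50],
    "AP": [1024.07, 1020.04, 1012.16, 1010.24, 1009.23],
    "RH": [73.17, 59.08, 92.14, 76.64, 96.62],
    "PE": [463.26, 444.37, 488.56, 446.48, 473.90],
})

# Ranges quoted in the attribute information section.
documented_ranges = {
    "AT": (1.81, 37.11),
    "V": (25.36, 81.56),
    "AP": (992.89, 1033.30),
    "RH": (25.56, 100.16),
    "PE": (420.26, 495.76),
}


def out_of_range_counts(df, ranges):
    """Return, per column, how many values fall outside the (low, high) range."""
    return {
        col: int((~df[col].between(low, high)).sum())
        for col, (low, high) in ranges.items()
    }


violations = out_of_range_counts(sample, documented_ranges)
print(violations)
```

With the full dataset, the same call would be `out_of_range_counts(data, documented_ranges)`; a non-zero count would point to a loading or unit problem.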


# Multiple linear regression model

## Defining variables

In [3]:
X = data.drop('PE', axis = 1)

y = data['PE']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2)
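Because the split above does not fix a seed, each run draws a different test set, so the coefficients and metrics reported further down will vary slightly between runs. A minimal sketch of a repeatable split follows, assuming a synthetic DataFrame with the same shape as the dataset; the value `random_state=42` and the `*_demo` names are arbitrary choices for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic stand-in with the same number of rows and columns as the dataset.
rng = np.random.default_rng(0)
demo = pd.DataFrame(rng.normal(size=(9568, 5)),
                    columns=["AT", "V", "AP", "RH", "PE"])

X_demo = demo.drop("PE", axis=1)
y_demo = demo["PE"]

# Any fixed integer for random_state makes the split repeatable.
X_train_a, X_test_a, y_train_a, y_test_a = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42)
X_train_b, X_test_b, y_train_b, y_test_b = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42)

print(X_train_a.shape, X_test_a.shape)
print(X_test_a.index.equals(X_test_b.index))
```

With 9568 rows and `test_size=0.2`, the test set holds 1914 rows and the training set 7654.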

## Creating model

In [4]:
multiple_linear_reg = LinearRegression()

multiple_linear_reg.fit(X_train, y_train)

y_pred = multiple_linear_reg.predict(X_test)
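`LinearRegression` fits an ordinary least squares model, so its result can be cross-checked against a direct least squares solve. The sketch below does this on synthetic data; the "true" coefficients, the noise level and all `*_demo` names are made up for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data: 4 features, a linear target and a little noise.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 4))
true_coef = np.array([-2.0, -0.25, 0.05, -0.15])  # made-up values
y_demo = 450.0 + X_demo @ true_coef + rng.normal(scale=0.5, size=200)

model = LinearRegression().fit(X_demo, y_demo)

# Same problem solved directly: prepend a column of ones for the intercept.
design = np.column_stack([np.ones(len(X_demo)), X_demo])
solution, *_ = np.linalg.lstsq(design, y_demo, rcond=None)

# The first entry of the solution is the intercept, the rest are the slopes.
print(np.allclose(solution[0], model.intercept_))
print(np.allclose(solution[1:], model.coef_))
```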

## Defining coefficients of the equation

In [5]:
coef_a = multiple_linear_reg.coef_

coef_b = multiple_linear_reg.intercept_

coefs = [['Temperature', coef_a[0]],
         ['Exhaust Vacuum', coef_a[1]],
         ['Ambient Pressure', coef_a[2]],
         ['Relative Humidity', coef_a[3]],
         ['*Intercept', coef_b]]

df_equation = pd.DataFrame(coefs, 
                           columns = ['Parameter', 'Coefficient Value'])

df_equation

Unnamed: 0,Parameter,Coefficient Value
0,Temperature,-1.975304
1,Exhaust Vacuum,-0.233011
2,Ambient Pressure,0.062273
3,Relative Humidity,-0.158113
4,*Intercept,454.322438
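The table corresponds to the equation PE = intercept + a1·AT + a2·V + a3·AP + a4·RH. As a worked example, the sketch below applies the rounded coefficients to the first row shown by `data.head()`. Since the split is random, a rerun of the notebook will give slightly different coefficients; the variable names here are illustrative.

```python
# Coefficients copied from the table above (rounded to six decimals).
intercept = 454.322438
coefficients = {"AT": -1.975304, "V": -0.233011, "AP": 0.062273, "RH": -0.158113}

# First row shown by data.head(), with its observed energy output.
first_row = {"AT": 14.96, "V": 41.76, "AP": 1024.07, "RH": 73.17}
observed_pe = 463.26

# Intercept plus the sum of coefficient * feature value.
predicted_pe = intercept + sum(
    coefficients[name] * first_row[name] for name in coefficients)
residual = observed_pe - predicted_pe

print(f"Predicted PE: {predicted_pe:.2f} MW")
print(f"Residual: {residual:.2f} MW")
```

The prediction comes out at roughly 467.2 MW against an observed 463.26 MW, an error of about 4 MW.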


## Evaluating model

In [6]:
mae = mean_absolute_error(y_test, y_pred)

rsq = r2_score(y_test, y_pred)

adjusted_rsq = 1 - ((1 - rsq) * (len(y_test) - 1)) / (len(y_test) - X_test.shape[1] - 1)

print('MAE: %.4f' %mae)

print('\nR²: %.4f' %rsq)

print('\nAdjusted R²: %.4f' %adjusted_rsq)

MAE: 3.5979

R²: 0.9337

Adjusted R²: 0.9335
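The adjusted R² computed above penalizes R² for the number of features p relative to the number of samples n: 1 - (1 - R²)(n - 1)/(n - p - 1). A small helper makes the formula reusable; the function name `adjusted_r2` is hypothetical and not part of scikit-learn.

```python
def adjusted_r2(r2, n_samples, n_features):
    """Adjusted R² = 1 - (1 - R²) * (n - 1) / (n - p - 1)."""
    if n_samples - n_features - 1 <= 0:
        raise ValueError("need more samples than features + 1")
    return 1 - (1 - r2) * (n_samples - 1) / (n_samples - n_features - 1)


# 1914 test rows follow from test_size=0.2 on 9568 rows;
# 0.9337 is the rounded R² printed above, 4 is the number of features.
print(f"{adjusted_r2(0.9337, 1914, 4):.4f}")
```

Because the input here is the rounded R², the last digit can differ from the notebook output, which uses the unrounded value. With 1914 samples and only 4 features the penalty is tiny, which is why R² and adjusted R² are nearly identical.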
