# Random Forest Regression - Combined Cycle Power Plant

This example demonstrates, in a simple way, how to use a Random Forest Regression (RFR) model to predict energy output from exhaust vacuum.

The dataset used in this project was the ["Combined Cycle Power Plant Data Set"](http://archive.ics.uci.edu/ml/datasets/Combined+Cycle+Power+Plant).

# Libraries

In [1]:
import pandas as pd
import numpy as np

import matplotlib.pyplot as plt

from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score, mean_squared_error

# Dataset information

The dataset contains 9568 data points collected from a Combined Cycle Power Plant over 6 years (2006-2011), when the power plant was set to work with full load. The features are hourly averages of the ambient variables Temperature (AT), Ambient Pressure (AP), Relative Humidity (RH) and Exhaust Vacuum (V), which are used to predict the net hourly electrical energy output (PE) of the plant.

### Attribute Information:

The dataset contains the following hourly average variables (PE is the target, the others are features):

- Temperature (AT) in the range 1.81°C to 37.11°C
- Exhaust Vacuum (V) in the range 25.36-81.56 cm Hg
- Ambient Pressure (AP) in the range 992.89-1033.30 millibar
- Relative Humidity (RH) in the range 25.56% to 100.16%
- Net hourly electrical energy output (PE) in the range 420.26-495.76 MW

The averages are taken from various sensors located around the plant that record the ambient variables every second. The variables are given without normalization.

# Data

In [2]:
data = pd.read_excel('Power.xlsx')

data.head()

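| Unnamed: 0 | AT | V | AP | RH | PE |
|---|---|---|---|---|---|
| 0 | 14.96 | 41.76 | 1024.07 | 73.17 | 463.26 |
| 1 | 25.18 | 62.96 | 1020.04 | 59.08 | 444.37 |
| 2 | 5.11 | 39.4 | 1012.16 | 92.14 | 488.56 |
| 3 | 20.86 | 57.32 | 1010.24 | 76.64 | 446.48 |
| 4 | 10.82 | 37.5 | 1009.23 | 96.62 | 473.9 |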
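
The preview contains an extra `Unnamed: 0` column next to the five documented variables. The following is a minimal sketch, not part of the original notebook, that rebuilds the five previewed rows by hand (so it runs without `Power.xlsx`), drops that column on the assumption that it is a saved row index, and checks the values against the documented ranges. The names `preview`, `documented_ranges` and `within_range` are illustrative.

```python
import pandas as pd

# The five rows shown by data.head(), rebuilt by hand so the sketch runs without Power.xlsx.
preview = pd.DataFrame({
    "Unnamed: 0": [0, 1, 2, 3, 4],
    "AT": [14.96, 25.18, 5.11, 20.86, 10.82],
    "V": [41.76, 62.96, 39.40, 57.32, 37.50],
    "AP": [1024.07, 1020.04, 1012.16, 1010.24, 1009.23],
    "RH": [73.17, 59.08, 92.14, 76.64, 96.62],
    "PE": [463.26, 444.37, 488.56, 446.48, 473.90],
})

# Assumption: "Unnamed: 0" is a saved row index, not a measurement, so it can be dropped.
preview = preview.drop(columns=["Unnamed: 0"])

# Ranges quoted in the attribute list of this notebook.
documented_ranges = {
    "AT": (1.81, 37.11),
    "V": (25.36, 81.56),
    "AP": (992.89, 1033.30),
    "RH": (25.56, 100.16),
    "PE": (420.26, 495.76),
}

# True for a column when every previewed value lies inside its documented range.
within_range = {
    column: bool(preview[column].between(low, high).all())
    for column, (low, high) in documented_ranges.items()
}
print(within_range)
```

On the real file, passing `index_col=0` to `pd.read_excel` would probably have the same effect as the `drop` call, again assuming the first column is only an index.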


# Variables

In [3]:
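X = data['V'].values  # single feature: exhaust vacuum, as a 1-D array

y = data['PE']  # target: net hourly electrical energy output

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)  # hold out 30% of the rows for testing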
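
The split in this cell is not seeded, so the train and test sets (and therefore the scores) change from run to run. As a hedged sketch, the snippet below shows how a fixed `random_state` makes the split repeatable and how the single feature can be reshaped once into the 2-D form scikit-learn expects. It uses synthetic stand-in values rather than the plant data, and `random_state=42` is an arbitrary choice, not something taken from the notebook.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in for data['V'] and data['PE']; these are NOT plant measurements.
rng = np.random.default_rng(0)
V = rng.uniform(25.36, 81.56, size=100)
PE = 500.0 - 0.9 * V + rng.normal(0.0, 5.0, size=100)

# scikit-learn estimators expect a 2-D feature matrix of shape (n_samples, n_features).
X = V.reshape(-1, 1)

# random_state=42 is an arbitrary choice; any fixed integer makes the split repeatable.
X_train, X_test, y_train, y_test = train_test_split(
    X, PE, test_size=0.3, random_state=42
)
print(X_train.shape, X_test.shape)
```

Reshaping before the split avoids repeating `.reshape(-1, 1)` at every `fit` and `predict` call.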

# Model - Random Forest Regression

In [4]:
random_forest = RandomForestRegressor(n_estimators=10, max_depth=7)  # 10 trees, each at most 7 levels deep

random_forest.fit(X_train.reshape(-1, 1), y_train)  # reshape(-1, 1) gives the 2-D shape scikit-learn expects

y_pred = random_forest.predict(X_test.reshape(-1, 1))
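
To illustrate what `n_estimators` and `max_depth` control, here is a small sketch, run on synthetic stand-in data rather than the plant data, showing that the forest's prediction is the average of its individual trees' predictions and that no tree grows deeper than `max_depth`. The `random_state=0` argument is added only so the sketch is repeatable; it is not in the original cell.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic stand-in data; these are NOT plant measurements.
rng = np.random.default_rng(0)
X = rng.uniform(25.36, 81.56, size=(200, 1))
y = 500.0 - 0.9 * X[:, 0] + rng.normal(0.0, 5.0, size=200)

# Same hyperparameters as the notebook's model; random_state=0 is added only for repeatability.
forest = RandomForestRegressor(n_estimators=10, max_depth=7, random_state=0)
forest.fit(X, y)

# Collect each tree's prediction: one row per tree, one column per sample.
per_tree = np.stack([tree.predict(X) for tree in forest.estimators_])

# The forest's prediction is the mean over its trees.
print(np.allclose(per_tree.mean(axis=0), forest.predict(X)))

# max_depth=7 caps the depth of every individual tree.
print(max(tree.get_depth() for tree in forest.estimators_))
```

More trees generally smooth the averaged prediction, while a smaller `max_depth` limits how closely each tree can follow the training data.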

### Results - Evaluating model

In [5]:
rmse = np.sqrt(mean_squared_error(y_test, y_pred))

rsq = r2_score(y_test, y_pred)

print('RMSE:', rmse)

print('\nR Square:', rsq)

RMSE: 6.575667370552725

R Square: 0.847890273298709
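
For readers who want to see what the two scores measure, the sketch below computes RMSE and R² by hand with NumPy. The four values are made up so the arithmetic is easy to follow; they are not plant data, and the variable names are illustrative.

```python
import numpy as np

# Tiny made-up values, chosen so the arithmetic is easy to follow; not plant data.
y_true = np.array([440.0, 450.0, 460.0, 470.0])
y_hat = np.array([442.0, 448.0, 463.0, 469.0])

residuals = y_true - y_hat  # [-2, 2, -3, 1]

# RMSE: square root of the mean squared residual, in the same units as the target (MW here).
rmse = np.sqrt(np.mean(residuals ** 2))

# R^2: 1 minus the ratio of residual variation to total variation around the mean.
ss_res = np.sum(residuals ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2 = 1.0 - ss_res / ss_tot

print(rmse, r2)
```

Read this way, the notebook's RMSE of about 6.58 is an error in MW on a target that ranges from 420.26 to 495.76 MW, and an R² of about 0.85 means the model accounts for roughly 85% of the variance of PE in the test set.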
