The full dataset contains 36733 instances of 11 sensor measures from a gas turbine, each aggregated over one hour (by average or sum). The gas_turbines.csv file loaded below has 15039 rows.
The dataset includes gas turbine parameters (such as turbine inlet temperature and compressor discharge pressure) in addition to the ambient variables.



Problem statement: predicting turbine energy yield (TEY) using ambient variables as features.



Attribute Information:

The explanations of sensor measurements and their brief statistics are given below.

| Variable (Abbr.) | Unit | Min | Max | Mean |
|---|---|---|---|---|
| Ambient temperature (AT) | °C | -6.23 | 37.10 | 17.71 |
| Ambient pressure (AP) | mbar | 985.85 | 1036.56 | 1013.07 |
| Ambient humidity (AH) | % | 24.08 | 100.20 | 77.87 |
| Air filter difference pressure (AFDP) | mbar | 2.09 | 7.61 | 3.93 |
| Gas turbine exhaust pressure (GTEP) | mbar | 17.70 | 40.72 | 25.56 |
| Turbine inlet temperature (TIT) | °C | 1000.85 | 1100.89 | 1081.43 |
| Turbine after temperature (TAT) | °C | 511.04 | 550.61 | 546.16 |
| Compressor discharge pressure (CDP) | mbar | 9.85 | 15.16 | 12.06 |
| Turbine energy yield (TEY) | MWh | 100.02 | 179.50 | 133.51 |
| Carbon monoxide (CO) | mg/m³ | 0.00 | 44.10 | 2.37 |
| Nitrogen oxides (NOx) | mg/m³ | 25.90 | 119.91 | 65.29 |

In [1]:
# Importing the necessary packages
import pandas as pd
import numpy as np
import keras
import tensorflow
from sklearn.preprocessing import StandardScaler

In [62]:
# Import the modules for splitting and evaluation
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

# Keras-specific imports
from keras.models import Sequential
from keras.layers import Dense

In [2]:
pip install keras

Note: you may need to restart the kernel to use updated packages.


In [3]:
pip install tensorflow

Note: you may need to restart the kernel to use updated packages.


In [4]:
data = pd.read_csv("gas_turbines.csv")

In [5]:
data.head()

Unnamed: 0,AT,AP,AH,AFDP,GTEP,TIT,TAT,TEY,CDP,CO,NOX
0,6.8594,1007.9,96.799,3.5,19.663,1059.2,550.0,114.7,10.605,3.1547,82.722
1,6.785,1008.4,97.118,3.4998,19.728,1059.3,550.0,114.72,10.598,3.2363,82.776
2,6.8977,1008.8,95.939,3.4824,19.779,1059.4,549.87,114.71,10.601,3.2012,82.468
3,7.0569,1009.2,95.249,3.4805,19.792,1059.6,549.99,114.72,10.606,3.1923,82.67
4,7.3978,1009.7,95.15,3.4976,19.765,1059.7,549.98,114.72,10.612,3.2484,82.311


In [15]:
# Copy the target into a new column, TEY1, which is appended at the end of the frame
data['TEY1'] = data['TEY']

In [16]:
data.head()

Unnamed: 0,AT,AP,AH,AFDP,GTEP,TIT,TAT,TEY,CDP,CO,NOX,y,TEY1
0,6.8594,1007.9,96.799,3.5,19.663,1059.2,550.0,114.7,10.605,3.1547,82.722,114.7,114.7
1,6.785,1008.4,97.118,3.4998,19.728,1059.3,550.0,114.72,10.598,3.2363,82.776,114.72,114.72
2,6.8977,1008.8,95.939,3.4824,19.779,1059.4,549.87,114.71,10.601,3.2012,82.468,114.71,114.71
3,7.0569,1009.2,95.249,3.4805,19.792,1059.6,549.99,114.72,10.606,3.1923,82.67,114.72,114.72
4,7.3978,1009.7,95.15,3.4976,19.765,1059.7,549.98,114.72,10.612,3.2484,82.311,114.72,114.72


In [19]:
# Drop the leftover 'y' column (it was created in a cell not shown here)
data.drop(['y'], axis='columns', inplace=True)

In [20]:
data.head()

Unnamed: 0,AT,AP,AH,AFDP,GTEP,TIT,TAT,TEY,CDP,CO,NOX,TEY1
0,6.8594,1007.9,96.799,3.5,19.663,1059.2,550.0,114.7,10.605,3.1547,82.722,114.7
1,6.785,1008.4,97.118,3.4998,19.728,1059.3,550.0,114.72,10.598,3.2363,82.776,114.72
2,6.8977,1008.8,95.939,3.4824,19.779,1059.4,549.87,114.71,10.601,3.2012,82.468,114.71
3,7.0569,1009.2,95.249,3.4805,19.792,1059.6,549.99,114.72,10.606,3.1923,82.67,114.72
4,7.3978,1009.7,95.15,3.4976,19.765,1059.7,549.98,114.72,10.612,3.2484,82.311,114.72


In [21]:
# Drop the original TEY column; TEY1 (the last column) is now the target
data.drop(['TEY'], axis='columns', inplace=True)
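
The copy-then-drop steps above have the net effect of moving the target to the last column under the name TEY1. A minimal sketch of the same effect in one step, using `DataFrame.pop` on a small stand-in frame (the frame `toy` and its three-column subset are illustrative, not part of the notebook):

```python
import pandas as pd

# Toy frame standing in for the loaded CSV (illustrative subset of columns).
toy = pd.DataFrame({
    "AT": [6.8594, 6.785],
    "TEY": [114.70, 114.72],
    "CDP": [10.605, 10.598],
})

# pop() removes the column and returns it; assigning the result under a
# new name appends it as the last column.
toy["TEY1"] = toy.pop("TEY")

print(list(toy.columns))
```

Keeping the target last is what lets the later positional slicing (`[:, 0:10]` and `[:, -1]`) separate features from target.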

In [22]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 15039 entries, 0 to 15038
Data columns (total 11 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   AT      15039 non-null  float64
 1   AP      15039 non-null  float64
 2   AH      15039 non-null  float64
 3   AFDP    15039 non-null  float64
 4   GTEP    15039 non-null  float64
 5   TIT     15039 non-null  float64
 6   TAT     15039 non-null  float64
 7   CDP     15039 non-null  float64
 8   CO      15039 non-null  float64
 9   NOX     15039 non-null  float64
 10  TEY1    15039 non-null  float64
dtypes: float64(11)
memory usage: 1.3 MB


In [24]:
data.shape

(15039, 11)

In [25]:
data.describe()

Unnamed: 0,AT,AP,AH,AFDP,GTEP,TIT,TAT,CDP,CO,NOX,TEY1
count,15039.0,15039.0,15039.0,15039.0,15039.0,15039.0,15039.0,15039.0,15039.0,15039.0,15039.0
mean,17.764381,1013.19924,79.124174,4.200294,25.419061,1083.79877,545.396183,12.102353,1.972499,68.190934,134.188464
std,7.574323,6.41076,13.793439,0.760197,4.173916,16.527806,7.866803,1.103196,2.222206,10.470586,15.829717
min,0.5223,985.85,30.344,2.0874,17.878,1000.8,512.45,9.9044,0.000388,27.765,100.17
25%,11.408,1008.9,69.75,3.7239,23.294,1079.6,542.17,11.622,0.858055,61.3035,127.985
50%,18.186,1012.8,82.266,4.1862,25.082,1088.7,549.89,12.025,1.3902,66.601,133.78
75%,23.8625,1016.9,90.0435,4.5509,27.184,1096.0,550.06,12.578,2.1604,73.9355,140.895
max,34.929,1034.2,100.2,7.6106,37.402,1100.8,550.61,15.081,44.103,119.89,174.61


In [56]:
# Convert to a NumPy array: the first 10 columns are features, the last (TEY1) is the target
dataset = data.values
X = dataset[:, 0:10]
y = dataset[:, -1]
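
The positional slice above takes all ten non-target columns as features, while the problem statement speaks of ambient variables. As a hedged sketch, features can also be selected by name; treating AT, AP and AH as "the ambient variables" is an assumption based on the attribute table, and `toy`, `ambient_cols`, `X_ambient` and `y_toy` are illustrative names:

```python
import pandas as pd

# Toy frame with the same column layout as `data` after the drops above;
# the two rows are taken from the head() preview.
toy = pd.DataFrame(
    [[6.8594, 1007.9, 96.799, 3.5, 19.663, 1059.2, 550.0, 10.605, 3.1547, 82.722, 114.70],
     [6.785, 1008.4, 97.118, 3.4998, 19.728, 1059.3, 550.0, 10.598, 3.2363, 82.776, 114.72]],
    columns=["AT", "AP", "AH", "AFDP", "GTEP", "TIT", "TAT", "CDP", "CO", "NOX", "TEY1"],
)

# Assumption: "ambient variables" means temperature, pressure and humidity.
ambient_cols = ["AT", "AP", "AH"]
X_ambient = toy[ambient_cols].to_numpy()   # features chosen by name
y_toy = toy["TEY1"].to_numpy()             # target chosen by name

print(X_ambient.shape, y_toy.shape)
```

Selecting by name keeps working if the column order changes, which positional slicing does not guarantee. A model trained on these three columns would need `input_dim=3` rather than 10.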

In [57]:
X


array([[   6.8594, 1007.9   ,   96.799 , ...,   10.605 ,    3.1547,
          82.722 ],
       [   6.785 , 1008.4   ,   97.118 , ...,   10.598 ,    3.2363,
          82.776 ],
       [   6.8977, 1008.8   ,   95.939 , ...,   10.601 ,    3.2012,
          82.468 ],
       ...,
       [   7.2647, 1006.3   ,   99.496 , ...,   10.483 ,    7.9632,
          90.912 ],
       [   7.006 , 1006.8   ,   99.008 , ...,   10.533 ,    6.2494,
          93.227 ],
       [   6.9279, 1007.2   ,   97.533 , ...,   10.583 ,    4.9816,
          92.498 ]])

In [58]:
y

array([114.7 , 114.72, 114.71, ..., 110.19, 110.74, 111.58])

In [59]:
# Standardization: rescale each feature to zero mean and unit variance
scaler = StandardScaler()
scaler.fit(X)
X_standardized = scaler.transform(X)

In [60]:
pd.DataFrame(X_standardized).describe()

Unnamed: 0,0,1,2,3,4,5,6,7,8,9
count,15039.0,15039.0,15039.0,15039.0,15039.0,15039.0,15039.0,15039.0,15039.0,15039.0
mean,-1.16968e-15,-1.92528e-14,2.007245e-16,3.810001e-16,1.111478e-16,-2.324212e-15,1.744899e-15,2.542166e-16,1.9592610000000003e-17,-3.6468530000000004e-17
std,1.000033,1.000033,1.000033,1.000033,1.000033,1.000033,1.000033,1.000033,1.000033,1.000033
min,-2.276462,-4.266288,-3.536594,-2.779497,-1.806771,-5.021933,-4.188141,-1.992416,-0.8874862,-3.861033
25%,-0.8392292,-0.670651,-0.6796337,-0.626693,-0.5091458,-0.2540512,-0.4101146,-0.4354335,-0.5015202,-0.6578107
50%,0.05566605,-0.06227861,0.2277844,-0.01854065,-0.08075681,0.2965544,0.571257,-0.07011925,-0.2620452,-0.1518527
75%,0.8051309,0.5772924,0.7916582,0.4612196,0.4228638,0.738249,0.5928675,0.431168,0.08455882,0.5486567
max,2.266234,3.27597,1.528011,4.486233,2.871006,1.028678,0.6627839,2.700105,18.95949,4.937717
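
The scaler above is fitted on every row before any train/test split, so statistics from rows that later land in the test set influence the scaling. A common alternative is to split first and compute the mean and standard deviation from the training rows only. The sketch below shows that idea with plain NumPy on synthetic data instead of `StandardScaler`; the array `X_demo` and the simple 70/30 cut are assumptions made for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic stand-in for the feature matrix: 100 rows, 3 columns on different scales.
X_demo = rng.normal(loc=[17.0, 1013.0, 79.0], scale=[7.5, 6.4, 13.8], size=(100, 3))

# Split first (a plain 70/30 cut here; train_test_split would also shuffle).
X_tr, X_te = X_demo[:70], X_demo[70:]

# Statistics come from the training rows only.
mu = X_tr.mean(axis=0)
sigma = X_tr.std(axis=0)        # population std (ddof=0), as StandardScaler uses

X_tr_std = (X_tr - mu) / sigma
X_te_std = (X_te - mu) / sigma  # test rows reuse the training statistics

print(X_tr_std.mean(axis=0).round(6), X_tr_std.std(axis=0).round(6))
```

The `std` row of 1.000033 in the `describe()` output above, rather than exactly 1, comes from the same ddof distinction: pandas reports the sample standard deviation (ddof=1), while `StandardScaler` divides by the population standard deviation (ddof=0).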


In [63]:
# Note: the split uses the raw X; X_standardized from the cell above is not passed in
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30, random_state=40)
print(X_train.shape)
print(X_test.shape)

(10527, 10)
(4512, 10)


In [64]:
# Define the model: three hidden ReLU layers and a single linear output for regression
model = Sequential()
model.add(Dense(500, input_dim=10, activation="relu"))
model.add(Dense(100, activation="relu"))
model.add(Dense(50, activation="relu"))
model.add(Dense(1))
# model.summary()  # print the model summary
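
To get a feel for the size of this network without running `model.summary()`, the number of trainable parameters can be worked out by hand. A minimal sketch in plain Python, assuming each Dense layer keeps its default bias term (`layer_sizes` is an illustrative name):

```python
# Layer widths: 10 input features, then the Dense layers 500 -> 100 -> 50 -> 1.
layer_sizes = [10, 500, 100, 50, 1]

# A Dense layer with n_in inputs and n_out units has n_in * n_out weights
# plus n_out biases.
params_per_layer = [
    n_in * n_out + n_out
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])
]
total_params = sum(params_per_layer)

print(params_per_layer)  # [5500, 50100, 5050, 51]
print(total_params)      # 60701
```

With 10527 training rows, the network has several times more parameters than training examples, which is worth keeping in mind when comparing train and test error.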

In [65]:
# Compile with MSE loss and the Adam optimizer, then train for 20 epochs
model.compile(loss="mean_squared_error", optimizer="adam", metrics=["mean_squared_error"])
model.fit(X_train, y_train, epochs=20)

Epoch 1/20
...
Epoch 20/20


<keras.callbacks.History at 0x7fd00c6d6580>

In [66]:
# RMSE on the training data
pred_train = model.predict(X_train)
print(np.sqrt(mean_squared_error(y_train, pred_train)))

# RMSE on the test data
pred = model.predict(X_test)
print(np.sqrt(mean_squared_error(y_test, pred)))

0.9134095055619667
0.9317314483714805


### The first predict call runs on the training data, and the print that follows reports its RMSE. The second predict/print pair does the same for the test data. The two values (about 0.91 on train and 0.93 on test) are close, which suggests the model is not overfitting much.
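
As a cross-check on what `np.sqrt(mean_squared_error(...))` computes, RMSE can be written out by hand. The sketch below uses made-up numbers; `rmse`, `y_demo` and `pred_demo` are illustrative names. It also shows a shape pitfall: `model.predict` returns a column of shape `(n, 1)`, and subtracting that from a 1-D target without flattening broadcasts to an `(n, n)` matrix.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error; both inputs are flattened to 1-D first."""
    y_true = np.asarray(y_true, dtype=float).ravel()
    y_pred = np.asarray(y_pred, dtype=float).ravel()
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

y_demo = np.array([114.70, 114.72, 114.71])            # shape (3,)
pred_demo = np.array([[115.70], [113.72], [114.71]])   # shape (3, 1), like model.predict

# Without flattening, (3,) minus (3, 1) broadcasts to (3, 3).
print((y_demo - pred_demo).shape)
print(rmse(y_demo, pred_demo))

# Baseline for context: always predicting the mean of y gives an RMSE
# equal to the population standard deviation of y.
baseline = rmse(y_demo, np.full_like(y_demo, y_demo.mean()))
print(baseline)
```

For scale, `describe()` reports a standard deviation of about 15.83 for TEY1, so a mean-only predictor would score an RMSE of roughly 15.8 on this data, against the 0.91 and 0.93 printed above.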

In [69]:
pred

array([[133.32233],
       [165.60696],
       [133.24268],
       ...,
       [131.35298],
       [135.0003 ],
       [134.42787]], dtype=float32)

In [70]:
pred_train

array([[133.78947 ],
       [132.41571 ],
       [108.59266 ],
       ...,
       [126.57919 ],
       [133.94077 ],
       [122.370415]], dtype=float32)

### Here we predicted turbine energy yield (TEY) for both the test and train data, using all ten remaining columns as features.