# PROJECT DESCRIPTION


**OBJECTIVE**: Prediction of Gas Turbine CO Emission 

**Description**: Predict the CO emission of a gas turbine from its sensor data. The data consists of 11 sensor measures, aggregated over one hour (by means of average or sum), from a gas turbine located in Turkey's northwestern region, collected for the purpose of studying flue gas emissions, namely CO and NOx (NO + NO2).

**Motivation**: The harmful environmental effect of flue gas emitted from power plant turbines has long been a substantial concern. In recent years there have been many peaceful protests to save the environment, and environmental organizations that seek to protect, analyse, or monitor the environment have conducted many events and activities to raise public awareness.
This project aims to predict flue gas emissions from gas turbine sensor data using machine learning techniques.

The ML model can be used to predict or estimate the emissions of future operations of this turbine and of turbines in the same homologous series. Its output can also be used to validate and back up the costly continuous emission monitoring systems (CEMS) used in gas-turbine-based power plants. Building such predictive models relies on the availability of appropriate and ecologically valid data.
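The validation and backup roles mentioned above can be illustrated with two small helpers: one flags hours where a measured CO value disagrees strongly with the model's prediction, the other fills missing measurements with predictions. This is a minimal sketch, not part of the project: both function names and the `rel_tol`/`abs_floor` thresholds are hypothetical and chosen only for illustration.

```python
import numpy as np


def flag_cems_discrepancies(measured_co, predicted_co, rel_tol=0.5, abs_floor=1.0):
    """Return a boolean mask of hours where a CEMS CO reading disagrees with the model.

    Hypothetical helper: an hour is flagged when the absolute residual exceeds both
    `abs_floor` (mg/m^3) and `rel_tol` times the predicted value. Both thresholds are
    illustrative assumptions, not values from this project.
    """
    measured = np.asarray(measured_co, dtype=float)
    predicted = np.asarray(predicted_co, dtype=float)
    residual = np.abs(measured - predicted)
    # Requiring both conditions avoids flagging tiny absolute gaps at low CO levels
    return (residual > abs_floor) & (residual > rel_tol * np.abs(predicted))


def backfill_with_predictions(measured_co, predicted_co):
    """Hypothetical backup: use the model prediction wherever the CEMS reading is missing (NaN)."""
    measured = np.asarray(measured_co, dtype=float)
    predicted = np.asarray(predicted_co, dtype=float)
    return np.where(np.isnan(measured), predicted, measured)


# Example with made-up numbers
print(flag_cems_discrepancies([1.0, 1.2, 8.0, 1.1], [1.1, 1.0, 1.5, 3.5]))
# [False False  True  True]
print(backfill_with_predictions([1.0, np.nan, 3.0], [0.5, 2.0, 0.5]))
# [1. 2. 3.]
```

In practice the thresholds would have to be tuned against the model's observed error on held-out data.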

**Data Source**: https://archive.ics.uci.edu/ml/datasets/Gas+Turbine+CO+and+NOx+Emission+Data+Set#

**Data Description:** The dataset contains 36733 instances of 11 sensor measures aggregated over one hour (by means of average or sum) from a gas turbine located in Turkey's northwestern region, collected for the purpose of studying flue gas emissions, namely CO and NOx (NO + NO2).
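The hourly aggregation described above (average or sum over one hour) can be sketched with pandas `resample`. This is only an illustration on synthetic 10-minute readings: the raw sub-hourly data is not part of this project, and the column name `energy_increment`, as well as which columns are summed rather than averaged, are assumptions.

```python
import numpy as np
import pandas as pd


def aggregate_hourly(readings: pd.DataFrame, sum_columns=()) -> pd.DataFrame:
    """Aggregate sub-hourly readings (DatetimeIndex) to one row per hour.

    Columns named in `sum_columns` are summed; all other columns are averaged.
    Which variables the original data set summed rather than averaged is not
    stated here, so that choice is left to the caller.
    """
    how = {col: ("sum" if col in sum_columns else "mean") for col in readings.columns}
    return readings.resample("60min").agg(how)


# Synthetic 10-minute readings covering two hours; "energy_increment" is a hypothetical column
idx = pd.date_range("2015-01-01 00:00", periods=12, freq="10min")
raw = pd.DataFrame({"AT": np.arange(12.0), "energy_increment": np.ones(12)}, index=idx)
hourly = aggregate_hourly(raw, sum_columns=("energy_increment",))
print(hourly)
```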

**Variables and units**

| Variable | Abbr. | Unit |
|---|---|---|
| Ambient temperature | AT | °C |
| Ambient pressure | AP | mbar |
| Ambient humidity | AH | % |
| Air filter difference pressure | AFDP | mbar |
| Gas turbine exhaust pressure | GTEP | mbar |
| Turbine inlet temperature | TIT | °C |
| Turbine after temperature | TAT | °C |
| Compressor discharge pressure | CDP | mbar |
| Turbine energy yield | TEY | MWh |
| Carbon monoxide | CO | mg/m³ |
| Nitrogen oxides | NOx | mg/m³ |
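The variable list maps directly onto the CSV columns used later (where NOx appears as `NOX`). A small sketch, assuming those column names, that records the units and checks a loaded frame before splitting it into the nine inputs and two targets; `UNITS` and `split_features_targets` are hypothetical names introduced here.

```python
import pandas as pd

# Units transcribed from the variable list above; keys use the CSV column names (NOx is "NOX")
UNITS = {
    "AT": "°C", "AP": "mbar", "AH": "%", "AFDP": "mbar", "GTEP": "mbar",
    "TIT": "°C", "TAT": "°C", "TEY": "MWh", "CDP": "mbar",
    "CO": "mg/m^3", "NOX": "mg/m^3",
}
TARGETS = ["CO", "NOX"]
FEATURES = [col for col in UNITS if col not in TARGETS]  # the nine input measures


def split_features_targets(df: pd.DataFrame):
    """Check that every expected column is present, then return (inputs, targets)."""
    missing = [col for col in UNITS if col not in df.columns]
    if missing:
        raise ValueError(f"missing expected columns: {missing}")
    # Selecting by name also drops stray columns such as a saved index
    return df[FEATURES], df[TARGETS]
```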

# Model Building

```python
# Importing libraries
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler, PolynomialFeatures
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestRegressor
import pandas as pd
import numpy as np
```

```python
# Path to the data file
path = "dataFolder/CO_NOX.csv"

df = pd.read_csv(path)
print(df.shape)
df.head()
```

(25537, 11)


| Unnamed: 0 | AT | AP | AH | AFDP | GTEP | TIT | TAT | TEY | CDP | CO | NOX |
|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|
| 0 | 23.056 | 1019.3 | 62.777 | 4.2547 | 30.505 | 1100.0 | 542.3 | 150.94 | 13.379 | 1.6653 | 49.305 |
| 1 | 25.551 | 1010.5 | 81.232 | 4.4498 | 29.848 | 1099.6 | 545.38 | 146.08 | 13.117 | 1.0618 | 55.238 |
| 2 | 18.25 | 1017.9 | 81.401 | 3.913 | 21.331 | 1043.6 | 539.33 | 113.22 | 11.017 | 12.659 | 71.888 |
| 3 | 19.743 | 1016.0 | 82.356 | 3.7566 | 24.196 | 1078.6 | 549.94 | 130.07 | 11.891 | 2.0195 | 52.263 |
| 4 | 26.957 | 1010.2 | 65.205 | 5.376 | 30.726 | 1099.9 | 544.02 | 148.01 | 13.272 | 1.0975 | 55.536 |
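The `Unnamed: 0` header in the preview is what pandas produces when a CSV was written together with its index (the `to_csv` default) and read back without `index_col`. Whether `CO_NOX.csv` was created this way is an assumption; the sketch below reproduces the effect on two rows borrowed from the preview and shows two ways to avoid the extra column.

```python
import io

import pandas as pd

# Simulate a CSV that was saved together with its integer index (to_csv's default)
original = pd.DataFrame({"AT": [23.056, 25.551], "CO": [1.6653, 1.0618]})
csv_text = original.to_csv()  # header line starts with an empty name: ",AT,CO"

naive = pd.read_csv(io.StringIO(csv_text))
print(list(naive.columns))  # ['Unnamed: 0', 'AT', 'CO']

# Option 1: read the first column back as the index
as_index = pd.read_csv(io.StringIO(csv_text), index_col=0)
# Option 2: drop the leftover column after reading
dropped = naive.drop(columns=["Unnamed: 0"])
print(list(as_index.columns), list(dropped.columns))  # ['AT', 'CO'] ['AT', 'CO']
```

Because the feature columns are later selected by name, the extra column does no harm to the model here; it only clutters the frame.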


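```python
# Split the data into inputs (all columns except the last two) and targets (CO, NOX)
X = df.iloc[:, 0:-2]
y = df.loc[:, ['CO', 'NOX']]

# Keep the seven inputs used for the CO model (AP and AH are not used)
X_final_CO = X.loc[:, ['AT', 'AFDP', 'GTEP', 'TIT', 'TAT', 'TEY', 'CDP']].copy()
X_final_CO.reset_index(drop=True, inplace=True)

y_final_CO = y.loc[:, ['CO']].copy()
y_final_CO.reset_index(drop=True, inplace=True)
```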
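The notebook keeps seven of the nine inputs but does not show how they were chosen. One simple screening step, offered only as an illustration and not as the method used here, is to rank candidate inputs by the absolute Pearson correlation with CO. `rank_by_abs_correlation` is a hypothetical helper.

```python
import pandas as pd


def rank_by_abs_correlation(df: pd.DataFrame, target: str, candidates):
    """Rank candidate columns by |Pearson correlation| with `target`, strongest first."""
    corr = df[list(candidates)].corrwith(df[target]).abs()
    return corr.sort_values(ascending=False)


# Usage on the notebook's data (not run here):
# rank_by_abs_correlation(df, "CO", ["AT", "AP", "AH", "AFDP", "GTEP", "TIT", "TAT", "TEY", "CDP"])
```

Pearson correlation only captures linear relationships, so a low score does not rule out a variable that a random forest could still exploit.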

```python
# Creating the pipeline: standardize -> PCA (4 components) -> degree-2 polynomial features -> random forest
pipe = Pipeline([('scaler', StandardScaler()),
                 ('pca', PCA(n_components=4, svd_solver='full')),
                 ('poly', PolynomialFeatures(degree=2)),
                 ('randomForest', RandomForestRegressor(max_depth=16, n_estimators=161, n_jobs=-1,
                                                        criterion='friedman_mse', warm_start=True,
                                                        oob_score=True, bootstrap=True,
                                                        max_features=4, random_state=4578))
                ])
```
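It helps to trace the number of features through the pipeline: 7 inputs are standardized, PCA reduces them to 4 components, and `PolynomialFeatures(degree=2)` (with its default `include_bias=True`) expands those to C(4+2, 2) = 15 columns: 1 bias term, 4 linear terms, 4 squares and 6 pairwise products. So `max_features=4` means each split in the forest considers 4 of those 15 columns. The sketch below checks this on random stand-in data, not on the turbine data.

```python
import numpy as np
from math import comb
from sklearn.decomposition import PCA
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Same preprocessing steps as the notebook, without the random forest
preprocess = Pipeline([
    ("scaler", StandardScaler()),
    ("pca", PCA(n_components=4, svd_solver="full")),
    ("poly", PolynomialFeatures(degree=2)),
])

rng = np.random.default_rng(0)
X_demo = rng.normal(size=(50, 7))  # random stand-in for the 7 selected sensor columns
Z = preprocess.fit_transform(X_demo)

n_pca, degree = 4, 2
expected = comb(n_pca + degree, degree)  # monomials of total degree <= 2 in 4 variables, bias included
print(Z.shape, expected)  # (50, 15) 15
```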

```python
# Fitting the pipeline on all rows, with CO as the target
pipe.fit(X_final_CO, y_final_CO.CO)
```

```text
Pipeline(memory=None,
         steps=[('scaler',
                 StandardScaler(copy=True, with_mean=True, with_std=True)),
                ('pca',
                 PCA(copy=True, iterated_power='auto', n_components=4,
                     random_state=None, svd_solver='full', tol=0.0,
                     whiten=False)),
                ('poly',
                 PolynomialFeatures(degree=2, include_bias=True,
                                    interaction_only=False, order='C')),
                ('randomForest',
                 RandomForestRegressor(bootstrap=True, ccp_alpha=0.0,
                                       criterion='friedman_mse', max_depth=16,
                                       max_features=4, max_leaf_nodes=None,
                                       max_samples=None,
                                       min_impurity_decrease=0.0,
                                       min_impurity_split=None,
                                       min_samples_leaf=1, min_samp
```
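Since the forest is built with `bootstrap=True` and `oob_score=True`, the fitted estimator exposes `oob_score_`, the R² computed only on rows each tree did not see during bootstrapping. For the notebook's pipeline it would be read as `pipe.named_steps['randomForest'].oob_score_`. The sketch below demonstrates the attribute on a small synthetic problem; the data, forest size and `demo_pipe` name are assumptions for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)
X_demo = rng.normal(size=(300, 7))
y_demo = X_demo[:, 0] ** 2 + 0.1 * rng.normal(size=300)  # synthetic target, not the CO data

demo_pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("randomForest", RandomForestRegressor(n_estimators=100, oob_score=True,
                                           bootstrap=True, random_state=0)),
])
demo_pipe.fit(X_demo, y_demo)

# R^2 on out-of-bag rows: a less optimistic check than the training-set error
oob_r2 = demo_pipe.named_steps["randomForest"].oob_score_
print(f"OOB R^2: {oob_r2:.3f}")
```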

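```python
from sklearn.metrics import mean_squared_error

# Checking performance: MSE on the same rows used for fitting (in-sample)
mean_squared_error(y_final_CO.CO, pipe.predict(X_final_CO))
```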

0.2808514010018447
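The MSE above is computed on the same rows the pipeline was fit on, so it measures in-sample fit and tends to be optimistic for a deep random forest. A minimal sketch of a hold-out check, assuming a random 80/20 split is acceptable; `holdout_mse` is a hypothetical helper.

```python
import numpy as np
from sklearn.base import clone
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split


def holdout_mse(pipeline, X, y, test_size=0.2, random_state=0):
    """Return (train_mse, test_mse) for a fresh copy of `pipeline` fit on a random split."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=random_state
    )
    model = clone(pipeline)  # unfitted copy, so the caller's pipeline is left untouched
    model.fit(X_train, y_train)
    train_mse = mean_squared_error(y_train, model.predict(X_train))
    test_mse = mean_squared_error(y_test, model.predict(X_test))
    return train_mse, test_mse
```

For the notebook's data this would be called as `holdout_mse(pipe, X_final_CO, y_final_CO.CO)`. If the rows are stored in time order, training on earlier rows and testing on later ones may be a stricter test than a random split; `sklearn.model_selection.cross_val_score` with `scoring='neg_mean_squared_error'` gives a k-fold version of the same check.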

# Saving the Model to a Pickle File

```python
import pickle

# Saving the fitted pipeline; the with-block closes the file after writing
with open('RandomForest_CO_pred.pkl', 'wb') as f:
    pickle.dump(pipe, f)
```
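To reuse the saved pipeline (for example from a web app), it can be loaded back with `pickle.load`; new inputs need the same seven feature columns used for training. A minimal sketch, where `load_model` and `predict_co` are hypothetical helpers. Note that unpickling can execute arbitrary code, so only load trusted files, and that scikit-learn does not guarantee that a model pickled with one version loads correctly in another.

```python
import pickle

import pandas as pd

# Feature columns the CO pipeline was trained on, in training order
CO_FEATURES = ["AT", "AFDP", "GTEP", "TIT", "TAT", "TEY", "CDP"]


def load_model(path="RandomForest_CO_pred.pkl"):
    # Only unpickle files you trust: pickle can execute arbitrary code on load
    with open(path, "rb") as f:
        return pickle.load(f)


def predict_co(model, readings):
    """Predict CO for a list of dicts keyed by the seven feature names."""
    X = pd.DataFrame(readings, columns=CO_FEATURES)
    return model.predict(X)
```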

# Creating Flask app

**Note:** The code that creates the Flask app and hosts the model is in the `app.py` file.
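`app.py` is not reproduced here. The sketch below shows, under stated assumptions, the request-handling logic such an endpoint might wrap: parse a JSON object with the seven feature values, validate it, and return a prediction. It is kept framework-free so it can be tested on its own; `handle_predict`, the `/predict` route and the response key `CO_mg_per_m3` are hypothetical, and the Flask wiring is shown only as a comment.

```python
import json

import pandas as pd

CO_FEATURES = ["AT", "AFDP", "GTEP", "TIT", "TAT", "TEY", "CDP"]


def handle_predict(model, body: str):
    """Hypothetical handler: JSON object of feature values in, (status, JSON string) out."""
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return 400, json.dumps({"error": "body is not valid JSON"})
    if not isinstance(payload, dict):
        return 400, json.dumps({"error": "expected a JSON object"})
    missing = [name for name in CO_FEATURES if name not in payload]
    if missing:
        return 400, json.dumps({"error": f"missing features: {missing}"})
    try:
        # One-row frame with columns in training order
        row = pd.DataFrame([{name: float(payload[name]) for name in CO_FEATURES}])
    except (TypeError, ValueError):
        return 400, json.dumps({"error": "feature values must be numeric"})
    co = float(model.predict(row)[0])
    return 200, json.dumps({"CO_mg_per_m3": co})

# Hypothetical Flask wiring (route name and app layout are assumptions):
# @app.route("/predict", methods=["POST"])
# def predict():
#     status, body = handle_predict(model, request.get_data(as_text=True))
#     return body, status, {"Content-Type": "application/json"}
```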

# References & Citation

Heysem Kaya, Pınar Tüfekci and Erdinç Uzun. 'Predicting CO and NOx emissions from gas turbines: novel data and a benchmark PEMS', Turkish Journal of Electrical Engineering & Computer Sciences, vol. 27, 2019, pp. 4783-4796
