Notebook Link:
[Colab notebook](https://colab.research.google.com/drive/1JJ3iZ15UxIVr9mVDuRaoFfHTYG5DvnaN?usp=sharing)



**Combined cycle power plant:**
A combined cycle power plant is an assembly of heat engines that work in tandem from the same source of heat, converting it into mechanical energy. On land, when used to make electricity, the most common type is called a combined cycle gas turbine (CCGT) plant. The same principle is also used for marine propulsion, where it is called a combined gas and steam (COGAS) plant. Combining two or more thermodynamic cycles improves overall efficiency, which reduces fuel costs.

The principle is that after completing its cycle in the first engine, the working fluid (the exhaust) is still hot enough that a second heat engine can extract energy from the heat in the exhaust. Usually the heat passes through a heat exchanger so that the two engines can use different working fluids.

By generating power from multiple streams of work, the overall efficiency of the system can be increased by 50–60%. That is, from an overall efficiency of, say, 34% (for a simple cycle) to as much as 64% (for a combined cycle).[1] This is more than 84% of the theoretical efficiency of a Carnot cycle. Heat engines can only use part of the energy from their fuel, so in a non-combined cycle heat engine the remaining heat (i.e., hot exhaust gas) from combustion is wasted.
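The efficiency gain can be made concrete with the standard idealized relation for two heat engines in series: if the topping (gas turbine) cycle has efficiency η_GT and all of its rejected heat feeds a bottoming (steam) cycle of efficiency η_ST, the combined efficiency is η_CC = η_GT + (1 − η_GT)·η_ST. This is a minimal sketch that assumes perfect heat recovery with no heat-exchanger or stack losses; the 45% steam-cycle efficiency is a hypothetical value chosen for illustration, not a figure from the source.

```python
def combined_cycle_efficiency(eta_topping: float, eta_bottoming: float) -> float:
    """Idealized efficiency of two heat engines in series.

    Assumes every joule rejected by the topping cycle reaches the
    bottoming cycle (no heat-exchanger or stack losses).
    """
    if not (0.0 <= eta_topping <= 1.0 and 0.0 <= eta_bottoming <= 1.0):
        raise ValueError("efficiencies must be between 0 and 1")
    # Work from the first engine plus the work recovered from its waste heat
    return eta_topping + (1.0 - eta_topping) * eta_bottoming


# 34% simple-cycle gas turbine (from the text) plus a hypothetical 45% steam cycle
eta_cc = combined_cycle_efficiency(0.34, 0.45)
print(f"combined-cycle efficiency: {eta_cc:.1%}")
```

Under these idealized assumptions, a bottoming cycle of roughly 45% is enough to lift a 34% gas turbine to about 64%; real plants lose some of the exhaust heat, so the achievable figure is lower.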



In [5]:
pip install google-cloud-storage



In [None]:
# drive.mount() only works inside a Google Colab runtime. Outside Colab (for example
# in a local Jupyter kernel) the import fails, so read CCPP.xlsx from a local path instead.
try:
    from google.colab import drive
    drive.mount('/content/drive')
except ModuleNotFoundError:
    print("Not running in Colab: change the path passed to pd.read_excel below.")

**Importing necessary libraries**

In [None]:
import os
import pandas as pd  # First, we'll import Pandas, a data processing and CSV file I/O library
import numpy as np
from datetime import datetime

from sklearn import preprocessing, metrics

import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline


**Reading Dataset**

In [None]:
data = pd.read_excel('/content/drive/My Drive/CCPP.xlsx')

In [None]:
df = pd.DataFrame(data)

**Renaming Columns**

In [None]:
df.rename(columns={'AT': 'Average Temperature', 'V': 'Exhaust Vacuum','AP': 'Ambient Pressure',
                   'RH': 'Relative Humidity','PE': 'Net Hourly Electrical Energy Output'}, inplace=True)

***Descriptive Statistical Data Analysis***

All values are numerical and continuous, so no column needs to be converted to a numeric type, as `df.info()` confirms:



In [None]:
df.info()

The dataset has no missing values:

In [None]:
df.isnull().sum()

Every column reports zero null values, so no imputation is needed.

`df.describe()` gives summary statistics for every feature and for the target column:

In [None]:
df.describe()



1.   Ambient Temperature (AT, renamed "Average Temperature"): 1.81 °C to 37.11 °C
2.   Ambient Pressure (AP): 992.89 to 1033.30 millibar
3.   Relative Humidity (RH): 25.56% to 100.16%
4.   Exhaust Vacuum (V): 25.36 to 81.56 cm Hg
5.   Net hourly electrical energy output (PE): 420.26 to 495.76 MW
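The ranges listed above can be read off the `df.describe()` output, or pulled out directly as a compact min/max table. A minimal sketch; the function name `feature_ranges` and the toy values are hypothetical, and on the notebook's data the call would be `feature_ranges(df)`.

```python
import pandas as pd


def feature_ranges(frame: pd.DataFrame) -> pd.DataFrame:
    """Return one row per column with its minimum and maximum."""
    # agg() with a list of function names gives a 2-row frame; transpose to one row per column
    return frame.agg(["min", "max"]).T


# Hypothetical toy rows, only to show the output shape
toy = pd.DataFrame({
    "Average Temperature": [1.81, 20.0, 37.11],
    "Ambient Pressure": [992.89, 1013.0, 1033.30],
})
print(feature_ranges(toy))
```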


**Correlation**

In [None]:
df.corr()["Net Hourly Electrical Energy Output"].sort_values(ascending=False)

As seen above, **Net Hourly Electrical Energy Output** has a moderate **positive** correlation with **Ambient Pressure** and a very strong **negative** correlation with **Average Temperature** and **Exhaust Vacuum**. Let's visualize these correlations with a seaborn heatmap:

In [None]:
import seaborn as sns
plt.figure(figsize = (7, 5))
sns.heatmap(df.corr(), annot = True)
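Because the strongest relationships here are negative, sorting the raw correlations in descending order puts the most informative features at the bottom of the list. A minimal sketch that ranks features by the magnitude of their Pearson correlation instead (it relies on `Series.sort_values(key=...)`, available in pandas 1.1 and later); the function name and toy data are hypothetical, and on this notebook the call would be `rank_by_abs_correlation(df, "Net Hourly Electrical Energy Output")`.

```python
import numpy as np
import pandas as pd


def rank_by_abs_correlation(frame: pd.DataFrame, target: str) -> pd.Series:
    """Pearson correlation of every feature with `target`, strongest first."""
    corr = frame.corr()[target].drop(target)
    # Sort on magnitude so strong negative correlations rank alongside strong positive ones
    return corr.sort_values(key=lambda s: s.abs(), ascending=False)


# Hypothetical toy data: one strongly negative and one weakly positive feature
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = rng.normal(size=200)
toy = pd.DataFrame({
    "weak_pos": x2,
    "strong_neg": x1,
    "target": -3.0 * x1 + 1.0 * x2 + rng.normal(scale=0.1, size=200),
})
print(rank_by_abs_correlation(toy, "target"))
```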

In [None]:
# Candidate feature sets of increasing size (df_1 is a Series, the others are DataFrames)
df_1 = df['Average Temperature']
df_2 = df[['Average Temperature', 'Exhaust Vacuum']]
df_3 = df[['Average Temperature', 'Exhaust Vacuum', 'Relative Humidity']]
df_4 = df[['Average Temperature', 'Exhaust Vacuum', 'Ambient Pressure', 'Relative Humidity']]
y = df['Net Hourly Electrical Energy Output']

In [None]:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(df_1, y, test_size = 0.2, random_state = 0)

In [None]:
from sklearn.linear_model import LinearRegression
regressor = LinearRegression()
# df_1 is a single column (Series), so reshape it into the 2-D (n_samples, 1) array scikit-learn expects
model = regressor.fit(X_train.values.reshape(-1, 1), y_train)
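The cells above create four feature sets (`df_1` to `df_4`) but only fit the single-feature model. A minimal sketch of how all of them could be compared on the same train/test split, scoring each with R² on the held-out data; the function name `compare_feature_sets` and the synthetic columns are hypothetical. On the real data the call would be `compare_feature_sets(df, {...}, target="Net Hourly Electrical Energy Output")` with the same column lists as `df_1` to `df_4`.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split


def compare_feature_sets(frame, feature_sets, target, test_size=0.2, random_state=0):
    """Fit one linear regression per feature set and return its test-set R^2."""
    scores = {}
    for name, columns in feature_sets.items():
        # Selecting with a list of columns always yields a 2-D frame, so no reshape is needed
        X_train, X_test, y_train, y_test = train_test_split(
            frame[columns], frame[target], test_size=test_size, random_state=random_state
        )
        model = LinearRegression().fit(X_train, y_train)
        scores[name] = r2_score(y_test, model.predict(X_test))
    return scores


# Hypothetical synthetic data standing in for the CCPP columns
rng = np.random.default_rng(0)
n = 500
toy = pd.DataFrame(rng.normal(size=(n, 4)), columns=["a", "b", "c", "d"])
toy["y"] = 3 * toy["a"] - 2 * toy["b"] + 0.5 * toy["c"] + toy["d"] + rng.normal(scale=0.1, size=n)

scores = compare_feature_sets(
    toy,
    {"one": ["a"], "two": ["a", "b"], "all": ["a", "b", "c", "d"]},
    target="y",
)
print(scores)
```

Keeping `random_state` fixed means every feature set is evaluated on exactly the same test rows, so differences in R² come from the features rather than from the split.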



*   **Ambient Pressure** and **Relative Humidity** appear to have some outliers.
*   We still need to check whether these are genuine outliers or just extreme but valid values.
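One common way to start this check is Tukey's IQR rule, which flags points lying more than 1.5 interquartile ranges beyond the quartiles; it is a heuristic, not necessarily the method intended in this notebook. Flagged points still need a domain check (for example, whether a pressure reading is physically plausible) before being treated as errors rather than extreme values. A minimal sketch with hypothetical toy values; on the real data it could be applied per column, e.g. `iqr_outliers(df["Ambient Pressure"]).sum()`.

```python
import pandas as pd


def iqr_outliers(series: pd.Series, k: float = 1.5) -> pd.Series:
    """Boolean mask of values outside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = series.quantile(0.25), series.quantile(0.75)
    iqr = q3 - q1
    return (series < q1 - k * iqr) | (series > q3 + k * iqr)


toy = pd.Series([1, 2, 3, 4, 5, 100])  # hypothetical values
print(toy[iqr_outliers(toy)])
```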



**Checking the distribution of each column:**

In [None]:
for col in df.columns:
    # distplot is deprecated in seaborn; histplot with kde=True is the recommended replacement
    sns.histplot(df[col], kde=True)
    plt.show()



*   Average Temperature: roughly normally distributed.
*   Ambient Pressure: approximately normally distributed.
*   Relative Humidity: left-skewed.
*   Net Hourly Electrical Energy Output: bimodal, with two roughly bell-shaped peaks, so it is not normally distributed overall.
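Sample skewness gives a number to back up visual labels such as "left-skewed": a negative value means a longer left tail. A minimal sketch; the function name is hypothetical, the ±0.5 threshold is a rule-of-thumb assumption rather than a formal test, and the toy series are made up. On the notebook's data the call would be `skew_direction(df["Relative Humidity"])`.

```python
import pandas as pd


def skew_direction(series: pd.Series, threshold: float = 0.5) -> str:
    """Label a column by the sign of its sample skewness.

    The +/-0.5 cut-off is a common rule of thumb, not a formal test.
    """
    s = series.skew()  # pandas returns the bias-adjusted sample skewness
    if s < -threshold:
        return "left-skewed"
    if s > threshold:
        return "right-skewed"
    return "roughly symmetric"


print(skew_direction(pd.Series([1, 8, 9, 9, 10, 10, 10])))  # long tail to the left
print(skew_direction(pd.Series([1, 2, 3, 4, 5])))
```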



***Checking linearity:***

In [None]:
sns.pairplot(df)
print(df.corr(method="spearman"))

col=df.columns
print(col)



*   **Average Temperature** vs **Exhaust Vacuum**: strong **positive** correlation (about 0.8); temperature and exhaust vacuum tend to rise together.

*   **Average Temperature** vs **Net Hourly Electrical Energy Output**: strong **negative** correlation (about -0.9); as ambient temperature increases, the energy output decreases.

*   **Exhaust Vacuum** vs **Net Hourly Electrical Energy Output**: strong **negative** correlation (about -0.8); when the exhaust vacuum is high, the energy output is low, and vice versa.
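Printing Spearman's correlation next to the pairplot is a useful linearity check: Spearman's coefficient is Pearson's correlation computed on ranks, so it reaches ±1 for any perfectly monotonic relationship, while Pearson's reaches ±1 only for a straight line. A noticeable gap between the two for a feature pair hints at a non-linear relationship. A small sketch with a hypothetical cubic relationship:

```python
import numpy as np
import pandas as pd

# Hypothetical monotonic but non-linear relationship: y = x**3
toy = pd.DataFrame({"x": np.arange(1, 21, dtype=float)})
toy["y"] = toy["x"] ** 3

pearson = toy.corr(method="pearson").loc["x", "y"]    # linear association
spearman = toy.corr(method="spearman").loc["x", "y"]  # Pearson correlation of the ranks
print(f"Pearson:  {pearson:.3f}")
print(f"Spearman: {spearman:.3f}")
```

Spearman comes out at 1 because the relationship is perfectly monotonic, while Pearson stays below 1 because the points do not lie on a straight line.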



**Normalizing the data:**

In [None]:
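from sklearn import preprocessing
# normalize() rescales each row (sample) to unit L2 norm; it does not scale columns independently
df_nor = pd.DataFrame(preprocessing.normalize(df), columns=df.columns)
print(df_nor.head())
sns.pairplot(df_nor)
print(df_nor.corr())
# Features are the first four columns, the target is the last one
x = df_nor.iloc[:, 0:4]
y = df_nor.iloc[:, 4]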




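*   After row-wise normalization, **Ambient Pressure** shows a strong **negative** correlation with the output.
*   This is most likely an artifact of the method rather than a physical effect: each row is divided by its own L2 norm, which is dominated by pressure (about 1000 mbar), so the normalized columns become coupled through that shared norm.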
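`preprocessing.normalize` works row by row, so afterwards every sample lies on the unit sphere and the columns are coupled through the shared row norm. Column-wise scalers such as `MinMaxScaler` apply a separate linear transform to each feature, which leaves Pearson correlations unchanged. A small sketch on hypothetical, independently generated columns (the column names and distributions are made up, loosely mimicking pressure and output scales) makes the difference visible:

```python
import numpy as np
import pandas as pd
from sklearn import preprocessing

# Hypothetical independent columns on very different scales
rng = np.random.default_rng(0)
toy = pd.DataFrame({
    "pressure": rng.normal(1013.0, 6.0, size=200),
    "output": rng.normal(455.0, 17.0, size=200),
})

# Row-wise: every sample is divided by its own L2 norm
rows = pd.DataFrame(preprocessing.normalize(toy), columns=toy.columns)
# Column-wise: every feature is rescaled to [0, 1] independently
cols = pd.DataFrame(preprocessing.MinMaxScaler().fit_transform(toy), columns=toy.columns)

print("original corr:      ", round(toy.corr().iloc[0, 1], 3))
print("row-normalized corr:", round(rows.corr().iloc[0, 1], 3))
print("min-max corr:       ", round(cols.corr().iloc[0, 1], 3))
```

The row-normalized columns come out almost perfectly negatively correlated even though the toy columns were generated independently, while the min-max version reproduces the original correlation. When the goal is to put features on a common scale for modelling, a column-wise scaler is usually the better fit.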



**Data Visualization:**

In [None]:
sns.set_style("darkgrid")

In [None]:
plt.figure(figsize=(12,10))
sns.histplot(data=df,x="Net Hourly Electrical Energy Output",color="red",kde=True)
plt.axvline(x=df["Net Hourly Electrical Energy Output"].mean(),ymax=0.55,color="green",linestyle='--',label="Mean")
plt.axvline(x=df["Net Hourly Electrical Energy Output"].median(),ymax=0.56,color="purple",linestyle='--',label="Median")
plt.legend()
plt.title("Histogram of the Target Column")

The histogram above shows the distribution of **Net Hourly Electrical Energy Output**: the dashed **green** line marks the **mean** and the dashed **purple** line marks the **median**.

In [None]:
plt.figure(figsize = (12,10))
sns.histplot(df["Net Hourly Electrical Energy Output"],kde=True,bins=40,color="red",cumulative=True)
plt.title("Cumulative Histogram of the Target Column")

In [None]:
sns.pairplot(df,
             markers="+",
             kind="reg",
             diag_kind="kde",
             plot_kws={"line_kws": {"color": "#aec6cf"},
                       "scatter_kws": {"alpha": 0.5, "color": "#82ad32"}},
             diag_kws={"color": "#82ad32"})


**Regression:**

In [None]:
plt.figure(figsize = (15,10))
sns.regplot(data=df, x="Average Temperature", y="Net Hourly Electrical Energy Output",color="red")

The above plot shows **Average Temperature** vs **Net Hourly Electrical Energy Output**. The downward-sloping regression line reflects the strong **negative** correlation between them.
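`sns.regplot` draws an ordinary least-squares line through the points, and the sign of that line's slope always matches the sign of the Pearson correlation (the slope equals r multiplied by the ratio of the standard deviations). A minimal sketch that recovers the slope and intercept with `np.polyfit`; the function name and points are hypothetical, and on the notebook's data you would pass `df["Average Temperature"]` and `df["Net Hourly Electrical Energy Output"]`.

```python
import numpy as np


def fit_line(x, y):
    """Least-squares straight line y = slope * x + intercept."""
    slope, intercept = np.polyfit(x, y, deg=1)  # coefficients, highest power first
    return slope, intercept


x = np.array([0.0, 1.0, 2.0, 3.0])  # hypothetical points
y = 10.0 - 2.0 * x                 # exact line with a negative slope
slope, intercept = fit_line(x, y)
print(slope, intercept)
```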



In [None]:
plt.figure(figsize = (15,10))
sns.regplot(data=df, x="Exhaust Vacuum", y="Net Hourly Electrical Energy Output",color="green")

The above plot shows **Exhaust Vacuum** vs **Net Hourly Electrical Energy Output**. The downward-sloping regression line reflects the strong **negative** correlation between them.



In [None]:
plt.figure(figsize = (15,10))
sns.regplot(data=df, x="Ambient Pressure", y="Net Hourly Electrical Energy Output",color="purple")

The above plot shows **Ambient Pressure** vs **Net Hourly Electrical Energy Output**. The upward-sloping regression line indicates a **positive** correlation, though a weaker one than the negative correlations of temperature and exhaust vacuum.

