For this project, the dataset is provided in the following files:

Training Data : Datasets/gd_project2_train.csv

Testing Data : Datasets/gd_project2_test.csv

### Information about the dataset
The dataset contains 9568 data points collected from a Combined Cycle Power Plant over 6 years (2006-2011), when the power plant was set to work at full load. The features are hourly averages of the ambient variables Temperature (T), Ambient Pressure (AP), Relative Humidity (RH) and Exhaust Vacuum (V), which are used to predict the net hourly electrical energy output (EP) of the plant.
A combined cycle power plant (CCPP) is composed of gas turbines (GT), steam turbines (ST) and heat recovery steam generators. In a CCPP, the electricity is generated by gas and steam turbines, which are combined in one cycle, and is transferred from one turbine to another. While the Vacuum is collected from and has an effect on the steam turbine, the other three ambient variables affect the GT performance.
For comparability with our baseline studies, and to allow 5x2 fold statistical tests to be carried out, we provide the data shuffled five times. For each shuffling, 2-fold CV is carried out and the resulting 10 measurements are used for statistical testing.
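
The 5x2 procedure described above can be sketched as follows. This is a minimal illustration, not the baseline studies' code: the helper names (`five_by_two_cv_scores`, `fit_ols`, `rmse`), the ordinary-least-squares learner and the synthetic data are all assumptions made for the example.

```python
import numpy as np

def five_by_two_cv_scores(X, y, fit, score, seed=0):
    """Return 10 scores: 5 shuffles x 2 folds, with the folds swapping roles."""
    rng = np.random.default_rng(seed)
    n = len(X)
    half = n // 2
    scores = []
    for _ in range(5):                      # five independent shuffles
        perm = rng.permutation(n)
        folds = (perm[:half], perm[half:])
        # 2-fold CV: train on one half, test on the other, then swap
        for train_idx, test_idx in (folds, folds[::-1]):
            model = fit(X[train_idx], y[train_idx])
            scores.append(score(model, X[test_idx], y[test_idx]))
    return np.array(scores)

# Hypothetical learner: ordinary least squares with an intercept
def fit_ols(X, y):
    A = np.c_[np.ones(len(X)), X]
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def rmse(coef, X, y):
    A = np.c_[np.ones(len(X)), X]
    return float(np.sqrt(np.mean((A @ coef - y) ** 2)))

# Synthetic stand-in data (NOT the power plant dataset)
rng = np.random.default_rng(1)
X_demo = rng.normal(size=(200, 4))
y_demo = X_demo @ np.array([1.0, -2.0, 0.5, 3.0]) + 450.0 + rng.normal(scale=0.1, size=200)

scores = five_by_two_cv_scores(X_demo, y_demo, fit_ols, rmse)
print(scores.shape)   # one score per (shuffle, fold) pair
```

The ten scores are what a 5x2 statistical test would then compare between two learners.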

Attribute Information:

Features consist of hourly average ambient variables:
- Temperature (T) in the range 1.81°C to 37.11°C
- Ambient Pressure (AP) in the range 992.89 to 1033.30 millibar
- Relative Humidity (RH) in the range 25.56% to 100.16%
- Exhaust Vacuum (V) in the range 25.36 to 81.56 cm Hg
- Net hourly electrical energy output (EP), the prediction target, in the range 420.26 to 495.76 MW

The averages are taken from various sensors located around the plant that record the ambient variables every second. The variables are given without normalization.
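
Because the variables are given without normalization, their scales differ widely (pressure is near 1000 while temperature stays below 40), and this tends to make the least-squares problem badly conditioned for gradient descent. The sketch below illustrates the effect by comparing the condition number of the design matrix before and after z-score standardization. It assumes synthetic data drawn uniformly and independently from the ranges listed above; the real features are correlated, so the actual numbers will differ. The helper `design` is a name introduced for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000

# Ranges of T, AP, RH, V as listed above; the samples themselves are synthetic
lows = np.array([1.81, 992.89, 25.56, 25.36])
highs = np.array([37.11, 1033.30, 100.16, 81.56])
X_raw = rng.uniform(lows, highs, size=(n, 4))

def design(X):
    """Prepend a column of ones (bias term) to the feature matrix."""
    return np.c_[np.ones(len(X)), X]

# z-score standardization: zero mean and unit standard deviation per column
X_std = (X_raw - X_raw.mean(axis=0)) / X_raw.std(axis=0)

cond_raw = np.linalg.cond(design(X_raw))
cond_std = np.linalg.cond(design(X_std))
print(cond_raw, cond_std)
```

A smaller condition number means gradient descent can use a larger learning rate and converge in fewer iterations.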

In [9]:
import numpy as np
import pandas as pd

In [10]:
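def cost(data,Y,m):
    # Mean squared error of the linear model data.dot(m) against the targets Y.
    # data: (M,N) design matrix, Y: M targets, m: (N,1) coefficient vector
    residual=data.dot(m)-Y.reshape(-1,1)
    Cost=float(np.mean(residual**2))
    return Cost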
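
In matrix form, the cost computed by `cost` and its gradient can be written as below. This is the standard mean-squared-error derivation; the symbols $X$ (the design matrix, `data` in the code), $y$ (the target vector `Y`), $M$ (the number of rows) and $\alpha$ (the learning rate) are notation introduced here.

$$
\begin{aligned}
J(m) &= \frac{1}{M}\sum_{i=1}^{M}\left(x_i^{\top} m - y_i\right)^2 = \frac{1}{M}\,\lVert Xm - y\rVert^2 \\
\nabla J(m) &= \frac{2}{M}\,X^{\top}(Xm - y) = -\frac{2}{M}\,X^{\top}(y - Xm) \\
m &\leftarrow m - \alpha\,\nabla J(m) = m - \left(-\frac{2}{M}\right)\alpha\,X^{\top}(y - Xm)
\end{aligned}
$$

The last line is the update applied in `step_gradient`.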

In [11]:
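def step_gradient(data,Y,learningRate,m):
    # One gradient descent step on the mean squared error.
    M=len(data)
    # Gradient of the cost is (-2/M) * X^T (Y - X m);
    # reshape(-1,1) turns Y into a column without hard-coding the row count
    m=m-(-2/M)*learningRate*(data.T).dot(Y.reshape(-1,1)-data.dot(m))
    return m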
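
One way to gain confidence in the gradient used in the update is to compare it against central finite differences of the cost. The sketch below does this on random data; `analytic_gradient` and `numeric_gradient` are hypothetical helper names, and `cost` is restated in vectorized form so the snippet runs on its own.

```python
import numpy as np

def cost(data, Y, m):
    """Mean squared error of the linear model data.dot(m)."""
    residual = data.dot(m) - Y.reshape(-1, 1)
    return float(np.mean(residual ** 2))

def analytic_gradient(data, Y, m):
    """Gradient of the cost: (-2/M) * X^T (Y - X m)."""
    M = len(data)
    return (-2 / M) * data.T.dot(Y.reshape(-1, 1) - data.dot(m))

def numeric_gradient(data, Y, m, eps=1e-6):
    """Central finite-difference approximation, one coordinate at a time."""
    grad = np.zeros_like(m)
    for k in range(m.shape[0]):
        step = np.zeros_like(m)
        step[k, 0] = eps
        grad[k, 0] = (cost(data, Y, m + step) - cost(data, Y, m - step)) / (2 * eps)
    return grad

# Random stand-in data, only for the check
rng = np.random.default_rng(0)
data_demo = np.c_[np.ones(50), rng.normal(size=(50, 3))]
Y_demo = rng.normal(size=50)
m_demo = rng.normal(size=(4, 1))

max_diff = np.max(np.abs(analytic_gradient(data_demo, Y_demo, m_demo)
                         - numeric_gradient(data_demo, Y_demo, m_demo)))
print(max_diff)   # should be tiny if the analytic gradient is right
```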

In [12]:
def gradient_descent(data,Y,learningRate,numIter):
    N=data.shape[1]
    m=np.zeros((N,1))
    
    for i in range(numIter):        
        m=step_gradient(data,Y,learningRate,m) 
    return m
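
As a quick sanity check, the loop can be run on a small synthetic problem whose coefficients are known. The sketch below restates the two functions with `reshape(-1, 1)` so it runs on any number of rows; the data, the learning rate of 0.05 and the 2000 iterations are assumptions chosen for this toy problem, not values tuned for the power plant data.

```python
import numpy as np

def step_gradient(data, Y, learningRate, m):
    M = len(data)
    return m - (-2 / M) * learningRate * data.T.dot(Y.reshape(-1, 1) - data.dot(m))

def gradient_descent(data, Y, learningRate, numIter):
    m = np.zeros((data.shape[1], 1))        # start from all-zero coefficients
    for _ in range(numIter):
        m = step_gradient(data, Y, learningRate, m)
    return m

# Noise-free synthetic targets generated from known coefficients
rng = np.random.default_rng(0)
features = rng.normal(size=(200, 2))
data_demo = np.c_[np.ones(200), features]   # bias column + 2 features
true_m = np.array([3.0, 1.5, -2.0])
Y_demo = data_demo.dot(true_m)

m_gd = gradient_descent(data_demo, Y_demo, learningRate=0.05, numIter=2000)

# Closed-form least squares as a reference solution
m_ls, *_ = np.linalg.lstsq(data_demo, Y_demo, rcond=None)
print(m_gd.ravel())
print(m_ls)
```

If the two printed vectors agree, the update rule and the loop are behaving as expected on well-scaled inputs.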

In [18]:
def run():
    #loading data
    data_input=np.loadtxt('Datasets/gd_project2_train.csv',delimiter=',')
    data_test=np.loadtxt('Datasets/gd_project2_test.csv',delimiter=',')
    #checking the shape
    print(data_input.shape)
    print(data_test.shape)
    # Splitting into inputs and outputs for the algorithm
    X=data_input[:,0:4]
    Y=data_input[:,4]
    # Prepend a column of ones (bias term) to both sets
    data_train=np.c_[np.ones(len(X)),X]
    data_test=np.c_[np.ones(len(data_test)),data_test]
        
    
    # Adding degree-2 features: all pairwise products (including squares)
    # of the four original columns, for both train and test
    for i in range(1,X.shape[1]+1):
        for j in range(i,X.shape[1]+1):
            col=data_train[:,i]*data_train[:,j]
            col_test=data_test[:,i]*data_test[:,j]
            # reshape(-1,1) infers the row count instead of hard-coding it
            data_train=np.append(data_train,col.reshape(-1,1),axis=1)
            data_test=np.append(data_test,col_test.reshape(-1,1),axis=1)

            
    # Normalize every feature column (column 0 is the bias and is skipped).
    # The test set is scaled with the training mean and standard deviation
    # so that both sets go through the same transformation.
    for i in range(1,data_train.shape[1]):
        Avg=data_train[:,i].mean()
        deviation=data_train[:,i].std()
        data_train[:,i]=(data_train[:,i]-Avg)/deviation
        data_test[:,i]=(data_test[:,i]-Avg)/deviation
    # end of normalization


    learningRate=0.00003
    numIterations=100000
    m=gradient_descent(data_train,Y,learningRate,numIterations)
    print(m)
    
    y_predicted=data_test.dot(m)
    print(y_predicted.shape)
    np.savetxt('output_GasPower.csv',y_predicted,fmt='%.5f')

In [None]:
run()