# Gradient Descent - Combined Cycle Power Plant

The Combined Cycle Power Plant dataset contains 9568 data points collected from a combined cycle power plant over 6 years (2006-2011), while the plant was operating at full load. The features are hourly averages of the ambient variables Temperature (T), Ambient Pressure (AP), Relative Humidity (RH) and Exhaust Vacuum (V), which are used to predict the net hourly electrical energy output (EP) of the plant.

##### You are given:
1. A Readme file with more details on the dataset.
2. A training dataset csv file with X train and Y train data.
3. An X test file, for which you have to make and submit predictions.
##### Your task is to:
1. Code Gradient Descent for N features and come up with predictions.
2. Try and test with various combinations of learning rates and number of iterations.
3. Try using Feature Scaling, and see if it helps you in getting better results. 
##### Read Instructions carefully
1. Use Gradient Descent as the training algorithm and submit the predicted results.
2. Files are in csv format; you can use the genfromtxt function in numpy to load data from a csv file. Similarly, you can use the savetxt function to save data to a file.
3. Submit a csv file containing only the predictions for the X test data. The file should not have any headers and should have only one column, i.e. the predictions. Also, predictions shouldn't be in exponential form.
4. Your score is based on the coefficient of determination, so it is possible that nobody gets a full score.
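
Since the score is based on the coefficient of determination (R²), it can help to compute it locally on held-out training rows. The following is a minimal sketch, assuming the usual definition R² = 1 - SS_res / SS_tot; the function name `r2_score_manual` and the toy values are made up for illustration and are not part of the assignment files.

```python
import numpy as np

def r2_score_manual(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)         # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares
    return 1.0 - ss_res / ss_tot

# Toy values, made up purely for illustration
y_true = np.array([3.0, 5.0, 7.0, 9.0])
print(r2_score_manual(y_true, y_true))                     # perfect predictions -> 1.0
print(r2_score_manual(y_true, np.full(4, y_true.mean())))  # always predicting the mean -> 0.0
```

A score of 1.0 means the predictions match exactly, while a model that always predicts the mean scores 0.0.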


Get Data

In [40]:
import numpy as np

data = np.genfromtxt("DataSets\\0000000000002419_training_ccpp_x_y_train.csv", delimiter = ',', skip_header=0)
data_test = np.genfromtxt("DataSets\\0000000000002419_test_ccpp_x_test.csv", delimiter = ',')

X_train = data[:, :4]
y_train = data[:, 4]

X_test = data_test

Feature Scaling

In [99]:
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
scaler.fit(X_train)
X_train_standard = scaler.transform(X_train)
X_test_standard = scaler.transform(X_test)

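# y_train is 1-D, so reshape it to a column before joining it to the scaled features
data_standard = np.concatenate((X_train_standard, y_train.reshape(-1, 1)), axis = 1)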
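
For reference, the standardization that `StandardScaler` performs above can be sketched by hand with NumPy. This is an illustrative sketch on synthetic data, not the real dataset: the arrays `X_train_demo` and `X_test_demo` and their means and spreads are arbitrary assumptions. The point it shows is that the test set is scaled with the training mean and standard deviation.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic stand-ins for X_train / X_test (4 features); values are arbitrary
loc = [20.0, 50.0, 1000.0, 70.0]
scale = [7.0, 12.0, 6.0, 14.0]
X_train_demo = rng.normal(loc=loc, scale=scale, size=(200, 4))
X_test_demo = rng.normal(loc=loc, scale=scale, size=(50, 4))

mean = X_train_demo.mean(axis=0)
std = X_train_demo.std(axis=0)  # population std (ddof=0), as StandardScaler uses

X_train_scaled = (X_train_demo - mean) / std
X_test_scaled = (X_test_demo - mean) / std  # reuse the training statistics

print(X_train_scaled.mean(axis=0).round(6))  # approximately 0 per column
print(X_train_scaled.std(axis=0).round(6))   # approximately 1 per column
```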


Gradient Descent

In [61]:
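def step_gradient(points, learning_rate, m):
    # One pass over the data. m holds the 4 feature weights followed by the intercept.
    m_slope = np.zeros(5)
    N = len(points)
    for i in range(N):
        x = points[i, 0:4]
        x = np.append(x, 1)  # constant 1 so that m[4] acts as the intercept
        y = points[i, 4]
        for j in range(5):
            m_slope[j] += (-2/N) * (y - (m * x).sum()) * x[j]
        # Note: this update runs once per sample, using the gradient accumulated so far,
        # so a single pass performs N updates rather than one full-batch update.
        m = m - (learning_rate * m_slope)
    return m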

def gradient_n(points, learning_rate, num_iterations):
    m = np.zeros(5)
    _cost = 0
    for i in range(num_iterations):
        m = step_gradient(points, learning_rate, m)
        _cost = cost(points, m)
        # print(i, " Cost:\t", _cost)
    print(f"Cost after {num_iterations} iterations: {format(_cost, '.10f')}")
    return m

def cost(points, m):
    total_cost = 0
    N = len(points)
    for i in range(N):
        x = points[i, 0:4]
        x = np.append(x, 1)
        y = points[i, 4]
        total_cost += (1/N)*((np.sum(m * x) - y)**2)
    return total_cost
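
The per-sample Python loops above can be slow on larger datasets. As a hedged alternative, here is a vectorized sketch of plain batch gradient descent for the same mean-squared-error cost. It is not the same update scheme as `step_gradient` above: it applies exactly one update per iteration, computed from the full gradient. The function names, the synthetic data, and the learning rate of 0.1 are assumptions chosen for illustration.

```python
import numpy as np

def gradient_descent_vectorized(X, y, learning_rate, num_iterations):
    N = X.shape[0]
    Xb = np.hstack([X, np.ones((N, 1))])  # bias column last, so m[-1] is the intercept
    m = np.zeros(Xb.shape[1])
    for _ in range(num_iterations):
        residual = y - Xb @ m
        grad = (-2.0 / N) * (Xb.T @ residual)  # gradient of the mean squared error
        m = m - learning_rate * grad           # one full-batch update per iteration
    return m

def cost_vectorized(X, y, m):
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    return np.mean((Xb @ m - y) ** 2)

# Synthetic, already-standardized data with made-up true weights
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(500, 4))
true_m = np.array([-15.0, -3.0, 0.5, -2.0, 454.0])
y_demo = X_demo @ true_m[:4] + true_m[4] + rng.normal(scale=0.1, size=500)

m_demo = gradient_descent_vectorized(X_demo, y_demo, learning_rate=0.1, num_iterations=500)
print(m_demo.round(2))
print(cost_vectorized(X_demo, y_demo, m_demo))
```

Because this version makes only one update per iteration, it generally needs a larger learning rate or more iterations than the loop-based version to reach a comparable cost.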


In [90]:
learning_rate = 0.0001
num_iterations = 500
m = gradient_n(data_standard, learning_rate, num_iterations)
print(m[:4], m[4])

Cost after 500 iterations: 20.9174036854
[-14.91110054  -2.83875397   0.35980985  -2.38533274] 454.457522471477


In [91]:
learning_rate = 0.0001
num_iterations = 100
m = gradient_n(data_standard, learning_rate, num_iterations)
print(m[:4], m[4])

Cost after 100 iterations: 20.9173543617
[-14.90685845  -2.84188676   0.36061708  -2.38395224] 454.4575257440055


Less cost model

In [89]:
learning_rate = 0.0001
num_iterations = 1000
m = gradient_n(data_standard, learning_rate, num_iterations)
print(m[:4], m[4])

Cost after 1000 iterations: 20.9174036854
[-14.91110054  -2.83875397   0.35980985  -2.38533274] 454.457522471477


Predict using test dataset

In [105]:
n = X_test_standard.shape[0]
y_t = []
c = m[4]        # intercept (last entry of m)
coef = m[:4]    # feature weights

for i in range(n):
    x = X_test_standard[i,:]
    y_t.append(np.sum(x*coef) + c)

print("Predicted output mean: ", np.mean(y_t))

Predicted output mean:  454.24754174883486
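
The per-row prediction loop above can also be written as a single matrix-vector product. The sketch below checks that both forms agree on synthetic inputs; `X_demo`, `weights` and `intercept` are hypothetical names, and the numbers are illustrative values rather than the fitted model.

```python
import numpy as np

rng = np.random.default_rng(0)
X_demo = rng.normal(size=(10, 4))              # stand-in for standardized test rows
weights = np.array([-14.9, -2.8, 0.36, -2.4])  # illustrative feature weights
intercept = 454.5                              # illustrative intercept

# Row-by-row prediction, as in the loop above
y_loop = np.array([np.sum(x * weights) + intercept for x in X_demo])

# Equivalent vectorized prediction
y_vec = X_demo @ weights + intercept

print(np.allclose(y_loop, y_vec))  # True
```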


Saving

In [107]:
np.savetxt("DataSets\\predictions_PowerPlant.csv", y_t, delimiter =',', fmt='%.10f')

for i in y_t:
    print(i)

469.89492074096233
471.7058724402789
433.9335925770952
457.2166882365718
464.57798596899426
448.2159466139801
478.22713309075203
446.7592409792283
483.77364913662524
440.03569736134364
434.3639096813245
431.7478804008523
472.3948812065355
463.0137787094754
444.1796625751031
456.58738019287875
488.11672375678677
447.7442400063448
426.7202142507176
438.3950453989508
439.4674942945576
483.0412744835315
459.7590025012573
475.3388653013832
431.50405560102064
434.22771820363903
467.4993521739899
470.0346987950671
432.5058360377289
476.46067163386704
443.28886700843
431.2760309702694
450.07348046664737
470.56834954581916
468.7990248726324
472.2775527327874
446.62660455124006
455.50895219922785
445.68857895814284
481.0472488541252
465.80077607979973
434.2960612460637
473.2008844543352
466.90580430202766
462.12717816152724
485.137291128152
436.4324281006878
430.8091645474216
440.4185957737529
475.40213983713795
472.73237245746276
465.150453970771
468.661275308344
434.1747912079635
464.631245015
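
The requirement that predictions not be in exponential form is the reason the saving step passes `fmt='%.10f'` to `np.savetxt`, whose default format is `'%.18e'`. A small sketch of the difference, writing to in-memory buffers instead of files; the sample values are made up.

```python
import io
import numpy as np

values = np.array([1.5e-7, 454.25, 1.0e6])  # made-up sample values

buf_default = io.StringIO()
np.savetxt(buf_default, values)             # default fmt='%.18e' (exponential notation)

buf_fixed = io.StringIO()
np.savetxt(buf_fixed, values, fmt='%.10f')  # fixed-point, 10 decimal places

print(buf_default.getvalue())
print(buf_fixed.getvalue())
```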