The Combined Cycle Power Plant dataset contains 9568 data points collected from a Combined 
Cycle Power Plant over 6 years (2006-2011), while the plant was set to work at full load. 
The features are hourly averages of the ambient variables Temperature (T), Ambient Pressure (AP), 
Relative Humidity (RH) and Exhaust Vacuum (V), used to predict the net hourly electrical energy output 
(EP) of the plant.

You are given:
1. A Readme file with more details on the dataset. 
2. A training dataset csv file with X train and Y train data.
3. An X test file; you have to predict and submit predictions for this file.

Your task is to:
1. Code Gradient Descent for N features and come up with predictions.
2. Try and test various combinations of learning rate and number of iterations.
3. Try using Feature Scaling and see if it helps you get better results. 
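
Task 2 can be approached as a small grid search. The sketch below is only an illustration: `run_gd`, the synthetic data and the candidate values are assumptions, not part of the assignment files. It compares the final training cost for each combination.

```python
import numpy as np

def run_gd(X, y, learning_rate, num_iterations):
    """Batch gradient descent on mean squared error; returns weights and final cost."""
    M = len(X)
    Xb = np.hstack([X, np.ones((M, 1))])  # append a bias column of ones
    w = np.zeros(Xb.shape[1])
    for _ in range(num_iterations):
        residual = y - Xb @ w
        w -= learning_rate * (-2.0 / M) * (Xb.T @ residual)
    final_cost = np.mean((y - Xb @ w) ** 2)
    return w, final_cost

# Synthetic, roughly standardized data stands in for the real training set
rng = np.random.default_rng(0)
X_demo = rng.standard_normal((200, 3))
y_demo = X_demo @ np.array([2.0, -1.0, 0.5]) + 4.0

# Hypothetical candidate values; adjust them for the real data
results = {}
for lr in (0.001, 0.01, 0.1):
    for n_iter in (50, 200):
        _, c = run_gd(X_demo, y_demo, lr, n_iter)
        results[(lr, n_iter)] = c

best = min(results, key=results.get)
print("best (learning_rate, num_iterations):", best)
```

On real data it is safer to compare costs on a held-out part of the training set rather than on the data used for fitting.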

Read the instructions carefully:
1. Use Gradient Descent as the training algorithm and submit the predicted results.
2. Files are in csv format; you can use the genfromtxt function in numpy to load data from a csv file. 
   Similarly, you can use the savetxt function to save data into a file.
3. Submit a csv file with only the predictions for the X test data. The file should not have any headers and 
   should have only one column, i.e. the predictions. Also, predictions shouldn't be in exponential form.
4. Your score is based on the coefficient of determination, so it is possible that nobody gets a full 
   score.
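
Since the score is the coefficient of determination, it can help to compute it locally on held-out training rows. A minimal sketch follows; the function name `r2_score_manual` and the demo values are made up for illustration.

```python
import numpy as np

def r2_score_manual(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)         # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares
    return 1.0 - ss_res / ss_tot

# Made-up values purely for illustration (not taken from the dataset)
y_true_demo = np.array([450.0, 460.0, 470.0, 480.0])
perfect = r2_score_manual(y_true_demo, y_true_demo)
baseline = r2_score_manual(y_true_demo, np.full(4, y_true_demo.mean()))
print(perfect, baseline)
```

A perfect fit scores 1, and always predicting the mean of the targets scores 0, which is why a full score is hard to reach on noisy data.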

In [66]:
import numpy as np
from sklearn import preprocessing

In [67]:
# load training data
data = np.loadtxt('training_ccpp_x_y_train.csv', delimiter = ',')
data.shape

(7176, 5)

In [57]:
# split data into features and response
x = data[:, :-1]
y = data[:, -1]
# check shape
x.shape, y.shape

((7176, 4), (7176,))

# Gradient Descent

In [58]:
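def step_gradient(X, Y, m, learning_rate):
    # one batch gradient descent step on the mean squared error
    M = len(X)       # number of samples
    N = len(X[0])    # number of features (including the column of ones)
    m_slope = np.zeros(N)
    for i in range(M):
        x = X[i]
        y = Y[i]
        # residual of sample i, computed once and reused for every feature
        error = y - sum(m*x)
        for j in range(N):
            m_slope[j] += (-2/M)*error*x[j]
    new_m = m - (learning_rate*m_slope)
    return new_m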
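
The double loop in this cell is slow on 7176 rows. As a sketch, assuming X is a 2-D NumPy array and Y and m are 1-D arrays, the same update can be written with matrix products. The names `step_gradient_vectorized` and `step_gradient_loop` are introduced here only for the comparison.

```python
import numpy as np

def step_gradient_vectorized(X, Y, m, learning_rate):
    """One batch gradient descent step on mean squared error, without Python loops."""
    M = len(X)
    residual = Y - X @ m                     # shape (M,)
    m_slope = (-2.0 / M) * (X.T @ residual)  # shape (N,)
    return m - learning_rate * m_slope

def step_gradient_loop(X, Y, m, learning_rate):
    """Loop form of the same update, kept here only for comparison."""
    M, N = X.shape
    m_slope = np.zeros(N)
    for i in range(M):
        for j in range(N):
            m_slope[j] += (-2 / M) * (Y[i] - np.sum(m * X[i])) * X[i][j]
    return m - learning_rate * m_slope

# Small random inputs, used only to check that both forms agree
rng = np.random.default_rng(1)
X_demo = rng.standard_normal((20, 3))
Y_demo = rng.standard_normal(20)
m0 = np.array([0.5, -0.25, 1.0])
same = np.allclose(step_gradient_vectorized(X_demo, Y_demo, m0, 0.1),
                   step_gradient_loop(X_demo, Y_demo, m0, 0.1))
print(same)
```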

In [59]:
def gd(x, y, learning_rate, num_iterations):
    n_features = len(x[0])
    m = np.zeros(n_features)  # start from all-zero weights
    for i in range(num_iterations):
        m = step_gradient(x, y, m, learning_rate)
        # cost() is defined in a later cell; run that cell before calling gd()
        print(i, 'Cost : ', cost(x, y, m))
    return m

In [60]:
def cost(x, y, m):
    # mean squared error of the predictions x.m against y
    total_cost = 0
    M = len(x)
    for i in range(M):
        total_cost += (1/M)*(y[i] - sum(m*x[i]))**2
    return total_cost
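
The cost here is the mean squared error. A loop-free sketch of the same quantity, under the assumption that x is a 2-D array whose columns line up with the weights in m (the name `cost_vectorized` and the demo arrays are illustrative):

```python
import numpy as np

def cost_vectorized(x, y, m):
    """Mean squared error between y and the linear predictions x @ m."""
    return np.mean((y - x @ m) ** 2)

# Tiny hand-made example: y = 2*x + 1, with a column of ones for the intercept
x_demo = np.array([[1.0, 1.0], [2.0, 1.0], [3.0, 1.0]])
y_demo = np.array([3.0, 5.0, 7.0])
exact = cost_vectorized(x_demo, y_demo, np.array([2.0, 1.0]))  # true weights
off = cost_vectorized(x_demo, y_demo, np.array([2.0, 0.0]))    # intercept missing
print(exact, off)
```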

In [61]:
def gradient_descent(x, y):
    learning_rate = 0.1
    num_iterations = 600
    # append a column of ones so that the last weight acts as the intercept
    x = np.append(x, np.ones(len(x)).reshape(-1, 1), axis = 1)
    m = gd(x, y, learning_rate, num_iterations)
    return m

In [62]:
# add the square of each column as extra features
square = []
for i in x:
    square.append(i**2)
# convert to a numpy array
np_square = np.array(square)
# append the squared columns to x (4 features become 8)
x = np.append(x, np_square, axis = 1)
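
Squaring works elementwise on NumPy arrays, so the row loop is not strictly needed. A minimal sketch with a hypothetical helper `add_squared_features`, which returns a new array instead of overwriting x, so running it twice does not square the squares again:

```python
import numpy as np

def add_squared_features(x):
    """Return x with the square of every column appended on the right."""
    x = np.asarray(x, dtype=float)
    return np.hstack([x, x ** 2])

# Two made-up rows with two features each
x_demo = np.array([[1.0, 2.0], [3.0, 4.0]])
x_aug = add_squared_features(x_demo)
print(x_aug)
```

The same helper can be applied to the test features so that both sets go through identical preprocessing.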

In [63]:
# apply feature scaling
scaler = preprocessing.StandardScaler()
scaler.fit(x)

StandardScaler()
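
With its default settings, StandardScaler subtracts each column's mean and divides by its population standard deviation, both learned in fit. The NumPy-only sketch below reproduces that arithmetic by hand; the helper names and demo arrays are illustrative assumptions.

```python
import numpy as np

def fit_standardizer(x_train):
    """Learn per-column mean and population standard deviation (ddof=0)."""
    return x_train.mean(axis=0), x_train.std(axis=0)

def apply_standardizer(x, mean, std):
    """Scale any data with statistics learned from the training data."""
    return (x - mean) / std

# Made-up numbers, not rows from the dataset
x_train_demo = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
x_test_demo = np.array([[2.0, 40.0]])

mean, std = fit_standardizer(x_train_demo)
train_scaled = apply_standardizer(x_train_demo, mean, std)
test_scaled = apply_standardizer(x_test_demo, mean, std)  # reuses training statistics
print(train_scaled.mean(axis=0), train_scaled.std(axis=0), test_scaled)
```

The key design point is that test data is scaled with the training mean and deviation, never with its own.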

In [64]:
x = scaler.transform(x)
m = gradient_descent(x, y)

0 Cost :  132210.3996461238
1 Cost :  84626.35732344464
2 Cost :  54172.64927958871
3 Cost :  34682.17114639684
4 Cost :  22208.1249040347
5 Cost :  14224.584203616907
6 Cost :  9114.967018832647
7 Cost :  5844.6656793817365
8 Cost :  3751.533342848077
9 Cost :  2411.796789631695
10 Cost :  1554.2412837659938
11 Cost :  1005.2892068329171
12 Cost :  653.8505450318584
13 Cost :  428.82728435359917
14 Cost :  284.71626776357976
15 Cost :  192.39504755884317
16 Cost :  133.22484351669308
17 Cost :  95.27644255979261
18 Cost :  70.91477748723037
19 Cost :  55.25305831190486
20 Cost :  45.16341613714744
21 Cost :  38.64371443572334
22 Cost :  34.41230714137092
23 Cost :  31.648682550861263
24 Cost :  29.82747409800969
25 Cost :  28.612225938177115
26 Cost :  27.787401656973778
27 Cost :  27.214868641307845
28 Cost :  26.806046945177044
29 Cost :  26.504083505401674
30 Cost :  26.272442653866783
31 Cost :  26.087603129534717
32 Cost :  25.93438331681687
33 Cost :  25.802948611401494
34 Cost 

272 Cost :  19.971580887168006
273 Cost :  19.963593573364896
274 Cost :  19.95566307303044
275 Cost :  19.947788978589738
276 Cost :  19.93997088540074
277 Cost :  19.93220839173322
278 Cost :  19.924501098748074
279 Cost :  19.916848610476197
280 Cost :  19.909250533797724
281 Cost :  19.9017064784217
282 Cost :  19.89421605686598
283 Cost :  19.886778884436033
284 Cost :  19.87939457920593
285 Cost :  19.87206276199754
286 Cost :  19.864783056361407
287 Cost :  19.85755508855697
288 Cost :  19.850378487532037
289 Cost :  19.843252884905407
290 Cost :  19.836177914945573
291 Cost :  19.829153214552708
292 Cost :  19.822178423239766
293 Cost :  19.81525318311342
294 Cost :  19.808377138855068
295 Cost :  19.801549937702735
296 Cost :  19.794771229432452
297 Cost :  19.788040666340184
298 Cost :  19.781357903223295
299 Cost :  19.77472259736273
300 Cost :  19.76813440850555
301 Cost :  19.761592998846147
302 Cost :  19.75509803300967
303 Cost :  19.74864917803364
304 Cost :  19.7422461
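
Toward the end of this log the cost drops by under 0.01 per iteration. One possible refinement, sketched here on synthetic data, is to stop once the improvement falls below a tolerance instead of always running a fixed number of iterations. The function `gd_with_tolerance` and the `tol` value are assumptions, not part of the notebook.

```python
import numpy as np

def gd_with_tolerance(X, y, learning_rate, max_iterations, tol):
    """Batch gradient descent that stops once the cost improves by less than tol."""
    M = len(X)
    w = np.zeros(X.shape[1])
    prev_cost = np.mean((y - X @ w) ** 2)
    cost = prev_cost
    for i in range(max_iterations):
        w = w - learning_rate * (-2.0 / M) * (X.T @ (y - X @ w))
        cost = np.mean((y - X @ w) ** 2)
        # Also triggers if the cost goes up, e.g. when the learning rate is too large
        if prev_cost - cost < tol:
            return w, cost, i + 1
        prev_cost = cost
    return w, cost, max_iterations

# Synthetic data with a bias column; the true weights are [1.5, -2.0, 3.0]
rng = np.random.default_rng(3)
X_demo = np.hstack([rng.standard_normal((100, 2)), np.ones((100, 1))])
y_demo = X_demo @ np.array([1.5, -2.0, 3.0]) + 0.1 * rng.standard_normal(100)

w, final_cost, n_used = gd_with_tolerance(X_demo, y_demo, 0.1, 600, 1e-6)
print(n_used, final_cost)
```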

# Load Testing Data

In [71]:
m

array([-1.91869168e+01, -4.58748889e+00,  3.32176926e-01, -2.04188785e+00,
        5.30364693e+00,  1.08448250e+00,  2.18627357e-01,  1.67194023e-01,
        4.54431293e+02])

In [68]:
test_data = np.loadtxt('test_ccpp_x_test.csv', delimiter = ',')
test_data.shape

(2392, 4)

In [70]:
# add the square of each column, exactly as done for the training data
square = []
for i in test_data:
    square.append(i**2)
# convert to a numpy array
np_square = np.array(square)
# append the squared columns to the test features
x_test = np.append(test_data, np_square, axis = 1)
# scale with the scaler that was fitted on the training data
x_test_scaled = scaler.transform(x_test)
# append a column of ones to match the intercept weight in m
x_test_data = np.append(x_test_scaled, np.ones(len(x_test_scaled)).reshape(-1, 1), axis = 1)

In [73]:
y_pred = []
for i in x_test_data:
    y_pred.append(sum(i*m))

for i in y_pred:
    print(i)

np_y_pred = np.array(y_pred)

470.2540760391173
471.8335593983464
433.9155642467158
456.52313015075725
464.0909027434784
447.21278641761063
479.6030909325336
445.4150588076512
486.3193023942314
439.4766774053633
435.0580445944447
432.5277552452792
472.2689142916145
462.74862479271735
443.66446834845476
455.5563300962266
490.03529766078066
446.2622226883395
428.40202819685345
438.84598232846446
439.2066592289636
484.91132677382126
458.5903849814583
475.96723642866283
432.3284201503194
434.93753072922215
467.5896602273649
470.3053289500299
434.1828905843963
477.171980916283
442.0714596690465
432.5144163729854
448.60530950462885
470.56259295690063
469.04750440332924
472.8456241269437
446.00380447553863
454.2799530435525
444.20826866927115
482.82025761685753
465.3675192840217
434.3012393184586
473.6976432951594
467.34024489116877
460.44251404259444
484.9110327799154
437.0883112604414
431.193655081158
439.61477277324934
476.47759277385927
473.4153919322867
465.274375108943
467.8680600255415
435.8788751050689
464.3669551

491.48033953098155
473.42601803634705
464.2782333560929
476.03176398423295
448.70477159948575
459.56600494872305
440.25268908268157
428.71494862065896
440.91062471600594
450.7656718619027
463.8600131280922
470.1356841613352
452.34384244097714
433.82186301435047
441.6707496369328
445.43466005272916
430.61494873022923
447.81000716519077
461.162865644152
483.5030703984773
438.661316703637
457.0217859727271
449.0590432397671
447.2098307551579
447.17812960470843
484.25137070097645
479.8556071777326
477.2274883298668
437.48128868866394
460.98003185228646
443.08036092749336
473.4323563545832
451.72073173483534
444.0933006309794
435.09389342453784
453.9278476263389
442.2373909877074
434.8551231083208
481.50588460680484
443.2169231392325
433.33657609908664
447.2161606610934
479.3530716013722
461.14175283874647
454.06637047903877
450.9179192639915
438.20336963664533
471.64324636330554
454.3866413907701
480.1693414288438
473.1384865447481
468.9638470132029
478.8523862214049
467.82287466965715
439

448.52464648278186
447.99619327990945
464.57831176270463
441.33806991131456
450.0118874720824
469.57780831850516
430.70634288592737
477.5755716194329
454.0076229414626
447.45941475888276
442.6740794808068
477.2864309140734
431.15293490690345
446.6220385971862
473.83934964063906
434.88174829433575
437.5165116051936
460.96825579971437
430.2166614572971
453.2983269158795
444.392664813654
441.5910260641185
434.4238209876225
474.4368033840108
457.9400935246423
469.5517446303855
437.8115786906147
463.2406041028862
441.89424777245193
445.21718420850135
436.6578610035039
445.5762173092549
431.01205061138205
467.95829672658675
466.27114748839057
481.9294251036028
448.01558512769765
448.29737363275564
434.30923449627943
469.362483173996
461.51006461043403
434.1767512317703
479.13824032030465
461.0996269530429
444.1777206744784
473.3298120960188
448.8799795607875
453.8207903489988
451.7583050239297
480.8461270375011
429.0503772700308
442.5052957516603
481.2656753286458
475.9172481163718
449.34269
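
Each prediction is the dot product of a feature row with the weight vector, so the whole set can be computed with a single matrix-vector product. A small sketch with made-up numbers (the arrays below are not the notebook's data):

```python
import numpy as np

# Two made-up rows; the last column is the constant 1 for the intercept
x_demo = np.array([[0.5, -1.0, 1.0], [2.0, 0.0, 1.0]])
m_demo = np.array([2.0, 3.0, 450.0])

loop_pred = np.array([sum(row * m_demo) for row in x_demo])  # row-by-row form
vec_pred = x_demo @ m_demo                                   # one matrix-vector product
print(vec_pred)
```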

In [74]:
np.savetxt('ccpp_x_test_predicted.csv', np_y_pred, delimiter = ',', fmt = '%.5f')
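
The `fmt = '%.5f'` argument is what keeps the file free of exponential notation. A quick sanity check, sketched with a temporary file and made-up values, including a very small and a very large number:

```python
import os
import tempfile
import numpy as np

# Made-up values chosen to provoke exponential notation under the default format
preds_demo = np.array([470.2540760391173, 0.00001234, 123456789.5])

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "preds_demo.csv")  # hypothetical file name
    np.savetxt(path, preds_demo, delimiter=",", fmt="%.5f")
    with open(path) as fh:
        lines = fh.read().splitlines()

print(lines)
```

Each line holds one fixed-point number with five decimals and no header, which matches the submission format described in the instructions.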