# Gradient Descent

    1. What is Gradient Descent?
    
        Optimization is a big part of machine learning. Almost every machine learning algorithm has an
        optimization algorithm at its core.
        
        Gradient descent is an optimization algorithm used to find the values of the parameters (coefficients)
        of a function (f) that minimize a cost function (cost).
        
        Gradient descent is best used when the parameters cannot be calculated analytically (for example, with
        linear algebra) and must be searched for iteratively, or when an analytical solution exists but would be
        too expensive to compute.
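        As a minimal sketch of the idea, the snippet below minimizes the one-parameter cost f(x) = (x - 3)^2,
        whose minimum at x = 3 is known in advance so the result can be checked. The names `gradient_descent`
        and `grad_f` are illustrative, not part of any library.

```python
def gradient_descent(grad, x0, learning_rate, num_iterations):
    """Repeatedly step against the gradient, starting from x0."""
    x = x0
    for _ in range(num_iterations):
        x = x - learning_rate * grad(x)
    return x


def grad_f(x):
    # Derivative of the example cost f(x) = (x - 3)**2
    return 2 * (x - 3)


x_min = gradient_descent(grad_f, x0=0.0, learning_rate=0.1, num_iterations=100)
print(x_min)  # very close to 3
```

        Each step moves x a fraction of the way toward 3, so the printed value approaches the known minimum.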
        
    2. Importance of the Learning Rate:
    
        The learning rate controls how much the coefficients change on each iteration, so choosing a good
        learning rate is important.
        
        If the learning rate is too large, each update can step past the minimum, making the cost at the next
        iteration higher than at the previous one; if this keeps happening, the cost grows without bound
        (diverges). This is known as overshooting.
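        A hypothetical illustration of overshooting, assuming the cost f(x) = x^2: each gradient step multiplies
        x by (1 - 2 * learning_rate), so any learning rate above 1 makes |x|, and therefore the cost, grow on
        every iteration. `cost_history` is an illustrative helper.

```python
def cost_history(learning_rate, x0=1.0, num_iterations=5):
    """Return the cost f(x) = x**2 after each gradient step."""
    x = x0
    history = []
    for _ in range(num_iterations):
        x = x - learning_rate * 2 * x  # the derivative of x**2 is 2*x
        history.append(x ** 2)
    return history


print(cost_history(0.1))  # cost shrinks on every step
print(cost_history(1.5))  # [4.0, 16.0, 64.0, 256.0, 1024.0]: overshooting
```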

        The learning rate should be small enough that the cost decreases on every iteration, but not so small
        that reaching the minimum takes an impractical number of iterations. Typical values are small real
        numbers such as 0.1, 0.001 or 0.0001; try several values for your problem and see which works best.
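        To see both failure modes side by side, the sketch below counts how many steps gradient descent needs
        on the example cost f(x) = (x - 3)^2 for several learning rates, under an assumed budget of 10,000
        iterations. Too large a rate diverges, and too small a rate is still far from the minimum when the
        budget runs out. `steps_to_converge` is a hypothetical helper.

```python
def steps_to_converge(learning_rate, x0=0.0, tol=1e-6, max_iterations=10_000):
    """Count gradient steps on f(x) = (x - 3)**2 until |x - 3| < tol.

    Returns None if the run diverges or does not converge within max_iterations.
    """
    x = x0
    for step in range(1, max_iterations + 1):
        x = x - learning_rate * 2 * (x - 3)
        if abs(x - 3) < tol:
            return step
        if abs(x - 3) > 1e6:  # treat a huge error as divergence
            return None
    return None


for lr in [1.1, 0.1, 0.01, 0.001, 0.0001]:
    print(lr, steps_to_converge(lr))
```

        On this cost, 1.1 diverges, 0.1 converges fastest, and 0.0001 does not finish within the budget.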


# 1. Gradient descent with one feature

In [66]:
import numpy as np

In [67]:
data = np.loadtxt("data.csv", delimiter=",")
data.shape

(100, 2)

In [68]:
def step_gradient(data, learning_rate, m, c):
    # One batch gradient-descent step for the line y = m*x + c
    m_slope = 0
    c_slope = 0
    M = len(data)
    for i in range(M):
        x = data[i, 0]
        y = data[i, 1]
        # Accumulate the partial derivatives of the mean squared error w.r.t. m and c
        m_slope += (-2/M) * (y - m * x - c) * x
        c_slope += (-2/M) * (y - m * x - c)
    # Move each parameter against its gradient, scaled by the learning rate
    new_m = m - learning_rate * m_slope
    new_c = c - learning_rate * c_slope
    return new_m, new_c

def gd(data, learning_rate, num_iterations):
    # Start from m = c = 0 and take num_iterations gradient steps, printing the cost after each
    m = 0
    c = 0
    for i in range(num_iterations):
        m, c = step_gradient(data, learning_rate, m, c)
        print(i, " Cost: ", cost(data, m, c))
    return m, c

def cost(data, m, c):
    # Mean squared error of the line y = m*x + c over all rows
    total_cost = 0
    M = len(data)
    for i in range(M):
        x = data[i, 0]
        y = data[i, 1]
        total_cost += (1/M) * ((y - m*x - c)**2)
    return total_cost

def run():
    data = np.loadtxt("data.csv", delimiter=",")
    learning_rate = 0.0001
    num_iterations = 100
    m, c = gd(data, learning_rate, num_iterations)
    print(m, c)
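For reference, `cost` computes the mean squared error J of the line y = mx + c over the M rows of `data`, and `step_gradient` accumulates its partial derivatives before moving each parameter against its gradient, with α standing for `learning_rate`:

$$J(m, c) = \frac{1}{M}\sum_{i=1}^{M}\bigl(y_i - m x_i - c\bigr)^2$$

$$\frac{\partial J}{\partial m} = -\frac{2}{M}\sum_{i=1}^{M} x_i\bigl(y_i - m x_i - c\bigr), \qquad \frac{\partial J}{\partial c} = -\frac{2}{M}\sum_{i=1}^{M}\bigl(y_i - m x_i - c\bigr)$$

$$m \leftarrow m - \alpha\,\frac{\partial J}{\partial m}, \qquad c \leftarrow c - \alpha\,\frac{\partial J}{\partial c}$$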

In [69]:
run()

0  Cost:  1484.5865574086486
1  Cost:  457.8542575737672
2  Cost:  199.5099857255389
3  Cost:  134.50591058200533
4  Cost:  118.1496934223995
5  Cost:  114.0341490603815
6  Cost:  112.99857731713657
7  Cost:  112.73798187568467
8  Cost:  112.6723843590911
9  Cost:  112.65585181499745
10  Cost:  112.65166489759581
11  Cost:  112.6505843615011
12  Cost:  112.65028544701502
13  Cost:  112.65018320293967
14  Cost:  112.650130445072
15  Cost:  112.65009013922885
16  Cost:  112.6500529669463
17  Cost:  112.65001658353178
18  Cost:  112.64998039901865
19  Cost:  112.64994426496071
20  Cost:  112.64990814400622
21  Cost:  112.64987202675677
22  Cost:  112.64983591084761
23  Cost:  112.64979979568368
24  Cost:  112.64976368111523
25  Cost:  112.64972756710469
26  Cost:  112.64969145364236
27  Cost:  112.64965534072611
28  Cost:  112.64961922835512
29  Cost:  112.64958311652944
30  Cost:  112.64954700524868
31  Cost:  112.64951089451318
32  Cost:  112.64947478432279
33  Cost:  112.64943867467744
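The loop in `step_gradient` visits the rows one at a time in Python. Below is a minimal sketch of the same update written with NumPy array operations. Because `data.csv` is not reproduced here, it uses synthetic data (an assumption: roughly y = 1.5x + 4 plus noise) and checks that both versions return the same (m, c) after one step. The function names are hypothetical.

```python
import numpy as np


def step_gradient_loop(data, learning_rate, m, c):
    # Same update as step_gradient: one Python-level pass over the rows
    M = len(data)
    m_slope = c_slope = 0.0
    for x, y in data:
        m_slope += (-2 / M) * (y - m * x - c) * x
        c_slope += (-2 / M) * (y - m * x - c)
    return m - learning_rate * m_slope, c - learning_rate * c_slope


def step_gradient_vectorized(data, learning_rate, m, c):
    # Same update using whole-column NumPy operations
    x, y = data[:, 0], data[:, 1]
    residual = y - (m * x + c)
    m_slope = -2 * np.mean(residual * x)
    c_slope = -2 * np.mean(residual)
    return m - learning_rate * m_slope, c - learning_rate * c_slope


# Synthetic stand-in for data.csv (hypothetical)
rng = np.random.default_rng(0)
x = rng.uniform(0, 100, size=100)
data = np.column_stack([x, 1.5 * x + 4 + rng.normal(0, 10, size=100)])

loop_result = step_gradient_loop(data, 0.0001, 0.0, 0.0)
vec_result = step_gradient_vectorized(data, 0.0001, 0.0, 0.0)
print(np.allclose(loop_result, vec_result))  # True
```

`np.mean` divides by M, so `-2 * np.mean(...)` matches the `(-2/M)` sums in the loop.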

# 2. Gradient descent with N features

In [1]:
import numpy as np
import pandas as pd
from sklearn import preprocessing
train_data=np.genfromtxt('ccpp_x_y_train.csv',delimiter=',')
test_data=np.genfromtxt('ccpp_x_test.csv',delimiter=',')

In [2]:
train_data.shape

(7176, 5)

In [3]:
test_data.shape

(2392, 4)

In [4]:
x_train=train_data[:,0:4]
x_train=preprocessing.scale(x_train)
y_train=train_data[:,4]

In [5]:
# Append a constant column of 1s so the last coefficient acts as the intercept
x_train=pd.DataFrame(x_train)
x_train[4]=1
x_train=np.array(x_train)
x_train.shape

(7176, 5)

In [6]:
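# Scale the test features with the TRAINING mean and standard deviation so both sets share one scale
test_data=(test_data-train_data[:,0:4].mean(axis=0))/train_data[:,0:4].std(axis=0)
test_data=pd.DataFrame(test_data)
test_data[4]=1
test_data=np.array(test_data)
test_data.shape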

(2392, 5)
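Here is a compact, NumPy-only sketch of the same preprocessing, assuming the feature matrices are already split into training and test arrays. Both sets are scaled with the training mean and standard deviation, and a column of 1s is appended with `np.hstack` instead of a pandas round trip. `standardize_with_bias` and the random arrays are hypothetical stand-ins for the CCPP files.

```python
import numpy as np


def standardize_with_bias(x_train, x_test):
    """Scale both sets with the TRAINING mean/std, then append a bias column of 1s."""
    mean = x_train.mean(axis=0)
    std = x_train.std(axis=0)  # population std (ddof=0), as in sklearn's StandardScaler

    def add_ones(a):
        # Constant column so the last coefficient acts as the intercept
        return np.hstack([a, np.ones((a.shape[0], 1))])

    return add_ones((x_train - mean) / std), add_ones((x_test - mean) / std)


# Hypothetical stand-ins for the 4 CCPP feature columns
rng = np.random.default_rng(1)
x_tr = rng.normal(50, 10, size=(200, 4))
x_te = rng.normal(50, 10, size=(80, 4))
tr, te = standardize_with_bias(x_tr, x_te)
print(tr.shape, te.shape)  # (200, 5) (80, 5)
```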

In [7]:
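def single_gradient(x_train, y_train, learning_rate, m):
    # One batch gradient-descent step for the linear model y ≈ x·m (the last column of x is the bias)
    m_slope = [0 for i in range(x_train.shape[1])]
    M = x_train.shape[0]
    N = x_train.shape[1]
    for j in range(N):
        for i in range(M):
            x = x_train[i]
            y = y_train[i]
            a = np.dot(m, x)  # current prediction for row i
            # Partial derivative of the mean squared error w.r.t. coefficient j
            m_slope[j] += (-2/M) * (y - a) * x[j]
    m_slope = np.array(m_slope)
    new_m = m - m_slope * learning_rate
    return new_m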
            
def gd(x_train,y_train, learning_rate, num_iterations):
    m = [0 for i in range(x_train.shape[1])]
    for i in range(num_iterations):
        m = single_gradient(x_train,y_train,learning_rate,m)
        #print(i, " Cost: ", cost(x_train,y_train,m))
    return m

def cost(x_train,y_train, m):
    total_cost = 0
    M = x_train.shape[0]
    for i in range(M):
        x = x_train[i]
        y = y_train[i]
        a = np.dot(m,x)
        total_cost += (1/M)*((y - a)**2)
    return total_cost

def run():
    learning_rate = 0.4
    num_iterations = 125
    m = gd(x_train,y_train, learning_rate, num_iterations)
    return m
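`single_gradient` recomputes the prediction `np.dot(m, x)` once per coefficient, so each step makes N passes over the data. Stacking the per-row terms gives the matrix form of the gradient, (-2/M) X^T (y - Xm), which can be computed in one pass. The sketch below uses hypothetical synthetic data with a bias column. It checks one vectorized step against a row-by-row version, then compares the result of 125 steps at learning rate 0.4 (the values used in `run`) with NumPy's closed-form least-squares solution. All function names here are illustrative.

```python
import numpy as np


def single_gradient_loop(x_train, y_train, learning_rate, m):
    # Row-by-row reference version of the update
    M, N = x_train.shape
    m_slope = np.zeros(N)
    for i in range(M):
        m_slope += (-2 / M) * (y_train[i] - np.dot(m, x_train[i])) * x_train[i]
    return m - learning_rate * m_slope


def single_gradient_vectorized(x_train, y_train, learning_rate, m):
    # Residuals for every row at once: y - X @ m
    residual = y_train - x_train @ m
    # Gradient of the mean squared error in matrix form: (-2/M) * X^T @ residual
    m_slope = (-2 / x_train.shape[0]) * (x_train.T @ residual)
    return m - learning_rate * m_slope


def gd_vectorized(x_train, y_train, learning_rate, num_iterations):
    m = np.zeros(x_train.shape[1])
    for _ in range(num_iterations):
        m = single_gradient_vectorized(x_train, y_train, learning_rate, m)
    return m


# Hypothetical data: 4 standardized features plus a bias column of 1s
rng = np.random.default_rng(2)
X = np.hstack([rng.normal(size=(300, 4)), np.ones((300, 1))])
y = X @ np.array([2.0, -1.0, 0.5, 3.0, 10.0]) + rng.normal(0, 1, size=300)

m0 = np.zeros(5)
same_step = np.allclose(single_gradient_loop(X, y, 0.4, m0),
                        single_gradient_vectorized(X, y, 0.4, m0))
m_fit = gd_vectorized(X, y, 0.4, 125)
m_exact = np.linalg.lstsq(X, y, rcond=None)[0]  # closed-form least-squares solution
print(same_step, np.allclose(m_fit, m_exact))
```

With standardized features the cost is well conditioned, so 125 steps are enough for this synthetic example to match the closed-form answer.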

In [8]:
run()

array([-1.49253583e+01, -2.91522033e+00,  3.64818800e-01, -2.32802951e+00,
        4.54431293e+02])

In [9]:
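def predict(x_test, m=None):
    # Train the coefficients only if none are passed in (run() repeats the full gradient descent)
    if m is None:
        m = run()
    y_predict = []
    for i in range(len(x_test)):
        # Prediction = dot product of the row (including its bias 1) with the coefficients
        a = sum(x_test[i] * m)
        y_predict.append(a)
    return y_predict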

In [10]:
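def score(y_true, y_predict):
    # Coefficient of determination R^2 = 1 - SS_res / SS_tot
    u = sum((y_true - y_predict)**2)        # residual sum of squares
    v = sum((y_true - y_true.mean())**2)    # total sum of squares
    return 1 - u/v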
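`score` computes the coefficient of determination R^2. A value of 1 means the predictions match the targets exactly, 0 means they are no better than always predicting the mean of `y_true`, and negative values mean they are worse. Below is a small sanity check on hand-picked values, using a hypothetical `r2_score_manual` that follows the same formula (scikit-learn's `sklearn.metrics.r2_score` computes the same quantity).

```python
import numpy as np


def r2_score_manual(y_true, y_predict):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_predict = np.asarray(y_predict, dtype=float)
    ss_res = np.sum((y_true - y_predict) ** 2)      # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares
    return 1 - ss_res / ss_tot


y = np.array([1.0, 2.0, 3.0, 4.0])
print(r2_score_manual(y, y))                       # 1.0: perfect predictions
print(r2_score_manual(y, np.full(4, y.mean())))    # 0.0: always predicting the mean
print(r2_score_manual(y, [4.0, 3.0, 2.0, 1.0]))    # -3.0: worse than the mean
```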

In [11]:
y_predict=predict(x_train)
y_predict

[478.8411929804532,
 450.5051681999032,
 460.65060214877195,
 428.8707406535999,
 475.69464743137587,
 440.2271388420766,
 477.51949564420295,
 476.713616893481,
 429.3159790427993,
 454.2465180077911,
 458.3839793948944,
 467.24402889517216,
 469.7810633284716,
 487.2852760198615,
 466.6975180813703,
 431.17202051173143,
 461.8122798023015,
 444.3218509690722,
 453.36355727205006,
 437.3339045956542,
 439.0449174597304,
 466.26722950095524,
 473.38496457147073,
 440.00341238773103,
 463.508771024802,
 446.27209046594686,
 432.2720207787327,
 442.75182992496275,
 480.9661089125511,
 473.06565953109475,
 439.4787778799652,
 439.69622343198705,
 447.1317423370078,
 477.24798977831944,
 442.23460434758016,
 476.9694941693071,
 428.51461377997464,
 448.93772615846035,
 452.3631516332362,
 460.25905515662953,
 473.7185695953103,
 443.8500399199295,
 461.5510646020667,
 443.31980778498536,
 467.25948625545294,
 483.8990257650675,
 441.4855933227399,
 460.07906423229116,
 430.50753498672776,


In [12]:
# R^2 on the same rows used for training (ccpp_x_test.csv has no target column to score against)
y_true=y_train
score(y_true,y_predict)

0.928751691014062
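The 0.9288 above is measured on the same rows the coefficients were fitted to, and `ccpp_x_test.csv` has no target column to score against. One common option is to hold out part of the labelled rows, fit on the rest, and report R^2 on the held-out part. This is sketched below on hypothetical synthetic data. With the real data, the scaling statistics should also come from the fitting rows only. The 75/25 split and the helper names are assumptions.

```python
import numpy as np


def fit_gd(X, y, learning_rate=0.4, num_iterations=125):
    # Vectorized batch gradient descent on the mean squared error
    m = np.zeros(X.shape[1])
    for _ in range(num_iterations):
        m -= learning_rate * (-2 / len(X)) * (X.T @ (y - X @ m))
    return m


def r2(y_true, y_pred):
    # Coefficient of determination, same formula as score()
    return 1 - np.sum((y_true - y_pred) ** 2) / np.sum((y_true - y_true.mean()) ** 2)


# Hypothetical labelled data: 4 standardized features plus a bias column
rng = np.random.default_rng(3)
X = np.hstack([rng.normal(size=(1000, 4)), np.ones((1000, 1))])
y = X @ np.array([2.0, -1.0, 0.5, 3.0, 10.0]) + rng.normal(0, 1, size=1000)

# Shuffle, keep 75% of rows for fitting and 25% for validation
idx = rng.permutation(len(X))
fit_idx, val_idx = idx[:750], idx[750:]
m = fit_gd(X[fit_idx], y[fit_idx])
train_r2 = r2(y[fit_idx], X[fit_idx] @ m)
val_r2 = r2(y[val_idx], X[val_idx] @ m)
print(f"fit R^2: {train_r2:.3f}  validation R^2: {val_r2:.3f}")
```

A validation R^2 close to the fitting R^2 suggests the model is not just memorizing the rows it was trained on.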