# Gradient Descent<hr>

Neural networks need to find optimal parameters (weights and biases) during the learning phase.

Gradient descent is a method that uses the gradient to find the minimum of a function (here, the cost or loss function).

### Here is the formula
![gradient_descent_fomula](./images/gradient_descent_fomula.PNG)
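
In case the image does not render: assuming it shows the usual gradient descent update (which is what the `x -= lr * grad` line in the implementation in this section computes), the rule can be written in vector form as

\\[ x \leftarrow x - \eta \nabla f(x) \\]

Here \\(x\\) is the parameter vector and \\(\nabla f(x)\\) is the gradient of \\(f\\) at \\(x\\). The notation is a sketch chosen for this note and may differ from the symbols used in the image.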

This is a single update step; it is repeated many times to gradually reduce the value of the function.
<br>\\(\eta\\) is the learning rate in neural networks; it controls how far each update moves.<br>
It can be implemented as follows:


```python
import numpy as np


def numerical_gradient(f, x):
    """Approximate the gradient of f at x (a 1-D float array) with central differences."""
    h = 1e-4
    grad = np.zeros_like(x)

    for idx in range(x.size):
        # save the original value
        tmp_val = x[idx]

        # calculate f(x+h)
        x[idx] = tmp_val + h
        fxh1 = f(x)

        # calculate f(x-h)
        x[idx] = tmp_val - h
        fxh2 = f(x)

        grad[idx] = (fxh1 - fxh2) / (2*h)

        # restore the original value
        x[idx] = tmp_val

    return grad


def gradient_descent(f, init_x, lr=0.01, step_num=100):
    # work on a copy so the caller's array is not modified in place
    x = init_x.copy()

    for i in range(step_num):
        grad = numerical_gradient(f, x)
        x -= lr * grad
        # report progress every 10 steps
        if i % 10 == 0:
            print('step: ', i, 'x: ', x)
    return x
```

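*numerical_gradient* approximates the gradient of the function f at the point x using central differences.<br>
In *gradient_descent*, f is the function you want to minimize, init_x is the starting point, lr is the learning rate, and step_num is the number of update steps.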
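
As a quick sanity check of the central-difference idea, the sketch below compares the numerical gradient with a gradient worked out by hand. It repeats *numerical_gradient* so that it runs on its own; `sum_of_squares`, `point`, `numeric`, and `analytic` are names made up for this illustration and are not part of the original notebook.

```python
import numpy as np


def numerical_gradient(f, x):
    """Central-difference gradient, same scheme as the function above."""
    h = 1e-4
    grad = np.zeros_like(x)
    for idx in range(x.size):
        tmp_val = x[idx]
        x[idx] = tmp_val + h
        fxh1 = f(x)
        x[idx] = tmp_val - h
        fxh2 = f(x)
        grad[idx] = (fxh1 - fxh2) / (2 * h)
        x[idx] = tmp_val  # restore the original value
    return grad


def sum_of_squares(x):
    # hypothetical test function: f(x) = x_0^2 + x_1^2 + ...
    return np.sum(x ** 2)


point = np.array([-3.0, 4.0])
numeric = numerical_gradient(sum_of_squares, point)
analytic = 2 * point  # gradient of sum(x_i^2) derived by hand

print('numeric: ', numeric)
print('analytic:', analytic)
print('close:   ', np.allclose(numeric, analytic))
```

For a quadratic function like this one the central difference is exact up to floating-point rounding, so the two gradients should agree closely. For other functions the agreement depends on the step size h.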

### Example 1

```python
import numpy as np

def function_2(x):
    return x[0]**2 + x[1]**2

init_x = np.array([-3.0, 4.0])
gradient_descent(function_2, init_x=init_x, lr=0.1, step_num=100)
```

```
step:  0 x:  [-2.4  3.2]
step:  10 x:  [-0.25769804  0.34359738]
step:  20 x:  [-0.02767012  0.03689349]
step:  30 x:  [-0.00297106  0.00396141]
step:  40 x:  [-0.00031901  0.00042535]
step:  50 x:  [-3.42539446e-05  4.56719262e-05]
step:  60 x:  [-3.67798930e-06  4.90398573e-06]
step:  70 x:  [-3.94921094e-07  5.26561458e-07]
step:  80 x:  [-4.24043296e-08  5.65391061e-08]
step:  90 x:  [-4.55313022e-09  6.07084029e-09]

array([-6.11110793e-10,  8.14814391e-10])
```

Both components of the result are almost zero.<br>
The function takes its minimum value at (0, 0), so gradient descent has found the correct result.
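
The result in Example 1 depends on the choice of lr. The sketch below illustrates this on the same function, under two assumptions that differ from the code above: it uses the hand-derived gradient 2x instead of *numerical_gradient* (to keep the arithmetic exact), and the helper names `gradient_descent_analytic` and `grad_sum_of_squares` are hypothetical.

```python
import numpy as np


def gradient_descent_analytic(grad_f, init_x, lr, step_num=100):
    """Gradient descent using a user-supplied gradient function (hypothetical helper)."""
    x = np.array(init_x, dtype=float)  # copy, so the input is left untouched
    for _ in range(step_num):
        x -= lr * grad_f(x)
    return x


def grad_sum_of_squares(x):
    # hand-derived gradient of f(x) = x_0^2 + x_1^2
    return 2 * x


start = [-3.0, 4.0]
too_small = gradient_descent_analytic(grad_sum_of_squares, start, lr=1e-10)
reasonable = gradient_descent_analytic(grad_sum_of_squares, start, lr=0.1)
too_large = gradient_descent_analytic(grad_sum_of_squares, start, lr=10.0)

print('lr=1e-10:', too_small)
print('lr=0.1:  ', reasonable)
print('lr=10.0: ', too_large)
```

With a very small lr the point barely moves from its starting position in 100 steps, with lr=0.1 it approaches the minimum at (0, 0), and with lr=10.0 each step overshoots so the values grow instead of shrinking. Which learning rates count as too small or too large depends on the function being minimized.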