# Gradient descent (vanilla)

The main idea of this method is to find a minimum of a function by repeatedly stepping in the direction of the anti-gradient (the negative gradient).

The **gradient** of a function (in the usual sense) is a **vector** of its partial derivatives with respect to each argument. Each partial derivative in this vector shows how fast the function grows along that argument. The vector itself points in the direction in which the function grows fastest.
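
As a small illustration of this definition (the function $g$ here is a made-up example with two arguments, not part of the method itself), the gradient collects one partial derivative per argument:

$g(x_1, x_2) = x_1^2 + 3 x_2$

$\nabla g(x_1, x_2) = \left[ \frac{\partial g}{\partial x_1}, \frac{\partial g}{\partial x_2} \right] = [2 x_1, 3]$

At the point $(1, 0)$ this gives $\nabla g = [2, 3]$, so near that point $g$ grows fastest when moving along $[2, 3]$ and decreases fastest along $[-2, -3]$.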


Let's consider an example: $f(x) = x^2 + x - 6$

It has only one argument, so its gradient consists of only one component:

$\nabla f (x) = [2x + 1]$


In [24]:
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.animation import FuncAnimation
%matplotlib notebook

In [25]:

def f_x(x):
    return x ** 2 + x - 6
def grad_f(x):
    return 2*x + 1
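
One way to sanity-check a hand-derived gradient is to compare it with a finite-difference estimate. The sketch below repeats the two functions so that it runs on its own; `numerical_grad` and its step size `h` are illustrative names and values chosen here, not something defined elsewhere in this notebook.

```python
def f_x(x):
    return x ** 2 + x - 6

def grad_f(x):
    return 2 * x + 1

def numerical_grad(func, x, h=1e-5):
    # central difference: (f(x + h) - f(x - h)) / (2h)
    # h is an assumed step size, small but not so small that rounding dominates
    return (func(x + h) - func(x - h)) / (2 * h)

# compare the analytic and the numerical derivative at a few points
for x0 in [-3.0, -0.5, 2.0]:
    print(x0, grad_f(x0), numerical_grad(f_x, x0))
```

For a quadratic function the central difference agrees with the exact derivative up to floating-point rounding, so the two printed columns should match to many digits.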

In [26]:

fig, ax = plt.subplots(figsize=(5,5))
t = np.linspace(-7, 7, 100)
x = []
y = []
# function that draws each frame of the animation
def animate(i):
    x.append(t[i])
    y.append(f_x(t[i]))

    ax.clear()
    ax.set_xlim(-7,7)
    ax.set_ylim(-10, 60)
    ax.plot(x, y)
    ax.arrow(t[i],f_x(t[i]), grad_f(t[i]), 0, color='r', head_width=0.7, lw=2)


ani = FuncAnimation(fig, animate, frames=len(t), interval=50, repeat=False)
plt.show()


So if we keep following the anti-gradient, we arrive at a **local** minimum (not necessarily the global one).
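
To see why the word "local" matters, here is a minimal sketch on a function with two minima. The function `h`, the helper `descend`, the step size 0.01 and the number of steps are all assumptions made for this illustration. Each step simply subtracts a small multiple of the derivative from the current point.

```python
def h(x):
    # hypothetical non-convex function with two local minima
    return x ** 4 - 3 * x ** 2 + x

def grad_h(x):
    return 4 * x ** 3 - 6 * x + 1

def descend(x, alpha=0.01, steps=1000):
    # repeatedly move a small step against the derivative
    for _ in range(steps):
        x = x - alpha * grad_h(x)
    return x

left = descend(-2.0)   # start to the left of both minima
right = descend(2.0)   # start to the right of both minima
print(left, h(left))
print(right, h(right))
```

The two runs end in different minima, and the one reached from the right has the larger function value. Which minimum is found depends on the starting point.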

Let $w = [x_1, \dots, x_d]$ be the vector of arguments, and let $w_0$ be some initial point.

The step is:

$w_{i+1} = w_i - \alpha \nabla f(w_i)$


where $\alpha$ is a coefficient called the "learning rate".
There are many approaches to choosing this coefficient.
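
The update rule can be sketched for a general $d$-dimensional $w$ as follows. This is a minimal version with a fixed learning rate and a fixed number of steps. The names `gradient_descent` and `grad_g`, and the two-argument test function, are assumptions introduced for the example.

```python
import numpy as np

def gradient_descent(grad, w0, alpha=0.1, n_steps=100):
    # w_{i+1} = w_i - alpha * grad(w_i), repeated n_steps times
    w = np.asarray(w0, dtype=float)
    history = [w.copy()]
    for _ in range(n_steps):
        w = w - alpha * grad(w)
        history.append(w.copy())
    return w, np.array(history)

# hypothetical test function g(w) = (x_1 - 1)^2 + 2 * (x_2 + 3)^2,
# whose minimum is at [1, -3]
def grad_g(w):
    return np.array([2 * (w[0] - 1), 4 * (w[1] + 3)])

w_min, path = gradient_descent(grad_g, w0=[5.0, 5.0], alpha=0.1, n_steps=200)
print(w_min)
```

Keeping the whole `history` is a design choice that makes it easy to plot the path afterwards; for large problems one would normally keep only the current point.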

So let's try to reach the minimum of the function $f$ via gradient descent (GD).

We will set $\alpha$ to 0.05.


In [64]:
alpha = 0.05        # learning rate

# starting point; it lies outside the plotted x-range,
# so the first few steps are off-screen
x_history = [10]
y_history = []      # function values at the visited points
MAX_STEPS = 500

# grid used to draw the curve of f
t = np.linspace(-7, 7, 100)

fig, ax = plt.subplots(figsize=(5,5))
def animate(i):
    current_x = x_history[i]
    y_history.append(f_x(current_x))

    ax.clear()
    ax.set_xlim(-7,7)
    ax.set_ylim(-10, 60)
    ax.plot(t, f_x(t))
    ax.plot(x_history, y_history)
    # the red arrow shows the update -alpha * grad_f(x) taken at this step
    ax.arrow(current_x, f_x(current_x), - alpha*grad_f(current_x), 0, color='r', head_width=0.7, lw=2)
    # gradient descent step
    x_history.append(current_x - alpha * grad_f(current_x))

ax.plot(t, f_x(t))


ani = FuncAnimation(fig, animate, frames=MAX_STEPS, interval=100, repeat=False)
plt.show()


In [63]:
x_history[-1]

-0.49996948362512994
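
This value can be checked by hand. The short derivation below uses only the definitions already given; $x^*$ is a symbol introduced here for the exact minimum.

Setting the gradient to zero gives the exact minimum:

$\nabla f(x) = 2x + 1 = 0 \Rightarrow x^* = -\frac{1}{2}, \quad f(x^*) = -6.25$

Substituting the gradient into the step shows how the distance to the minimum changes:

$x_{i+1} + \frac{1}{2} = x_i - \alpha (2 x_i + 1) + \frac{1}{2} = (1 - 2\alpha) \left( x_i + \frac{1}{2} \right)$

With $\alpha = 0.05$ the factor is $0.9$, so each step removes 10% of the remaining distance, and starting from $x_0 = 10$ we get $x_i = -\frac{1}{2} + 10.5 \cdot 0.9^i$. For this particular function the iteration converges whenever $|1 - 2\alpha| < 1$, that is for $0 < \alpha < 1$; a larger $\alpha$ makes the steps overshoot and grow.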