In [1]:
import matplotlib.pyplot as plt
import numpy as np
from matplotlib import animation

# Gradient Descent intuition

## Preparing with helper functions
To visualize gradient descent, we need three things: a cost function $J(w)$, a way to draw the derivative of the cost function at a given point as the tangent line through that point, and a way to animate the whole process.

In [13]:
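def j(w):
    # Toy cost function J(w) = w^2 - 10w + 30, with its minimum at w = 5
    return w**2 - 10*w + 30

def derivative(f, x):
    # Forward-difference approximation of f'(x) using a small step h
    h = 0.001
    return (f(x + h) - f(x)) / h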

def draw_cost_function(ax):
    x = np.linspace(0, 10, 1000)
    y = j(x)
    ax.set_ylim([0, 33])
    ax.plot(x, y, 'gray')
    ax.set_ylabel("J(w) - Error function")
    ax.set_xlabel("w")
    
def draw_tangent_line(ax, w):
    x = np.linspace(w - 1, w + 1, 100)
    y = derivative(j, w) * (x - w) + j(w)
    ax.plot(x, y, 'g')
    
def draw_annotations(ax, w, min_w):
    ax.plot(min_w, j(min_w), 'ko')
    ax.plot(w, j(w), 'ro')
    ax.annotate('we are here', xy=(w, j(w)), xytext=(5, 25),
                arrowprops=dict(facecolor='black', shrink=0.001))
    ax.annotate('minimum error', xy=(min_w, j(min_w)), xytext=(1, 1),
                arrowprops=dict(facecolor='black', shrink=0.001))
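
As a quick sanity check on the tangent-line helper, the forward-difference estimate can be compared with the derivative of $J(w)$ worked out by hand. This is a minimal sketch: `exact_derivative` is a hypothetical helper that is not part of the notebook, and the optional `h` argument is added here only for illustration.

```python
def j(w):
    # Same toy cost function as the helper cell: J(w) = w^2 - 10w + 30
    return w**2 - 10*w + 30

def derivative(f, x, h=0.001):
    # Forward-difference approximation, as in the helper cell
    return (f(x + h) - f(x)) / h

def exact_derivative(w):
    # Hand-derived dJ/dw = 2w - 10 (hypothetical helper, for comparison only)
    return 2*w - 10

# The slope is positive to the right of the minimum and close to zero at w = 5
for w in (9, 7, 5):
    print(w, derivative(j, w), exact_derivative(w))
```

For a quadratic cost the forward difference overshoots the exact slope by roughly `h`, which is small enough not to be visible in the plot.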

In [14]:
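start_w = 9      # initial weight
minimum_w = 5    # weight that minimizes j(w)
step = 0.5       # fixed distance moved per frame (for illustration only)

def animate(i):
    # Redraw frame i: clear the axes, then place w one fixed step closer to the minimum
    ax.cla()

    current_w = start_w - step*i

    draw_cost_function(ax)
    draw_tangent_line(ax, current_w)
    draw_annotations(ax, current_w, minimum_w)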

In [15]:
%%capture
plt.rcParams["animation.html"] = "jshtml"
plt.ioff()

fig, ax = plt.subplots(figsize=(6, 6))

final_animation = animation.FuncAnimation(fig,
                                          animate,
                                          frames=int((start_w-minimum_w)/step)+1,
                                          interval=800,
                                          repeat=True)

In [18]:
final_animation.save('test.gif', writer='pillow', fps=2)


## Seeing the final animation in action

In [16]:
final_animation

# Adaline: The math behind the training formula


The training process consists of updating the connection weights by a $\Delta$ that improves the network's predictions:

$
w = w + \Delta
$

&nbsp;

Using gradient descent, we compute $\Delta$ from:
- A cost function that expresses how large the network's error is for a given weight value;
- A learning rate parameter $\mu$ that determines the step size applied to each weight update.

$
w = w - \mu \cdot \frac{\partial}{\partial w}J(w)
$
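
The update rule above can be sketched on the toy cost function from the first part of the notebook. This is an illustrative sketch only: `dj_dw` and `gradient_descent` are hypothetical names, the derivative $2w - 10$ is worked out by hand, and the learning rate of 0.1 is an arbitrary choice.

```python
def j(w):
    # Toy cost function from the intuition section: J(w) = w^2 - 10w + 30
    return w**2 - 10*w + 30

def dj_dw(w):
    # Hand-derived derivative of J(w)
    return 2*w - 10

def gradient_descent(start_w, mu, n_steps):
    # Repeatedly apply w = w - mu * dJ/dw and record every visited weight
    w = start_w
    history = [w]
    for _ in range(n_steps):
        w = w - mu * dj_dw(w)
        history.append(w)
    return history

history = gradient_descent(start_w=9, mu=0.1, n_steps=50)
print(history[:3], history[-1])
```

Unlike the fixed `step` used in the animation, the step here shrinks as the slope flattens, so the weight approaches the minimum at $w = 5$ without needing to know its location in advance.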

&nbsp;

In the context of Adaline we can use the Mean Squared Error (MSE) as the cost/error function, where $i$ identifies a sample from the training dataset, $y_i$ is the network's output for that sample, and $t_i$ is its target value:

$J(w) = MSE = \frac{1}{n}\sum_{i=1}^{n}(y_i-t_i)^2$

$
w = w - \mu \cdot \frac{\partial}{\partial w}\frac{1}{n}\sum_{i} (y_i - t_i)^2
$

$
\Leftarrow\kern-4pt\Rightarrow w = w - \mu \cdot \frac{1}{n}\sum_{i} \frac{\partial}{\partial w} (y_i - t_i)^2
$

$
\Leftarrow\kern-4pt\Rightarrow w = w - \mu \cdot \frac{1}{n}\sum_{i} 2(y_i-t_i)\frac{\partial}{\partial w} (y_i - t_i)
$

The output is $y_i = f(wx_i + b)$, while the target $t_i$ does not depend on $w$:

$
\Leftarrow\kern-4pt\Rightarrow w = w - \mu \cdot \frac{1}{n}\sum_{i} 2(y_i-t_i)\frac{\partial}{\partial w} (f(wx_i + b) - t_i)
$

Adaline's activation function $f(z)$ is the linear function, meaning $f(z)=z$:

$
\Leftarrow\kern-4pt\Rightarrow w = w - \mu \cdot \frac{1}{n}\sum_{i} 2(y_i-t_i)\frac{\partial}{\partial w} (wx_i + b - t_i)
$

$
\Leftarrow\kern-4pt\Rightarrow w = w - \mu \cdot \frac{1}{n}\sum_{i} 2(y_i-t_i) \cdot x_i
$
$
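
The gradient obtained at the end of this derivation can be checked numerically. The sketch below takes $y$ as the network's output and $t$ as the target, and compares the derived expression with a central-difference estimate of the MSE slope. The data values and the function names `mse` and `mse_gradient_w` are hypothetical, chosen only for illustration.

```python
import numpy as np

def mse(w, b, x, t):
    # Mean squared error of the linear output y = w*x + b against targets t
    y = w * x + b
    return np.mean((y - t) ** 2)

def mse_gradient_w(w, b, x, t):
    # Derived gradient with respect to w: (1/n) * sum of 2 * (y - t) * x
    y = w * x + b
    return np.mean(2 * (y - t) * x)

# Hypothetical toy dataset
x = np.array([0.0, 1.0, 2.0, 3.0])
t = np.array([1.0, 3.0, 5.0, 7.0])

w, b, h = 0.5, 0.0, 1e-6
numeric = (mse(w + h, b, x, t) - mse(w - h, b, x, t)) / (2 * h)
analytic = mse_gradient_w(w, b, x, t)
print(numeric, analytic)
```

The two values should agree up to floating-point error, which is a cheap way to catch sign mistakes in a hand derivation.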

&nbsp;

Final formula (for a single weight, applied once per sample of the dataset, so the average over samples is dropped):

$
w = w - \mu \cdot 2(y-t) \cdot x
$
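
The per-sample rule can be sketched as a small training loop for a single weight and a bias. This is a minimal sketch under stated assumptions, not a reference implementation: `train_adaline` is a hypothetical name, the dataset is made up, and the bias update is an assumption obtained by repeating the same derivation with respect to $b$ (the text above derives only the weight update).

```python
import numpy as np

def train_adaline(x, t, mu=0.01, epochs=500):
    # Start from zero and apply the per-sample update for every sample, every epoch
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for x_i, t_i in zip(x, t):
            y_i = w * x_i + b                 # linear activation: f(z) = z
            error = y_i - t_i                 # output minus target
            w = w - mu * 2 * error * x_i      # final formula from the derivation
            b = b - mu * 2 * error            # assumed bias update (not derived in the text)
    return w, b

# Hypothetical noise-free dataset generated by t = 2x + 1
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
t = 2.0 * x + 1.0

w, b = train_adaline(x, t)
print(w, b, np.mean((w * x + b - t) ** 2))
```

With these settings the learned weight and bias should end up close to the values used to generate the data. A larger $\mu$ speeds up training but can make the updates diverge, especially when the inputs have large magnitudes.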