<a href="https://colab.research.google.com/github/adammoss/MLiS2/blob/master/intro/backprop.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

Let us again attempt to learn the XOR function using the same MLP network, this time starting with random initial weights and using backpropagation with simple gradient descent.
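
For reference, simple gradient descent moves each weight a small step against the gradient of the loss. Assuming the mean squared error over the four XOR examples that the code below accumulates, and writing $\eta$ for the learning rate (`learning_rate` in the code) and $n$ for the example index (both notational choices made here), the loss and the update rule are
\begin{eqnarray}
J &=& {1 \over 4} \sum_{n=1}^{4} \left( y_n - \hat{y}_n \right)^2 \,, \\
W_{ij}^{(l)} &\leftarrow& W_{ij}^{(l)} - \eta \, {\partial J \over \partial W_{ij}^{(l)}} \,.
\end{eqnarray}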

The error terms for each neuron are
\begin{eqnarray}
\Delta_1^{(2)} &=& {\partial{J} \over \partial a_1^{(2)}}\,, \\
\Delta_1^{(1)} &=&  \Delta_1^{(2)} W_{11}^{(2)}   \Theta ( z_1^{(1)} )  \,, \\
\Delta_2^{(1)} &=&  \Delta_1^{(2)} W_{21}^{(2)}  \Theta ( z_2^{(1)} )  \,,
\end{eqnarray}
since the Heaviside step function $\Theta$ is the derivative of the ReLU activation function.
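
As a minimal sketch of how these error terms turn into weight gradients, the snippet below evaluates them for a single input. It assumes a linear output unit and a per-example loss contribution of $\frac{1}{4}(y - \hat{y})^2$; the weight values and the names `W1`, `W2`, `forward`, `grad_W1` and `grad_W2` are illustrative and not part of the notebook.

```python
import numpy as np

# Illustrative (assumed) weights. Rows of W1 are [bias, x1, x2]; columns are hidden units.
W1 = np.array([[0.1, -0.9], [1.0, 1.1], [0.9, 1.0]])
# Output weights: [bias, hidden unit 1, hidden unit 2]
W2 = np.array([0.1, 0.9, -1.8])

def forward(x, W1, W2):
    """Return hidden pre-activations, ReLU activations and the linear output."""
    z1 = x @ W1
    a1 = np.maximum(z1, 0.0)
    yhat = W2[0] + a1 @ W2[1:]
    return z1, a1, yhat

x = np.array([1.0, 1.0, 1.0])  # the leading 1 is the bias input
y = 0.0                        # XOR target for (1, 1)

z1, a1, yhat = forward(x, W1, W2)

# Output error: dJ/da^(2) for J = 0.25 * (y - yhat)**2
delta2 = 0.5 * (yhat - y)
# Hidden errors: Delta^(1)_j = Delta^(2) * W^(2)_j * Theta(z^(1)_j)
delta1 = delta2 * W2[1:] * np.heaviside(z1, 0.0)

# Gradients: each error term times the activation feeding that weight
grad_W2 = delta2 * np.hstack((1.0, a1))
grad_W1 = np.outer(x, delta1)

print(grad_W2)
print(grad_W1)
```

Each gradient is the error term of the receiving neuron multiplied by the activation on the sending side, which is why the bias entries simply equal the error terms themselves.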

In [0]:
import numpy as np
import matplotlib.pyplot as plt
from matplotlib import cm

In [0]:
num_epochs = 100
learning_rate = 0.01

In [0]:
X = np.array([[0,0], [0,1], [1,0], [1,1]])
X = np.hstack((np.ones(shape=(X.shape[0], 1)), X))
print(X)

In [0]:
Y = np.array([[0], [1], [1], [0]])
print(Y)

In [0]:
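np.random.seed(0)
# Random initial hidden-layer weights, drawn uniformly from [-0.5, 0.5)
W = np.random.random(size=(3,2)) - 0.5
# Overwritten here with fixed hidden weights, so only the output weights are trained below
# (np.float was removed in NumPy 1.24; the built-in float is equivalent)
W = np.array([[0, -1], [1,1], [1,1]], dtype=float)
print(W)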

In [0]:
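# Output weights: start from an exact XOR solution for the fixed W above ...
w = np.array([[0], [1], [-2]], dtype=float)
# ... then perturb each weight by a random amount in [-0.25, 0)
w += 0.5 * (0.5 * np.random.random(size=(3,1)) - 0.5)
#w[2] = -1.0
print(w)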

In [0]:
for epoch in range(num_epochs):
  loss = 0.0
  # Gradients of the loss with respect to the output weights w,
  # accumulated over the four training examples
  grad_w0 = 0.0
  grad_w1 = 0.0
  grad_w2 = 0.0
  for i in range(4):
    x = X[i, :]
    y = Y[i, 0]
    # Forward pass: hidden pre-activations and ReLU activations
    z_1 = np.matmul(x, W)
    a_1 = np.maximum(z_1, 0)
    # Prepend the bias unit for the output layer
    a_1 = np.hstack((1, a_1))
    yhat = np.matmul(a_1, w)[0]
    loss += 0.25 * (y - yhat)**2
    # Output error: dJ/dyhat for J = 0.25 * sum((y - yhat)**2)
    delta = 0.5 * (yhat - y)
    # dJ/dw_k = delta * a_k, since yhat is linear in w
    grad_w0 += delta * a_1[0]
    grad_w1 += delta * a_1[1]
    grad_w2 += delta * a_1[2]
  # Batch gradient-descent step on the output weights (W stays fixed)
  w[0, 0] -= learning_rate * grad_w0
  w[1, 0] -= learning_rate * grad_w1
  w[2, 0] -= learning_rate * grad_w2
  if epoch % 10 == 0:
    print(epoch, loss)

In [0]:
print(w)
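
The gradient-descent step for the output weights can also be written in vectorized form. The following is a minimal sketch, assuming the hidden weights stay fixed at the values of `W` used in this notebook and only the output weights are trained; the starting values for `w` and the names `A`, `residual`, `grad_w` and `losses` are illustrative.

```python
import numpy as np

# XOR inputs with a leading bias column, and the targets
X = np.array([[1, 0, 0], [1, 0, 1], [1, 1, 0], [1, 1, 1]], dtype=float)
Y = np.array([[0], [1], [1], [0]], dtype=float)

# Hidden-layer weights held fixed (assumption: same values as W in the notebook)
W = np.array([[0, -1], [1, 1], [1, 1]], dtype=float)
# Illustrative starting point for the output weights
w = np.array([[0.2], [0.8], [-1.7]])

learning_rate = 0.01
num_epochs = 100

# Hidden activations for all four examples, with the bias column prepended
A = np.hstack((np.ones((4, 1)), np.maximum(X @ W, 0)))

losses = []
for epoch in range(num_epochs):
    residual = A @ w - Y                         # shape (4, 1)
    losses.append(0.25 * float(np.sum(residual**2)))
    grad_w = 0.5 * (A.T @ residual)              # dJ/dw for J = 0.25 * sum(residual**2)
    w -= learning_rate * grad_w

print(losses[0], losses[-1])
print(w)
```

Because the hidden layer is fixed, this is a linear least-squares problem in `w`, so with a small learning rate the loss decreases at every epoch.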