In [2]:
import numpy as np
import matplotlib.pyplot as plt

# Multi-Layered Neural Networks and the Backpropagation Algorithm

For easy computing potential on a neuron, the weights of incoming
synapses of the neuron are stored as a row vector.
 
Let us take a neural network with the topology [2,2,1], i.e., the network
has 2 input neurons, 2 hidden neurons in a single hidden layer, and one
output neuron. Let the weights of synapses between the input and the
hidden layer be in the following matrix:

In [3]:
w_i_h = np.array([[0.5, -0.5],
                  [1.5,  0.5]])

`w_i_h[i,j]` is the weight of the synapse from the input `i` into the
hidden neuron `j`. I.e., each row of the weight matrix corresponds to
the weights of synapses leading **from** one neuron!

Let the synaptic weights between the hidden and the output layer
be in the matrix:

In [4]:
w_h_o = np.array([[2.0], [-1.0]])

`w_h_o[i,0]` is the weight of the connection from the hidden neuron `i` 
to the output neuron. Thresholds of the hidden neurons are in the vector:

In [5]:
 b_h = np.array([0, 0.5])

and the threshold of the outout neuron is:

In [6]:
b_o = np.array([-0.5])

Hence the weights from the input layer into the hidden layer with added 
virtual neuron with fixed output 1 (for representing thresholds) are:

In [7]:
# note that r_ is not a method of numpy array!
w_i_hb = np.r_[w_i_h, b_h.reshape(1,-1)]

The weights from the hidden layer into the output layer
with added virtual neuron with output 1 are:

In [8]:
w_h_ob = np.r_[w_h_o, b_o.reshape(1,-1)]
print(w_h_ob)

[[ 2. ]
 [-1. ]
 [-0.5]]


A sigmoidal transfer function $$logsig(x) = \frac{1}{1 + e^{-\lambda x}}$$ can be implemented as

In [9]:
def sigmoid(x, lam=1.0):
    # sigmoid transfer function
    #     sigmoid(x) = 1/(1 + exp{-lam * x)
    return 1 / (1 + np.exp(-lam * x))

This is the sigmoid function with the slope $\lambda$. The default value for the slope is $\lambda = 1$.

## Tasks:

* *Compute the output of the network for the input patterns `p1` and `p2`.*

In [10]:
p1 = np.array([-1, 1])
p2 = np.array([ 1,-1])

In [113]:
# your code goes here
print("w_i_hb\n", w_i_hb)
print("p1")
print("p1 extended\n",np.r_[p1, 1])
print(sigmoid(np.dot(np.r_[p1, 1], w_i_hb[:,0])))    # outputs on the first hidden neuron
print(sigmoid(np.dot(np.r_[p1, 1], w_i_hb[:,1])))    # outputs on the second hidden neuron

y_h = sigmoid(np.dot(np.r_[p1, 1], w_i_hb))    # outputs on the hidden layer
print("y_h\n",y_h)
y_o = sigmoid(np.dot(np.r_[y_h, 1], w_h_ob))
print("y_o\n",y_o)
print("p2")
print(sigmoid(np.dot(np.r_[p2, 1], w_i_hb[:,0])))    # outputs on the hidden layer
print(sigmoid(np.dot(np.r_[p2, 1], w_i_hb[:,1])))    # outputs on the hidden layer

y_h = sigmoid(np.dot(np.r_[p2, 1], w_i_hb))    # outputs on the hidden layer
print("y_h\n",y_h)
y_o = sigmoid(np.dot(np.r_[y_h, 1], w_h_ob))
print("y_o\n",y_o)


w_i_hb
 [[ 0.5 -0.5]
 [ 1.5  0.5]
 [ 0.   0.5]]
p1
p1 extended
 [-1  1  1]
0.7310585786300049
0.8175744761936437
y_h
 [0.73105858 0.81757448]
y_o
 [0.53607289]
p2
0.2689414213699951
0.3775406687981454
y_h
 [0.26894142 0.37754067]
y_o
 [0.4158926]


* *Compute the output of the network for the whole training set `X` consisting of the patterns `p1` and `p2`.*

In [119]:
X = np.vstack((p1,p2))
print("X\n",X)
print(np.c_[X, np.ones(X.shape[0])])
# your code goes here
y_h = sigmoid(np.dot(np.c_[X, np.ones(X.shape[0])], w_i_hb))
print("y_h\n",y_h)
y_o = sigmoid(np.dot(np.c_[y_h, np.ones(y_h.shape[1])], w_h_ob))
print("y_o\n",y_o)

X
 [[-1  1]
 [ 1 -1]]
[[-1.  1.  1.]
 [ 1. -1.  1.]]
y_h
 [[0.73105858 0.81757448]
 [0.26894142 0.37754067]]
y_o
 [[0.53607289]
 [0.4158926 ]]


The input pattern  `p1` is a training vector with the desired
output 0.9 and the input pattern `p2` is also a trianing pattern with the desired output 0.8. Hence the desired outputs we can store in an array, where row `d[i]` are the desired output for the pattern `X[i]`.

In [120]:
d = np.array([[0.9],[0.9]])
print("d\n",d)

d
 [[0.9]
 [0.9]]


* *What is the error of the network on each of the patterns `p1` and `p2`?*

In [17]:
# your code goes here
...
print(E)

[[0.06622147]
 [0.07376925]]


* *What is the mean squared error (MSE) of the network on the whole training set?*

In [18]:
# your code goes here
...
print(MSE)

0.06999535995430395


* *How will change the weights of the network after one step of the
  backpropagation learning algorithm (without momentum) with the training pattern `p1`
  with the learning rate $\alpha = 0.2$?*

In [19]:
alpha = 0.2

In [121]:
# your code goes here

# delta_o 0.09050822
# w_h_ob1
# [[ 2.01323336]
# [-0.98520056]
# [-0.48189836]]
#
# delta_h
# 0.03558999 -0.01349898 -0.
#
# w_i_hb1
# [[ 0.492882  -0.4973002]
# [ 1.507118   0.4973002]
# [ 0.007118   0.4973002]]


   
* How will change the output of the network for input `p1` after the first 
  iteration of the backpropagation algorithm?*

In [None]:
# your code goes here
# y_h1
#  0.73523626 0.81636337
# y_o1
#  0.5483539

...
print("y_h1\n", y_h1)
print("y_o1\n", y_o1)

* *Estimate the number of iterations over the pattern `p1` necessary to obtain*

In [53]:
alpha = 0.2
lam = 1.0



**Notation:**

Using `numpy` for working with vectors and matrices when we train a neural network has some problems:
* Input: input patterns are stored as rows in a 2D matrix $X$, but one input pattern is a 1D vector.
* Output, desired output: output patterns are stored as rows in a 2D matrix $Y$, however one output pattern is a 1D vector.
* Output of hidden neurons: can be stored in rows of a 2D matrix if we compute output for more than one pattern, but it is a 1D vector if we compute with one input vector.

A possible solution: is to *store vectors as two-dimensional arrays*:
* Then we can distinguish row and column vectors.
* If we work with a single vector, we will convert it into a row vector.

In [106]:
p1_2d = p1.reshape(1,-1)
print("p1_2d\n",p1_2d)

p1_2d
 [[-1  1]]


In [122]:
# output of the hidden neurons
...
print("y_h\n", y_h)

y_h
 [[0.73105858 0.81757448]]


In [123]:
# output of the network 
...
print("y_o\n", y_o)

y_o
 [[0.53607289]]


In [109]:
pat = 0
delta_o = (d[pat] - y_o) * lam * y_o * (1 - y_o)
print("delta_o\n", delta_o)

delta_o
 [[0.09050822]]


Note that `delta_o` **is a row vector**? Why?

In [None]:
print("np.c_[y_h,[[1]]]\n", np.c_[y_h,[[1]]])

w_h_ob1 = w_h_ob + ...
print("w_h_ob1\n", w_h_ob1)

In [125]:
delta_h = ...
print("delta_h\n", delta_h)

delta_h
 [[ 0.04624401 -0.01753996 -0.        ]]


In [126]:
print(np.c_[p2_2d, [[1]]].T)
print(delta_h[:2,:].T)
w_i_hb1 = w_i_hb + alpha * np.c_[p1_2d, [[1]]].T @ delta_h[:,:2]
print("w_i_hb1\n", w_i_hb1)

[[ 1]
 [-1]
 [ 1]]
[[ 0.04624401]
 [-0.01753996]
 [-0.        ]]
w_i_hb1
 [[ 0.4907512  -0.49649201]
 [ 1.5092488   0.49649201]
 [ 0.0092488   0.49649201]]


In [82]:
p2_2d = p2.reshape(1,-1)
print("p2_2d\n",p2_2d)

p2_2d
 [[ 1 -1]]


In [84]:
# output of the hidden neurons
print("np.r_[p2_2d,[[1]]]\n", np.c_[p2_2d,[[1]]])
print("w_i_hb\n", w_i_hb)
print("np.r_[p2_2d,[[1]]] @ w_i_hb\n", np.c_[p2_2d,[[1]]]  @ w_i_hb)
y_h = sigmoid(np.c_[p2_2d,[[1]]]  @ w_i_hb)
print("y_h\n", y_h)

np.r_[p2_2d,[[1]]]
 [[ 1 -1  1]]
w_i_hb
 [[ 0.5 -0.5]
 [ 1.5  0.5]
 [ 0.   0.5]]
np.r_[p2_2d,[[1]]] @ w_i_hb
 [[-1.  -0.5]]
y_h
 [[0.26894142 0.37754067]]


In [85]:
y_o = sigmoid(np.c_[y_h, [[1]]] @ w_h_ob)
print("y_o\n", y_o)

y_o
 [[0.4158926]]


In [88]:
pat = 1
delta_o = (d[pat] - y_o) * lam * y_o * (1 - y_o)
print("delta_o\n", delta_o)

delta_o
 [[0.11760225]]


Note that `delta_o` **is a row vector**? Why?

In [89]:
print("np.c_[y_h,[[1]]]\n", np.c_[y_h,[[1]]])

w_h_ob1 = w_h_ob + alpha * np.c_[y_h,[[1]]].T @ delta_o
print("w_h_ob1\n", w_h_ob1)

np.c_[y_h,[[1]]]
 [[0.26894142 0.37754067 1.        ]]
w_h_ob1
 [[ 2.00632562]
 [-0.99112007]
 [-0.47647955]]


In [102]:
delta_h = delta_o @ w_h_ob.T * lam * np.c_[y_h,[[1]]] * (1 - np.c_[y_h, [[1]]])
print("delta_h\n", delta_h)

delta_h
 [[ 0.04624401 -0.02763696 -0.        ]]


In [105]:
print(np.c_[p2_2d, [[1]]].T)
print(delta_h[:2,:].T)
w_i_hb1 = w_i_hb + alpha * np.c_[p2_2d, [[1]]].T @ delta_h[:,:2]
print("w_i_hb1\n", w_i_hb1)

[[ 1]
 [-1]
 [ 1]]
[[ 0.04624401]
 [-0.02763696]
 [-0.        ]]
w_i_hb1
 [[ 0.5092488  -0.50552739]
 [ 1.4907512   0.50552739]
 [ 0.0092488   0.49447261]]
