In [2]:
import numpy as np
import matplotlib.pyplot as plt

# Multi-Layered Neural Networks and the Backpropagation Algorithm

To make the potential of a neuron easy to compute, the weights of the
synapses entering each neuron are stored as one column of a weight matrix.
 
Let us take a neural network with the topology [2,2,1], i.e., the network
has 2 input neurons, 2 hidden neurons in a single hidden layer, and one
output neuron. Let the weights of synapses between the input and the
hidden layer be in the following matrix:

In [3]:
w_i_h = np.array([[0.5, -0.5],
                  [1.5,  0.5]])

`w_i_h[i,j]` is the weight of the synapse from the input `i` into the
hidden neuron `j`. I.e., each row of the weight matrix corresponds to
the weights of synapses leading **from** one neuron!
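
As a small illustration of this layout, the sketch below computes the weighted input sums of the hidden neurons in two ways: column by column, and with a single vector-matrix product. The input `x` and the names `per_neuron` and `all_at_once` are illustrative choices, not part of the assignment; thresholds are not included here.

```python
import numpy as np

# Weights from the text: w_i_h[i, j] connects input i to hidden neuron j.
w_i_h = np.array([[0.5, -0.5],
                  [1.5,  0.5]])

x = np.array([-1, 1])  # an example input pattern (illustrative choice)

# Column j holds the weights of all synapses entering hidden neuron j,
# so the weighted sum for neuron j is the dot product of x with that column.
per_neuron = np.array([x @ w_i_h[:, j] for j in range(w_i_h.shape[1])])

# The same sums for all hidden neurons at once (thresholds not added yet).
all_at_once = x @ w_i_h

print(per_neuron)
print(all_at_once)
```

Both results agree, which is why a whole layer can be evaluated with one matrix product.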

Let the synaptic weights between the hidden and the output layer
be in the matrix:

In [4]:
w_h_o = np.array([[2.0], [-1.0]])

`w_h_o[i,0]` is the weight of the connection from the hidden neuron `i` 
to the output neuron. Thresholds of the hidden neurons are in the vector:

In [5]:
b_h = np.array([0, 0.5])

and the threshold of the output neuron is:

In [6]:
b_o = np.array([-0.5])

Hence the weights from the input layer into the hidden layer with added 
virtual neuron with fixed output 1 (for representing thresholds) are:

In [7]:
# note: np.r_ is an indexing helper used with square brackets, not a method of a numpy array!
w_i_hb = np.r_[w_i_h, b_h.reshape(1,-1)]
print(w_i_hb)

[[ 0.5 -0.5]
 [ 1.5  0.5]
 [ 0.   0.5]]


The weights from the hidden layer into the output layer
with added virtual neuron with output 1 are:

In [8]:
w_h_ob = np.r_[w_h_o, b_o.reshape(1,-1)]
print(w_h_ob)

[[ 2. ]
 [-1. ]
 [-0.5]]
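
The following sketch checks why the extra row works: appending a constant 1 to the layer's input and multiplying by the extended matrix should give the same result as multiplying by the original weights and adding the threshold separately. The hidden outputs `y_h` used here are hypothetical values chosen only for the check.

```python
import numpy as np

w_h_o = np.array([[2.0], [-1.0]])
b_o = np.array([-0.5])
w_h_ob = np.r_[w_h_o, b_o.reshape(1, -1)]

y_h = np.array([0.25, 0.75])   # hypothetical hidden-layer outputs
y_hb = np.append(y_h, 1.0)     # add the virtual neuron with output 1

with_bias_row = y_hb @ w_h_ob  # threshold folded into the matrix
explicit = y_h @ w_h_o + b_o   # threshold added as a separate term

print(with_bias_row, explicit)
```

The two values coincide, so the rest of the computation only needs the extended matrices.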


A sigmoidal transfer function $$\mathrm{logsig}(x) = \frac{1}{1 + e^{-\lambda x}}$$ can be implemented as

In [9]:
def sigmoid(x, lam=1.0):
    # sigmoid transfer function with slope lam:
    #     sigmoid(x) = 1 / (1 + exp(-lam * x))
    return 1 / (1 + np.exp(-lam * x))

This is the sigmoid function with the slope $\lambda$. The default value for the slope is $\lambda = 1$.
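
To get a feeling for the slope parameter, the sketch below evaluates the sigmoid and its derivative for a few values of $\lambda$. The helper `sigmoid_prime` is a hypothetical addition, not part of the assignment; it uses the identity $\frac{d}{dx}\mathrm{logsig}(x) = \lambda\,y\,(1 - y)$ with $y = \mathrm{logsig}(x)$, which is the form the backpropagation step relies on.

```python
import numpy as np

def sigmoid(x, lam=1.0):
    # sigmoid transfer function with slope lam
    return 1 / (1 + np.exp(-lam * x))

def sigmoid_prime(x, lam=1.0):
    # hypothetical helper: derivative of the sigmoid expressed via its output
    y = sigmoid(x, lam)
    return lam * y * (1 - y)

x = np.array([-2.0, 0.0, 2.0])
for lam in (0.5, 1.0, 4.0):
    # a larger lam gives a steeper transition around x = 0
    print(lam, sigmoid(x, lam), sigmoid_prime(x, lam))
```

At `x = 0` the output is 0.5 for every slope, while the derivative there equals `lam / 4`.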

## Tasks:

* *Compute the output of the network for the input patterns `p1` and `p2`.*

In [10]:
p1 = np.array([-1, 1])
p2 = np.array([ 1,-1])

In [11]:
# your code goes here
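# forward pass for p1; the appended 1 is the virtual neuron carrying the thresholds
h1 = sigmoid(np.append(p1, 1) @ w_i_hb)
o1 = sigmoid(np.append(h1, 1) @ w_h_ob)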


* *Compute the output of the network for the whole training set `X` consisting of the patterns `p1` and `p2`.*

In [54]:
# your code goes here
X = np.vstack((p1,p2))
print("X\n",X)
# append a column of ones: the output of the virtual neuron for every pattern
y_1 = np.append(X, np.ones((len(X), 1)), axis=1)

X
 [[-1  1]
 [ 1 -1]]


In [55]:
y_h = sigmoid(y_1 @ w_i_hb)
print(y_h)
y_o = sigmoid(np.append(y_h, np.ones((len(y_h), 1)), axis=1) @ w_h_ob)

[[0.73105858 0.81757448]
 [0.26894142 0.37754067]]


The input pattern `p1` is a training vector with the desired output 0.9,
and the input pattern `p2` is also a training pattern, with the desired output 0.8. We can therefore store the desired outputs in an array `d`, where row `d[i]` is the desired output for the pattern `X[i]`.

In [13]:
d = np.array([[0.9],[0.8]])
print("d\n",d)

d
 [[0.9]
 [0.8]]


* *What is the error of the network on each of the patterns `p1` and `p2`?*

In [29]:
# your code goes here
E = 0.5 * (y_o - d) ** 2
E

array([[0.06622147],
       [0.07376925]])

* *What is the mean squared error (MSE) of the network on the whole training set?*

In [15]:
# your code goes here
# mean of the per-pattern errors E (each already includes the factor 0.5)
MSE = E.mean()


* *How will the weights of the network change after one step of the
  backpropagation learning algorithm (without momentum) with the training pattern `p1`
  and the learning rate $\alpha = 0.2$?*
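
For reference, the update rules used in this step can be summarized as follows. This is a sketch in notation introduced here (it is not given in the assignment): $\delta$ denotes the error term of a neuron, $y_i$ is the output of the neuron at the start of a synapse (an input value, a hidden output, or 1 for the virtual neuron), and $w_{jo}$ is the weight from the hidden neuron $j$ to the output neuron, taken before the update.

$$\delta_o = (d - y_o)\,\lambda\,y_o\,(1 - y_o)$$

$$\delta_j = \lambda\,y_j\,(1 - y_j)\,w_{jo}\,\delta_o \quad \text{for each hidden neuron } j$$

$$\Delta w_{ij} = \alpha\,\delta_j\,y_i$$

Because $\delta_o$ is defined with $(d - y_o)$, the change $\Delta w_{ij}$ is added to the current weight.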

In [31]:
lam = 1.0
alpha = 0.2

In [32]:
delta_o = (d[0] - y_o[0]) * lam * y_o[0] * (1 - y_o[0])
delta_o

array([0.09050822])

In [40]:
print(w_h_ob)
alpha * delta_o * np.append(y_h[0], 1)

[[ 2. ]
 [-1. ]
 [-0.5]]


array([0.01323336, 0.01479944, 0.01810164])

In [42]:
w_h_ob1 = w_h_ob + (alpha * delta_o * np.append(y_h[0], 1)).reshape(-1, 1)
w_h_ob1

array([[ 2.01323336],
       [-0.98520056],
       [-0.48189836]])

In [52]:
delta_h = delta_o @ w_h_ob.T * lam * np.append(y_h[0], 1) * (1-np.append(y_h[0], 1))
delta_h

array([ 0.03558999, -0.01349898, -0.        ])

In [65]:
w_i_hb1 = w_i_hb + alpha * y_1[0][:, np.newaxis] @ delta_h[:-1][np.newaxis, :]
w_i_hb1

array([[ 0.492882 , -0.4973002],
       [ 1.507118 ,  0.4973002],
       [ 0.007118 ,  0.4973002]])
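
The individual cells above can be collected into a single function. The sketch below is one possible way to do that, assuming the extended weight matrices (threshold in the last row) and a single output neuron or layer; the name `backprop_step` and its signature are hypothetical and not part of the assignment.

```python
import numpy as np

def sigmoid(x, lam=1.0):
    # sigmoid transfer function with slope lam
    return 1 / (1 + np.exp(-lam * x))

def backprop_step(x, d, w_i_hb, w_h_ob, alpha=0.2, lam=1.0):
    """One online backpropagation step without momentum (hypothetical helper)."""
    x_b = np.append(x, 1.0)            # input plus virtual neuron
    y_h = sigmoid(x_b @ w_i_hb, lam)   # hidden outputs
    y_hb = np.append(y_h, 1.0)         # hidden outputs plus virtual neuron
    y_o = sigmoid(y_hb @ w_h_ob, lam)  # network output

    # error term of the output layer
    delta_o = (d - y_o) * lam * y_o * (1 - y_o)
    # propagate back through the old weights, skipping the threshold row
    delta_h = (w_h_ob[:-1] @ delta_o) * lam * y_h * (1 - y_h)

    new_w_h_ob = w_h_ob + alpha * np.outer(y_hb, delta_o)
    new_w_i_hb = w_i_hb + alpha * np.outer(x_b, delta_h)
    return new_w_i_hb, new_w_h_ob

w_i_hb = np.array([[0.5, -0.5], [1.5, 0.5], [0.0, 0.5]])
w_h_ob = np.array([[2.0], [-1.0], [-0.5]])

w_i_hb1, w_h_ob1 = backprop_step(np.array([-1, 1]), np.array([0.9]),
                                 w_i_hb, w_h_ob)
print(w_i_hb1)
print(w_h_ob1)
```

Called with `p1` and its desired output 0.9, this should reproduce the updated matrices computed step by step above. Note that the hidden error terms are computed from the weights before the update, as in the cells above.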

**For `p2`:**

In [17]:
# your code goes here



   
* *How will the output of the network for input `p1` change after the first
  iteration of the backpropagation algorithm?*

In [18]:
# your code goes here



* *Estimate the number of iterations over the pattern `p1` necessary to obtain*

In [66]:
# forward pass with the weights updated on p1 (note: this overwrites y_o)
y_h1 = sigmoid(y_1 @ w_i_hb1)
y_o = sigmoid(np.append(y_h1, np.ones((X.shape[0], 1)), axis=1) @ w_h_ob1)
y_o

array([[0.54835394],
       [0.42168797]])