Before you turn this problem in, make sure everything runs as expected. First, **restart the kernel** (in the menubar, select Kernel$\rightarrow$Restart) and then **run all cells** (in the menubar, select Cell$\rightarrow$Run All).

Make sure you fill in any place that says `YOUR CODE HERE` or "YOUR ANSWER HERE", as well as your name and collaborators below:

In [None]:
NAME = "Camille Hascoët"
COLLABORATORS = ""

---

In [1]:
import numpy as np
import matplotlib.pyplot as plt
from icecream import ic

# Multi-Layered Neural Networks and the Backpropagation Algorithm

To make computing the potential of a neuron easy, the weights of the
synapses coming into a neuron are stored together as one column of a weight matrix.
 
Let us take a neural network with the topology [2,2,1], i.e., the network
has 2 input neurons, 2 hidden neurons in a single hidden layer, and one
output neuron. Let the weights of synapses between the input and the
hidden layer be in the following matrix:

In [2]:
w_i_h = np.array([[0.5, -0.5],
                  [1.5,  0.5]])

`w_i_h[i,j]` is the weight of the synapse from the input `i` into the
hidden neuron `j`. I.e., each row of the weight matrix corresponds to
the weights of synapses leading **from** one neuron!
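To make the indexing convention concrete, the short sketch below restates the matrix from above and reads off one row (the weights leaving input neuron 0) and one column (the weights entering hidden neuron 1). The variable names `outgoing_from_input_0` and `incoming_to_hidden_1` are introduced only for this illustration.

```python
import numpy as np

w_i_h = np.array([[0.5, -0.5],
                  [1.5,  0.5]])

# row 0: weights of the synapses leading FROM input neuron 0
outgoing_from_input_0 = w_i_h[0, :]
# column 1: weights of the synapses leading INTO hidden neuron 1
incoming_to_hidden_1 = w_i_h[:, 1]

print(outgoing_from_input_0)   # [ 0.5 -0.5]
print(incoming_to_hidden_1)    # [-0.5  0.5]
```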

Let the synaptic weights between the hidden and the output layer
be in the matrix:

In [3]:
w_h_o = np.array([[2.0], [-1.0]])

`w_h_o[i,0]` is the weight of the connection from the hidden neuron `i` 
to the output neuron. Thresholds of the hidden neurons are in the vector:

In [4]:
b_h = np.array([0, 0.5])

and the threshold of the output neuron is:

In [5]:
b_o = np.array([-0.5])

Hence the weights from the input layer into the hidden layer, extended by a
virtual neuron with fixed output 1 (which represents the thresholds), are:

In [6]:
# note that r_ is not a method of numpy array!
w_i_hb = np.r_[w_i_h, b_h.reshape(1,-1)]
w_i_hb

array([[ 0.5, -0.5],
       [ 1.5,  0.5],
       [ 0. ,  0.5]])

The weights from the hidden layer into the output layer,
extended by a virtual neuron with fixed output 1, are:

In [7]:
w_h_ob = np.r_[w_h_o, b_o.reshape(1,-1)]
w_h_ob

array([[ 2. ],
       [-1. ],
       [-0.5]])
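
The point of the extended matrices can be checked numerically. The sketch below assumes that the thresholds enter the potential additively, as the extended matrices above imply, and uses the input vector `[-1, 1]` purely as an example. The names `xi_explicit` and `xi_augmented` are made up for this illustration.

```python
import numpy as np

w_i_h = np.array([[0.5, -0.5],
                  [1.5,  0.5]])
b_h = np.array([0, 0.5])
w_i_hb = np.r_[w_i_h, b_h.reshape(1, -1)]

x = np.array([-1, 1])               # example input
x_aug = np.append(x, 1)             # add the virtual neuron's fixed output 1

xi_explicit = x @ w_i_h + b_h       # potentials with the thresholds added explicitly
xi_augmented = x_aug @ w_i_hb       # same potentials from a single matrix product

print(xi_explicit, xi_augmented)
```

Both expressions give the same hidden-layer potentials, so the thresholds can be handled as ordinary weights.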

A sigmoidal transfer function $$\mathrm{logsig}(x) = \frac{1}{1 + e^{-\lambda x}}$$ can be implemented as

In [8]:
def sigmoid(x, lam=1.0):
    # sigmoid transfer function
    #     sigmoid(x) = 1/(1 + exp(-lam * x))
    return 1 / (1 + np.exp(-lam * x))

In [9]:
1/(1+np.exp(-3))

0.9525741268224334

In [10]:
sigmoid(3)

0.9525741268224334

This is the sigmoid function with the slope $\lambda$. The default value for the slope is $\lambda = 1$.
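
The effect of the slope, and the derivative that backpropagation needs later, can be sketched as follows. The helper `sigmoid_prime` is not part of the notebook. It relies on the identity $f'(x) = \lambda f(x)(1 - f(x))$ for this transfer function, and the finite-difference comparison is only a sanity check.

```python
import numpy as np

def sigmoid(x, lam=1.0):
    return 1 / (1 + np.exp(-lam * x))

def sigmoid_prime(x, lam=1.0):
    # derivative written in terms of the output y: lam * y * (1 - y)
    y = sigmoid(x, lam)
    return lam * y * (1 - y)

x = 0.5
for lam in (0.5, 1.0, 4.0):
    # a larger slope pushes the output at x > 0 closer to 1
    print(lam, sigmoid(x, lam), sigmoid_prime(x, lam))

# central finite difference as a check of the analytic derivative
h = 1e-6
numeric = (sigmoid(x + h, 2.0) - sigmoid(x - h, 2.0)) / (2 * h)
print(numeric, sigmoid_prime(x, 2.0))
```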

## Tasks:

* *Let $\lambda=1$. Compute the output of the network for the input patterns `p1` and `p2`.*

In [11]:
lamb = 1.0
p1 = np.array([-1, 1])
p2 = np.array([ 1,-1])

In [28]:
def neural_network(x, w_i_hb, w_h_ob, lamb=1.0):
    # append the fixed output 1 of the virtual neuron if it is missing
    if len(x) == w_i_hb.shape[0] - 1:
        x = np.append(x, 1)
    # hidden activations; the virtual neuron's output 1 is appended
    # after the sigmoid so that it stays exactly 1
    x_h = np.append(sigmoid(np.dot(x, w_i_hb), lamb), 1)
    x_o = sigmoid(np.dot(x_h, w_h_ob), lamb)
    return x_o

In [29]:
print(w_i_hb)
print("p1")
print(neural_network(p1, w_i_hb, w_h_ob, lamb))
print("p2")
print(neural_network(p2, w_i_hb, w_h_ob, lamb))


[[ 0.5 -0.5]
 [ 1.5  0.5]
 [ 0.   0.5]]
p1
[0.53607289]
p2
[0.4158926]


* *Compute the output of the network for the whole training set `X` consisting of the patterns `p1` and `p2`.*

In [33]:
X = np.vstack((p1,p2))
print(np.c_[X, np.ones(X.shape[0])])
y = []
for x in X:
    y.append(neural_network(x, w_i_hb, w_h_ob, lamb))
y = np.array(y).flatten()
print(y)
    

[[-1.  1.  1.]
 [ 1. -1.  1.]]
[0.53607289 0.4158926 ]


The input pattern `p1` is a training vector with the desired
output 0.9, and the input pattern `p2` is also a training pattern, with the desired output 0.8. Hence we can store the desired outputs in an array, where row `d[i]` is the desired output for the pattern `X[i]`.

In [34]:
d = np.array([[0.9],[0.8]])
print("d\n",d)

d
 [[0.9]
 [0.8]]


* *What is the error of the network on each of the patterns `p1` and `p2`?*

In [36]:
def error(y, d):
    # per-pattern error E_p = 1/2 * sum_j (y_pj - d_pj)^2
    # y is reshaped to the shape of d so that (2,) is not broadcast against (2,1)
    return np.sum((np.reshape(y, d.shape) - d)**2, axis=1) / 2
print("error\n",error(y, d))

error
 [0.06622147 0.07376925]


* *What is the mean squared error (MSE) of the network on the whole training set?*

In [None]:
# YOUR CODE HERE
raise NotImplementedError()
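
One possible way to approach the MSE is sketched below. The convention is an assumption, not something the assignment fixes: the squared differences are averaged over all patterns and outputs, without the factor 1/2. The helper name `mse` and the example values are hypothetical. Reshaping `y` to the shape of `d` guards against NumPy broadcasting a `(2,)` array against a `(2, 1)` array into a `(2, 2)` result.

```python
import numpy as np

def mse(y, d):
    # hypothetical helper: mean of the squared differences over all
    # patterns and outputs (assumed convention, no factor 1/2)
    y = np.reshape(y, np.shape(d))   # align shapes, e.g. (2,) with (2, 1)
    return np.mean((y - d) ** 2)

# illustrative values only, not the network's actual outputs
y_example = np.array([0.5, 0.6])
d_example = np.array([[0.9], [0.8]])
print(mse(y_example, d_example))
```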

* *How will the weights of the network change after one step of the
  backpropagation learning algorithm (without momentum) with the training pattern `p1`
  and the learning rate $\alpha = 0.2$?*

In [None]:
alpha = 0.2

In [None]:
pat = 0
delta_o = ...               # delta terms at the output layer
delta_h = ...               # delta terms at the hidden layer
w_h_ob1 = ...               # new weights from the hidden to the output layer
w_i_hb1 = ...               # new weights from the input to the hidden layer
# YOUR CODE HERE
raise NotImplementedError()
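
A single backpropagation step could be sketched as follows. This is only one possible formulation, assuming the error $E = \frac{1}{2}\sum_j (y_j - d_j)^2$, the sigmoid with slope $\lambda$, and the sign convention $\delta = (d - y)\,f'$ with updates $w \leftarrow w + \alpha\, y\, \delta$. The function name `backprop_step` and its local variable names are hypothetical and are not the solution expected by the assignment.

```python
import numpy as np

def sigmoid(x, lam=1.0):
    return 1 / (1 + np.exp(-lam * x))

def backprop_step(x, d, w_i_hb, w_h_ob, alpha=0.2, lam=1.0):
    """One gradient step on E = 1/2 * sum((y - d)**2) for a single pattern."""
    x_b = np.append(x, 1.0)                       # input plus virtual neuron
    y_h = sigmoid(x_b @ w_i_hb, lam)              # hidden activations
    y_hb = np.append(y_h, 1.0)                    # hidden plus virtual neuron
    y_o = sigmoid(y_hb @ w_h_ob, lam)             # network output

    delta_o = (d - y_o) * lam * y_o * (1 - y_o)   # deltas at the output layer
    # the virtual neuron has no incoming weights, so its row is skipped
    delta_h = (w_h_ob[:-1] @ delta_o) * lam * y_h * (1 - y_h)

    w_h_ob_new = w_h_ob + alpha * np.outer(y_hb, delta_o)
    w_i_hb_new = w_i_hb + alpha * np.outer(x_b, delta_h)
    return w_i_hb_new, w_h_ob_new, y_o

# weights and pattern restated from the cells above
w_i_hb = np.array([[0.5, -0.5], [1.5, 0.5], [0.0, 0.5]])
w_h_ob = np.array([[2.0], [-1.0], [-0.5]])
p1 = np.array([-1, 1])
d1 = np.array([0.9])

w_i_hb1, w_h_ob1, y_before = backprop_step(p1, d1, w_i_hb, w_h_ob)
_, _, y_after = backprop_step(p1, d1, w_i_hb1, w_h_ob1)
print(y_before, y_after)
```

The second call is used only to read the output of the updated network. Comparing `y_before` and `y_after` shows whether the step moved the output towards the desired value.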

**for `p1`**

In [None]:
# YOUR CODE HERE
raise NotImplementedError()

   
* *How will the output of the network for input `p1` change after the first 
  iteration of the backpropagation algorithm?*

In [None]:
# YOUR CODE HERE
raise NotImplementedError()

* *Estimate the number of iterations over the pattern `p1` necessary to obtain an error "close" to 0*

In [None]:
alpha = 0.2
lam = 1.0
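
One way to estimate the number of iterations is simply to repeat the update until the error falls below a chosen tolerance. The sketch below makes several assumptions: "close to 0" is taken to mean an error below `tol = 1e-3`, `max_iter` is an arbitrary safety cap, and the update rule is the plain delta rule for $E = \frac{1}{2}\sum_j (y_j - d_j)^2$. The count it reports depends directly on the chosen `tol`.

```python
import numpy as np

def sigmoid(x, lam=1.0):
    return 1 / (1 + np.exp(-lam * x))

w_i_hb = np.array([[0.5, -0.5], [1.5, 0.5], [0.0, 0.5]])
w_h_ob = np.array([[2.0], [-1.0], [-0.5]])
x_b = np.array([-1.0, 1.0, 1.0])     # p1 plus the virtual neuron's output 1
d = np.array([0.9])
alpha, lam = 0.2, 1.0
tol = 1e-3                           # assumed meaning of "close to 0"
max_iter = 10_000                    # safety cap, also an assumption

for n_iter in range(1, max_iter + 1):
    # forward pass
    y_h = sigmoid(x_b @ w_i_hb, lam)
    y_hb = np.append(y_h, 1.0)
    y_o = sigmoid(y_hb @ w_h_ob, lam)
    err = np.sum((y_o - d) ** 2) / 2
    if err < tol:
        break
    # backward pass and weight update
    delta_o = (d - y_o) * lam * y_o * (1 - y_o)
    delta_h = (w_h_ob[:-1] @ delta_o) * lam * y_h * (1 - y_h)
    w_h_ob = w_h_ob + alpha * np.outer(y_hb, delta_o)
    w_i_hb = w_i_hb + alpha * np.outer(x_b, delta_h)

# n_iter - 1 weight updates were needed to reach an error below tol
print(n_iter - 1, err)
```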


