In [27]:
NAME = "Camille Louis Hascoët"      # insert your full name

---

In [28]:
import numpy as np
from numpy.testing import assert_approx_equal, assert_allclose

# Assignment 2: Multilayered Neural Networks

## Task 1: Weight initialization for a single neuron (4 points)

Let us have a simple neural network with $N$ inputs and a single output neuron **without bias**. The neuron uses the sigmoidal transfer function
        $$\hspace{5em} f(x) = \frac{1}{1+e^{-x}}\ .$$
Its potential is
        $$\hspace{5em} \xi = \sum_{i=1}^{N} w_i x_i\,, $$
where $w_i$ and $x_i$ are the weight and the value of the $i$-th input, respectively, for $i=1, \dots, N$.

For fast learning of this neuron, the absolute value of the potential should not be too high. The input attributes are real values without restriction, but we know they have a normal probability distribution with mean value $\mu = 0$ and standard deviation $\sigma, \;\sigma > 0$. We will initialize the neuron weights with values from the uniform distribution on an interval $\langle -a,a \rangle$. 

How should we set the value $a$ with respect to $\sigma$ if we require that the potential on the neuron should have zero expected value and standard deviation $A = 1$?

*Hint: A similar problem when the input values were from a uniform distribution was solved within the lecture on multilayered neural networks.*
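
A candidate value of $a$ can be sanity-checked numerically before the derivation is written up. The sketch below assumes that the weights and inputs are mutually independent, in which case $\operatorname{Var}(\xi) = N \cdot \frac{a^2}{3} \cdot \sigma^2$, and it tests the candidate $a = \frac{A}{\sigma}\sqrt{3/N}$ by simulation. The helper `potential_stats` and the values of `N` and `sigma` are illustrative choices, not part of the assignment.

```python
import numpy as np

def potential_stats(n_inputs, sigma, a, n_samples=200_000, seed=0):
    """Estimate the mean and standard deviation of the potential by simulation."""
    rng = np.random.default_rng(seed)
    # Inputs: normal distribution with zero mean and standard deviation sigma
    x = rng.normal(0.0, sigma, size=(n_samples, n_inputs))
    # Weights: uniform distribution on the interval [-a, a]
    w = rng.uniform(-a, a, size=(n_samples, n_inputs))
    xi = np.sum(w * x, axis=1)  # one potential per sampled (w, x) pair
    return xi.mean(), xi.std()

# Illustrative settings (assumptions, not given in the task)
N, sigma, A = 10, 2.5, 1.0
a = A * np.sqrt(3.0 / N) / sigma  # candidate value under the independence assumption

mean, std = potential_stats(N, sigma, a)
print("a =", a)
print("empirical mean:", mean)
print("empirical std: ", std)
```

If the candidate is right, the empirical mean should be close to 0 and the empirical standard deviation close to $A$. A simulation like this supports the derivation but does not replace it.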

#### What to submit
A complete derivation of the proper value of the variable $a$ with respect to $\sigma$ and $A$. The derivation can be written by hand and submitted as a scanned picture (or an image captured by, e.g., a mobile phone) in a separate file.


## Task 2: Manual design of a neural network for computing a function (3 points)

Suggest weights of a multilayered neural network computing the function $f(x_1,x_2) = 2 - x_1 + x_2$, where $x_1, x_2$ are input bits (each with value 0 or 1). The network's neurons should use the sigmoidal transfer function with slope 1, and the weights and biases should be **"small" integers** (with absolute value at most 20). In contrast to Task 1, the neurons **have biases**. The topology of the network must be 2-2-2. That is:
* two input neurons -- inputs are bits (with value 0 or 1),
* two neurons in a single hidden layer, and
* two neurons in the output layer.

Outputs of the network (at the output layer only!) will be interpreted as two-bit binary numbers in the following way:
* output greater than or equal to 0.5 will be considered as logical 1,
* output less than 0.5 will be considered as logical 0.
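
To make the target concrete, the following sketch lists the two-bit code the network should produce for each input pair. It assumes the first output neuron carries the high bit, which matches the requirement that input (0,0) yields output (1,0). The helper name `target_bits` is illustrative.

```python
def target_bits(x1, x2):
    """Return (high_bit, low_bit) of f(x1, x2) = 2 - x1 + x2."""
    value = 2 - x1 + x2          # value is 1, 2 or 3 for bit inputs
    return (value >> 1) & 1, value & 1

for x1 in (0, 1):
    for x2 in (0, 1):
        print((x1, x2), "->", target_bits(x1, x2))
```

Read as logical functions, the high bit is 0 only for input (1,0), and the low bit is the exclusive OR of the inputs.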

#### What to submit 
Extended weight matrices:

In [29]:
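# The extended weight matrix between the input and hidden layer (rows: x1, x2, bias).
# Each hidden neuron outputs about 1, except that it drops to 0.5 for one input:
# the first for (1,0), the second for (0,1).
w_i_hb = np.array([[-20, 20], [20, -20], [20, 20]])  # np.array with dimension 3 x 2

# The extended weight matrix between the hidden and output layer (rows: h1, h2, bias).
# The first output (high bit) is 0 only when h1 drops to 0.5; the second output
# (low bit) is 1 when either hidden neuron drops to 0.5.
w_h_ob = np.array([[20, -10], [0, -10], [-15, 17]])  # np.array with dimension 3 x 2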

*Hint: The weights can be proposed "manually" by assuming that the hidden neurons compute suitable logical functions. However, the output of the hidden neurons will not be rounded!*

Your proposed weights will be tested below.

In [30]:
def sigm(x, slope=1.0):
    return 1/(1+np.exp(-slope * x))

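def sigm_deriv(y, slope=1.0):
    # Derivative of the sigmoid expressed through its output y = sigm(x, slope),
    # so callers must pass the neuron output, not the potential.
    return slope * y * (1 - y)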

assert(w_i_hb.dtype == int)
assert(w_h_ob.dtype == int)

extended_input = np.array([0,0,1])
print(sigm(extended_input @ w_i_hb))
print(np.round(sigm(np.r_[sigm(extended_input @ w_i_hb),1] @ w_h_ob)))

# for input (0,0), the output should be (1,0)
assert (np.round(sigm(np.r_[sigm(extended_input @ w_i_hb),1] @ w_h_ob)) == np.array([1,0])).all()


[1. 1.]
[1. 0.]
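
The visible test covers only the input (0,0), so it is worth checking a proposed network on all four input pairs. The sketch below does this for one hypothetical weight assignment (`w_ih_example` and `w_ho_example` are illustrative and are only one of many possible choices). The helper `network_bits` applies the threshold rule from the task directly, because `np.round` rounds halves to the nearest even value and would therefore turn an output of exactly 0.5 into 0.

```python
import numpy as np

def sigm(x, slope=1.0):
    return 1.0 / (1.0 + np.exp(-slope * x))

def network_bits(x1, x2, w_ih, w_ho):
    """Forward pass of a 2-2-2 network; outputs >= 0.5 are read as logical 1."""
    hidden = sigm(np.array([x1, x2, 1]) @ w_ih)   # hidden outputs are not rounded
    output = sigm(np.r_[hidden, 1] @ w_ho)
    return tuple(int(o >= 0.5) for o in output)

# Hypothetical example weights (rows: first input, second input, bias)
w_ih_example = np.array([[-20, 20], [20, -20], [20, 20]])
w_ho_example = np.array([[20, -10], [0, -10], [-15, 17]])

results = {}
for x1 in (0, 1):
    for x2 in (0, 1):
        value = 2 - x1 + x2
        expected = (value >> 1, value & 1)        # (high bit, low bit)
        got = network_bits(x1, x2, w_ih_example, w_ho_example)
        results[(x1, x2)] = (got == expected)
        print((x1, x2), "expected", expected, "got", got)
```

In this example each hidden neuron stays near 1 and drops to 0.5 for a single input pair. The output layer then separates the values 0.5 and 1 instead of relying on rounded hidden outputs.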


## Task 3: Backpropagation algorithm (3 points)
We have a multilayered neural network with the topology 2-4-2, i.e., it has two input neurons, one hidden layer containing four neurons, and two output neurons. All neurons use the sigmoidal transfer function with the slope $\lambda =2.0$.
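
With slope $\lambda$, the transfer function is $f(x) = \frac{1}{1+e^{-\lambda x}}$ and its derivative can be written through the neuron output $y = f(x)$ as $f'(x) = \lambda\, y\,(1-y)$. The sketch below compares this identity with a central finite difference. The function name `sigm_deriv_from_output` and the test grid are illustrative.

```python
import numpy as np

def sigm(x, slope=1.0):
    return 1.0 / (1.0 + np.exp(-slope * x))

def sigm_deriv_from_output(y, slope=1.0):
    """Derivative of the sigmoid, given its output y (not the potential x)."""
    return slope * y * (1.0 - y)

lam = 2.0
x = np.linspace(-3.0, 3.0, 13)   # illustrative potentials
eps = 1e-6

# Central finite difference as an independent estimate of the derivative
numeric = (sigm(x + eps, lam) - sigm(x - eps, lam)) / (2 * eps)
analytic = sigm_deriv_from_output(sigm(x, lam), lam)

max_abs_diff = np.max(np.abs(numeric - analytic))
print("largest difference:", max_abs_diff)
```

Passing the potential instead of the output to such a function is an easy mistake to make, and a check like this one catches it.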

In [31]:
lam = 2.0

The extended weight matrix of weights between the input and the hidden layer is

In [32]:
w_i_hb = np.array([[ 1.1, -2.2,  1.0,  0.5],
                   [ 0.5,  0.9,  2.0, -1.0],
                   [ 0.0, -0.4, -1.0, -0.7]])

and the extended weight matrix between the hidden and the output layer is

In [33]:
w_h_ob = np.array([[ 2.0,  0.9],
                   [-1.0,  1.1],
                   [-2.2, -0.8],
                   [ 1.5,  0.0],
                   [ 0.5, -0.5]])

Consider the training pattern with the input

In [34]:
p = np.array([-1,1])

with the desired output

In [35]:
d = np.array([0.2, 0.4])

and the learning rate

In [36]:
alpha = 1.5

To solve this assignment, you must **not use any library for learning neural networks**!

#### What to submit
Python code in the cell below that computes the new extended weight matrices `w_i_hb1` and `w_h_ob1` after one iteration of the backpropagation algorithm.

In [39]:
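def bp_iteration(p, d, w_i_hb, w_h_ob, alpha, lam):
    """Run one backpropagation step and return the updated (w_i_hb, w_h_ob)."""
    # Work on float copies so that the caller's matrices stay unchanged
    w_h_ob, w_i_hb = w_h_ob.astype(np.float64), w_i_hb.astype(np.float64)
    lam, alpha = float(lam), float(alpha)

    # Forward pass
    extended_input = np.append(p, 1)  # Adding bias term to input
    hidden_input = extended_input @ w_i_hb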
    hidden_output = sigm(hidden_input, lam)
    hidden_output_with_bias = np.append(hidden_output, 1)  # Adding bias term to hidden layer output

    final_input = hidden_output_with_bias @ w_h_ob
    final_output = sigm(final_input, lam)

    # Backward pass
    output_error = d - final_output
    output_delta = output_error * sigm_deriv(final_output, lam)

    hidden_error = output_delta @ w_h_ob[:-1].T  # Removing bias weights from calculation
    hidden_delta = hidden_error * sigm_deriv(hidden_output, lam)

    # Weight updates
    w_h_ob += alpha * np.outer(hidden_output_with_bias, output_delta)
    w_i_hb += alpha * np.outer(extended_input, hidden_delta)

    return w_i_hb, w_h_ob

w_i_hb1, w_h_ob1 = bp_iteration(p, d, w_i_hb, w_h_ob, alpha, lam)

In the cell below, your results will be checked using several hidden tests. 

In [40]:
y_o = sigm(np.r_[sigm(np.r_[p,1] @ w_i_hb, lam),1] @ w_h_ob, lam)
print("Output before training:", y_o)
y_o1 = sigm(np.r_[sigm(np.r_[p,1] @ w_i_hb1, lam),1] @ w_h_ob1, lam)
print("Output after training:", y_o1)
print("w_i_hb1:\n", w_i_hb1)
print("w_h_ob1:\n", w_h_ob1)

assert_allclose(y_o, [0.09720079, 0.69141942])


Output before training: [0.09720079 0.69141942]
Output after training: [0.07185376 0.44076691]
w_i_hb1:
 [[ 1.14047168 -2.1979209   0.95515682  0.49902726]
 [ 0.45952832  0.8979209   2.04484318 -0.99902726]
 [-0.04047168 -0.4020791  -0.95515682 -0.69902726]]
w_h_ob1:
 [[ 2.00626436  0.85682281]
 [-0.97305893  0.91430817]
 [-2.18646862 -0.89326526]
 [ 1.50032823 -0.00226232]
 [ 0.52706275 -0.68653052]]
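
Since the tests are hidden, an independent check of the update can be useful. The sketch below assumes the usual error function $E = \frac{1}{2}\sum_j (d_j - y_j)^2$, estimates its gradient by central finite differences, and applies one gradient-descent step $w \leftarrow w - \alpha\, \partial E / \partial w$. The helpers `forward`, `error`, and `numeric_grad` are illustrative and use no neural-network library.

```python
import numpy as np

def sigm(x, slope=1.0):
    return 1.0 / (1.0 + np.exp(-slope * x))

def forward(p, w_i_hb, w_h_ob, lam):
    hidden = sigm(np.r_[p, 1.0] @ w_i_hb, lam)
    return sigm(np.r_[hidden, 1.0] @ w_h_ob, lam)

def error(p, d, w_i_hb, w_h_ob, lam):
    # Assumed error function: half the squared distance to the desired output
    return 0.5 * np.sum((d - forward(p, w_i_hb, w_h_ob, lam)) ** 2)

def numeric_grad(f, w, eps=1e-6):
    """Central finite-difference estimate of the gradient of f at w."""
    grad = np.zeros_like(w)
    for idx in np.ndindex(*w.shape):
        w_plus, w_minus = w.copy(), w.copy()
        w_plus[idx] += eps
        w_minus[idx] -= eps
        grad[idx] = (f(w_plus) - f(w_minus)) / (2 * eps)
    return grad

# Data from the task
lam, alpha = 2.0, 1.5
p, d = np.array([-1.0, 1.0]), np.array([0.2, 0.4])
w_i_hb = np.array([[1.1, -2.2, 1.0, 0.5],
                   [0.5, 0.9, 2.0, -1.0],
                   [0.0, -0.4, -1.0, -0.7]])
w_h_ob = np.array([[2.0, 0.9], [-1.0, 1.1], [-2.2, -0.8], [1.5, 0.0], [0.5, -0.5]])

grad_ih = numeric_grad(lambda w: error(p, d, w, w_h_ob, lam), w_i_hb)
grad_ho = numeric_grad(lambda w: error(p, d, w_i_hb, w, lam), w_h_ob)

w_i_hb_check = w_i_hb - alpha * grad_ih
w_h_ob_check = w_h_ob - alpha * grad_ho
print("w_i_hb_check:\n", w_i_hb_check)
print("w_h_ob_check:\n", w_h_ob_check)
```

The two matrices can be compared entry by entry with `w_i_hb1` and `w_h_ob1`. Differences far above the finite-difference error would point to a mistake in the backward pass.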
