# Homework 12

This homework is about **forward and back propagation** in neural networks.

Consider a fully connected $3 \times 3 \times 1$ neural network
![nn3x3x1](../figures/nn3x3x1.png)

with a sigmoidal activation function in the nodes of the hidden layer. (Input and output activations are simply the identity function.)
The weight matrices are (from left to right):

$$
W_{i,h} = \left( \begin{array}{ccc}
 1 &   4 &   7 \\ 
    2 &   5 &   8 \\ 
    3 &   6 &   9 \end{array} \right) ,
W_{h,o} = \left( \begin{array}{c}
  -1 \\ 
  0.1 \\ 
  1  \end{array} \right) 
$$
We follow the convention of the MYONN book, which defines the weight matrix as the transpose of the weights depicted in the diagrams, i.e. the $(i,j)$ element $w_{i,j}$ is the weight going from left node $j$ to right node $i$.
(To leave no room for ambiguity: $W_{i,h}(2,1) = 2$ is the weight going from input node 1 to hidden node 2.) The advantage is the simpler matrix multiplication.
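As a minimal sketch of this convention in NumPy (the variable names `W_ih` and `W_ho` are chosen here for illustration and are not prescribed by the assignment), the matrices can be written down directly. Feeding a one-hot input through `W_ih` picks out one column, i.e. all weights leaving a single input node.

```python
import numpy as np

# Rows index the receiving (right) node, columns the sending (left) node.
W_ih = np.array([[1.0, 4.0, 7.0],
                 [2.0, 5.0, 8.0],
                 [3.0, 6.0, 9.0]])

# Hidden-to-output weights, stored as a 1-D array for convenience (an assumption).
W_ho = np.array([-1.0, 0.1, 1.0])

# NumPy indices are 0-based, so the (2,1) element of the text is W_ih[1, 0]:
# the weight going from input node 1 to hidden node 2.
w_input1_to_hidden2 = W_ih[1, 0]

# A one-hot input activates only input node 1; the product returns column 1,
# i.e. the weights from input node 1 to hidden nodes 1, 2 and 3.
from_input1 = np.dot(W_ih, np.array([1.0, 0.0, 0.0]))

print(w_input1_to_hidden2)
print(from_input1)
```

With this layout, the input to the whole hidden layer is a single matrix-vector product, with no transpose needed.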

We ignore the bias term for this exercise, i.e. we assume that the three input nodes take only the values $x_1, x_2, x_3$, respectively.

The first row of your training data is given as 
$$
\left( \begin{array}{ccc|r}
 x_1 &   x_2 &   x_3 & y \\ 
    0.5 &   -0.25 &   0.25 & 0.3\end{array} \right) 
$$

**import libraries**

import numpy as np
import scipy.special  # makes scipy.special.expit (the sigmoid) available

## Forward propagation

We are passing the data "from left to right".

Compute 

1. The input vector to the hidden layer, $\vec{z} = W_{i,h} \cdot \vec{x}$
2. The output of the hidden layer, $\vec{o} = \mathrm{sigmoid}(\vec{z})$
3. The prediction of the network, $\hat{y} = W_{h,o}^{\top} \cdot \vec{o}$ (a scalar, since $W_{h,o}$ is written as a column vector above)
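The three steps above can be sketched as follows. This is one possible layout, not a reference solution: the helper `sigmoid` and the variable names are assumptions, and `W_ho` is stored as a 1-D array so that `np.dot` returns a scalar.

```python
import numpy as np

def sigmoid(z):
    """Logistic function, applied element-wise."""
    return 1.0 / (1.0 + np.exp(-z))

W_ih = np.array([[1.0, 4.0, 7.0],
                 [2.0, 5.0, 8.0],
                 [3.0, 6.0, 9.0]])
W_ho = np.array([-1.0, 0.1, 1.0])

# First row of the training data.
x = np.array([0.5, -0.25, 0.25])

z = np.dot(W_ih, x)      # step 1: input to the hidden layer
o = sigmoid(z)           # step 2: output of the hidden layer
y_hat = np.dot(W_ho, o)  # step 3: prediction (identity output activation)

print("z     =", z)
print("o     =", o)
print("y_hat =", y_hat)
```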

## Backward propagation

We are propagating errors "from right to left" and updating the weights.

Assume a learning rate $\alpha = 0.1$.

Compute 

1. The prediction error $e = y - \hat{y}$
2. (Based on this error) the updates to the right-most weights $\Delta W_{h,o}$
3. The updates to the left-most weights $\Delta W_{i,h}$. (This involves the derivative of the sigmoid function!)
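A hedged sketch of these steps, written element by element with explicit loops. It assumes a squared-error loss $E = \frac{1}{2} e^2$ and plain gradient descent, so that $\Delta w_{h,o}(j) = \alpha \, e \, o_j$ and $\Delta w_{i,h}(j,k) = \alpha \, e \, w_{h,o}(j) \, o_j (1 - o_j) \, x_k$. The assignment does not state the loss explicitly, so treat this as one reading; all variable names are illustrative.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

W_ih = np.array([[1.0, 4.0, 7.0],
                 [2.0, 5.0, 8.0],
                 [3.0, 6.0, 9.0]])
W_ho = np.array([-1.0, 0.1, 1.0])
x = np.array([0.5, -0.25, 0.25])
y = 0.3
alpha = 0.1

# Forward pass (needed to obtain o and y_hat).
o = sigmoid(np.dot(W_ih, x))
y_hat = np.dot(W_ho, o)

# Step 1: prediction error.
e = y - y_hat

# Step 2: updates to the hidden-to-output weights.
# The output activation is the identity, so no derivative factor appears.
dW_ho = np.zeros_like(W_ho)
for j in range(3):
    dW_ho[j] = alpha * e * o[j]

# Step 3: updates to the input-to-hidden weights.
# o[j] * (1 - o[j]) is the derivative of the sigmoid at hidden node j.
# The old W_ho is used here, before any update is applied.
dW_ih = np.zeros_like(W_ih)
for j in range(3):
    delta_j = e * W_ho[j] * o[j] * (1.0 - o[j])
    for k in range(3):
        dW_ih[j, k] = alpha * delta_j * x[k]

print("e     =", e)
print("dW_ho =", dW_ho)
print("dW_ih =\n", dW_ih)
```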

Do not use the matrix simplification of the book.

## Updated Weights

With the freshly updated weights, compute

1. The new prediction of the network.
2. The (hopefully reduced) new error.
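One way to check this is to wrap a single update in a function and compare the error before and after. The sketch below assumes the same squared-error gradient-descent rule as a per-element derivation would give; `forward` and `train_step` are hypothetical helper names, and `np.outer` is used only as a compact way of writing the per-element products.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(W_ih, W_ho, x):
    """Return hidden outputs and the scalar prediction."""
    o = sigmoid(np.dot(W_ih, x))
    return o, np.dot(W_ho, o)

def train_step(W_ih, W_ho, x, y, alpha):
    """One gradient-descent update; returns new copies of both weight arrays."""
    o, y_hat = forward(W_ih, W_ho, x)
    e = y - y_hat
    delta_h = e * W_ho * o * (1.0 - o)  # uses the old W_ho
    new_W_ho = W_ho + alpha * e * o
    new_W_ih = W_ih + alpha * np.outer(delta_h, x)
    return new_W_ih, new_W_ho

W_ih = np.array([[1.0, 4.0, 7.0],
                 [2.0, 5.0, 8.0],
                 [3.0, 6.0, 9.0]])
W_ho = np.array([-1.0, 0.1, 1.0])
x = np.array([0.5, -0.25, 0.25])
y = 0.3

_, y_hat_old = forward(W_ih, W_ho, x)
W_ih_new, W_ho_new = train_step(W_ih, W_ho, x, y, alpha=0.1)
_, y_hat_new = forward(W_ih_new, W_ho_new, x)

e_old = y - y_hat_old
e_new = y - y_hat_new
print("old error:", e_old)
print("new error:", e_new)
```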

## Extra Credit

1. Think about scaling of the inputs. What happens to the gradient for large input values such as the following?
$$
\left( \begin{array}{ccc|r}
 x_1 &   x_2 &   x_3 & y \\ 
    50 &   -25 &   25 & 30\end{array} \right) 
$$
2. Think about initialization of the weights. Would it be a good idea to start with zero weights, as in the following?
$$
W_{i,h} = \left( \begin{array}{ccc}
 0 &   0 &   0 \\ 
    0 &   0 &   0 \\ 
    0 &   0 &   0 \end{array} \right)
$$
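A small numerical experiment can help with both questions. The sketch below is only a starting point for your own reasoning, not an answer key: it evaluates the sigmoid derivative at the hidden layer for the original and the scaled inputs, and the hidden outputs for an all-zero $W_{i,h}$. The helper names are assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_prime(z):
    """Derivative of the sigmoid, s * (1 - s)."""
    s = sigmoid(z)
    return s * (1.0 - s)

W_ih = np.array([[1.0, 4.0, 7.0],
                 [2.0, 5.0, 8.0],
                 [3.0, 6.0, 9.0]])

x_small = np.array([0.5, -0.25, 0.25])
x_large = np.array([50.0, -25.0, 25.0])

# Slope of the activation at each hidden node; this factor multiplies
# every update of the input-to-hidden weights.
slope_small = sigmoid_prime(np.dot(W_ih, x_small))
slope_large = sigmoid_prime(np.dot(W_ih, x_large))

# With all-zero input-to-hidden weights, every hidden node sees z = 0.
W_zero = np.zeros((3, 3))
o_zero = sigmoid(np.dot(W_zero, x_small))

print("slope (original inputs):", slope_small)
print("slope (scaled inputs):  ", slope_large)
print("hidden outputs, zero weights:", o_zero)
```

Compare the two slope vectors, and consider what identical hidden outputs imply for how the hidden nodes can differentiate during training.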

**Tips**

1. Use `numpy.dot` for matrix multiplications.
2. You could use `scipy.special.expit(x)` for the sigmoid function or simply define it yourself.