# Assignment 2: Multilayered Neural Networks
Assignment [here](NNAssignment2_2022.pdf).

## 1. Weights initialization for a single neuron
Expected value of the potential:
$$\text{E}\{\xi\}=\text{E}\Biggl\{\sum_{i=1}^N w_ix_i\Biggr\} = \sum_{i=1}^N \text{E}\{w_i\} \text{E}\{x_i\}$$
since the weights are independent of the inputs.
Because the $w_i$ are uniformly distributed on the interval $\langle -a, a \rangle$, they have mean $\mu=0$, so $\text{E}\{\xi\}=0$.

Variance of the potential:
$$\sigma_\xi^2 = \text{E}\{\xi^2\} - \text{E}^2\{\xi\}$$
Since the expected value of the potential is 0:
$$\sigma_\xi^2 = \text{E}\Biggl\{\biggl(\sum_{i=1}^N w_ix_i\biggr)^2\Biggr\} - 0 = \sum_{i,j=1}^N \text{E}\{(w_iw_jx_ix_j)\}$$
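By mutual independence and $\text{E}\{w_i\}=0$, every cross term with $i \neq j$ vanishes, because $\text{E}\{w_iw_jx_ix_j\} = \text{E}\{w_i\}\text{E}\{w_j\}\text{E}\{x_ix_j\} = 0$, which leaves only the diagonal terms: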
$$\sigma_\xi^2 = \sum_{i=1}^N \text{E}\{(w_i)^2\}\text{E}\{(x_i)^2\}$$


Since weights are random variables with zero mean and are uniformly distributed in $\langle -a, a \rangle$:
$$\text{E}\{(w_i)^2\} = \int_{-a}^a w_i^2 \frac{1}{2a}\mathrm{d}w_i = \left. \frac{w_i^3}{6a} \right\rvert_{-a}^{a} = \frac{a^2}{3}$$
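
As a quick sanity check of the integral, the second moment can also be approximated numerically. The sketch below uses a midpoint Riemann sum; the function name `second_moment_uniform`, the half-width `a = 0.5` and the number of subintervals are illustrative assumptions, not values from the assignment.

```python
import numpy as np


def second_moment_uniform(a, n=100_000):
    """Approximate E{w^2} for w ~ U(-a, a) with a midpoint Riemann sum."""
    edges = np.linspace(-a, a, n + 1)
    mid = (edges[:-1] + edges[1:]) / 2   # midpoints of the subintervals
    dw = 2 * a / n                       # width of each subinterval
    density = 1 / (2 * a)                # uniform density on (-a, a)
    return np.sum(mid**2 * density * dw)


a = 0.5  # illustrative half-width (assumption)
print(second_moment_uniform(a), a**2 / 3)  # numeric estimate vs. closed form
```

The two printed numbers should agree closely, which supports the closed form $a^2/3$.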

Each input $x_i$ is normally distributed with $\mu = 0$ and $\sigma_x > 0$.

From the variance formula
$$\sigma_x^2 = \text{E}\{x^2\} - \text{E}^2\{x\}$$

and $\text{E}\{x\} = 0$ we get:

$$ \text{E}\{x^2\} = \sigma_x^2 + 0$$

We therefore need to solve for $a$ in:
$$\sigma_\xi^2 = \sum_{i=1}^N \text{E}\{(w_i)^2\}\text{E}\{(x_i)^2\} = 1$$
since we require $\sigma_\xi = 1$.

Substituting both second moments, and assuming every input has the same standard deviation $\sigma_x$:
$$\sigma_\xi^2 = \sum_{i=1}^N \frac{a^2}{3}\sigma_{x_i}^2 = 1$$

$$a^2\frac{N\sigma_x^2}{3} = 1$$

$$a = \frac{1}{\sigma_x}\sqrt{\frac{3}{N}}$$

For a general target standard deviation $A$ of the potential:
$$\sqrt{a^2\frac{N\sigma_x^2}{3}} = A$$
$$a = \frac{A}{\sigma_x}\sqrt{\frac{3}{N}}$$
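
The result can be checked empirically. The following is a minimal Monte Carlo sketch, assuming fresh weights and inputs are drawn for every sample; the helper names `init_halfwidth` and `empirical_potential_std`, the seed, the sample count and the example values of $N$, $\sigma_x$ and $A$ are all illustrative assumptions.

```python
import numpy as np


def init_halfwidth(n_inputs, sigma_x, target_std=1.0):
    """Half-width a of U(-a, a) so that the potential has std target_std."""
    return target_std / sigma_x * np.sqrt(3.0 / n_inputs)


def empirical_potential_std(n_inputs, sigma_x, target_std,
                            n_samples=100_000, seed=0):
    """Estimate the std of xi = sum_i w_i x_i by sampling w and x."""
    rng = np.random.default_rng(seed)
    a = init_halfwidth(n_inputs, sigma_x, target_std)
    w = rng.uniform(-a, a, size=(n_samples, n_inputs))        # weights
    x = rng.normal(0.0, sigma_x, size=(n_samples, n_inputs))  # inputs
    xi = np.sum(w * x, axis=1)                                # potentials
    return xi.std()


print(init_halfwidth(100, 1.0))               # a for N = 100, sigma_x = 1, A = 1
print(empirical_potential_std(20, 2.0, 1.5))  # should be close to A = 1.5
```

For $N = 100$, $\sigma_x = 1$ and $A = 1$ the formula gives $a = \sqrt{0.03} \approx 0.173$.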


## 2. Manual design of a neural network for computing a function

Let the extended input to a layer be [x1, x2, 1], where the trailing 1 is the extension for the bias.  
Outputs are read in big-endian bit order, so 2 => 10 and 1 => 01.


The extended weight matrix for the input-hidden layer is then:
<pre>-7  5
-7  5
 3 -7</pre>

And for the hidden-output layer:
<pre>-9 10  
 5 10
 4 -5</pre>

That is, the biases/thresholds are the last rows, and the output of the first layer for the input vector [1, 0] is sigmoid([-4, -2]), where

$$-4 = 1\cdot(-7) + 0\cdot(-7) + 1\cdot 3, \qquad -2 = 1\cdot 5 + 0\cdot 5 + 1\cdot(-7)$$

Then the final outputs will be:

| Input vector | Output |
| ---: | ---: |
| [0, 0] | [0.01026587, 0.98938538] |
| [1, 0] | [0.98827385, 0.02587887] |
| [0, 1] | [0.98827385, 0.02587887] |
| [1, 1] | [0.99984357, 0.98929105] |



```python
import numpy as np


def get_w():
    # Extended weight matrices; the last row of each holds the biases.
    return [
        np.array([[-7, 5], [-7, 5], [3, -7]], dtype=float),   # input-hidden
        np.array([[-9, 10], [5, 10], [4, -5]], dtype=float),  # hidden-output
    ]


def sigmoid(ksi):
    return 1 / (1 + np.exp(-ksi))


w1, w2 = get_w()
for inp in [[0, 0], [1, 0], [0, 1], [1, 1]]:
    inp = np.array([*inp, 1])       # extend the input with 1 for the bias
    h_out = np.dot(inp, w1)
    h_out = sigmoid(h_out)
    o_in = np.array([*h_out, 1])    # extend the hidden output with 1 for the bias
    o_out = np.dot(o_in, w2)
    o_out = sigmoid(o_out)
    print(o_out)
```


```
[0.01026587 0.98938538]
[0.98827385 0.02587887]
[0.98827385 0.02587887]
[0.99984357 0.98929105]
```
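
To make the big-endian reading of these outputs explicit, the sketch below thresholds each output and decodes the resulting bit pair into an integer. The 0.5 threshold and the helper names `forward` and `decode` are assumptions made for illustration; the weights are the ones given above.

```python
import numpy as np

# Extended weight matrices from above; the last row of each holds the biases.
W1 = np.array([[-7, 5], [-7, 5], [3, -7]], dtype=float)   # input-hidden
W2 = np.array([[-9, 10], [5, 10], [4, -5]], dtype=float)  # hidden-output


def sigmoid(ksi):
    return 1 / (1 + np.exp(-ksi))


def forward(x):
    """Forward pass with a constant 1 appended to each layer's input."""
    h = sigmoid(np.append(x, 1.0) @ W1)
    return sigmoid(np.append(h, 1.0) @ W2)


def decode(y, threshold=0.5):
    """Threshold outputs to bits and read them big-endian (first bit is the high bit)."""
    bits = (y > threshold).astype(int)
    return bits, int(bits[0] * 2 + bits[1])


for x in [[0, 0], [1, 0], [0, 1], [1, 1]]:
    bits, value = decode(forward(np.array(x, dtype=float)))
    print(x, bits, value)
```

Under this reading the four inputs decode to 1, 2, 2 and 3 respectively.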


## 3. Weights update

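```python
import numpy as np

lambda_ = 2.0  # sigmoid slope used in the derivative terms below


def sigmoid2(ksi):
    # Sigmoid with slope lambda_ (defined here, but not called below).
    return 1 / (1 + np.exp(-lambda_ * ksi))


# Extended weight matrices; the last row of each holds the biases.
w1 = np.array([[1.1, -2.2],
               [0.5, 0.9],
               [0.0, -0.4]])

w2 = np.array([[2.0, 0.9],
               [-1.0, 1.1],
               [0.5, -0.5]])

p = np.array([-1, 1])      # input pattern
d = np.array([0.2, 0.4])   # desired output

lr = 1.5  # learning rate

# through first layer
# NOTE: the forward pass calls `sigmoid` (slope 1) from Section 2, not
# `sigmoid2`; the printed result below was produced this way.
i0 = np.array([*p, 1.])
h = np.dot(i0, w1)
h_out = sigmoid(h)

# through second layer
i1 = np.array([*h_out, 1.])
o = np.dot(i1, w2)
out = sigmoid(o)

error2 = d - out

# backpropagation
d2 = lambda_ * out * (1 - out) * error2
w2 += lr * np.outer(i1, d2)

# NOTE: w2 has already been updated at this point, so the hidden-layer
# error is propagated through the new hidden-output weights.
error1 = d2 @ np.transpose(w2[:-1])
d1 = lambda_ * h_out * (1 - h_out) * error1

w1 += lr * np.outer(i0, d1)

print("New in-h weights")
print(w1)
print("New h-out weights")
print(w2)
```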


```
New in-h weights
[[ 1.40785247 -2.21943588]
 [ 0.19214753  0.91943588]
 [-0.30785247 -0.38056412]]
New h-out weights
[[ 1.90411386  0.83298221]
 [-1.25356135  0.92277813]
 [ 0.22939793 -0.68913215]]
```
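
The cell above evaluates the forward pass with `sigmoid` while the deltas are scaled by `lambda_`, and it propagates the error through `w2` after `w2` has been updated. For comparison, here is a hedged sketch of one plain gradient-descent step that uses the slope-$\lambda$ sigmoid in both passes and the hidden-output weights from before the update. It assumes the squared-error loss $E = \tfrac{1}{2}\lVert d - y\rVert^2$; the names `sigmoid_lam`, `forward` and `backprop_step` are hypothetical. Its results will differ from the numbers printed above, so none are quoted here.

```python
import numpy as np


def sigmoid_lam(ksi, lam):
    """Sigmoid with slope lam: 1 / (1 + exp(-lam * ksi))."""
    return 1 / (1 + np.exp(-lam * ksi))


def forward(p, w1, w2, lam):
    """Return the extended inputs of both layers, hidden output and network output."""
    i0 = np.append(p, 1.0)                 # input extended with 1 for the bias
    h_out = sigmoid_lam(i0 @ w1, lam)
    i1 = np.append(h_out, 1.0)             # hidden output extended with 1
    out = sigmoid_lam(i1 @ w2, lam)
    return i0, h_out, i1, out


def backprop_step(p, d, w1, w2, lam=2.0, lr=1.5):
    """One gradient step on E = 0.5 * ||d - out||^2; returns new (w1, w2)."""
    i0, h_out, i1, out = forward(p, w1, w2, lam)
    d2 = lam * out * (1 - out) * (d - out)            # output-layer deltas
    d1 = lam * h_out * (1 - h_out) * (w2[:-1] @ d2)   # uses w2 before its update
    return w1 + lr * np.outer(i0, d1), w2 + lr * np.outer(i1, d2)


w1 = np.array([[1.1, -2.2], [0.5, 0.9], [0.0, -0.4]])
w2 = np.array([[2.0, 0.9], [-1.0, 1.1], [0.5, -0.5]])
p, d = np.array([-1.0, 1.0]), np.array([0.2, 0.4])

new_w1, new_w2 = backprop_step(p, d, w1, w2)
print(new_w1)
print(new_w2)
```

Returning new arrays instead of updating in place keeps the old `w2` available for the hidden-layer deltas, which is the point where the two variants differ.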
