# Neural Network introduction

### First step: the Jacobian for vector inputs and the multivariate chain rule

If a function takes a vector of variables as input and each element of that vector is itself a function of $t$, the derivative $\frac{df}{dt}$ can be calculated as follows:
$$
f(x_1,x_2,...) = f(\vec{x})
$$
$$
[x_1(t) = ..., x_2(t) = ...]
$$
$$
\frac{df}{dt} = \frac{\delta f}{\delta \vec{x}}*\frac{d\vec{x}}{dt} = \vec{J}*\frac{d\vec{x}}{dt}
$$
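As a quick numerical check of this rule, the sketch below compares $\vec{J}*\frac{d\vec{x}}{dt}$ with a finite-difference estimate of $\frac{df}{dt}$. The functions `f`, `x_of_t` and the point `t = 0.7` are hypothetical choices made only for illustration; they do not come from the text above.

```python
import numpy as np

# Hypothetical example functions, chosen only for illustration.
def x_of_t(t):
    return np.array([np.cos(t), np.sin(t)])

def f(x):
    return x[0] ** 2 * x[1]

def jacobian_f(x):
    # Row vector of partial derivatives [df/dx1, df/dx2].
    return np.array([2 * x[0] * x[1], x[0] ** 2])

def dx_dt(t):
    # Element-wise derivatives of x_of_t.
    return np.array([-np.sin(t), np.cos(t)])

t = 0.7
chain_rule = jacobian_f(x_of_t(t)) @ dx_dt(t)

# Central finite difference as an independent estimate of df/dt.
h = 1e-6
numeric = (f(x_of_t(t + h)) - f(x_of_t(t - h))) / (2 * h)

print(chain_rule, numeric)
```

The two printed numbers should agree to several decimal places, which is what the chain rule predicts.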

Neural networks often use a chain of functions. For example:
$$
f(\vec{x}) = f(x_1,x_2)
$$
$$
\vec{x}(\vec{u}) = 
\begin{bmatrix}
x_1(u_1,u_2) \\
x_2(u_1,u_2)
\end{bmatrix}
$$
$$
\vec{u}(t) = 
\begin{bmatrix}
u_1(t) \\
u_2(t)
\end{bmatrix}
$$

In this example the derivative of $f(\vec{x})$ is a Jacobian row vector, and the derivative of $\vec{u}(t)$ is a column vector of ordinary derivatives. The function $\vec{x}(\vec{u})$ is more involved: its derivative is a matrix in which each row is the Jacobian of one element of $\vec{x}$ with respect to the variables of $\vec{u}$:
$$
\frac{df}{dt} = \frac{\delta f}{\delta \vec{x}}*\frac{\delta \vec{x}}{\delta \vec{u}}*\frac{d\vec{u}}{dt} = 
\begin{bmatrix}
\frac{\delta f}{\delta x_1} & \frac{\delta f}{\delta x_2}
\end{bmatrix} *
\begin{bmatrix}
\frac{\delta x_1}{\delta u_1} & \frac{\delta x_1}{\delta u_2} \\
\frac{\delta x_2}{\delta u_1} & \frac{\delta x_2}{\delta u_2}
\end{bmatrix} *
\begin{bmatrix}
\frac{du_1}{dt} \\
\frac{du_2}{dt}
\end{bmatrix} = \vec{J_f} *
\begin{bmatrix}
\vec{J_{x_1}} \\
\vec{J_{x_2}}
\end{bmatrix} *
\begin{bmatrix}
\frac{du_1}{dt} \\
\frac{du_2}{dt}
\end{bmatrix}
$$
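The matrix product above can be sketched in NumPy as follows. The concrete functions `u_of_t`, `x_of_u` and `f` are assumptions picked for illustration only; the point is that the row vector, the matrix and the column vector multiply in the order shown in the formula.

```python
import numpy as np

# Hypothetical functions, chosen only to illustrate the chain of Jacobians.
def u_of_t(t):
    return np.array([t ** 2, np.sin(t)])

def du_dt(t):
    return np.array([2 * t, np.cos(t)])

def x_of_u(u):
    return np.array([u[0] * u[1], u[0] + u[1]])

def jac_x(u):
    # Row i holds the partial derivatives of x_i with respect to u_1 and u_2.
    return np.array([[u[1], u[0]],
                     [1.0, 1.0]])

def f(x):
    return x[0] ** 2 + 3 * x[1]

def jac_f(x):
    # Row vector [df/dx1, df/dx2].
    return np.array([2 * x[0], 3.0])

t = 1.3
u = u_of_t(t)
x = x_of_u(u)

# (1x2 row vector) @ (2x2 matrix) @ (2-element column vector) -> scalar
df_dt = jac_f(x) @ jac_x(u) @ du_dt(t)

# Central finite difference through the whole composition as a cross-check.
h = 1e-6
numeric = (f(x_of_u(u_of_t(t + h))) - f(x_of_u(u_of_t(t - h)))) / (2 * h)

print(df_dt, numeric)
```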

### Concept of Neural Network

The simplest example of a neural network is a single function called sigma (the activation function) with one input (activity), one output and two numbers: a weight and a bias.
$$
a^1 = \sigma (w^1 a^0 + b)
$$

<img src="img/Simpliest_NN.png" alt="Simplest NN" style="display:block;margin-left: auto;margin-right: auto;width: 30%;" />

This formula describes one neuron of a neural network. The scalar input can be replaced by a vector, for example with 2 input elements:

<img src="img/2IN_1N.png" alt="2 inputs 1 neuron" style="display:block;margin-left: auto;margin-right: auto;width: 30%;" />

General formula of a 1-layer neural network with a single neuron:

$$
a^1 = \sigma (\vec{w^1}*\vec{a^0} + b)
$$

<img src="img/1Layer_NN.png" alt="1 Layer NN" style="display:block;margin-left: auto;margin-right: auto;width: 30%;" />

This formula still describes one neuron. If we want more neurons (that is, more outputs), the output becomes a vector and the weights become a matrix with one row of weights for each element of the output vector. The formula then looks like this:
$$
\vec{a^1} = \sigma (W^1 * \vec{a^0} + \vec{b^1})
$$
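A minimal NumPy sketch of this one-layer formula is shown below. The sizes (3 inputs, 2 neurons), the input values and the random weights are hypothetical and serve only to show the shapes involved.

```python
import numpy as np

def sigma(z):
    # Sigmoid activation, applied element-wise.
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)

a0 = np.array([0.5, -1.0, 2.0])   # input vector with 3 elements (assumed values)
W1 = rng.normal(size=(2, 3))      # one row of 3 weights per output neuron
b1 = np.zeros(2)                  # one bias per output neuron

a1 = sigma(W1 @ a0 + b1)          # output vector with 2 elements

print(a1.shape)
```

Each row of `W1` plays the role of the weight vector of one neuron, so the matrix product evaluates all neurons of the layer at once.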

This neural network has only one layer. With one more layer, the formulas look like this:

$$
\vec{a^1} = \sigma (W^1 * \vec{a^0} + \vec{b^1})
$$
$$
\vec{a^2} = \sigma (W^2 * \vec{a^1} + \vec{b^2})
$$

<img src="img/2Layer_NN.png" alt="2-Layers NN" style="display:block;margin-left: auto;margin-right: auto;width: 20%;" />

This is an example of a 2-layer neural network. The general formula for $L$ layers, where the output is produced by layer $L$:
$$
\vec{a^L} = \sigma (W^L * \vec{a^{L-1}} + \vec{b^L})
$$

<img src="img/L-Layer_NN.png" alt="L Layer NN" style="display:block;margin-left: auto;margin-right: auto;width: 30%;" />
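The layer-by-layer formula maps naturally onto a loop. The following is a sketch under assumed layer sizes and random weights (the helper name `forward` and the list `sizes` are illustrative, not part of the text above):

```python
import numpy as np

def sigma(z):
    # Sigmoid activation, applied element-wise.
    return 1.0 / (1.0 + np.exp(-z))

def forward(a0, weights, biases):
    """Apply a^l = sigma(W^l a^(l-1) + b^l) for every layer in order."""
    a = a0
    for W, b in zip(weights, biases):
        a = sigma(W @ a + b)
    return a

rng = np.random.default_rng(1)
sizes = [3, 4, 4, 2]  # hypothetical sizes: 3 inputs, two layers of 4, 2 outputs

# W^l has shape (outputs of layer l, outputs of layer l-1).
weights = [rng.normal(size=(n_out, n_in))
           for n_in, n_out in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros(n_out) for n_out in sizes[1:]]

aL = forward(np.array([0.1, 0.2, 0.3]), weights, biases)
print(aL)
```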

Sigma is a function (the sigmoid) whose output moves smoothly from inactive (close to 0) to active (close to 1) as the input value passes a threshold.

<img src="img/sigma_f.png" alt="Sigmoid function" style="display:block;margin-left: auto;margin-right: auto;width: 30%;" />
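To get a feel for the shape of the curve, the sigmoid can be evaluated at a few sample points. The sample inputs below are arbitrary choices for illustration.

```python
import numpy as np

def sigma(z):
    # Sigmoid: 1 / (1 + e^(-z)), output always strictly between 0 and 1.
    return 1.0 / (1.0 + np.exp(-z))

zs = np.array([-6.0, -2.0, 0.0, 2.0, 6.0])
values = sigma(zs)

# Large negative inputs give values near 0, large positive inputs give
# values near 1, and sigma(0) is exactly 0.5.
for z, v in zip(zs, values):
    print(f"sigma({z:+.1f}) = {v:.3f}")
```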



### Backpropagation

Any neural network needs suitable weights and biases. To find them we define a cost function, which gives us a tool for evaluating the output of the NN. The sum of squares is a popular cost function:
$$
C = \sum_i (a_i^L - y_i)^2
$$
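In code, the sum-of-squares cost is a one-liner. The output vector `a_L` and the target vector `y` below are made-up values used only to show the calculation.

```python
import numpy as np

def cost(a_L, y):
    # Sum of squared differences between network output and desired output.
    return np.sum((a_L - y) ** 2)

a_L = np.array([0.8, 0.1, 0.3])   # hypothetical network output
y = np.array([1.0, 0.0, 0.0])     # hypothetical desired output

C = cost(a_L, y)                  # 0.2^2 + 0.1^2 + 0.3^2, about 0.14
print(C)
```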

Here we can apply the multivariate chain rule (see above) to calculate the derivative of the cost function with respect to the weight vector $\vec{w}$. This gives a gradient that we can use to search for a minimum of the cost function.

<img src="img/cost.png" alt="Cost function" style="display:block;margin-left: auto;margin-right: auto;width: 30%;" />

Let's list all the functions we need to consider when calculating the derivatives, for example for a 1-layer NN:
$$
z^1 = \vec{w^1}*\vec{a^0} + b
$$
$$
a^1 = \sigma (z^1)
$$
$$
C = \sum_i (a_i^1 - y_i)^2
$$
This is exactly the form the chain rule needs. The derivative of the cost function with respect to the weights:
$$
\frac{\delta C}{\delta w} = \frac{\delta C}{\delta a^1}*\frac{\delta a^1}{\delta z^1}*\frac{\delta z^1}{\delta w}
$$
The derivative of the cost function with respect to the bias:
$$
\frac{\delta C}{\delta b} = \frac{\delta C}{\delta a^1}*\frac{\delta a^1}{\delta z^1}*\frac{\delta z^1}{\delta b}
$$
General form of the derivatives for the last layer $L$:
$$
\frac{\delta C}{\delta w^L} = \frac{\delta C}{\delta a^L}*\frac{\delta a^L}{\delta z^L}*\frac{\delta z^L}{\delta w^L}
$$
$$
\frac{\delta C}{\delta b^L} = \frac{\delta C}{\delta a^L}*\frac{\delta a^L}{\delta z^L}*\frac{\delta z^L}{\delta b^L}
$$
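The three factors of each product can be checked numerically for a single neuron. This is a sketch under assumed values for the input `a0`, the target `y`, the weights `w` and the bias `b`; the finite-difference comparison is added here only as a sanity check.

```python
import numpy as np

def sigma(z):
    return 1.0 / (1.0 + np.exp(-z))

def d_sigma(z):
    # sigma'(z) = sigma(z) * (1 - sigma(z)), equal to e^(-z) / (1 + e^(-z))^2.
    s = sigma(z)
    return s * (1.0 - s)

def cost(w, b, a0, y):
    return (sigma(w @ a0 + b) - y) ** 2

# Hypothetical values for one neuron with two inputs.
a0 = np.array([0.5, -1.5])
y = 1.0
w = np.array([0.3, -0.2])
b = 0.1

z1 = w @ a0 + b
a1 = sigma(z1)

dC_da = 2 * (a1 - y)            # dC/da^1
da_dz = d_sigma(z1)             # da^1/dz^1
dC_dw = dC_da * da_dz * a0      # dz^1/dw = a^0
dC_db = dC_da * da_dz * 1.0     # dz^1/db = 1

# Central finite differences as an independent check of both gradients.
h = 1e-6
num_dw = np.array([
    (cost(w + h * e, b, a0, y) - cost(w - h * e, b, a0, y)) / (2 * h)
    for e in np.eye(2)
])
num_db = (cost(w, b + h, a0, y) - cost(w, b - h, a0, y)) / (2 * h)

print(dC_dw, num_dw)
print(dC_db, num_db)
```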

### Simple Neural Network example

A network with N hidden layers, one input and one output:

```python
import numpy as np


class NeuralNetwork:
    def __init__(self, numHidLayers, layerSz):
        # Hidden layers: one (layerSz x layerSz) weight matrix and one bias vector per layer.
        self.NN_w = np.full((numHidLayers, layerSz, layerSz), 0.01)
        self.NN_b = np.full((numHidLayers, layerSz), 0.01)
        # Input layer: maps the scalar input to layerSz activations.
        self.NN_sw = np.full(layerSz, 0.01)
        self.NN_sb = 0.01
        # Output layer: maps layerSz activations to one scalar output.
        self.NN_lw = np.full(layerSz, 0.01)
        self.NN_lb = 0.01
        # Scratch buffers reused by calc().
        self.tmpVector = np.empty(layerSz)
        self.inputVector = np.empty(layerSz)

    def calc(self, inputValue):
        def sig(x):
            return 1 / (1 + np.exp(-x))

        # Input layer: a = sigma(w * input + b) for each neuron.
        for row in range(len(self.NN_sw)):
            self.inputVector[row] = sig(inputValue * self.NN_sw[row] + self.NN_sb)
        # Hidden layers: z = W a + b, computed row by row, then a = sigma(z).
        for l in range(len(self.NN_w)):
            for row in range(len(self.NN_w[l])):
                self.tmpVector[row] = self.inputVector @ self.NN_w[l][row]
                self.tmpVector[row] += self.NN_b[l][row]
            self.inputVector[:] = sig(self.tmpVector[:])
        # Output layer: a single neuron.
        return sig(self.inputVector @ self.NN_lw + self.NN_lb)


test = NeuralNetwork(1, 12)
print(test.calc(10.5))
```

```
0.5180427613584655
```


Let's train our neural network.

To do this we need to apply the backpropagation method to the weights and biases. For example, for a 3-layer NN:
$$
\frac{\delta C}{\delta w^3} = \frac{\delta C}{\delta a^3}*\frac{\delta a^3}{\delta z^3}*\frac{\delta z^3}{\delta w^3}
$$
$$
\frac{\delta C}{\delta a^3} = 2*(a^3 - y)
$$
$$
\frac{\delta a^3}{\delta z^3} = \frac{\delta}{\delta z^3}(1+e^{-z^3})^{-1} = e^{-z^3} * (1+e^{-z^3})^{-2}
$$
$$
\frac{\delta z^3}{\delta w^3} = a^2
$$
To calculate the derivative of the cost function with respect to $w^2$, we need to add the partial derivative of $z^3$ with respect to $a^2$ and then continue the chain through layer 2:
$$
\frac{\delta C}{\delta w^2} = \frac{\delta C}{\delta a^3}*\frac{\delta a^3}{\delta z^3}*\frac{\delta z^3}{\delta a^2}*\frac{\delta a^2}{\delta z^2}*\frac{\delta z^2}{\delta w^2}
$$
To calculate the derivative of the cost function with respect to $w^1$, we also need to add the partial derivative of $z^2$ with respect to $a^1$ and continue the chain through layer 1:
$$
\frac{\delta C}{\delta w^1} = \frac{\delta C}{\delta a^3}*\frac{\delta a^3}{\delta z^3}*\frac{\delta z^3}{\delta a^2}*\frac{\delta a^2}{\delta z^2}*\frac{\delta z^2}{\delta a^1}*\frac{\delta a^1}{\delta z^1}*\frac{\delta z^1}{\delta w^1}
$$
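The three chains above share their leading factors, so they can be computed in one backward pass. The sketch below does this for a hypothetical 3-layer network with one scalar neuron per layer (all weights, biases, the input `a0` and the target `y` are assumed values), and compares the result with finite differences.

```python
import numpy as np

def sigma(z):
    return 1.0 / (1.0 + np.exp(-z))

def d_sigma(z):
    # Same value as e^(-z) / (1 + e^(-z))^2.
    s = sigma(z)
    return s * (1.0 - s)

def forward(a0, w, b):
    """Return z = [z^1, z^2, z^3] and a = [a^0, a^1, a^2, a^3]."""
    a, z = [a0], []
    for wl, bl in zip(w, b):
        z.append(wl * a[-1] + bl)
        a.append(sigma(z[-1]))
    return z, a

def cost(a0, w, b, y):
    return (forward(a0, w, b)[1][-1] - y) ** 2

# Hypothetical scalar network: w[l] and b[l] belong to layer l + 1.
a0, y = 0.8, 0.2
w = [0.5, -1.2, 0.7]
b = [0.1, 0.0, -0.3]
z, a = forward(a0, w, b)

# delta holds dC/dz for the current layer, starting at the output layer.
delta = 2 * (a[3] - y) * d_sigma(z[2])
grads = [0.0, 0.0, 0.0]
for l in (2, 1, 0):
    grads[l] = delta * a[l]                    # dz/dw of this layer is its input
    if l > 0:
        # Step back one layer: multiply by dz/da (= w) and da/dz of the layer below.
        delta = delta * w[l] * d_sigma(z[l - 1])

# Finite-difference check of each weight gradient.
h = 1e-6
numeric = []
for l in range(3):
    wp, wm = list(w), list(w)
    wp[l] += h
    wm[l] -= h
    numeric.append((cost(a0, wp, b, y) - cost(a0, wm, b, y)) / (2 * h))

print(grads)
print(numeric)
```

Reusing `delta` from layer to layer is the reason the method is called backpropagation: each earlier layer only adds two more factors to a product that has already been computed.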

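```python
import numpy as np


def d_sig(x):
    # Derivative of the sigmoid: e^{-x} / (1 + e^{-x})^2.
    tmpVal = np.exp(-x)
    return tmpVal / ((1 + tmpVal) ** 2)


def latest_deriv_w(lZ, desireA, lA, preA):
    # dC/dw for the last layer: 2 * (a^L - y) * sigma'(z^L) * a^{L-1}.
    # lZ: pre-activation z^L of the last layer; lA: its activation a^L;
    # desireA: desired output y; preA: previous-layer activation a^{L-1}.
    return 2 * (lA - desireA) * d_sig(lZ) * preA
```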