# Shallow Neural Network


### Table of Contents

* [1. Neural Network Representation](#chapter1)
    * [1.1 Representation](#section_1_1)
    * [1.2 Computing a Neural Network's Output](#section_1_2)



# 1. Neural Network Representation <a class="anchor" id="chapter1"></a>

## 1.1 Representation <a class="anchor" id="section_1_1"></a>

<center><img src="images/03-shallow neural network/NN-representation.PNG" width = "400px"></center>

- The input features x1, x2, x3 are stacked vertically. This is called the <b>input layer</b> of the neural network.
- The next layer of circles is called the <b>hidden layer</b> of the neural network.
- The final layer, here a single node, is called the <b>output layer</b>. It is responsible for generating the predicted value.

> This neural network has <b>two layers</b>: by convention, the input layer is not counted.

## 1.2 Computing a Neural Network's Output <a class="anchor" id="section_1_2"></a>

A node in a layer performs two steps of computation:

- 1: It computes the linear combination $z = w^T x + b$
- 2: It applies the activation function to z: $a = \sigma(z)$

<center><img src="images/03-shallow neural network/single-node.PNG" width = "300px"></center>

Every node in the network, including each node in the hidden layer, performs these two steps.
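
The two steps of a single node can be sketched in NumPy as follows. This is a minimal illustration, assuming that the activation σ is the logistic sigmoid; the helper name `sigmoid` and the numeric values of `x`, `w` and `b` are made up for the example and do not come from the course material.

```python
import numpy as np

def sigmoid(z):
    """Logistic sigmoid, applied element-wise (assumed activation)."""
    return 1.0 / (1.0 + np.exp(-z))

# Illustrative values only (hypothetical, not taken from the text).
x = np.array([[1.0], [2.0], [3.0]])    # input features, shape (3, 1)
w = np.array([[0.1], [-0.2], [0.3]])   # weights of one node, shape (3, 1)
b = 0.5                                # bias of that node (scalar)

z = (w.T @ x).item() + b   # step 1: linear combination z = w^T x + b
a = sigmoid(z)             # step 2: activation a = sigma(z)

print(z, a)
```

Here `w.T @ x` is a (1, 3) by (3, 1) product, so it yields a single number, which is why `.item()` is used to extract it as a scalar.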

By convention, the notation is as follows:

$$ a^{[l]}_i $$
- a is the activation (input layer a[0] | hidden layer a[1] | output layer a[2])
- l is the index of the layer
- i is the index of the node within that layer

> Let's compute the activation of the first node in the hidden layer

<center><img src="images/03-shallow neural network/hidden-layer-node1.PNG" width = "300px"></center>

The input X and the weight vector of this node are:
$$X =\begin{bmatrix} x_1 \\ x_2 \\ x_3  \end{bmatrix} $$
$$W^{[1]}_1 =\begin{bmatrix} w_{11} \\ w_{12} \\ w_{13}  \end{bmatrix} $$

We have two steps to compute in this node:

$$ Z^{[1]}_1 = W^{[1]T}_1 X + b^{[1]}_1$$
$$ a^{[1]}_1 = \sigma(Z^{[1]}_1) $$ 

> We repeat this computation for each node in the hidden layer

<center><img src="images/03-shallow neural network/hidden-layer-node2.PNG" width = "300px"></center>

$$ 
    \begin{cases}
    Z^{[1]}_1 = W^{[1]T}_1 X + b^{[1]}_1 \\
    Z^{[1]}_2 = W^{[1]T}_2 X + b^{[1]}_2 \\
    Z^{[1]}_3 = W^{[1]T}_3 X + b^{[1]}_3 \\
    Z^{[1]}_4 = W^{[1]T}_4 X + b^{[1]}_4 
    \end{cases}
$$

$$ 
    \begin{cases}
    a^{[1]}_1 = \sigma(Z^{[1]}_1) \\
    a^{[1]}_2 = \sigma(Z^{[1]}_2) \\
    a^{[1]}_3 = \sigma(Z^{[1]}_3) \\
    a^{[1]}_4 = \sigma(Z^{[1]}_4) 
    \end{cases}
$$

When vectorizing, a useful rule of thumb is that the different nodes of a layer are stacked vertically.

$$Z^{[1]} = W^{[1]} X + b^{[1]} = \begin{bmatrix} ---w^{[1]T}_1---\\ ---w^{[1]T}_2--- \\ ---w^{[1]T}_3--- \\ ---w^{[1]T}_4---\end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3  \end{bmatrix} 
+ \begin{bmatrix} b^{[1]}_1\\ b^{[1]}_2 \\ b^{[1]}_3 \\ b^{[1]}_4 \end{bmatrix} 
= \begin{bmatrix} Z^{[1]}_1\\ Z^{[1]}_2 \\ Z^{[1]}_3 \\ Z^{[1]}_4\end{bmatrix}$$

$$ a^{[1]} = \sigma(Z^{[1]})= \begin{bmatrix} a^{[1]}_1\\ a^{[1]}_2 \\ a^{[1]}_3 \\ a^{[1]}_4\end{bmatrix}$$
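
To check that stacking the nodes vertically gives the same result as computing them one by one, the sketch below compares both versions for a hidden layer with 4 nodes and 3 input features. It assumes σ is the logistic sigmoid; the random values, the seed and the variable names (`W1`, `b1`, `z_loop`) are illustrative choices, not part of the original notes.

```python
import numpy as np

def sigmoid(z):
    """Logistic sigmoid, applied element-wise (assumed activation)."""
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)     # arbitrary seed, for reproducibility
x = rng.standard_normal((3, 1))    # one example with 3 features
W1 = rng.standard_normal((4, 3))   # row i holds w_i^T for hidden node i
b1 = rng.standard_normal((4, 1))   # one bias per hidden node

# Node-by-node version: z_i = w_i^T x + b_i for each of the 4 nodes.
z_loop = np.zeros((4, 1))
for i in range(4):
    w_i = W1[i, :].reshape(3, 1)   # weight vector of node i as a column
    z_loop[i, 0] = (w_i.T @ x).item() + b1[i, 0]
a_loop = sigmoid(z_loop)

# Vectorized version: one matrix product for the whole layer.
Z1 = W1 @ x + b1
a1 = sigmoid(Z1)

print(np.allclose(z_loop, Z1), np.allclose(a_loop, a1))
```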

We have completed the computation of the hidden layer. Now we apply the same process to the output layer.


> Output layer

<center><img src="images/03-shallow neural network/output-node.PNG" width = "300px"></center>

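There is only one node in the output layer, and it takes the four hidden-layer activations as input, so the notation will be:
$$W^{[2]} =\begin{bmatrix} w^{[2]}_{1} \\ w^{[2]}_{2} \\ w^{[2]}_{3} \\ w^{[2]}_{4} \end{bmatrix}^T = \begin{bmatrix} w^{[2]}_{1} & w^{[2]}_{2} & w^{[2]}_{3} & w^{[2]}_{4} \end{bmatrix} $$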
$$ Z^{[2]} = W^{[2]}a^{[1]} + b^{[2]}$$ 

$$ a^{[2]} = \sigma(Z^{[2]}) $$ 


Neural Network with one hidden layer -  Equations for one example:
$$Z^{[1]} = W^{[1]} X + b^{[1]} $$

$$ a^{[1]} = \sigma(Z^{[1]})$$

$$ Z^{[2]} = W^{[2]}a^{[1]} + b^{[2]}$$ 

$$ a^{[2]} = \sigma(Z^{[2]}) $$ 
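
The four equations above translate almost line for line into NumPy. The following is a sketch under stated assumptions: σ is taken to be the logistic sigmoid, the network has 3 inputs, 4 hidden nodes and 1 output node, and the function name `forward_one_example` as well as the small random initial weights are hypothetical choices made for illustration.

```python
import numpy as np

def sigmoid(z):
    """Logistic sigmoid, applied element-wise (assumed activation)."""
    return 1.0 / (1.0 + np.exp(-z))

def forward_one_example(x, W1, b1, W2, b2):
    """Forward pass of a network with one hidden layer, for one example.

    Shapes: x (n, 1), W1 (k, n), b1 (k, 1), W2 (1, k), b2 (1, 1).
    """
    Z1 = W1 @ x + b1    # hidden layer, linear step
    a1 = sigmoid(Z1)    # hidden layer, activation
    Z2 = W2 @ a1 + b2   # output layer, linear step
    a2 = sigmoid(Z2)    # output layer, activation (the prediction)
    return a2

rng = np.random.default_rng(1)            # arbitrary seed
x = rng.standard_normal((3, 1))           # one example, 3 features
W1 = rng.standard_normal((4, 3)) * 0.01   # 4 hidden nodes
b1 = np.zeros((4, 1))
W2 = rng.standard_normal((1, 4)) * 0.01   # 1 output node, 4 inputs
b2 = np.zeros((1, 1))

y_hat = forward_one_example(x, W1, b1, W2, b2)
print(y_hat.shape)   # (1, 1)
```

Note that `W2` has 4 columns because the output node receives one activation from each of the 4 hidden nodes.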

## 1.3 Vectorizing across multiple examples <a class="anchor" id="section_1_3"></a>

Now we want to vectorize the previous equations of our two-layer neural network across multiple training examples.

> We consider an input X with m examples. Each example has n features:

$$X=\begin{bmatrix} \vert & \vert & & \vert \\ X^{(1)} & X^{(2)} & \cdots & X^{(m)} \\ \vert & \vert & & \vert \end{bmatrix} \in \mathbb{R}^{n \times m}$$

> So if we consider a hidden layer with k nodes:

$$ W^{[1]} = \begin{bmatrix} ---w^{[1]T}_1---\\ ---w^{[1]T}_2--- \\ \vdots \\ ---w^{[1]T}_{k-1}--- \\ ---w^{[1]T}_{k}---\end{bmatrix} \in \mathbb{R}^{k \times n} $$ 

$$ b^{[1]} = \begin{bmatrix} b^{[1]}_1\\ b^{[1]}_2\\ \vdots \\ b^{[1]}_{k-1} \\ b^{[1]}_{k}\end{bmatrix}  \in \mathbb{R}^{k \times 1}$$

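The result for the hidden layer is a matrix with one row per hidden unit and one column per example (the bias $b^{[1]}$ is added to every column):

$$Z^{[1]} = W^{[1]} X + b^{[1]} =\begin{bmatrix} h_{11} & h_{12} & \cdots & h_{1m} \\ \vdots & \vdots & & \vdots \\ h_{k1} & h_{k2} & \cdots & h_{km} \end{bmatrix} = \begin{bmatrix} \vert & \vert & & \vert \\ Z^{[1](1)} & Z^{[1](2)} & \cdots & Z^{[1](m)} \\ \vert & \vert & & \vert \end{bmatrix} \in \mathbb{R}^{k \times m}$$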

- Vertically (down the rows), we have the hidden units
- Horizontally (across the columns), we have the training examples

    - h11 is the value of the first hidden unit (node 1) for the first example
    - h1m is the value of the first hidden unit (node 1) for the m-th (last) example

We apply the activation:

$$a^{[1]} = \sigma(Z^{[1]})=\begin{bmatrix} \vert & \vert & & \vert \\ a^{[1](1)} & a^{[1](2)} & \cdots & a^{[1](m)} \\ \vert & \vert & & \vert \end{bmatrix} \in \mathbb{R}^{k \times m}$$
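
The shapes above can be verified with a short NumPy sketch that computes the hidden layer for all m examples at once and compares it with a loop over the columns of X. The sizes `n`, `k`, `m`, the seed and the random values are arbitrary illustrative choices, and σ is again assumed to be the logistic sigmoid. In NumPy, adding `b1` of shape (k, 1) to a (k, m) matrix relies on broadcasting, which copies the bias across every column.

```python
import numpy as np

def sigmoid(z):
    """Logistic sigmoid, applied element-wise (assumed activation)."""
    return 1.0 / (1.0 + np.exp(-z))

n, k, m = 3, 4, 5                  # features, hidden units, examples
rng = np.random.default_rng(2)     # arbitrary seed
X = rng.standard_normal((n, m))    # column j is the example x^(j)
W1 = rng.standard_normal((k, n))
b1 = rng.standard_normal((k, 1))

# All examples at once: b1 (k, 1) is broadcast across the m columns.
Z1 = W1 @ X + b1
A1 = sigmoid(Z1)

# Reference version: process one example (one column) at a time.
A1_loop = np.zeros((k, m))
for j in range(m):
    x_j = X[:, j:j + 1]            # slicing keeps the shape (n, 1)
    A1_loop[:, j:j + 1] = sigmoid(W1 @ x_j + b1)

print(Z1.shape)                    # (4, 5)
print(np.allclose(A1, A1_loop))
```

Row i of `A1` holds the activations of hidden unit i over all examples, and column j holds the full hidden-layer activation for example j, matching the layout of the matrices above.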