<a href="https://colab.research.google.com/github/PaulToronto/Stanford-Andrew-Ng-Machine-Learning-Specialization/blob/main/2_1_2_Neural_network_model.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# 2.1.2 Neural network model

## 2.1.2.1 Neural network layer

<img src='https://drive.google.com/uc?export=view&id=1E16ACEMFRfLNBA4MLac7wpULyboDcVgJ'>

- a **layer** of neurons is a fundamental building block of most modern neural networks
- the layer above takes 4 numbers as input, and these 4 numbers are the inputs to each of its 3 neurons
- each of these 3 neurons is just implementing a little **logistic regression unit** or **logistic regression function**
    - a **unit** is a single neuron in the layer
- the 3 neurons output $0.3$, $0.7$ and $0.2$
    - these 3 values are called **activation values**
    - $\vec{a}$ is a vector of activation values, denoted $\vec{a}^{[1]}$ since these are the activation values in layer 1.
- the input layer is called **layer 0** and the layer number increases from left to right
- superscripts enclosed in square brackets are also used to denote the layer that $\vec{w}$ and $b$ belong to: $\vec{w}^{[1]}$, $b^{[1]}$
- $\vec{a}^{[1]}$ is passed to the **output layer** of this neural network
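
A layer like the one described above can be sketched in NumPy. This is only a minimal sketch: the input values, the weight matrix `W1` and the bias vector `b1` are made-up placeholders, not the values behind the figure, so the printed activations will not be $0.3$, $0.7$ and $0.2$.

```python
import numpy as np

def sigmoid(z):
    """Logistic function, applied element-wise."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical input: 4 numbers (placeholder values, not from the figure)
x = np.array([0.5, -1.2, 3.0, 0.8])

# Hypothetical parameters for layer 1: 3 units, each with 4 weights and 1 bias.
# Row j of W1 plays the role of w_j^{[1]}; b1[j] plays the role of b_j^{[1]}.
W1 = np.array([
    [0.2, -0.5, 0.1, 0.4],
    [-0.3, 0.8, -0.2, 0.1],
    [0.6, 0.1, -0.4, -0.7],
])
b1 = np.array([0.1, -0.2, 0.05])

# Each unit is a small logistic regression: g(w . x + b)
a1 = sigmoid(W1 @ x + b1)
print(a1)  # vector of 3 activation values, each between 0 and 1
```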

<img src='https://drive.google.com/uc?export=view&id=1i1T8-Dz0id56rnzj2LtK49TlHT0rnt6T'>

- the output of layer 1, $\vec{a}^{[1]}$, is the input to layer 2

$$
\vec{a}^{[1]} = \begin{bmatrix}0.3 \\ 0.7 \\ 0.2 \end{bmatrix}
$$

- the output of layer 2 is $a^{[2]}$ which in this example is equal to $0.84$
    - this is a probability

<img src='https://drive.google.com/uc?export=view&id=1KNeQ7VwIv6I05cUhcM9h1P2L22KBJu2z'>

- $a^{[2]}$, along with a threshold value, can be used to make a prediction
    - in this case, $\widehat{y} = 1$
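
The thresholding step can be written in a couple of lines. The value $0.84$ comes from the example above; the threshold of $0.5$ is an assumption here (it is the value used later in these notes for the digit example).

```python
a2 = 0.84         # output activation from the example
threshold = 0.5   # assumed decision threshold

# Predict 1 when the estimated probability reaches the threshold, else 0
y_hat = 1 if a2 >= threshold else 0
print(y_hat)  # prints 1
```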

## 2.1.2.2 More complex neural networks

<img src='https://drive.google.com/uc?export=view&id=1MDNBzgvdu8IigCRcnVHgoZBJVDxUvCqK'>

- We say this neural network has 4 layers, because Layer 0 is not counted
 - Layer 0 is the input layer
 - Layers 1, 2 and 3 are hidden layers
 - Layer 4 is the output layer

### Zoom into Layer 3

<img src='https://drive.google.com/uc?export=view&id=1LBaObWeIXC_9irTqjiuPCtfhyTIP0Cb_'>

- the input to Layer 3 is $\vec{a}^{[2]}$
- the output of Layer 3 is $\vec{a}^{[3]}$
- Layer 3 has 3 hidden neurons which are also called **hidden units**

$$
\begin{align}
\vec{a}^{[3]} &= \begin{bmatrix}a_1^{[3]} \\ a_2^{[3]} \\ a_3^{[3]}\end{bmatrix} \\
&= \begin{bmatrix}
g\left(\vec{w}_1^{[3]} \cdot \vec{a}^{[2]} + b_1^{[3]}\right) \\
g\left(\vec{w}_2^{[3]} \cdot \vec{a}^{[2]} + b_2^{[3]}\right) \\
g\left(\vec{w}_3^{[3]} \cdot \vec{a}^{[2]} + b_3^{[3]}\right)
\end{bmatrix}
\end{align}
$$



### General equation for an arbitrary unit $j$ and an arbitrary layer $l$

$$
a_{j}^{[l]} = g\left(\vec{w}_{j}^{[l]} \cdot \vec{a}^{[l - 1]} + b_{j}^{[l]}\right)
$$

- $g$ is the **activation function**, which might be the sigmoid function, but there are other activation functions
- an activation function outputs an **activation value**
- the input vector $\vec{x}$ is also denoted $\vec{a}^{[0]}$, so this formula also applies to layer 1
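
The general equation maps fairly directly onto code. The sketch below is one possible way to write it, looping over the units $j$ of a layer; the function name `dense` and the parameter layout (row $j$ of `W` holds $\vec{w}_j^{[l]}$) are choices made for this illustration. The vector $\vec{a}^{[1]}$ is taken from the earlier example, while `W2` and `b2` are hypothetical values.

```python
import numpy as np

def sigmoid(z):
    """Logistic function, one possible choice for g."""
    return 1.0 / (1.0 + np.exp(-z))

def dense(a_prev, W, b, g=sigmoid):
    """Compute a^{[l]} from a^{[l-1]}, one unit at a time.

    a_prev : activations of the previous layer, shape (n_prev,)
    W      : weights, shape (units, n_prev); row j is w_j^{[l]}
    b      : biases, shape (units,); entry j is b_j^{[l]}
    g      : activation function
    """
    units = W.shape[0]
    a = np.zeros(units)
    for j in range(units):
        z = np.dot(W[j], a_prev) + b[j]   # w_j . a^{[l-1]} + b_j
        a[j] = g(z)                       # a_j^{[l]}
    return a

# Usage: feed a^{[1]} from the earlier example into a 1-unit layer
a1 = np.array([0.3, 0.7, 0.2])
W2 = np.array([[0.5, -1.0, 2.0]])  # hypothetical weights, shape (1, 3)
b2 = np.array([0.1])               # hypothetical bias
a2 = dense(a1, W2, b2)
print(a2)
```

The explicit loop mirrors the per-unit formula; in practice the same result is usually computed in one step as `g(W @ a_prev + b)`.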

## 2.1.2.3 Inference: Making Predictions

To make inferences, an algorithm called **forward propagation** is used
- it is called *forward* propagation because the computation goes from left to right, from the input layer to the output layer
- **backward propagation** or **back propagation** is used for learning

### Example: Handwritten digit recognition

- for simplicity, we are only differentiating between the digits $0$ and $1$
- the image of a digit is an $8 \times 8$ grid of intensity values
    - ranging from $0$ (black) to $255$ (white)
- here is a $1$:

<img src='https://drive.google.com/uc?export=view&id=1ghIVLSU1bcpV-NVcqdq-PZImpeZstd2n'>

- here is the neural network:

<img src='https://drive.google.com/uc?export=view&id=1WcOqh0l6Ewvd38Dlu3Q6zp-VRjNgHazd'>

- $\vec{a}^{[3]}$ has a single entry, $a_1^{[3]}$, which is the probability of the digit being a $1$


### Steps

1. The first computation is to go from $\vec{x}$ to $\vec{a}^{[1]}$

$$
\vec{a}^{[1]} = \begin{bmatrix}
g\left(\vec{w}_{1}^{[1]} \cdot \vec{x} + b_{1}^{[1]}\right) \\
\vdots \\
g\left(\vec{w}_{25}^{[1]} \cdot \vec{x} + b_{25}^{[1]}\right)
\end{bmatrix}
$$

2. The next step is to compute $\vec{a}^{[2]}$

$$
\vec{a}^{[2]} = \begin{bmatrix}
g\left(\vec{w}_{1}^{[2]} \cdot \vec{a}^{[1]} + b_{1}^{[2]}\right) \\
\vdots \\
g\left(\vec{w}_{15}^{[2]} \cdot \vec{a}^{[1]} + b_{15}^{[2]}\right)
\end{bmatrix}
$$

3. Compute $\vec{a}^{[3]}$

$$
\vec{a}^{[3]} = \begin{bmatrix}
g\left(\vec{w}_{1}^{[3]} \cdot \vec{a}^{[2]} + b_{1}^{[3]}\right)
\end{bmatrix}
$$

4. Is $a_1^{[3]} \ge 0.5$ ?
 - if True: $\widehat{y} = 1$, image is digit $1$
 - if False: $\widehat{y} = 0$, image is digit $0$
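
The four steps above can be sketched end to end in NumPy. This is a rough illustration under several assumptions: the weights are random stand-ins for trained parameters, the image is random noise rather than a real digit, and scaling the pixel intensities to $[0, 1]$ is a choice made here that is not part of the notes. The prediction is therefore meaningless; only the shapes and the flow of the computation are the point.

```python
import numpy as np

def sigmoid(z):
    """Logistic function, applied element-wise."""
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)

# Random stand-ins for trained parameters (64 inputs -> 25 -> 15 -> 1 units)
W1, b1 = rng.normal(size=(25, 64)) * 0.1, np.zeros(25)
W2, b2 = rng.normal(size=(15, 25)) * 0.1, np.zeros(15)
W3, b3 = rng.normal(size=(1, 15)) * 0.1, np.zeros(1)

# Hypothetical 8 x 8 image with intensities 0..255, flattened to 64 values.
# Dividing by 255 is an assumption made for this sketch.
image = rng.integers(0, 256, size=(8, 8))
x = image.reshape(-1) / 255.0

a1 = sigmoid(W1 @ x + b1)    # step 1: x -> a^{[1]}, 25 values
a2 = sigmoid(W2 @ a1 + b2)   # step 2: a^{[1]} -> a^{[2]}, 15 values
a3 = sigmoid(W3 @ a2 + b3)   # step 3: a^{[2]} -> a^{[3]}, 1 value

# step 4: threshold a_1^{[3]} at 0.5
y_hat = 1 if a3[0] >= 0.5 else 0
print(a1.shape, a2.shape, a3.shape, y_hat)
```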

### Note:

Starting on the left with layers that have a large number of units (25 in our case) and moving right to layers with fewer units (15 in our case) is a typical choice in neural network architectures.


## 2.1.2.4 Lab - Neurons and Layers

https://colab.research.google.com/drive/1E2PBxhZHaD5VY69Ug713sFXKC25foFsu