# Theory of Neural Networks

We have seen the theory of a neuron through the perceptron model.
The perceptron model is a simple model of a neuron that takes multiple inputs, applies weights to them, and produces an output based on a threshold function. 
We saw that the perceptron can be used to classify data into two categories.

As soon as we tackle more sophisticated problems, this kind of model quickly shows its limits: it's simply not powerful enough to deliver good results.

## Machine Learning Approach

To overcome this, what we traditionally do in machine learning is improve the model by adding, for example, squared variables like $x_1^2$ and $x_2^2$. This allows us to build a polynomial model — a process known as *feature engineering*. Essentially, it's the art of crafting new input features from the existing ones, and it can require a lot of time, intuition, and expertise.
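
As a rough illustration of this kind of feature engineering, here is a minimal NumPy sketch. The helper name `add_squared_features` and the sample values are made up for this example; it only shows the idea of building $x_1^2$ and $x_2^2$ by hand.

```python
import numpy as np

def add_squared_features(X):
    """Append the square of each column to X (hypothetical helper).

    X has shape (n_samples, n_features); the result has shape
    (n_samples, 2 * n_features).
    """
    X = np.asarray(X, dtype=float)
    return np.hstack([X, X ** 2])

# Two illustrative samples with features (x1, x2)
X = np.array([[1.0, 2.0],
              [3.0, -1.0]])
X_poly = add_squared_features(X)
print(X_poly)  # columns: x1, x2, x1^2, x2^2
```

Every new feature here has to be chosen and coded by a human, which is exactly the manual work described above.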


## Deep Learning Approach

But we are doing deep learning, not classical machine learning.

In deep learning, we ask ourselves one question: **What happens if we connect this neuron to other neurons?**

Instead of manually crafting features, we let the model learn them automatically. The idea is to let the machine learn how to do its own feature engineering.  We do this by stacking multiple layers of neurons on top of each other, creating a deep neural network. Each layer learns to extract increasingly complex features from the input data, allowing the model to capture intricate patterns and relationships.

This is the essence of deep learning: using multiple layers of neurons to learn hierarchical representations of data. Each layer transforms the input data into a more abstract representation, enabling the model to learn complex functions and make accurate predictions.

Sounds incredible and hard, right? So let's fight fire with fire.

> We are building a neural network from scratch, using mathematics, matrix calculations, and 30 minutes. Let's go.

# Let's go

We aren't here to waste our time. So let's take the previous neuron that we built, and duplicate this bad boy.

Let's call the first neuron $N_1$ and the second one $N_2$, and put them together in a layer.

### Layer
We can think of a layer as a collection of neurons that work together to process the input data. \
Each neuron in the layer receives the same input, applies its own transformation, and produces an output.\
So there are as many outputs as there are neurons in the layer. \
In this case, we have two neurons in the layer, $N_1$ and $N_2$, so we have two outputs $z_1$ and $z_2$.

### Neuron
Each neuron in the layer transforms the input data, so each neuron has its own weight vector $W$ and bias $b$.
Using the same neuron model we had before, we can write the output of each neuron as:
$$
z = W \cdot x + b
$$
Where $W$ is the weight vector, $x$ is the input vector, and $b$ is the bias.
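
To make the formula concrete, here is a small NumPy sketch of a single neuron's output. The function name `neuron_output` and the numeric values are assumptions chosen for illustration, not values from the previous notebook.

```python
import numpy as np

def neuron_output(W, x, b):
    """Output of a single neuron before activation: z = W . x + b."""
    return np.dot(W, x) + b

W = np.array([0.5, -1.0])  # illustrative weight vector
x = np.array([2.0, 3.0])   # illustrative input (x1, x2)
b = 0.5                    # illustrative bias

z = neuron_output(W, x, b)
print(z)  # 0.5*2 + (-1.0)*3 + 0.5 = -1.5
```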

### Output equation: the same, but with matrices

The same input $x = (x_1, x_2)$ goes through both neurons at the same time, so we have two outputs $z_1$ and $z_2$.

We can write this as a matrix multiplication.
$$
Z = Wx + b
$$
$$
\begin{bmatrix}
z_1 \\
z_2
\end{bmatrix}
=
\begin{bmatrix}
W_{11} & W_{12} \\
W_{21} & W_{22}
\end{bmatrix}
\begin{bmatrix}
x_1 \\
x_2
\end{bmatrix}
+
\begin{bmatrix}
b_1 \\
b_2
\end{bmatrix}
$$
Where $W_{ij}$ is the weight of the $i$-th neuron for the $j$-th feature, and $b_i$ is the bias of the $i$-th neuron.

We now recover the same equation we found for the neuron in the previous notebook — but this time, we have two neurons working in parallel.

To confirm this, let's look at the expanded equation for neuron $N_1$:
$$
z_1 = W_{11} \cdot x_1 + W_{12} \cdot x_2 + b_1
$$

And for neuron $N_2$:
$$
z_2 = W_{21} \cdot x_1 + W_{22} \cdot x_2 + b_2
$$

We clearly see that both neurons are working in parallel, each with its own weights and bias.
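
The check above can also be done numerically. The sketch below uses arbitrary illustrative weights and stores $x$ and $b$ as 1-D NumPy arrays rather than column vectors (an implementation choice for brevity); it compares the matrix form $Z = Wx + b$ with the two expanded equations.

```python
import numpy as np

# Row i holds the weights of neuron N_i (illustrative values)
W = np.array([[0.5, -1.0],    # W11, W12
              [2.0,  0.25]])  # W21, W22
b = np.array([0.5, -1.0])     # b1, b2
x = np.array([2.0, 3.0])      # x1, x2

# Matrix form: both neurons computed at once
Z = W @ x + b

# Expanded form: one equation per neuron
z1 = W[0, 0] * x[0] + W[0, 1] * x[1] + b[0]
z2 = W[1, 0] * x[0] + W[1, 1] * x[1] + b[1]

print(Z, z1, z2)  # z1 = -1.5, z2 = 3.75
assert np.allclose(Z, [z1, z2])
```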

### Activation function
Now, we need to apply an activation function to the output of each neuron. \
Remember, the activation function transforms the output of the neuron to introduce non-linearity into the model. \
We can use the same activation function for both neurons, or we can use different activation functions. \
In this case, we will use the sigmoid activation function for both neurons.
$$
a(z) = \frac{1}{1 + e^{-z}}
$$
Where $a(z)$ is the activation function, and $z$ is the output of the neuron.
Applied to each neuron of the layer separately (element-wise), we can write this as:
$$
A = a(Z)
$$
$$
\begin{bmatrix}
a_1 \\
a_2
\end{bmatrix}
=
\begin{bmatrix}
a(z_1) \\
a(z_2)
\end{bmatrix}
=
\begin{bmatrix}
\frac{1}{1 + e^{-z_1}} \\
\frac{1}{1 + e^{-z_2}}
\end{bmatrix}
$$
Where $a_1$ and $a_2$ are the outputs of the activation function for neurons $N_1$ and $N_2$, respectively.
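
In code, the sigmoid is applied to the whole vector $Z$ at once, since NumPy evaluates it element by element. This is a minimal sketch with made-up values for $z_1$ and $z_2$; it ignores numerical-stability concerns for very large negative inputs.

```python
import numpy as np

def sigmoid(Z):
    """Element-wise sigmoid: a(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-Z))

Z = np.array([0.0, 2.0])  # illustrative z1, z2
A = sigmoid(Z)            # a1 = a(z1), a2 = a(z2)

print(A)  # a1 is exactly 0.5 because a(0) = 1 / (1 + 1)
```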

### Very important
>The output of the activation function is the final output of the layer. \
In this case, we have two outputs $a_1$ and $a_2$, which are the outputs of the first layer.

### Do you see why we use matrices?
What's stopping us from adding a new row to the matrix? \
Creating a new neuron means adding a new row to the weight matrix and a new entry to the bias vector.

Like this:
$$
W =
\begin{bmatrix}
W_{11} & W_{12} \\
W_{21} & W_{22} \\
W_{31} & W_{32}
\end{bmatrix}
\quad \text{and} \quad
b =
\begin{bmatrix}
b_1 \\
b_2 \\
b_3
\end{bmatrix}
$$

The output equation didn't change:
$$
Z = Wx + b
$$

But if we expand this, we have:

$$
\begin{bmatrix}
z_1 \\
z_2 \\
z_3
\end{bmatrix}
=
\begin{bmatrix}
W_{11} & W_{12} \\
W_{21} & W_{22} \\
W_{31} & W_{32}
\end{bmatrix}
\begin{bmatrix}
x_1 \\
x_2
\end{bmatrix}
+
\begin{bmatrix}
b_1 \\
b_2 \\
b_3
\end{bmatrix}
$$
Where $W_{ij}$ is the weight of the $i$-th neuron for the $j$-th feature, and $b_i$ is the bias of the $i$-th neuron.
>So we can add as many neurons as we want in the layer, and the output equation will remain the same.
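
The same point can be sketched in code: one function handles a layer of any size, and only the shape of $W$ and $b$ changes. The name `layer_forward` and the random weights are assumptions made for this illustration.

```python
import numpy as np

def sigmoid(Z):
    """Element-wise sigmoid activation."""
    return 1.0 / (1.0 + np.exp(-Z))

def layer_forward(W, b, x):
    """Forward pass of one layer: A = a(Wx + b).

    W has shape (n_neurons, n_features), b has shape (n_neurons,),
    x has shape (n_features,).
    """
    Z = W @ x + b
    return sigmoid(Z)

rng = np.random.default_rng(0)  # seeded only to make the run repeatable
x = np.array([2.0, 3.0])        # illustrative input (x1, x2)

for n_neurons in (2, 3, 5):
    W = rng.normal(size=(n_neurons, 2))  # one row per neuron
    b = np.zeros(n_neurons)              # one bias per neuron
    A = layer_forward(W, b, x)
    print(n_neurons, A.shape)            # one output per neuron
```

Adding a neuron never touches `layer_forward`; it only adds a row to `W` and an entry to `b`.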

### A layer is a collection of neurons that work together to process the input data: it can be seen as a huge and complex neuron.


## Adding a second layer