# Perceptron

A perceptron is an artificial neuron: the most basic unit of a neural network.

The figure below depicts a model of a perceptron. At first glance it may look complicated, but we'll go through it step by step.


## General
Here is the general idea.
The inputs $x$ are shown in green. In blue, you see the weights, one attributed to each input.
In orange, you have the sum of the input-weight products plus a new term, $b$, which represents the bias.
All of this is fed into an activation function, in red, which produces the output $y$.

If we abstract the whole perceptron into a single function, say $P$, we can say that the output $y$ is a function of the inputs $x$, collected in the vector $\mathbf{x}$.
Mathematically, we can formulate this as 
$$ y = P(\mathbf{x}) $$



![Complete perceptron [Image]](./assets/Perceptron_ALL.png)

## Sum

In the previous cell, we said that *one weight is attributed to each input*. 

More precisely, each input $x_i$ is multiplied by its corresponding weight $w_i$.

The sum operator $\Sigma$ adds up all $m$ input-weight products. This can be written as 
$$\sum^{m}_{i=1}x_iw_i$$
The bias $b$ is just a simple constant that we add to this whole sum. We will see later what it does.

At the end of the summation operation, we end up with the following expression, which we'll call $z$ for convenience.
 $$z = \sum^{m}_{i=1}x_iw_i + b$$

![Sum part of perceptron [Image]](./assets/Perceptron_SUM.png)
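
To make the sum concrete, here is a minimal Python sketch of $z = \sum x_iw_i + b$. The function name `weighted_sum` and the numbers are made up for illustration; they do not come from the figure.

```python
def weighted_sum(inputs, weights, bias):
    """Return z = sum(x_i * w_i) + b for a single perceptron."""
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    # Multiply each input by its own weight, add everything up, then add the bias.
    return sum(x * w for x, w in zip(inputs, weights)) + bias


# Illustrative values (assumed, m = 3 inputs).
x = [1.0, 2.0, 3.0]
w = [0.5, -1.0, 2.0]
b = 1.0

z = weighted_sum(x, w, b)  # 0.5 - 2.0 + 6.0 + 1.0
print(z)
```

Each input is paired with exactly one weight, which is why the sketch refuses lists of different lengths.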

## Activation function

The next (and last) operation is the activation function, $f$.
Just like a normal function in a programming context, $f$ takes $z$ as its argument and returns the output value $y$.

You can see one example of $f$ in the graph on the right. Here, the activation function is the *Heaviside step function*.

This function returns 0 for every $z$ smaller than $0$ and 1 for every $z$ greater than $0$. The value at exactly $z = 0$ is a matter of convention, so we leave it aside here.
Mathematically, that can be formulated as

$$y=\begin{cases} 0, & \text{if } z < 0, \\ 1, & \text{if } z > 0, \end{cases}$$
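
In code, this activation function could be sketched as below. The piecewise definition does not cover $z = 0$, so the sketch has to pick a value: it returns 0 there, which is an assumption (other conventions return 1 or 0.5).

```python
def heaviside(z):
    """Step activation: 0 for z < 0, 1 for z > 0.

    Assumption: z == 0 returns 0, since the definition leaves that case open.
    """
    return 1 if z > 0 else 0


for z in (-2.0, -0.1, 0.0, 0.1, 2.0):
    print(z, heaviside(z))
```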

If we expand, say, the case $z > 0$, we have:
$$\begin{align} 
z &> 0 \\ 
\Leftrightarrow \sum^{m}_{i=1}x_iw_i + b &> 0 && \text{(we replace } z \text{ by } \sum^{m}_{i=1}x_iw_i + b\text{)} \\ 
\Leftrightarrow \sum^{m}_{i=1}x_iw_i &> -b && \text{(we subtract } b \text{ from each side of the inequality)}
\end{align}$$

If we do the same for the case $z<0$, we get
$$y=\begin{cases} 
0, & \text{if } \sum^{m}_{i=1}x_iw_i < -b \\
1, & \text{if } \sum^{m}_{i=1}x_iw_i > -b 
\end{cases}$$

What does this last equation tell us?
It tells us that if the sum of the inputs times the weights is greater than some number $-b$, then the output $y$ is 1, else it is 0. 

This will be easier to understand with an example. Let's say the bias is $b = -1$, so that $-b = 1$:

$$y=\begin{cases} 
0, & \text{if } \sum^{m}_{i=1}x_iw_i < 1 \\
1, & \text{if } \sum^{m}_{i=1}x_iw_i > 1 
\end{cases}$$

This amounts to shifting the boundary of the activation function (the one in red on the graph below) from 0 to 1.
To have the boundary at 0, we just have to set the bias to 0.
If we want the boundary to be at 5, we set the bias to -5.

What this lets us do is control the **threshold** at which the inputs activate the neuron.
You can see this binary output, 1 or 0, as **on** or **off**. Thus (using this particular activation function) a neuron can either be activated or not activated.
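
The threshold effect of the bias can be checked with a small sketch. The function name `perceptron`, the weights and the inputs are illustrative assumptions; the output is 1 only when the weighted sum is strictly greater than $-b$, and 0 otherwise.

```python
def perceptron(inputs, weights, bias):
    """Heaviside perceptron: 1 if sum(x_i * w_i) + b > 0, else 0."""
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1 if z > 0 else 0


weights = [1.0, 1.0]  # assumed weights, so the weighted sum is simply x1 + x2

# With b = -1 the neuron turns on once the weighted sum exceeds 1.
print(perceptron([0.25, 0.5], weights, bias=-1.0))  # weighted sum 0.75 -> 0
print(perceptron([1.0, 0.5], weights, bias=-1.0))   # weighted sum 1.5  -> 1

# With b = -5 the weighted sum must now exceed 5.
print(perceptron([1.0, 0.5], weights, bias=-5.0))   # weighted sum 1.5 -> 0
print(perceptron([3.0, 2.5], weights, bias=-5.0))   # weighted sum 5.5 -> 1
```

The same inputs `[1.0, 0.5]` activate the neuron with $b = -1$ but not with $b = -5$: only the threshold changed.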



![Activation part of perceptron [Image]](./assets/Perceptron_activation.png)

If you don't want the neuron to be only **on** or **off**, but also somewhere in between, you can use another activation function. Some of them are shown below.

![Activation functions (Image)](./assets/activation-functions.jpg)
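
As a rough illustration, here are three commonly used activation functions written in plain Python. This selection (sigmoid, tanh, ReLU) is an assumption and may not match the figure exactly.

```python
import math


def sigmoid(z):
    """Smooth step between 0 and 1, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def tanh(z):
    """Smooth step between -1 and 1."""
    return math.tanh(z)


def relu(z):
    """Zero for negative z, identity for positive z."""
    return max(0.0, z)


for z in (-2.0, 0.0, 2.0):
    print(z, sigmoid(z), tanh(z), relu(z))
```

Unlike the step function, sigmoid and tanh return values in between their two extremes, so the neuron can be "partly on".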

The best part of all this is that you don't have to specify the weights and the biases yourself: **they are adjusted automatically** during training. That is what is meant when we say that a neural network **learns**.
The error at the output gets propagated back toward the input to modify the weights. We will learn more about this later.

![Weight update (Image)](./assets/weight_update.png)
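
To give a first feeling for what "adjusted automatically" can look like, here is a minimal sketch of the classic perceptron learning rule on a single neuron. This is a simpler rule than the error propagation used in full networks, and everything specific here (the AND-gate data, the learning rate of 1, the 20 epochs, the function names) is an assumption chosen for illustration.

```python
def predict(inputs, weights, bias):
    """Heaviside perceptron: 1 if sum(x_i * w_i) + b > 0, else 0."""
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1 if z > 0 else 0


def train(samples, epochs=20, learning_rate=1):
    """Perceptron learning rule: nudge weights and bias by the output error."""
    n_inputs = len(samples[0][0])
    weights = [0] * n_inputs
    bias = 0
    for _ in range(epochs):
        for inputs, target in samples:
            error = target - predict(inputs, weights, bias)  # -1, 0 or +1
            # A correct prediction (error == 0) leaves everything unchanged.
            weights = [w + learning_rate * error * x
                       for w, x in zip(weights, inputs)]
            bias += learning_rate * error
    return weights, bias


# Assumed toy data set: the logical AND of two binary inputs.
and_gate = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]

weights, bias = train(and_gate)
predictions = [predict(inputs, weights, bias) for inputs, _ in and_gate]
print(weights, bias, predictions)
```

We never wrote the weights or the bias by hand: they start at 0 and are corrected every time the neuron makes a mistake.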

## Multilayer perceptron (MLP)

Now that we've seen how a single neuron works, we can stack several of them to create a layer of neurons, as shown in the GIF below.
We have 784 inputs on the left, and every input is connected to every one of the 16 neurons on the right. 
The weights are represented by the thin lines connecting each input to each neuron.
Thus, **each of the 16 neurons you see below** computes the following sum, with its own 784 weights and its own bias.

$$z = \sum^{784}_{i=1}x_iw_i + b$$

![](./assets/part1.gif)
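
A layer like this one can be sketched in plain Python as a list of neurons that all read the same inputs. The random inputs and weights, the zero biases and the step activation are assumptions made only to have something to run.

```python
import random


def layer_forward(inputs, weights, biases):
    """Compute the output of every neuron in one layer.

    weights[j] holds the weights of neuron j (one per input),
    biases[j] holds its bias.
    """
    outputs = []
    for neuron_weights, bias in zip(weights, biases):
        z = sum(x * w for x, w in zip(inputs, neuron_weights)) + bias
        outputs.append(1 if z > 0 else 0)  # step activation (assumed)
    return outputs


rng = random.Random(0)  # fixed seed so the sketch is reproducible
n_inputs, n_neurons = 784, 16

x = [rng.random() for _ in range(n_inputs)]
weights = [[rng.uniform(-1, 1) for _ in range(n_inputs)]
           for _ in range(n_neurons)]
biases = [0.0] * n_neurons

outputs = layer_forward(x, weights, biases)
print(len(outputs), outputs)
```

With 784 inputs and 16 neurons, this single layer already holds 784 × 16 = 12544 weights and 16 biases.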

Now that we know how to create a layer, we can put layers one after another to create a network with multiple layers. This is where the name *multilayer perceptron* (MLP for short) comes from.

This means that the **output of a neuron in the first layer is an input to the neurons of the second layer.**

![](./assets/part2.gif)
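
Chaining layers can be sketched as a simple loop in which the outputs of one layer become the inputs of the next. The layer sizes (784, then 16, then 16), the random weights and the step activation are assumptions for illustration.

```python
import random


def layer_forward(inputs, weights, biases):
    """Output of one layer: one step-activated weighted sum per neuron."""
    return [
        1 if sum(x * w for x, w in zip(inputs, neuron_weights)) + bias > 0 else 0
        for neuron_weights, bias in zip(weights, biases)
    ]


def random_layer(rng, n_inputs, n_neurons):
    """Hypothetical helper: random weights and zero biases for one layer."""
    weights = [[rng.uniform(-1, 1) for _ in range(n_inputs)]
               for _ in range(n_neurons)]
    biases = [0.0] * n_neurons
    return weights, biases


def mlp_forward(inputs, layers):
    """Feed the inputs through every layer in order."""
    activations = inputs
    for weights, biases in layers:
        # The outputs of this layer are the inputs of the next one.
        activations = layer_forward(activations, weights, biases)
    return activations


rng = random.Random(0)  # fixed seed so the sketch is reproducible
layers = [random_layer(rng, 784, 16), random_layer(rng, 16, 16)]
x = [rng.random() for _ in range(784)]

y = mlp_forward(x, layers)
print(len(y), y)
```

Note that the second layer only needs 16 weights per neuron, because it receives the 16 outputs of the first layer rather than the 784 original inputs.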

## Supplementary reading material
I highly encourage you to watch the following video (from which the GIFs were extracted). It covers the material of this chapter; the other videos in the playlist are more advanced, and we will not go into that level of detail.

- [3Blue1Brown: But what is a neural network?](https://www.youtube.com/watch?v=aircAruvnKk)