# The Humble Perceptron

## History
---

In 1957, Frank Rosenblatt invented the perceptron, one of the first machine learning algorithms, modelled after biological neurons. When it was unveiled to the public in 1958, the perceptron was touted as the next big thing, with many believing that it would soon be able to "walk, talk, see, write, reproduce itself and be conscious of its existence"<sup>[1]</sup> (some of which only began to come to fruition some 60 years later). The Navy claimed it would be the first machine capable of "receiving, recognizing and identifying its surroundings without any human training or control"<sup>[1]</sup>. Despite the overhyped expectations, the perceptron paved the way for the field of AI. Unfortunately, its shortcomings, most notably its inability to learn patterns that are not linearly separable (such as XOR), caused neural network research to stall for well over a decade, until the advent of the multilayer perceptron and backpropagation in the 1980s.

<small>
[1]: <a href="https://nytimes.com/timesmachine/1958/07/08/83417341.html?pageNumber=25">“New Navy Device Learns By Doing,” New York Times, July 8, 1958.</a>
</small>

## Math behind it
---

**Perceptron Model**

The perceptron is a binary classifier that maps an input vector $ \mathbf{X} $ to a binary output $ \hat{y} \in \{0, 1\} $. It uses a simple linear model followed by a step activation function, defined as follows: 

$$
z = \mathbf{W} \cdot \mathbf{X} + b 
$$

$$
\hat{y} = \mathrm{H}(z) = \mathrm{H}(\mathbf{W} \cdot \mathbf{X} + b) =
\begin{cases}
    1, & \text{if } \mathbf{W} \cdot \mathbf{X} + b \ge 0, \\
    0, & \text{otherwise}
\end{cases}
$$

Where: 
- $ \mathbf{W} = [w_1, w_2, \dots, w_n] $: weight vector
- $ \mathbf{X} = [x_1, x_2, \dots, x_n] $: input vector (features)
- $ b $: bias term
- $ z $: the weighted sum of the inputs plus the bias (pre-activation)
- $ \hat{y} $: predicted label (0 or 1)
- $ \mathrm{H}(\cdot) $: [step activation function](../activation_functions/step/step.ipynb)
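
To make the forward pass concrete, here is a minimal NumPy sketch of $ \hat{y} = \mathrm{H}(\mathbf{W} \cdot \mathbf{X} + b) $ for a single input vector. The function names `step` and `predict` are my own, and the weights are hand-picked (not learned) so that this particular perceptron happens to compute logical AND:

```python
import numpy as np


def step(z: float) -> int:
    """Heaviside step: 1 if z >= 0, else 0."""
    return 1 if z >= 0 else 0


def predict(W: np.ndarray, X: np.ndarray, b: float) -> int:
    """Perceptron output y_hat = H(W . X + b) for a single input vector X."""
    z = float(np.dot(W, X) + b)  # linear pre-activation
    return step(z)


# Hand-picked (hypothetical) weights that make this perceptron behave like logical AND
W = np.array([1.0, 1.0])
b = -1.5
for X in ([0, 0], [0, 1], [1, 0], [1, 1]):
    print(X, predict(W, np.array(X), b))
```

With $ \mathbf{W} = [1, 1] $ and $ b = -1.5 $, only the input `[1, 1]` gives $ z = 0.5 \ge 0 $, so it is the only one classified as 1.
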

We can make a neat little simplification by absorbing the bias into the weights, defining a constant extra input $ x_0 = 1 $, which gives us:

$$
\hat{y} = \mathrm{H}( \mathbf{W} \cdot \mathbf{X}) = \mathrm{H}( \sum_{i=0}^n w_i x_i )
$$

Where: 
- $ x_0 = 1 $
- $ w_0 = b $
- $ \mathbf{W} $ and $ \mathbf{X} $ now have $ n + 1 $ entries, indexed from 0
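
A quick numerical check of the bias trick, as a sketch using randomly generated (purely illustrative) values: prepending $ w_0 = b $ to the weights and $ x_0 = 1 $ to the input leaves $ z $ unchanged.

```python
import numpy as np

rng = np.random.default_rng(0)

n = 3
W = rng.normal(size=n)   # original weights w_1..w_n
b = rng.normal()         # original bias
X = rng.normal(size=n)   # one input vector x_1..x_n

# Augment: prepend w_0 = b to the weights and x_0 = 1 to the input
W_aug = np.concatenate(([b], W))
X_aug = np.concatenate(([1.0], X))

z_with_bias = np.dot(W, X) + b
z_absorbed = np.dot(W_aug, X_aug)   # sum_{i=0}^{n} w_i x_i

print(np.isclose(z_with_bias, z_absorbed))  # True
```

For a whole dataset stored as an $ m \times n $ matrix, the same augmentation is typically done by prepending a column of ones, e.g. `np.hstack([np.ones((m, 1)), X])`.
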

<br><br>
**Activation Function**

The activation function is the Heaviside step function (using the convention $ \mathrm{H}(0) = 1 $):

$$
\mathrm{H}(z) = \begin{cases}
    1, & \text{if  } z  \ge 0, \\
    0, & \text{otherwise}
\end{cases}
$$
> For more intuition, see my implementation of the step function [here](../activation_functions/step/step.ipynb).
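
NumPy also provides a built-in `np.heaviside(x1, x2)`, where `x2` is the value returned wherever `x1 == 0`. Passing `1.0` reproduces the $ \mathrm{H}(0) = 1 $ convention used here. A small sketch comparing it against an explicit `np.where` version of the cases above:

```python
import numpy as np

z = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])

# Explicit definition from the cases above: 1 where z >= 0, else 0
H_where = np.where(z >= 0, 1, 0)

# NumPy's built-in; the second argument is the value used at z == 0
H_numpy = np.heaviside(z, 1.0)

print(H_where)   # [0 0 1 1 1]
print(H_numpy)   # [0. 0. 1. 1. 1.]
```

Other conventions exist; `np.heaviside(z, 0.5)`, for instance, returns 0.5 at $ z = 0 $. For the perceptron the choice only matters for inputs that land exactly on the decision boundary.
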


<br><br>
**Perceptron Update Rule**

When training a perceptron, we nudge the predicted label $ \hat{y} $ towards the true label $ y $ by updating each weight as follows:

$$
w_i \leftarrow w_i +  \eta \cdot (y - \hat{y}) \cdot x_i 
$$

Where: 
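- $ \eta > 0 $: the learning rate
- $ y $: the true label (0 or 1)
- $ x_i $: the $ i $-th input; since $ x_0 = 1 $, the bias $ w_0 = b $ is updated by the same rule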
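
A single application of the update rule, sketched in NumPy under the assumption that the bias has already been absorbed ($ x_0 = 1 $, $ w_0 = b $). The function name `perceptron_update` and the example numbers are hypothetical:

```python
import numpy as np


def perceptron_update(w: np.ndarray, x: np.ndarray, y: int, eta: float) -> np.ndarray:
    """One application of w_i <- w_i + eta * (y - y_hat) * x_i.

    Assumes the bias is already absorbed: x[0] == 1 and w[0] == b.
    """
    y_hat = 1 if np.dot(w, x) >= 0 else 0   # H(w . x)
    error = y - y_hat                       # one of -1, 0, +1
    return w + eta * error * x


w = np.array([0.0, 0.0, 0.0])      # [w_0 (bias), w_1, w_2]
x = np.array([1.0, 1.0, 0.0])      # [x_0 = 1, x_1, x_2]
w_new = perceptron_update(w, x, y=0, eta=0.1)
print(w_new)
```

Here $ \mathbf{W} \cdot \mathbf{X} = 0 $, so the perceptron predicts 1 while the label is 0. The error is $ -1 $, and the weights move by $ -\eta \mathbf{X} $, giving `[-0.1, -0.1, 0.0]`.
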


> **Note:** the error $ (y - \hat{y}) $ takes one of three values:
> - $ 0 $: the perceptron classified the input correctly, so the weights are left unchanged
> - $ 1 $: the output is too low (0 when it should have been 1), so $ \eta \mathbf{X} $ is added to the weights, which raises $ \mathbf{W} \cdot \mathbf{X} $
> - $ -1 $: the output is too high (1 when it should have been 0), so $ \eta \mathbf{X} $ is subtracted from the weights, which lowers $ \mathbf{W} \cdot \mathbf{X} $
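
Putting the pieces together, here is a minimal sketch of an online training loop that applies the update rule one example at a time and stops after a full pass with no mistakes. It is trained on logical AND, which is linearly separable, so the loop is guaranteed to stop. The function names, the zero initialisation and the `max_epochs` cap are my own choices, not part of the update rule itself:

```python
import numpy as np


def train_perceptron(X: np.ndarray, y: np.ndarray, eta: float = 1.0, max_epochs: int = 100):
    """Online perceptron training with the bias absorbed into w[0].

    X: (m, n) inputs, y: (m,) labels in {0, 1}.
    Returns the learned weights (length n + 1) and the number of epochs run.
    """
    X_aug = np.hstack([np.ones((X.shape[0], 1)), X])  # prepend x_0 = 1
    w = np.zeros(X_aug.shape[1])
    for epoch in range(1, max_epochs + 1):
        mistakes = 0
        for x_i, y_i in zip(X_aug, y):
            y_hat = 1 if np.dot(w, x_i) >= 0 else 0
            error = y_i - y_hat            # -1, 0 or +1 (the three cases above)
            if error != 0:
                w += eta * error * x_i     # w_i <- w_i + eta * (y - y_hat) * x_i
                mistakes += 1
        if mistakes == 0:                  # a full pass with no errors: converged
            return w, epoch
    return w, max_epochs


def predict_batch(w: np.ndarray, X: np.ndarray) -> np.ndarray:
    """Apply H(w . x) to every row of X (bias absorbed into w[0])."""
    X_aug = np.hstack([np.ones((X.shape[0], 1)), X])
    return (X_aug @ w >= 0).astype(int)


# Logical AND: linearly separable, so the loop stops before max_epochs
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 0, 0, 1])
w, epochs = train_perceptron(X, y)
print(w, epochs, predict_batch(w, X))
```

I use $ \eta = 1 $ here: with zero-initialised weights, $ \eta $ only rescales $ \mathbf{W} $ (in exact arithmetic) without changing any prediction, and integer-valued weights avoid floating-point round-off for points that land exactly on $ z = 0 $. Running the same loop on XOR, which is not linearly separable, never completes a mistake-free pass and simply stops at `max_epochs`.
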

## Code
---