# Perceptrons

A perceptron is a simple model of a biological neuron, paired with an algorithm for the supervised learning of binary classifiers. Put simply, a perceptron is an algorithm for learning a threshold function: a function that maps an input vector $x$ to either 0 or 1, thereby classifying that vector.

- Single-layer perceptrons can only learn *linearly separable* patterns.




### McCulloch and Pitts Neuron:
The single-layer perceptron, which builds on the McCulloch-Pitts model of a neuron, is the simplest feedforward neural network.


Single-Layer Perceptron             |  Alternative Diagram
:-------------------------:|:-------------------------:
![](images/perceptron-annotated.png)  |  ![](images/perceptron-alt.png)


- *Transfer function* or *activation function*: translates the neuron's input signals (the weighted sum plus bias) into an output signal. The two terms are usually used interchangeably.

- *Bias*: an offset added to the weighted sum of the inputs; a positive bias makes the neuron more predisposed to firing.

### Perceptron Learning Algorithm:
The process of 'learning' in a single-layer perceptron is the process of adjusting the weights until the perceptron produces the correct output for the training examples.

A prediction is made by taking the dot product of the input vector and the weight vector, adding the bias, and passing the result through an activation function (e.g. the Heaviside step function).
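
The prediction step can be sketched in a few lines of plain Python. This is a minimal illustration, not a reference implementation: the names `heaviside` and `predict` are hypothetical, and the weights and bias below are hand-picked (not learned) so that the neuron computes logical AND on binary inputs.

```python
def heaviside(s):
    """Step activation: 1 if s >= 0, else 0."""
    return 1 if s >= 0 else 0


def predict(x, w, b):
    """Perceptron prediction: step(w . x + b)."""
    s = sum(wi * xi for wi, xi in zip(w, x)) + b  # dot product plus bias
    return heaviside(s)


# Hand-picked weights and bias that make the neuron compute logical AND.
w = [1.0, 1.0]
b = -1.5
for x in ([0, 0], [0, 1], [1, 0], [1, 1]):
    print(x, predict(x, w, b))
```

Only the input `[1, 1]` pushes the weighted sum above the threshold set by the bias, so only it yields 1. With a positive bias (say `b = 0.5`) the same neuron fires even for the all-zero input, which is the "predisposed to firing" effect described above.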


<img src="images/perceptron-learning-rule.png" alt="Rule" style="width: 50%;"/>
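
Assuming the figure shows the standard perceptron learning rule, one common way to write it for each weight $w_k$, with target output $y$ and prediction $\hat{y}$ (symbols introduced here for illustration), is:

$$w_k \leftarrow w_k + \eta\,(y - \hat{y})\,x_k$$

Because $y - \hat{y}$ is $+1$, $-1$, or $0$, this single expression covers the three cases listed below. If the bias $b$ is treated as a weight on a constant input of 1, it is updated the same way: $b \leftarrow b + \eta\,(y - \hat{y})$.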

#### Possible Cases After Prediction:
1. If the prediction was 0 when it should have been 1, the weighted sum was too low, so we need <em>higher weights</em>, with larger increases on the edges carrying <em>higher input values</em>; this is why we add $\eta x_k$ to each weight $w_k$.
2. If the prediction was 1 when it should have been 0, the weighted sum was too high, so we need <em>lower weights</em>, with larger decreases on the edges carrying <em>higher input values</em>; this is why we subtract $\eta x_k$ from each weight $w_k$.
3. If the prediction was correct, the weights are left unchanged: no learning happens in this case.
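
The three cases collapse into a single update step. A minimal Python sketch, assuming the weights are stored as a plain list and the bias is updated like a weight on a constant input of 1 (the function name `update` and its parameters are hypothetical):

```python
def update(w, b, x, y, y_hat, eta=0.1):
    """One perceptron learning step; returns the new (w, b).

    error = y - y_hat is +1 (case 1), -1 (case 2) or 0 (case 3).
    """
    error = y - y_hat
    # w_k <- w_k + eta * error * x_k: larger inputs get larger adjustments.
    new_w = [wk + eta * error * xk for wk, xk in zip(w, x)]
    # Assumption: the bias is treated as a weight on a constant input of 1.
    new_b = b + eta * error
    return new_w, new_b


# Case 1: predicted 0 but the target was 1, so weights rise in proportion to the inputs.
w, b = update([0.0, 0.0], 0.0, x=[1.0, 0.5], y=1, y_hat=0)
print(w, b)  # [0.1, 0.05] 0.1
```

In case 3 the error is 0, so the weights come back unchanged, matching "no learning happens".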


#### Initial setup:
Initially, the weights are small random values.

$\eta$: the learning rate, a value between 0 and 1 that determines the magnitude of each weight change.
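
Putting the initial setup and the learning rule together gives a complete training loop. This is a sketch under stated assumptions: the name `train_perceptron`, the initialization range `[-0.05, 0.05]`, the default `eta=0.1`, the `max_epochs` cap and the fixed `seed` are illustrative choices, not values from these notes.

```python
import random


def step(s):
    """Heaviside step: 1 if s >= 0, else 0."""
    return 1 if s >= 0 else 0


def train_perceptron(X, Y, eta=0.1, max_epochs=100, seed=0):
    """Train a single-layer perceptron with the perceptron learning rule.

    Returns (w, b, converged); converged is True once a full pass over the
    data makes no mistakes.
    """
    rng = random.Random(seed)  # seeded so runs are reproducible
    # Small random initial weights and bias (the range is an illustrative choice).
    w = [rng.uniform(-0.05, 0.05) for _ in range(len(X[0]))]
    b = rng.uniform(-0.05, 0.05)
    for _ in range(max_epochs):
        mistakes = 0
        for x, y in zip(X, Y):
            y_hat = step(sum(wk * xk for wk, xk in zip(w, x)) + b)
            error = y - y_hat  # +1, -1, or 0
            if error != 0:
                mistakes += 1
                w = [wk + eta * error * xk for wk, xk in zip(w, x)]
                b += eta * error
        if mistakes == 0:
            return w, b, True
    return w, b, False


X = [[0, 0], [0, 1], [1, 0], [1, 1]]
Y = [0, 1, 1, 1]  # logical OR, which is linearly separable
w, b, converged = train_perceptron(X, Y)
print(converged)  # True
```

Stopping after an error-free epoch is a common convention: once no example is misclassified, no further updates would occur anyway.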

### Linear Separability:

Rosenblatt proved mathematically that, provided the training data is linearly separable, this algorithm is guaranteed to learn a separating decision boundary after a finite number of weight updates.

If the training set is not linearly separable, that is, if no single hyperplane separates the positive input vectors from the negative ones, the algorithm never learns a correct decision boundary.
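
Both halves of this claim can be checked on two-input Boolean functions. The sketch below (hypothetical helper `train`) runs the same rule from zero initial weights on AND, which is linearly separable, and on XOR, which is not. With zero initial weights the learning rate only rescales `w` and `b`, so `eta=1.0` produces the same sequence of predictions as any smaller positive value while keeping the arithmetic exact.

```python
def step(s):
    """Heaviside step: 1 if s >= 0, else 0."""
    return 1 if s >= 0 else 0


def train(X, Y, eta=1.0, max_epochs=1000):
    """Perceptron rule from zero weights; returns (w, b, epochs_used, converged)."""
    w = [0.0] * len(X[0])
    b = 0.0
    for epoch in range(1, max_epochs + 1):
        mistakes = 0
        for x, y in zip(X, Y):
            error = y - step(sum(wk * xk for wk, xk in zip(w, x)) + b)
            if error:
                mistakes += 1
                w = [wk + eta * error * xk for wk, xk in zip(w, x)]
                b += eta * error
        if mistakes == 0:  # a whole pass without errors: boundary found
            return w, b, epoch, True
    return w, b, max_epochs, False


X = [[0, 0], [0, 1], [1, 0], [1, 1]]
for name, Y in (("AND", [0, 0, 0, 1]), ("XOR", [0, 1, 1, 0])):
    _, _, epochs, converged = train(X, Y)
    print(name, "converged" if converged else "did not converge", "after", epochs, "epochs")
# AND converged after 6 epochs
# XOR did not converge after 1000 epochs
```

For XOR no choice of `max_epochs` helps: an error-free epoch would mean a separating line exists, which is impossible.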

- The classic example of this is the XOR function: XOR can't be learned by a single-layer perceptron, but it can be learned by a multi-layer perceptron. Research on neural networks declined sharply from the late 1960s before a resurgence in the 1980s.
- **About multi-layer perceptrons:**
    - The term *multi-layer perceptron* is often used loosely for any feedforward neural network, that is, any network whose connections between nodes do not form a cycle, so information 'flows' only forward.
    - When a multi-layer perceptron contains a hidden layer, we need backpropagation to attribute the output error to specific weights.
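
To see why a hidden layer helps with XOR, here is a two-layer network of step-function neurons that computes it. This is a minimal sketch with weights chosen by hand rather than learned (the names `neuron` and `xor_mlp` are hypothetical): one hidden unit computes OR, the other AND, and the output fires for "OR but not AND". Learning such weights with backpropagation in practice uses a differentiable activation such as the sigmoid, because the step function's gradient is zero almost everywhere.

```python
def step(s):
    """Heaviside step: 1 if s >= 0, else 0."""
    return 1 if s >= 0 else 0


def neuron(x, w, b):
    """A single perceptron unit: step(w . x + b)."""
    return step(sum(wk * xk for wk, xk in zip(w, x)) + b)


def xor_mlp(x):
    # Hidden layer (hand-picked weights): h1 computes OR, h2 computes AND.
    h1 = neuron(x, [1, 1], -0.5)
    h2 = neuron(x, [1, 1], -1.5)
    # Output unit: fires when h1 is on and h2 is off, i.e. OR and not AND.
    return neuron([h1, h2], [1, -1], -0.5)


for x in ([0, 0], [0, 1], [1, 0], [1, 1]):
    print(x, "->", xor_mlp(x))
# [0, 0] -> 0
# [0, 1] -> 1
# [1, 0] -> 1
# [1, 1] -> 0
```

Each hidden unit draws one line through the input space; the output unit combines the two half-planes, something no single line can do.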

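The Heaviside step function, used as the activation $g$ applied to the weighted sum $s = \mathbf{w} \cdot \mathbf{x} + b$, is defined as:

$$g(s) = \begin{cases}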
            1, & s \geq 0,\\
            0, & s < 0.
         \end{cases}$$