# The Perceptron

The *Perceptron* is one of the simplest artificial neural network architectures, proposed in 1957 by Frank Rosenblatt. It is based on a *threshold logic unit (TLU)*, which computes a weighted sum of its inputs

$$ z = w_1x_1 + \cdots + w_nx_n = \textbf{x}^{\intercal}\textbf{w} $$

then applies a step function to that sum and outputs the result: $h_w(\textbf{x})=\text{step}(z)$. One of the most commonly used step functions is the *Heaviside step function*

$$ \text{heaviside}(z) = \begin{cases} 0 & \text{if } z<0 \\ 1 & \text{if } z\geq 0 \end{cases}$$

A single TLU can be used for binary classification: it computes a linear combination of its inputs and, if the result reaches a threshold, it outputs the positive class; otherwise it outputs the negative class.
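
As a minimal sketch of the TLU described above, the following NumPy snippet computes the weighted sum and applies a Heaviside step (taking the step to be 1 at $z = 0$). The function names, the explicit `threshold` parameter, and the weights are illustrative assumptions, chosen here so that the unit behaves like a logical AND.

```python
import numpy as np

def heaviside(z):
    """Heaviside step: 1 where z >= 0, otherwise 0."""
    return (np.asarray(z) >= 0).astype(int)

def tlu_predict(x, w, threshold=0.0):
    """Threshold logic unit: weighted sum of the inputs, then a step.

    `threshold` is an illustrative parameter: the unit outputs the
    positive class (1) when x . w >= threshold, else the negative class (0).
    """
    z = np.dot(x, w)
    return int(heaviside(z - threshold))

# Hypothetical weights and threshold that make the unit act as a logical AND.
w = np.array([1.0, 1.0])
threshold = 1.5
inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]
outputs = [tlu_predict(np.array(x), w, threshold) for x in inputs]
print(outputs)  # [0, 0, 0, 1]
```

Subtracting the threshold before the step is equivalent to adding a bias term of `-threshold` to the weighted sum.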

A perceptron is composed of a single layer of TLUs, with each TLU connected to all the inputs. When all the neurons in a layer are connected to every neuron in the previous layer, the layer is called a *fully connected* or *dense* layer. *Input neurons* are simple pass-through units that output whatever they are fed, and together they form the *input layer*. An extra bias feature ($x_0 = 1$) is generally added, typically represented by a *bias neuron*, which outputs 1 all the time (see the example architecture on p. 286, fig. 10-5).

We can then write the outputs of a fully connected layer as 
$$ h_{\textbf{W, b}}(\textbf{X}) = \phi(\textbf{XW + b})$$
where
- $\textbf{X}$ is the matrix of input features (one row per instance, one column per feature)
- $\textbf{W}$ contains the connection weights, except the ones from the bias neuron (one row per input neuron, one column per artificial neuron in the layer)
- $\textbf{b}$ is the bias vector, containing the connection weights from the bias neuron (one bias term per artificial neuron in the layer)
- $\phi$ is called the *activation function* (when the neurons are TLUs, this is a step function)
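
The layer equation above can be sketched in a few lines of NumPy. This is only an illustration under assumed values: `dense_forward` is a hypothetical helper name, and the weights and biases are hand-picked so that the first TLU computes a logical AND and the second a logical OR.

```python
import numpy as np

def step(z):
    """Step activation: 1 where z >= 0, otherwise 0."""
    return (z >= 0).astype(int)

def dense_forward(X, W, b, activation=step):
    """Outputs of a fully connected layer: phi(XW + b)."""
    return activation(X @ W + b)

# 4 instances (rows), 2 features (columns).
X = np.array([[0.0, 0.0],
              [0.0, 1.0],
              [1.0, 0.0],
              [1.0, 1.0]])
# One row per input neuron, one column per TLU (illustrative values).
W = np.array([[1.0, 1.0],
              [1.0, 1.0]])
# One bias term per TLU: -1.5 gives AND, -0.5 gives OR.
b = np.array([-1.5, -0.5])

out = dense_forward(X, W, b)
print(out.shape)     # (4, 2): one row per instance, one column per TLU
print(out.tolist())  # [[0, 0], [0, 1], [0, 1], [1, 1]]
```

Note that `b` is broadcast across the rows of `X @ W`, so every instance receives the same bias terms.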

The perceptron learning rule reinforces connections between neurons that help reduce the error: the perceptron is fed one training instance at a time, and for each instance it makes its predictions. For every output neuron that produced a wrong prediction, it reinforces the connection weights from the inputs that would have contributed to the correct prediction:

$$ w_{i,j}^{\text{next step}} = w_{i,j} +\eta(y_j - \hat{y_j})x_i$$

where 
- $w_{i,j}$ is the connection weight between the $i$-th input neuron and the $j$-th output neuron
- $x_i$ is the $i$-th input value of the current training instance
- $\hat{y_j}$ is the output of the $j$-th output neuron for the current training instance
- $y_j$ is the target output of the $j$-th output neuron for the current training instance
- $\eta$ is the learning rate
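
The update rule above can be sketched as a small training loop. This is a minimal illustration rather than a reference implementation: the function name `train_perceptron`, the zero initialization, the epoch count, and the AND dataset are all assumptions made for the example. The bias is updated with the same rule, treating the bias neuron as an input that always equals 1.

```python
import numpy as np

def step(z):
    """Step activation: 1.0 where z >= 0, otherwise 0.0."""
    return (z >= 0).astype(float)

def train_perceptron(X, y, eta=1.0, n_epochs=10):
    """Train a single layer of TLUs with the perceptron learning rule.

    X: (n_instances, n_inputs), y: (n_instances, n_outputs) with 0/1 targets.
    """
    n_inputs, n_outputs = X.shape[1], y.shape[1]
    W = np.zeros((n_inputs, n_outputs))  # assumed zero initialization
    b = np.zeros(n_outputs)
    for _ in range(n_epochs):
        for x_i, y_i in zip(X, y):       # one training instance at a time
            y_hat = step(x_i @ W + b)    # current predictions
            error = y_i - y_hat          # nonzero only for wrong outputs
            # w_ij <- w_ij + eta * (y_j - y_hat_j) * x_i
            W += eta * np.outer(x_i, error)
            b += eta * error             # bias input is always 1
    return W, b

# Logical AND: linearly separable, so the rule can find a solution.
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
y = np.array([[0.0], [0.0], [0.0], [1.0]])

W, b = train_perceptron(X, y)
pred = step(X @ W + b).astype(int).ravel().tolist()
print(pred)  # [0, 0, 0, 1]
```

When the prediction is already correct, `error` is zero and the weights are left unchanged, so only misclassified instances move the decision boundary.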

# The multilayer perceptron and back propagation