# The Perceptron
The perceptron is based on a different artificial neuron called a <mark>*threshold logic unit* (TLU)</mark>, or sometimes a <mark>*linear threshold unit* (LTU)</mark>. Its inputs and output are numbers. The TLU computes a weighted sum of its inputs ($z = w_1x_1 + w_2x_2 + \dots + w_nx_n = x^\top w$), then applies a <mark>*step function*</mark> to that sum and outputs the result: $h_w(x) = \text{step}(z)$, where $z = x^\top w$.
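
A minimal sketch of a single TLU in NumPy. The input and weight values are made up for illustration; the sketch computes $z$ both as the explicit weighted sum and as $x^\top w$, then applies a Heaviside step (0 if $z < 0$, 1 otherwise).

```python
import numpy as np

def tlu(x, w):
    """Single threshold logic unit: weighted sum followed by a Heaviside step."""
    z = x @ w                    # z = x^T w = w_1 x_1 + ... + w_n x_n
    return 1 if z >= 0 else 0    # step: 0 if z < 0, 1 if z >= 0

# Hypothetical input and weight vectors
x = np.array([1.0, 2.0, -1.0])
w = np.array([0.5, -0.25, 1.0])

# The explicit sum and the dot product give the same z (here -1.0)
z_explicit = sum(w_i * x_i for w_i, x_i in zip(w, x))
print(z_explicit, x @ w)
print(tlu(x, w))  # z < 0, so the TLU outputs 0
```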

The most common step function used in perceptrons is the **Heaviside step function**; the sign function is sometimes used instead.
$$\text{heaviside}(z) = \begin{cases} 0 & \text{if } z < 0 \\ 1 & \text{if } z \geq 0 \end{cases} \qquad \text{sgn}(z) = \begin{cases} -1 & \text{if } z < 0 \\ 0 & \text{if } z = 0 \\ +1 & \text{if } z > 0 \end{cases}$$
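
Both step functions can be evaluated element-wise with NumPy. `np.heaviside(z, 1.0)` uses its second argument as the value at $z = 0$, so passing `1.0` matches the Heaviside definition above, and `np.sign` returns 0 at $z = 0$, matching $\text{sgn}$. The sample `z` values are arbitrary.

```python
import numpy as np

z = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])

# Heaviside: 0 for z < 0, 1 for z >= 0 (second argument sets the value at z == 0)
heaviside = np.heaviside(z, 1.0)
# Sign: -1 for z < 0, 0 for z == 0, +1 for z > 0
sgn = np.sign(z)

print(heaviside.tolist())  # [0.0, 0.0, 1.0, 1.0, 1.0]
print(sgn.tolist())        # [-1.0, -1.0, 0.0, 1.0, 1.0]
```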

A perceptron is simply composed of a single layer of TLUs.<mark> When all the neurons in a layer are connected to every neuron in the <b>previous layer</b>, the layer is called a **fully connected layer**</mark> or a *dense layer*. The inputs of the perceptron are fed to special passthrough neurons called *input neurons*, which together form the *input layer*. An extra bias feature is generally added ($x_0 = 1$); it is represented by a special *bias neuron*, which always outputs 1.
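
To see why a constant input $x_0 = 1$ acts as a bias, here is a sketch with made-up numbers: prepending a column of ones to the inputs, and a row holding the bias weights to the weight matrix, gives exactly the same weighted sums as adding a separate bias vector.

```python
import numpy as np

# Hypothetical inputs: 2 instances, 2 features
X = np.array([[1.0, 2.0],
              [3.0, 0.5]])
# Hypothetical weights for a layer of 3 neurons, plus one bias weight per neuron
W = np.array([[0.2, -0.4,  1.0],
              [0.5,  0.1, -0.3]])
b = np.array([-1.0, 0.3, 0.0])

# Option 1: separate bias vector (broadcast over the rows of X @ W)
z_separate = X @ W + b

# Option 2: bias neuron that always outputs x_0 = 1
X_aug = np.hstack([np.ones((X.shape[0], 1)), X])  # prepend x_0 = 1 to every instance
W_aug = np.vstack([b, W])                         # bias weights become the first row
z_bias_neuron = X_aug @ W_aug

print(np.allclose(z_separate, z_bias_neuron))  # True
```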

Computing the outputs of a fully connected layer:
- $h_{W,\,b}(X) = \Phi(XW + b)$

where,
- $X$ is the matrix of input features, with <mark>one row per instance</mark> and <mark>one column per feature.</mark>
- The weight matrix $W$ contains all the connection weights except for the ones from the bias neuron, with <mark>one row per input neuron</mark> and <mark>one column per artificial neuron in the layer.</mark>
- Bias vector $b$ contains all the connection weights between the bias neuron and the artificial neurons. <mark>One bias term per artificial neuron.</mark>
- Activation function $\Phi$: when the artificial neurons are TLUs, it is a *step function*.
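
A minimal sketch of one fully connected layer of TLUs, $h_{W,b}(X) = \Phi(XW + b)$, with a Heaviside step as $\Phi$. The sizes and the weight values are arbitrary choices for illustration: 4 instances with 2 binary features each, fed to a layer of 3 TLUs.

```python
import numpy as np

def dense_tlu_layer(X, W, b):
    """Outputs of a fully connected layer of TLUs: step(X @ W + b).

    X: (n_instances, n_features), W: (n_features, n_neurons), b: (n_neurons,)
    Returns an (n_instances, n_neurons) array of 0s and 1s.
    """
    Z = X @ W + b                  # weighted sums, one column per neuron
    return (Z >= 0).astype(int)    # Heaviside step applied element-wise

# Hypothetical data: 4 instances, 2 features
X = np.array([[0.0, 0.0],
              [0.0, 1.0],
              [1.0, 0.0],
              [1.0, 1.0]])
# Hypothetical weights and biases for 3 TLUs
W = np.array([[1.0, 1.0, -1.0],
              [1.0, 1.0, -1.0]])
b = np.array([-1.5, -0.5, 0.5])

print(dense_tlu_layer(X, W, b))  # one row per instance, one column per TLU
```

With these hand-picked weights, the three columns of the output compute the logical AND, OR and NOR of the two binary inputs, which shows how each TLU in the layer draws its own linear decision boundary over the same inputs.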

### Training Algorithm 
*Hebb's rule*: "Cells that fire together, wire together"; that is, the connection weight between two neurons tends to increase when they fire simultaneously.

Perceptrons are trained using a variant of this rule that takes into account the error made when the network makes a prediction: for every output neuron that produced a wrong prediction, the algorithm reinforces the connection weights from the inputs that would have contributed to the correct prediction.

Perceptron learning rule (weight update):
- $w_{i,\,j}^{(\text{next}\;\text{step})}\> = \> w_{i,\,j} \> +\> \eta\big(y_j\, -\,\hat{y_j}\big)x_i$

where,

- $w_{i,\,j}$ is the <mark>connection weight between the $i^{\text{th}}$ input neuron and  $j^{\text{th}}$ output neuron.</mark>
- $\eta$ is the <mark>learning rate.</mark>
- $x_i$ is the <mark>$i^{\text{th}}$ input value of the current training instance.</mark>
- $y_j$ is the <mark>target output of the  <b>$j^{\text{th}}$ output neuron</b></mark> for the current training instance.
- $\hat{y_j}$ is the <mark>output of the  $j^{\text{th}}$ output neuron</mark> for the current training instance.
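
The update rule above can be sketched as a from-scratch training loop for a single output neuron (so the index $j$ drops out). Assumptions for illustration only: a Heaviside step, the bias handled as a weight on the constant input $x_0 = 1$, weights initialized to zero, $\eta = 1$, at most 10 epochs, and the logical AND function as a toy linearly separable dataset. This is not scikit-learn's implementation.

```python
import numpy as np

def train_perceptron(X, y, eta=1.0, n_epochs=10):
    """Perceptron learning rule for one output neuron (Heaviside step).

    Returns the weight vector w, where w[0] is the bias weight (x_0 = 1).
    """
    X_aug = np.hstack([np.ones((X.shape[0], 1)), X])  # bias neuron: x_0 = 1
    w = np.zeros(X_aug.shape[1])
    for _ in range(n_epochs):
        errors = 0
        for x_i, y_i in zip(X_aug, y):
            y_hat = 1 if x_i @ w >= 0 else 0   # current prediction
            w += eta * (y_i - y_hat) * x_i     # w <- w + eta * (y - y_hat) * x
            errors += int(y_hat != y_i)
        if errors == 0:                        # a full epoch with no mistakes
            break
    return w

def predict(X, w):
    """Apply the trained TLU to every row of X."""
    X_aug = np.hstack([np.ones((X.shape[0], 1)), X])
    return (X_aug @ w >= 0).astype(int)

# Toy dataset: logical AND (linearly separable)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 0, 0, 1])

w = train_perceptron(X, y)
print(w, predict(X, w))
```

Note that the weights only change when $\hat{y} \neq y$: a correct prediction makes $(y - \hat{y}) = 0$. On this toy dataset the loop stops after an error-free epoch, and the learned weights reproduce the AND targets.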


```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import Perceptron

iris = load_iris()
X = iris.data[:, (2, 3)]  # petal length, petal width
y = (iris.target == 0).astype(int)  # 1 if Iris setosa, else 0

per_clf = Perceptron()
per_clf.fit(X, y)

y_pred = per_clf.predict([[2, 0.5]])
y_pred
```

array([0])

Perceptrons do not output a class probability; rather, they make predictions based on a hard threshold.
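
To make the hard-threshold point concrete, here is a small sketch with made-up weights: two instances whose weighted sums differ enormously (one barely above zero, one far above it) receive the same output, 1. The prediction itself says nothing about how far an instance lies from the decision boundary; only the raw score $z$ retains that, and $z$ is not a probability.

```python
import numpy as np

# Hypothetical trained weights and bias for a single TLU
w = np.array([1.0, 1.0])
b = -1.0

X_new = np.array([[0.5, 0.51],    # z is about 0.01, barely past the threshold
                  [10.0, 10.0]])  # z = 19.0, far past the threshold
z = X_new @ w + b
y_pred = (z >= 0).astype(int)     # Heaviside step: hard 0/1 decision

print(z)       # raw scores differ by orders of magnitude
print(y_pred)  # but both predictions are 1
```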