# Biological to Artificial Neuron

## Introduction

An *Artificial Neural Network* (ANN) is a Machine Learning model inspired by the networks of biological neurons found in our brains.

ANNs are at the very core of Deep Learning. They are versatile, powerful, and scalable, which makes them well suited to large and highly complex tasks such as classifying huge numbers of images (Google Images), powering speech recognition services ("OK Google"), and recommending movies (Netflix).

We'll use the popular Keras API, a beautifully designed and simple high-level API for building, training, evaluating, and running neural networks.

## History of ANNs

ANNs were first introduced back in 1943 by the neurophysiologist Warren McCulloch and the mathematician Walter Pitts in their paper "A Logical Calculus of the Ideas Immanent in Nervous Activity".

In the 1960s, it became clear that the promise of truly intelligent machines would not be fulfilled (at least for quite a while), so ANNs entered a long winter.

In the early 1980s, new architectures were invented and interest in *connectionism* (the study of neural networks) revived, but progress was slow. By the 1990s, other powerful Machine Learning techniques such as Support Vector Machines (SVMs) had been developed, so ANNs were put on hold again.

Now ANNs are rising again, and this time they seem likely to keep rising. Here are a few reasons why:

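- A huge quantity of data is now available to train neural networks.
- Computing power has increased tremendously, thanks in part to the gaming industry, which drove the mass production of powerful GPUs.
- Training algorithms have been improved.
- More funding now goes into research and into building amazing products based on ANNs.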

## Logical Computations with Neurons

McCulloch and Pitts proposed a very simple model of the biological neuron, which later became known as an *artificial neuron*: it has one or more binary (on/off) inputs and one binary output. The artificial neuron activates its output when more than a certain number of its inputs are active.

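Even with such a simple model, you can build networks of artificial neurons that compute complex logical expressions, by choosing how the inputs are wired and by combining neurons together.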
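To make this concrete, here is a minimal Python sketch of McCulloch-Pitts style neurons. It assumes a neuron fires when *at least* `min_active` of its binary inputs are on (that is, more than `min_active - 1`), and it models negation with an inhibitory input that vetoes the output. The helper names (`mcp_neuron`, `logical_and`, and so on) are hypothetical.

```python
from itertools import product


def mcp_neuron(inputs, min_active, inhibitory=()):
    """McCulloch-Pitts style neuron (hypothetical helper).

    Returns 1 when at least `min_active` of the binary `inputs` are 1,
    unless any of the `inhibitory` inputs is 1, which vetoes the output.
    """
    if any(inhibitory):
        return 0
    return int(sum(inputs) >= min_active)


def logical_and(a, b):
    # Fires only when both inputs are active
    return mcp_neuron([a, b], min_active=2)


def logical_or(a, b):
    # Fires when at least one input is active
    return mcp_neuron([a, b], min_active=1)


def a_and_not_b(a, b):
    # B is wired as an inhibitory input: whenever B is on, the neuron stays off
    return mcp_neuron([a], min_active=1, inhibitory=[b])


def xor(a, b):
    # Combining three neurons: (A AND NOT B) OR (B AND NOT A)
    return logical_or(a_and_not_b(a, b), a_and_not_b(b, a))


for a, b in product([0, 1], repeat=2):
    print(a, b, logical_and(a, b), logical_or(a, b), a_and_not_b(a, b), xor(a, b))
```

Each gate is a single neuron; `xor` shows how wiring neurons into a small network yields a more complex expression.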

## The Perceptron

The *Perceptron* is one of the simplest ANN architectures, invented in 1957 by Frank Rosenblatt. It is based on a slightly different artificial neuron called a *threshold logic unit* (TLU), or sometimes a *linear threshold unit* (LTU).

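The inputs and output are numbers (instead of binary on/off values), and each input connection is associated with a *weight*. The TLU computes a weighted sum of its inputs:
$$
z = w_1 x_1 + w_2 x_2 + \cdots + w_n x_n = \mathbf{x}^\top \mathbf{w}
$$
It then applies a *step function* to that sum and outputs the result:
$$
h_{\mathbf{w}}(\mathbf{x}) = \operatorname{step}(z), \quad \text{where } z = \mathbf{x}^\top \mathbf{w}
$$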
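The most common step function used in Perceptrons is the *Heaviside step function*:
$$
\operatorname{heaviside}(z) = \begin{cases} 0 & \text{if } z < 0 \\ 1 & \text{if } z \ge 0 \end{cases}
$$
Sometimes the sign function is used instead:
$$
\operatorname{sgn}(z) = \begin{cases} -1 & \text{if } z < 0 \\ 0 & \text{if } z = 0 \\ +1 & \text{if } z > 0 \end{cases}
$$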
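A single TLU can be used for simple linear binary classification: it computes a linear combination of the inputs and, if the result reaches the threshold, outputs the positive class; otherwise it outputs the negative class. Training a TLU means finding the right values for the weights.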
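The TLU equations can be sketched in NumPy as follows. This is an illustration only: the weights `w` are hand-picked rather than trained, and a separate bias term `b` is assumed so that the threshold can sit somewhere other than 0. The function names are hypothetical.

```python
import numpy as np


def heaviside(z):
    # 0 where z < 0, 1 where z >= 0
    return (z >= 0).astype(int)


def tlu_predict(X, w, b):
    """Threshold logic unit: weighted sum of the inputs, then a step function."""
    z = X @ w + b  # one weighted sum per instance (row of X)
    return heaviside(z)


# Hand-picked weights (hypothetical): predict 1 when x1 + x2 >= 1.5
w = np.array([1.0, 1.0])
b = -1.5
X = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [2.0, 0.5]])

print(tlu_predict(X, w, b))   # Heaviside variant: labels in {0, 1}
print(np.sign(X @ w + b))     # sign variant: labels in {-1, 0, +1}
```

NumPy also provides `np.heaviside(z, 1)`, where the second argument is the value returned at `z == 0`; it returns floats rather than ints.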

A Perceptron is simply composed of a single layer of TLUs, with each TLU connected to all the inputs.

When all the neurons in a layer are connected to every neuron in the previous layer (i.e., its input neurons), the layer is called a *fully connected layer* or *dense layer*.

The inputs of the Perceptron are fed to special passthrough neurons called *input neurons*: they output whatever input they are fed. All the input neurons form the *input layer*. Moreover, an extra bias feature is generally added ($x_0 = 1$): it is represented by a special neuron called a *bias neuron*, which always outputs 1.

Computing the outputs of a fully connected layer:
$$
h_{\mathbf{W}, \mathbf{b}}(\mathbf{X}) = \phi(\mathbf{X}\mathbf{W} + \mathbf{b})
$$
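In this equation:

- **X** is the matrix of input features: one row per instance and one column per feature.
- **W** is the weight matrix, containing all the connection weights except those of the bias neuron: one row per input neuron and one column per artificial neuron in the layer.
- **b** is the bias vector, containing all the connection weights between the bias neuron and the artificial neurons: one bias term per artificial neuron.
- $\phi$ is the *activation function*: when the artificial neurons are TLUs, it is a step function.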
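A minimal NumPy sketch of the fully connected layer equation, assuming TLUs (Heaviside activation) and randomly drawn weights purely for illustration. The function name `dense_layer_output` is hypothetical.

```python
import numpy as np


def heaviside(z):
    return (z >= 0).astype(int)


def dense_layer_output(X, W, b, activation=heaviside):
    """Outputs of a fully connected layer: phi(XW + b).

    X: (n_instances, n_inputs), W: (n_inputs, n_neurons), b: (n_neurons,)
    """
    return activation(X @ W + b)  # b is broadcast across every row (instance)


rng = np.random.default_rng(42)
X = rng.normal(size=(5, 2))   # 5 instances, 2 input features
W = rng.normal(size=(2, 3))   # 2 input neurons fully connected to 3 TLUs
b = np.zeros(3)               # one bias term per TLU
outputs = dense_layer_output(X, W, b)
print(outputs.shape)  # one row per instance, one column per TLU
```

A single matrix product computes every TLU's weighted sum for every instance at once, which is why layers are expressed with **X**, **W**, and **b** rather than neuron by neuron.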

The Perceptron training algorithm was largely inspired by *Hebb's rule*, often summarized as "cells that fire together, wire together": the connection weight between two neurons tends to increase when they fire simultaneously.

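Perceptrons are trained using a variant of this rule that takes into account the error made by the network: it reinforces the connections that help reduce the error. More specifically, the Perceptron is fed one training instance at a time, and for every output neuron that produced a wrong prediction, it reinforces the connection weights from the inputs that would have contributed to the correct prediction. The rule is: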
$$
w_{i,j}^{(\text{next step})} = w_{i,j} + \eta \, (y_j - \hat{y}_j) \, x_i
$$
In this equation:

- $w_{i,j}$ is the connection weight between the $i^{th}$ input neuron and the $j^{th}$ output neuron.
- $x_i$ is the $i^{th}$ input value of the current training instance.
- $y_j$ is the target output of the $j^{th}$ output neuron for the current training instance.
- $\hat y_j$ is the output of the $j^{th}$ output neuron for the current training instance.
- $\eta$ is the learning rate.
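The learning rule can be implemented from scratch for a single output neuron. This sketch assumes zero-initialized weights and a learning rate of 1 (Scikit-Learn's `Perceptron` also defaults to `eta0=1`), and it models the bias neuron by prepending a constant input $x_0 = 1$. The function names are hypothetical. The logical AND dataset is linearly separable, so training stops once a full pass makes no errors.

```python
import numpy as np


def heaviside(z):
    # Works for scalars and arrays: 1 where z >= 0, else 0
    return np.where(z >= 0, 1, 0)


def train_perceptron(X, y, eta=1.0, n_epochs=100):
    """Perceptron learning rule for a single output neuron.

    The bias neuron is modeled by prepending a constant input x_0 = 1,
    so w[0] plays the role of the bias weight.
    """
    X_b = np.c_[np.ones(len(X)), X]        # add the bias input x_0 = 1
    w = np.zeros(X_b.shape[1])
    for _ in range(n_epochs):
        errors = 0
        for x_i, y_i in zip(X_b, y):       # one training instance at a time
            y_hat = heaviside(x_i @ w)
            w += eta * (y_i - y_hat) * x_i   # w <- w + eta * (y - y_hat) * x
            errors += int(y_hat != y_i)
        if errors == 0:                    # every instance classified correctly
            break
    return w


def predict(X, w):
    return heaviside(np.c_[np.ones(len(X)), X] @ w)


# Logical AND is linearly separable, so the rule converges
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 0, 0, 1])
w = train_perceptron(X, y)
print(w, predict(X, w))
```

The weights only change when `y_hat != y_i`, since the factor `(y_i - y_hat)` is then +1 or -1; correct predictions leave them untouched.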

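The decision boundary of each output neuron is linear, so Perceptrons are incapable of learning complex patterns. However, if the training instances are linearly separable, Rosenblatt demonstrated that this algorithm converges to a solution. This result is called the *Perceptron convergence theorem*.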

Scikit-Learn provides a `Perceptron` class that implements this algorithm. It is equivalent to using `SGDClassifier` with the hyperparameters `loss="perceptron"`, `learning_rate="constant"`, `eta0=1` (the learning rate), and `penalty=None` (no regularization).
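The equivalence can be checked on the iris petal features. This sketch assumes both models get the same `random_state`, so they shuffle the training data in the same order; with otherwise identical hyperparameters they should learn the same weights.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import Perceptron, SGDClassifier

iris = load_iris()
X = iris.data[:, (2, 3)]            # petal length and width
y = (iris.target == 0).astype(int)  # 1 if Iris setosa, else 0

# Same random_state so both models see the instances in the same order
per_clf = Perceptron(random_state=42)
sgd_clf = SGDClassifier(loss="perceptron", learning_rate="constant",
                        eta0=1, penalty=None, random_state=42)
per_clf.fit(X, y)
sgd_clf.fit(X, y)

print(per_clf.coef_, per_clf.intercept_)
print(sgd_clf.coef_, sgd_clf.intercept_)
```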

Contrary to Logistic Regression classifiers, Perceptrons do not output a class probability; rather, they make predictions based on a hard threshold.
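One way to see the difference, sketched with the same iris petal features: `Perceptron` exposes a signed score through `decision_function` and hard labels through `predict`, but has no `predict_proba`, whereas `LogisticRegression` returns class probabilities. The sample flower `[2, 0.5]` is just an illustrative input.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression, Perceptron

iris = load_iris()
X = iris.data[:, (2, 3)]            # petal length and width
y = (iris.target == 0).astype(int)  # 1 if Iris setosa, else 0

per_clf = Perceptron(random_state=42).fit(X, y)
log_reg = LogisticRegression().fit(X, y)

X_new = [[2, 0.5]]
print(per_clf.decision_function(X_new))   # signed score, not a probability
print(per_clf.predict(X_new))             # hard label: 1 if the score > 0, else 0
print(log_reg.predict_proba(X_new))       # class probabilities, the row sums to 1
print(hasattr(per_clf, "predict_proba"))  # Perceptron has no predict_proba
```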

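In their 1969 monograph *Perceptrons*, Marvin Minsky and Seymour Papert highlighted a number of serious weaknesses of Perceptrons, in particular the fact that they are incapable of solving some trivial problems such as the *Exclusive OR* (XOR) classification problem.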

It turns out that some of the limitations of Perceptrons can be eliminated by stacking multiple Perceptrons. The resulting ANN is called a *Multilayer Perceptron* (MLP). An MLP can easily solve the XOR problem.
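A small sketch of both claims, assuming the four XOR points form the whole training set: a Scikit-Learn `Perceptron` cannot fit them perfectly, because no single line separates the two classes, while a two-layer network of TLUs with hand-picked (hypothetical) weights can. The hidden neurons compute OR and AND, and the output neuron computes "OR and not AND", which is XOR.

```python
import numpy as np
from sklearn.linear_model import Perceptron

X_xor = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y_xor = np.array([0, 1, 1, 0])

# Single-layer Perceptron: a linear boundary gets at most 3 of the 4 points right
per_clf = Perceptron(random_state=42).fit(X_xor, y_xor)
print(per_clf.score(X_xor, y_xor))


def step(z):
    # Heaviside step: 1 where z >= 0, else 0
    return (z >= 0).astype(int)


# Hand-picked weights (hypothetical) for a 2-2-1 MLP made of TLUs
W_hidden = np.array([[1.0, 1.0],
                     [1.0, 1.0]])
b_hidden = np.array([-0.5, -1.5])  # neuron 1 fires for OR, neuron 2 for AND
W_out = np.array([[1.0], [-1.0]])  # output = OR minus AND
b_out = np.array([-0.5])

hidden = step(X_xor @ W_hidden + b_hidden)
y_pred = step(hidden @ W_out + b_out).ravel()
print(y_pred)
```

The weights here are set by hand to show that the architecture can represent XOR; training such weights automatically requires an algorithm beyond the single-layer Perceptron rule.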


## Perceptron with Scikit-Learn

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import Perceptron

iris = load_iris()
X = iris.data[:, (2, 3)]  # petal length and width
y = (iris.target == 0).astype(int)  # Iris setosa?

per_clf = Perceptron()
per_clf.fit(X, y)

y_pred = per_clf.predict([[2, 0.5]])  # petal length 2 cm, petal width 0.5 cm
y_pred  # array([0]) in the original run; may vary, since the data is shuffled and no random_state is set
```