# Introduction to Artificial Neural Networks

Birds inspired us to fly, and the brain’s architecture inspired us to build intelligent machines.

An Artificial Neural Network (ANN) is a Machine Learning model inspired by the networks of biological
neurons found in our brains.

ANNs are versatile, powerful, and scalable, making them ideal for tackling large and highly complex
Machine Learning tasks:
- classifying billions of images (e.g., Google Images)
- powering speech recognition services (e.g., Apple’s Siri)
- recommending the best videos to watch to hundreds of millions of users every day (e.g., YouTube)
- learning to beat the world champion at the game of Go (DeepMind’s AlphaGo).


## From Biological to Artificial Neurons

ANNs have been around for quite a while: they were first introduced back in 1943, when McCulloch and Pitts presented a simplified computational model of how biological neurons might
work together in animal brains to perform complex computations using
propositional logic.

In the early 1980s, new architectures were invented and better training techniques were developed, sparking a revival of interest in *connectionism*. But progress was slow, and by the 1990s other powerful Machine Learning techniques had been invented, such as *Support Vector Machines*. These techniques seemed to offer better results and stronger theoretical foundations than ANNs, so once again the study of neural networks was put on hold.

The tremendous increase in computing power since the 1990s now
makes it possible to train large neural networks in a reasonable
amount of time, and ANNs frequently outperform other ML techniques on very
large and complex problems.

### Biological Neurons
Biological neurons produce short electrical impulses called action potentials (APs, or just signals), which travel along the axons and make the
synapses release chemical signals called neurotransmitters. When a neuron receives a sufficient amount of these neurotransmitters within a few milliseconds, it fires its own electrical impulses (although some neurotransmitters are inhibitory and keep the neuron from firing). Each neuron is typically connected to thousands of other neurons.

<img src="neuron_cell.PNG" width=500px />

### Logical Computations with Neurons
McCulloch and Pitts proposed a very simple model of the biological neuron, also known as an *artificial neuron*: it has one or more binary (on/off) inputs and one binary output. The artificial neuron activates its output when more than a certain number of its inputs are active.

Even this simple model can compute basic logic such as AND, OR, and NOT. For example:
- AND: the output activates only when both input 1 and input 2 are active
- OR: the output activates when input 1 or input 2 (or both) is active
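
The AND and OR cases above can be sketched in a few lines of plain Python. This is only an illustration: the function names (`mp_neuron`, `logic_and`, `logic_or`) are hypothetical, and the "at least `threshold` active inputs" convention is an assumption chosen to make the two gates easy to read. NOT is left out of the sketch.

```python
def mp_neuron(inputs, threshold):
    """Binary neuron: output 1 when at least `threshold` inputs are active (assumed convention)."""
    return 1 if sum(inputs) >= threshold else 0


def logic_and(a, b):
    # Both inputs must be active, so the threshold is 2.
    return mp_neuron([a, b], threshold=2)


def logic_or(a, b):
    # A single active input is enough, so the threshold is 1.
    return mp_neuron([a, b], threshold=1)


# Print the full truth table for both gates.
for a in (0, 1):
    for b in (0, 1):
        print(a, b, "AND:", logic_and(a, b), "OR:", logic_or(a, b))
```

The only difference between the two gates is the threshold, which is the point of the model: the same unit computes different logic depending on how many active inputs it requires.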

### The Perceptron

The Perceptron is one of the simplest ANN architectures, invented in 1957 by Frank Rosenblatt. It is based on a slightly different artificial neuron called a *threshold logic unit (TLU)*.

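Instead of binary on/off values, the inputs and output are numbers, and each input connection is associated with a **weight**. The TLU computes a weighted sum of its inputs:

$$z = w_1 x_1 + w_2 x_2 + \dots + w_n x_n = \mathbf{x}^\top \mathbf{w}$$

It then applies a step function to that sum and outputs the result:

$$h_{\mathbf{w}}(\mathbf{x}) = \text{step}(z)$$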

The most common step function is the *Heaviside step function*:

- heaviside(z) = 0 if z < 0, else 1 (that is, with a threshold of 0)
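
As a rough illustration of a single TLU, the sketch below computes the weighted sum and applies the Heaviside step with a threshold of 0. The function names and the example inputs and weights are made up for this sketch and are not taken from any dataset.

```python
def heaviside(z):
    """Heaviside step with threshold 0: 0 for negative z, 1 otherwise."""
    return 0 if z < 0 else 1


def tlu(x, w):
    """Threshold logic unit: weighted sum of the inputs followed by the step function."""
    z = sum(w_i * x_i for w_i, x_i in zip(w, x))
    return heaviside(z)


x = [2.0, -1.0, 0.5]           # illustrative input values
w_neg = [0.5, 1.0, -2.0]       # z = 1.0 - 1.0 - 1.0 = -1.0, below the threshold
w_pos = [0.5, 1.0, 2.0]        # z = 1.0 - 1.0 + 1.0 =  1.0, above the threshold

print(tlu(x, w_neg), tlu(x, w_pos))
```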

The outputs of a fully connected layer are computed as:

$$h_{W,b}(X) = \Phi(XW + b)$$

- X is the matrix of input features, with one row per instance and one column per feature. For example, the instance (0.2, 0.3, 0.5, 1.2) is one row with 4 features.
- W is the weight matrix, with one row per input neuron and one column per neuron in the layer.
- b is the bias vector, with one bias term per neuron in the layer.
- The function Φ is called the *activation function* (the step function when the neurons are TLUs).
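
The layer formula can be sketched with NumPy as follows. The shapes follow the description above (one row of X per instance, one row of W per input), but the weight and bias values are arbitrary numbers picked for illustration, and `dense_layer` and `step` are hypothetical helper names.

```python
import numpy as np


def step(z):
    """Element-wise Heaviside step with threshold 0."""
    return (z >= 0).astype(int)


def dense_layer(X, W, b, activation=step):
    """Compute activation(XW + b) for a whole batch of instances at once."""
    return activation(X @ W + b)


# 2 instances with 4 features each (the first row is the example instance from the text).
X = np.array([[0.2, 0.3, 0.5, 1.2],
              [1.0, 0.0, 0.0, 0.0]])

# 4 inputs feeding 3 neurons: one row per input, one column per neuron (arbitrary values).
W = np.array([[ 1.0, -1.0, 0.0],
              [ 0.0,  1.0, 0.0],
              [ 0.0,  0.0, 1.0],
              [-1.0,  0.0, 0.0]])

# One bias term per neuron (arbitrary values).
b = np.array([0.0, 0.5, -1.0])

out = dense_layer(X, W, b)
print(out.shape)  # one row per instance, one column per neuron
print(out)
```

Because b has one entry per neuron, NumPy broadcasting adds it to every row of `X @ W`, so no explicit loop over instances is needed.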

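Perceptron learning rule (weight update):

$$w_{i,j}^{(\text{next step})} = w_{i,j} + a\,(y_j - \hat{y}_j)\,x_i$$

- $w_{i,j}$ is the connection weight between the i-th input and the j-th output neuron
- $x_i$ is the i-th input value of the current training instance
- $y_j$ is the target output of the j-th output neuron, and $\hat{y}_j$ is its predicted output for the current training instance
- $a$ is the learning rate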
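
To make the update rule concrete, here is a minimal from-scratch sketch for a single output neuron. Several things are assumptions of this sketch rather than part of the notes: the function name `train_perceptron`, the separate bias update (equivalent to treating the bias as a weight on a constant input of 1), the toy AND dataset, and a learning rate of 1.0 chosen so all arithmetic stays exact.

```python
import numpy as np


def train_perceptron(X, y, a=1.0, n_epochs=20):
    """Train one TLU with the Perceptron rule: w <- w + a * (y - y_pred) * x."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(n_epochs):
        for x_i, y_i in zip(X, y):
            y_pred = 1 if x_i @ w + b >= 0 else 0   # Heaviside step, threshold 0
            error = y_i - y_pred                    # 0 when correct, +1 or -1 otherwise
            w += a * error * x_i                    # only changes weights on a mistake
            b += a * error                          # bias acts like a weight on input 1
    return w, b


# Toy dataset (assumption): the logical AND of two binary inputs.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 0, 0, 1])

w, b = train_perceptron(X, y)
preds = [1 if x @ w + b >= 0 else 0 for x in X]
print("weights:", w, "bias:", b)
print("predictions:", preds)
```

Note that the weights move only when a prediction is wrong, and they move in the direction that would have made the correct output more likely.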

The decision boundary of each output neuron is linear, so Perceptrons are incapable of learning complex patterns (just like Logistic Regression classifiers).

However, if the training instances are linearly separable, this algorithm will converge to a solution. This is called the *Perceptron convergence theorem*.
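
The sketch below contrasts a linearly separable problem with one that is not. The choice of AND and XOR as toy datasets is an assumption made for illustration (XOR is a standard example of a pattern that no single straight line can separate), and `epochs_to_converge` is a hypothetical helper, not a library function.

```python
import numpy as np


def epochs_to_converge(X, y, a=1.0, max_epochs=100):
    """Return the first epoch with zero mistakes, or None if max_epochs is reached."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for epoch in range(1, max_epochs + 1):
        mistakes = 0
        for x_i, y_i in zip(X, y):
            y_pred = 1 if x_i @ w + b >= 0 else 0
            if y_pred != y_i:
                # Perceptron learning rule, applied only on misclassified instances.
                w += a * (y_i - y_pred) * x_i
                b += a * (y_i - y_pred)
                mistakes += 1
        if mistakes == 0:
            return epoch
    return None


X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y_and = np.array([0, 0, 0, 1])   # linearly separable
y_xor = np.array([0, 1, 1, 0])   # not linearly separable

print("AND converged at epoch:", epochs_to_converge(X, y_and))
print("XOR converged at epoch:", epochs_to_converge(X, y_xor))
```

For AND the loop stops after a handful of epochs, as the convergence theorem predicts. For XOR it always hits the epoch cap and returns `None`, since an error-free pass would require a linear decision boundary that does not exist.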

For example, with the iris dataset:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import Perceptron

iris = load_iris()
X = iris['data'][:, (2, 3)]  # petal length and petal width
y = (iris['target'] == 0).astype(int)  # 1 if Iris setosa, else 0
X.shape, y.shape, X[:5], y[:5]
```

```
((150, 2),
 (150,),
 array([[1.4, 0.2],
        [1.4, 0.2],
        [1.3, 0.2],
        [1.5, 0.2],
        [1.4, 0.2]]),
 array([1, 1, 1, 1, 1]))
```

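```python
# Train a Perceptron on the two petal features, then classify one new flower
per_clf = Perceptron(max_iter=1000, tol=1e-3, random_state=42)
per_clf.fit(X, y)

y_pred = per_clf.predict([[2, 0.5]])  # petal length 2 cm, petal width 0.5 cm
y_pred
```

```
array([1])
```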

> Perceptrons do not output a class probability; rather, they make predictions based on a hard threshold.
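
One way to see the hard threshold in practice is to compare the raw score from scikit-learn's `decision_function` with the label from `predict`. The sketch below reuses the same iris features and hyperparameters as the example above; the variable names `clf` and `sample` are introduced here for illustration only.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import Perceptron

iris = load_iris()
X = iris["data"][:, (2, 3)]             # petal length and petal width
y = (iris["target"] == 0).astype(int)   # 1 if Iris setosa, else 0

clf = Perceptron(max_iter=1000, tol=1e-3, random_state=42)
clf.fit(X, y)

sample = [[2, 0.5]]
score = clf.decision_function(sample)   # raw linear score, not a probability
label = clf.predict(sample)             # class chosen from the sign of the score

print("score:", score)
print("label:", label)
```

The score is unbounded and only its sign matters for the prediction, so it should not be read as a confidence level between 0 and 1.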

<img src="percepton_sep.png" />

### The Multilayer Perceptron and Backpropagation