# The perceptron

Perceptrons, invented in [1957](https://en.wikipedia.org/wiki/Perceptron#History) by Frank Rosenblatt, are the simplest form of feedforward networks. They are [linear classifiers](https://en.wikipedia.org/wiki/Linear_classifier): they learn a linear function that predicts whether a piece of data belongs to a given class or not. A perceptron consists of a layer of input units and a layer of output units. In the simplest case the output layer has just one unit:

![](perceptron.png)


### Spreading of activations
At each timestep of the simulation the first layer of units $\mathbf{x}=[x_0,\dots,x_n]$ is filled with an input vector $\mathbf{p}_k=[p_{k0},\dots,p_{kn}]$. One additional unit of this layer, the bias, is not included in this update; instead it is permanently set to 1.
Each connection from a unit of the first layer to the output unit has a weight, initially set to 0. The activation of the output unit is the weighted sum of the input units plus the weighted bias:
$$
y = w_0 + \sum_{i=0}^{n} w_{i+1} x_i
$$
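In Python you can write:

```python
y = w[0]                  # weighted bias (the bias unit is always 1)
for i in range(len(x)):
    y += w[i+1]*x[i]      # w[0] belongs to the bias, so weights are shifted by one
```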
Using linear algebra we can rewrite it in a shorter form as:
$$
y = \mathbf{w} \cdot \mathbf{\tilde{x}}
$$
where $\mathbf{\tilde{x}} = (1, x_0, \dots, x_n)$, $\mathbf{w} = (w_0, \dots, w_{n+1})$ and $\mathbf{w} \cdot \mathbf{\tilde{x}}$ is the [dot product](https://en.wikipedia.org/wiki/Dot_product#Algebraic_definition), a linear algebra operation that computes the weighted sum in one step.
In Python it becomes:

```python
tx = hstack([1, x])
y = dot(w,tx)
```
*Using linear algebra in a neural network implementation is far simpler than writing loops: it is less error-prone and also produces much faster code.*
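
As a quick check that the two forms agree, the sketch below computes the weighted sum both ways and compares the results. The weight and input values are arbitrary numbers chosen for illustration, and the code imports `numpy` explicitly instead of relying on `pylab`.

```python
import numpy as np

# arbitrary illustrative values (not taken from the text)
w = np.array([0.5, -1.0, 2.0])   # w[0] is the bias weight
x = np.array([3.0, 4.0])         # two input units

# loop version: bias weight plus the weighted inputs
y_loop = w[0]
for i in range(len(x)):
    y_loop += w[i + 1] * x[i]

# linear algebra version: prepend the bias unit, then take the dot product
tx = np.hstack([1, x])
y_dot = np.dot(w, tx)

print(y_loop, y_dot)
```

Both versions give the same value; the dot product version needs no index bookkeeping, which is where loop implementations tend to go wrong.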

### Learning
Learning consists of updating the weights so that the weighted sum $y$ gets closer and closer to a desired output $o_k$ when we give the input $\mathbf{p}_k$ to the network.
In perceptrons learning can be done online, meaning that we can update the weights after each input pattern is presented.
In practice the weight change at each timestep is given by:
$$
\Delta w_i = \eta (o_k - y)\tilde{x}_i
$$
or, in linear algebra notation:
$$
\Delta \mathbf{w} = \eta(o_k - y)\mathbf{\tilde{x}} 
$$
where $\eta$ is the learning rate, which determines how much the weights change per timestep (it is typically very small), and $o_k - y$ is the error in reproducing the desired value.
In Python we write:
```python
w += eta*(o - y)*tx
```
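
To see what a single update does, here is a minimal sketch of one learning step. The learning rate, input pattern and desired output are illustrative values, not prescribed by the rule itself.

```python
import numpy as np

eta = 0.1                          # illustrative learning rate
w = np.zeros(3)                    # bias weight plus two input weights
tx = np.hstack([1, [1.0, 0.0]])    # bias unit plus one input pattern
o = 0.5                            # desired output for this pattern

y = np.dot(w, tx)                  # output before learning
w += eta * (o - y) * tx            # one learning step
y_after = np.dot(w, tx)            # output after the step

print(abs(o - y), abs(o - y_after))
```

Only the weights attached to active units change (here the bias and the first input), and the error on this pattern shrinks after the update.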

### A simple simulation

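Let us start by implementing a very simple network. 
Our input consists of two patterns, $[1, 0]$ and $[0, 1]$, and the network has to learn to answer $0.5$ to the first and $1.0$ to the second.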

In [89]:
%matplotlib inline
from pylab import *

In [90]:
# constants
n = 2         # number of input elements
np = 2        # number of input patterns
eta = 0.01    # learning rate
stime = 1000  # number of timesteps

# each row is an input pattern
P = array([ [1.0, 0.0], [0.0, 1.0] ])
# each element is the desired output 
# relative to an input pattern
o = array([0.5, 1.0])

# initialize weights
w = zeros(n+1)

for t in range(stime):
    
    # cycle through the input patterns
    # over the timesteps
    k = t%np
    # bias-plus-input vector
    tx = hstack([1,P[k]])
    
    # weighted sum - dot product
    y = dot(w,tx)
    
    # learning
    w += eta*(o[k] - y)*tx
    
    # print weights every 100th timestep
    # (the '*' operator unpacks a container 
    # into its elements)    
    if t % 100 == 0:
        print("w = [ {:4.2f}   {:4.2f} {:4.2f} ] - "
              "timestep = {:4.0f}"
              .format( *hstack([w.round(2), t]) ))

print()

# print tests on the two input patterns
for k in range(np):
    print()
    tx = hstack([1,P[k]])
    y = dot(w,tx)
    print("x = [ {:4.2f}   {:4.2f} {:4.2f} ]"
          .format( *tx.round(2) ))
    print("w = [ {:4.2f}   {:4.2f} {:4.2f} ]"
          .format( *w.round(2) ))
    print("y =          {:4.2f} ".format( y.round(2) ))




w = [ 0.00   0.00 0.00 ] - timestep =    0
w = [ 0.39   0.10 0.29 ] - timestep =  100
w = [ 0.48   0.08 0.40 ] - timestep =  200
w = [ 0.49   0.05 0.44 ] - timestep =  300
w = [ 0.50   0.03 0.47 ] - timestep =  400
w = [ 0.50   0.02 0.48 ] - timestep =  500
w = [ 0.50   0.01 0.49 ] - timestep =  600
w = [ 0.50   0.01 0.49 ] - timestep =  700
w = [ 0.50   0.00 0.50 ] - timestep =  800
w = [ 0.50   0.00 0.50 ] - timestep =  900


x = [ 1.00   1.00 0.00 ]
w = [ 0.50   0.00 0.50 ]
y =          0.50 

x = [ 1.00   0.00 1.00 ]
w = [ 0.50   0.00 0.50 ]
y =          1.00 


### What's behind the learning rule
Intuitively we see that the change of a weight depends on the error in the activation of the output unit *and* on the activation of the input unit that is connected to the output through that weight.
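
One common way to make this intuition precise, given here as a sketch under the assumption that we measure the error on pattern $k$ with the squared difference between desired and actual output, is to read the rule as gradient descent on that error:

$$
E = \frac{1}{2}(o_k - y)^2, \qquad y = \sum_i w_i \tilde{x}_i
$$

$$
\frac{\partial E}{\partial w_i} = -(o_k - y)\tilde{x}_i
\quad\Rightarrow\quad
\Delta w_i = -\eta \frac{\partial E}{\partial w_i} = \eta (o_k - y)\tilde{x}_i
$$

Under this reading each weight moves in the direction that reduces the error, by an amount proportional both to the error and to the activation of its input unit. A weight attached to a silent unit ($\tilde{x}_i = 0$) is left unchanged.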

### A real-world simulation
#### Init the dataset
First we initialize the dataset:


In [91]:
#### download the dataset 
# get the script from the internet
! wget https://raw.githubusercontent.com/sorki/python-mnist/master/get_data.sh > /dev/null 2>&1  
# run it to download all files into a local dir named 'data'
! bash get_data.sh >/dev/null 2>&1
# we do not need the script anymore, remove it
! rm get_data.sh* > /dev/null 2>&1

# initialize the dataset variables

%run utils

In [92]:
# set the number of patterns 
n_patterns = 40

# take 'n_patterns' rows
patterns = array(mndata.train_images)[:n_patterns]
# we rescale all patterns between 0 and 1 and binarize them
# (sign() maps every nonzero pixel to 1)
patterns = sign(patterns/255.0)
labels = array(mndata.train_labels)[:n_patterns]
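
The effect of the rescaling step on individual pixels can be sketched as follows. The three grey levels are hypothetical values picked for illustration, not pixels read from the dataset.

```python
import numpy as np

# three hypothetical grey levels: black, mid-grey, white
pixels = np.array([0, 128, 255])

scaled = pixels / 255.0       # values between 0 and 1
binary = np.sign(scaled)      # 0 stays 0, every positive value becomes 1

print(scaled, binary)
```

After `sign` the patterns contain only zeros and ones, so the grey-level information is discarded and each image becomes a binary mask of the digit.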

In [93]:
trials = 500
eta = 0.01
n = patterns.shape[1]   # number of input units (pixels per image)
m = 10                  # number of output units (one per digit)
w = zeros([m, n+1])
x = zeros(n+1)
y_target = zeros(m)
y = zeros(m)

windows = 50

for t in range(trials):
    pattern_index = t%n_patterns
    # bias-plus-input vector
    x = hstack([ 1, patterns[pattern_index] ])
    # one-hot desired output
    y_target *= 0
    y_target[labels[pattern_index]] = 1
    
    # weighted sums and learning
    y = dot(w,x) 
    w += eta*outer(y_target - y, x)

    # plot the first and the last 100 trials
    if t < 100 or t >= trials-100:
        win_count = t%windows

        if win_count == 0:
            fig = figure(figsize = (15, 4))
        
        ax1 = fig.add_subplot(2,windows,win_count + 1)
        ax1.imshow(to_mat(x[1:]), interpolation = 'none', 
               aspect = 'auto', cmap = cm.binary )
        ax1.set_axis_off()
        ax2 = fig.add_subplot(2,windows,win_count+windows+1)
        # desired (left column) versus actual (right column) output
        outputs = vstack([y_target, y]).T
        ax2.imshow(outputs, interpolation = 'none', 
               aspect = 'auto', cmap = cm.copper,
                   vmin=0, vmax=1)
        ax2.set_axis_off()
        fig.canvas.draw()
         
        if win_count == windows - 1 :
            show()
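
The same multi-output update can be tried without downloading the dataset. The sketch below is a minimal version on hypothetical toy data (three hand-made binary patterns with made-up labels) and adds a readout step that is an assumption of this example: the predicted class is taken to be the most active output unit.

```python
import numpy as np

# hypothetical toy data: three binary patterns, one class label each
patterns = np.array([[1.0, 0.0, 0.0, 1.0],
                     [0.0, 1.0, 0.0, 1.0],
                     [0.0, 0.0, 1.0, 0.0]])
labels = np.array([2, 0, 1])

n_patterns, n = patterns.shape
m = 3                       # number of output units (classes)
eta = 0.1                   # learning rate
w = np.zeros([m, n + 1])    # one row of weights per output unit

for t in range(1000):
    k = t % n_patterns
    x = np.hstack([1, patterns[k]])        # bias-plus-input vector
    y_target = np.zeros(m)
    y_target[labels[k]] = 1                # one-hot desired output
    y = np.dot(w, x)                       # all m weighted sums at once
    w += eta * np.outer(y_target - y, x)   # learning rule applied to every row

# readout: the predicted class is the most active output unit
X = np.hstack([np.ones([n_patterns, 1]), patterns])
predicted = np.argmax(np.dot(X, w.T), axis=1)
print(predicted, labels)
```

The outer product builds the whole weight-change matrix in one call: row $j$ is the single-output rule applied to output unit $j$.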
