# The Perceptron

What are our learning objectives for this lesson?

* Learn about the Perceptron achitecture
* Learn how to train a Perceptron model 

Content used in this lesson is based upon information in the following sources:
* Marsland, Stephen. Machine Learning: An Algorithmic Perspective 2nd ed. (2015).

## Lab Tasks

1. Initialize weight vector $\vec{w}$ with random values
2. Implement an activation function
3. Implement an update function

### Initialize a weight vector

#### What is the dimensionality of this vector?

Let's first think through the dimensionality of our weight vector. How many entries should it have? Recall that the sum $h$ is given by

$$h = \sum_{i=1}^{m}w_i x_i $$

where $w_i$ is the $i^{th}$ entry of the weight vector $\vec{w}$ and $x_i$ is the $i^{th}$ entry of the input vector $\vec{x}$. This means that for every input value we will need a weight value that goes with it. In another word, if there are $n$ entries in our input vector $\vec{x}$, we will need a weight vector $\vec{w}$ of length $n$. 

Recall that we want to include an additional bias node in our model where the input is always a non-zero value. Since we will be alterning the weight $w_b$ associated with the bias node at the same time as all other nodes in our network, it might be convinient to dedicate a spot in our weight vector $\vec{w}$ that represents the weight $w_0$. Thus, for a problem with $n$ inputs, $\vec{w}$ might look something like this:

$$\vec{w}^T = [w_0, w_1, w_2, ..., w_n]$$

Notice that $\vec{w}$ has a length of $n+1$. 


#### What initial value do we assign the vector?

The initial value we assign the entries in our weight vector $\vec{w}$ matters. If we assign very large values, then $h$ will also be a very large value regardless of our input values (given that input values are not all zeros). If we assign extremely small values, we run into the opposite problem, where $h$ will also be a extremely small value regardless of the input values. We also want to avoid initializing it with all zeros. If you're interested in understanding why, please take a look at this [stackexcahnge response](https://datascience.stackexchange.com/questions/26134/initialize-perceptron-weights-with-zero) that gives a concise explanation. 

In conclusion, it is ideal to have weights that are neither too big or too small. For neural networks (which we will get to soon), The last desired quaility in our initialization values would be non-uniformity. We don't want to assign a single value across the board, and the way to achieve this is by using pseudorandom generators. 

In [1]:
import numpy as np
import math
import random

# define inputs
inputs = np.array([[0,0],[0,1],[1,0],[1,1]])
targets = np.array([[0],[1],[1],[1]])
#targets = np.array([0,1,1,1])

# capture how many nodes are needed by examining
# how many entries is in a set of input
n_in = np.shape(inputs)[1]
        
# how many values are we expecting as output?

# in this case, we want a single value as our output
n_out = 1
        
# η is the rate of which we learn from our data
# typically, 0.1 < η < 0.4
eta = 0.2

In [2]:
# TODO: insert a bias value of 1 into the inputs
# > done
def insert_bias(_inputs):
         
    return np.concatenate((_inputs,-np.ones((4,1))),axis=1)
    
    
inputs = insert_bias(inputs)
print(inputs)

[[ 0.  0. -1.]
 [ 0.  1. -1.]
 [ 1.  0. -1.]
 [ 1.  1. -1.]]


In [3]:
# TODO: initialize a weight vector

# how long does it have to be? 
# > number of inputs m (+ 1 for the bias) = 3
# > number of neurons n = 1
# > weight vector is m x n = 3 x 1
# Don't forget the bias node that we are including
# what initial values would you assign it?
# > just a random weight like all the others

# complete the initialization below

# > initialization complete

weights = np.random.rand(np.shape(inputs)[1],np.shape(targets)[1])*0.1-0.05

print(f'weight vector:\n{weights}')

weight vector:
[[0.04179439]
 [0.00580561]
 [0.03561558]]


### Implement an activation function

The activation of each neuron $j$ using activation function $g$ is given by:

$$y_j = g(\sum_{i=0}^m w_{ij}x_i)=\begin{cases}1 \quad \text{  if  }g(\sum_{i=0}^m w_{ij}x_i)>0 \\ 0 \quad \text{  if  } g(\sum_{i=0}^m w_{ij}x_i) \leq 0 \end{cases}$$

Implement an activation function according to the above equation. Here are several things to consider:
* What parameters do you need to pass to this function?
* How could you use numpy to compute $\vec{y}$ instead of writing a for loop to compute each $y_j$ individually? 
* What kind of object is this function going to return?


In [14]:
# TODO: Calculate the y_j and save it to an activation array
# > done
def calculate_activations(_inputs, _weights):
    
    activations = np.dot(inputs, weights)
    fire = np.where(activations>0,1,0)
    print(fire)
    return fire



### Update the weight vector

Let $y_j$ be the output of the $j^{th}$ neuron, and $t_j$ the target for that neuron, and $\eta$ (eta) the learning rate. 

$$w_{ij} \leftarrow w_{ij} - \eta(y_j - t_j) \cdot x_i$$

In [22]:
# TODO: adjust the weights based on the error between activation and target

def update_weights(_inputs, _targets, _activations, _weights):
    
    # complete this funtion
    # function complete
    t_inputs = np.transpose(_inputs)
    err = _activations -_targets
    aweights = _weights - eta*np.dot(t_inputs, err)
    
    return aweights
    

### Train the model

Now that you have imeplemented the ```calculate_activations``` function and ```update_weights``` function, we can start training the Perceptron model. There are 3 steps in training:
1. calculate the activation based on the current weights.
2. update the weight vector based on the gradient of the error for this time step.
3. repeat for $n$ iterations.



In [6]:
#np.shape(inputs)

#np.shape(weights)

#np.shape(activations)

np.shape(targets)

(4, 1)

In [7]:
iterations = 5

# TODO: implements the steps to be taken per iteration

for i in range(iterations):
    
    print(f"iteration {i}")
    activations = calculate_activations(inputs, weights)
    compare = activations == targets
    if(compare.all()):
        print("success!")
        print("the machine has learned!")
        print(activations)
        break
    else:
        weights = update_weights(inputs, targets, activations, weights)
        
    

iteration 0
iteration 1
iteration 2
success!
the machine has learned!
[[0]
 [1]
 [1]
 [1]]


### Test the model

Now we feed the trained model a test set to see how accurate the model is. Run the cell below to see the accuracy of the model. 

In [8]:
test = np.array([[1,0],[1,1],[0,1],[0,0]])
test_target = np.array([[1],[1],[1],[0]])

# insert bias into the test set
test = insert_bias(test)

# getting the activations array
activations = calculate_activations(test, weights)

# check the accuracy of your trained model
accuracy = 0.0
for i in range(len(test_target)):
    if test_target[i] == activations[i]: accuracy += 1        
print(f'accuracy: {(accuracy / len(test_target))*100.0}%')


accuracy: 50.0%


### Bonus Task

Train your model to learn the XOR operation. 

In [20]:
# initialize arrays:
# needs a 3rd dimension to be able to solve XOR
# 3rd dimension is 1 for [0,0] and 0 for the rest
# adds a Z-axis so a plane can separate them


xor_inputs = np.array([[0,0,1],[0,1,0],[1,0,0],[1,1,0]])
xor_targets = np.array([[0],[1],[1],[0]])

n_in = np.shape(xor_inputs)[1]

# insert the bias
xor_inputs = insert_bias(xor_inputs)

# fire up some weights
xor_weights = np.random.rand(np.shape(xor_inputs)[1],np.shape(xor_targets)[1])*0.1-0.05


iterations = 5

print("xor inputs")
print(xor_inputs)
print("xor_weights0")
print(xor_weights)

xor inputs
[[ 0.  0.  1. -1.]
 [ 0.  1.  0. -1.]
 [ 1.  0.  0. -1.]
 [ 1.  1.  0. -1.]]
xor_weights
[[-0.02152635]
 [-0.00469036]
 [ 0.02908308]
 [-0.04836709]]


In [29]:
for i in range(iterations):
    
    print(f"iteration {i}\n")
    print("xor_weights 0.5")
    print(xor_weights)
    xor_activations = calculate_activations(xor_inputs, xor_weights)
    print("xor_weights 1")
    print(xor_weights)
    compare = xor_activations == xor_targets
    if(compare.all()):
        print("success!")
        print("the machine has learned!")
        print(xor_activations)
        break
    else:
        print("xor_weights2")
        print(xor_weights)
        xor_weights = update_weights(xor_inputs, xor_targets, xor_activations, xor_weights)
        print(f"updated xor weights: {xor_weights}")

# For some reason it wont learn!!
# I redid it in python outside of
# the jupiter notebook and it
# works fine
# I'll attach that as another file
# labzor.py

iteration 0

xor_weights 0.5
[[-6.00021526e+02]
 [-6.00004690e+02]
 [ 2.90830825e-02]
 [ 5.99951633e+02]]
[[0]
 [1]
 [1]
 [1]]
xor_weights 1
[[-6.00021526e+02]
 [-6.00004690e+02]
 [ 2.90830825e-02]
 [ 5.99951633e+02]]
xor_weights2
[[-6.00021526e+02]
 [-6.00004690e+02]
 [ 2.90830825e-02]
 [ 5.99951633e+02]]
updated xor weights: [[-6.00221526e+02]
 [-6.00204690e+02]
 [ 2.90830825e-02]
 [ 6.00151633e+02]]
iteration 1

xor_weights 0.5
[[-6.00221526e+02]
 [-6.00204690e+02]
 [ 2.90830825e-02]
 [ 6.00151633e+02]]
[[0]
 [1]
 [1]
 [1]]
xor_weights 1
[[-6.00221526e+02]
 [-6.00204690e+02]
 [ 2.90830825e-02]
 [ 6.00151633e+02]]
xor_weights2
[[-6.00221526e+02]
 [-6.00204690e+02]
 [ 2.90830825e-02]
 [ 6.00151633e+02]]
updated xor weights: [[-6.00421526e+02]
 [-6.00404690e+02]
 [ 2.90830825e-02]
 [ 6.00351633e+02]]
iteration 2

xor_weights 0.5
[[-6.00421526e+02]
 [-6.00404690e+02]
 [ 2.90830825e-02]
 [ 6.00351633e+02]]
[[0]
 [1]
 [1]
 [1]]
xor_weights 1
[[-6.00421526e+02]
 [-6.00404690e+02]
 [ 2.9083