## Perceptrons

*(Credit: Leon Derczynski, IT University of Copenhagen)*

Let's build a little perceptron! It'll be on its own, which means it can only really do linearly separable problems. But that's OK; it'll try as hard as it can.

First, we'll add a shebang line so other coders (and shell interpreters) can see which version of Python we're using, and import two handy things: a function for making random choices; and some extensions that help with many different kinds of numerical math. You might even say mathS, in fact. Those extensions are called numpy, pronounced Numb Pie. We will also add a module for plotting our results.

In [1]:
#!/usr/bin/env python3
from random import choice                 # we will use this to choose a random example when training
from numpy import array, dot, random      # numpy will help us when manipulating vectors
import matplotlib.pyplot as plt           # to plot our results

Next, we'll define our training data. The format is thus, for each example:
* an array containing the two input features, often together called *X*, followed by a bias value, which will be 1
* the output label, *y*

In our first case, we will model a boolean function, AND. In the AND function, the output is 1 (true) only if both of the two inputs are 1 (true); otherwise the output is 0 (false):

In [None]:
training_data = [
	(array([0,0,1]), 0),
	(array([0,1,1]), 0),
	(array([1,0,1]), 0),
	(array([1,1,1]), 1),
	]

Next, we'll set up our activation function. This is used to decide whether the output of our perceptron should be 1 or 0. We will use a very simple unit step function. If the result of our perceptron is less than zero, we will output zero. Otherwise, we will output 1.

In [None]:
# Activation function. If the input is less than zero,
# return 0; otherwise return 1. This is a unit step
# activation function.
def activation(x):
  if x < 0:
    return 0
  else:
    return 1
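
As a quick sanity check, here is a small sketch that feeds the step function a negative value, zero, and a positive value. The three test values are arbitrary choices for illustration. Note that an input of exactly zero is not less than zero, so it gives 1.

```python
def activation(x):
    # Same unit step as above: 0 for negative inputs, 1 otherwise
    if x < 0:
        return 0
    else:
        return 1

# Try one value on each side of the threshold, and the threshold itself
for value in (-0.5, 0.0, 0.5):
    print(value, "->", activation(value))
# -0.5 -> 0
# 0.0 -> 1
# 0.5 -> 1
```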

Next come the settings that define how our perceptron trains. We refer to these as hyperparameters, to distinguish them from the weights, which are sometimes called the parameters.

In [None]:
epochs = 100              # the number of training iterations, which we call epochs
learning_rate = 0.8       # the learning rate - scales how much we update on each iteration

Next, we will make a little list where we'll keep track of how well training has gone. Oh, and while we're at it, let's initialise the weights too.

In [None]:
errors = []               # a list to store the errors in
weights = random.rand(3)  # an array of initial weights

So next, we have the training process. At each epoch (iteration, step), we randomly select a training example. With that example, we work out the [dot product](https://www.mathsisfun.com/algebra/vectors-dot-product.html) of the features and our current weights. This gives us the activation potential - how much our neuron is trying to fire.

Next, we put this through our activation function to see what our neuron really does, and compare that to what the answer should be for this example. The difference is our error: how far wrong were we? We'll store each error so we can view them all later.

Meanwhile, we'll update our weights so they move closer to where they should have been; that is, we try to reduce the error to zero. The learning rate scales how big that update is. Here's the code.

In [None]:
# Do this for every epoch
for i in range(epochs):

    # Choose a random training example, setting the variable x to
    # be the input vector, and the variable expected to be the expected output
    x, expected = choice(training_data)

    # Find the dot product of the current weights and the inputs
    result = dot(weights, x)

    # Run the result through our activation function to get the output
    output = activation(result)

    # Find the error: how much this output differs from the expected output
    error = expected - output

    # Add the error to the error list, so we can display it later
    errors.append(error)

    # Adjust the weights: nudge each one in proportion to the error and its input
    weights += learning_rate * error * x
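
To make one update concrete, here is a sketch of a single training step worked by hand. The starting weights of 0.5 are an assumption picked for easy arithmetic, not the random values the notebook draws, and `new_weights` is a name used only in this illustration. Everything else follows the loop above.

```python
from numpy import array, dot

def activation(x):
    # Unit step: 0 for negative inputs, 1 otherwise
    return 0 if x < 0 else 1

learning_rate = 0.8
weights = array([0.5, 0.5, 0.5])       # assumed starting weights, for illustration only
x, expected = array([0, 1, 1]), 0      # AND example: inputs 0 and 1, bias 1, label 0

result = dot(weights, x)               # 0.5*0 + 0.5*1 + 0.5*1 = 1.0
output = activation(result)            # 1.0 is not below zero, so the neuron fires: 1
error = expected - output              # 0 - 1 = -1, we fired when we shouldn't have

# The update only touches weights whose input was non-zero;
# here that gives roughly [0.5, -0.3, -0.3]
new_weights = weights + learning_rate * error * x
print(result, output, error, new_weights)
```

Because the first input was 0, its weight is left alone, while the second weight and the bias weight are both pushed down by 0.8. With the new weights, the same example gives a dot product of about -0.6, so the neuron no longer fires on it.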

So, how did we do? Let's go through the examples in the training set, and fire our weighted perceptron - using the learned weights, $w$ - for each example.

In [None]:
# Go through all of our data, and see what the perceptron
# outputs for it, now it is trained
for x, _ in training_data:
  result = dot(x, weights)
  output = activation(result)
  print("{}: {} -> {}".format(x[:2], result, output))


How does it look? Did we nail it?
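
If you'd rather not eyeball the printout, a small helper can score it for you. This is only a sketch: `accuracy` is a name introduced here, and `example_weights` are hand-picked values that happen to solve AND, standing in for whatever your own training run produced. In the notebook you would pass the trained `weights` instead.

```python
from numpy import array, dot

def activation(x):
    # Unit step: 0 for negative inputs, 1 otherwise
    return 0 if x < 0 else 1

def accuracy(weights, data):
    # Fraction of examples where the perceptron's output matches the label
    correct = sum(activation(dot(x, weights)) == expected for x, expected in data)
    return correct / len(data)

training_data = [
    (array([0, 0, 1]), 0),
    (array([0, 1, 1]), 0),
    (array([1, 0, 1]), 0),
    (array([1, 1, 1]), 1),
]

# Hand-picked weights (an assumption for this sketch): fire only when
# both inputs together outweigh the -1.5 bias weight
example_weights = array([1.0, 1.0, -1.5])
print(accuracy(example_weights, training_data))   # 1.0
```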

Finally, let's plot a graph of those errors, to see how the process went.

In [None]:
plt.ylim([-1,1])
plt.plot(errors)

### Exercises for you to try

Try adapting the training data array to model these functions, running again for each one:
* OR  - output is 1 if either or both inputs are 1
* NAND -  output is 0 if and only if both inputs are 1; otherwise it is 1 (not and)
* XOR  -  output is 1 if exactly one of the inputs is 1, not if both or neither are 1 (exclusive or)

What did you find? Did they all work? If not, why not?
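
To rerun the experiment without copying cells around, one option is to wrap the training loop in a function. The sketch below assumes the same data format, step activation, and default hyperparameters as above; `train_perceptron` is a name made up for this example. It is shown with the AND data, so you can swap in your own truth tables.

```python
from random import choice
from numpy import array, dot, random

def activation(x):
    # Unit step: 0 for negative inputs, 1 otherwise
    return 0 if x < 0 else 1

def train_perceptron(training_data, epochs=100, learning_rate=0.8):
    # Same procedure as the training cell: random start, one random example per step
    weights = random.rand(3)
    errors = []
    for _ in range(epochs):
        x, expected = choice(training_data)
        error = expected - activation(dot(weights, x))
        errors.append(error)
        weights += learning_rate * error * x
    return weights, errors

# The AND data from earlier; replace the labels to model another function
and_data = [
    (array([0, 0, 1]), 0),
    (array([0, 1, 1]), 0),
    (array([1, 0, 1]), 0),
    (array([1, 1, 1]), 1),
]

weights, errors = train_perceptron(and_data)
for x, expected in and_data:
    print(x[:2], "expected", expected, "got", activation(dot(weights, x)))
```

Training starts from random weights and picks examples at random, so results can differ from run to run. Running each function a few times gives a fairer picture than a single attempt.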