# BUILDING PERCEPTRON FROM SCRATCH

The perceptron is a simple neural network algorithm for binary classification. It is only guaranteed to converge when the input data is linearly separable.

In this notebook we'll look at the underlying maths of this algorithm and see how it can be implemented in Python 3.

In [1]:
import numpy as np

## PART I - BREAKING IT DOWN

### Simulate data.

For now, we'll work with a single vector (row) consisting of three features (columns). We'll also specify the output $y$ for this vector, which is the outcome variable we want to predict.

In [2]:
data = np.array([2, 1, 2])

y = 1 

### Initialize the weight vector with zeros.

In [3]:
weights = np.zeros(len(data))

weights

array([0., 0., 0.])

### Compute the dot product.

To generate a prediction, we first need to calculate the dot product of the weight vector and the feature vector. The dot product is the sum of the element-wise products of each weight $w_i$ and its corresponding feature $x_i$.

$$ \sum^{n}_{i=1} w_i x_i $$

In our case this becomes:

$$ (0 \times 2) + (0 \times 1) + (0 \times 2) = 0 $$

In [4]:
dot = sum(weights * data)

dot

0.0

We can also use the NumPy function ```np.dot()``` to get the same result.

In [5]:
dot = np.dot(weights, data)

dot

0.0
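Because the weights start at zero, the result above is trivially zero. As a small illustration, the sketch below repeats the calculation with hypothetical non-zero weights (these values are made up for the example and are not produced by the model) and checks that the manual sum, ```np.dot()```, and the ```@``` operator agree.

```python
import numpy as np

# Hypothetical weights, chosen only to make the arithmetic easy to follow
example_weights = np.array([0.5, -1.0, 2.0])
example_data = np.array([2, 1, 2])

# (0.5 * 2) + (-1.0 * 1) + (2.0 * 2) = 1 - 1 + 4 = 4
manual = sum(example_weights * example_data)
with_dot = np.dot(example_weights, example_data)
with_operator = example_weights @ example_data

print(manual, with_dot, with_operator)
```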

### Activation function.

Next, we pass the dot product through a step activation function by comparing it against a threshold. The model predicts class 1 if the dot product is larger than the threshold (here 1), and class 0 otherwise. We store this prediction in ```y_hat```.

In [6]:
treshold = 1 # Specify treshold


if dot > treshold:
    y_hat = 1
else:
    y_hat = 0


y_hat

0
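The same comparison can be wrapped in a small helper that also works on whole arrays of dot products. This is a minimal sketch: the function name ```step``` is an assumption introduced here, and the parameter keeps the notebook's ```treshold``` spelling so it lines up with the other cells.

```python
import numpy as np

def step(dot, treshold=1):
    """Return 1 where dot is strictly greater than the threshold, else 0."""
    return np.where(dot > treshold, 1, 0)

single = step(0.0)                        # the dot product computed above
several = step(np.array([0.5, 1.0, 1.5])) # a value equal to the threshold maps to 0

print(single)
print(several)
```

Note that the comparison is strict, so a dot product exactly equal to the threshold is assigned to class 0.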

### Update the weights.

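If our prediction doesn't match the actual output, we update the weights as follows:

$$ w_{i}^{n+1} = w_{i}^{n} + \eta (y - \hat y) x_i $$

Where $\eta$ (eta) denotes the learning rate, $i$ indexes the features, and $n$ counts the updates. When the prediction is correct, $y - \hat y = 0$ and the weights are left unchanged.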

In [7]:
eta = 0.1 # Specify learning rate

weights = weights + eta * (y - y_hat) * data # Update the weights

weights

array([0.2, 0.1, 0.2])

These updated weights will be used in the next iteration.
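To make the next iteration concrete, the sketch below continues by hand from the updated weights, using the same vector, threshold, and learning rate as above. The variable names ending in ```_2``` and ```_3``` are introduced here purely for illustration.

```python
import numpy as np

data = np.array([2, 1, 2])
y = 1
treshold = 1
eta = 0.1

weights = np.array([0.2, 0.1, 0.2])  # weights after the first update

# Second pass: dot is about 0.4 + 0.1 + 0.4 = 0.9, still not above the threshold
dot_2 = np.dot(weights, data)
y_hat_2 = 1 if dot_2 > treshold else 0
weights = weights + eta * (y - y_hat_2) * data  # roughly [0.4, 0.2, 0.4]

# Third pass: dot is about 0.8 + 0.2 + 0.8 = 1.8, so the prediction now matches y
dot_3 = np.dot(weights, data)
y_hat_3 = 1 if dot_3 > treshold else 0

print(y_hat_2, weights, y_hat_3)
```

After the third pass the prediction equals $y$, so $(y - \hat y)$ is zero and further updates leave the weights untouched.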

### Loss function.

To evaluate the performance of our model we'll compute the sum of squared errors (SSE) as follows:

$$ \frac{1}{2} \sum^{n}_{i=1} (y_i - \hat y_i)^2 $$

In [8]:
error = [(y - y_hat) ** 2]  # Square the difference of y - y_hat
sse_loss = 0.5 * sum(error) # Sum the squared errors and multiply by 1/2 

sse_loss

0.5

As the model iterates over the dataset and updates the weights, we should see the loss decrease, although not necessarily at every epoch.
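With more than one observation, the same formula applies to vectors of outputs and predictions. The following is a minimal sketch; the helper name ```sse``` and the four-element example vectors are assumptions made for illustration.

```python
import numpy as np

def sse(y, y_hat):
    """Half the sum of squared differences between outputs and predictions."""
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    return 0.5 * np.sum((y - y_hat) ** 2)

single_loss = sse([1], [0])                    # the single-vector case from above
batch_loss = sse([1, 1, 1, 0], [0, 0, 0, 0])   # three wrong predictions out of four

print(single_loss, batch_loss)
```

Since outputs and predictions are both 0 or 1, each wrong prediction contributes exactly 0.5 to the loss.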

## PART II - PUTTING IT ALL TOGETHER

### Define the perceptron function.

We define the model function with the following parameters:

```data``` - Dataset, inclusive of features.  
```y``` - Outcome variable.  
```treshold``` - Activation function threshold.  
```eta``` - Learning rate.  
```total_epochs``` - Number of times we want the model to go over the entire dataset.

The function prints the final weight vector along with the loss recorded after each epoch; it does not return a value.

In [2]:
def perceptron(data, y, treshold, eta, total_epochs):

    current_epoch = 0                # Keep track of iterations
    y_hat_vector = np.zeros(len(y))  # Vector for storing predictions
    sse_loss = []                    # Vector for SSE
    
    weights = np.zeros(len(data[0])) # Initialize weights with zeros
    
    while current_epoch < total_epochs:
        
        for i in range(0, len(data)): # For each row in data
            
            # Compute the dot product
            dot = np.dot(weights, data[i]) 
            
            # Activation Function
            if dot > treshold:
                y_hat = 1
            else:
                y_hat = 0
                
            y_hat_vector[i] = y_hat # Update predictions
            
            # Update weights
            weights = weights + eta * (y[i] - y_hat) * data[i]
            
        # Compute SSE
        errors = (y - y_hat_vector) ** 2   # Compute the squared difference y - y_hat
        sse_loss.append(0.5 * sum(errors)) # Append the computed loss
        
        # Update epoch count
        current_epoch += 1 

    print("Weights: ", weights, "SSE: ", sse_loss)

### Specify data, output vector, and model parameters.

The first column of the dataset (all $1$s) represents the bias term (intercept); the two remaining columns are features.

In [3]:
data = np.array([[1, 0, 0], 
                 [1, 0, 1],
                 [1, 1, 0], 
                 [1, 1, 1]])

y = np.array([1, 1, 1, 0])

treshold = 1
eta = 0.1
total_epochs = 15

### Train the model.

And finally we call the perceptron function.

In [4]:
perceptron(data, y, treshold, eta, total_epochs)

Weights:  [ 1.3 -0.2 -0.1] SSE:  [1.5, 1.5, 2.0, 1.0, 1.5, 1.5, 1.0, 1.5, 1.5, 1.0, 0.5, 0.0, 0.0, 0.0, 0.0]


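The SSE reaches zero in the 12th epoch and stays there, which tells us the model made its last weight update during the 11th epoch and has converged on weights that classify all four rows correctly.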
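Once suitable weights are known, predictions for the whole dataset can be computed in one step. The sketch below assumes a helper named ```predict```, which is not part of the notebook. Because the ```perceptron``` function only prints its weights, the example uses hand-picked illustrative weights rather than the trained ones; they are chosen so that no dot product lands exactly on the threshold, where floating-point rounding could change the result.

```python
import numpy as np

def predict(data, weights, treshold=1):
    """Classify every row: 1 where the dot product exceeds the threshold, else 0."""
    return (data @ weights > treshold).astype(int)

data = np.array([[1, 0, 0],
                 [1, 0, 1],
                 [1, 1, 0],
                 [1, 1, 1]])

y = np.array([1, 1, 1, 0])

# Illustrative weights (an assumption, not the trained values)
illustrative_weights = np.array([1.5, -0.3, -0.3])

predictions = predict(data, illustrative_weights)

print(predictions)
print(np.array_equal(predictions, y))
```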