# Implementing a Perceptron Classifier from Scratch

Now that we have seen several classifiers, it's our turn to implement one ourselves. A perceptron is a good first choice because it is easy to understand. Implementing an algorithm from scratch gives us a deeper understanding of what actually happens inside the black box.

Let's start by thinking about a perceptron as a function that maps an input `X` to an output `y`. In most cases, `X` is a set of features $x_1$, $x_2$, ..., $x_n$ put together. For instance, in an image our features are pixel values, and in text processing we can include characteristics such as uppercased words or part-of-speech tags (whether the word is a verb or a noun). To keep this exercise simple, we'll create our own data, and at the end you will be able to play with a bigger dataset.

Let's display some random features in a table and create the data for them:

Features and labels:

 ![table](../data/table.png)


In [None]:
import numpy as np
from matplotlib import pyplot as plt
%matplotlib inline

In [None]:
#Define your x-matrix and your y-vector

x = np.array([[2., 5.],
              [0., 2.],
              [2., 3.],
              [5., 1.]])
# Labels 
y = np.zeros(4)
y[0] = 1
y[1] = 1
   
print(x, y)

Now, let's take a look at our function:

![funct](../data/funct.png)

We need to find a set of weights that, when multiplied with the feature vectors, returns a value that helps us classify the input data.
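
To make this concrete, here is a minimal sketch of such a function, assuming the function pictured above is the usual thresholded weighted sum. The name `step_predict` and the weights in `w_example` are made up for illustration; they are not the weights we will learn later.

```python
import numpy as np

def step_predict(x_row, w, threshold=0.0):
    """Return 1.0 if the weighted sum of the features exceeds the threshold, else 0.0."""
    f = np.dot(x_row, w)  # weighted sum: w_0*x_0 + w_1*x_1
    return 1.0 if f > threshold else 0.0

# Hypothetical weights, chosen by hand for this illustration only
w_example = np.array([-0.5, 0.5])

# -0.5*2 + 0.5*5 = 1.5, which is above the threshold, so class 1.0
print(step_predict(np.array([2., 5.]), w_example))
# -0.5*5 + 0.5*1 = -2.0, which is below the threshold, so class 0.0
print(step_predict(np.array([5., 1.]), w_example))
```

Training is then the search for weights that make this function agree with the labels.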

**How?**
- Define threshold, learning rate and number of iterations

In [None]:
# Define parameters
threshold = 0.0     
alpha = 0.1   
epochs = 50   

Last step: we need to initialize our weight vector.

## Training the perceptron
It's time to build your perceptron function. Remember:

- Generate a weight vector $w$ and a prediction vector $\widehat{y}$ 
- Initialize $w$

And let's start our iterations!

- Define your function $f$
- Assign a value to $\widehat{y}$
- Compare $\widehat{y}$ and $y$
- Update weights $w$
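
Before writing the full loop, it can help to trace a single iteration by hand. The sketch below walks through the four steps for the first data point, starting from zero weights. The `_demo` names are introduced here only so that the example does not interfere with the notebook's own variables.

```python
import numpy as np

alpha_demo = 0.1        # learning rate, same value as above
threshold_demo = 0.0    # threshold, same value as above
w_demo = np.zeros(2)    # weights start at zero

x_i = np.array([2., 5.])  # first feature vector
y_i = 1.0                 # its label

# Define f: weighted sum of the features (0.0, since the weights are zero)
f = np.dot(x_i, w_demo)

# Assign yhat: f is not above the threshold, so the prediction is 0.0
yhat = 1.0 if f > threshold_demo else 0.0

# Compare yhat with y and update w: the prediction is wrong, so
# w moves by alpha * (y - yhat) * x = 0.1 * 1 * [2, 5] = [0.2, 0.5]
w_new = w_demo + alpha_demo * (y_i - yhat) * x_i
print(w_new)
```

When the prediction is already correct, `y_i - yhat` is zero and the weights stay unchanged.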

In [None]:
def train_perceptron(x, y, threshold, alpha, epochs):
    # Define your weight vector and prediction vector
    w = np.zeros(len(x[0]))
    yhat_vec = np.ones(len(y))

    for iteration in range(epochs):
        converge = True
        for i in range(len(x)):

            # Dot product of the features and the weights
            f = np.dot(x[i], w)

            # Prediction
            if f > threshold:
                yhat = 1.
            else:
                yhat = 0.
            yhat_vec[i] = yhat

            # Weights update (only when the prediction is wrong)
            if yhat != y[i]:
                w = w + alpha*(y[i]-yhat)*x[i]
                converge = False

        # Stop early once a full pass over the data produces no updates
        if converge:
            break

    return w, yhat_vec

In [None]:
w, y_train = train_perceptron(x, y, threshold, alpha, epochs)
print("Here are my weights:", w)

We also want to plot our classification. We can use our weights to plot our decision boundary.

Let's consider the function $f(x)= w*x$. Please note that this example is deliberately designed to work even without a bias; we'll use one in the later example. Now, let's take a look at what happens inside this dot product: $f(x)= w_0*x_0+w_1*x_1$.
To figure out our decision boundary, we need to find at least two points that lie exactly on our line.

- We also need the minimum and maximum $x_0$ coordinates of all our data points.

- Now, let's set our previous equation to zero and solve for $x_1$. This gives us the $x_1$ coordinates of the boundary at the minimum and maximum $x_0$.

- Plotting a line between those two points displays our decision boundary.


### $x_1=\frac{-w_0*x_0}{w_1}$
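
As a quick sanity check of this formula, the sketch below computes two boundary points and confirms that $f$ is zero at both of them. The weights in `w_check` are hypothetical values picked for the illustration, not the trained ones.

```python
import numpy as np

# Hypothetical weights, only for checking the formula
w_check = np.array([-1.0, 2.0])

# Two x_0 coordinates, standing in for the minimum and maximum of the data
x0 = np.array([0.0, 5.0])

# Solve w_0*x_0 + w_1*x_1 = 0 for x_1
x1 = -w_check[0] * x0 / w_check[1]

# Every point on the boundary must give f = 0
points = np.column_stack((x0, x1))
print(points)
print(points @ w_check)
```

Note that the formula divides by $w_1$, so it only works when $w_1$ is not zero.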


In [None]:
# Find min and max for x_0
p1_x = np.min(x[:,0])
p2_x = np.max(x[:,0])

# Find coordinates in x_1
p1_y = -w[0]*p1_x/w[1]
p2_y = -w[0]*p2_x/w[1]

In [None]:
# Plot the data points (colored by label) and the decision boundary
fig1, ax1 = plt.subplots()
ax1.scatter(x[:,0], x[:,1], c=y, edgecolors=(1, 0, 0))
ax1.plot([p1_x, p2_x], [p1_y,p2_y], lw=2)
plt.show()

## What's next?

We need to test whether those weights actually work, so let's write a test function for the perceptron.

**Remember!**

Here we do not iterate and do not update weights; we just need a prediction, that is, only the $\widehat{y}$ vector.

In [None]:
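def test_perceptron(x, threshold, alpha, w):
    # alpha is unused here: no weights are updated during testing
    yhat_vec = []

    for i in range(len(x)):

        # Weighted sum of the features
        f = np.dot(x[i], w)

        # Prediction
        if f > threshold:
            yhat = 1.
        else:
            yhat = 0.

        yhat_vec.append(yhat)

    return yhat_vec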
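
The same prediction can also be written without an explicit loop. The following is a minimal vectorized sketch, not a replacement for the function above; `predict_vectorized` and the example weights are names and values introduced here for illustration.

```python
import numpy as np

def predict_vectorized(x, w, threshold=0.0):
    """Predict 1.0 or 0.0 for every row of x in a single matrix-vector product."""
    return (x @ w > threshold).astype(float)

# Hypothetical inputs and weights for the illustration
x_demo = np.array([[1., 4.],
                   [4., 2.]])
w_demo = np.array([-0.5, 0.5])

# Row 0: -0.5 + 2.0 = 1.5, so class 1.0; row 1: -2.0 + 1.0 = -1.0, so class 0.0
print(predict_vectorized(x_demo, w_demo))
```

For small datasets the difference is negligible, but the vectorized form avoids the Python-level loop on larger ones.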

Almost there! We just need to feed some test data to our perceptron.

In [None]:
# Test features (kept in a separate variable so the training data is not overwritten)
test_x = np.array([[1., 4.],
                   [1., 2.],
                   [4., 2.],
                   [3., 1.]])
# Labels
test_y = np.zeros(4)
test_y[0] = 1
test_y[1] = 1

pred_y = test_perceptron(test_x, threshold, alpha, w)
print(pred_y)


It's time to compare the predictions with our gold labels. Let's compute the accuracy of the model and see how the perceptron works on unseen data.

In [None]:
accuracy = np.sum(np.equal(test_y, pred_y))/len(test_y)
print(accuracy)

Wow! We made it! It looks like a perceptron can classify our data very well. What about creating some more data and testing it again? For that, we can use the function `multivariate_normal` provided in `numpy.random`. This function takes a vector as the *mean* and a matrix as the *covariance matrix*, and returns random samples distributed around the mean. The covariance matrix determines the spread and the main axes of the samples' distribution.
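
To see what the mean and the covariance matrix control, the sketch below draws a larger batch of samples and compares their empirical statistics with the values that were passed in. It uses NumPy's `default_rng` generator with a fixed seed, which is a choice made for this illustration so that the result is reproducible; the sample count of 5000 is likewise arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)   # seeded generator (assumption: used for reproducibility)

mean_demo = [6.0, -6.0]
cov_demo = [[1.0, 0.5],
            [0.5, 1.0]]

# Draw 5000 two-dimensional samples around the mean
draws = rng.multivariate_normal(mean_demo, cov_demo, size=5000)

print(draws.shape)                   # one row per sample, one column per feature
print(draws.mean(axis=0))            # should be close to mean_demo
print(np.cov(draws, rowvar=False))   # should be close to cov_demo
```

The positive off-diagonal entries of the covariance matrix tilt the cloud of points, so that large values of one feature tend to come with large values of the other.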

In [None]:
samples = 500

class_1 = np.random.multivariate_normal([6,-6], [[1.,.5],[.5,1.]], samples)
class_2 = np.random.multivariate_normal([2, 2], [[1.,.5],[.5,1.]], samples)

x_data = np.vstack((class_1, class_2))


Let's see what these samples look like:

In [None]:
fig2, ax2 = plt.subplots()
ax2.scatter(class_1[:,0], class_1[:,1], c="b")
ax2.scatter(class_2[:,0], class_2[:,1], c="r")
plt.show()

## What's next?

Let's create the labels, include a bias, and merge all the data together. The bias is just a value that is constant for all data points.
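
Why does a constant column act as a bias? A minimal sketch, with hypothetical features and weights: prepending a column of ones means that the first weight is always multiplied by 1, so it is simply added to the weighted sum of the remaining features.

```python
import numpy as np

# Hypothetical features and weights for the illustration
features_demo = np.array([[2., 5.],
                          [5., 1.]])
w_demo = np.array([0.5, -1.0, 1.0])   # [bias weight, w_1, w_2]

# Prepend a column of ones to the features
ones = np.ones((len(features_demo), 1))
biased_demo = np.hstack((ones, features_demo))

# Dot product with the extra column ...
with_column = biased_demo @ w_demo
# ... equals the plain weighted sum plus the bias weight
explicit = w_demo[0] + features_demo @ w_demo[1:]

print(with_column)
print(explicit)
```

This lets the training function stay unchanged: the bias is learned like any other weight.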

In [None]:
# Create labels
labels_1 = np.ones((samples,1))
labels_2 = np.zeros((samples,1))

# Create bias and merge everything together
bias = np.ones((samples*2, 1)).astype(np.float32)
labels = np.vstack((labels_1, labels_2))
features = np.concatenate((class_1, class_2), axis=0)
biased = np.hstack((bias, features))
dataset = np.hstack((biased, labels))

# Shuffle data and apply an 80/20 split
np.random.shuffle(dataset)
split = int(len(dataset[:,0]) * 0.8)
train = dataset[:split,:]
test = dataset[split:,:]

Define your parameters and train again to obtain the new weights.

In [None]:
# Defining parameters
epochs = 1000
alpha = 0.01
threshold = 0
train_data = train[:,0:3]
train_labels = train[:,3]

test_data = test[:,0:3]
test_labels = test[:,3:]

w, train_y = train_perceptron(train_data, train_labels, threshold, alpha, epochs)

print(w)

Let's plot our decision boundary again. Remember that this time we have a bias, so our formula looks a bit different:
### $x_1=\frac{-w_0-w_1*x_0}{w_2}$

In [None]:
p1_x = np.min(train_data[:,1])
p2_x = np.max(train_data[:,1])

p1_y = (threshold-w[0]-w[1]*p1_x)/w[2]
p2_y = (threshold-w[0]-w[1]*p2_x)/w[2]


fig3, ax3 = plt.subplots()
ax3.scatter(train_data[:,1], train_data[:,2], c=train_labels)
ax3.plot([p1_x, p2_x], [p1_y, p2_y])
plt.show()


And what about accuracy? If the boundary in this graph separates the two clusters cleanly, we should see an accuracy close to 100% on the test data. Yay!

In [None]:
pred_y = test_perceptron(test_data, threshold, alpha, w)
pred_y = np.reshape(pred_y,(len(pred_y),1))
accuracy = np.sum(np.equal(test_labels, pred_y))/len(pred_y)

print(accuracy)
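
The reshape in the cell above matters because `test_labels` is a column of shape `(n, 1)`. The sketch below, with small made-up arrays, shows what happens when a column is compared with a flat vector: NumPy broadcasts the two shapes against each other and returns an `(n, n)` matrix instead of `n` element-wise comparisons, which would silently distort the accuracy.

```python
import numpy as np

# Made-up labels and predictions for the illustration
labels_col = np.array([[1.], [0.], [1.]])   # shape (3, 1), like test_labels
preds_flat = np.array([1., 0., 0.])         # shape (3,), like a flat prediction list

# Mismatched shapes: broadcasting compares every label with every prediction
wrong = np.equal(labels_col, preds_flat)

# Matching shapes: one comparison per data point
right = np.equal(labels_col.ravel(), preds_flat)

print(wrong.shape, right.shape)
print(np.mean(right))   # fraction of correct predictions
```

Flattening the labels with `ravel()` is an alternative to reshaping the predictions; either way, both arrays need the same shape before they are compared.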