# Implementing a Perceptron Classifier from Scratch

Now that we have seen several classifiers, it's our turn to implement one ourselves. A perceptron is a good choice for a first from-scratch implementation because it is easy to understand. Implementing algorithms from scratch gives us a deeper understanding of what actually happens inside the black box.

Let's start by thinking about a perceptron as a function that maps an input `X` to an output `y`. In most cases, `X` is a set of features $x_1$, $x_2$, ..., $x_n$ put together. For instance, in image processing the features are pixel values, while in text processing they can include characteristics such as whether a word is uppercased or its part-of-speech tag (verb, noun, and so on). To give you an easy start, we will create our own data first; at the end you will be able to play with a bigger dataset.

Let's display some random features in a table and create the data for them:

Features and labels:

 ![table](../data/table.png)


In [None]:
import numpy as np
from matplotlib import pyplot as plt
%matplotlib inline

In [None]:
# Define your x-matrix and your y-vector

x = np.array([[2., 5.],
              [0., 2.],
              [2., 3.],
              [5., 1.]])
# Labels 
y = [1, 1, 0, 0]
   
print("x=", x)
print("y=", y)

Now, let's take a look at the function that maps an input vector $x$ to an output value $f(x)$:

![funct](../data/funct.png)

where $\mathbf{w}$ is a vector of real-valued weights and $\mathbf {w} \cdot \mathbf {x}$ is the dot product $\sum _{i=1}^{n}w_{i}x_{i}$ over the $n$ features.

Given x and y, we need to find a set of weights to multiply with the features x. The resulting weighted sum lets us classify the input data.
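
To make the weighted sum concrete, here is a small sketch on the toy data. The weights `w_example` are hand-picked for illustration (they are not learned), and the rule "output 1 if the sum is greater than the threshold, otherwise 0" is an assumption about the function shown in the figure.

```python
import numpy as np

# Toy features from the table above
x_toy = np.array([[2., 5.],
                  [0., 2.],
                  [2., 3.],
                  [5., 1.]])

w_example = np.array([-2.0, 1.0])   # hypothetical weights, hand-picked for illustration
threshold_example = 0.0             # assumed threshold

# One dot product per row: w_1 * x_1 + w_2 * x_2
scores = x_toy @ w_example

# Assumed decision rule: 1 if the score exceeds the threshold, else 0
labels_from_scores = (scores > threshold_example).astype(int)

print("scores:", scores)
print("labels:", labels_from_scores)
```

With these particular weights the first two rows get a positive score and the last two a negative one, which matches the labels `[1, 1, 0, 0]`. Training is about finding such weights automatically.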

The perceptron algorithm:
![funct](../data/perceptron-alg.png)

## Build the function $f(x)$ 
***Beware the common mix-up of indices: `i` can mean the i-th feature or the i-th row in the data set!***

In [None]:
def f(w, x, threshold):
    '''
    w : weight vector
    x : feature vector
    threshold : the threshold value
    y_hat : return value
    
    '''
    ## your code, 1 line ##
    pass

In [None]:
# test
w_tmp = np.random.randn(2)
x_tmp = x[0]
threshold_tmp = 0

print("w_tmp", w_tmp)
print("x_tmp", x_tmp)

print("result of f(x_tmp) :", f(w_tmp, x_tmp, threshold_tmp))

## Training the perceptron
It's time to build your perceptron function. Remember to:

- Generate a weight vector $w$ and a prediction vector $\widehat{y}$ 
- Initialize $w$

Now, let's start our iterations!
- Assign a value to $\widehat{y}$
- Compare $\widehat{y}$ and $y$
- Update weights $w$
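
The steps above can be sketched as follows. This is one possible way to fill in the exercise, not the reference solution: the names `f_sketch` and `train_perceptron_sketch` are hypothetical, the weights start at zero (an assumption, chosen so the run is reproducible), and the update `w + alpha * (y - y_hat) * x` is the classic perceptron rule, which may differ in detail from the algorithm in the figure.

```python
import numpy as np

def f_sketch(w, x, threshold):
    # Assumed activation: 1 if the weighted sum exceeds the threshold, else 0
    return 1 if np.dot(w, x) > threshold else 0

def train_perceptron_sketch(x, y, threshold, alpha, epochs):
    # Assumption: zero initialization instead of random weights
    w = np.zeros(x.shape[1])
    yhat_vec = np.zeros(len(y), dtype=int)

    for _ in range(epochs):
        for i in range(len(y)):          # here i indexes rows (samples), not features
            # Assign a value to y_hat for this sample
            yhat_vec[i] = f_sketch(w, x[i], threshold)
            # Compare y_hat with y and update the weights;
            # the update is zero when the prediction is already correct
            w = w + alpha * (y[i] - yhat_vec[i]) * x[i]

    return w, yhat_vec

x_toy = np.array([[2., 5.], [0., 2.], [2., 3.], [5., 1.]])
y_toy = [1, 1, 0, 0]

w_toy, yhat_toy = train_perceptron_sketch(x_toy, y_toy, 0.0, 0.1, 50)
print("weights:", w_toy)
print("predictions on the training data:", yhat_toy)
```

On this toy data the sketch stops changing the weights after a few epochs, because every sample is then classified correctly and the update term becomes zero.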

In [None]:
# This function trains a perceptron and returns 
# a weight vector and a prediction vector
def train_perceptron(x, y, threshold, alpha, epochs): 
    
    # Write your contribution here
    pass
          
    return w, yhat_vec

In [None]:
# Define your parameters
threshold = 0.0     
alpha = 0.1   
epochs = 50   

In [None]:
# Train perceptron
w, y_train = train_perceptron(x, y, threshold, alpha, epochs)
print("Here are my weights:", w)

We also want to plot our classification. We can use our weights to draw the decision boundary.

Let's consider the function $f(x)= w \cdot x$. Please note that this example is deliberately designed to work without a bias; we'll add one in a later example. Written out, the dot product is $f(x)= w_1*x_1+w_2*x_2$.
To draw the decision boundary, we need at least two points that lie exactly on the line.

- First, we need the minimum and maximum $x_1$ coordinates of all our data points.

- Next, we set the previous equation to zero and solve for $x_2$. This gives the $x_2$ coordinates of the boundary at the minimum and maximum $x_1$.

- Plotting a line between those two points displays our decision boundary.


### $x_2=\frac{-w_1*x_1}{w_2}$
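
As a quick sanity check of this formula, the sketch below builds two boundary points and verifies that their weighted sum is zero. The weights `w_demo` are hypothetical (not the result of your training), and the formula only works when the second weight is non-zero.

```python
import numpy as np

w_demo = np.array([-0.5, 0.3])   # hypothetical weights, for illustration only
x1_vals = np.array([0.0, 5.0])   # min and max of the first feature in the toy data

# x_2 = -w_1 * x_1 / w_2, valid only if w_2 != 0
x2_vals = -w_demo[0] * x1_vals / w_demo[1]

# Stack into (x_1, x_2) points and plug them back into w . x
boundary_points = np.column_stack((x1_vals, x2_vals))
residuals = boundary_points @ w_demo

print("boundary points:\n", boundary_points)
print("w . x on the boundary:", residuals)
```

Both residuals should be zero up to floating-point rounding, which confirms that the two points lie on the line $w_1*x_1+w_2*x_2=0$.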


In [None]:
# Find min and max for x_1
p1_x = np.min(x[:,0])
p2_x = np.max(x[:,0])

# Find coordinates in x_2
p1_y = -w[0]*p1_x/w[1]
p2_y = -w[0]*p2_x/w[1]

In [None]:
# Plot the data points, colored by label, and the decision boundary
fig1, ax1 = plt.subplots()
ax1.scatter(x[:,0], x[:,1], c=y, edgecolors=(1, 0, 0))
ax1.plot([p1_x, p2_x], [p1_y, p2_y], lw=2)
plt.show()

## What's next?

We need to test whether those weights actually work, so let's write a test function for the perceptron.

**Remember!**

Here we do not iterate and do not update the weights; we just need a prediction. This means the function only produces the $\widehat{y}$-vector.

In [None]:
# This function evaluates the resulting weights from our perceptron
# It only returns a prediction vector
def test_perceptron(w, x, threshold):
    
    # Write your contribution here
    pass
        
    return yhat_vec

Almost there! We just need to feed some test data to our perceptron.

In [None]:
# test data
x_test = np.array([[1., 4.],
                   [1., 2.],
                   [4., 2.],
                   [3., 1.]])
# Labels 
y_test = [1, 1, 0, 0]

y_pred = test_perceptron(w, x_test, threshold)
print("Predicted y =", y_pred)


It's time to compare the predictions with our gold labels. Let's compute the accuracy of the model and see how well the perceptron works on unseen data.
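
Accuracy is the fraction of predictions that match the gold labels. The sketch below shows one way to compute it; `accuracy_sketch` is a hypothetical helper, and the example vectors are made up for illustration.

```python
import numpy as np

def accuracy_sketch(y_true, y_hat):
    # Flatten both inputs so that shapes (n,) and (n, 1) compare element by element
    y_true = np.asarray(y_true).ravel()
    y_hat = np.asarray(y_hat).ravel()
    return float(np.mean(y_true == y_hat))

gold = [1, 1, 0, 0]    # made-up gold labels
guess = [1, 0, 0, 0]   # made-up predictions, one of them wrong

acc = accuracy_sketch(gold, guess)
print("accuracy:", acc)
```

The `ravel()` calls matter when one vector is a column of shape `(n, 1)` and the other is flat: without them, NumPy broadcasting would compare every element with every other element and produce an `(n, n)` matrix.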

In [None]:
# Calculate accuracy

Wow! We made it! It looks like a perceptron can classify our data very well. What about creating some more data and testing it again? For that, we can use the function `multivariate_normal` provided in `numpy.random`. It takes a *mean* vector and a *covariance matrix* as input and returns random samples scattered around the mean. The covariance matrix determines the spread and the main axes of the samples' distribution.
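
To get a feeling for what the mean and the covariance matrix do, the sketch below draws samples and compares their empirical statistics with the requested ones. It uses a seeded `np.random.default_rng` generator, which is an assumption made for reproducibility; the legacy call `np.random.multivariate_normal` takes the same three arguments.

```python
import numpy as np

rng = np.random.default_rng(0)   # seeded generator (assumption, for reproducibility)
mean = [2.0, 2.0]
cov = [[1.0, 0.5],
       [0.5, 1.0]]

points = rng.multivariate_normal(mean, cov, 500)

print("shape:", points.shape)                                    # one row per sample
print("empirical mean:", points.mean(axis=0))                    # should land near `mean`
print("empirical covariance:\n", np.cov(points, rowvar=False))   # should land near `cov`
```

The positive off-diagonal value 0.5 means the two features tend to grow together, so the point cloud is stretched along the diagonal.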

In [None]:
samples = 500

class_1 = np.random.multivariate_normal([6,-6], [[1.,.5],[.5,1.]], samples)
class_2 = np.random.multivariate_normal([2, 2], [[1.,.5],[.5,1.]], samples)

x_data = np.vstack((class_1, class_2))


Let's see what these samples look like:

In [None]:
fig2, ax2 = plt.subplots()
ax2.scatter(class_1[:,0], class_1[:,1], c="b")
ax2.scatter(class_2[:,0], class_2[:,1], c="r")
plt.show()

## What's next?

Let's create the labels, add a bias, and merge all the data together. The bias is just a value that is constant for all data points.
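
Why does a constant column help? The sketch below shows that a dot product over data with a leading column of ones equals the plain weighted sum plus a constant offset, so the first weight acts as a learnable bias. All numbers here are hypothetical.

```python
import numpy as np

x_raw = np.array([[2., 5.],
                  [0., 2.]])
w_feat = np.array([-0.5, 0.3])   # hypothetical feature weights
w_0 = 1.5                        # hypothetical bias weight

# Prepend a column of ones to the features and w_0 to the weights
x_biased = np.hstack((np.ones((len(x_raw), 1)), x_raw))
w_full = np.concatenate(([w_0], w_feat))

with_column = x_biased @ w_full     # w_0 * 1 + w_1 * x_1 + w_2 * x_2
with_offset = x_raw @ w_feat + w_0  # the same sum, written with an explicit offset

print(with_column)
print(with_offset)
```

Because the bias is folded into the weight vector, the training function does not need any special handling for it.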

In [None]:
# Create labels
labels_1 = np.ones((samples,1))
labels_2 = np.zeros((samples,1))

# Create bias and merge everything together
bias = np.ones((samples*2, 1)).astype(np.float32)
labels = np.vstack((labels_1, labels_2))
features = np.concatenate((class_1, class_2), axis=0)
biased = np.hstack((bias, features))
dataset = np.hstack((biased, labels))

# Shuffle data and apply a 80/20 split
np.random.shuffle(dataset)
split = int(len(dataset[:,0]) * 0.8)
train = dataset[:split,:]
test = dataset[split:,:]

Define your parameters and train again to obtain the new weights.
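
Before training, the merged arrays have to be separated into features and labels again. The sketch below shows one way to do this on a tiny stand-in array with the same column layout (bias, two features, label). The names with the `_demo` suffix are hypothetical.

```python
import numpy as np

# Tiny stand-in for `dataset`; columns: bias, feature 1, feature 2, label
demo = np.array([[1., 6., -6., 1.],
                 [1., 2.,  2., 0.],
                 [1., 5., -7., 1.],
                 [1., 3.,  1., 0.],
                 [1., 7., -5., 1.]])

# Same 80/20 split as above
split_demo = int(len(demo) * 0.8)
train_demo, test_demo = demo[:split_demo], demo[split_demo:]

# All columns except the last are inputs; the last column holds the labels
train_data_demo = train_demo[:, :-1]
train_labels_demo = train_demo[:, -1]
test_data_demo = test_demo[:, :-1]
test_labels_demo = test_demo[:, -1]

print(train_data_demo.shape, train_labels_demo.shape, test_data_demo.shape)
```

Slicing with `[:, -1]` returns a flat vector of labels, which is the shape the toy example used for `y`.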

In [None]:
# Defining parameters

# Write your contribution here


w, train_y = train_perceptron(train_data, train_labels, threshold, alpha, epochs)

print("trained weight:", w)

Let's plot our decision boundary again. Remember that this time we have a bias weight $w_0$, so our formula looks a bit different:   
## $x_2=\frac{-w_0-w_1*x_1}{w_2}$
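
The sketch below checks the boundary with a bias numerically. It assumes the weight order (bias, first feature, second feature), matching the column order of the data, so the boundary satisfies $w_0 + w_1*x_1 + w_2*x_2 = 0$ with $x_1$ and $x_2$ denoting the two features. The helper `boundary_x2` and the weights `w_demo` are hypothetical.

```python
import numpy as np

def boundary_x2(w, x1):
    # Solve w[0] + w[1] * x1 + w[2] * x2 = 0 for x2; requires w[2] != 0
    return (-w[0] - w[1] * x1) / w[2]

w_demo = np.array([1.0, -2.0, 4.0])   # hypothetical weights: bias, feature 1, feature 2
x1_ends = np.array([0.0, 8.0])        # made-up min and max of the first feature
x2_ends = boundary_x2(w_demo, x1_ends)

# Rebuild full input rows (1, x1, x2) and plug them back into w . x
points = np.column_stack((np.ones(2), x1_ends, x2_ends))

print("x2 on the boundary:", x2_ends)
print("w . x on the boundary:", points @ w_demo)
```

The two end points can then be passed to `ax.plot` exactly as in the plot without a bias.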

In [None]:
# Find min and max, calculate the x_2 points, and
# plot the decision boundary as we did previously

# Write your contribution here


And what about accuracy? If this graph is correct, we should get an accuracy of 100%. Yay!

In [None]:
pred_y = test_perceptron(w, test_data, threshold)
pred_y = np.reshape(pred_y,(len(pred_y),1))

In [None]:
# Calculate accuracy