In [1]:
import numpy as np 
import pandas as pd

def perceptron_learning_algorithm(X, y, w):
    """Update w until every example in X is classified correctly.

    The loop only terminates if the data is linearly separable.
    """
    misclassified_examples = predict(hypothesis, X, y, w)
    while misclassified_examples.any():  # loop while any example is misclassified
        x, expected_y = pick_one_from(misclassified_examples, X, y)
        w = w + x * expected_y  # update rule
        misclassified_examples = predict(hypothesis, X, y, w)
    return w

The **perceptron_learning_algorithm** relies on several helper functions. The **hypothesis** function is just $h(x)$ written in Python; as we saw before, it returns the label $y_{i}$ predicted for an example $x_{i}$ when classifying with the hyperplane defined by **w**. The **predict** function applies the hypothesis to every example and returns the ones that are misclassified.

In [2]:
def hypothesis(x, w):
    return np.sign(np.dot(w, x))  # +1 or -1 (0 if x lies exactly on the hyperplane)


# Make predictions on all data points 
# and return the ones that are misclassified.
def predict(hypothesis_function, X, y, w):
    predictions = np.apply_along_axis(hypothesis_function, 1, X, w)
    misclassified = X[y != predictions]
    return misclassified 

In [3]:
# Pick one misclassified example randomly 
# and return it with its expected label. 
def pick_one_from(misclassified_examples, X, y):
    np.random.shuffle(misclassified_examples)
    x = misclassified_examples[0]
    # Locate the row of X equal to x; take the first match if rows repeat.
    index = np.where(np.all(X == x, axis=1))[0][0]
    return x, y[index]

In [17]:
x = np.array([[8,1,7], [4,-3,9], [-5,-2,6]])
w = np.array([4, 5, 3]) # initial weights, one per feature
expected_y = [1,1,1] 
print('It returns weight: ')
perceptron_learning_algorithm(x,expected_y,w)
        

It returns weight: 


array([-1,  3,  9])

Once we have made predictions with **predict**, we know which examples are misclassified, so we use the function **pick_one_from** to select one of them at random.

We then arrive at the heart of the algorithm: the update rule. For now, just remember that it changes the value of 'w'. Why it does this will be explained in detail later. We once again use the **predict** function, but this time, we give it the updated 'w'. It allows us to see if we have classified all data points correctly, or if we need to repeat the process until we do.
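To make the effect of the update rule concrete, here is a small sketch that reuses the numbers from the example above: the initial weights and the third data point, which is the misclassified one. It only illustrates a single update; in general one update moves the score toward the correct side but is not guaranteed to fix the example in one step.

```python
import numpy as np

# Values taken from the example above: with the initial weights,
# the third data point is misclassified.
w = np.array([4, 5, 3])
x = np.array([-5, -2, 6])
expected_y = 1

score_before = np.dot(w, x)         # negative, so the predicted label is -1
w_updated = w + x * expected_y      # update rule
score_after = np.dot(w_updated, x)  # positive, so the predicted label is now +1

# The score shifts by expected_y * (x . x), always toward the expected label.
print(score_before, score_after, np.dot(x, x))
```

Here the score goes from -12 to 53, a shift of exactly 65, which is the squared norm of 'x'.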



Also note that updating the value of 'w' for a particular example 'x' can change the hyperplane in such a way that another example 'x*', previously correctly classified, becomes misclassified. So the hypothesis might become worse at classifying after being updated; tracking the number of correctly classified examples at each iteration step makes this visible. One way to avoid this problem is to keep a record of the value of 'w' before making the update and use the updated 'w' only if it reduces the number of misclassified examples. This modification of the PLA is known as the **Pocket algorithm** (because we keep 'w' in our pocket).
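The modification described above could be sketched as follows. This is one reading of that description, not a reference implementation: the helper `count_misclassified`, the `max_iterations` cap, and the `seed` parameter are assumptions added for illustration (the cap keeps the loop finite on data that is not linearly separable).

```python
import numpy as np

def count_misclassified(X, y, w):
    """Number of rows of X whose predicted sign differs from y (hypothetical helper)."""
    return int(np.sum(np.sign(X @ w) != y))

def pocket_algorithm(X, y, w, max_iterations=1000, seed=0):
    """Accept an update only if it lowers the number of misclassified examples."""
    rng = np.random.default_rng(seed)
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    pocket_w = np.asarray(w, dtype=float)
    pocket_errors = count_misclassified(X, y, pocket_w)
    for _ in range(max_iterations):  # assumed cap, not part of the original text
        if pocket_errors == 0:
            break
        # Pick one misclassified example at random and try the update rule.
        wrong = np.flatnonzero(np.sign(X @ pocket_w) != y)
        i = rng.choice(wrong)
        candidate_w = pocket_w + X[i] * y[i]
        candidate_errors = count_misclassified(X, y, candidate_w)
        # Keep the candidate only if it is strictly better than the pocket.
        if candidate_errors < pocket_errors:
            pocket_w, pocket_errors = candidate_w, candidate_errors
    return pocket_w, pocket_errors

# Same data as the earlier example.
X = np.array([[8, 1, 7], [4, -3, 9], [-5, -2, 6]])
y = np.array([1, 1, 1])
best_w, errors = pocket_algorithm(X, y, np.array([4, 5, 3]))
print(best_w, errors)
```

On this small example a single update is enough, so the function returns the same weights as the plain PLA with zero misclassified examples.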

# Understanding the limitations of the PLA 

One thing to understand about the PLA is that because weights are randomly initialized and misclassified examples are randomly chosen, the algorithm may return a different hyperplane each time we run it.
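A quick way to observe this is to run the algorithm several times with different random seeds. The sketch below is self-contained and uses a made-up, linearly separable toy dataset (first column is a constant 1 for the bias); the function name `run_pla` and the `seed` parameter are assumptions for illustration. The returned weights may differ between seeds, yet each one separates the data.

```python
import numpy as np

def run_pla(X, y, seed):
    """Plain PLA with random initial weights and random example selection."""
    rng = np.random.default_rng(seed)
    w = rng.normal(size=X.shape[1])  # random initial weights
    while True:
        wrong = np.flatnonzero(np.sign(X @ w) != y)
        if wrong.size == 0:          # everything classified correctly
            return w
        i = rng.choice(wrong)        # random misclassified example
        w = w + X[i] * y[i]          # update rule

# Made-up, linearly separable toy data (bias column first).
X = np.array([[1, 2, 2], [1, 3, 3], [1, 2, 3],
              [1, 0, 0], [1, 1, 0], [1, 0, 1]], dtype=float)
y = np.array([1, 1, 1, -1, -1, -1])

for seed in range(3):
    w = run_pla(X, y, seed)
    print(seed, w, bool(np.all(np.sign(X @ w) == y)))
```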

To train a model, we pick a sample of existing data and call it the training set. We train the model, and it comes up with a hypothesis (a hyperplane in our case). We can measure how well the hypothesis performs on the training set: we call this the in-sample error (also called training error). Once we are satisfied with the hypothesis, we decide to use it on unseen data (the test set) to see if it indeed learned something. We measure how well the hypothesis performs on the test set, and we call this the out-of-sample error (also called the generalization error). 
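The two errors can be computed with the same function applied to different sets. The following is a minimal sketch with made-up data and hypothetical weights (the helper `error_rate` and all values are assumptions for illustration); it measures the error as the fraction of misclassified examples.

```python
import numpy as np

def error_rate(X, y, w):
    """Fraction of examples in X whose predicted sign differs from y."""
    return float(np.mean(np.sign(X @ w) != y))

# Hypothetical hypothesis: bias first, then one weight per feature.
w = np.array([-2.5, 1.0, 1.0])

# Made-up training set (first column is the constant 1 for the bias).
X_train = np.array([[1, 2, 2], [1, 3, 3], [1, 0, 0], [1, 1, 0]], dtype=float)
y_train = np.array([1, 1, -1, -1])

# Made-up test set: data the hypothesis was not trained on.
X_test = np.array([[1, 3, 2], [1, 0.5, 0.5], [1, 1, 2]])
y_test = np.array([1, -1, -1])

in_sample_error = error_rate(X_train, y_train, w)
out_of_sample_error = error_rate(X_test, y_test, w)
print(in_sample_error, out_of_sample_error)
```

With these values the hypothesis classifies every training example correctly but gets one of the three test examples wrong, so a zero in-sample error does not imply a zero out-of-sample error.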
 
Our goal is to have the smallest out-of-sample error. 

When using the Perceptron with a linearly separable dataset, we have the guarantee of finding a hypothesis with zero in-sample error, but we have no guarantee about how well it will generalize to unseen data (if an algorithm generalizes well, its out-of-sample error will be close to its in-sample error).  