* This is like the **Hello World** algorithm of the machine learning universe.  
* It may be one of the easiest of the lot to understand and use but it is by no means any less powerful.  
* Published in 1958 by Frank Rosenblatt, the **perceptron algorithm** gained much attention because of its guarantee to **find a separator in a separable data set.**  
* A perceptron is a function (or a simplified neuron to be precise) which takes a vector of real numbers as input and generates a real number as output.  
* Mathematically, a perceptron can be represented as:  
$$ y=f(w_{1}x_{1} + w_{2}x_{2}+ \cdots + w_{n}x_{n}+b)=f(w^{T}x+b)$$  
where $w_{1}, \cdots, w_{n}$ are the weights, $b$ is a constant called the bias, $x_{1},\cdots, x_{n}$ are the inputs, and $y$ is the output of the function $f$, which is called the **activation function**.  
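
As a minimal sketch of the formula above, the following R snippet evaluates a perceptron on a single input. The weights, bias, and input values are made up for illustration, and a step function returning -1 or +1 is assumed for $f$, since the text has not fixed a particular activation at this point. The names `step_activation`, `perceptron_output`, and the `example_*` variables are hypothetical.

```r
# Assumed activation: step function returning -1 or +1
step_activation = function(z) {
    ifelse(z < 0, -1, 1)
}

# Perceptron output y = f(w^T x + b)
perceptron_output = function(x, w, b) {
    step_activation(sum(w * x) + b)
}

# Illustrative (made-up) values
example_w = c(0.5, -0.25)
example_b = 0.1
example_x = c(1, 2)

# w^T x + b = 0.5*1 - 0.25*2 + 0.1 = 0.1, which is not negative, so y is +1
example_y = perceptron_output(example_x, example_w, example_b)
print(example_y)
```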

The algorithm is as follows:

1. Initialize weight vector $w$ and bias $b$ to **small random numbers**.
2. Calculate the output vector $y$ based on the function $f$ and vector $x$.
3. Update the weight vector $w$ and bias $b$ to counter the error.
4. Repeat steps 2 and 3 until there is no error or the error drops below a certain threshold.
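
Step 3 is stated abstractly. One common way to write the update, assuming labels $t \in \{-1, +1\}$ and a learning rate $\eta > 0$ (both symbols are introduced here for illustration and are not part of the original notation), is:

$$ w \leftarrow w + \eta\,(t - y)\,x, \qquad b \leftarrow b + \eta\,(t - y) $$

When the prediction is correct, $t - y = 0$ and nothing changes. When it is wrong, $t - y = \pm 2$, so $w$ moves towards $x$ if the true label is +1 and away from it if the true label is -1.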
    
The algorithm tries to find a **separator** that divides the input into two classes, using a **labeled data set** called the **training data set** (this corresponds to the experience E in the definition of machine learning given in the previous section). It starts by assigning **small random values** to the weight vector $w$ and the bias $b$. It then passes each input through the function $f$ to produce the output vector $y$. Each generated output is compared with the correct value from the training data set, and $w$ and $b$ are updated whenever the two disagree.

To understand the weight update, consider a point $p_1$ whose correct output is +1. If the perceptron misclassifies $p_1$ as -1, it updates $w$ and $b$ to shift the separator by a small amount in the direction of $p_1$ so that $p_1$ ends up classified correctly; the size of the shift is limited by the learning rate to prevent sudden jumps. The algorithm stops when it finds a correct separator or when the classification error drops below a user-defined threshold.
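
The procedure described above can be sketched in R as follows. This is an illustration under stated assumptions rather than a definitive implementation: the function name `train_perceptron`, its arguments, the default learning rate of 0.1, the cap of 100 passes, and the hand-made toy data are all hypothetical choices, and the update used is the common $\eta\,(t - y)\,x$ form.

```r
# Sketch of the training loop; function and argument names are illustrative
train_perceptron = function(x, target, learning_rate = 0.1, max_epochs = 100) {
    w = runif(ncol(x), -0.01, 0.01)    # step 1: small random weights
    b = runif(1, -0.01, 0.01)          # step 1: small random bias
    errors = NA
    for (epoch in seq_len(max_epochs)) {
        errors = 0
        for (i in seq_len(nrow(x))) {
            # step 2: classify the i-th point with the current w and b
            y = if (sum(x[i, ] * w) + b < 0) -1 else 1
            if (y != target[i]) {
                # step 3: shift the separator towards the misclassified point
                w = w + learning_rate * (target[i] - y) * x[i, ]
                b = b + learning_rate * (target[i] - y)
                errors = errors + 1
            }
        }
        # step 4: stop once a full pass makes no mistakes
        if (errors == 0) break
    }
    list(w = w, b = b, errors = errors)
}

# Hand-made, clearly separable toy data (not the data used in this notebook)
toy_x = rbind(c(2, 1), c(1, 2), c(2, 2), c(-1, -2), c(-2, -1), c(-2, -2))
toy_target = c(1, 1, 1, -1, -1, -1)

fit = train_perceptron(toy_x, toy_target)
toy_scores = as.vector(toy_x %*% fit$w + fit$b)
toy_predictions = ifelse(toy_scores < 0, -1, 1)
print(all(toy_predictions == toy_target))
```

The toy points are chosen with a wide gap between the two classes so that the loop settles after very few updates. Data whose classes lie close together may need many more passes, which is why the sketch caps the number of epochs.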

> Generate 30 uniformly distributed random numbers between -1 and 1 for each of the two input features.

In [13]:
require(dplyr)

Loading required package: dplyr

Attaching package: ‘dplyr’

The following objects are masked from ‘package:stats’:

    filter, lag

The following objects are masked from ‘package:base’:

    intersect, setdiff, setequal, union



In [10]:
x1 = runif(30,-1,1) 
x2 = runif(30,-1,1) 

cat("x1:", x1, "\n\n")
cat("x2:", x2)

x1: -0.8867633 -0.4643468 -0.5607069 0.6440765 0.1876894 0.8524653 0.6543156 0.506631 -0.2453494 0.06905973 -0.5979175 0.5121985 -0.9524409 0.3144503 -0.4609409 0.9030271 -0.6397215 -0.8673726 0.7122474 -0.9353694 -0.9228383 -0.5022577 -0.03909001 0.8509461 -0.6569812 0.8400759 0.9756106 -0.8487911 -0.9956759 -0.1155026 

x2: -0.6283758 0.6614552 0.646591 -0.5244755 0.7338637 0.4299543 0.3861638 -0.8195267 0.3780531 -0.1928903 -0.3850876 0.5614653 0.250705 -0.09240213 -0.3902131 -0.325489 -0.23729 -0.1803061 0.6168502 -0.4926058 0.2699185 -0.2320396 -0.08200799 0.928431 -0.9271721 -0.7904208 -0.7331873 -0.2208686 0.5918803 -0.8736354

> Form the input matrix `x` by binding the two vectors together as columns.

In [14]:
x = cbind(x1,x2)
x %>% head()

| x1 | x2 |
|---|---|
| -0.8867633 | -0.6283758 |
| -0.4643468 | 0.6614552 |
| -0.5607069 | 0.646591 |
| 0.6440765 | -0.5244755 |
| 0.1876894 | 0.7338637 |
| 0.8524653 | 0.4299543 |


> Now that we have the data, we need a function to classify it into one of the two categories.

In [15]:
# helper function: signed score w.x + b of a point relative to the hyperplane
# (proportional to the signed distance; equal to it only when ||w|| = 1)
calculate_distance = function(x,w,b) { 
    sum(x*w) + b 
}

> Linear classifier: label a point -1 if it falls on the negative side of the hyperplane and +1 otherwise.

In [None]:
linear_classifier = function(x, w, b) { 
    # signed score for every row (point) of x
    distances = apply(x, 1, calculate_distance, w, b) 
    # negative score -> -1; zero or positive score -> +1
    return(ifelse(distances < 0, -1, +1)) 
}
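
To see the classifier in action, here is a small usage sketch. It repeats the two definitions above so that it runs on its own. The separator (`w_example`, `b_example`) and the three points are hypothetical values picked by hand for illustration, not weights learned from the data.

```r
# Definitions repeated from the cells above so the snippet runs on its own
calculate_distance = function(x, w, b) {
    sum(x * w) + b
}
linear_classifier = function(x, w, b) {
    distances = apply(x, 1, calculate_distance, w, b)
    return(ifelse(distances < 0, -1, +1))
}

# Hypothetical separator x1 + x2 = 0 (values chosen only for illustration)
w_example = c(1, 1)
b_example = 0

# Three hand-picked points, one per row
points_example = rbind(c(0.5, 0.5), c(-0.5, -0.2), c(0.3, -0.3))

labels_example = linear_classifier(points_example, w_example, b_example)
print(labels_example)
```

The first point is labelled +1 and the second -1. The third point lies exactly on the line $x_1 + x_2 = 0$ and is labelled +1, because the classifier only assigns -1 when the score is strictly negative.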