# DATA 4319: Statistical & Machine Learning 

## The Perceptron Learning Model (Classical Version) with Python
In this notebook we implement the perceptron learning model to classify data from the [iris data set](https://en.wikipedia.org/wiki/Iris_flower_data_set). Our task is to predict the species of a flower from measurements of its sepal length and sepal width; the code below separates setosa from the other species. This task is often referred to as the "Hello World" of machine learning.

You will need to import the following packages:
 * numpy [documentation](http://www.numpy.org)
 * pandas [documentation](https://pandas.pydata.org)
 * matplotlib [documentation](https://matplotlib.org)

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Read the dataset with pd.read_csv(path); the result is a pandas DataFrame
df = pd.read_csv('iris_data.csv')
print(df)

     SepalLength  SepalWidth  PetalLength  PetalWidth    Species
0            5.1         3.5          1.4         0.2     setosa
1            4.9         3.0          1.4         0.2     setosa
2            4.7         3.2          1.3         0.2     setosa
3            4.6         3.1          1.5         0.2     setosa
4            5.0         3.6          1.4         0.2     setosa
5            5.4         3.9          1.7         0.4     setosa
6            4.6         3.4          1.4         0.3     setosa
7            5.0         3.4          1.5         0.2     setosa
8            4.4         2.9          1.4         0.2     setosa
9            4.9         3.1          1.5         0.1     setosa
10           5.4         3.7          1.5         0.2     setosa
11           4.8         3.4          1.6         0.2     setosa
12           4.8         3.0          1.4         0.1     setosa
13           4.3         3.0          1.1         0.1     setosa
14           5.8         

In [2]:
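# Keep sepal length and sepal width for the first 100 rows, prepending a constant 1.0
# as the bias term. This assumes the usual iris row order: rows 0-49 are setosa and
# rows 50-99 are versicolor (row 100 would already be virginica).
X = [np.array([1.0, df['SepalLength'][i], df['SepalWidth'][i]]) for i in range(100)]

# Convert the species label to a numeric value: +1 for setosa, -1 otherwise
make_int = lambda label: 1 if label == 'setosa' else -1
Y = [make_int(df['Species'][i]) for i in range(100)]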

In [8]:
# Set perceptron hypothesis: h(x) = sign(w^T*x)
def h(weight_vector, data_vector):
    """ Perceptron hypothesis: the sign of the inner product w^T x.
    
    Keyword arguments:
        weight_vector -- real valued numpy array
        data_vector   -- real valued numpy array
        
    Output:
        1 if weight_vector @ data_vector > 0, otherwise -1
    
    """
    if weight_vector @ data_vector > 0:
        return 1
    else:
        return -1
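
As a quick sanity check of the hypothesis above, the following sketch evaluates the sign rule on two hand-picked vectors. The name `sign_hypothesis`, the weights, and the data points are illustrative assumptions, not part of the notebook; the leading 1.0 in each data vector plays the role of the bias term.

```python
import numpy as np

def sign_hypothesis(weight_vector, data_vector):
    """Return 1 if w^T x > 0, otherwise -1 (the same rule as h above)."""
    return 1 if weight_vector @ data_vector > 0 else -1

# Hypothetical weights: a bias weight followed by one weight per feature.
w_example = np.array([0.5, -1.0, 2.0])

x_pos = np.array([1.0, 1.0, 1.0])   # 0.5 - 1.0 + 2.0 =  1.5, which is > 0
x_neg = np.array([1.0, 3.0, 1.0])   # 0.5 - 3.0 + 2.0 = -0.5, which is not > 0

print(sign_hypothesis(w_example, x_pos))  # 1
print(sign_hypothesis(w_example, x_neg))  # -1
```

Note that an inner product of exactly zero is mapped to -1, because the comparison is strict.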

In [4]:
def PLA(input_data, input_labels, iterations = 1000):
    """ Perceptron Learning Algorithm.
    
    Keyword arguments:
        input_data   -- list of real valued data points stored as numpy arrays
        input_labels -- list of elements from {1, -1} 
        iterations   -- number of iterations of the perceptron update rule (default 1000)
        
    Output:
        weights      -- three dimensional weight vector stored as a numpy array 
    
    """
    
    # Start from random weights in [0, 1)
    weights = np.random.rand(3)
    number_of_data_entries = len(input_labels)
    
    for _ in range(iterations):
        # Pick a random data point; update the weights only if it is misclassified
        i = np.random.randint(number_of_data_entries)
        if h(weights, input_data[i]) != input_labels[i]:
            weights += input_labels[i]*input_data[i]
    
    return weights
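
To make the update rule inside the loop concrete, here is a minimal sketch of a single update on one misclassified point. All values (`w_step`, `x_step`, `y_step`) are made up for illustration. The update adds `y * x` to the weights, which increases `y * (w @ x)` by the squared norm of `x`; in this example one step is enough to fix the point, though in general several updates may be needed.

```python
import numpy as np

w_step = np.zeros(3)                 # hypothetical starting weights
x_step = np.array([1.0, 2.0, 1.0])   # bias term 1.0 plus two features
y_step = 1                           # true label of this point

# With zero weights, w^T x = 0, which the sign rule maps to -1: a mistake.
before = 1 if w_step @ x_step > 0 else -1

# Perceptron update: move the weights toward the misclassified point.
w_step = w_step + y_step * x_step

# Now w^T x = 1 + 4 + 1 = 6 > 0, so the point is classified correctly.
after = 1 if w_step @ x_step > 0 else -1

print(before, after)  # -1 1
```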

In [5]:
# Iterate the perceptron learning algorithm 1000 times 
w = PLA(X, Y, 1000)
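
One way to check how well the learned weights fit the data is to compute the fraction of training points they classify correctly. The sketch below is an assumption about how one might do this, not part of the original notebook; `training_accuracy` and the toy data are hypothetical names and values chosen for illustration.

```python
import numpy as np

def training_accuracy(weights, data, labels):
    """Fraction of points whose predicted sign matches the label."""
    predictions = [1 if weights @ x > 0 else -1 for x in data]
    return float(np.mean([p == y for p, y in zip(predictions, labels)]))

# Hypothetical toy data: bias term 1.0 followed by two features.
toy_X = [np.array([1.0, 2.0, 0.0]), np.array([1.0, 0.0, 2.0]),
         np.array([1.0, 3.0, 1.0]), np.array([1.0, 1.0, 3.0])]
toy_Y = [1, -1, 1, 1]
toy_w = np.array([0.0, 1.0, -1.0])   # predicts 1 when feature 1 > feature 2

print(training_accuracy(toy_w, toy_X, toy_Y))  # 0.75
```

In the notebook, the analogous call would be `training_accuracy(w, X, Y)`. Because `PLA` runs a fixed number of randomly chosen updates, the result is not guaranteed to be 1.0 on every run.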

In [6]:
def predict(w, i):
    """ Return the predicted species name for the i-th data point in X. """
    if h(w, X[i]) == 1:
        return 'Setosa'
    else:
        return 'Versicolor'

In [7]:
predict(w, 55)

'Versicolor'
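
The learned weights define a straight line in the sepal length and sepal width plane: the set of points where `w[0] + w[1]*length + w[2]*width = 0`. Points on one side are predicted as setosa, points on the other as versicolor. The sketch below solves that equation for the width; the helper `boundary_sepal_width` and the weights `w_demo` are hypothetical and chosen only so the arithmetic is easy to follow.

```python
import numpy as np

def boundary_sepal_width(weights, sepal_length):
    """Sepal width on the line w0 + w1*length + w2*width = 0 (needs w2 != 0)."""
    w0, w1, w2 = weights
    if w2 == 0:
        raise ValueError("boundary is vertical; cannot solve for width")
    return -(w0 + w1 * sepal_length) / w2

# Hypothetical weights, not the output of PLA.
w_demo = np.array([-1.0, 2.0, -4.0])
lengths = np.array([4.0, 5.0, 6.0])
widths = boundary_sepal_width(w_demo, lengths)

print(widths)  # [1.75 2.25 2.75]
```

With matplotlib imported as `plt`, passing such values to `plt.plot(lengths, widths)` on top of a scatter plot of the data is one way to visualize the classifier.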