# DATA 4319: Statistical & Machine Learning 

## Lecture 1: The Perceptron Learning Model (Classical Version)
In this notebook we will implement the perceptron learning model to classify data from the [iris data set](https://en.wikipedia.org/wiki/Iris_flower_data_set). Our task is to predict the species of a flower (setosa or versicolor) from measurements of its sepal length and width. This task is often referred to as the "Hello World" of machine learning.

You will need to add the following packages:
 * CSV [documentation](https://juliadata.github.io/CSV.jl/stable/)
 * DataFrames [documentation](https://dataframes.juliadata.org/stable/)
 * Plots [documentation](http://docs.juliaplots.org/latest/)
 

In [11]:

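using CSV, DataFrames

# Provided you have a saved and valid .csv file in your current working directory,
# you may load it as a DataFrame using the following syntax.
# (Recent CSV.jl versions require the sink type, here DataFrame, as the second argument.)
iris = CSV.read("iris_data.csv", DataFrame)

# Keep the first 100 rows (setosa and versicolor) and the first five columns
iris = iris[1:100, 1:5]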

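| Row | SepalLength | SepalWidth | PetalLength | PetalWidth | Species |
|-----|-------------|------------|-------------|------------|---------|
|     | Float64     | Float64    | Float64     | Float64    | String  |
| 1   | 5.1         | 3.5        | 1.4         | 0.2        | setosa  |
| 2   | 4.9         | 3.0        | 1.4         | 0.2        | setosa  |
| 3   | 4.7         | 3.2        | 1.3         | 0.2        | setosa  |
| 4   | 4.6         | 3.1        | 1.5         | 0.2        | setosa  |
| 5   | 5.0         | 3.6        | 1.4         | 0.2        | setosa  |
| 6   | 5.4         | 3.9        | 1.7         | 0.4        | setosa  |
| 7   | 4.6         | 3.4        | 1.4         | 0.3        | setosa  |
| 8   | 5.0         | 3.4        | 1.5         | 0.2        | setosa  |
| 9   | 4.4         | 2.9        | 1.4         | 0.2        | setosa  |
| 10  | 4.9         | 3.1        | 1.5         | 0.1        | setosa  |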


In [12]:
# We will only use the sepal length and width for our analysis.
# Columns are selected with iris[:, k]; the older iris[k] form is deprecated.
data = [x for x in zip(iris[:, 1], iris[:, 2], iris[:, 5])]


100-element Array{Tuple{Float64,Float64,String},1}:
 (5.1, 3.5, "setosa")    
 (4.9, 3.0, "setosa")    
 (4.7, 3.2, "setosa")    
 (4.6, 3.1, "setosa")    
 (5.0, 3.6, "setosa")    
 (5.4, 3.9, "setosa")    
 (4.6, 3.4, "setosa")    
 (5.0, 3.4, "setosa")    
 (4.4, 2.9, "setosa")    
 (4.9, 3.1, "setosa")    
 (5.4, 3.7, "setosa")    
 (4.8, 3.4, "setosa")    
 (4.8, 3.0, "setosa")    
 ⋮                       
 (5.6, 3.0, "versicolor")
 (5.5, 2.5, "versicolor")
 (5.5, 2.6, "versicolor")
 (6.1, 3.0, "versicolor")
 (5.8, 2.6, "versicolor")
 (5.0, 2.3, "versicolor")
 (5.6, 2.7, "versicolor")
 (5.7, 3.0, "versicolor")
 (5.7, 2.9, "versicolor")
 (6.2, 2.9, "versicolor")
 (5.1, 2.5, "versicolor")
 (5.7, 2.8, "versicolor")

In [13]:
using Plots
scatter([x[1:2] for x in data if x[3] == "setosa"], label = "setosa")
scatter!([x[1:2] for x in data if x[3] != "setosa"], label = "versicolor")
plot!(title = "Iris 2-D Data", xlabel = "Sepal Length", ylabel = "Sepal Width")



In [17]:
# Our data set D consists of two vectors of information. 
# Assign X: input data
# Assign Y: known values 
X, Y = [[x[1], x[2]] for x in data], [x[3] == "setosa" ? 1 : -1 for x in data]
Y

100-element Array{Int64,1}:
  1
  1
  1
  1
  1
  1
  1
  1
  1
  1
  1
  1
  1
  ⋮
 -1
 -1
 -1
 -1
 -1
 -1
 -1
 -1
 -1
 -1
 -1
 -1

In [5]:
# Assign random weights
w = rand(3)

# Perceptron Hypothesis Function 
function h(w, x)
    x_new = [1.0, x[1], x[2]]
    return w'x_new > 0 ? 1 : -1
end

h (generic function with 1 method)
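
The hypothesis function above can be written compactly in symbols. This is only a restatement of the code, using the notation $\tilde{x}$ (an assumption of this note, matching `x_new`) for the input with a leading 1 prepended, so that `w[1]` plays the role of a bias term:

$$
\tilde{x} = \begin{pmatrix} 1 \\ x_1 \\ x_2 \end{pmatrix}, \qquad
w^{T}\tilde{x} = w_1 + w_2 x_1 + w_3 x_2, \qquad
h_w(x) = \begin{cases} 1 & \text{if } w^{T}\tilde{x} > 0 \\ -1 & \text{otherwise} \end{cases}
$$

Note that a score of exactly zero is assigned to the class $-1$, so `h` differs from the usual sign function at that one value.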

In [6]:
# Perceptron Learning Algorithm 
function PLA(w, x, y)
    if h(w, x) != y
        w += y*[1.0, x[1], x[2]]
    end
    return w
end

PLA (generic function with 1 method)
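
To see why this update is reasonable, write $\tilde{x} = (1, x_1, x_2)^{T}$ for the augmented input (the notation is an assumption of this note). The rule in `PLA` is

$$
w \leftarrow w + y\,\tilde{x} \qquad \text{whenever } h(w, x) \neq y .
$$

Since $y \in \{1, -1\}$ gives $y^2 = 1$, the signed score of the misclassified point after the update is

$$
y\,(w + y\,\tilde{x})^{T}\tilde{x} \;=\; y\,w^{T}\tilde{x} + y^{2}\,\lVert \tilde{x} \rVert^{2} \;=\; y\,w^{T}\tilde{x} + \lVert \tilde{x} \rVert^{2} .
$$

Because $\lVert \tilde{x} \rVert^{2} \geq 1$, each update pushes $y\,w^{T}\tilde{x}$ toward the positive (correct) side. A single update does not guarantee that the point is now classified correctly, which is why the rule is applied repeatedly.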

In [27]:
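# Iterate the PLA
for i = 1:3000
    # Pick a random training example; PLA changes w only if it is misclassified
    j = rand(1:100)
    # `global` makes the assignment update the top-level w outside the notebook too
    global w = PLA(w, X[j], Y[j])
end
w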


3-element Array{Float64,1}:
  49.05088182201718
 -52.08858484437051
  71.59829861728468
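
The loop above runs a fixed number of random draws and never checks whether any point is still misclassified. As an alternative, here is a minimal sketch that sweeps over the data until a full pass makes no update. It is not the notebook's method: the names `X_demo`, `Y_demo`, `h_demo`, `pla_demo` and `train_until_clean` are hypothetical, the ten rows are copied from the outputs shown earlier, and the cap `max_passes` is an arbitrary safety limit.

```julia
# Toy sample: ten rows copied from the outputs shown earlier (sepal length, sepal width)
X_demo = [[5.1, 3.5], [4.9, 3.0], [4.7, 3.2], [4.6, 3.1], [5.0, 3.6],   # setosa
          [5.6, 3.0], [5.5, 2.5], [6.1, 3.0], [5.0, 2.3], [5.7, 2.8]]   # versicolor
Y_demo = [1, 1, 1, 1, 1, -1, -1, -1, -1, -1]

# Same hypothesis and update rule as in the notebook, under hypothetical names
h_demo(w, x) = w' * [1.0, x[1], x[2]] > 0 ? 1 : -1
pla_demo(w, x, y) = h_demo(w, x) != y ? w + y * [1.0, x[1], x[2]] : w

# Sweep over the sample until a full pass makes no update (or the cap is hit)
function train_until_clean(X, Y; max_passes = 100_000)
    w = zeros(3)
    for pass in 1:max_passes
        updated = false
        for (x, y) in zip(X, Y)
            w_new = pla_demo(w, x, y)
            updated |= (w_new != w)   # an update always changes the bias entry
            w = w_new
        end
        updated || return (w, pass)   # clean pass: every point is classified correctly
    end
    return (w, max_passes)
end

w_demo, passes = train_until_clean(X_demo, Y_demo)

# Count the points in the sample that are still misclassified
n_errors = count(h_demo(w_demo, x) != y for (x, y) in zip(X_demo, Y_demo))
```

For linearly separable data the perceptron makes only finitely many updates, so a clean pass is eventually reached. On data that is not separable the loop would simply stop at the cap, so `n_errors` should always be checked.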

In [24]:
# Create a user friendly function that predicts the species of a given flower
function predictor(n)
    return h(w, X[n]) == 1 ? "setosa" : "versicolor"
end
X[60]

2-element Array{Float64,1}:
 5.2
 2.7

In [21]:
# Test your predictions!
predictor(60)

"versicolor"

In [22]:
g(x) = w[2]

g (generic function with 1 method)
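
The learned weights also describe a decision boundary: the line on which the score `w[1] + w[2]*x1 + w[3]*x2` equals zero. The sketch below solves that equation for the sepal width `x2`. The names `w_learned` and `boundary` are hypothetical, and the weights are copied from the training output shown earlier, so a different training run would give a different line.

```julia
# Weights copied from the training output shown earlier: [bias, sepal length, sepal width]
w_learned = [49.05088182201718, -52.08858484437051, 71.59829861728468]

# On the boundary, w[1] + w[2]*x1 + w[3]*x2 = 0; solve for x2 (requires w[3] != 0).
# `boundary` is a hypothetical name, not defined in the original notebook.
boundary(x1) = -(w_learned[1] + w_learned[2] * x1) / w_learned[3]

# Flower 60 has sepal length 5.2 and sepal width 2.7
x2_on_line = boundary(5.2)      # roughly 3.10
is_below = 2.7 < x2_on_line     # true: the point lies below the line
```

Because `w_learned[3]` is positive, points below the line have a negative score, which agrees with the prediction `"versicolor"` for flower 60. Passing `boundary` to `plot!` on top of the scatter plot would draw the separating line.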