<img src="images/kiksmeisedwengougent.png" alt="Banner" width="1100"/>

<div>
    <font color=#690027 markdown="1">   
<h1>CLASSIFICATION OF THE IRIS DATASET</h1>    </font>
</div>

<div class="alert alert-box alert-success">
In this notebook you will see how a <em>machine learning</em> system manages to <b>linearly separate</b> two classes of points. The <b>Perceptron algorithm</b> starts from a randomly chosen straight line. The algorithm adjusts the coefficients in the equation of the line step by step, based on labeled data, until eventually a straight line is obtained that separates the two classes from each other.</div>

The Iris dataset was published in 1936 by the British statistician Ronald Fisher in 'The Use of Multiple Measurements in Taxonomic Problems' [1][2].<br>The dataset pertains to **three types of irises** (*Iris setosa*, *Iris virginica* and *Iris versicolor*).
Fisher could distinguish the species from each other based on **four characteristics**: the length and width of the sepals and the petals.

<table><tr>
<td><img src="images/irissetosa.jpg" alt="Drawing" width="200"/></td>
<td><img src="images/irisversicolor.jpg" alt="Drawing" width="220"/></td>
<td><img src="images/irisvirginica.jpg" alt="Drawing" width="203"/></td>
</tr></table>

<table><tr>
<td><em>Iris setosa</em> [3]</td><td> &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</td><td><em>Iris versicolor</em> [4]</td><td>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</td><td><em>Iris virginica</em> [5]</td></tr></table>
<br>
<center>Figure 1: <em>Iris setosa</em> by Radomil Binek. <a href="https://creativecommons.org/licenses/by-sa/3.0">CC BY-SA 3.0</a>, via Wikimedia Commons;<br> <em>Iris versicolor</em>. No machine-readable author provided. Dlanglois assumed (based on copyright claims). CC BY-SA 3.0, via Wikimedia Commons;<br> <em>Iris virginica</em> by Frank Mayfield. <a href="https://creativecommons.org/licenses/by-sa/2.0">CC BY-SA 2.0</a>, via Wikimedia Commons.</center>

The Iris dataset is a *multivariate dataset*, i.e. a dataset with multiple variables, containing 50 samples from each species. For each sample, the length and the width of a petal and of a sepal were measured in centimeters.

<img src="images/kelkbladkroonblad.jpg" alt="Drawing" width="400"/> <br>
<center>Figure 2: Sepal and petal.</center>

### Import the necessary modules

In [None]:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

from matplotlib import animation   # for animation
from IPython.display import HTML   # to show animation in notebook

<div>
    <font color=#690027 markdown="1"> 
<h2>1. Reading in the data</h2>    </font>
</div>

Read the Iris dataset using the `pandas` module.

In [None]:
# read in dataset
# table to be read has a heading
iris = pd.read_csv("data/iris.csv", header="infer")

<div>
    <font color=#690027 markdown="1"> 
<h2>2. Displaying the read data</h2>    </font>
</div>

Look at the data. Both the four characteristics and the name of the species are displayed. The number of samples is easy to read.

### Assignment 2.1
How many **variables** does this *multivariate dataset* have?

Answer: the dataset has ... variables.

In [None]:
# display dataset in table
iris

This table corresponds to a matrix with 150 rows and 5 columns: <br>150 samples, 4 features (x1, x2, x3, x4) and 1 label (y) <br><br>The features:<br>
- first column: sepal length
- second column: sepal width
- third column: petal length
- fourth column: petal width<br><br>
The label:<br>
- last column: the name of the species

<div class="alert alert-box alert-info">
For the machine learning system, the <em>features</em> will serve as <b>input</b> and the <em>labels</em> as <b>output</b>.</div>

It is possible to only show the beginning or only the last part of the table.

In [None]:
# first part of the table
iris.head()

In [None]:
# last part of the table
iris.tail()

It is also possible to display a certain part of the table.

In [None]:
# show table from row 46 to row 53
iris[46:54]

Note that <span style="background-color:whitesmoke; font-family:consolas; font-size:1em;">[46:54]</span> stands for the *half-open interval* [46, 54[: row 46 is included, row 54 is not.
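The following minimal sketch illustrates this half-open slicing on a small made-up table. The name `demo` and its values are assumptions for illustration only; they are not part of the Iris data.

```python
import pandas as pd

# Hypothetical mini table with 10 rows (index 0 to 9), standing in for the Iris table
demo = pd.DataFrame({"value": range(100, 110)})

part = demo[4:7]            # rows 4, 5 and 6; row 7 is NOT included
print(part)
print(len(part))            # number of rows = 7 - 4
```

The number of selected rows is always the difference between the two bounds, which is why `iris[46:54]` gives 8 samples.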

### In this notebook you will work with this last subtable.

<div>
    <font color=#690027 markdown="1"> 
<h2>3. Research: Can two types of irises be distinguished based on two characteristics?</h2>    </font>
</div>

<div>
    <font color=#690027 markdown="1"> 
<h3>3.1 Consider four samples from each of two types of irises, <em>Iris setosa</em> and <em>Iris versicolor</em></h3>    </font>
</div>

<table><tr>
<td><img src="images/irissetosa.jpg" alt="Drawing" width="200"/></td>
<td><img src="images/irisversicolor.jpg" alt="Drawing" width="300"/></td>
</tr></table>

<table><tr>
<td> Figure 3: <em>Iris setosa</em></td><td> &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</td><td> <em>Iris versicolor</em> </td></tr></table>

The *subtable* contains four samples of each species. <br>Each of the first four columns contains a feature; the last column contains the label.

<div class="alert alert-box alert-info">
For the machine learning system, these features are called $x_{i}$ and the label $y$.<br></div>

<div class="alert alert-box alert-danger">
If you prefer meaningful variable names to the notation used in machine learning, that is possible. For example, use <code>lengte_kelkblad</code> (sepal length) instead of $x_{1}$.</div>

In [None]:
x1 = iris["lengte kelkblad"]          # feature: sepal length
x2 = iris["breedte kelkblad"]         # feature: width of sepal
x3 = iris["lengte kroonblad"]         # feature: petal length
x4 = iris["breedte kroonblad"]        # feature: petal width

y = iris["Iris type"]                 # label: type

In [None]:
print(x1)
print(y)

<div>
    <font color=#690027 markdown="1"> 
<h3>3.2 Preparing the data</h3>    </font>
</div>

In [None]:
# convert to NumPy array
x1 = np.array(x1)
x2 = np.array(x2)
x3 = np.array(x3)
x4 = np.array(x4)

You only have to work with two features: the sepal length and the petal length.<br>And you only need the 8 samples from the subtable.

In [None]:
# choose sepal length and petal length, these are in the first and third column
# select four samples of setosa and four samples of versicolor
x1 = x1[46:54]
x3 = x3[46:54]
y = y[46:54]

<div>
    <font color=#690027 markdown="1"> 
<h3>3.3 Standardizing the data</h3>    </font>
</div>

To standardize, the features are replaced by their Z-scores.

<div class="alert alert-box alert-warning">
For more explanation on the importance of standardization, we refer to the notebook 'Standardization'.</div>

In [None]:
x1 = (x1-np.mean(x1))/np.std(x1)
x3 = (x3-np.mean(x3))/np.std(x3)

In [None]:
print(x1)
print(x3)
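As a quick check on the formula used above, the sketch below standardizes a small made-up array and verifies that the result has mean 0 and standard deviation 1. The values in `values` are hypothetical and are not taken from the Iris dataset.

```python
import numpy as np

# Hypothetical measurements (not Iris data), only to illustrate the Z-score
values = np.array([1.0, 2.0, 3.0, 4.0, 10.0])

# Z-score: subtract the mean, then divide by the standard deviation
z = (values - np.mean(values)) / np.std(values)

print(z)
print("mean:", np.mean(z))               # very close to 0
print("standard deviation:", np.std(z))  # very close to 1
```

After standardization the values no longer have a unit: each value tells you how many standard deviations the measurement lies from the mean.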

<div>
    <font color=#690027 markdown="1">     
<h3>3.4 Displaying the data in scatter plot</h3>    </font>
</div>

In [None]:
# petal length vs. sepal length
# sepal length on x-axis, petal length comes on y-axis
plt.scatter(x1, x3, color="black", marker="o")
plt.title("Iris")
plt.xlabel("sepal length (cm)")          # xlabel provides a description on the x1-axis
plt.ylabel("petal length (cm)")         # ylabel provides a description on the x3-axis
plt.show()

Two groups can be distinguished. Moreover, these groups are **linearly separable**: they can be separated by a straight line. <br>It is not clear from the graph which data point corresponds to which type of iris, since all points are drawn in the same way.

<div>
    <font color=#690027 markdown="1"> 
<h3>3.5 Display data in scatter plot as two classes</h3>    </font>
</div>

The representation of the point cloud is adjusted so that the two iris species are each represented by a different symbol.

In [None]:
# petal length relative to sepal length
plt.scatter(x1[:4], x3[:4], color="green", marker="o", label="setosa")      # first 4 are setosa
plt.scatter(x1[4:], x3[4:], color="blue", marker="x", label="versicolor")   # versicolor are next 4           

plt.title("Iris")
plt.xlabel("sepal length (cm)")
plt.ylabel("petal length (cm)")
plt.legend(loc="lower right")
plt.show()

<div>
    <font color=#690027 markdown="1"> 
<h2>4. Classification with the Perceptron</h2>    </font>
</div>

<div>
    <font color=#690027 markdown="1"> 
<h3>4.1 Annotated data</h3>    </font>
</div>

The AI system will learn from the 8 labeled examples.<br>You have already named the column with the labels $y$. However, the label is not a quantitative (numeric) variable. <br>There are two types of irises. If you match the species *setosa* with class $0$ and the species *versicolor* with class $1$, then you have made the **label** $y$ **numeric**.

In [None]:
# making labels numerical, setosa:0, versicolor:1
y = np.where(y == "Iris-setosa", 0, 1)                # if setosa, then 0, otherwise 1

In [None]:
print(y)

In [None]:
# combine the standardized features into a matrix
# this matrix X then contains the features that the machine learning system will use
X = np.stack((x1, x3), axis=1)  # axis 1 means that x1 and x3 are considered as columns (with axis 0 as rows)
print(X)
print(X.shape)
print(X.shape[1])

The features are now in a matrix X and the labels in a vector y. The i-th row of X corresponds to two features of a certain sample and the label of that sample is at the i-th place in y.
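To see what `axis=1` does in `np.stack`, here is a small sketch with two made-up arrays. The names `a`, `b`, `as_columns` and `as_rows` are assumptions for illustration.

```python
import numpy as np

# Two hypothetical feature arrays of three samples each
a = np.array([1.0, 2.0, 3.0])
b = np.array([10.0, 20.0, 30.0])

as_columns = np.stack((a, b), axis=1)   # a and b become the columns
as_rows = np.stack((a, b), axis=0)      # a and b become the rows

print(as_columns.shape)   # 3 samples, 2 features
print(as_rows.shape)      # 2 rows of 3 values
print(as_columns[0])      # both features of the first sample
```

With `axis=1`, each row of the result holds all features of one sample, which is the layout used for `X` above.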

<div>
    <font color=#690027 markdown="1"> 
<h3>4.2 The Perceptron</h3>    </font>
</div>

The Perceptron is a neural network with two layers: an input layer and an output layer.<br>The neurons of the input layer are connected to the neuron of the output layer.<br><br>The Perceptron has an algorithm that enables it to learn. <br>It is trained with labeled examples: a number of input points X$_{i}$ with a corresponding label $y_{i}$. The connections between the neurons of the input layer and the output layer each have a certain weight. <br>The Perceptron learns: based on the labeled examples, the weights are gradually adjusted according to the Perceptron algorithm.

<img src="images/perceptronalgoritme.jpg" alt="Drawing" width="600"/> 
<center>Figure 4: The Perceptron algorithm.</center>

<img src="images/perceptron3weights.png" alt="Drawing" width="500"/> 
<center>Figure 5: Schematic representation of the Perceptron.</center>

To find a line that separates the two types of irises, we start with a **randomly chosen line**. This is done by randomly choosing the coefficients in the equation of this line.<br> Each side of this *dividing line* corresponds to a different *class*.<br> The system is *trained* with the training set and the corresponding labels: **For each point of the training set, it is checked whether the point is on the correct side of the dividing line.** If a point is not on the correct side of the dividing line, the coefficients in the equation of the line are adjusted. <br>The complete training set is run through a number of times. Each such pass is called an *epoch*. The system *learns* during these *attempts ('epochs')*.

If two classes are linearly separable, a straight line can be found that separates them. The equation of this separating line can be written in the form $ax+by+c=0$ in such a way that $ax_{1}+by_{1}+c \geq 0$ for every point $(x_{1}, y_{1})$ in one class and $ax_{1}+by_{1}+c < 0$ for every point $(x_{1}, y_{1})$ in the other class. <br>As long as this condition is not satisfied, the coefficients must be adjusted.<br>The training set with its labels is run through several times. For each point, the coefficients are adjusted if necessary.<br><br>**The weights of the Perceptron are the coefficients in the equation of the separating line.**
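The procedure described above can be sketched in a few lines, without the intermediate printing of the full class further on. This is only a minimal sketch under stated assumptions: the function name `train_perceptron`, the four data points and the fixed (instead of random) starting coefficients are made up for illustration.

```python
import numpy as np

def train_perceptron(points, labels, start, eta=0.1, n_epochs=20):
    """Adjust the coefficients [a, b, c] of the line a*x + b*y + c = 0 point by point."""
    w = np.array(start, dtype=float)
    for _ in range(n_epochs):                    # one pass over all points = one epoch
        mistakes = 0
        for point, label in zip(points, labels):
            value = np.dot(point, w[:2]) + w[2]  # substitute the point into the equation
            predicted = 1 if value >= 0 else 0   # the side of the line determines the class
            update = eta * (label - predicted)   # 0 if the point is on the correct side
            if update != 0:
                w[:2] += update * point          # adjust a and b
                w[2] += update                   # adjust c
                mistakes += 1
        if mistakes == 0:                        # every point on the correct side: stop
            break
    return w

# Hypothetical, linearly separable points (not Iris data)
points = np.array([[-1.0, -1.0], [-0.8, -1.2], [1.0, 0.9], [1.2, 1.1]])
labels = np.array([0, 0, 1, 1])

# Assumed starting line that misclassifies several points
w = train_perceptron(points, labels, start=[-1.0, 0.5, 0.2])
predictions = np.where(points @ w[:2] + w[2] >= 0, 1, 0)
print("coefficients a, b, c:", w)
print("predicted classes:", predictions)
```

The sketch stops as soon as an epoch passes without any adjustment. The class in this notebook always runs through all its epochs, so that every step can be followed.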

Applied to the irises, the rule is:<br>The equation of the dividing line $ax+by+c=0$ becomes $ax_{1}+bx_{3}+c=0$, with $ax_{1}+bx_{3}+c \geq 0$ for every point $(x_{1}, x_{3})$ in one class and $ax_{1}+bx_{3}+c < 0$ for every point $(x_{1}, x_{3})$ in the other class. <br>Here $a$ is the coefficient of the variable $x_{1}$, $b$ is that of $x_{3}$, and $c$ is a constant.<br>In the following code cell, $a$ is represented by `coeff_x1`, $b$ by `coeff_x3` and $c$ by `cte`.<br>For an oblique straight line $ax+by+c=0$ (so $b \neq 0$), $y = -\frac{a}{b} x - \frac{c}{b}$.
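As a small worked example of this rule, the sketch below determines on which side of a line a point lies and rewrites the equation in the form $y = mx + q$. The helper names `side_of_line` and `slope_intercept` and the chosen coefficients are assumptions for illustration.

```python
def side_of_line(a, b, c, x1, x3):
    """Return class 1 if a*x1 + b*x3 + c >= 0, otherwise class 0."""
    return 1 if a * x1 + b * x3 + c >= 0 else 0

def slope_intercept(a, b, c):
    """Rewrite a*x + b*y + c = 0 as y = m*x + q (only possible if b is not 0)."""
    if b == 0:
        raise ValueError("vertical line: b must not be 0")
    return -a / b, -c / b

# Hypothetical line: x1 + 2*x3 - 2 = 0
a, b, c = 1.0, 2.0, -2.0

print(slope_intercept(a, b, c))           # (-0.5, 1.0)
print(side_of_line(a, b, c, 2.0, 1.0))    # 2 + 2 - 2 = 2 >= 0, so class 1
print(side_of_line(a, b, c, 0.0, 0.0))    # -2 < 0, so class 0
```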

In [None]:
# preparing work

font = {"family": "serif",
       "color": "black",
        "weight": "normal",
        "size": 16,
    }

def graph(coeff_x1, coeff_x3, cte):
    """Plot the decision boundary and print its equation."""
    # petal length vs. sepal length
    plt.scatter(x1[:4], x3[:4], color="green", marker="o", label="setosa")      # first 4 are setosa (label 0)
    plt.scatter(x1[4:], x3[4:], color="blue", marker="x", label="versicolor")   # versicolor are the next 4 (label 1)
    x = np.linspace(-1.5, 1.5, 10)
    y_r = -coeff_x1/coeff_x3 * x - cte/coeff_x3      # y-coordinates of the points on the line
    print("The boundary is a straight line with eq.", coeff_x1, "* x1 +", coeff_x3, "* x3 +", cte, "= 0")
    plt.plot(x, y_r, color="black")
    
    plt.title("Separation of two types of irises", fontdict=font)
    plt.xlabel("sepal length (cm)", fontdict=font)
    plt.ylabel("petal length (cm)", fontdict=font)
    plt.legend(loc="lower right")
    plt.show()

class Perceptron(object):
    """Perceptron classifier."""    
    
    def __init__(self, eta=0.01, n_iter=50, random_state=1):
        """self has three parameters: learning rate, number of attempts, randomness."""        
        self.eta = eta        
        self.n_iter = n_iter
        self.random_state = random_state    
        
    def fit(self, X, y):
        """Fit training data."""
        rgen = np.random.RandomState(self.random_state)
        # column matrix of the weights ('weights')
        # randomly generated from normal distribution with mean 0 and standard deviation 0.01
        # number of weights is number of features in X plus 1 (+1 for the bias)        
        self.w_ = rgen.normal(loc=0.0, scale=0.01, size=X.shape[1]+1)     # weight matrix that contains 3 weights
        if self.w_[2] < 0:
            self.w_ = -self.w_   # this does not change the starting line, but makes the calculations easier
        print("Initial random weights:", self.w_)
        self.errors_ = []    # error list       

        # plot graph with separating line
        # graph(self.w_[0], self.w_[1], self.w_[2])
        coeff_rechten = np.array([self.w_])      # weights of all successive lines, starting with the first one
        print(coeff_rechten)
        # adjust weights point by point, based on feedback from the various attempts        
        for _ in range(self.n_iter):
            print("epoch =", _)            
            errors = 0
            counter = 0   
            
            for x, label in zip(X, y):            # x is a data point (sample) from matrix X, label is the corresponding label
                print("counter =", counter)         # count points, there are eight
                print("point:", x, "\tlabel:", label)
                gegiste_klasse = self.predict(x)
                print("predicted class =", gegiste_klasse)
                # check adjustment for this point
                update = self.eta * (label - gegiste_klasse)     # if update = 0, correct class, no adjustment needed
                print("update=", update)
                # adjust graph and weights possibly after this point                
                if update !=0:                    
                    self.w_[0:2] += update *x                    
                    self.w_[2] += update                    
                    errors += update
                    print("weights =", self.w_)                    
                    # graph(self.w_[0], self.w_[1], self.w_[2])     
                    # preliminary 'decision boundary'
                    coeff_rechten = np.append(coeff_rechten, [self.w_], axis =0)
                    print(coeff_rechten)
                counter += 1            
            self.errors_.append(errors)           # after all points, add total error to error list
            print("error list =", self.errors_)        
        return self, coeff_rechten            # returns the Perceptron itself and the weights of all successive lines    
    
    def net_input(self, x):      # substitute the point into the equation of the provisional dividing line
        """Calculate z = linear combination of the inputs and the weights, plus the bias, for the given point."""
        return np.dot(x, self.w_[0:2]) + self.w_[2]    
    
    def predict(self, x):
        """Predict the class."""        
        print("point substituted into the equation of the line:", self.net_input(x))
        klasse = np.where(self.net_input(x) >=0, 1, 0)
        return klasse    

### Assignment 4.2.1
Search for the Perceptron algorithm in the code cell above. <br>Found?

Answer:

In [None]:
# Perceptron, learning rate 0.001 and 12 attempts
ppn = Perceptron(eta=0.001, n_iter=12)
gewichtenlijst = ppn.fit(X,y)[1]                # fit(X,y) returns two things
print("Weight list =", gewichtenlijst)

<div>
    <font color=#690027 markdown="1"> 
<h3>4.3 Animation</h3>    </font>
</div>

Now follows an **animation** where you see how the Perceptron learns. <br>First, you see a randomly chosen straight line. After that, this line is adjusted step by step until the two classes are separated from each other.

In [None]:
# animation
xcoord = np.linspace(-1.5, 1.5, 10)

lijst_ycoord = []
for w in gewichtenlijst:                       
    y_r = -w[0]/w[1] * xcoord - w[2]/w[1]      # each w corresponds to a different straight line
    lijst_ycoord.append(y_r)                   # y-coordinates for that particular line  
lijst_ycoord = np.array(lijst_ycoord)          # type casting (from list of arrays to NumPy array)

# graph window with graph (ax) in it
fig, ax = plt.subplots()
ax.axis([-2, 2, -5, 5])

ax.scatter(x1[:4], x3[:4], color="green", marker="o", label="setosa")      # first 4 setosas (label 0)
ax.scatter(x1[4:], x3[4:], color="blue", marker="x", label="versicolor")   # versicolor are the next 4 (label 1)
line, = ax.plot(xcoord, lijst_ycoord[0], color="black")    # show first line

ax.set_title("Separating two types of irises", fontdict=font)
ax.set_xlabel("sepal length (cm)", fontdict=font)
ax.set_ylabel("petal length (cm)", fontdict=font)
ax.legend(loc="lower right")

def animate(i):    
    line.set_ydata(lijst_ycoord[i])  # update the line step by step with the next y-coordinates
    return line,                     # blit=True expects the changed artists to be returned
    
plt.close()   # close plot window, only animation has to be shown

ani = animation.FuncAnimation(fig, animate,  interval=1000, blit=True, save_count=10, frames=len(lijst_ycoord))    

HTML(ani.to_jshtml())

<div>
    <font color=#690027 markdown="1"> 
<h3>4.4 Experiment</h3>    </font>
</div>

### Assignment 4.4.1
The learning rate or the number of attempts can be adjusted.

- Does it go faster with a smaller or larger learning rate?
- Is it also possible with fewer epochs (attempts)?

The code has already been copied below. Adjust as desired!

In [None]:
# Perceptron, learning rate 0.001 and 12 attempts
ppn = Perceptron(eta=0.001, n_iter=12)
gewichtenlijst = ppn.fit(X,y)[1]
print("Weight list =", gewichtenlijst)

In [None]:
# animation
xcoord = np.linspace(-1.5, 1.5, 10)

lijst_ycoord = []
for w in gewichtenlijst:                       
    y_r = -w[0]/w[1] * xcoord - w[2]/w[1]      # each w corresponds to a different straight line
    lijst_ycoord.append(y_r)                   # y-coordinates for that particular line  
lijst_ycoord = np.array(lijst_ycoord)          # type casting (from list of arrays to NumPy array)

# graph window with graph (ax) in it
fig, ax = plt.subplots()
ax.axis([-2, 2, -5, 5])

ax.scatter(x1[:4], x3[:4], color="green", marker="o", label="setosa")      # first 4 setosas (label 0)
ax.scatter(x1[4:], x3[4:], color="blue", marker="x", label="versicolor")   # versicolor are the next 4 (label 1)
line, = ax.plot(xcoord, lijst_ycoord[0], color="black")    # show first line

ax.set_title("Separating two types of irises", fontdict=font)
ax.set_xlabel("sepal length (cm)", fontdict=font)
ax.set_ylabel("petal length (cm)", fontdict=font)
ax.legend(loc="lower right")

def animate(i):    
    line.set_ydata(lijst_ycoord[i])  # update the line step by step with the next y-coordinates
    return line,                     # blit=True expects the changed artists to be returned
    
plt.close()   # close plot window, only animation has to be shown

ani = animation.FuncAnimation(fig, animate,  interval=1000, blit=True, save_count=10, frames=len(lijst_ycoord))    

HTML(ani.to_jshtml())

<div>
    <font color=#690027 markdown="1"> 
<h2>5. Now carry out your own investigation, for example with two other types of irises or with other features</h2>    </font>
</div>

### Assignment 5.1
- length of sepal vs. width of sepal
- setosa and virginica
- more samples

(Tip: search the internet to find out how to merge NumPy arrays.)
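One possible way to merge arrays is `np.concatenate`. The sketch below is only an illustration with made-up values; the names `first_group` and `second_group` are assumptions.

```python
import numpy as np

# Two hypothetical groups of measurements (not taken from the Iris table)
first_group = np.array([5.1, 4.9, 4.7])
second_group = np.array([6.3, 5.8])

# Join both arrays into one array; the order of the values is preserved
merged = np.concatenate((first_group, second_group))

print(merged)
print(merged.shape)    # 3 + 2 = 5 values
```

Remember to merge the features and the labels in the same order, so that each sample keeps its own label.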

### Assignment 5.2
Can you find two types of irises that are not linearly separable?

Answer:

<div class="alert alert-box alert-info">
The Perceptron is a neural network with two layers: an input layer and an output layer. The connections between the neurons of the input layer and the output layer each have a specific weight. <br>The Perceptron is suitable for separating classes that are linearly separable.<br>The Perceptron has an algorithm that enables it to learn; it is trained with labeled examples. The Perceptron learns by adjusting the weights in the network after each input point.</div>

<div>
<h2>Reference list</h2></div>

[1] Dua, D., & Karra Taniskidou, E. (2017). UCI Machine Learning Repository [http://archive.ics.uci.edu/ml]. <br> &nbsp; &nbsp; &nbsp; &nbsp; Irvine, CA: University of California, School of Information and Computer Sciences.<br>[2] Fisher, R. A. (1936). The Use of Multiple Measurements in Taxonomic Problems. *Annals of Eugenics*. 7(2), 179–188. <br> &nbsp; &nbsp; &nbsp; &nbsp; https://doi.org/10.1111/j.1469-1809.1936.tb02137.x.<br>[3] Radomil Binek [CC BY-SA 3.0 (https://creativecommons.org/licenses/by-sa/3.0)], via Wikimedia Commons.<br>[4] Danielle Langlois. No machine-readable author provided. Dlanglois assumed (based on copyright claims). <br> &nbsp; &nbsp; &nbsp; &nbsp;[CC BY-SA 3.0 (http://creativecommons.org/licenses/by-sa/3.0/)], via Wikimedia Commons; <br>[5] Frank Mayfield [CC BY-SA 2.0 (https://creativecommons.org/licenses/by-sa/2.0)], via Wikimedia Commons.

<div>
<h2>With support from</h2></div>

<img src="images/kikssteun.png" alt="Banner" width="1100"/>

<img src="images/cclic.png" alt="Banner" align="left" width="100"/><br><br>
KIKS notebook, see <a href="http://www.aiopschool.be">AI At School</a>, by F. Wyffels & N. Gesquière, is licensed under a <a href="http://creativecommons.org/licenses/by-nc-sa/4.0/">Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License</a>.