<img src="images/kiksmeisedwengougent.png" alt="Banner" width="1100"/>

<div>
    <font color=#690027 markdown="1">   
<h1>CLASSIFICATION OF STOMATA ON SUNLIT AND SHADED LEAVES</h1>    </font>
</div>

<div class="alert alert-box alert-success">
In this notebook, you will separate sunlit and shaded leaves from each other. The two classes are approximately linearly separable.</div>

Krappa or crabwood is a fast-growing tree species commonly found in the Amazon region. Mature specimens can exceed one meter in diameter and 40 meters in height. <br>The high-quality wood is used for furniture, flooring, masts... A fever-reducing agent is extracted from the bark. The seeds yield an oil with medicinal applications, including the treatment of skin diseases and tetanus, and it is also used as an insect repellent.

<table><tr>
<td> <img src="images/andirobaamazonica.jpg" alt="Drawing" width="200"/></td>
<td> <img src="images/crabwoodtree.jpg" alt="Drawing" width="236"/> </td>
</tr></table>

<center>
Photos: Mauroguanandi [Public domain] [2] and P. S. Sena [CC BY-SA 4.0] [3].</center>

Because some climate models predict a rise in temperature and a reduction in rainfall in the coming decades, it is important to know how these trees adapt to changing conditions. <br>Scientists Camargo and Marenco conducted research in the Amazon rainforest [1].<br>In addition to the influence of seasonal rainfall, they examined stomatal features of leaves under full sunlight and under shade.<br>To this end, a number of shade-grown plants were moved to full sunlight for 60 days, while another group of plants was kept in the shade. <br>The stomatal characteristics were measured on impressions of the leaves made with transparent nail polish.

### Import required modules

In [None]:
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
from sklearn.linear_model import LogisticRegression
from matplotlib import animation
from IPython.display import HTML

<div>
    <font color=#690027 markdown="1">   
<h2>1. Reading in the data</h2>    </font>
</div>

Read the dataset using the `pandas` module.

In [None]:
stomata = pd.read_csv("data/schaduwzon.csv", header="infer")  # table to be read has a header
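To see what `header="infer"` does, here is a small sketch on made-up CSV text. The rows are placeholders (not the Carapa measurements), and only the column names are borrowed from the ones this notebook uses later; the real file's column order may differ. With `header="infer"` (which is also the pandas default), the first line of the file supplies the column names.

```python
import io

import pandas as pd

# Hypothetical CSV text: placeholder rows that only mimic the column names used in this notebook
csv_text = """stomatal length,stomatal density,environment
10.0,100,sun
20.0,200,shade
"""

# header="infer" takes the column names from the first line of the file
table = pd.read_csv(io.StringIO(csv_text), header="infer")
print(table.columns.tolist())
print(table["environment"].tolist())
```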

<div>
    <font color=#690027 markdown="1">   
<h2>2. Displaying the read data</h2>    </font>
</div>

<div>
    <font color=#690027 markdown="1">   
<h3>2.1 Table with the data</h3>    </font>
</div>

Take a look at the data.

In [None]:
stomata

### Assignment 2.1

- Which data are features?
- Which data is the label?
- These data can be visualized as a point cloud. Which matrices do you need for this?

Answer:

Answer:

- The plant species is the same everywhere: Carapa.
- The features are the stomatal density and the stomatal length.
- The number of samples is 50.
- The label is the environment in which the sample was taken: sun or shade.
- To display the point cloud, you need two matrices of dimension 50x1.

<div>
    <font color=#690027 markdown="1">   
<h3>2.2 Displaying the data in a scatter plot</h3>    </font>
</div>

The researchers plot the stomatal density against the stomatal length.<br> Proceed in the same way.

In [None]:
x1 = stomata["stomatal length"]          # feature: length
x2 = stomata["stomatal density"]         # feature: density

In [None]:
x1 = np.array(x1)          # feature: length
x2 = np.array(x2)          # feature: density

In [None]:
# density vs. length
plt.figure()
plt.scatter(x1[:25], x2[:25], color="lightgreen", marker="o", label="sun")      # first 25 are sun
plt.scatter(x1[25:], x2[25:], color="darkgreen", marker="o", label="shade")     # next 25 are shade
plt.title("Carapa")
plt.xlabel("stomatal length (micron)")
plt.ylabel("stomatal density (per mm²)")
plt.legend(loc="lower left")
plt.show()
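The plot above relies on the first 25 rows being sun samples and the next 25 being shade samples. A sketch of a more robust alternative selects the points with a Boolean mask on the label instead of by position. The arrays below are hypothetical toy values; in the notebook they would be `x1`, `x2` and the `"environment"` column.

```python
import numpy as np

# Hypothetical toy arrays standing in for x1, x2 and the "environment" column
x1 = np.array([1.0, 2.0, 3.0, 4.0])
x2 = np.array([10.0, 20.0, 30.0, 40.0])
environment = np.array(["sun", "shade", "sun", "shade"])

is_sun = environment == "sun"              # Boolean mask: True for sun samples
sun_points = (x1[is_sun], x2[is_sun])      # (lengths, densities) of the sun samples
shade_points = (x1[~is_sun], x2[~is_sun])  # everything that is not labeled sun
print(sun_points)
print(shade_points)
```

The masks can be passed straight to the plotting calls, e.g. `plt.scatter(x1[is_sun], x2[is_sun], ...)`, so the plot stays correct even if the rows of the file are shuffled.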

<div>
    <font color=#690027 markdown="1">   
<h2>3. Standardize</h2>    </font>
</div>

<div>
    <font color=#690027 markdown="1">   
<h3>3.1 Linearly separable?</h3>    </font>
</div>

There are two groups to distinguish. They are linearly separable except for a few points.

<div>
    <font color=#690027 markdown="1">   
<h3>3.2 Standardize</h3>    </font>
</div>

The two features differ greatly in magnitude, so the data must be standardized first.

<div class="alert alert-block alert-warning">
More explanation about the importance of standardization can be found in the notebook 'Standardizing'.</div>
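Standardization replaces every value $x$ of a feature by $z = (x - \mu)/\sigma$, where $\mu$ is the mean and $\sigma$ the standard deviation of that feature; the next code cell does exactly this with `np.mean` and `np.std`. The sketch below uses a hypothetical helper `standardize` and made-up numbers on two very different scales to show that, after standardization, both features end up centered on 0 with a standard deviation of 1.

```python
import numpy as np

def standardize(values):
    """Return z-scores: subtract the mean, then divide by the (population) standard deviation."""
    values = np.asarray(values, dtype=float)
    return (values - np.mean(values)) / np.std(values)

# Made-up toy numbers on two very different scales (NOT the Carapa measurements)
lengths = np.array([20.0, 22.0, 24.0, 26.0])
densities = np.array([300.0, 400.0, 500.0, 600.0])

z_len = standardize(lengths)
z_dens = standardize(densities)
print(z_len)
print(z_dens)
```

Because both toy series are evenly spaced, their z-scores coincide even though the raw values differ by a factor of roughly 20.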

In [None]:
x1_avg = np.mean(x1)
x1_std = np.std(x1)
x2_avg = np.mean(x2)
x2_std = np.std(x2)
x1 = (x1 - x1_avg) / x1_std    # standardized length
x2 = (x2 - x2_avg) / x2_std    # standardized density

In [None]:
# density relative to length
plt.figure()
plt.scatter(x1[:25], x2[:25], color="lightgreen", marker="o", label="sun")      # first 25 are sun
plt.scatter(x1[25:], x2[25:], color="darkgreen", marker="o", label="shade")     # next 25 are shade
plt.title("Carapa")
plt.xlabel("standardized stomatal length")
plt.ylabel("standardized stomatal density")
plt.legend(loc="lower left")
plt.show()

<div>
    <font color=#690027 markdown="1">   
<h2>4. Classification with a perceptron</h2>    </font>
</div>

<div>
    <font color=#690027 markdown="1">   
<h3>4.1 Annotated data</h3>    </font>
</div>

The ML system will learn from the 50 labeled examples.<br>Read in the labels.

In [None]:
y = stomata["environment"]            # labels: second column of the original table
y = np.array(y)
print(y)

In [None]:
y = np.where(y == "sun", 1, 0)     # make labels numeric, sun: 1, shade: 0
print(y)

In [None]:
X = np.stack((x1, x2), axis=1)    # 50x2 feature matrix: one row per sample, one column per feature
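To check what the cells above produce, here is a sketch on a hypothetical three-sample toy set (made-up values, not the Carapa data): `np.where` turns the string labels into 1 and 0, and `np.stack(..., axis=1)` places the two features side by side as columns, one row per sample.

```python
import numpy as np

# Hypothetical toy values: 3 samples instead of the 50 in the dataset
x1 = np.array([-1.0, 0.0, 1.0])    # standardized length
x2 = np.array([0.5, -0.5, 1.5])    # standardized density
labels = np.array(["sun", "shade", "sun"])

X = np.stack((x1, x2), axis=1)     # shape (number of samples, number of features)
y = np.where(labels == "sun", 1, 0)

print(X.shape)
print(X)
print(y)
```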

<div>
    <font color=#690027 markdown="1">   
<h3>4.2 Perceptron</h3>    </font>
</div>

<div class="alert alert-box alert-info">
If two classes are linearly separable, there is a straight line that separates them. The equation of this separating line can be written in the form $ax+by+c=0$. For every point $(x_{1}, y_{1})$ of one class, $ax_{1}+by_{1}+c \geq 0$, and for every point $(x_{2}, y_{2})$ of the other class, $ax_{2}+by_{2}+c < 0$. <br>As long as this condition is not met, the coefficients must be adjusted.<br>The training set with the corresponding labels is run through several times, and the coefficients are adjusted for each point where necessary.</div>
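The sign test in the box above can be written out directly. The sketch below uses a hypothetical helper `side_of_line` and a hand-picked line $x + y - 1 = 0$ (so $a=1$, $b=1$, $c=-1$); a point gets class 1 when $ax+by+c \geq 0$ and class 0 otherwise, with points exactly on the line counted as class 1, matching the $\geq$ in the box.

```python
import numpy as np

def side_of_line(a, b, c, points):
    """Return 1 where a*x + b*y + c >= 0 and 0 where it is < 0."""
    points = np.asarray(points, dtype=float)
    values = a * points[:, 0] + b * points[:, 1] + c
    return np.where(values >= 0, 1, 0)

# Hypothetical line x + y - 1 = 0 and three hand-picked points
pts = [(2.0, 2.0), (0.0, 0.0), (0.5, 0.5)]
print(side_of_line(1, 1, -1, pts))
```

Here $(2, 2)$ gives $3 \geq 0$, $(0, 0)$ gives $-1 < 0$, and $(0.5, 0.5)$ lies exactly on the line.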

The perceptron starts from a random line that is supposed to separate the two types of leaves: the coefficients in the equation of the line are chosen at random. Each side of the line corresponds to a different class. <br>The system is trained with the training set and the given labels. For each point in the training set, it checks whether the point lies on the correct side of the line. If it does not, the coefficients in the equation of the line are adjusted. <br>The entire training set is run through several times. The system learns during these 'attempts', or *epochs*.
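The adjustment for a single point follows the rule used in the `Perceptron` class below: with learning rate $\eta$, compute $\Delta = \eta\,(\text{label} - \text{predicted class})$, then add $\Delta \cdot x_j$ to each feature coefficient and $\Delta$ to the bias. A minimal sketch of one such step, with a hypothetical helper `perceptron_step` and hand-picked numbers:

```python
import numpy as np

def perceptron_step(w, x, label, eta):
    """One perceptron update for a single point x.

    w[0] is the bias (the c in ax+by+c), w[1:] are the coefficients of the features.
    Returns the new weights and the class that was predicted before the update.
    """
    predicted = 1 if np.dot(x, w[1:]) + w[0] >= 0 else 0
    update = eta * (label - predicted)   # 0 when the point is already on the correct side
    new_w = w.copy()
    new_w[1:] += update * x
    new_w[0] += update
    return new_w, predicted

w = np.array([0.0, 1.0, 1.0])     # hypothetical start: the line x1 + x2 = 0
x = np.array([1.0, 1.0])          # z = 2 >= 0, so class 1 is predicted...
w_new, pred = perceptron_step(w, x, label=0, eta=0.5)   # ...but the label is 0
print(pred, w_new)
```

For this point the value of the line equation drops from 2 to 0.5: the line moves towards the point, although one step is not enough to put it on the correct side. A correctly classified point gives $\Delta = 0$ and leaves the weights unchanged.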

In [None]:
def graph(coeff_x1, coeff_x2, cte):
    """Plot the separating line ('decision boundary') and print its equation."""
    # stomatal density relative to stomatal length
    plt.figure()
    plt.scatter(x1[:25], x2[:25], color="lightgreen", marker="o", label="sun")      # first 25 are sun (label 1)
    plt.scatter(x1[25:], x2[25:], color="darkgreen", marker="o", label="shade")     # next 25 are shade (label 0)
    x = np.linspace(-1.5, 1.5, 10)
    y_r = -coeff_x1/coeff_x2 * x - cte/coeff_x2
    print("The boundary is a straight line with eq.", coeff_x1, "* x1 +", coeff_x2, "* x2 +", cte, "= 0")
    plt.plot(x, y_r, color="black")
    plt.title("Classification Carapa")
    plt.xlabel("standardized stomatal length")
    plt.ylabel("standardized stomatal density")
    plt.legend(loc="lower left")
    plt.show()


class Perceptron(object):
    """Perceptron classifier."""

    def __init__(self, eta=0.01, n_iter=50, random_state=1):
        """Three parameters: learning rate eta, number of epochs n_iter, random seed random_state."""
        self.eta = eta
        self.n_iter = n_iter
        self.random_state = random_state

    def fit(self, X, y):
        """Fit training data."""
        rgen = np.random.RandomState(self.random_state)
        # column matrix of the weights ('weights'),
        # randomly generated from a normal distribution with mean 0 and standard deviation 0.01;
        # number of weights is the number of features in X plus 1 (+1 for the bias)
        self.w_ = rgen.normal(loc=0.0, scale=0.01, size=X.shape[1]+1)     # weight matrix with 3 weights
        print("Initial random weights:", self.w_)
        self.errors_ = []    # error list: number of misclassified points per epoch

        # plot graph with the initial separating line
        print("Initial random line:")
        graph(self.w_[1], self.w_[2], self.w_[0])
        weightslist = np.array([self.w_])

        # adjust the weights point by point, epoch after epoch
        for _ in range(self.n_iter):
            print("epoch =", _)
            errors = 0
            counter = 0
            for x, label in zip(X, y):           # x is a data point, label its corresponding label
                print("counter =", counter)      # count the points, there are 50
                print("point:", x, "\tlabel:", label)
                predicted_class = self.predict(x)
                print("predicted class =", predicted_class)
                # check whether this point requires an adjustment
                update = self.eta * (label - predicted_class)     # if update = 0: correct class, no adjustment needed
                print("update =", update)
                # adjust the weights (and thus the line) if necessary after this point
                if update != 0:
                    self.w_[1:] += update * x
                    self.w_[0] += update
                    errors += 1                  # count this misclassified point
                    print("weights =", self.w_)  # provisional 'decision boundary'
                    weightslist = np.append(weightslist, [self.w_], axis=0)
                counter += 1
            self.errors_.append(errors)          # after all points, add the total error to the error list
            print("error list =", self.errors_)
        return self, weightslist                 # also returns the list of weight matrices

    def net_input(self, x):
        """Compute z = linear combination of the inputs and the weights, including the bias, for a given point."""
        # fill in the point in the equation of the provisional separating line
        return np.dot(x, self.w_[1:]) + self.w_[0]

    def predict(self, x):
        """Predict the class: 1 if the point lies on the non-negative side of the line, else 0."""
        print("point inserted in straight line equation:", self.net_input(x))
        predicted_class = np.where(self.net_input(x) >= 0, 1, 0)
        return predicted_class
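The `Perceptron` class above prints every step, which helps when following along but is noisy for longer runs. Below is a minimal print-free sketch of the same update rule, with a hypothetical function `train_perceptron`. Two assumptions differ from the class: the weights start at zero rather than at small random values, and the data is a made-up, linearly separable toy set (logical AND), not the Carapa data. On separable data like this, the number of misclassified points per epoch drops to zero.

```python
import numpy as np

def train_perceptron(X, y, eta=1.0, n_epochs=10):
    """Print-free perceptron: returns final weights (w[0] = bias) and misclassifications per epoch."""
    w = np.zeros(X.shape[1] + 1)   # start from zero weights (the notebook's class uses small random ones)
    errors_per_epoch = []
    for _ in range(n_epochs):
        errors = 0
        for x, label in zip(X, y):
            predicted = 1 if np.dot(x, w[1:]) + w[0] >= 0 else 0
            update = eta * (label - predicted)
            if update != 0:
                w[1:] += update * x
                w[0] += update
                errors += 1
        errors_per_epoch.append(errors)
    return w, errors_per_epoch

# Hypothetical, linearly separable toy data (logical AND), not the Carapa data
X_toy = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y_toy = np.array([0, 0, 0, 1])

w, errors = train_perceptron(X_toy, y_toy)
predictions = np.where(X_toy @ w[1:] + w[0] >= 0, 1, 0)
print("weights:", w)
print("errors per epoch:", errors)
print("predictions:", predictions)
```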

In [None]:
# perceptron, learning rate 0.0001 and 20 epochs
ppn = Perceptron(eta=0.0001, n_iter=20)
weightslist = ppn.fit(X, y)[1]
print("Weight list =", weightslist)

In [None]:
# animation
xcoord = np.linspace(-1.5, 1.5, 10)
ycoord = []
for w in weightslist:
    y_r = -w[1]/w[2] * xcoord - w[0]/w[2]
    ycoord.append(y_r)
ycoord = np.array(ycoord)    # type casting: list to NumPy array

fig, ax = plt.subplots()
line, = ax.plot(xcoord, ycoord[0])
plt.scatter(x1[:25], x2[:25], color="lightgreen", marker="o", label="sun")      # first 25 are sun (label 1)
plt.scatter(x1[25:], x2[25:], color="darkgreen", marker="o", label="shade")     # next 25 are shade (label 0)
ax.axis([-2, 2, -2, 2])

def animate(i):
    line.set_ydata(ycoord[i])  # update the line to the i-th set of weights
    return line,

plt.close()  # close the temporary plot window; only the animation is needed
anim = animation.FuncAnimation(fig, animate, interval=1000, repeat=False, frames=len(ycoord))
HTML(anim.to_jshtml())

Great result! But not yet optimal.

### Assignment 4.2

Perhaps more epochs will give a better result. Give it a try.

<div class="alert alert-block alert-info">
Because the classes are not perfectly linearly separable, the perceptron cannot reduce the error to zero. By choosing the learning rate and the number of epochs as well as possible, you can try to achieve the best possible separation.<br>For classes that are not linearly separable, a perceptron is therefore not used in machine learning; instead, the classes are separated as well as possible in a different way: with gradient descent to adjust the weights and binary cross-entropy to measure the error.</div>
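As a sketch of that alternative, the code below trains a logistic-regression model with batch gradient descent on the mean binary cross-entropy. The function names and the synthetic, overlapping 2D clusters are hypothetical (not the Carapa data). Each step moves the weights against the gradient $\frac{1}{n}X^T(p - y)$, where $p$ are the predicted probabilities; unlike the perceptron, the loss keeps decreasing smoothly even though the two clusters cannot be separated perfectly. (The notebook also imports scikit-learn's `LogisticRegression`, which fits the same kind of model but uses its own optimization solver and, by default, L2 regularization.)

```python
import numpy as np

def sigmoid(z):
    """Squash z into a probability between 0 and 1."""
    return 1.0 / (1.0 + np.exp(-z))

def binary_cross_entropy(y, p, eps=1e-12):
    """Mean binary cross-entropy; p is clipped to avoid log(0)."""
    p = np.clip(p, eps, 1 - eps)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

def train_logistic(X, y, eta=0.1, n_epochs=500):
    """Batch gradient descent on the mean binary cross-entropy; w[0] is the bias."""
    w = np.zeros(X.shape[1] + 1)
    losses = []
    for _ in range(n_epochs):
        p = sigmoid(X @ w[1:] + w[0])     # predicted probability of class 1
        grad = p - y                      # derivative of the loss w.r.t. z, per sample
        w[1:] -= eta * X.T @ grad / len(y)
        w[0] -= eta * grad.mean()
        losses.append(binary_cross_entropy(y, p))
    return w, losses

# Synthetic, overlapping clusters (hypothetical, not the Carapa data)
rng = np.random.RandomState(0)
X_a = rng.normal(loc=[1.0, 1.0], scale=0.8, size=(25, 2))    # class 1
X_b = rng.normal(loc=[-1.0, -1.0], scale=0.8, size=(25, 2))  # class 0
X_syn = np.vstack((X_a, X_b))
y_syn = np.array([1] * 25 + [0] * 25)

w, losses = train_logistic(X_syn, y_syn)
accuracy = np.mean((sigmoid(X_syn @ w[1:] + w[0]) >= 0.5) == y_syn)
print("first loss:", losses[0], "last loss:", losses[-1])
print("training accuracy:", accuracy)
```

The first loss equals $\ln 2 \approx 0.693$, because all predicted probabilities start at 0.5 when the weights are zero.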

<div>
<h2>With support from</h2></div>

<img src="images/kikssteun.png" alt="Banner" width="1100"/>

<img src="images/cclic.png" alt="Banner" align="left" width="100"/><br><br>
Notebook KIKS, see <a href="http://www.aiopschool.be">AI At School</a>, by F. wyffels & N. Gesquière is licensed under a <a href="http://creativecommons.org/licenses/by-nc-sa/4.0/">Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License</a>.