<a href="https://colab.research.google.com/github/ChanglinWu/DL/blob/main/IRIS_Classification_MLP3_98_0.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Simple PyTorch MLP Iris Flower Classification

This example shows how to implement a 3-layer Multi-Layer Perceptron (MLP) model (also known as a feedforward neural network) for Iris flower classification using the well-known Iris dataset.

The Iris dataset is a classic dataset that is commonly used to demonstrate supervised learning techniques in machine learning and deep learning. It was introduced by statistician Ronald Fisher in his 1936 paper "The Use of Multiple Measurements in Taxonomic Problems".

<img src='https://www.ee.cityu.edu.hk/~lmpo/ee5438/images/iris_mlp_classifier.png'>

The dataset contains 150 samples of iris flowers from three different species: Iris setosa, Iris versicolor and Iris virginica. Each sample contains the following four features:

- Sepal length in cm
- Sepal width in cm
- Petal length in cm
- Petal width in cm

The dataset contains 50 samples from each of the three iris species. One flower species is linearly separable from the other two, but the other two species are not linearly separable from each other.
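To check these counts, here is a minimal sketch that assumes scikit-learn's bundled copy of the dataset (`sklearn.datasets.load_iris`) rather than the CSV file the notebook downloads. It counts the samples per species and tests whether a single petal-length threshold separates Iris setosa from the other two species:

```python
from collections import Counter

from sklearn.datasets import load_iris

# scikit-learn bundles a copy of the Iris data, so no download is needed
iris = load_iris()
print(iris.data.shape)  # (150, 4): 150 samples x 4 features

# Count samples per species name
counts = Counter(str(iris.target_names[t]) for t in iris.target)
print(counts)

# Feature column 2 is petal length (cm); setosa is target 0
petal_length = iris.data[:, 2]
is_setosa = iris.target == 0
# If the longest setosa petal is shorter than the shortest non-setosa petal,
# a single threshold separates setosa from the other two species
print(petal_length[is_setosa].max(), petal_length[~is_setosa].min())
```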



In [None]:
import numpy as np
import pandas as pd
import torch
import torch.nn as nn
import torch.nn.functional as F
from sklearn import preprocessing
from sklearn.metrics import accuracy_score

df = pd.read_csv("https://www.ee.cityu.edu.hk/~lmpo/ee5438/data/iris.csv", na_values=["NA", "?"])

We load the Iris dataset from a CSV file into a pandas DataFrame. We will use its four numeric measurement columns (sepal length, sepal width, petal length and petal width) as features for Iris species classification.

In [None]:
df

| Unnamed: 0 | Id | SepalLengthCm | SepalWidthCm | PetalLengthCm | PetalWidthCm | Species |
|---|---|---|---|---|---|---|
| 0 | 1 | 5.1 | 3.5 | 1.4 | 0.2 | Iris-setosa |
| 1 | 2 | 4.9 | 3.0 | 1.4 | 0.2 | Iris-setosa |
| 2 | 3 | 4.7 | 3.2 | 1.3 | 0.2 | Iris-setosa |
| 3 | 4 | 4.6 | 3.1 | 1.5 | 0.2 | Iris-setosa |
| 4 | 5 | 5.0 | 3.6 | 1.4 | 0.2 | Iris-setosa |
| ... | ... | ... | ... | ... | ... | ... |
| 145 | 146 | 6.7 | 3.0 | 5.2 | 2.3 | Iris-virginica |
| 146 | 147 | 6.3 | 2.5 | 5.0 | 1.9 | Iris-virginica |
| 147 | 148 | 6.5 | 3.0 | 5.2 | 2.0 | Iris-virginica |
| 148 | 149 | 6.2 | 3.4 | 5.4 | 2.3 | Iris-virginica |


We convert the Pandas dataframe into NumPy arrays, then into PyTorch tensors. We use all four features as input features to classify the species.

In [None]:
# LabelEncoder() from scikit-learn is used to encode categorical values to numeric labels.
le = preprocessing.LabelEncoder()

x = df[["SepalLengthCm", "SepalWidthCm", "PetalLengthCm", "PetalWidthCm"]].values
y = le.fit_transform(df["Species"])
species = le.classes_

x = torch.tensor(x, dtype=torch.float32)
y = torch.tensor(y, dtype=torch.long)
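To see what `LabelEncoder` does to the `Species` column, the sketch below encodes a few hand-written labels (a hypothetical sample, not the real column) with a separate `encoder` so the notebook's `le` is untouched. The classes are sorted alphabetically, and each integer code is the position of its class in `classes_`, which is why `le.classes_` can later turn predicted indices back into names:

```python
import torch
from sklearn import preprocessing

# A few species labels in arbitrary order (hypothetical sample)
labels = ["Iris-virginica", "Iris-setosa", "Iris-versicolor", "Iris-setosa"]

encoder = preprocessing.LabelEncoder()
codes = encoder.fit_transform(labels)

# classes_ is sorted alphabetically; each code is an index into classes_
print(encoder.classes_)  # ['Iris-setosa' 'Iris-versicolor' 'Iris-virginica']
print(codes)             # [2 0 1 0]

# nn.CrossEntropyLoss expects integer class indices of dtype torch.long
y_codes = torch.tensor(codes, dtype=torch.long)
print(encoder.inverse_transform(codes))  # back to the original strings
```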

We are designing a feedforward neural network, also called a multilayer perceptron (MLP), whose input layer size is read from the data (`x.shape[1]`), so it automatically matches the number of input features.

In this example, there are 4 input features (sepal length, sepal width, petal length and petal width) from the Iris dataset. These 4 input features connect to a hidden layer with 25 neurons, which connects to a second hidden layer with 15 neurons. Both hidden layers use the ReLU activation function.

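The output layer has one neuron per class, which is 3 for the Iris dataset, followed by a Softmax layer that turns the outputs into probabilities. CrossEntropyLoss is used as the loss function. Note, however, that `nn.CrossEntropyLoss` applies log-softmax internally and expects raw, unnormalized scores (logits). With the Softmax layer kept in the model, softmax is effectively applied twice during training: the model still learns, but typically more slowly, and with 3 classes the loss cannot fall below log(1 + 2/e) ≈ 0.55. The usual practice is to output logits during training and apply softmax only when probabilities are needed.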

Softmax is commonly used in classification problems because it converts the raw outputs into normalized probability scores for each class. This allows us to interpret the predictions as confidence levels and pick the class with the highest probability.
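The relationship between softmax and `nn.CrossEntropyLoss` can be checked numerically. This minimal sketch uses random logits (an assumption for illustration) to show that softmax rows sum to 1 and keep the same argmax, that `CrossEntropyLoss` on logits equals `nll_loss` on the `log_softmax` of the logits, and that feeding already-softmaxed probabilities into the loss has a floor of about 0.551 for 3 classes:

```python
import math

import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(4, 3)            # raw scores: 4 samples, 3 classes
target = torch.tensor([0, 2, 1, 0])   # true class indices

probs = F.softmax(logits, dim=1)
print(probs.sum(dim=1))  # every row sums to 1 (up to rounding)

# Softmax preserves the ordering, so the predicted class is unchanged
print(torch.equal(probs.argmax(dim=1), logits.argmax(dim=1)))  # True

# CrossEntropyLoss on logits equals NLLLoss on the log-softmax of the logits
ce = nn.CrossEntropyLoss()(logits, target)
nll = F.nll_loss(F.log_softmax(logits, dim=1), target)
print(torch.allclose(ce, nll))  # True

# Passing probabilities instead of logits applies softmax a second time;
# with 3 classes that loss can never drop below log(1 + 2/e) ~ 0.551
double = nn.CrossEntropyLoss()(probs, target)
print(ce.item(), double.item(), math.log(1 + 2 / math.e))
```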


In [None]:
model = nn.Sequential(
    nn.Linear(x.shape[1], 25),
    nn.ReLU(),
    nn.Linear(25, 15),
    nn.ReLU(),
    nn.Linear(15, len(species)),
    nn.Softmax(dim=1),
)
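As a quick check of the architecture described above, this sketch rebuilds the same layer sizes (hardcoding 4 inputs instead of `x.shape[1]`, and naming it `demo_model` so it does not overwrite `model`) and counts the trainable parameters:

```python
import torch.nn as nn

# Same layer sizes as the notebook's model: 4 -> 25 -> 15 -> 3
demo_model = nn.Sequential(
    nn.Linear(4, 25), nn.ReLU(),
    nn.Linear(25, 15), nn.ReLU(),
    nn.Linear(15, 3), nn.Softmax(dim=1),
)

# A Linear(in, out) layer has in*out weights plus out biases:
# 4*25 + 25 = 125, 25*15 + 15 = 390, 15*3 + 3 = 48
for name, p in demo_model.named_parameters():
    print(name, tuple(p.shape))
n_params = sum(p.numel() for p in demo_model.parameters())
print(n_params)  # 563
```

ReLU and Softmax have no parameters, so all 563 come from the three Linear layers.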

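We compile the model with `torch.compile` (introduced in PyTorch 2.0) using the `aot_eager` backend. This backend uses AOTAutograd to trace the forward and backward graphs ahead of time, but then runs them with ordinary eager PyTorch kernels without further optimization, so it is mainly a debugging tool and is unlikely to speed up training or inference. Speedups normally come from the default `inductor` backend (used when `backend` is omitted), although for a model this small the compilation overhead may outweigh any gain.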

We use the cross-entropy loss function (`nn.CrossEntropyLoss`), which is standard for multi-class classification tasks.

To train the model, we use the Stochastic Gradient Descent (SGD) optimizer with a learning rate of 0.01. Because the whole dataset is passed to the model at every step, this is effectively full-batch gradient descent rather than stochastic mini-batch updates. The learning rate sets the step size of each update, and 0.01 is a reasonable starting value here. You can also experiment with other optimizers such as Adam (commented out below) and with other learning rates, typically small positive values such as 0.001 to 0.1. Choosing the learning rate matters: too large a value can make training oscillate or diverge, while too small a value makes it converge very slowly.

In [None]:
# PyTorch 2.0 model compilation.
# The "aot_eager" backend traces the forward and backward graphs with AOTAutograd
# but runs them with ordinary eager kernels: handy for debugging, not for speed.
model = torch.compile(model, backend="aot_eager")

cross_entropy_loss = nn.CrossEntropyLoss()  # cross entropy loss

optimizer = torch.optim.SGD(model.parameters(), lr=0.01)  # SGD optimizer

# optimizer = torch.optim.Adam(model.parameters(), lr=0.01) # Adam optimizer
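The effect of the learning rate can be seen on a toy problem. This sketch (`sgd_on_square` is a hypothetical helper, and the quadratic loss is purely illustrative) runs `torch.optim.SGD` on L(w) = w**2, where each step computes w <- w - lr * 2w. A learning rate that is too small moves slowly, while one that is too large overshoots further on every step:

```python
import torch


def sgd_on_square(lr, steps):
    """Minimize L(w) = w**2 from w = 1.0 with torch.optim.SGD; return the final w."""
    w = torch.tensor(1.0, requires_grad=True)
    sgd = torch.optim.SGD([w], lr=lr)
    for _ in range(steps):
        sgd.zero_grad()
        loss = w ** 2
        loss.backward()  # dL/dw = 2w
        sgd.step()       # w <- w - lr * 2w = w * (1 - 2 * lr)
    return w.item()


print(sgd_on_square(0.01, 1))   # one step: 1 - 0.01 * 2 = 0.98
print(sgd_on_square(0.01, 20))  # too small: still far from the minimum at 0
print(sgd_on_square(0.5, 20))   # lands exactly on 0 for this quadratic
print(sgd_on_square(1.1, 20))   # too large: |w| grows, training diverges
```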

We call the train() function to tell PyTorch we are now training the model. Later, we call the eval() function to tell PyTorch we are now evaluating the model.

In [None]:
model.train()
for epoch in range(2000):
    optimizer.zero_grad()
    out = model(x)
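    # CrossEntropyLoss combines nn.LogSoftmax() and nn.NLLLoss() and expects raw logits;
    # since the model already ends in Softmax, softmax is effectively applied twice here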
    loss = cross_entropy_loss(out, y)
    loss.backward()
    optimizer.step()

    if epoch % 100 == 0:
        print(f"Epoch {epoch}, loss: {loss.item()}")

Epoch 0, loss: 1.1241774559020996
Epoch 100, loss: 1.0835837125778198
Epoch 200, loss: 1.0565294027328491
Epoch 300, loss: 1.0182915925979614
Epoch 400, loss: 0.9746531248092651
Epoch 500, loss: 0.9259414672851562
Epoch 600, loss: 0.878994882106781
Epoch 700, loss: 0.8453763723373413
Epoch 800, loss: 0.8200873732566833
Epoch 900, loss: 0.7983616590499878
Epoch 1000, loss: 0.7777446508407593
Epoch 1100, loss: 0.7570653557777405
Epoch 1200, loss: 0.7361510396003723
Epoch 1300, loss: 0.7156763076782227
Epoch 1400, loss: 0.696642279624939
Epoch 1500, loss: 0.6797868609428406
Epoch 1600, loss: 0.6654560565948486
Epoch 1700, loss: 0.6535544395446777
Epoch 1800, loss: 0.6437768340110779
Epoch 1900, loss: 0.6357559561729431
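Because `nn.CrossEntropyLoss` applies log-softmax itself, a common alternative is to drop the final `nn.Softmax` layer, train on raw logits, and apply softmax only at inference time. The sketch below does that, assuming scikit-learn's bundled Iris data instead of the CSV and using the Adam optimizer mentioned above; its variable names (`logit_model`, `x_iris`, and so on) are chosen so they do not overwrite the notebook's variables. Exact numbers will vary with the seed and library versions:

```python
import torch
import torch.nn as nn
from sklearn.datasets import load_iris

torch.manual_seed(0)
iris = load_iris()  # bundled copy, standing in for the notebook's CSV
x_iris = torch.tensor(iris.data, dtype=torch.float32)
y_iris = torch.tensor(iris.target, dtype=torch.long)

# Same hidden layers, but no final Softmax: the model outputs raw logits
logit_model = nn.Sequential(
    nn.Linear(x_iris.shape[1], 25), nn.ReLU(),
    nn.Linear(25, 15), nn.ReLU(),
    nn.Linear(15, 3),
)
loss_fn = nn.CrossEntropyLoss()  # applies log-softmax to the logits itself
logit_optimizer = torch.optim.Adam(logit_model.parameters(), lr=0.01)

logit_model.train()
for epoch in range(1000):
    logit_optimizer.zero_grad()
    loss = loss_fn(logit_model(x_iris), y_iris)
    loss.backward()
    logit_optimizer.step()

logit_model.eval()
with torch.no_grad():
    # Apply softmax only when probabilities are actually needed
    probs = torch.softmax(logit_model(x_iris), dim=1)
    train_acc = (probs.argmax(dim=1) == y_iris).float().mean().item()
print(f"final loss: {loss.item():.4f}, training accuracy: {train_acc:.3f}")
```

Without the extra Softmax, the loss is no longer bounded below by about 0.55 and can keep decreasing as the fit improves.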


Now that we have finished training the neural network model, we want to use it to make predictions.

For each of the 150 iris flower samples, the trained model returns 3 prediction values, one per class, because there are 3 types of iris flowers. The cell below prints their names (Iris-setosa, Iris-versicolor and Iris-virginica) in label order.

In [None]:
# Print the species names; their positions are the class indices 0, 1, 2

print(species)

['Iris-setosa' 'Iris-versicolor' 'Iris-virginica']


We call the eval() function to tell PyTorch that we are no longer training the model. Instead, we want to evaluate the trained model and generate predictions.

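By putting the model in evaluation mode, PyTorch disables dropout layers and makes batch normalization layers use their running statistics instead of per-batch statistics. This model contains neither, so `eval()` does not change its outputs here, but calling it before inference is good practice. Note that the outputs below still carry a `grad_fn` because autograd is still tracking operations; wrapping inference in `torch.no_grad()` avoids that overhead.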

In [None]:
model.eval()
pred = model(x)
print(f"Shape: {pred.shape}")
print(pred[0:10])

Shape: torch.Size([150, 3])
tensor([[9.8849e-01, 1.1509e-02, 6.3188e-09],
        [9.8342e-01, 1.6579e-02, 3.2629e-08],
        [9.8402e-01, 1.5984e-02, 2.7161e-08],
        [9.8047e-01, 1.9531e-02, 6.7597e-08],
        [9.8843e-01, 1.1569e-02, 6.4286e-09],
        [9.9043e-01, 9.5744e-03, 2.8586e-09],
        [9.8363e-01, 1.6373e-02, 3.0364e-08],
        [9.8658e-01, 1.3415e-02, 1.2649e-08],
        [9.7696e-01, 2.3037e-02, 1.4081e-07],
        [9.8386e-01, 1.6138e-02, 2.8983e-08]], grad_fn=<SliceBackward0>)
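The difference between training and evaluation mode, and the effect of `torch.no_grad()`, can be seen with a standalone dropout layer (this model has none, so the layers here are purely illustrative):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
dropout = nn.Dropout(p=0.5)
ones = torch.ones(1, 8)

dropout.train()
print(dropout(ones))  # some entries zeroed, survivors scaled by 1/(1-p) = 2

dropout.eval()
print(dropout(ones))  # identity in eval mode: all ones

# torch.no_grad() skips recording the autograd graph during inference
layer = nn.Linear(8, 3)
with torch.no_grad():
    out_nograd = layer(ones)
print(out_nograd.requires_grad, out_nograd.grad_fn)  # False None
```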


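Scientific notation writes very large or very small numbers compactly, as a number multiplied by a power of 10 (for example, 6.3188e-09 means 6.3188 × 10^-9). To turn off scientific notation when printing NumPy arrays, use the following line (it affects NumPy only; PyTorch tensors are controlled separately by `torch.set_printoptions`):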

In [None]:
np.set_printoptions(suppress=True)

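After converting the tensor to a NumPy array (calling `detach()` first, because the tensor is still attached to the autograd graph), the values are printed in fixed-point notation with at most eight decimal places, so very small probabilities appear as 0.00000001 or 0.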

In [None]:
print(pred[0:10].detach().numpy())

[[0.9884913  0.01150869 0.00000001]
 [0.98342127 0.01657872 0.00000003]
 [0.9840164  0.0159836  0.00000003]
 [0.9804685  0.01953143 0.00000007]
 [0.98843074 0.0115693  0.00000001]
 [0.99042565 0.00957435 0.        ]
 [0.98362696 0.01637298 0.00000003]
 [0.9865849  0.01341505 0.00000001]
 [0.97696334 0.02303652 0.00000014]
 [0.98386186 0.01613817 0.00000003]]
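`np.set_printoptions` only changes how NumPy arrays print. For PyTorch tensors, here is a minimal sketch using `torch.set_printoptions(sci_mode=False)` and, alternatively, NumPy's `np.printoptions` context manager (the tensor values are copied from the first row of the output above):

```python
import numpy as np
import torch

# First row of the predictions printed earlier
t = torch.tensor([[9.8849e-01, 1.1509e-02, 6.3188e-09]])
print(t)  # mixed magnitudes, so PyTorch chooses scientific notation

# Global setting: affects all later tensor printing in this session
torch.set_printoptions(sci_mode=False)
print(t)  # fixed-point with 4 decimals (the default precision)

# NumPy alternative: a context manager that restores the settings on exit
with np.printoptions(suppress=True, precision=4):
    print(t.numpy())
```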


Typically, the predicted class is the one with the highest prediction score. `torch.max(pred, 1)` returns two tensors: the maximum score in each row (sample) and the column index where it occurs. That index is the predicted class label, the same result `torch.argmax(pred, dim=1)` gives. Below, we keep only the indices and compare them with the expected labels; the indices can then be used to look up the class names.

In [None]:
_, predict_classes = torch.max(pred, 1)
print(f"Predictions: {predict_classes}")
print(f"Expected: {y}")

Predictions: tensor([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1,
        2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
        1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
        2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
        2, 2, 2, 2, 2, 2])
Expected: tensor([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
        1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
        1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
        2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,

It is easy to turn the index numbers back into the names of the iris species. We can use the list of species names that we created before.

In [None]:
print(species[predict_classes[1:10]])

['Iris-setosa' 'Iris-setosa' 'Iris-setosa' 'Iris-setosa' 'Iris-setosa'
 'Iris-setosa' 'Iris-setosa' 'Iris-setosa' 'Iris-setosa']
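The max-then-lookup steps can be seen on a small example. This sketch uses made-up scores (hypothetical, not model output) and a separate `class_names` array so that it does not overwrite `species` or `pred`:

```python
import numpy as np
import torch

class_names = np.array(["Iris-setosa", "Iris-versicolor", "Iris-virginica"])
# Hypothetical scores: 3 flowers (rows) x 3 classes (columns)
scores = torch.tensor([[0.90, 0.08, 0.02],
                       [0.10, 0.30, 0.60],
                       [0.05, 0.85, 0.10]])

best_scores, best_idx = torch.max(scores, dim=1)  # per-row max and its column
print(best_idx)                       # tensor([0, 2, 1])
print(torch.argmax(scores, dim=1))    # same indices
print(class_names[best_idx.numpy()])  # look up the names by index
```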


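Accuracy is a performance metric that is easy to understand. It is like a test score: of all the iris flower predictions the neural network made, what fraction were right? The problem with using accuracy alone is that it ignores how confident the neural network was in each prediction. Also note that here accuracy is computed on the same 150 samples the model was trained on, so it measures how well the model fits the training data, not how well it generalizes to new flowers.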

In [None]:
correct = accuracy_score(y, predict_classes)
print(f"Accuracy: {correct}")

Accuracy: 0.98
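To make the accuracy-versus-confidence point concrete, the sketch below compares accuracy with log loss (the quantity that cross-entropy measures) on four hypothetical predictions. The third sample is wrong; accuracy just counts it as one miss, while log loss also weighs how much probability went to the true class:

```python
import math

import torch
import torch.nn.functional as F
from sklearn.metrics import accuracy_score, log_loss

# Hypothetical probabilities for 4 samples over 2 classes, and true labels
probs_toy = torch.tensor([[0.9, 0.1],
                          [0.2, 0.8],
                          [0.6, 0.4],
                          [0.3, 0.7]])
y_toy = torch.tensor([0, 1, 1, 1])

acc = accuracy_score(y_toy.numpy(), probs_toy.argmax(dim=1).numpy())
print(acc)  # 0.75: the third sample is misclassified

# Log loss = mean of -log(probability given to the true class), so it also
# reflects how confident each prediction was, which accuracy ignores
ll = log_loss(y_toy.numpy(), probs_toy.numpy())
by_hand = -sum(math.log(p) for p in (0.9, 0.8, 0.4, 0.7)) / 4
torch_ll = F.nll_loss(torch.log(probs_toy), y_toy).item()
print(ll, by_hand, torch_ll)  # all about 0.4004
```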


In [None]:
# Confusion matrix: rows are true classes, columns are predicted classes
from sklearn.metrics import confusion_matrix
conf_mat = confusion_matrix(y, predict_classes)  # argument order: (y_true, y_pred)
print(f"Confusion Matrix \n {conf_mat}")

Confusion Matrix 
 [[50  0  0]
 [ 0 47  3]
 [ 0  0 50]]
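A confusion matrix is easiest to read on a tiny example. With scikit-learn's `confusion_matrix(y_true, y_pred)`, row i counts samples whose true class is i and column j counts those predicted as j; dividing the diagonal by the row sums gives per-class recall, and by the column sums gives per-class precision. The labels below are hypothetical:

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Hypothetical labels: one class-1 sample is mistaken for class 2
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 1, 2, 2, 2]

cm = confusion_matrix(y_true, y_pred)  # rows: true class, columns: predicted
print(cm)
# [[2 0 0]
#  [0 1 1]
#  [0 0 2]]
recall = np.diag(cm) / cm.sum(axis=1)     # correct / all samples of each true class
precision = np.diag(cm) / cm.sum(axis=0)  # correct / all predictions of each class
print(recall)     # recall: 1.0, 0.5, 1.0
print(precision)  # precision: 1.0, 1.0, about 0.667
```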


The next two cells make predictions for new flower measurements. The first cell predicts the species of one iris flower; the second predicts several flowers at once. In both cases the input is a 2D tensor with one row per flower (a single flower is written as `[[...]]`, a batch of one), so `torch.max(pred, 1)` returns, for each row, the index of the column with the highest score. That index is the predicted class for each flower.

In [None]:
sample_flower = torch.tensor([[6.7, 4.5, 5.0, 2.4]])
pred = model(sample_flower)
print(pred)
_, predict_classes = torch.max(pred, 1)
print(f"Predict that {sample_flower} is: {species[predict_classes]}")

tensor([[0.0012, 0.9282, 0.0706]], grad_fn=<CompiledFunctionBackward>)
Predict that tensor([[6.7000, 4.5000, 5.0000, 2.4000]]) is: Iris-versicolor
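The single-flower cell writes its input as `[[...]]` so that it is already a batch of one. Starting from a 1D tensor instead, the final `Softmax(dim=1)` would raise an error because a 1D tensor has no dimension 1; `unsqueeze(0)` adds the batch dimension. The sketch below uses a small, untrained stand-in model (`toy_model` is hypothetical, not the notebook's trained model) and wraps inference in `torch.no_grad()`:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
# Hypothetical stand-in with the same input/output sizes as the notebook's model
toy_model = nn.Sequential(nn.Linear(4, 3), nn.Softmax(dim=1))
toy_model.eval()

flower = torch.tensor([6.7, 4.5, 5.0, 2.4])  # 1D, shape (4,)
batch = flower.unsqueeze(0)                   # 2D, shape (1, 4): a batch of one
with torch.no_grad():
    probs_one = toy_model(batch)              # shape (1, 3)
predicted_index = probs_one.argmax(dim=1).item()  # plain Python int
print(batch.shape, probs_one.shape, predicted_index)
```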


You can also make predictions for several flowers at the same time (three in this example):

In [None]:
sample_flower = torch.tensor(
    [[6.7, 4.5, 5.0, 2.4], [5.2, 3.5, 1.5, 0.8], [5.8, 3.1, 5.2, 1.9]])
pred = model(sample_flower)
print(pred)
_, predict_classes = torch.max(pred, 1)
print(f"Predict that these three flowers {sample_flower} ")
print(f"are: {species[predict_classes.cpu().detach()]}")

tensor([[1.2075e-03, 9.2822e-01, 7.0572e-02],
        [9.8302e-01, 1.6981e-02, 1.8414e-08],
        [1.6020e-05, 9.3275e-02, 9.0671e-01]],
       grad_fn=<CompiledFunctionBackward>)
Predict that these three flowers tensor([[6.7000, 4.5000, 5.0000, 2.4000],
        [5.2000, 3.5000, 1.5000, 0.8000],
        [5.8000, 3.1000, 5.2000, 1.9000]]) 
are: ['Iris-versicolor' 'Iris-setosa' 'Iris-virginica']
