# Iris Flower Species Detection Model

The Iris dataset covers three species of iris flower (setosa, versicolor and virginica), which differ in their sepal and petal measurements.

This notebook trains a RandomForestClassifier model that predicts the species of a flower from four features: SepalLength (cm), SepalWidth (cm), PetalLength (cm) and PetalWidth (cm).

# 1. Load Python Packages

In [76]:
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report, confusion_matrix, accuracy_score

# 2. Pre-process the data

In [77]:
iris = pd.read_csv("Iris.csv")
# Id is a row identifier rather than a measurement, so drop it.
iris = iris.drop("Id", axis=1)

The species names are converted to numeric codes using the following mapping:

Iris-setosa : 1

Iris-versicolor : 2

Iris-virginica : 3

In [78]:
iris.Species.unique()

array(['Iris-setosa', 'Iris-versicolor', 'Iris-virginica'], dtype=object)

In [79]:
# Map species names to integer codes and assign the result back to the column,
# rather than relying on in-place replacement through attribute access.
iris["Species"] = iris["Species"].map(
    {"Iris-setosa": 1, "Iris-versicolor": 2, "Iris-virginica": 3}
)
iris.Species.unique()

array([1, 2, 3], dtype=int64)
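
Keeping the mapping in one dictionary makes it easy to reverse, which is useful later for turning numeric predictions back into species names. A minimal sketch in plain Python; `SPECIES_TO_CODE`, `CODE_TO_SPECIES`, `encode` and `decode` are illustrative names that do not appear in the notebook:

```python
# Illustrative helper names; the codes follow the mapping listed above.
SPECIES_TO_CODE = {
    "Iris-setosa": 1,
    "Iris-versicolor": 2,
    "Iris-virginica": 3,
}
# Inverse lookup, used to turn model output back into species names.
CODE_TO_SPECIES = {code: name for name, code in SPECIES_TO_CODE.items()}


def encode(names):
    """Map species names to integer codes; raises KeyError on unknown names."""
    return [SPECIES_TO_CODE[name] for name in names]


def decode(codes):
    """Map integer codes back to species names."""
    return [CODE_TO_SPECIES[int(code)] for code in codes]


labels = ["Iris-setosa", "Iris-virginica", "Iris-versicolor"]
codes = encode(labels)
print(codes)
print(decode(codes))
```

The `int(code)` conversion lets `decode` accept NumPy integers such as those returned by a model's `predict` call.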

In [80]:
iris.head(10)

   SepalLengthCm  SepalWidthCm  PetalLengthCm  PetalWidthCm  Species
0            5.1           3.5            1.4           0.2        1
1            4.9           3.0            1.4           0.2        1
2            4.7           3.2            1.3           0.2        1
3            4.6           3.1            1.5           0.2        1
4            5.0           3.6            1.4           0.2        1
5            5.4           3.9            1.7           0.4        1
6            4.6           3.4            1.4           0.3        1
7            5.0           3.4            1.5           0.2        1
8            4.4           2.9            1.4           0.2        1
9            4.9           3.1            1.5           0.1        1


In [81]:
iris.dtypes

SepalLengthCm    float64
SepalWidthCm     float64
PetalLengthCm    float64
PetalWidthCm     float64
Species            int64
dtype: object

In [82]:
iris.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 150 entries, 0 to 149
Data columns (total 5 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   SepalLengthCm  150 non-null    float64
 1   SepalWidthCm   150 non-null    float64
 2   PetalLengthCm  150 non-null    float64
 3   PetalWidthCm   150 non-null    float64
 4   Species        150 non-null    int64  
dtypes: float64(4), int64(1)
memory usage: 6.0 KB


# 3. Subset the data

Specify the input features (x) and the target values (y).

In [83]:
x = iris.drop("Species",axis=1)
y = iris.Species

# 4. Split data into train and test data

Split the data in a 4:1 ratio into training and testing sets.

X_train : Training input attributes.

X_test : Testing input attributes.

Y_train : Training target attributes.

Y_test : Testing target attributes.

In [84]:
X_train, X_test, Y_train, Y_test = train_test_split(x, y, test_size=0.2)

In [85]:
X_train.shape, X_test.shape, Y_train.shape, Y_test.shape

((120, 4), (30, 4), (120,), (30,))
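
`test_size=0.2` holds out one fifth of the 150 rows, which is where the 120/30 shapes come from. The sketch below reproduces that arithmetic on row indices only. It illustrates the idea and is not scikit-learn's implementation; `split_indices` and `seed` are hypothetical names.

```python
import random


def split_indices(n_rows, test_size=0.2, seed=0):
    """Shuffle row indices and hold out a `test_size` fraction for testing."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)  # seeded so the split is repeatable
    n_test = round(n_rows * test_size)  # 150 rows * 0.2 -> 30 test rows
    return indices[n_test:], indices[:n_test]


train_idx, test_idx = split_indices(150)
print(len(train_idx), len(test_idx))
```

Fixing the seed gives the same split on every run. `train_test_split` offers the same control through its `random_state` parameter.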

In [86]:
X_train

     SepalLengthCm  SepalWidthCm  PetalLengthCm  PetalWidthCm
17             5.1           3.5            1.4           0.3
127            6.1           3.0            4.9           1.8
119            6.0           2.2            5.0           1.5
110            6.5           3.2            5.1           2.0
24             4.8           3.4            1.9           0.2
...            ...           ...            ...           ...
87             6.3           2.3            4.4           1.3
28             5.2           3.4            1.4           0.2
138            6.0           3.0            4.8           1.8
136            6.3           3.4            5.6           2.4


In [87]:
X_test

     SepalLengthCm  SepalWidthCm  PetalLengthCm  PetalWidthCm
146            6.3           2.5            5.0           1.9
25             5.0           3.0            1.6           0.2
90             5.5           2.6            4.4           1.2
30             4.8           3.1            1.6           0.2
47             4.6           3.2            1.4           0.2
37             4.9           3.1            1.5           0.1
96             5.7           2.9            4.2           1.3
56             6.3           3.3            4.7           1.6
34             4.9           3.1            1.5           0.1
131            7.9           3.8            6.4           2.0


In [88]:
Y_train

17     1
127    3
119    3
110    3
24     1
      ..
87     2
28     1
138    3
136    3
20     1
Name: Species, Length: 120, dtype: int64

In [89]:
Y_test

146    3
25     1
90     2
30     1
47     1
37     1
96     2
56     2
34     1
131    3
7      1
8      1
134    3
76     2
130    3
15     1
4      1
79     2
91     2
139    3
107    3
21     1
46     1
73     2
140    3
88     2
57     2
109    3
45     1
44     1
Name: Species, dtype: int64

# 5. Build Random Forest Classifier

Build a RandomForestClassifier model and train it on the training data.

In [90]:
clf = RandomForestClassifier(n_estimators=100)

In [91]:
clf.fit(X_train, Y_train)

RandomForestClassifier()

# 6. Prediction

Predict the species of the test data with the trained model.

In [92]:
y_label = clf.predict(X_test)

In [93]:
y_label

array([3, 1, 2, 1, 1, 1, 2, 2, 1, 3, 1, 1, 3, 2, 3, 1, 1, 2, 2, 3, 3, 1,
       1, 2, 3, 2, 2, 3, 1, 1], dtype=int64)

Check the accuracy of the trained model on the training data. A score of 1.0 here only shows that the forest fits the rows it has already seen.

In [94]:
clf.score(X_train, Y_train)

1.0

# 7. Check the Accuracy of the Model

In [95]:
clf.score(X_test,Y_test)

1.0

In [96]:
print(classification_report(Y_test, y_label))

              precision    recall  f1-score   support

           1       1.00      1.00      1.00        13
           2       1.00      1.00      1.00         9
           3       1.00      1.00      1.00         8

    accuracy                           1.00        30
   macro avg       1.00      1.00      1.00        30
weighted avg       1.00      1.00      1.00        30



In [97]:
confusion_matrix(Y_test, y_label)

array([[13,  0,  0],
       [ 0,  9,  0],
       [ 0,  0,  8]], dtype=int64)
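
Each figure in the classification report can be read off the confusion matrix. Rows are true classes and columns are predicted classes, so recall divides a diagonal entry by its row sum and precision divides it by its column sum. A small sketch using the matrix printed above; `per_class_metrics` is an illustrative helper, not a scikit-learn function:

```python
def per_class_metrics(matrix):
    """Return (precision, recall) per class from a square confusion matrix.

    Rows are true classes, columns are predicted classes.
    """
    n = len(matrix)
    results = []
    for k in range(n):
        true_positive = matrix[k][k]
        predicted_k = sum(matrix[row][k] for row in range(n))  # column sum
        actual_k = sum(matrix[k])  # row sum, the "support" column
        precision = true_positive / predicted_k if predicted_k else 0.0
        recall = true_positive / actual_k if actual_k else 0.0
        results.append((precision, recall))
    return results


matrix = [[13, 0, 0], [0, 9, 0], [0, 0, 8]]
# Overall accuracy is the diagonal total over the grand total.
accuracy = sum(matrix[k][k] for k in range(3)) / sum(map(sum, matrix))
print(per_class_metrics(matrix), accuracy)
```

Because every off-diagonal entry is zero, each class gets precision and recall of 1.0, and the row sums 13, 9 and 8 match the support column of the report.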

In [98]:
accuracy_score(Y_test, y_label)

1.0

# 8. Check feature importance

In [99]:
list(zip(X_train, clf.feature_importances_))

[('SepalLengthCm', 0.07996700575860766),
 ('SepalWidthCm', 0.03237529325165018),
 ('PetalLengthCm', 0.41084299532140295),
 ('PetalWidthCm', 0.4768147056683391)]
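
The importances are normalized to sum to 1, so each value can be read as a share of the total. The sketch below ranks the values copied from the output above and adds up the share of the two petal measurements; the variable names are illustrative.

```python
# Importances copied from the output above.
importances = [
    ("SepalLengthCm", 0.07996700575860766),
    ("SepalWidthCm", 0.03237529325165018),
    ("PetalLengthCm", 0.41084299532140295),
    ("PetalWidthCm", 0.4768147056683391),
]

# Sort from most to least important.
ranked = sorted(importances, key=lambda pair: pair[1], reverse=True)
for name, score in ranked:
    print(f"{name:<14} {score:.3f}")

# Combined share of the two petal measurements.
petal_share = sum(score for name, score in importances if name.startswith("Petal"))
print(f"Petal features: {petal_share:.1%}")
```

In this run the petal measurements account for roughly 89% of the total importance. Exact values will shift between runs because the forest is trained without a fixed seed.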

# 9. Estimator Selection

Choose the number of trees by varying n_estimators: the loop below retrains the model with 10 to 100 estimators, in steps of 10, and reports the test-set accuracy for each setting.

In [100]:
for i in range(10,110,10):
    print(f"Trying model with {i} estimators...")
    clf = RandomForestClassifier(n_estimators=i)
    clf.fit(X_train, Y_train)
    print(f"The model accuracy on test set: {clf.score(X_test, Y_test)*100:.2f}%")
    print("")

Trying model with 10 estimators...
The model accuracy on test set: 96.67%

Trying model with 20 estimators...
The model accuracy on test set: 96.67%

Trying model with 30 estimators...
The model accuracy on test set: 100.00%

Trying model with 40 estimators...
The model accuracy on test set: 100.00%

Trying model with 50 estimators...
The model accuracy on test set: 100.00%

Trying model with 60 estimators...
The model accuracy on test set: 100.00%

Trying model with 70 estimators...
The model accuracy on test set: 100.00%

Trying model with 80 estimators...
The model accuracy on test set: 96.67%

Trying model with 90 estimators...
The model accuracy on test set: 100.00%

Trying model with 100 estimators...
The model accuracy on test set: 96.67%
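
With only 30 test rows, each misclassified flower moves the accuracy by about 3.33 percentage points, so the gap between 96.67% and 100.00% in the output above is a single prediction. Since no `random_state` is set, that gap may reflect run-to-run randomness rather than the number of trees. The arithmetic, with `accuracy_percent` as an illustrative helper:

```python
def accuracy_percent(n_wrong, n_test=30):
    """Accuracy in percent when `n_wrong` of `n_test` predictions are wrong."""
    return (n_test - n_wrong) / n_test * 100


for n_wrong in range(3):
    print(f"{n_wrong} wrong -> {accuracy_percent(n_wrong):.2f}%")
# 0 wrong -> 100.00%
# 1 wrong -> 96.67%
# 2 wrong -> 93.33%
```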



In [101]:
clf = RandomForestClassifier(n_estimators=50).fit(X_train, Y_train)
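
A single 30-row test set is a noisy yardstick for choosing `n_estimators`, and tuning against it leaks test information into the choice. One common alternative, sketched here as an assumption rather than as this notebook's method, is k-fold cross-validation on the training rows only. The example loads scikit-learn's bundled copy of the Iris data so that it runs without `Iris.csv`; `random_state=0`, `cv=5` and the three tree counts are arbitrary choices.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score, train_test_split

# Bundled copy of the Iris data (labels are 0, 1, 2 here rather than 1, 2, 3).
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

cv_means = {}
for n in (10, 50, 100):
    model = RandomForestClassifier(n_estimators=n, random_state=0)
    # 5-fold cross-validation on the training rows only.
    scores = cross_val_score(model, X_train, y_train, cv=5)
    cv_means[n] = scores.mean()
    print(f"{n:>3} trees: mean CV accuracy {scores.mean():.3f}")
```

The test set stays untouched until a setting has been chosen, so the final test score remains an honest estimate.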

# 10. Save the model

In [102]:
import pickle
# Use a context manager so the file is closed once the model is written.
with open("iris_prediction.pkl", "wb") as f:
    pickle.dump(clf, f)

# 11. Load the saved model

In [103]:
# Reload the model and confirm it still scores the same on the test set.
with open("iris_prediction.pkl", "rb") as f:
    loaded_model = pickle.load(f)
loaded_model.score(X_test, Y_test)

1.0
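
A stricter check than comparing scores is to confirm that the reloaded model gives exactly the same predictions as the original. The sketch below does this in a temporary directory with a small model trained on scikit-learn's bundled Iris data; the temporary path and `random_state=0` are placeholders, not values from this notebook.

```python
import os
import pickle
import tempfile

from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)
model = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "iris_prediction.pkl")
    # Context managers close the file even if dump or load raises.
    with open(path, "wb") as f:
        pickle.dump(model, f)
    with open(path, "rb") as f:
        restored = pickle.load(f)

# The restored model should reproduce the original predictions exactly.
same = (model.predict(X) == restored.predict(X)).all()
print(same)
```

Pickled models are generally only safe to reload with the same scikit-learn version that wrote them, and only from files you trust.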

# 12. Testing with random input attributes

In [106]:
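# Reuse the training column names so the input matches the fitted features.
loaded_model.predict(pd.DataFrame([[4.7, 3.2, 8.2, 2.9]], columns=x.columns))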



array([3], dtype=int64)