# Iris Flowers Classification

> In this notebook, we build a model that classifies irises into three species (Setosa, Versicolour, and Virginica) from the length and width of their petals and sepals.

![image.png](https://miro.medium.com/max/720/1*YYiQed4kj_EZ2qfg_imDWA.png)



## Importing the libraries

Import the data science libraries used throughout the notebook:

* Pandas as **`pd`**
* Numpy as **`np`**
* Matplotlib as **`plt`**

In [None]:
# Import libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# scikit-learn is not imported here; we import its functions only where we need them

## Import and analyze the data

In [2]:
df = pd.read_csv("iris.csv")
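
If `iris.csv` is not at hand, a DataFrame with the same columns can probably be rebuilt from the copy of the dataset that ships with scikit-learn. The sketch below assumes the column names and the `Iris-` name prefix seen in the `df.head()` output; `load_iris_frame()` is a hypothetical helper, and scikit-learn's copy may not match a given `iris.csv` row for row.

```python
import pandas as pd
from sklearn.datasets import load_iris


def load_iris_frame() -> pd.DataFrame:
    """Hypothetical helper: build a DataFrame shaped like iris.csv from scikit-learn's bundled copy."""
    iris = load_iris(as_frame=True)
    df = iris.data.copy()
    # Rename "sepal length (cm)" etc. to the column names used in iris.csv (assumed)
    df.columns = ["sepal_length", "sepal_width", "petal_length", "petal_width"]
    # target_names are "setosa", "versicolor", "virginica"; add the "Iris-" prefix seen in iris.csv
    df["species_name"] = ["Iris-" + iris.target_names[code] for code in iris.target]
    df["species_no"] = iris.target
    return df


df = load_iris_frame()
print(df.shape)
print(df.head())
```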

In [3]:
df.head()

|   | sepal_length | sepal_width | petal_length | petal_width | species_name | species_no |
|---|---|---|---|---|---|---|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 | Iris-setosa | 0 |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 | Iris-setosa | 0 |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 | Iris-setosa | 0 |
| 3 | 4.6 | 3.1 | 1.5 | 0.2 | Iris-setosa | 0 |
| 4 | 5.0 | 3.6 | 1.4 | 0.2 | Iris-setosa | 0 |


In [4]:
len(df)

150

In [5]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 150 entries, 0 to 149
Data columns (total 6 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   sepal_length  150 non-null    float64
 1   sepal_width   150 non-null    float64
 2   petal_length  150 non-null    float64
 3   petal_width   150 non-null    float64
 4   species_name  150 non-null    object 
 5   species_no    150 non-null    int64  
dtypes: float64(4), int64(1), object(1)
memory usage: 7.2+ KB


In [6]:
df.describe()

|       | sepal_length | sepal_width | petal_length | petal_width | species_no |
|---|---|---|---|---|---|
| count | 150.0 | 150.0 | 150.0 | 150.0 | 150.0 |
| mean | 5.843333 | 3.054 | 3.758667 | 1.198667 | 1.0 |
| std | 0.828066 | 0.433594 | 1.76442 | 0.763161 | 0.819232 |
| min | 4.3 | 2.0 | 1.0 | 0.1 | 0.0 |
| 25% | 5.1 | 2.8 | 1.6 | 0.3 | 0.0 |
| 50% | 5.8 | 3.0 | 4.35 | 1.3 | 1.0 |
| 75% | 6.4 | 3.3 | 5.1 | 1.8 | 2.0 |
| max | 7.9 | 4.4 | 6.9 | 2.5 | 2.0 |
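
`describe()` summarizes all 150 flowers together, so it does not show which features actually separate the species. Grouping by species is one way to check. This sketch uses scikit-learn's bundled iris data as a stand-in for `iris.csv` (an assumption) and renames its columns to match.

```python
import pandas as pd
from sklearn.datasets import load_iris

# Stand-in for iris.csv (assumed to hold the same measurements)
iris = load_iris(as_frame=True)
df = iris.data.copy()
df.columns = ["sepal_length", "sepal_width", "petal_length", "petal_width"]
df["species_no"] = iris.target

# Mean of every feature within each species (describe() mixes all three together)
per_species_mean = df.groupby("species_no").mean()
print(per_species_mean.round(2))

# Min/max petal length per species: setosa's range sits entirely below the other two
petal_ranges = df.groupby("species_no")["petal_length"].agg(["min", "max"])
print(petal_ranges)
```

The petal ranges of versicolor and virginica do overlap, so those two classes are where a model is most likely to make mistakes.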


In [7]:
df.species_name.value_counts()

Iris-setosa        50
Iris-versicolor    50
Iris-virginica     50
Name: species_name, dtype: int64

In [8]:
df.species_no.value_counts()

0    50
1    50
2    50
Name: species_no, dtype: int64

## Preprocess the data

In [9]:
FLOWER_TYPES = df.species_name.unique()
df = df.drop("species_name", axis=1)
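
`unique()` returns names in order of first appearance, so `FLOWER_TYPES[i]` lines up with `species_no == i` only because the CSV appears to be sorted by species. A more defensive sketch builds the lookup from the (code, name) pairs themselves; `label_map` is a hypothetical name, and scikit-learn's bundled iris data stands in for `iris.csv`.

```python
import pandas as pd
from sklearn.datasets import load_iris

# Stand-in for the species columns of iris.csv
iris = load_iris(as_frame=True)
df = pd.DataFrame({
    "species_name": ["Iris-" + iris.target_names[c] for c in iris.target],
    "species_no": iris.target,
})

# One row per (code, name) pair, indexed and sorted by the numeric code
label_map = (
    df[["species_no", "species_name"]]
    .drop_duplicates()
    .set_index("species_no")["species_name"]
    .sort_index()
)

# Fail loudly if the codes are not 0..n-1, since FLOWER_TYPES is indexed by position
assert list(label_map.index) == list(range(len(label_map)))
FLOWER_TYPES = label_map.to_numpy()
print(FLOWER_TYPES)
```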

In [10]:
X = df.drop("species_no", axis=1)
y = df.species_no
len(X), len(y)

(150, 150)

In [11]:
X.head()

|   | sepal_length | sepal_width | petal_length | petal_width |
|---|---|---|---|---|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 |
| 3 | 4.6 | 3.1 | 1.5 | 0.2 |
| 4 | 5.0 | 3.6 | 1.4 | 0.2 |


In [12]:
y.head()

0    0
1    0
2    0
3    0
4    0
Name: species_no, dtype: int64

In [13]:
# Import the train_test_split() function from scikit-learn
from sklearn.model_selection import train_test_split

# Use train_test_split() to hold out 20% of the rows as a test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
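
Without `random_state`, every run draws a different split, so the test score can change between runs. With only 30 test rows, the class mix can also drift. A possible variant passes `stratify=y` to keep the three species equally represented and a fixed `random_state` (42 is an arbitrary choice) for reproducibility. This sketch loads scikit-learn's bundled iris data in place of `iris.csv`.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Stand-in for X and y built from iris.csv
X, y = load_iris(return_X_y=True, as_frame=True)

# stratify=y keeps the class proportions equal in both sets; random_state fixes the shuffle
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)
print(len(X_train), len(X_test))
print(y_test.value_counts().sort_index())
```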

In [14]:
# Check the length of the training and testing sets
print(f"The length of the training dataset = {len(X_train)}")
print(f"The length of the test dataset = {len(X_test)}")

The length of the training dataset = 120
The length of the test dataset = 30


## Prepare the Machine Learning Model

Since this is a classification problem, we use scikit-learn's `RandomForestClassifier`, an ensemble of decision trees whose predictions are combined into a single class.


In [15]:
# Import the model from scikit-learn library
from sklearn.ensemble import RandomForestClassifier

# Make the model
model = RandomForestClassifier()

# Train the model
model.fit(X_train, y_train)

# Evaluate the model: score() returns the mean accuracy on the test set
model.score(X_test, y_test)

1.0
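
A perfect score on 30 test flowers is encouraging but noisy, because a single lucky split can hide mistakes. One common check is 5-fold cross-validation, which trains and scores the model on five different splits. The sketch below assumes scikit-learn's bundled iris data in place of `iris.csv` and an arbitrary `random_state=42`.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Stand-in for X and y built from iris.csv
X, y = load_iris(return_X_y=True, as_frame=True)

model = RandomForestClassifier(random_state=42)

# For classifiers, an integer cv uses stratified folds; each score is a fold's accuracy
scores = cross_val_score(model, X, y, cv=5)
print(scores)
print(f"Mean accuracy: {scores.mean():.3f} (+/- {scores.std():.3f})")
```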

In [16]:
model.predict(X_test)

array([1, 0, 2, 1, 0, 0, 2, 1, 2, 0, 2, 0, 0, 0, 0, 1, 1, 1, 0, 1, 2, 2,
       2, 2, 2, 2, 2, 0, 0, 0])
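
The raw prediction array is hard to judge by eye. A confusion matrix and a per-class report show which species, if any, get mixed up. This sketch repeats the split and fit on scikit-learn's bundled iris data (a stand-in for `iris.csv`), with `stratify` and `random_state=42` added as assumptions so the result is reproducible.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split

iris = load_iris(as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, stratify=iris.target, random_state=42
)
model = RandomForestClassifier(random_state=42).fit(X_train, y_train)
y_pred = model.predict(X_test)

# Rows are true classes, columns are predicted classes; off-diagonal cells are errors
cm = confusion_matrix(y_test, y_pred)
print(cm)

# Precision, recall and F1 for each species
print(classification_report(y_test, y_pred, target_names=iris.target_names))
```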

In [17]:
FLOWER_TYPES

array(['Iris-setosa', 'Iris-versicolor', 'Iris-virginica'], dtype=object)
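
Because `FLOWER_TYPES[i]` holds the name for `species_no == i`, NumPy indexing turns predicted codes back into species names. The sketch below trains on scikit-learn's bundled iris data (renamed to the notebook's columns, an assumption) and classifies one new flower whose measurements are hypothetical. Passing a DataFrame with the training column names avoids scikit-learn's feature-name warning.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

# Stand-in for the notebook's X and y, with the same column names
iris = load_iris(as_frame=True)
X = iris.data.copy()
X.columns = ["sepal_length", "sepal_width", "petal_length", "petal_width"]
y = iris.target

FLOWER_TYPES = np.array(["Iris-setosa", "Iris-versicolor", "Iris-virginica"])
model = RandomForestClassifier(random_state=42).fit(X, y)

# Hypothetical measurements for one new flower, in the same column order as X
new_flower = pd.DataFrame([[5.0, 3.4, 1.5, 0.2]], columns=X.columns)

predicted_no = model.predict(new_flower)     # numeric species code(s)
predicted_name = FLOWER_TYPES[predicted_no]  # fancy indexing maps codes to names
print(predicted_name)
```

The same indexing works on a whole batch, for example `FLOWER_TYPES[model.predict(X_test)]`.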