# IRIS FLOWER CLASSIFICATION
 
The iris flower has three species (setosa, versicolor, and virginica), which differ in their
measurements. Given the measurements of iris flowers labelled by species, the task is to
train a machine learning model that learns from those measurements and classifies a flower's
species.

#### Importing the libraries and data

In [1]:
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
iris = pd.read_csv("IRIS.csv")

In [22]:
iris.head()

Unnamed: 0,Id,SepalLengthCm,SepalWidthCm,PetalLengthCm,PetalWidthCm,Species
0,1,5.1,3.5,1.4,0.2,Iris-setosa
1,2,4.9,3.0,1.4,0.2,Iris-setosa
2,3,4.7,3.2,1.3,0.2,Iris-setosa
3,4,4.6,3.1,1.5,0.2,Iris-setosa
4,5,5.0,3.6,1.4,0.2,Iris-setosa


In [23]:
# The Id column is not a feature, so we can drop it
iris = iris.drop("Id", axis = 1)

In [24]:
iris.head()

Unnamed: 0,SepalLengthCm,SepalWidthCm,PetalLengthCm,PetalWidthCm,Species
0,5.1,3.5,1.4,0.2,Iris-setosa
1,4.9,3.0,1.4,0.2,Iris-setosa
2,4.7,3.2,1.3,0.2,Iris-setosa
3,4.6,3.1,1.5,0.2,Iris-setosa
4,5.0,3.6,1.4,0.2,Iris-setosa


In [25]:
# Basic stats
iris.describe()

Unnamed: 0,SepalLengthCm,SepalWidthCm,PetalLengthCm,PetalWidthCm
count,150.0,150.0,150.0,150.0
mean,5.843333,3.054,3.758667,1.198667
std,0.828066,0.433594,1.76442,0.763161
min,4.3,2.0,1.0,0.1
25%,5.1,2.8,1.6,0.3
50%,5.8,3.0,4.35,1.3
75%,6.4,3.3,5.1,1.8
max,7.9,4.4,6.9,2.5


In [26]:
# Basic info
iris.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 150 entries, 0 to 149
Data columns (total 5 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   SepalLengthCm  150 non-null    float64
 1   SepalWidthCm   150 non-null    float64
 2   PetalLengthCm  150 non-null    float64
 3   PetalWidthCm   150 non-null    float64
 4   Species        150 non-null    object 
dtypes: float64(4), object(1)
memory usage: 6.0+ KB


In [27]:
# Checking the different species of flower in the data set
print(iris["Species"].unique())

['Iris-setosa' 'Iris-versicolor' 'Iris-virginica']


## Preprocessing the data

In [28]:
# Checking for null values
iris.isnull().sum()

SepalLengthCm    0
SepalWidthCm     0
PetalLengthCm    0
PetalWidthCm     0
Species          0
dtype: int64

## Visualising the data

In [29]:
# Plotting a scatter plot of the iris species by sepal width (x) and sepal length (y)
import plotly.express as px
fig = px.scatter(iris, x="SepalWidthCm", y="SepalLengthCm", color="Species")
fig.show()

## Label Encoding

In [30]:
from sklearn.preprocessing import LabelEncoder
le = LabelEncoder()

iris["Species"] = le.fit_transform(iris["Species"])
iris.head()

Unnamed: 0,SepalLengthCm,SepalWidthCm,PetalLengthCm,PetalWidthCm,Species
0,5.1,3.5,1.4,0.2,0
1,4.9,3.0,1.4,0.2,0
2,4.7,3.2,1.3,0.2,0
3,4.6,3.1,1.5,0.2,0
4,5.0,3.6,1.4,0.2,0
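To make the encoding step concrete, here is a minimal pure-Python sketch of what label encoding does: each distinct class gets an integer code, assigned in sorted order, which is why `Iris-setosa` appears as `0` above. The names `to_code` and `to_name` are illustrative only; in the notebook itself, `le.classes_` and `le.inverse_transform(...)` can be used to recover the species names from the codes.

```python
# A few example labels in the same format as the Species column
species = ["Iris-setosa", "Iris-versicolor", "Iris-virginica", "Iris-setosa"]

# Sorted unique labels: one integer code per class
classes = sorted(set(species))
to_code = {name: code for code, name in enumerate(classes)}
to_name = {code: name for name, code in to_code.items()}

encoded = [to_code[s] for s in species]
print(encoded)                        # [0, 1, 2, 0]
print([to_name[c] for c in encoded])  # back to the original strings
```

Keeping the reverse mapping around is useful later, because the models predict codes such as `[0]` rather than species names.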


## Model Training

In [31]:
# Defining X and Y and splitting the data 70/30 into training and test sets
X = iris.drop("Species", axis=1)  # All feature columns (everything except the target)
Y = iris["Species"]
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(X, Y, 
                                                    test_size=0.3, 
                                                    random_state=0)
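As a rough illustration of what a 70/30 hold-out split means, the sketch below shuffles row indices with a fixed seed and sets 30% of them aside for testing. The helper `holdout_split` is hypothetical and will not select the same rows as `train_test_split` with `random_state=0`; it only shows the idea and the resulting set sizes.

```python
import random

def holdout_split(n_rows, test_size=0.3, seed=0):
    """Hypothetical helper: shuffle row indices and hold out a fraction for testing."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)    # seeded shuffle, so the split is repeatable
    n_test = round(n_rows * test_size)      # number of held-out rows
    return indices[n_test:], indices[:n_test]  # (train indices, test indices)

train_idx, test_idx = holdout_split(150)
print(len(train_idx), len(test_idx))  # 105 45
```

With 150 rows this leaves 105 rows for training and 45 for testing, and no row appears in both sets.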

#### Testing which model suits the data best

In [32]:
# Logistic Regression
from sklearn.linear_model import LogisticRegression
model = LogisticRegression()

model.fit(x_train, y_train)

print("Accuracy :", round((model.score(x_test, y_test)*100),2), "%")

Accuracy : 97.78 %
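A quick arithmetic check helps when reading this score. Assuming the test set holds 45 rows (30% of 150), an accuracy of 97.78% is what one misclassified flower would produce. The helper `accuracy_percent` below is illustrative and not part of the notebook.

```python
n_rows = 150
test_fraction = 0.3
n_test = round(n_rows * test_fraction)  # 45 held-out rows (assumed test-set size)

def accuracy_percent(n_correct, n_total):
    """Share of correct predictions, as a percentage rounded to two decimals."""
    return round(n_correct / n_total * 100, 2)

print(accuracy_percent(45, n_test))  # 100.0 (no errors)
print(accuracy_percent(44, n_test))  # 97.78 (one error)
print(accuracy_percent(43, n_test))  # 95.56 (two errors)
```

On a test set this small, each single error moves the score by about 2.2 percentage points, so small accuracy differences between models should be read with caution.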


In [37]:
iris.head()

Unnamed: 0,SepalLengthCm,SepalWidthCm,PetalLengthCm,PetalWidthCm,Species
0,5.1,3.5,1.4,0.2,0
1,4.9,3.0,1.4,0.2,0
2,4.7,3.2,1.3,0.2,0
3,4.6,3.1,1.5,0.2,0
4,5.0,3.6,1.4,0.2,0


In [34]:
# Prediction using Logistic Regression
x_new = np.array([[5, 2.9, 1, 0.2]])
x_new_df = pd.DataFrame(x_new, columns=iris.columns[:-1])  

prediction = model.predict(x_new_df)
print("Prediction: {}".format(prediction))

Prediction: [0]


In [35]:
# KNN 
from sklearn.neighbors import KNeighborsClassifier
model = KNeighborsClassifier()

model.fit(x_train, y_train)

print("Accuracy :", round((model.score(x_test, y_test)*100),2), "%")

Accuracy : 97.78 %


In [36]:
# Prediction using KNeighborsClassifier
x_new = np.array([[5, 2.9, 1, 0.2]])
x_new_df = pd.DataFrame(x_new, columns=iris.columns[:-1])
predictionKNN = model.predict(x_new_df)
# Print the KNN result, not the earlier logistic regression prediction
print("Prediction: {}".format(predictionKNN))

Prediction: [0]
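For intuition about how a k-nearest-neighbours prediction is made, here is a minimal brute-force sketch: measure the Euclidean distance from the new sample to every training row, take the `k` closest rows, and return the most common label among them. This is not how scikit-learn implements it internally (it offers optimised neighbour searches and defaults to `n_neighbors=5`). The function `knn_predict` is hypothetical, and the rows labelled 1 and 2 are illustrative values rather than rows shown in the outputs above.

```python
import math
from collections import Counter

def knn_predict(train_rows, train_labels, query, k=5):
    """Classify `query` by majority vote among its k nearest training rows."""
    distances = sorted(
        (math.dist(row, query), label)
        for row, label in zip(train_rows, train_labels)
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]

# Tiny toy training set: sepal length, sepal width, petal length, petal width
train_rows = [
    [5.1, 3.5, 1.4, 0.2], [4.9, 3.0, 1.4, 0.2], [4.7, 3.2, 1.3, 0.2],  # label 0 (setosa)
    [7.0, 3.2, 4.7, 1.4], [6.4, 3.2, 4.5, 1.5],                        # label 1 (illustrative)
    [6.3, 3.3, 6.0, 2.5], [5.8, 2.7, 5.1, 1.9],                        # label 2 (illustrative)
]
train_labels = [0, 0, 0, 1, 1, 2, 2]

print(knn_predict(train_rows, train_labels, [5, 2.9, 1, 0.2], k=3))  # 0
```

The new sample has a very short petal, so its three nearest neighbours in this toy set are all label 0 rows, matching the `[0]` prediction above.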
