# *Iris Classification*


by Bhakti Khobrekar

# Overview of the Project

This project classifies Iris flowers into species from their physical measurements. It uses the Iris dataset, introduced by Ronald Fisher in 1936, which records sepal length, sepal width, petal length, and petal width for three species: setosa, versicolor, and virginica. The goal is to train a machine learning model that accurately predicts the species of a new flower from these four features. Algorithms such as decision trees, k-nearest neighbors, support vector machines, and neural networks are all applicable; this notebook uses logistic regression. Because the dataset is small (150 samples), well balanced, and has largely separable classes, it is a common benchmark and a good starting point for experimenting with classification techniques.

In [1]:
#importing the required libraries
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

In [3]:
#loading the dataset
data = pd.read_csv("IRIS.csv")

In [4]:
#EDA
data.head()

   sepal_length  sepal_width  petal_length  petal_width      species
0           5.1          3.5           1.4          0.2  Iris-setosa
1           4.9          3.0           1.4          0.2  Iris-setosa
2           4.7          3.2           1.3          0.2  Iris-setosa
3           4.6          3.1           1.5          0.2  Iris-setosa
4           5.0          3.6           1.4          0.2  Iris-setosa


In [5]:
data.tail()

     sepal_length  sepal_width  petal_length  petal_width         species
145           6.7          3.0           5.2          2.3  Iris-virginica
146           6.3          2.5           5.0          1.9  Iris-virginica
147           6.5          3.0           5.2          2.0  Iris-virginica
148           6.2          3.4           5.4          2.3  Iris-virginica
149           5.9          3.0           5.1          1.8  Iris-virginica


In [6]:
data.describe()

       sepal_length  sepal_width  petal_length  petal_width
count         150.0        150.0         150.0        150.0
mean       5.843333        3.054      3.758667     1.198667
std        0.828066     0.433594       1.76442     0.763161
min             4.3          2.0           1.0          0.1
25%             5.1          2.8           1.6          0.3
50%             5.8          3.0          4.35          1.3
75%             6.4          3.3           5.1          1.8
max             7.9          4.4           6.9          2.5


In [7]:
#checking the null values
data.isnull().sum()

sepal_length    0
sepal_width     0
petal_length    0
petal_width     0
species         0
dtype: int64

In [8]:
# Checking unique species of the flower
data['species'].unique() 

array(['Iris-setosa', 'Iris-versicolor', 'Iris-virginica'], dtype=object)

In [9]:
# Separating the independent variables (features) from the target variable
x = data[['sepal_length','sepal_width','petal_length','petal_width']]
y = data['species']

In [10]:
#splitting the data into training (70%) and testing (30%) sets
x_train, x_test, y_train, y_test = train_test_split(x,y,test_size=0.30,random_state=100)
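
The split above is purely random, so the three species need not be equally represented in the test set. A minimal sketch of a stratified alternative follows. It assumes scikit-learn's bundled copy of the Iris data (`load_iris`) as a stand-in for `IRIS.csv`, and the names `x_demo`, `y_demo`, and `test_counts` are illustrative only, not part of the original notebook.

```python
from collections import Counter

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Assumption: the bundled Iris data stands in for IRIS.csv.
iris = load_iris()
x_demo, y_demo = iris.data, iris.target

# stratify=y_demo keeps the class proportions of the full dataset
# in both the training and the testing partition.
x_tr, x_te, y_tr, y_te = train_test_split(
    x_demo, y_demo, test_size=0.30, random_state=100, stratify=y_demo
)

# Count how many test samples fall into each class label (0, 1, 2).
test_counts = Counter(y_te.tolist())
print(test_counts)
```

With 50 samples per species and a 30% test share, stratification puts 15 flowers of each species in the test set.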

In [11]:
x_train.shape

(105, 4)

In [12]:
x_test.shape

(45, 4)

In [13]:
y_train.shape

(105,)

In [14]:
y_test.shape

(45,)

In [15]:
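#Logistic regression model
# Note: with the default max_iter, the solver stops at its iteration limit
# and emits the warning shown below.
classifier = LogisticRegression()
# Fit and predict on DataFrames throughout so feature names stay consistent.
classifier = classifier.fit(x_train, y_train)
y_train_pred = classifier.predict(x_train)
y_test_pred = classifier.predict(x_test)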

STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
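
The warning suggests two remedies: raising `max_iter` or scaling the data. The sketch below combines both in a pipeline. It is one possible approach, not the notebook's original method, and it assumes scikit-learn's bundled Iris data in place of `IRIS.csv`; `scaled_clf` and `max_iter=1000` are illustrative choices.

```python
import warnings

from sklearn.datasets import load_iris
from sklearn.exceptions import ConvergenceWarning
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Assumption: the bundled Iris data stands in for IRIS.csv.
iris = load_iris()
x_tr, x_te, y_tr, y_te = train_test_split(
    iris.data, iris.target, test_size=0.30, random_state=100
)

# Standardize the features, then fit logistic regression with a larger
# iteration budget than the default.
scaled_clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# Turn any ConvergenceWarning into an error so a failure to converge
# cannot pass unnoticed.
with warnings.catch_warnings():
    warnings.simplefilter("error", category=ConvergenceWarning)
    scaled_clf.fit(x_tr, y_tr)

test_accuracy = scaled_clf.score(x_te, y_te)
print(f"test accuracy: {test_accuracy:.3f}")
```

Putting the scaler inside the pipeline means it is fitted on the training data only, so no information from the test set leaks into the preprocessing step.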


In [16]:
y_test_pred

array(['Iris-virginica', 'Iris-setosa', 'Iris-virginica', 'Iris-setosa',
       'Iris-virginica', 'Iris-virginica', 'Iris-setosa', 'Iris-setosa',
       'Iris-virginica', 'Iris-setosa', 'Iris-setosa', 'Iris-virginica',
       'Iris-setosa', 'Iris-setosa', 'Iris-virginica', 'Iris-versicolor',
       'Iris-versicolor', 'Iris-versicolor', 'Iris-virginica',
       'Iris-virginica', 'Iris-virginica', 'Iris-setosa',
       'Iris-virginica', 'Iris-setosa', 'Iris-versicolor',
       'Iris-virginica', 'Iris-versicolor', 'Iris-setosa',
       'Iris-versicolor', 'Iris-virginica', 'Iris-versicolor',
       'Iris-versicolor', 'Iris-versicolor', 'Iris-setosa', 'Iris-setosa',
       'Iris-versicolor', 'Iris-setosa', 'Iris-versicolor',
       'Iris-virginica', 'Iris-virginica', 'Iris-setosa',
       'Iris-versicolor', 'Iris-virginica', 'Iris-virginica',
       'Iris-setosa'], dtype=object)

In [18]:
from sklearn.metrics import confusion_matrix, accuracy_score, classification_report

# Rows of the confusion matrix are true classes, columns are predicted classes.
print('confusion matrix: ')
cfm = confusion_matrix(y_test, y_test_pred)
print(cfm)

print('classification report: ')
print(classification_report(y_test, y_test_pred))

acc = accuracy_score(y_test, y_test_pred)
print('accuracy of the model: ', acc)

confusion matrix: 
[[16  0  0]
 [ 0 11  0]
 [ 0  1 17]]
classification report: 
                 precision    recall  f1-score   support

    Iris-setosa       1.00      1.00      1.00        16
Iris-versicolor       0.92      1.00      0.96        11
 Iris-virginica       1.00      0.94      0.97        18

       accuracy                           0.98        45
      macro avg       0.97      0.98      0.98        45
   weighted avg       0.98      0.98      0.98        45

accuracy of the model:  0.9777777777777777
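
The figures in the classification report can be derived directly from the confusion matrix. The following sketch reproduces them with NumPy, using the matrix printed above. The variable names are illustrative.

```python
import numpy as np

# Confusion matrix copied from the output above
# (rows: true class, columns: predicted class).
cfm = np.array([[16, 0, 0],
                [0, 11, 0],
                [0, 1, 17]])
labels = ["Iris-setosa", "Iris-versicolor", "Iris-virginica"]

true_positives = np.diag(cfm)
precision = true_positives / cfm.sum(axis=0)  # column sums: predicted counts
recall = true_positives / cfm.sum(axis=1)     # row sums: support per class
f1 = 2 * precision * recall / (precision + recall)
accuracy = true_positives.sum() / cfm.sum()   # 44 correct out of 45

for name, p, r, f in zip(labels, precision, recall, f1):
    print(f"{name:<16} precision={p:.2f} recall={r:.2f} f1={f:.2f}")
print(f"accuracy={accuracy:.4f}")
```

For example, Iris-versicolor has precision 11/12 because 12 flowers were predicted as versicolor and 11 of them truly are, while Iris-virginica has recall 17/18 because one of its 18 flowers was predicted as versicolor.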


# Result

The project produces a logistic regression model, trained on 105 samples of the Iris dataset, that classifies Iris flowers as setosa, versicolor, or virginica from their four measurements. On the 45 held-out test samples it reaches an accuracy of about 0.98: 44 predictions are correct, and the single error is an Iris-virginica flower predicted as Iris-versicolor. Per-class precision, recall, and F1 score are all at least 0.92. These figures come from a single train/test split, and the solver stopped at its iteration limit before converging, so they are best read as indicative rather than definitive.

# Conclusion

This project shows that a simple machine learning model can categorize Iris flowers by species from their measured attributes. A logistic regression model trained on the Iris dataset predicted the species of unseen flowers with about 98% accuracy on the test split. Iris-setosa was classified without error, and the only confusion was between Iris-versicolor and Iris-virginica. The dataset remains a useful benchmark for comparing classification techniques, and the outcome highlights the importance of feature analysis and pattern recognition in achieving accurate species classification.
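
As a possible follow-up to the benchmark idea, the sketch below compares several classifiers with 5-fold cross-validation. It is not part of the original analysis: it assumes scikit-learn's bundled Iris data in place of `IRIS.csv`, and the choice of models, hyperparameters, and the `candidates` and `cv_scores` names are illustrative.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Assumption: the bundled Iris data stands in for IRIS.csv.
iris = load_iris()

# Scale-sensitive models are wrapped in a pipeline with StandardScaler.
candidates = {
    "logistic regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "k-nearest neighbors": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
    "support vector machine": make_pipeline(StandardScaler(), SVC()),
    "decision tree": DecisionTreeClassifier(random_state=100),
}

# Mean accuracy over 5 cross-validation folds for each candidate.
cv_scores = {}
for name, model in candidates.items():
    scores = cross_val_score(model, iris.data, iris.target, cv=5)
    cv_scores[name] = scores.mean()
    print(f"{name:<24} mean accuracy = {scores.mean():.3f} (std {scores.std():.3f})")
```

Averaging over several folds gives a steadier estimate than a single 70/30 split, which matters on a dataset this small, where one misclassified flower shifts test accuracy by more than two percentage points.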