# Iris Flower Classification Dataset

The Iris flower classification dataset comprises 150 samples of Iris flowers, categorized into three species:

a) Iris setosa

b) Iris versicolor

c) Iris virginica



**The dataset contains 3 classes of 50 instances each, where each class refers to a type of iris plant. One class is linearly separable from the other 2; the latter are NOT linearly separable from each other.**

Attribute Information:

a) Sepal length in cm

b) Sepal width in cm

c) Petal length in cm

d) Petal width in cm

e) Class
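
The separability claim above can be illustrated on a single feature. The sketch below is not part of the original analysis: it hard-codes the minimum and maximum petal length per species as found in the standard UCI copy of Iris (assumed here to match the Kaggle CSV), and the helper `separating_threshold` is a hypothetical name introduced for illustration. It shows that a single cut-off on petal length isolates Iris setosa, while the versicolor and virginica ranges overlap. An overlap on one feature does not by itself prove the two classes are inseparable in all four dimensions; it only shows that this feature alone is not enough.

```python
# Petal length extremes (min, max) in cm per species, quoted from the
# standard UCI copy of Iris; assumed to match the Kaggle CSV.
petal_length_range = {
    "Iris-setosa": (1.0, 1.9),
    "Iris-versicolor": (3.0, 5.1),
    "Iris-virginica": (4.5, 6.9),
}


def separating_threshold(low_class, high_class):
    """Return a cut-off between two classes on this feature, or None if their ranges overlap."""
    _, low_max = petal_length_range[low_class]
    high_min, _ = petal_length_range[high_class]
    if low_max < high_min:
        # Any value strictly between the two ranges works; take the midpoint
        return (low_max + high_min) / 2
    return None


setosa_cut = separating_threshold("Iris-setosa", "Iris-versicolor")
versicolor_cut = separating_threshold("Iris-versicolor", "Iris-virginica")

print("setosa vs versicolor:", setosa_cut)
print("versicolor vs virginica:", versicolor_cut)
```

Because the shortest virginica petal is longer than the shortest versicolor petal, the same cut-off also separates setosa from virginica.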


In [None]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python Docker image: https://github.com/kaggle/docker-python
# For example, here's several helpful packages to load

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

# Input data files are available in the read-only "../input/" directory
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# You can write up to 20GB to the current directory (/kaggle/working/) that gets preserved as output when you create a version using "Save & Run All" 
# You can also write temporary files to /kaggle/temp/, but they won't be saved outside of the current session

# 1.Understanding the data

In [None]:
# Importing the library

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

import warnings
warnings.filterwarnings('ignore')

In [None]:
# Loading the dataset into pandas dataframe

df=pd.read_csv('/kaggle/input/iris/Iris.csv')

In [None]:
# Displaying the dataset

df

In [None]:
# Concise summary of DataFrame

df.info()

In [None]:
# Descriptive statistics of the dataset

df.describe()

In [None]:
# Display the number of samples in each class

df['Species'].value_counts()

In [None]:
# Check for null values

df.isnull().sum()

In [None]:
# Drop the 'Id' column: it is only a row identifier and carries no predictive information
df = df.drop(columns='Id')

# 2.Exploratory Data Analysis

In [None]:
# Countplot

plt.figure(figsize=(8,6))
sns.countplot(x='Species',data=df, palette='turbo')

In [None]:
# Histograms

df['SepalLengthCm'].hist()

In [None]:
df['SepalWidthCm'].hist(color='green')

In [None]:
df['PetalLengthCm'].hist(color='red')

In [None]:
df['PetalWidthCm'].hist(color='orange')

In [None]:
# Relationship between Species and Sepal length

plt.figure(figsize=(15,8))
sns.boxplot(x='Species',y='SepalLengthCm',data=df.sort_values('SepalLengthCm',ascending=False))

In [None]:
# Scatter plot of sepal width against sepal length

df.plot(kind='scatter', x='SepalWidthCm',y='SepalLengthCm')


In [None]:
# Joint distribution of sepal length and sepal width
# (the 'size' argument was renamed to 'height' in newer seaborn releases)

sns.jointplot(x='SepalLengthCm', y='SepalWidthCm', data=df, height=5)

In [None]:
# Pairplot

sns.pairplot(df, hue='Species', height=3)

# 3.Correlation Matrix

In [None]:
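# Compute the correlation matrix on the numeric features only
# ('Species' holds strings, so it is excluded)
df.drop(columns='Species').corr()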

In [None]:
# Displaying the correlation using Heatmap

corr = df.drop(columns='Species').corr()  # exclude the non-numeric 'Species' column
fig, ax = plt.subplots(figsize=(5,4))
sns.heatmap(corr, annot=True, ax=ax, cmap='cividis')

# 4.Model Training

In [None]:
# Splitting the dataset into training and testing sets

from sklearn.model_selection import train_test_split
X=df.drop(columns=['Species'])
Y=df['Species']

X_train,X_test,Y_train,Y_test=train_test_split(X,Y,test_size=0.3,random_state=1)

**4.1 Logistic Regression**

In [None]:
# Logistic Regression Model
from sklearn.linear_model import LogisticRegression
model1 = LogisticRegression()
model1.fit(X_train, Y_train)
acc_lr = model1.score(X_test, Y_test) * 100
print("Accuracy (Logistic Regression): ", acc_lr)

**4.2 K-Nearest Neighbors**

In [None]:
from sklearn.neighbors import KNeighborsClassifier
model2=KNeighborsClassifier()
model2.fit(X_train,Y_train)
acc_knn = model2.score(X_test, Y_test)*100
print("Accuracy (KNN): ",acc_knn)

**4.3 Random Forest Classifier**

In [None]:
from sklearn.ensemble import RandomForestClassifier
model3=RandomForestClassifier()
model3.fit(X_train,Y_train)
acc_rfc=model3.score(X_test,Y_test)*100
print("Accuracy (Random Forest Classifier): ",acc_rfc)

**4.4 Decision Tree**

In [None]:
from sklearn.tree import DecisionTreeClassifier
model4=DecisionTreeClassifier()
model4.fit(X_train,Y_train)
acc_dct=model4.score(X_test,Y_test)*100  # score the decision tree (model4), not the random forest
print("Accuracy (Decision Tree): ",acc_dct)

# 5.Final result

In [None]:
# Visualising the accuracy

plt.figure(figsize=(12,6))
model_acc = [acc_lr,acc_knn,acc_rfc,acc_dct]
model_name = ['Logistic Regression','KNN','Random Forest','Decision Tree']
plt.xlabel("Accuracy")
plt.ylabel("Models")
sns.barplot(x=model_acc, y=model_name, palette='plasma')

### The Logistic Regression and KNN models gave the best performance, with 97.77% accuracy
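
To put the reported accuracy in context, it helps to work out how many samples the test set holds. The sketch below is an illustration rather than part of the original notebook: it assumes 150 rows and the `test_size=0.3` used in section 4, and that scikit-learn rounds a fractional test size up to a whole number of samples. Under those assumptions the figure above is consistent with a single misclassified test sample, so small differences between models on this split should be read with caution.

```python
import math

n_samples = 150   # rows in the Iris dataset
test_size = 0.3   # fraction passed to train_test_split in section 4

# Assumption: a fractional test size is rounded up to whole samples
n_test = math.ceil(test_size * n_samples)
n_train = n_samples - n_test

# Accuracy can only take values k / n_test, so one error shifts it by this many points
step = 100 / n_test

print(f"train samples: {n_train}, test samples: {n_test}")
print(f"one misclassification changes accuracy by {step:.2f} points")

for errors in range(3):
    accuracy = 100 * (n_test - errors) / n_test
    print(f"{errors} misclassified -> {accuracy:.2f}% accuracy")
```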