# Problem Description
Use the iris flower dataset from sklearn.datasets to train a model with logistic regression. Measure the accuracy of the model, then use the model to predict the class of samples in the test dataset. The iris dataset contains 150 samples, each with the following features:

1. Sepal length
2. Sepal width
3. Petal length
4. Petal width

Using these 4 features, you will classify a flower into one of three categories:

1. Setosa
2. Versicolour
3. Virginica

In [114]:
# Import libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

from sklearn.datasets import load_iris

In [115]:
#load dataset
iris=load_iris()
dir(iris)

['DESCR',
 'data',
 'data_module',
 'feature_names',
 'filename',
 'frame',
 'target',
 'target_names']

In [116]:
#restructure the data to a dataFrame(Table format)
df=pd.DataFrame(iris.data,columns=iris.feature_names)
df

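|   | sepal length (cm) | sepal width (cm) | petal length (cm) | petal width (cm) |
|---|---|---|---|---|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 |
| 3 | 4.6 | 3.1 | 1.5 | 0.2 |
| 4 | 5.0 | 3.6 | 1.4 | 0.2 |
| ... | ... | ... | ... | ... |
| 145 | 6.7 | 3.0 | 5.2 | 2.3 |
| 146 | 6.3 | 2.5 | 5.0 | 1.9 |
| 147 | 6.5 | 3.0 | 5.2 | 2.0 |
| 148 | 6.2 | 3.4 | 5.4 | 2.3 |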


In [117]:
#add target column
df['target']=iris.target
df

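|   | sepal length (cm) | sepal width (cm) | petal length (cm) | petal width (cm) | target |
|---|---|---|---|---|---|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 | 0 |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 | 0 |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 | 0 |
| 3 | 4.6 | 3.1 | 1.5 | 0.2 | 0 |
| 4 | 5.0 | 3.6 | 1.4 | 0.2 | 0 |
| ... | ... | ... | ... | ... | ... |
| 145 | 6.7 | 3.0 | 5.2 | 2.3 | 2 |
| 146 | 6.3 | 2.5 | 5.0 | 1.9 | 2 |
| 147 | 6.5 | 3.0 | 5.2 | 2.0 | 2 |
| 148 | 6.2 | 3.4 | 5.4 | 2.3 | 2 |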
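
The two cells above build the table by hand: first the feature columns, then the `target` column. As an alternative sketch, assuming a scikit-learn version that supports the `as_frame` parameter (0.23 or later), `load_iris` can return the same kind of DataFrame directly. The names `iris_bunch` and `df_alt` are illustrative and not part of the original notebook.

```python
from sklearn.datasets import load_iris

# as_frame=True asks scikit-learn to return pandas objects instead of NumPy arrays
iris_bunch = load_iris(as_frame=True)

# .frame holds the four feature columns plus a 'target' column
df_alt = iris_bunch.frame

print(df_alt.shape)
print(df_alt.columns.tolist())
```

This avoids the separate `pd.DataFrame(...)` and `df['target'] = ...` steps, at the cost of depending on a newer scikit-learn release.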


In [118]:
iris.feature_names

['sepal length (cm)',
 'sepal width (cm)',
 'petal length (cm)',
 'petal width (cm)']

In [119]:
iris.target_names

array(['setosa', 'versicolor', 'virginica'], dtype='<U10')

In [120]:
# Data Exploration

In [121]:
df.shape

(150, 5)

In [122]:
df.columns

Index(['sepal length (cm)', 'sepal width (cm)', 'petal length (cm)',
       'petal width (cm)', 'target'],
      dtype='object')

In [123]:
df.head()

|   | sepal length (cm) | sepal width (cm) | petal length (cm) | petal width (cm) | target |
|---|---|---|---|---|---|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 | 0 |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 | 0 |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 | 0 |
| 3 | 4.6 | 3.1 | 1.5 | 0.2 | 0 |
| 4 | 5.0 | 3.6 | 1.4 | 0.2 | 0 |


In [124]:
#datatypes
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 150 entries, 0 to 149
Data columns (total 5 columns):
 #   Column             Non-Null Count  Dtype  
---  ------             --------------  -----  
 0   sepal length (cm)  150 non-null    float64
 1   sepal width (cm)   150 non-null    float64
 2   petal length (cm)  150 non-null    float64
 3   petal width (cm)   150 non-null    float64
 4   target             150 non-null    int64  
dtypes: float64(4), int64(1)
memory usage: 6.0 KB


In [125]:
#statistical analysis
df.describe()

|   | sepal length (cm) | sepal width (cm) | petal length (cm) | petal width (cm) | target |
|---|---|---|---|---|---|
| count | 150.0 | 150.0 | 150.0 | 150.0 | 150.0 |
| mean | 5.843333 | 3.057333 | 3.758 | 1.199333 | 1.0 |
| std | 0.828066 | 0.435866 | 1.765298 | 0.762238 | 0.819232 |
| min | 4.3 | 2.0 | 1.0 | 0.1 | 0.0 |
| 25% | 5.1 | 2.8 | 1.6 | 0.3 | 0.0 |
| 50% | 5.8 | 3.0 | 4.35 | 1.3 | 1.0 |
| 75% | 6.4 | 3.3 | 5.1 | 1.8 | 2.0 |
| max | 7.9 | 4.4 | 6.9 | 2.5 | 2.0 |


In [126]:
#unique values in each columns/features
df.nunique()

sepal length (cm)    35
sepal width (cm)     23
petal length (cm)    43
petal width (cm)     22
target                3
dtype: int64

# Data Preprocessing

In [128]:
#Check for null values
df.isna().sum()

sepal length (cm)    0
sepal width (cm)     0
petal length (cm)    0
petal width (cm)     0
target               0
dtype: int64

# Create an ML Model Using Different Classification Algorithms

In [130]:
#import library
from sklearn.model_selection import train_test_split

In [131]:
# identify x (independent variables) and y (dependent variable)
x=df.drop('target',axis=1)
x.head(2)

|   | sepal length (cm) | sepal width (cm) | petal length (cm) | petal width (cm) |
|---|---|---|---|---|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 |


In [132]:
y=df.target
y

0      0
1      0
2      0
3      0
4      0
      ..
145    2
146    2
147    2
148    2
149    2
Name: target, Length: 150, dtype: int64

In [133]:
#split the data into train and test datasets
x_train,x_test,y_train,y_test=train_test_split(x,y,train_size=0.7)
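
The split above is random, so each run of the notebook produces a different train/test partition and a slightly different accuracy. A minimal sketch of a reproducible variant follows. The `random_state=42` seed and the `stratify` argument are assumptions added for illustration and are not used in the original cells.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Load features and labels as pandas objects
X, y = load_iris(return_X_y=True, as_frame=True)

# random_state fixes the shuffle; stratify keeps the class proportions
# the same in the training and test parts
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.7, random_state=42, stratify=y
)

print(X_train.shape, X_test.shape)
test_class_counts = y_test.value_counts().sort_index().tolist()
print(test_class_counts)
```

With 150 samples and `train_size=0.7`, the test set holds 45 samples, and stratification gives each of the three classes an equal share of them.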

# 1.Using Logistic Regression

In [135]:
#import the library for LogisticRegression
from sklearn.linear_model import LogisticRegression
logre=LogisticRegression()
logre

In [136]:
logre.fit(x_train,y_train)

In [137]:
logac=logre.score(x_test,y_test)
print("Accuracy of model:{0}%".format(logac*100))

Accuracy of model:97.77777777777777%
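
A single accuracy figure does not show which classes are confused with each other. The sketch below, which is an addition to the original notebook, prints a confusion matrix and per-class precision and recall. The fixed seed, the stratified split, and `max_iter=1000` (raised to reduce the chance of a convergence warning) are assumptions chosen for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, train_size=0.7, random_state=0, stratify=iris.target
)

# max_iter is an assumption, not a setting from the original cell
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# Rows are true classes, columns are predicted classes
cm = confusion_matrix(y_test, y_pred)
print(cm)
print(classification_report(y_test, y_pred, target_names=iris.target_names))
```

Off-diagonal entries in the matrix count misclassified test samples, so they show directly which pairs of species the model mixes up.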


In [138]:
#predict the class of species

sle=float(input("Enter the sepal length (cm): "))
swi=float(input("Enter the sepal width (cm) : "))
ple=float(input("Enter the petal length (cm): "))
pwi=float(input("Enter the petal width (cm): "))

value=[[sle,swi,ple,pwi]]

result=logre.predict(value)
if result[0]==0:
    print('The species is Setosa')
elif result[0]==1:
    print('The species is Versicolor')
else:
    print('The species is Virginica')

Enter the sepal length (cm):  6.7
Enter the sepal width (cm) :  3
Enter the petal length (cm):  5.2
Enter the petal width (cm):  2.3


The species is Virginica




In [139]:
iris.target_names

array(['setosa', 'versicolor', 'virginica'], dtype='<U10')
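
Since `iris.target_names` is indexed by the same integers that `predict` returns, the `if`/`elif` chain in the prediction cell can be replaced by an array lookup. The following is a sketch under some assumptions: `predict_species` is a hypothetical helper name, the model is fitted on the full dataset for brevity, and the measurements are passed as arguments instead of being read with `input()`.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

iris = load_iris()
# Fitted on all 150 samples only to keep the example short
model = LogisticRegression(max_iter=1000).fit(iris.data, iris.target)


def predict_species(model, sepal_length, sepal_width, petal_length, petal_width):
    """Hypothetical helper: return the species name for one set of measurements."""
    label = model.predict([[sepal_length, sepal_width, petal_length, petal_width]])[0]
    # The predicted integer label indexes directly into target_names
    return str(iris.target_names[label])


species = predict_species(model, 6.7, 3.0, 5.2, 2.3)
print(f"The species is {species.capitalize()}")
```

The same helper works unchanged with any of the other fitted classifiers in this notebook, because they all expose the same `predict` method.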

# 2.Using K Nearest Neighbor (KNN)

In [141]:
#import library
from sklearn.neighbors import KNeighborsClassifier

In [142]:
x_train,x_test,y_train,y_test=train_test_split(x,y,test_size=0.3)

In [143]:
#training
kmodel=KNeighborsClassifier(n_neighbors=16) # k=16 gave the lowest mean error (search not shown)
kmodel

In [144]:
kmodel.fit(x_train,y_train)

In [145]:
#accuracy
acc=kmodel.score(x_test,y_test)
print("Accuracy of model:{0}%".format(acc*100))

Accuracy of model:91.11111111111111%
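
The KNN cell picks `n_neighbors=16` for its "lowest mean error", but the search itself is not shown. One possible way to run such a search is sketched below. It assumes 5-fold cross-validation and a candidate range of 1 to 30, neither of which comes from the original notebook, so the selected value may differ from 16.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

k_values = range(1, 31)
mean_errors = []
for k in k_values:
    # 5-fold cross-validated accuracy; the error is 1 - accuracy
    scores = cross_val_score(KNeighborsClassifier(n_neighbors=k), X, y, cv=5)
    mean_errors.append(1 - scores.mean())

# Pick the k with the smallest mean error (first one in case of ties)
best_k = k_values[int(np.argmin(mean_errors))]
print(f"k with the lowest mean CV error: {best_k}")
```

Several values of k often give nearly identical errors on a dataset this small, so the exact winner should not be over-interpreted.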


In [146]:
#predict the class of species

sle=float(input("Enter the sepal length (cm): "))
swi=float(input("Enter the sepal width (cm) : "))
ple=float(input("Enter the petal length (cm): "))
pwi=float(input("Enter the petal width (cm): "))

value=[[sle,swi,ple,pwi]]

result=kmodel.predict(value)
if result[0]==0:
    print('The species is Setosa')
elif result[0]==1:
    print('The species is Versicolor')
else:
    print('The species is Virginica')

Enter the sepal length (cm):  5.1
Enter the sepal width (cm) :  3.5
Enter the petal length (cm):  1.4
Enter the petal width (cm):  0.2


The species is Setosa




# 3.Using Support Vector Machine (SVM)

In [148]:
#import libraries
from sklearn.svm import SVC

In [149]:
x_train,x_test,y_train,y_test=train_test_split(x,y,test_size=0.25)

In [150]:
svmodel=SVC()
svmodel.fit(x_train,y_train)

In [151]:
#accuracy
svacc=svmodel.score(x_test,y_test)
print("Accuracy of model:{0}%".format(svacc*100))

Accuracy of model:92.10526315789474%
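
`SVC` with its default RBF kernel relies on distances between samples, so it can be sensitive to the scale of the features. A common option, sketched here as an addition to the notebook, is to standardize the features inside a pipeline. The seed and the stratified split are assumptions, and the resulting accuracy is not directly comparable to the figure printed above.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# The scaler is fitted on the training data only, then applied to the test data
svm_pipeline = make_pipeline(StandardScaler(), SVC())
svm_pipeline.fit(X_train, y_train)

svm_accuracy = svm_pipeline.score(X_test, y_test)
print(f"Accuracy of model: {svm_accuracy * 100:.2f}%")
```

All four iris features are measured in centimetres and have similar ranges, so the effect of scaling may be small here. The pipeline pattern matters more on datasets with mixed units.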


In [152]:
#predict the class of species

sle=float(input("Enter the sepal length (cm): "))
swi=float(input("Enter the sepal width (cm) : "))
ple=float(input("Enter the petal length (cm): "))
pwi=float(input("Enter the petal width (cm): "))

value=[[sle,swi,ple,pwi]]

result=svmodel.predict(value)
if result[0]==0:
    print('The species is Setosa')
elif result[0]==1:
    print('The species is Versicolor')
else:
    print('The species is Virginica')

Enter the sepal length (cm):  6.7
Enter the sepal width (cm) :  3
Enter the petal length (cm):  5.2
Enter the petal width (cm):  2.3


The species is Virginica




# 4.Using Decision Tree

In [154]:
#import libraries
from sklearn.tree import DecisionTreeClassifier


In [155]:
x_train,x_test,y_train,y_test=train_test_split(x,y,test_size=0.25)

In [156]:
dtmodel=DecisionTreeClassifier()
dtmodel.fit(x_train,y_train)

In [157]:
#accuracy
dtacc=dtmodel.score(x_test,y_test)
print("Accuracy of model:{0}%".format(dtacc*100))

Accuracy of model:94.73684210526315%
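
One advantage of a decision tree is that its learned rules can be read directly. The sketch below prints them with `export_text`. It limits the tree to `max_depth=2` and fits on the full dataset purely to keep the output short, and both choices are assumptions that differ from the cell above.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()

# max_depth=2 is an illustrative limit, not a setting from the original notebook
tree = DecisionTreeClassifier(max_depth=2, random_state=0)
tree.fit(iris.data, iris.target)

# Render the split thresholds as indented text rules
rules = export_text(tree, feature_names=list(iris.feature_names))
print(rules)
```

Each indented line is a threshold test on one feature, and the leaves show the predicted class index.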


In [158]:
#predict the class of species

sle=float(input("Enter the sepal length (cm): "))
swi=float(input("Enter the sepal width (cm) : "))
ple=float(input("Enter the petal length (cm): "))
pwi=float(input("Enter the petal width (cm): "))

value=[[sle,swi,ple,pwi]]

result=dtmodel.predict(value)
if result[0]==0:
    print('The species is Setosa')
elif result[0]==1:
    print('The species is Versicolor')
else:
    print('The species is Virginica')

Enter the sepal length (cm):  5.1
Enter the sepal width (cm) :  3.5
Enter the petal length (cm):  1.4
Enter the petal width (cm):  0.2


The species is Setosa




# 5.Using Random forest

In [160]:
#import libraries
from sklearn.ensemble import RandomForestClassifier

In [161]:
x_train,x_test,y_train,y_test=train_test_split(x,y,test_size=0.25)

In [162]:
rfmodel=RandomForestClassifier()
rfmodel.fit(x_train,y_train)

In [163]:
#accuracy
rfacc=rfmodel.score(x_test,y_test)
print("Accuracy of model:{0}%".format(rfacc*100))

Accuracy of model:97.36842105263158%
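
A fitted random forest also reports how much each feature contributed to its splits, through the `feature_importances_` attribute. The sketch below is an addition for illustration. It fits on the full dataset and fixes `random_state=0`, both of which are assumptions.

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

iris = load_iris()
forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(iris.data, iris.target)

# Impurity-based importances; they are normalized to sum to 1
importances = pd.Series(forest.feature_importances_, index=iris.feature_names)
print(importances.sort_values(ascending=False))
```

These values depend on the seed and on the training data, so they are better read as a rough ranking than as exact weights.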


In [164]:
#predict the class of species

sle=float(input("Enter the sepal length (cm): "))
swi=float(input("Enter the sepal width (cm) : "))
ple=float(input("Enter the petal length (cm): "))
pwi=float(input("Enter the petal width (cm): "))
value=[[sle,swi,ple,pwi]]
result=rfmodel.predict(value)
if result[0]==0:
    print('The species is Setosa')
elif result[0]==1:
    print('The species is Versicolor')
else:
    print('The species is Virginica')

Enter the sepal length (cm):  5.1
Enter the sepal width (cm) :  3.5
Enter the petal length (cm):  1.4
Enter the petal width (cm):  0.2


The species is Setosa




# 6.Using Naive Bayes

In [166]:
#import libraries
from sklearn.naive_bayes import GaussianNB

In [167]:
x_train,x_test,y_train,y_test=train_test_split(x,y,test_size=0.25)

In [168]:
nbmodel=GaussianNB()
nbmodel.fit(x_train,y_train)

In [169]:
#accuracy
nbacc=nbmodel.score(x_test,y_test)
print("Accuracy of model:{0}%".format(nbacc*100))

Accuracy of model:92.10526315789474%


In [170]:
#predict the class of species

sle=float(input("Enter the sepal length (cm): "))
swi=float(input("Enter the sepal width (cm) : "))
ple=float(input("Enter the petal length (cm): "))
pwi=float(input("Enter the petal width (cm): "))

value=[[sle,swi,ple,pwi]]

result=nbmodel.predict(value)
if result[0]==0:
    print('The species is Setosa')
elif result[0]==1:
    print('The species is Versicolor')
else:
    print('The species is Virginica')

Enter the sepal length (cm):  6.7
Enter the sepal width (cm) :  3
Enter the petal length (cm):  5.2
Enter the petal width (cm):  2.3


The species is Virginica
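
The prediction above returns only the most likely class. `GaussianNB` can also report a probability for every class through `predict_proba`, which shows how confident the model is. A minimal sketch follows, using the same measurements as the sample entered above. It fits on the full dataset for brevity, which is an assumption.

```python
from sklearn.datasets import load_iris
from sklearn.naive_bayes import GaussianNB

iris = load_iris()
nb = GaussianNB().fit(iris.data, iris.target)

# Same measurements as the sample entered in the cell above
sample = [[6.7, 3.0, 5.2, 2.3]]

# One probability per class, in the order of iris.target_names
probabilities = nb.predict_proba(sample)[0]
for name, p in zip(iris.target_names, probabilities):
    print(f"{name}: {p:.3f}")
```

Naive Bayes probabilities tend to be pushed toward 0 or 1 because of the feature-independence assumption, so they are a rough guide only.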




# Result Analysis

1. The Logistic Regression model reached 97.78% accuracy on its test set.

2. The KNN model reached 91.11% accuracy on its test set.

3. The SVM model reached 92.11% accuracy on its test set.

4. The Decision Tree model reached 94.74% accuracy on its test set.

5. The Random Forest model reached 97.37% accuracy on its test set.

6. The Naive Bayes model reached 92.11% accuracy on its test set.

Logistic Regression (97.78%) has the highest accuracy of the six models. Each model was scored on its own random train/test split (30% test for Logistic Regression and KNN, 25% for the others), so gaps of a few percentage points may reflect the split rather than the algorithm.
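
One way to put the six models on an equal footing is to score all of them on the same cross-validation folds. The sketch below assumes 5 stratified folds and fixed seeds, and it sets `max_iter=1000` for logistic regression. None of these settings come from the original notebook, so the numbers it prints will differ from the figures listed above.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "KNN": KNeighborsClassifier(n_neighbors=16),
    "SVM": SVC(),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(random_state=0),
    "Naive Bayes": GaussianNB(),
}

# The same folds are reused for every model so the scores are comparable
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

results = {}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=cv)
    results[name] = scores.mean()
    print(f"{name}: {scores.mean() * 100:.2f}% (+/- {scores.std() * 100:.2f})")
```

Averaging over folds also reduces the influence of any single lucky or unlucky split, which matters on a dataset with only 150 samples.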

