### IRIS flower dataset classification

### Problem Description
Use the iris flower dataset from sklearn.datasets to train a logistic regression model. Measure
the accuracy of the model and use it to predict the class of samples in your test
dataset. The iris dataset contains 150 samples with the following features:
1. Sepal length
2. Sepal width
3. Petal length
4. Petal width

Using the above 4 features, you will classify a flower into one of three categories:
1. Setosa
2. Versicolor
3. Virginica

#### Dataset is taken from sklearn.datasets. 
There are 150 samples covering 3 classes of iris flower, each described by the features above. The goal is to build a model, measure its accuracy, and use it to predict which category a given sample falls into.

In [1]:
#importing libraries and loading dataset
import pandas as pd
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
i = load_iris()

In [2]:
#listing the attributes of the dataset object
dir(i)

['DESCR',
 'data',
 'data_module',
 'feature_names',
 'filename',
 'frame',
 'target',
 'target_names']

In [3]:
#creating dataFrame(Table format)
df=pd.DataFrame(i.data,columns=i.feature_names)
df

index,sepal length (cm),sepal width (cm),petal length (cm),petal width (cm)
0,5.1,3.5,1.4,0.2
1,4.9,3.0,1.4,0.2
2,4.7,3.2,1.3,0.2
3,4.6,3.1,1.5,0.2
4,5.0,3.6,1.4,0.2
...,...,...,...,...
145,6.7,3.0,5.2,2.3
146,6.3,2.5,5.0,1.9
147,6.5,3.0,5.2,2.0
148,6.2,3.4,5.4,2.3


In [4]:
i.feature_names

['sepal length (cm)',
 'sepal width (cm)',
 'petal length (cm)',
 'petal width (cm)']

In [5]:
i.target_names

array(['setosa', 'versicolor', 'virginica'], dtype='<U10')

In [6]:
#datatypes
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 150 entries, 0 to 149
Data columns (total 4 columns):
 #   Column             Non-Null Count  Dtype  
---  ------             --------------  -----  
 0   sepal length (cm)  150 non-null    float64
 1   sepal width (cm)   150 non-null    float64
 2   petal length (cm)  150 non-null    float64
 3   petal width (cm)   150 non-null    float64
dtypes: float64(4)
memory usage: 4.8 KB


In [7]:
#all 150 rows are non-null and every feature is float64, so no further preprocessing is needed before modelling
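
The check above relies on reading `df.info()` by eye. As a minimal sketch, the same conclusion can be confirmed programmatically by counting missing values per column. The names `iris`, `frame`, and `missing_per_column` are introduced here for illustration only and are not part of the original notebook.

```python
import pandas as pd
from sklearn.datasets import load_iris

# Rebuild the feature table so this snippet runs on its own.
iris = load_iris()
frame = pd.DataFrame(iris.data, columns=iris.feature_names)

# Count missing values in each column; all zeros means nothing to impute.
missing_per_column = frame.isna().sum()
print(missing_per_column)

# Summary statistics give a quick view of the range of each feature.
print(frame.describe())
```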

### ML modelling and train/test 

In [8]:
#adding the target column (0 = setosa, 1 = versicolor, 2 = virginica)
df['target']=i.target
df

index,sepal length (cm),sepal width (cm),petal length (cm),petal width (cm),target
0,5.1,3.5,1.4,0.2,0
1,4.9,3.0,1.4,0.2,0
2,4.7,3.2,1.3,0.2,0
3,4.6,3.1,1.5,0.2,0
4,5.0,3.6,1.4,0.2,0
...,...,...,...,...,...
145,6.7,3.0,5.2,2.3,2
146,6.3,2.5,5.0,1.9,2
147,6.5,3.0,5.2,2.0,2
148,6.2,3.4,5.4,2.3,2


In [9]:
df['target'].value_counts()

target
0    50
1    50
2    50
Name: count, dtype: int64

In [10]:
# identifying X (independent variables) and Y (dependent variable)
X = df.drop('target',axis=1)
Y = df['target']

X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size = 0.50)

In [11]:
len(X_train), len(X_test)

(75, 75)

In [12]:
len(Y_train), len(Y_test)

(75, 75)
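
Because `train_test_split` shuffles randomly, the split above (and therefore the accuracy reported later) changes from run to run. The following is a sketch of one way to make the split reproducible and class-balanced, assuming a fixed `random_state=0` and stratification on the labels; both choices are illustrative and were not used in the original notebook.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

iris = load_iris()

# random_state fixes the shuffle (assumed value, any integer works);
# stratify keeps the three classes equally represented in both halves.
X_tr, X_te, y_tr, y_te = train_test_split(
    iris.data, iris.target, test_size=0.5, random_state=0, stratify=iris.target
)

# Number of samples per class in each half.
train_counts = np.bincount(y_tr)
test_counts = np.bincount(y_te)
print(train_counts, test_counts)
```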

#### Method 1: Using Logistic regression 

In [13]:
#import the library for LogisticRegression
from sklearn.linear_model import LogisticRegression
l_re=LogisticRegression()
l_re

In [14]:
l_re.fit(X_train, Y_train)

In [15]:
#finding accuracy of the model
log_ac=l_re.score(X_test,Y_test)
print("Accuracy of model:{0}%".format(log_ac*100))

Accuracy of model:93.33333333333333%


In [16]:
#predicting the sample
l_re.predict([[4.9,3.2,1.5,0.3]])



array([0])
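
`predict` returns only the winning class. As a hedged sketch, `predict_proba` shows how confident the model is in each class for the same sample. Here the model is fitted on the full dataset with `max_iter=1000` purely to keep the example short and free of convergence warnings; both are assumptions, not the notebook's setup. Passing the sample as a DataFrame with the training column names also avoids the feature-name warning that scikit-learn emits when a model fitted on a DataFrame receives a plain list.

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

iris = load_iris()
features = pd.DataFrame(iris.data, columns=iris.feature_names)

# Fitted on all 150 rows for brevity (assumption, not the notebook's split).
clf = LogisticRegression(max_iter=1000).fit(features, iris.target)

# Same sample as above, wrapped in a DataFrame with matching column names.
sample = pd.DataFrame([[4.9, 3.2, 1.5, 0.3]], columns=iris.feature_names)
proba = clf.predict_proba(sample)[0]

for name, p in zip(iris.target_names, proba):
    print(f"{name}: {p:.3f}")
```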

In [17]:
#predicting the class of species

sle=float(input("Enter the sepal length (cm): "))
swi=float(input("Enter the sepal width (cm) : "))
ple=float(input("Enter the petal length (cm): "))
pwi=float(input("Enter the petal width (cm): "))

value=[[sle,swi,ple,pwi]]

result=l_re.predict(value)
if result[0]==0:
    print('The species is Setosa')
elif result[0]==1:
    print('The species is Versicolor')
else:
    print('The species is Virginica')

Enter the sepal length (cm): 5
Enter the sepal width (cm) : 3.3
Enter the petal length (cm): 1.5
Enter the petal width (cm): 0.4
The species is Setosa




In [20]:
i.target_names

array(['setosa', 'versicolor', 'virginica'], dtype='<U10')
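
Since `target_names` is ordered by class index, the predicted integer can index into it directly instead of going through an `if`/`elif` chain. A minimal sketch, assuming a model fitted on the full dataset with `max_iter=1000` (both are illustrative choices, and `clf`, `sample`, and `species` are names introduced here):

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

iris = load_iris()

# Fitted on all data only to keep the example self-contained (assumption).
clf = LogisticRegression(max_iter=1000).fit(iris.data, iris.target)

# Same measurements as entered in the interactive cell above.
sample = [[5.0, 3.3, 1.5, 0.4]]
pred = clf.predict(sample)[0]

# The predicted class index doubles as an index into target_names.
species = iris.target_names[pred]
print(f"The species is {species.capitalize()}")
```

This keeps the label text in one place, so a typo in a hand-written branch cannot creep in.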

In [21]:
#The model reaches about 93.3% accuracy on the test set and predicts the samples above as class 0 (setosa)
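
A single accuracy figure does not show which classes are being confused. As a sketch, a confusion matrix and per-class report break the test-set result down by species. The split parameters (`random_state=0`, stratified) and `max_iter=1000` are assumptions made so the example is reproducible; the notebook's own split is random, so its numbers will differ.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split

iris = load_iris()
X_tr, X_te, y_tr, y_te = train_test_split(
    iris.data, iris.target, test_size=0.5, random_state=0, stratify=iris.target
)

clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
y_pred = clf.predict(X_te)

# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_te, y_pred)
print(cm)

# Precision, recall and F1 for each species.
print(classification_report(y_te, y_pred, target_names=iris.target_names))
```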

#### Method 2: Using Decision Tree 

In [24]:
#re-splitting the data, this time holding out 25% of the samples for testing
inputs = df.drop(['target'], axis='columns')
target = df.target

X_train, X_test, Y_train, Y_test = train_test_split(inputs, target, test_size=0.25)

In [25]:
from sklearn import tree
model = tree.DecisionTreeClassifier()
model.fit(X_train, Y_train)

In [26]:
model.score(X_test,Y_test)

0.9736842105263158

In [27]:
model.predict([[5,3.6,1.4,0.2]])



array([0])

In [None]:
#The accuracy of the model is about 97.4% on the test set (37 of 38 samples), and the sample above is classified as setosa (class 0), matching its label in the dataset
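
One advantage of a decision tree is that the learned rules can be read directly. The sketch below prints them with `sklearn.tree.export_text`. The tree is fitted on the full dataset and capped at `max_depth=3` with `random_state=0` to keep the output short and repeatable; these settings are assumptions for illustration and differ from the unconstrained tree trained above.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()

# A shallow tree (assumed depth limit) keeps the printed rules readable.
tree_clf = DecisionTreeClassifier(max_depth=3, random_state=0)
tree_clf.fit(iris.data, iris.target)

# Render the decision rules as indented text, one threshold per line.
rules = export_text(tree_clf, feature_names=list(iris.feature_names))
print(rules)
```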

#### Method 3: Random Forest Classification 

In [30]:
from sklearn.ensemble import RandomForestClassifier
model2 = RandomForestClassifier()
model2.fit(X_train, Y_train)

In [34]:
model2.score(X_test,Y_test)

0.9736842105263158

In [33]:
model2.predict([[6.5,3.0,5.2,2.0]])



array([2])

In [35]:
#The accuracy of the model is about 97.4% on the test set (37 of 38 samples), and the sample above is classified as virginica (class 2), matching its label in the dataset
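
A random forest also reports how much each feature contributed to its splits. As a rough sketch, `feature_importances_` can be inspected after fitting. The forest here is fitted on the full dataset with `random_state=0`, which is an assumption for repeatability; exact values depend on the seed and the training data.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

iris = load_iris()

# Seed chosen only so repeated runs give the same importances (assumption).
rf = RandomForestClassifier(n_estimators=100, random_state=0)
rf.fit(iris.data, iris.target)

# Impurity-based importances, normalised to sum to 1.
importances = rf.feature_importances_
for name, score in sorted(zip(iris.feature_names, importances), key=lambda t: -t[1]):
    print(f"{name}: {score:.3f}")
```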

#### Method 4: KNN Classification 

In [36]:
from sklearn.neighbors import KNeighborsClassifier
knn = KNeighborsClassifier(n_neighbors=10)
knn.fit(X_train, Y_train)

In [38]:
knn.score(X_test,Y_test)

0.9473684210526315

In [39]:
knn.predict([[4.8,3.0,1.5,0.3]])



array([0])

In [40]:
#The accuracy of the model is about 94.7% on the test set (36 of 38 samples), and the sample above is predicted as setosa (class 0)
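
The four accuracies above come from different random splits (50% held out for logistic regression, 25% for the others), so they are not directly comparable. One way to put the models on an equal footing is 5-fold cross-validation over the same data, sketched below. The hyperparameters `max_iter=1000` and `random_state=0` are assumptions added for convergence and repeatability; `n_neighbors=10` matches the notebook.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()

models = {
    "logistic regression": LogisticRegression(max_iter=1000),
    "decision tree": DecisionTreeClassifier(random_state=0),
    "random forest": RandomForestClassifier(random_state=0),
    "knn (k=10)": KNeighborsClassifier(n_neighbors=10),
}

# Every model is scored on the same five stratified folds.
cv_means = {}
for name, estimator in models.items():
    scores = cross_val_score(estimator, iris.data, iris.target, cv=5)
    cv_means[name] = scores.mean()
    print(f"{name}: mean accuracy {scores.mean():.3f} (std {scores.std():.3f})")
```

With only 150 samples, differences of one or two test samples between models are within the noise of a single split, which is why averaging over folds gives a steadier comparison.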
