# Iris Species
#### Classify iris plants into three species in this classic dataset

#### Description
    The Iris dataset was used in R.A. Fisher's classic 1936 paper, The Use of Multiple Measurements in Taxonomic Problems, and can also be found on the UCI Machine Learning Repository.

    It includes three iris species with 50 samples each, plus four measurements of each flower (sepal length, sepal width, petal length and petal width, all in centimetres). One species is linearly separable from the other two, but those two are not linearly separable from each other.

    
#### The columns in this dataset are:
    > Id
    > SepalLengthCm
    > SepalWidthCm
    > PetalLengthCm
    > PetalWidthCm
    > Species
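
The linear-separability remark in the description can be checked with a single feature. The sketch below is only an illustration: it assumes scikit-learn's bundled copy of the data (`load_iris`) is an acceptable stand-in for `Iris.csv`, and the variable names are made up for this example.

```python
from sklearn.datasets import load_iris

# Assumption: scikit-learn's bundled Iris data stands in for Iris.csv.
iris = load_iris()
X_all, y_all = iris.data, iris.target   # columns: sepal length, sepal width, petal length, petal width
petal_length = X_all[:, 2]

setosa_max = petal_length[y_all == 0].max()   # longest setosa petal
others_min = petal_length[y_all != 0].min()   # shortest versicolor/virginica petal

# If the two ranges do not overlap, one threshold separates setosa from the rest.
threshold = (setosa_max + others_min) / 2
separable = setosa_max < others_min
print(f"setosa max = {setosa_max}, others min = {others_min}, threshold = {threshold}")
print("setosa separable by petal length alone:", separable)
```

A gap between the two ranges means a straight cut on petal length isolates one species. The sketch does not test the other two species, whose non-separability is the claim of the description and is not demonstrated here.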

In [1]:
## Import the required libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

## 1st Task: Importing the dataset

In [8]:
df = pd.read_csv('Iris.csv')
df.head()

Unnamed: 0,Id,SepalLengthCm,SepalWidthCm,PetalLengthCm,PetalWidthCm,Species
0,1,5.1,3.5,1.4,0.2,Iris-setosa
1,2,4.9,3.0,1.4,0.2,Iris-setosa
2,3,4.7,3.2,1.3,0.2,Iris-setosa
3,4,4.6,3.1,1.5,0.2,Iris-setosa
4,5,5.0,3.6,1.4,0.2,Iris-setosa


In [9]:
# Let's see the shape of our data
df.shape

(150, 6)
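
Beyond the shape, a few quick checks on data types, missing values and duplicate ids are often worth running before modelling. This is a minimal sketch that assumes the five rows shown by `df.head()` are representative. It reads them from an in-memory string rather than from `Iris.csv`, and the names `sample`, `missing_per_column` and `n_duplicate_ids` are illustrative.

```python
import io
import pandas as pd

# Assumption: a five-row excerpt (the rows shown by df.head()) stands in for Iris.csv.
csv_text = """Id,SepalLengthCm,SepalWidthCm,PetalLengthCm,PetalWidthCm,Species
1,5.1,3.5,1.4,0.2,Iris-setosa
2,4.9,3.0,1.4,0.2,Iris-setosa
3,4.7,3.2,1.3,0.2,Iris-setosa
4,4.6,3.1,1.5,0.2,Iris-setosa
5,5.0,3.6,1.4,0.2,Iris-setosa
"""
sample = pd.read_csv(io.StringIO(csv_text))

missing_per_column = sample.isna().sum()            # NaN count in each column
n_duplicate_ids = sample["Id"].duplicated().sum()   # repeated Id values
numeric_cols = sample.select_dtypes("number").columns.tolist()

print(sample.dtypes)
print(missing_per_column)
```

Running the same three checks on the full `df` would show whether any cleaning is needed before the split.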

In [11]:
# Let's split our dataset into the feature data and the target
# (recent pandas versions require the axis/columns to be passed by keyword)
X = df.drop(columns=['Id', 'Species'])
X.head()

Unnamed: 0,SepalLengthCm,SepalWidthCm,PetalLengthCm,PetalWidthCm
0,5.1,3.5,1.4,0.2
1,4.9,3.0,1.4,0.2
2,4.7,3.2,1.3,0.2
3,4.6,3.1,1.5,0.2
4,5.0,3.6,1.4,0.2


In [12]:
y = df['Species']
y.head()

0    Iris-setosa
1    Iris-setosa
2    Iris-setosa
3    Iris-setosa
4    Iris-setosa
Name: Species, dtype: object

In [14]:
y1 = y.copy()

In [17]:
# Use pd.factorize to assign each species an integer code
y1 = pd.factorize(y1)[0]
y1

array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
       2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
       2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2], dtype=int64)
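
`pd.factorize` returns two values, and the cell above keeps only the first. The sketch below, on a hypothetical five-row miniature of the `Species` column, shows how the second value can be kept to translate codes back into names later.

```python
import pandas as pd

# Hypothetical miniature of the Species column.
species = pd.Series(
    ["Iris-setosa", "Iris-setosa", "Iris-versicolor", "Iris-virginica", "Iris-versicolor"]
)

codes, uniques = pd.factorize(species)
# codes   -> one integer per row, numbered in order of first appearance
# uniques -> Index of the original labels; uniques[k] is the label for code k

decoded = uniques[codes]   # map the integer codes back to species names
print(codes)
print(list(uniques))
```

Keeping `uniques` avoids hard-coding the code-to-name mapping by hand.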

In [19]:
Y = pd.DataFrame(y1)
Y.head()

Unnamed: 0,0
0,0
1,0
2,0
3,0
4,0


## 2nd Task: Store the split data into separate variables.

In [20]:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X,Y,test_size=0.2,random_state=15)
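
With `test_size=0.2`, 30 of the 150 rows are held out. A plain random split does not guarantee that the three species are equally represented in those 30 rows. The following sketch shows one possible variation using the `stratify` argument, which the notebook itself does not use. It assumes `load_iris()` as a stand-in for the `X` and `Y` built above.

```python
from collections import Counter
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Assumption: load_iris() stands in for the X and Y built from Iris.csv.
features, labels = load_iris(return_X_y=True)

X_tr, X_te, y_tr, y_te = train_test_split(
    features, labels,
    test_size=0.2,      # 20% of 150 rows -> 30 test rows
    random_state=15,    # same seed as the notebook, for a repeatable shuffle
    stratify=labels,    # assumption: not in the notebook; keeps class ratios equal
)

print(X_tr.shape, X_te.shape)
print(Counter(y_te.tolist()))
```

With 50 samples per class, stratifying puts exactly 10 flowers of each species in the test set.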

## 3rd Task: Create a naive Bayes model on the training dataset

In [21]:
from sklearn.naive_bayes import GaussianNB
model = GaussianNB()
# y_train is a one-column DataFrame; ravel() passes it as the 1-D array that fit() expects
model.fit(X_train, y_train.values.ravel())

GaussianNB(priors=None, var_smoothing=1e-09)
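
To make clear what "fitting" a Gaussian naive Bayes model involves, here is a from-scratch sketch in NumPy. It is not scikit-learn's implementation (it omits `var_smoothing`, for instance). It assumes `load_iris()` as a stand-in for the training data, and `predict_gnb` is a hypothetical helper written for this illustration.

```python
import numpy as np
from sklearn.datasets import load_iris

# Assumption: load_iris() stands in for the training data used in the notebook.
features, labels = load_iris(return_X_y=True)
classes = np.unique(labels)

# "Fitting" estimates a prior per class, and a mean and variance per class and feature.
priors = np.array([(labels == c).mean() for c in classes])
means = np.array([features[labels == c].mean(axis=0) for c in classes])
variances = np.array([features[labels == c].var(axis=0) for c in classes])

def predict_gnb(rows):
    """Return the class with the highest log posterior for each row."""
    # diff has shape (n_rows, n_classes, n_features).
    diff = rows[:, None, :] - means[None, :, :]
    # Gaussian log-density of every feature under every class.
    log_density = -0.5 * (np.log(2 * np.pi * variances)[None] + diff**2 / variances[None])
    # Naive independence assumption: per-feature log-densities simply add up.
    log_posterior = np.log(priors)[None, :] + log_density.sum(axis=2)
    return classes[log_posterior.argmax(axis=1)]

train_accuracy = (predict_gnb(features) == labels).mean()
print(train_accuracy)
```

The "naive" part is the sum over features: each measurement is treated as independent of the others once the species is known.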

In [22]:
# Let's see our score: accuracy on the training set (not on the held-out test set)
model.score(X_train, y_train)

0.9583333333333334
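
The score above is measured on the same rows the model was fitted on, so it can be optimistic. One common way to get a less biased estimate is k-fold cross-validation. The sketch below assumes `load_iris()` as a stand-in for the data and uses 5 folds, which is an arbitrary choice for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB

# Assumption: load_iris() stands in for the X and Y built from Iris.csv.
features, labels = load_iris(return_X_y=True)

# 5-fold cross-validation: each fold is scored on rows the model was not fitted on.
scores = cross_val_score(GaussianNB(), features, labels, cv=5)
print(scores.mean(), scores.std())
```

The spread of the five fold scores gives a rough sense of how much the accuracy depends on which rows happen to be held out.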

## 4th Task: Predicting on the test dataset

In [23]:
prediction = model.predict(X_test)

In [25]:
prediction

array([0, 1, 1, 0, 0, 1, 2, 1, 1, 2, 2, 1, 1, 1, 2, 0, 1, 2, 0, 2, 1, 0,
       1, 1, 0, 0, 2, 2, 2, 1], dtype=int64)
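
`predict` returns only the winning class. `GaussianNB` also exposes `predict_proba`, which gives the posterior probability of each class and shows how confident each prediction is. A minimal sketch, assuming `load_iris()` as a stand-in for the notebook's data and using illustrative variable names:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Assumption: load_iris() stands in for the X and Y built from Iris.csv.
features, labels = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(features, labels, test_size=0.2, random_state=15)

clf = GaussianNB().fit(X_tr, y_tr)
proba = clf.predict_proba(X_te)   # shape (n_test_rows, n_classes); each row sums to 1
hard = clf.predict(X_te)          # the single most probable class per row

# predict() corresponds to the column with the largest posterior probability.
same = np.array_equal(clf.classes_[proba.argmax(axis=1)], hard)
print(proba[:3].round(3))
```

Rows whose largest probability is well below 1 are the ones most likely to be misclassified.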

In [27]:
y_predict = pd.DataFrame(prediction)
y_predict.head()

Unnamed: 0,0
0,0
1,1
2,1
3,0
4,0


## Let's see how accurate our predictions are

In [29]:
y.value_counts()

Iris-setosa        50
Iris-versicolor    50
Iris-virginica     50
Name: Species, dtype: int64

In [32]:
y_test.head()

Unnamed: 0,0
6,0
61,1
90,1
30,0
31,0


In [33]:
a = y_test.rename(columns={0: 'Test_Species'})
a.head()

Unnamed: 0,Test_Species
6,0
61,1
90,1
30,0
31,0


In [34]:
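# Assign the result back instead of using inplace=True on a selected column,
# which newer pandas versions no longer apply to the parent DataFrame
a["Test_Species"] = a["Test_Species"].replace({0: "Iris-setosa", 1: "Iris-versicolor", 2: "Iris-virginica"})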
a.head()

Unnamed: 0,Test_Species
6,Iris-setosa
61,Iris-versicolor
90,Iris-versicolor
30,Iris-setosa
31,Iris-setosa


In [36]:
b = y_predict.rename(columns={0:'Predicted_Species'})
b.head()

Unnamed: 0,Predicted_Species
0,0
1,1
2,1
3,0
4,0


In [37]:
# Assign the result back instead of using inplace=True on a selected column
b["Predicted_Species"] = b["Predicted_Species"].replace({0: "Iris-setosa", 1: "Iris-versicolor", 2: "Iris-virginica"})
b.head()

Unnamed: 0,Predicted_Species
0,Iris-setosa
1,Iris-versicolor
2,Iris-versicolor
3,Iris-setosa
4,Iris-setosa
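
The code-to-name translation used for both tables can also be written with `Series.map`, which returns a new Series and leaves the original untouched. The sketch below uses a hypothetical five-value miniature of the predictions. The dictionary is the same mapping used in the cells above.

```python
import pandas as pd

# Hypothetical miniature of the integer predictions produced above.
codes = pd.Series([0, 1, 1, 0, 2], name="Predicted_Species")

code_to_name = {0: "Iris-setosa", 1: "Iris-versicolor", 2: "Iris-virginica"}

# map() builds a new Series; any code missing from the dict would become NaN.
names = codes.map(code_to_name)
print(names.tolist())
```

Because unknown codes turn into `NaN` rather than passing through silently, a typo in the mapping is easier to spot.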


In [46]:
a1 = a.copy()
b1 = b.copy()

In [48]:
a1.reset_index(drop=True, inplace=True)
b1.reset_index(drop=True, inplace=True)

In [50]:
compare = pd.concat( [a1, b1], axis=1)
compare

Unnamed: 0,Test_Species,Predicted_Species
0,Iris-setosa,Iris-setosa
1,Iris-versicolor,Iris-versicolor
2,Iris-versicolor,Iris-versicolor
3,Iris-setosa,Iris-setosa
4,Iris-setosa,Iris-setosa
5,Iris-versicolor,Iris-versicolor
6,Iris-virginica,Iris-virginica
7,Iris-versicolor,Iris-versicolor
8,Iris-versicolor,Iris-versicolor
9,Iris-virginica,Iris-virginica


#### Conclusion: The predicted species match the actual species, i.e. the predictions are 100% accurate on this test split
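
An accuracy figure can be computed directly rather than read off a table. The sketch below shows one way to do so with `accuracy_score` and `confusion_matrix` from `sklearn.metrics`. It assumes `load_iris()` as a stand-in for `Iris.csv`, so the numbers it prints are not necessarily those of the notebook.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Assumption: load_iris() stands in for Iris.csv; results may differ from the notebook's.
features, labels = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(features, labels, test_size=0.2, random_state=15)

y_hat = GaussianNB().fit(X_tr, y_tr).predict(X_te)

acc = accuracy_score(y_te, y_hat)                      # fraction of correct predictions
cm = confusion_matrix(y_te, y_hat, labels=[0, 1, 2])   # rows: actual class, columns: predicted class
print(acc)
print(cm)
```

Off-diagonal entries of the confusion matrix show which species get confused with which. A single accuracy number hides that detail.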