# Supervised Learning Using Iris Dataset

This notebook contains sample code that predicts the class (species) of each iris flower using different machine learning methods. We will use the following:

- KNN
- Naive Bayes (Gaussian)
- Decision Tree

All models in this notebook follow the same pattern:

1. We get a "black box" version of the model from sklearn.
2. The black box model is trained by giving it a set of features (the training dataframe) and their corresponding classes (the y values).
3. The trained model is used to predict classes for a different set of features (the test dataframe). It returns an array of predicted classes.
4. The correct classes (y values) and the predicted classes are put in one dataframe to show the results.
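
The four steps above can be sketched as one small function. This is only an illustration: `fit_and_compare` is a hypothetical helper (it is not part of sklearn), and the fixed `random_state=0` is an assumption added so the sketch gives the same split on every run.

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier


def fit_and_compare(model, train_X, train_y, test_X, test_y, name):
    """Hypothetical helper: train `model`, predict on the test features,
    and return the correct and predicted classes side by side."""
    model.fit(train_X, train_y)           # step 2: train on features + classes
    predicted = model.predict(test_X)     # step 3: predict classes for new features
    return pd.DataFrame({"Correct": test_y, name: predicted})  # step 4: compare


iris = load_iris()
X = pd.DataFrame(iris.data, columns=iris.feature_names)

# 75% of the rows for training, the remaining rows for testing
# (random_state=0 is an assumption made for reproducibility)
train_X = X.sample(frac=0.75, random_state=0)
test_X = X.drop(train_X.index)
train_y = iris.target[train_X.index]
test_y = iris.target[test_X.index]

# step 1: get a "black box" model, then run steps 2 to 4
pattern_df = fit_and_compare(
    KNeighborsClassifier(n_neighbors=3), train_X, train_y, test_X, test_y, "KNN3"
)
print(pattern_df.head())
```

Any sklearn classifier with `fit` and `predict` methods could be passed in place of `KNeighborsClassifier`.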

## Importing Packages

In [1]:
# KNN
from sklearn.neighbors import KNeighborsClassifier

# Naive Bayes
from sklearn.naive_bayes import GaussianNB

# Decision Tree
from sklearn import tree

# Iris Dataset
from sklearn.datasets import load_iris

# DataFrame
import pandas as pd

## Preparing Iris Dataset

In [2]:
# Load iris dataset
iris = load_iris()
iris_df = pd.DataFrame(data=iris.data, columns=iris.feature_names)

iris_df.head()

| | sepal length (cm) | sepal width (cm) | petal length (cm) | petal width (cm) |
|---|---|---|---|---|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 |
| 3 | 4.6 | 3.1 | 1.5 | 0.2 |
| 4 | 5.0 | 3.6 | 1.4 | 0.2 |


In [3]:
iris_df_with_label = iris_df.copy()
iris_df_with_label["label"] = iris.target

iris_df_with_label.sample(frac = 0.03)

| | sepal length (cm) | sepal width (cm) | petal length (cm) | petal width (cm) | label |
|---|---|---|---|---|---|
| 16 | 5.4 | 3.9 | 1.3 | 0.4 | 0 |
| 47 | 4.6 | 3.2 | 1.4 | 0.2 | 0 |
| 29 | 4.7 | 3.2 | 1.6 | 0.2 | 0 |
| 149 | 5.9 | 3.0 | 5.1 | 1.8 | 2 |
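
The `label` column holds integer codes. A minimal sketch of how those codes could be mapped to species names, using the `target_names` array that `load_iris()` returns; the `labeled` dataframe and the `species` column are names chosen for this illustration only.

```python
import pandas as pd
from sklearn.datasets import load_iris

iris_data = load_iris()
labeled = pd.DataFrame(iris_data.data, columns=iris_data.feature_names)
labeled["label"] = iris_data.target

# target_names is an array of species names; indexing it with the
# integer codes gives one name per row
labeled["species"] = iris_data.target_names[iris_data.target]

print(labeled[["label", "species"]].drop_duplicates())
```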


In [4]:
# Split the data into training (75%) and test (25%) sets.
# sample() draws rows at random, so the split differs on every run.

iris_train_df = iris_df.sample(frac = 0.75)
iris_train_y = iris.target[iris_train_df.index]

In [5]:
iris_test_df = iris_df.drop(iris_train_df.index)
iris_test_y = iris.target[iris_test_df.index]
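
As an alternative to the manual `sample`/`drop` split, sklearn provides `train_test_split`. The sketch below assumes a fixed `random_state=42` and a stratified split (equal class proportions in both sets); neither choice comes from the notebook, and the variable names are illustrative.

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

iris_data = load_iris()
features = pd.DataFrame(iris_data.data, columns=iris_data.feature_names)

# test_size=0.25 mirrors the 75% / 25% split used above;
# random_state and stratify are assumptions added for this sketch
X_train, X_test, y_train, y_test = train_test_split(
    features,
    iris_data.target,
    test_size=0.25,
    random_state=42,
    stratify=iris_data.target,
)

print(X_train.shape, X_test.shape)
```

Fixing `random_state` makes the results repeatable between runs, which the random `sample` call does not.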

## KNN

In [6]:
knn1 = KNeighborsClassifier(n_neighbors=1)
knn1.fit(iris_train_df, iris_train_y)

In [7]:
knn3 = KNeighborsClassifier(n_neighbors=3)
knn3.fit(iris_train_df, iris_train_y)

In [8]:
knn1_prediction = knn1.predict(iris_test_df)
knn3_prediction = knn3.predict(iris_test_df)

In [9]:
result_df = pd.DataFrame({"KNN1": knn1_prediction, "KNN3": knn3_prediction})
result_df.sample(frac = 0.2)

| | KNN1 | KNN3 |
|---|---|---|
| 30 | 2 | 2 |
| 17 | 1 | 1 |
| 26 | 1 | 1 |
| 16 | 1 | 1 |
| 22 | 1 | 1 |
| 10 | 0 | 0 |
| 34 | 2 | 2 |
| 1 | 0 | 0 |
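
The table above compares the two KNN models with each other, not with the correct classes. A minimal sketch of how each model could be scored against the true labels with `accuracy_score`; the split here uses `train_test_split` with an assumed `random_state=0`, so the numbers will not match the run shown in this notebook.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

data = load_iris()
X_tr, X_te, y_tr, y_te = train_test_split(
    data.data, data.target, test_size=0.25, random_state=0  # assumed seed
)

# Fraction of test rows each model classifies correctly
knn_scores = {}
for k in (1, 3):
    model = KNeighborsClassifier(n_neighbors=k).fit(X_tr, y_tr)
    knn_scores[k] = accuracy_score(y_te, model.predict(X_te))

print(knn_scores)
```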


## Naive Bayes

In [10]:
gnb = GaussianNB()
gnb.fit(iris_train_df, iris_train_y)

In [11]:
gnb_prediction = gnb.predict(iris_test_df)

In [12]:
result_df = pd.DataFrame({"Correct": iris_test_y, "GNaiveBayes": gnb_prediction})
result_df

| | Correct | GNaiveBayes |
|---|---|---|
| 0 | 0 | 0 |
| 1 | 0 | 0 |
| 2 | 0 | 0 |
| 3 | 0 | 0 |
| 4 | 0 | 0 |
| 5 | 0 | 0 |
| 6 | 0 | 0 |
| 7 | 0 | 0 |
| 8 | 0 | 0 |
| 9 | 0 | 0 |
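
Scanning a results table by eye gets tedious, so one option is to filter it down to the rows where the prediction is wrong. The sketch below also adds the model's highest class probability from `predict_proba` as a rough confidence value. The split, the seed, and the `confidence` column name are assumptions made for this illustration.

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

data = load_iris()
X_tr, X_te, y_tr, y_te = train_test_split(
    data.data, data.target, test_size=0.25, random_state=0  # assumed seed
)

nb = GaussianNB().fit(X_tr, y_tr)
nb_df = pd.DataFrame({"Correct": y_te, "GNaiveBayes": nb.predict(X_te)})

# predict_proba returns one probability per class; keep the largest one
nb_df["confidence"] = nb.predict_proba(X_te).max(axis=1)

# Keep only the rows where the predicted class differs from the correct one
mismatches = nb_df[nb_df["Correct"] != nb_df["GNaiveBayes"]]
print(f"{len(mismatches)} of {len(nb_df)} test rows misclassified")
print(mismatches)
```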


## Decision Tree

In [13]:
clf = tree.DecisionTreeClassifier()
clf = clf.fit(iris_train_df, iris_train_y)

In [14]:
# Predict the class of every observation in the test set
dt_prediction = clf.predict(iris_test_df)
print("Predicted class:", dt_prediction)

Predicted class: [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 2 1 1 2 1 1 1 1 2 2 2 2 2 2 2 2 2
 2]


In [15]:
result_df = pd.DataFrame({"Correct": iris_test_y, "DecisionTree": dt_prediction})
result_df

| | Correct | DecisionTree |
|---|---|---|
| 0 | 0 | 0 |
| 1 | 0 | 0 |
| 2 | 0 | 0 |
| 3 | 0 | 0 |
| 4 | 0 | 0 |
| 5 | 0 | 0 |
| 6 | 0 | 0 |
| 7 | 0 | 0 |
| 8 | 0 | 0 |
| 9 | 0 | 0 |
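
Unlike KNN and Naive Bayes, a decision tree does not have to stay a black box: sklearn's `export_text` prints the learned rules as plain text. A minimal sketch, assuming a fresh `train_test_split` and `random_state=0` for both the split and the tree, so the rules may differ from those of the `clf` model trained above.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

data = load_iris()
X_tr, X_te, y_tr, y_te = train_test_split(
    data.data, data.target, test_size=0.25, random_state=0  # assumed seed
)

dt = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)

# Render the tree as indented if/else style rules, one line per node
rules = export_text(dt, feature_names=list(data.feature_names))
print(rules)
```

Each leaf line ends with `class: <n>`, where `<n>` is the same integer label used in the results tables.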
