# The Iris Dataset - Machine Learning Project

Data Set Information:

This is perhaps the best known database to be found in the pattern recognition literature. Fisher's paper is a classic in the field and is referenced frequently to this day. (See Duda & Hart, for example.) The data set contains 3 classes of 50 instances each, where each class refers to a type of iris plant. One class is linearly separable from the other 2; the latter are NOT linearly separable from each other.

Importing libraries

In [2]:
import numpy as np
import pandas as pd

Loading the iris dataset that ships with scikit-learn

In [3]:
from sklearn.datasets import load_iris
iris_dataset=load_iris()

Printing the keys of iris_dataset, which behaves like a dictionary

In [6]:
print("Keys of iris_dataset: \n{}".format(iris_dataset.keys()))

Keys of iris_dataset: 
dict_keys(['data', 'target', 'target_names', 'DESCR', 'feature_names', 'filename'])
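The object returned by `load_iris()` is a scikit-learn `Bunch`, so each key listed above can also be read as an attribute. The short sketch below, which is not part of the original notebook, checks that both access styles return the same array.

```python
from sklearn.datasets import load_iris

iris_dataset = load_iris()

# Dictionary-style and attribute-style access refer to the same object.
data_by_key = iris_dataset['data']
data_by_attr = iris_dataset.data

print("Same object: {}".format(data_by_key is data_by_attr))
```

The rest of this notebook keeps the dictionary-style access.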


Printing the description of the iris dataset

In [8]:
print(iris_dataset['DESCR'][:193] + '\n...')

.. _iris_dataset:

Iris plants dataset
--------------------

**Data Set Characteristics:**

    :Number of Instances: 150 (50 in each of three classes)
    :Number of Attributes: 4 numeric, pre
...


Printing the target names, which are the species of the plants

In [10]:
print("Target Names: {}".format(iris_dataset['target_names']))

Target Names: ['setosa' 'versicolor' 'virginica']


Printing the feature names or attributes

In [13]:
print("Feature Names: {}".format(iris_dataset['feature_names']))

Feature Names: ['sepal length (cm)', 'sepal width (cm)', 'petal length (cm)', 'petal width (cm)']


Printing the type of the value stored under the data key

In [16]:
print("Type of data: {}".format(type(iris_dataset['data'])))

Type of data: <class 'numpy.ndarray'>


Printing the shape of the ndarray

In [19]:
print("Shape of data: {}".format(iris_dataset['data'].shape))

Shape of data: (150, 4)


Printing the first 5 rows of the data (each row is one flower, each column one feature)

In [24]:
print("First five rows of the data: \n{}".format(iris_dataset['data'][:5]))

First five rows of the data: 
[[5.1 3.5 1.4 0.2]
 [4.9 3.  1.4 0.2]
 [4.7 3.2 1.3 0.2]
 [4.6 3.1 1.5 0.2]
 [5.  3.6 1.4 0.2]]
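The raw array carries no column labels. As a minimal sketch that is not part of the original notebook, the same rows can be viewed as a pandas DataFrame labeled with the feature names. The names `iris_df` and `species` are assumptions introduced here for illustration.

```python
import pandas as pd
from sklearn.datasets import load_iris

iris_dataset = load_iris()

# Wrap the (150, 4) ndarray in a DataFrame labeled with the feature names.
iris_df = pd.DataFrame(iris_dataset['data'],
                       columns=iris_dataset['feature_names'])

# Map each integer target to its species name by indexing target_names.
iris_df['species'] = iris_dataset['target_names'][iris_dataset['target']]

print(iris_df.head())
```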


Printing the type of the value stored under the target key

In [26]:
print("Type of target: {}".format(type(iris_dataset['target'])))

Type of target: <class 'numpy.ndarray'>


Printing the shape of target array

In [28]:
print("Shape of target: {}".format(iris_dataset['target'].shape))

Shape of target: (150,)


Printing the values of target, where 0 means Setosa, 1 means Versicolor, and 2 means Virginica

In [31]:
print("Target: \n{}".format(iris_dataset['target']))

Target: 
[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2
 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
 2 2]
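The printed array is sorted by class, which is why the data must be shuffled before splitting. As a quick check (a sketch, not part of the original notebook), `np.bincount` confirms that each class has the 50 samples stated in the dataset description. The name `class_counts` is an assumption.

```python
import numpy as np
from sklearn.datasets import load_iris

iris_dataset = load_iris()

# Count how many samples carry each integer label (0, 1, 2).
class_counts = np.bincount(iris_dataset['target'])

for name, count in zip(iris_dataset['target_names'], class_counts):
    print("{}: {}".format(name, count))
```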


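Splitting the data into a training set and a test set with train_test_split from sklearn.model_selection. The function shuffles the rows before splitting, and random_state=0 makes the shuffle reproducible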

In [37]:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(
    iris_dataset['data'], iris_dataset['target'], random_state=0)

Printing the shape of each train and test data

In [39]:
print("X_train shape: {}".format(X_train.shape))
print("y_train shape: {}".format(y_train.shape))
print("X_test shape: {}".format(X_test.shape))
print("y_test shape: {}".format(y_test.shape))

X_train shape: (112, 4)
y_train shape: (112,)
X_test shape: (38, 4)
y_test shape: (38,)
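The 112/38 split follows from the default behavior of `train_test_split`: when neither `test_size` nor `train_size` is given, 25% of the rows are held out, and 25% of 150 is rounded up to 38. The sketch below, which is not part of the original notebook, passes `test_size=0.25` explicitly to show the same shapes, and also prints the class counts that ended up in each split.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

iris_dataset = load_iris()

# Same call as above, with the default hold-out fraction written out.
X_train, X_test, y_train, y_test = train_test_split(
    iris_dataset['data'], iris_dataset['target'],
    test_size=0.25, random_state=0)

print("X_train shape: {}".format(X_train.shape))
print("X_test shape: {}".format(X_test.shape))

# Samples per class in each split (index 0, 1, 2 = setosa, versicolor, virginica).
print("Train class counts: {}".format(np.bincount(y_train)))
print("Test class counts: {}".format(np.bincount(y_test)))
```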


Importing the k-nearest neighbors classifier and creating a model with k = 1 (n_neighbors=1), so each prediction is taken from the single closest training point

In [41]:
from sklearn.neighbors import KNeighborsClassifier
knn = KNeighborsClassifier(n_neighbors=1)

Fitting the KNN model on the training data

In [43]:
knn.fit(X_train,y_train)

KNeighborsClassifier(algorithm='auto', leaf_size=30, metric='minkowski',
                     metric_params=None, n_jobs=None, n_neighbors=1, p=2,
                     weights='uniform')

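Creating a new data point (one flower with four measurements) to test the model's predictions. The array must be two-dimensional, with shape (n_samples, n_features)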

In [44]:
X_new = np.array([[5,2.9,1,0.2]])
print("X_new shape: {}".format(X_new.shape))

X_new shape: (1, 4)


Predicting the target name (species) of the above data point using the KNN model

In [47]:
prediction = knn.predict(X_new)
print("Prediction: {}".format(prediction))
print("Predicted targeted name: {}".format(iris_dataset['target_names'][prediction]))

Prediction: [0]
Predicted targeted name: ['setosa']
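To make the prediction step concrete, here is a minimal NumPy sketch of what a 1-nearest-neighbor lookup does. It assumes Euclidean distance, which matches the `metric='minkowski', p=2` settings shown in the fitted model above. It is an illustration, not scikit-learn's actual implementation, and the names `distances`, `nearest_index`, and `manual_prediction` are assumptions.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

iris_dataset = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris_dataset['data'], iris_dataset['target'], random_state=0)

X_new = np.array([[5, 2.9, 1, 0.2]])

# Euclidean distance from the new point to every training row.
distances = np.sqrt(((X_train - X_new) ** 2).sum(axis=1))

# With n_neighbors=1, the closest training row alone decides the label.
nearest_index = np.argmin(distances)
manual_prediction = y_train[nearest_index]

print("Nearest training row: {}".format(X_train[nearest_index]))
print("Manual prediction: {}".format(
    iris_dataset['target_names'][manual_prediction]))
```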


Evaluating the model with test data

In [53]:
y_pred = knn.predict(X_test)
print("Test set predictions:\n {}".format(y_pred))

Test set predictions:
 [2 1 0 2 0 2 0 1 1 1 2 1 1 1 1 0 1 1 0 0 2 1 0 0 2 0 0 1 1 0 2 1 0 2 2 1 0
 2]


Computing the accuracy of the model, i.e. the fraction of test samples whose predicted label matches the true label

In [54]:
print("Test Score: {}".format(np.mean(y_pred==y_test)))

Test Score: 0.9736842105263158
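The same accuracy can be obtained without comparing the arrays by hand. The sketch below, which is not part of the original notebook, rebuilds the model and checks that `knn.score` and `sklearn.metrics.accuracy_score` agree with the `np.mean` calculation. The names `manual_score`, `builtin_score`, and `metric_score` are assumptions.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

iris_dataset = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris_dataset['data'], iris_dataset['target'], random_state=0)

knn = KNeighborsClassifier(n_neighbors=1)
knn.fit(X_train, y_train)
y_pred = knn.predict(X_test)

# Three equivalent ways to compute the fraction of correct predictions.
manual_score = np.mean(y_pred == y_test)
builtin_score = knn.score(X_test, y_test)
metric_score = accuracy_score(y_test, y_pred)

print("Manual: {}".format(manual_score))
print("knn.score: {}".format(builtin_score))
print("accuracy_score: {}".format(metric_score))
```

`knn.score` is the shortest option when the predictions themselves are not needed, while `accuracy_score` is convenient when `y_pred` is already available.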
