# ShapeAI - Machine Learning with Python BootCamp

## Iris Dataset Classification

### Problem Statement: Using this dataset, the model has to predict which species a flower belongs to from its sepal length, sepal width, petal length, and petal width.

### Author: Rabbiyah Sulman

### Step 1: Import the Libraries

In [1]:
import pandas as pd
import numpy as np

### Step 2: Exploring the Data

In [2]:
from sklearn.datasets import load_iris
iris_dataset = load_iris()

The object returned by load_iris is a Bunch, which behaves much like a dictionary. It contains keys and values:

In [3]:
print("Keys of iris_dataset: \n{}".format(iris_dataset.keys()))

Keys of iris_dataset: 
dict_keys(['data', 'target', 'frame', 'target_names', 'DESCR', 'feature_names', 'filename'])
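
Because the returned object is dictionary-like, its entries can be read either by key or as attributes. The short sketch below illustrates this; the variable names `data_by_key`, `data_by_attr`, and `same_object` are only illustrative and do not appear in the rest of the notebook.

```python
from sklearn.datasets import load_iris

iris_dataset = load_iris()

# A Bunch supports dictionary-style and attribute-style access;
# both forms return the same underlying array object.
data_by_key = iris_dataset['data']
data_by_attr = iris_dataset.data
same_object = data_by_key is data_by_attr

print("Same object: {}".format(same_object))
print("Has a 'target' key: {}".format('target' in iris_dataset))
```

The rest of this notebook sticks to the key-based form, for example iris_dataset['data'].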


The value of the key DESCR is a short description of the dataset. It begins as follows:

In [4]:
print(iris_dataset['DESCR'][:193] + "\n....")  # Slice out the first 193 characters

.. _iris_dataset:

Iris plants dataset
--------------------

**Data Set Characteristics:**

    :Number of Instances: 150 (50 in each of three classes)
    :Number of Attributes: 4 numeric, pre
....


In [5]:
val = iris_dataset['DESCR']
start_val = val[:200]
print(start_val + "\n....")

.. _iris_dataset:

Iris plants dataset
--------------------

**Data Set Characteristics:**

    :Number of Instances: 150 (50 in each of three classes)
    :Number of Attributes: 4 numeric, predictive
....


The target names are the class labels, that is, the three species the model can predict.

In [6]:
print("Target names: {}".format(iris_dataset['target_names'])) 

Target names: ['setosa' 'versicolor' 'virginica']


The feature names describe the four measurements recorded for each flower.

In [7]:
print("Feature names: {}".format(iris_dataset['feature_names']))

Feature names: ['sepal length (cm)', 'sepal width (cm)', 'petal length (cm)', 'petal width (cm)']


In [8]:
print("Type of data: {}".format(type(iris_dataset['data'])))

Type of data: <class 'numpy.ndarray'>


In [9]:
print("Shape of data: {}".format(iris_dataset['data'].shape)) #We have 150 data points and 4 features

Shape of data: (150, 4)


In [10]:
print("First five rows: \n{}".format(iris_dataset['data'][:5]))

First five rows: 
[[5.1 3.5 1.4 0.2]
 [4.9 3.  1.4 0.2]
 [4.7 3.2 1.3 0.2]
 [4.6 3.1 1.5 0.2]
 [5.  3.6 1.4 0.2]]


In [11]:
print("Last five rows: \n{}".format(iris_dataset['data'][145:150]))

Last five rows: 
[[6.7 3.  5.2 2.3]
 [6.3 2.5 5.  1.9]
 [6.5 3.  5.2 2. ]
 [6.2 3.4 5.4 2.3]
 [5.9 3.  5.1 1.8]]


In [12]:
print("Type of target: {}".format(type(iris_dataset['target'])))

Type of target: <class 'numpy.ndarray'>


In [13]:
print("Shape of target: {}".format(iris_dataset['target'].shape))

Shape of target: (150,)


This is a one-dimensional NumPy array with one entry per flower.

In [14]:
print("Target: \n{}".format(iris_dataset['target']))

Target: 
[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2
 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
 2 2]


In these target values:
0 means setosa,
1 means versicolor, and
2 means virginica
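
The integer codes can be turned back into species names by indexing target_names with the target array. The sketch below also uses pandas, which is imported in Step 1, to view the data as a labelled table. The names `iris_df`, `species_names`, and the `species` column are illustrative assumptions, not part of the original notebook.

```python
import pandas as pd
from sklearn.datasets import load_iris

iris_dataset = load_iris()

# Index the array of names with the integer targets: 0 -> setosa, and so on.
species_names = iris_dataset['target_names'][iris_dataset['target']]

# Wrap the (150, 4) feature array in a DataFrame with named columns.
iris_df = pd.DataFrame(iris_dataset['data'],
                       columns=iris_dataset['feature_names'])
iris_df['species'] = species_names

print(iris_df.head())
print(iris_df['species'].value_counts())
```

A table like this makes it easy to check that each of the three species contributes 50 rows, as stated in the dataset description.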

### Step 3: Testing & Training data

In [15]:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(
    iris_dataset['data'], iris_dataset['target'], random_state=0)

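A common convention is to train on 75-80% of the data and test on the remaining 20-25%. By default, train_test_split holds out 25% of the samples for testing.

scikit-learn (imported as sklearn) is a machine learning library for Python.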

In [16]:
print("X_train shape: {}".format(X_train.shape))
print("y_train shape: {}".format(y_train.shape))
print("X_test shape: {}".format(X_test.shape))
print("y_test shape: {}".format(y_test.shape))

X_train shape: (112, 4)
y_train shape: (112,)
X_test shape: (38, 4)
y_test shape: (38,)
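
The 112/38 split follows from the default test fraction of 0.25 applied to 150 samples, with the test count rounded up. The sketch below writes that fraction out explicitly. The `stratify` argument is an addition that the notebook's own split does not use; it asks for the three species to be represented in roughly equal proportion in both subsets.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

iris_dataset = load_iris()

# test_size=0.25 matches the default used when no size is given.
# stratify is an optional extra (not used in the cell above) that keeps
# the class proportions similar in the training and test sets.
X_train, X_test, y_train, y_test = train_test_split(
    iris_dataset['data'],
    iris_dataset['target'],
    test_size=0.25,
    stratify=iris_dataset['target'],
    random_state=0,
)

print("Train/test sizes: {} / {}".format(len(X_train), len(X_test)))
```

Because the stratified split shuffles differently, the individual samples in each subset will differ from the ones used in the rest of this notebook, even though the sizes are the same.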


We will use the k-nearest neighbors (k-NN) algorithm to build the model from the training data, and later use new data points to make predictions. With n_neighbors=1, a new point is assigned the label of its single closest training point. To build the model on the training set, we call the fit method of the knn object with the arguments X_train and y_train.

In [17]:
from sklearn.neighbors import KNeighborsClassifier
knn= KNeighborsClassifier(n_neighbors=1)

In [18]:
knn.fit(X_train,y_train)

KNeighborsClassifier(n_neighbors=1)
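
To make the idea behind a 1-nearest-neighbor classifier concrete, here is a minimal NumPy sketch of the same logic. It is not how scikit-learn implements the classifier internally; `predict_1nn` is a hypothetical helper written only for illustration, and it assumes plain Euclidean distance.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

iris_dataset = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris_dataset['data'], iris_dataset['target'], random_state=0)


def predict_1nn(X_train, y_train, X_query):
    """Hypothetical helper: give each query row the label of its closest training row."""
    # Pairwise differences via broadcasting: (n_query, 1, 4) - (1, n_train, 4)
    diffs = X_query[:, np.newaxis, :] - X_train[np.newaxis, :, :]
    # Euclidean distance from every query point to every training point
    distances = np.sqrt((diffs ** 2).sum(axis=2))
    # Position of the nearest training point for each query point
    nearest = distances.argmin(axis=1)
    return y_train[nearest]


# The first row of the dataset printed earlier, used here as a query point.
sample = np.array([[5.1, 3.5, 1.4, 0.2]])
manual_prediction = predict_1nn(X_train, y_train, sample)
print("Manual 1-NN prediction: {}".format(manual_prediction))
```

Fitting a k-NN model therefore amounts to storing the training data; the real work happens at prediction time, when distances to the stored points are computed.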

### Step 4: Making Predictions

In [19]:
X_new = np.array([[5,2.9,1,0.2]])

In [20]:
print("X_new.shape: {}".format(X_new.shape))

X_new.shape: (1, 4)


To make a prediction, we call the predict method of the knn object:

In [21]:
prediction= knn.predict(X_new)

print("Prediction: {}".format(prediction))
print("Predicted target name: {}".format(iris_dataset['target_names'][prediction]))

Prediction: [0]
Predicted target name: ['setosa']


In [22]:
X1_new = np.array([[7,0.1,3,1.2]])

In [23]:
prediction= knn.predict(X1_new)

print("Prediction: {}".format(prediction))
print("Predicted target name: {}".format(iris_dataset['target_names'][prediction]))

Prediction: [1]
Predicted target name: ['versicolor']


In [24]:
X2_new = np.array([[1.2,0.2,5,2.6]])

In [25]:
prediction= knn.predict(X2_new)

print("Prediction: {}".format(prediction))
print("Predicted target name: {}".format(iris_dataset['target_names'][prediction]))

Prediction: [2]
Predicted target name: ['virginica']
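
A k-NN model always returns one of the known classes, even for measurements unlike any real flower. As a rough sanity check, the kneighbors method reports how far a new point is from its nearest training sample. The sketch below compares the first and third examples from this step; the names `dist_new` and `dist_x2` are illustrative.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

iris_dataset = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris_dataset['data'], iris_dataset['target'], random_state=0)
knn = KNeighborsClassifier(n_neighbors=1).fit(X_train, y_train)

X_new = np.array([[5, 2.9, 1, 0.2]])     # close to typical setosa measurements
X2_new = np.array([[1.2, 0.2, 5, 2.6]])  # sepal length far below anything in the data

# kneighbors returns (distances, indices) of the closest training samples.
dist_new, idx_new = knn.kneighbors(X_new)
dist_x2, idx_x2 = knn.kneighbors(X2_new)

print("Distance to nearest neighbor of X_new:  {:.2f}".format(dist_new[0, 0]))
print("Distance to nearest neighbor of X2_new: {:.2f}".format(dist_x2[0, 0]))
```

A much larger distance suggests that the prediction for that point is an extrapolation and should be treated with caution.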


### Step 5: Evaluating the Model

In [26]:
y_pred= knn.predict(X_test)
print("Test set predictions: \n{}".format(y_pred))

Test set predictions: 
[2 1 0 2 0 2 0 1 1 1 2 1 1 1 1 0 1 1 0 0 2 1 0 0 2 0 0 1 1 0 2 1 0 2 2 1 0
 2]


In [27]:
print("Test set score: {}".format(np.mean(y_pred==y_test)))

Test set score: 0.9736842105263158
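
The score above is the fraction of test samples whose predicted label matches the true label. scikit-learn offers two built-in ways to compute the same quantity, shown in the sketch below alongside the manual calculation. The variable names `manual_score`, `method_score`, and `metric_score` are illustrative.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

iris_dataset = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris_dataset['data'], iris_dataset['target'], random_state=0)

knn = KNeighborsClassifier(n_neighbors=1)
knn.fit(X_train, y_train)
y_pred = knn.predict(X_test)

# Three equivalent ways to compute classification accuracy.
manual_score = np.mean(y_pred == y_test)       # fraction of matching labels
method_score = knn.score(X_test, y_test)       # classifier's own score method
metric_score = accuracy_score(y_test, y_pred)  # standalone metric function

print("Manual: {:.4f}".format(manual_score))
print("knn.score: {:.4f}".format(method_score))
print("accuracy_score: {:.4f}".format(metric_score))
```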


The model classifies 37 of the 38 test samples correctly, an accuracy of about 97.4%. This is high, although the test set is small.
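
A single accuracy figure does not show which species are confused with each other. One way to look deeper, sketched below as an optional extra step, is a confusion matrix: rows correspond to the true species and columns to the predicted species, so correct predictions lie on the diagonal. The variable name `cm` is illustrative.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

iris_dataset = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris_dataset['data'], iris_dataset['target'], random_state=0)

knn = KNeighborsClassifier(n_neighbors=1)
knn.fit(X_train, y_train)
y_pred = knn.predict(X_test)

# Rows: true class, columns: predicted class (0, 1, 2 as in target_names).
cm = confusion_matrix(y_test, y_pred)

print("Confusion matrix:\n{}".format(cm))
print("Correct predictions: {} of {}".format(cm.trace(), cm.sum()))
```

Any non-zero entry off the diagonal marks a pair of species that the model mixed up on the test set.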