<a href="https://colab.research.google.com/github/SobiyaChainee/ML_PROJECT/blob/main/FlowerSpeciesSpecification.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

Importing libraries

In [None]:
import pandas as pd
import numpy as np


We use the Iris dataset for this project. It is included in the datasets module of scikit-learn, and we can load it by calling the load_iris function.

In [None]:
from sklearn.datasets import load_iris
iris_dataset = load_iris()

The iris_dataset object returned by load_iris is a Bunch object, which is very similar to a dictionary. It contains keys and values:

In [None]:
print("Keys of iris_dataset: \n{}".format(iris_dataset.keys()))

Keys of iris_dataset: 
dict_keys(['data', 'target', 'target_names', 'DESCR', 'feature_names', 'filename'])
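A Bunch also exposes its keys as attributes, so the two access styles can be mixed. The following is a minimal sketch, assuming scikit-learn is installed; the variable names `iris` and `same_data` are chosen here for illustration and do not appear in the notebook.

```python
from sklearn.datasets import load_iris

iris = load_iris()

# Key access and attribute access return the same underlying array.
same_data = bool((iris.data == iris['data']).all())
print("Attribute and key access agree: {}".format(same_data))

# The same holds for every other key, for example target_names.
print("Target names via attribute: {}".format(iris.target_names))
```

Key access (`iris['data']`) is used throughout this notebook; attribute access is only a shorthand for it.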


In [None]:
print(iris_dataset['DESCR'][:193] + "\n...")

.. _iris_dataset:

Iris plants dataset
--------------------

**Data Set Characteristics:**

    :Number of Instances: 150 (50 in each of three classes)
    :Number of Attributes: 4 numeric, pre
...


The value of the key target_names is an array of strings containing the species of flower that we want to predict:

In [None]:
print("Target names: {}".format(iris_dataset['target_names']))

Target names: ['setosa' 'versicolor' 'virginica']


The value of feature_names is a list of strings giving the description of each feature: sepal length, sepal width, petal length, and petal width, all in centimeters.

In [None]:
print("Feature names: \n{}".format(iris_dataset['feature_names']))

Feature names: 
['sepal length (cm)', 'sepal width (cm)', 'petal length (cm)', 'petal width (cm)']


The data itself is contained in the target and data fields. The data field contains the numeric measurements of sepal length, sepal width, petal length, and petal width in a NumPy array:

In [None]:
print("Type of data: {}".format(type(iris_dataset['data'])))

Type of data: <class 'numpy.ndarray'>


The rows in this data array correspond to the flowers, while the columns represent the four measurements that were taken for each flower:

In [None]:
print("shape of data: {}".format(iris_dataset['data'].shape))

shape of data: (150, 4)


In [None]:
print("First five rows of data: \n{}".format(iris_dataset['data'][:5]))

First five rows of data: 
[[5.1 3.5 1.4 0.2]
 [4.9 3.  1.4 0.2]
 [4.7 3.2 1.3 0.2]
 [4.6 3.1 1.5 0.2]
 [5.  3.6 1.4 0.2]]
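Since pandas is imported at the top of the notebook, one optional way to make the row/column layout easier to read is to wrap the array in a DataFrame with the feature names as column labels. This is a sketch only: the five rows and the feature names are copied from the outputs above, and the names `first_rows` and `df` are illustrative.

```python
import numpy as np
import pandas as pd

# Feature names and first five rows copied from the printed outputs above.
feature_names = ['sepal length (cm)', 'sepal width (cm)',
                 'petal length (cm)', 'petal width (cm)']
first_rows = np.array([[5.1, 3.5, 1.4, 0.2],
                       [4.9, 3.0, 1.4, 0.2],
                       [4.7, 3.2, 1.3, 0.2],
                       [4.6, 3.1, 1.5, 0.2],
                       [5.0, 3.6, 1.4, 0.2]])

# Each row is one flower, each labeled column is one measurement.
df = pd.DataFrame(first_rows, columns=feature_names)
print(df)
```

The same call should work on the full array by passing iris_dataset['data'] in place of `first_rows`.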


The target array contains the species of each of the flowers that were measured, also as a NumPy array:

In [None]:
print("Type of target: {}".format(type(iris_dataset['target'])))

Type of target: <class 'numpy.ndarray'>


In [None]:
print("Shape of target: {}".format(iris_dataset['target'].shape))

Shape of target: (150,)


In [None]:
print("Target:\n{}".format(iris_dataset['target']))

Target:
[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2
 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
 2 2]


In the output above, the meanings of the numbers are given by the iris_dataset['target_names'] array: 0 means setosa, 1 means versicolor, and 2 means virginica.
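The integer-to-name mapping can be applied to a whole array at once with NumPy integer indexing. The sketch below hard-codes the species names from the target_names output shown earlier; the `labels` array is a hypothetical handful of encoded targets, not data from the notebook.

```python
import numpy as np

# Species names as printed for target_names earlier in the notebook.
target_names = np.array(['setosa', 'versicolor', 'virginica'])

# Hypothetical encoded labels, for illustration only.
labels = np.array([0, 2, 1, 0])

# Indexing with an integer array looks up one name per label.
species = target_names[labels]
print(species)
```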

**Training and testing**

The train_test_split function returns X_train, X_test, y_train, and y_test, which are all NumPy arrays. By default, X_train contains 75% of the rows of the dataset, and X_test contains the remaining 25%. Setting random_state=0 makes the shuffle reproducible.

In [None]:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(
    iris_dataset['data'], iris_dataset['target'], random_state=0)

In [None]:
print("X_train shape: {}".format(X_train.shape))
print("y_train shape: {}".format(y_train.shape))
print("X_test shape: {}".format(X_test.shape))
print("y_test shape: {}".format(y_test.shape))

X_train shape: (112, 4)
y_train shape: (112,)
X_test shape: (38, 4)
y_test shape: (38,)
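The 75%/25% split is the default behavior of train_test_split and can also be written out explicitly with test_size. As an optional variation that the notebook itself does not use, passing stratify keeps the three species roughly balanced in the test set. The sketch below assumes scikit-learn is installed; the names `X_tr`, `X_te`, `y_tr`, `y_te`, and `test_counts` are illustrative.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

iris = load_iris()

# test_size=0.25 spells out the default; stratify is an optional addition
# (not used in the notebook) that preserves class proportions.
X_tr, X_te, y_tr, y_te = train_test_split(
    iris['data'], iris['target'],
    test_size=0.25, stratify=iris['target'], random_state=0)

# Count how many test samples fall into each of the three classes.
test_counts = np.bincount(y_te)
print("Train shape: {}, test shape: {}".format(X_tr.shape, X_te.shape))
print("Test samples per class: {}".format(test_counts))
```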


We use the k-nearest neighbors classifier as our machine learning model, here with a single neighbor (n_neighbors=1):

In [None]:
from sklearn.neighbors import KNeighborsClassifier
knn = KNeighborsClassifier(n_neighbors=1)

Train the model on the training data:

In [None]:
knn.fit(X_train, y_train)

KNeighborsClassifier(algorithm='auto', leaf_size=30, metric='minkowski',
                     metric_params=None, n_jobs=None, n_neighbors=1, p=2,
                     weights='uniform')
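To give a feel for what a 1-nearest-neighbor classifier does at prediction time, here is a rough from-scratch sketch in NumPy. It is not scikit-learn's implementation, only an illustration of the idea with Euclidean distance (minkowski with p=2, as in the parameters printed above). The function `predict_1nn` and the three-row toy training set are hypothetical.

```python
import numpy as np

def predict_1nn(X_train, y_train, X_query):
    """Return, for each query row, the label of its closest training row."""
    # Differences between every query row and every training row,
    # shape (n_query, n_train, n_features).
    diffs = X_query[:, np.newaxis, :] - X_train[np.newaxis, :, :]
    # Euclidean distances, shape (n_query, n_train).
    dists = np.sqrt((diffs ** 2).sum(axis=2))
    # Index of the nearest training row for each query row.
    nearest = dists.argmin(axis=1)
    return y_train[nearest]

# Toy training set chosen for illustration: one row per class label.
X_toy = np.array([[5.1, 3.5, 1.4, 0.2],
                  [6.4, 3.2, 4.5, 1.5],
                  [6.3, 3.3, 6.0, 2.5]])
y_toy = np.array([0, 1, 2])

pred = predict_1nn(X_toy, y_toy, np.array([[5.0, 2.9, 1.0, 0.2]]))
print(pred)  # [0]
```

Because "training" here only stores the data, all the work happens when a prediction is requested.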

**Making Predictions**

In [None]:
X_new = np.array([[5, 2.9, 1, 0.2]])
print("X_new.shape: {}".format(X_new.shape))

X_new.shape: (1, 4)
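scikit-learn expects a two-dimensional array of shape (n_samples, n_features), which is why the single flower above is written with double brackets. If a measurement arrives as a flat one-dimensional array, it can be reshaped first. A minimal sketch, with the illustrative names `measurement` and `X_single`:

```python
import numpy as np

# A single flower stored as a flat 1-D array of four measurements.
measurement = np.array([5, 2.9, 1, 0.2])
print(measurement.shape)  # (4,)

# reshape(1, -1) turns it into one row with four columns.
X_single = measurement.reshape(1, -1)
print(X_single.shape)  # (1, 4)
```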


To make a prediction, we call the predict method of the knn object:

In [None]:
prediction = knn.predict(X_new)

print("Prediction: {}".format(prediction))
print("Predicted target name: {}".format(iris_dataset['target_names'][prediction]))

Prediction: [0]
Predicted target name: ['setosa']


**Evaluating the Model**

In [None]:
y_pred = knn.predict(X_test)
print("Test set prediction:\n{}".format(y_pred))

Test set prediction:
[2 1 0 2 0 2 0 1 1 1 2 1 1 1 1 0 1 1 0 0 2 1 0 0 2 0 0 1 1 0 2 1 0 2 2 1 0
 2]


In [None]:
print("Test set score: {}".format(np.mean(y_pred == y_test)))

Test set score: 0.9736842105263158
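The fraction of correct predictions computed above is the accuracy, which scikit-learn can also compute directly. The sketch below rebuilds the same split and model so that it runs on its own, then compares three equivalent ways of getting the number; the variable names `manual`, `via_score`, and `via_metric` are illustrative.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Rebuild the split and the model used in this notebook.
iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris['data'], iris['target'], random_state=0)
knn = KNeighborsClassifier(n_neighbors=1).fit(X_train, y_train)
y_pred = knn.predict(X_test)

# Three ways to compute the same test-set accuracy.
manual = np.mean(y_pred == y_test)
via_score = knn.score(X_test, y_test)
via_metric = accuracy_score(y_test, y_pred)
print("manual: {}, score: {}, accuracy_score: {}".format(
    manual, via_score, via_metric))
```

The score method is the shortest form, since it runs predict internally.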
