# Iris Flower Species Classification Documentation

## Introduction

The Iris Flower Species Classification project aims to develop a machine learning model capable of classifying Iris flowers into their respective species: setosa, versicolor, and virginica. The classification is based on measurements of sepal and petal length and width.

## Dataset: Iris

### Source

The Iris dataset is a well-known dataset available in the scikit-learn library and various other data sources.

### Description

The dataset contains 150 samples of Iris flowers, with 50 samples from each of the three species (setosa, versicolor, virginica). It includes four features: sepal length, sepal width, petal length, and petal width, all measured in centimeters.

### Objective

The goal is to train a machine learning model to predict the species of an Iris flower based on its sepal and petal measurements.
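The dataset description can be checked with a short sketch. It assumes the copy of Iris bundled with scikit-learn (`load_iris`) rather than the `IRIS.csv` file used in this notebook, so column names and species labels differ slightly from the CSV (for example `setosa` instead of `Iris-setosa`).

```python
from sklearn.datasets import load_iris

# Load the copy of the Iris data that ships with scikit-learn as a DataFrame.
iris = load_iris(as_frame=True)
features = iris.data  # four measurement columns, in cm

# Translate the integer targets (0, 1, 2) into species names.
species = iris.target.map(dict(enumerate(iris.target_names)))

print(features.shape)          # (rows, columns)
print(species.value_counts())  # number of samples per species
```

If the bundled data matches the description, this reports 150 rows, 4 feature columns, and 50 samples for each species.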


In [18]:
# Load the dataset with the pandas library
import pandas as pd
df_1=pd.read_csv("IRIS.csv")
df_1.head()

Unnamed: 0,sepal_length,sepal_width,petal_length,petal_width,species
0,5.1,3.5,1.4,0.2,Iris-setosa
1,4.9,3.0,1.4,0.2,Iris-setosa
2,4.7,3.2,1.3,0.2,Iris-setosa
3,4.6,3.1,1.5,0.2,Iris-setosa
4,5.0,3.6,1.4,0.2,Iris-setosa


In [2]:
df_1.shape

(150, 5)

In [3]:
df_1["species"].unique()

array(['Iris-setosa', 'Iris-versicolor', 'Iris-virginica'], dtype=object)

In [19]:
# Convert the "species" labels ('Iris-setosa', 'Iris-versicolor', 'Iris-virginica')
# into numeric codes and store them in a new column, "species_numerical_categories"
import numpy as np
condition = [
    df_1["species"] == "Iris-setosa",
    df_1["species"] == "Iris-versicolor",
    df_1["species"] == "Iris-virginica",
]
category = [1, 2, 3]
df_1["species_numerical_categories"] = np.select(condition, category).astype('int64')
df_1["species_numerical_categories"].unique()

array([1, 2, 3], dtype=int64)

In [5]:
df_1

Unnamed: 0,sepal_length,sepal_width,petal_length,petal_width,species,species_numerical_categories
0,5.1,3.5,1.4,0.2,Iris-setosa,1
1,4.9,3.0,1.4,0.2,Iris-setosa,1
2,4.7,3.2,1.3,0.2,Iris-setosa,1
3,4.6,3.1,1.5,0.2,Iris-setosa,1
4,5.0,3.6,1.4,0.2,Iris-setosa,1
...,...,...,...,...,...,...
145,6.7,3.0,5.2,2.3,Iris-virginica,3
146,6.3,2.5,5.0,1.9,Iris-virginica,3
147,6.5,3.0,5.2,2.0,Iris-virginica,3
148,6.2,3.4,5.4,2.3,Iris-virginica,3
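`np.select` falls back to its default value (0) for any row that matches none of the conditions, so a misspelled label does not raise an error. One possible alternative, sketched here on a small stand-in Series rather than on `df_1`, is a dictionary lookup with `Series.map`, which leaves unmatched labels as NaN so they can be checked explicitly. The names `species_to_code` and `codes` are illustrative.

```python
import pandas as pd

# Small stand-in for the "species" column of IRIS.csv (illustrative data only).
species = pd.Series(
    ["Iris-setosa", "Iris-versicolor", "Iris-virginica", "Iris-setosa"],
    name="species",
)

# The same 1/2/3 coding, written as a lookup table.
species_to_code = {"Iris-setosa": 1, "Iris-versicolor": 2, "Iris-virginica": 3}
codes = species.map(species_to_code)

# Series.map yields NaN for labels missing from the dictionary, so a typo in a
# key is caught here instead of silently becoming a default value.
assert codes.notna().all(), "unmapped species label found"
codes = codes.astype("int64")

print(codes.tolist())  # [1, 2, 3, 1]
```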


In [6]:
# Drop the original "species" column now that the numeric codes exist
df_1.drop("species", axis=1, inplace=True)

In [7]:
df_1.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 150 entries, 0 to 149
Data columns (total 5 columns):
 #   Column                        Non-Null Count  Dtype  
---  ------                        --------------  -----  
 0   sepal_length                  150 non-null    float64
 1   sepal_width                   150 non-null    float64
 2   petal_length                  150 non-null    float64
 3   petal_width                   150 non-null    float64
 4   species_numerical_categories  150 non-null    int64  
dtypes: float64(4), int64(1)
memory usage: 6.0 KB


In [11]:
df_1.head()

Unnamed: 0,sepal_length,sepal_width,petal_length,petal_width,species_numerical_categories
0,5.1,3.5,1.4,0.2,1
1,4.9,3.0,1.4,0.2,1
2,4.7,3.2,1.3,0.2,1
3,4.6,3.1,1.5,0.2,1
4,5.0,3.6,1.4,0.2,1


In [8]:
# Import the train_test_split function from scikit-learn
from sklearn.model_selection import train_test_split
# Split the DataFrame df_1 into training and testing sets
# x_train: Training set of feature variables (80% of the data)
# x_test: Testing set of feature variables (20% of the data)
# y_train: Training set of target variables (80% of the data)
# y_test: Testing set of target variables (20% of the data)
# test_size=0.2: Specifies that 20% of the data should be reserved for testing
# random_state=2: Sets a random seed for reproducibility
x_train, x_test, y_train, y_test = train_test_split(
    df_1.iloc[:, 0:4], df_1.iloc[:, -1], test_size=0.2, random_state=2
)
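The effect of `test_size=0.2` on 150 rows can be seen in a small sketch. It uses random stand-in features instead of `df_1`, and it adds the `stratify` argument, which the cell above does not use; with `stratify=y` each species keeps the same share in both subsets.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy stand-in for the Iris table: 150 rows, 4 features, 3 balanced classes.
rng = np.random.default_rng(0)
X = rng.normal(size=(150, 4))
y = np.repeat([1, 2, 3], 50)

# stratify=y is an addition for illustration; it keeps class proportions equal
# in the training and test subsets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=2, stratify=y
)

print(X_train.shape, X_test.shape)  # 120 training rows, 30 test rows
print(np.bincount(y_test)[1:])      # test rows per class: 10 each
```

Without `stratify`, the split sizes are the same but the per-class counts in the test set can vary with the random seed.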

In [13]:
# Import the KNeighborsClassifier from scikit-learn
from sklearn.neighbors import KNeighborsClassifier
# Create a K-nearest neighbors (KNN) classifier with 3 neighbors
knn=KNeighborsClassifier(n_neighbors=3)
# Train the KNN classifier on the training data
# x_train: Training set of feature variables
# y_train: Training set of target variables
knn.fit(x_train,y_train)
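To make the idea behind the classifier concrete, here is a minimal from-scratch sketch of k-nearest-neighbors prediction: measure the Euclidean distance to every training point, take the k closest, and return the most common label among them. The helper `knn_predict` and the toy data are hypothetical. This is not scikit-learn's implementation, which uses optimized neighbor search and its own tie-breaking.

```python
import numpy as np
from collections import Counter

def knn_predict(x_train, y_train, x_new, k=3):
    """Predict one label by majority vote among the k nearest training points."""
    # Euclidean distance from x_new to every training row.
    distances = np.linalg.norm(x_train - x_new, axis=1)
    # Indices of the k smallest distances.
    nearest = np.argsort(distances)[:k]
    # Most common label among those neighbors.
    return Counter(y_train[nearest].tolist()).most_common(1)[0][0]

# Tiny illustrative training set (not taken from IRIS.csv):
# columns are petal length and petal width.
toy_x = np.array([[1.4, 0.2], [1.5, 0.2], [4.5, 1.5],
                  [4.7, 1.4], [5.9, 2.1], [6.0, 2.5]])
toy_y = np.array([1, 1, 2, 2, 3, 3])

print(knn_predict(toy_x, toy_y, np.array([1.3, 0.3])))  # 1
```

With k=3, the query point's three nearest neighbors carry the labels 1, 1, and 2, so the vote returns 1.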

In [14]:
# Import the accuracy_score function from scikit-learn metrics
from sklearn.metrics import accuracy_score
# Use the trained KNN classifier (knn) to predict the labels for the testing data
y_predict= knn.predict(x_test)
# Calculate the accuracy of the model's predictions
# y_test: The actual target values from the testing set
# y_predict: The predicted target values from the model
accuracy=accuracy_score(y_test,y_predict)
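For classification, accuracy is the fraction of test samples whose predicted label equals the true label. The sketch below computes that by hand; `manual_accuracy` is a hypothetical helper and the labels are made up for illustration.

```python
import numpy as np

def manual_accuracy(y_true, y_pred):
    """Fraction of positions where the predicted label equals the true label."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    return float(np.mean(y_true == y_pred))

# Illustrative labels only, not output from the model above.
print(manual_accuracy([1, 2, 3, 3], [1, 2, 3, 2]))  # 3 of 4 correct -> 0.75
```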

In [15]:
accuracy

1.0
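A single 30-row test split gives a coarse estimate, since each misclassified flower moves the accuracy by about 3.3 percentage points. One way to get a steadier figure is k-fold cross-validation. The sketch below is an assumption-laden illustration: it uses scikit-learn's bundled Iris data in place of `IRIS.csv`, and five stratified folds chosen arbitrarily.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# scikit-learn's bundled Iris data stands in for IRIS.csv here (an assumption).
X, y = load_iris(return_X_y=True)

# Five stratified folds: each fold holds out 30 rows, 10 per species.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=2)
scores = cross_val_score(KNeighborsClassifier(n_neighbors=3), X, y, cv=cv)

print(scores)         # one accuracy value per fold
print(scores.mean())  # average accuracy across folds
```

The spread of the per-fold scores indicates how much the accuracy depends on which rows happen to land in the test set.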