# **IRIS FLOWER CLASSIFICATION**

**Problem Statement**:

• The Iris flower dataset contains measurements for three species: setosa, versicolor, and virginica. The species can be distinguished by their sepal and petal measurements. The objective is to train a machine learning model on these labeled measurements so that it can accurately classify Iris flowers by species.

• Use the Iris dataset to develop a model that classifies flowers into species from four features: sepal length, sepal width, petal length, and petal width. This dataset is widely used for introductory classification tasks.

In [96]:
import pandas as pd
import warnings
warnings.filterwarnings('ignore')

In [97]:
url = "https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data"

In [98]:
column_names = ['sepal_length', 'sepal_width', 'petal_length', 'petal_width', 'species']

In [99]:
df = pd.read_csv(url, names=column_names)

In [100]:
print(df.head())   #showing first 5 rows of the data

   sepal_length  sepal_width  petal_length  petal_width      species
0           5.1          3.5           1.4          0.2  Iris-setosa
1           4.9          3.0           1.4          0.2  Iris-setosa
2           4.7          3.2           1.3          0.2  Iris-setosa
3           4.6          3.1           1.5          0.2  Iris-setosa
4           5.0          3.6           1.4          0.2  Iris-setosa


In [101]:
df.describe()   #exploring the data

       sepal_length  sepal_width  petal_length  petal_width
count         150.0        150.0         150.0        150.0
mean       5.843333        3.054      3.758667     1.198667
std        0.828066     0.433594       1.76442     0.763161
min             4.3          2.0           1.0          0.1
25%             5.1          2.8           1.6          0.3
50%             5.8          3.0          4.35          1.3
75%             6.4          3.3           5.1          1.8
max             7.9          4.4           6.9          2.5


In [102]:
df['species'].unique()  #showing unique species

In [103]:
df.info()  # checking column dtypes and null values

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 150 entries, 0 to 149
Data columns (total 5 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   sepal_length  150 non-null    float64
 1   sepal_width   150 non-null    float64
 2   petal_length  150 non-null    float64
 3   petal_width   150 non-null    float64
 4   species       150 non-null    object 
dtypes: float64(4), object(1)
memory usage: 6.0+ KB


In [104]:
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import classification_report, accuracy_score

In [105]:
X = df.drop('species', axis=1)
y = df['species']

In [106]:
# splitting the data into training (80%) and testing (20%) sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

In [107]:
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

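Now we use K-nearest neighbors (KNN), which classifies a data point according to the labels of the training points closest to it. Because KNN relies on distances, the features were standardized in the previous step so that no single measurement dominates.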
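To make the idea of classifying by proximity concrete, here is a minimal from-scratch sketch of KNN with Euclidean distance and a simple majority vote. It is an illustration only, not scikit-learn's implementation; the function name `knn_predict` and the toy points are hypothetical and are not taken from the dataset.

```python
from collections import Counter
import math

def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Euclidean distance from the query to every training point
    distances = [
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    ]
    # Keep the k closest neighbours
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    # Majority vote over the neighbours' labels
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical toy data: (petal_length, petal_width) pairs, made up for illustration
toy_points = [(1.4, 0.2), (1.3, 0.2), (1.5, 0.3), (4.7, 1.4), (4.5, 1.5), (4.9, 1.5)]
toy_labels = ["setosa", "setosa", "setosa", "versicolor", "versicolor", "versicolor"]

print(knn_predict(toy_points, toy_labels, (1.6, 0.25)))  # setosa
print(knn_predict(toy_points, toy_labels, (4.6, 1.3)))   # versicolor
```

With `k=3`, each query takes the label shared by the majority of its three closest training points, which is the same rule `KNeighborsClassifier(n_neighbors=3)` applies with its default settings.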

In [108]:
knn = KNeighborsClassifier(n_neighbors=3)  # Initializing the KNN classifier
knn.fit(X_train_scaled, y_train)  # Training the model

In [109]:
y_pred = knn.predict(X_test_scaled)  # uses the trained model to make predictions on the test set.

In [110]:
# model evaluation

accuracy = accuracy_score(y_test, y_pred)  # fraction of test samples classified correctly

classification_rep = classification_report(y_test, y_pred)  # per-class precision, recall, and F1-score

In [111]:
print("Accuracy:", accuracy)
print("Classification Report:\n", classification_rep)

Accuracy: 1.0
Classification Report:
                  precision    recall  f1-score   support

    Iris-setosa       1.00      1.00      1.00        10
Iris-versicolor       1.00      1.00      1.00         9
 Iris-virginica       1.00      1.00      1.00        11

       accuracy                           1.00        30
      macro avg       1.00      1.00      1.00        30
   weighted avg       1.00      1.00      1.00        30



Our K-nearest neighbors (KNN) model achieves an accuracy of 1.0 on the test set: all 30 held-out flowers are classified correctly. The classification report confirms this, with precision, recall, and F1-score of 1.00 for each of the three species. Since the test set holds only 30 samples from a single split (`random_state=42`), this score describes this particular split rather than guaranteeing perfect performance on new data.
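The precision, recall, and F1-score in the report can also be computed by hand. The sketch below does this for a single class. The helper `per_class_metrics` and the toy labels are hypothetical; the labels deliberately include one mistake so that the three metrics differ.

```python
def per_class_metrics(y_true, y_pred, positive):
    """Return (precision, recall, f1) for one class, treating it as the positive label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)  # true positives
    fp = sum(t != positive and p == positive for t, p in pairs)  # false positives
    fn = sum(t == positive and p != positive for t, p in pairs)  # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Hypothetical labels: one versicolor flower is misclassified as virginica
y_true_toy = ["setosa", "setosa", "versicolor", "versicolor", "virginica", "virginica"]
y_pred_toy = ["setosa", "setosa", "versicolor", "virginica", "virginica", "virginica"]

for species in ["setosa", "versicolor", "virginica"]:
    p, r, f = per_class_metrics(y_true_toy, y_pred_toy, species)
    print(f"{species:<12} precision={p:.2f} recall={r:.2f} f1={f:.2f}")
```

In this toy case versicolor has perfect precision but a recall of 0.50, while virginica has perfect recall but lower precision. When every prediction is correct, as in the report above, all three metrics equal 1.00 for every class.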