<h1 style="font-family: 'poppins'; font-weight: bold; color: Green;">👨‍💻Author: Mr. Maimoon Amin</h1>

[![GitHub](https://img.shields.io/badge/GitHub-Profile-blue?style=for-the-badge&logo=github)](https://github.com/Maimoon-github)
[![Kaggle](https://img.shields.io/badge/Kaggle-Profile-blue?style=for-the-badge&logo=kaggle)](https://www.kaggle.com/maimoon7)
[![LinkedIn](https://img.shields.io/badge/LinkedIn-Profile-blue?style=for-the-badge&logo=linkedin)](https://www.linkedin.com/in/maimoon-amin-6a2aa4275/)
[![Email](https://img.shields.io/badge/Email-Contact%20Me-red?style=for-the-badge&logo=email)](mailto:ideal.rhel@gmail.com)

# **TASK 1**

# **Overview of Iris Flower Classification**

The iris flower has three species: setosa, versicolor, and virginica, which differ in their measurements. Suppose you have measurements of iris flowers labeled with their species. Your task is to train a machine learning model that learns from these measurements and classifies the species. Although the Scikit-learn library ships with an iris dataset, you can also download the same [dataset](https://www.kaggle.com/datasets/saurabh00007/iriscsv) from Kaggle for the task of iris flower classification with Machine Learning.
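For reference, here is a minimal sketch of loading scikit-learn's bundled copy instead of the CSV. It assumes scikit-learn 0.23 or newer (needed for `as_frame=True`). The bundled version has no `Id` column, uses column names such as `sepal length (cm)` rather than `SepalLengthCm`, and stores the target as integers with the species names in `target_names`, so the cells below would need small adjustments if you use it.

```python
from sklearn.datasets import load_iris

# Load the bundled Iris data as pandas objects (as_frame requires scikit-learn >= 0.23)
bunch = load_iris(as_frame=True)
features = bunch.data        # DataFrame: 150 rows x 4 measurement columns
target = bunch.target        # Series of integer labels 0, 1, 2
names = bunch.target_names   # array of species names indexed by label

print(features.shape)
print(list(features.columns))
print(list(names))
```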

In [1]:
# Import the required libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import classification_report, confusion_matrix

# ignore warnings
import warnings
warnings.filterwarnings("ignore")

## **Loading Dataset**

In [2]:
iris = pd.read_csv("Iris.csv")

In [3]:
iris.head()

|   | Id | SepalLengthCm | SepalWidthCm | PetalLengthCm | PetalWidthCm | Species |
|---|---|---|---|---|---|---|
| 0 | 1 | 5.1 | 3.5 | 1.4 | 0.2 | Iris-setosa |
| 1 | 2 | 4.9 | 3.0 | 1.4 | 0.2 | Iris-setosa |
| 2 | 3 | 4.7 | 3.2 | 1.3 | 0.2 | Iris-setosa |
| 3 | 4 | 4.6 | 3.1 | 1.5 | 0.2 | Iris-setosa |
| 4 | 5 | 5.0 | 3.6 | 1.4 | 0.2 | Iris-setosa |


In [4]:
iris.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 150 entries, 0 to 149
Data columns (total 6 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   Id             150 non-null    int64  
 1   SepalLengthCm  150 non-null    float64
 2   SepalWidthCm   150 non-null    float64
 3   PetalLengthCm  150 non-null    float64
 4   PetalWidthCm   150 non-null    float64
 5   Species        150 non-null    object 
dtypes: float64(4), int64(1), object(1)
memory usage: 7.2+ KB


In [5]:
iris.describe()

|       | Id | SepalLengthCm | SepalWidthCm | PetalLengthCm | PetalWidthCm |
|---|---|---|---|---|---|
| count | 150.0 | 150.0 | 150.0 | 150.0 | 150.0 |
| mean | 75.5 | 5.843333 | 3.054 | 3.758667 | 1.198667 |
| std | 43.445368 | 0.828066 | 0.433594 | 1.76442 | 0.763161 |
| min | 1.0 | 4.3 | 2.0 | 1.0 | 0.1 |
| 25% | 38.25 | 5.1 | 2.8 | 1.6 | 0.3 |
| 50% | 75.5 | 5.8 | 3.0 | 4.35 | 1.3 |
| 75% | 112.75 | 6.4 | 3.3 | 5.1 | 1.8 |
| max | 150.0 | 7.9 | 4.4 | 6.9 | 2.5 |


This output provides a summary of the statistics for each feature in the Iris dataset. Here’s what each row means:

1. **Count**: Number of entries for each feature (150 samples).

2. **Mean**: The average value for each feature:
   - Sepal Length: 5.843 cm
   - Sepal Width: 3.054 cm
   - Petal Length: 3.759 cm
   - Petal Width: 1.199 cm

3. **Standard Deviation (std)**: How much the values vary from the mean:
   - Sepal Length has a std of 0.828 cm, meaning a typical value lies roughly 0.83 cm from the mean.
   - Sepal Width, Petal Length, and Petal Width have std values of 0.434, 1.764, and 0.763, respectively.

4. **Minimum (min)**: The smallest value for each feature.

5. **25%, 50%, 75% (Quartiles)**:
   - 25% (1st quartile): 25% of values fall below this.
   - 50% (Median): The middle value for each feature.
   - 75% (3rd quartile): 75% of values fall below this.

6. **Maximum (max)**: The largest value for each feature.

This summary gives you an idea of the distribution and spread of each feature, helping you understand the dataset better. For example, petal length has by far the largest spread (std 1.764 cm), and relative to their means both petal measurements vary much more than the sepal measurements do.
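Raw standard deviations are hard to compare across features whose means differ, so a common unit-free alternative is the coefficient of variation: the standard deviation divided by the mean. The sketch below computes it from the mean and std values in the `describe()` output above, copied in by hand; the dictionary layout is just an illustrative choice.

```python
# Mean and standard deviation (cm) copied by hand from the iris.describe() output
stats = {
    "SepalLengthCm": (5.843333, 0.828066),
    "SepalWidthCm": (3.054000, 0.433594),
    "PetalLengthCm": (3.758667, 1.764420),
    "PetalWidthCm": (1.198667, 0.763161),
}

# Coefficient of variation = std / mean, a unit-free measure of relative spread
cv = {name: std / mean for name, (mean, std) in stats.items()}

# Print features from most to least variable relative to their mean
for name, value in sorted(cv.items(), key=lambda item: item[1], reverse=True):
    print(f"{name:15s} {value:.2f}")
```

On these numbers the petal measurements come out at about 0.47 (length) and 0.64 (width), against roughly 0.14 for both sepal measurements.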

Now let’s draw a scatter plot of sepal length against sepal width, with each point colored by species:

In [6]:
import plotly.express as px
fig = px.scatter(iris, x="SepalWidthCm", y="SepalLengthCm", color="Species")
fig.show()
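matplotlib is imported in the first cell but never used. If plotly is unavailable, a static version of the same plot could be sketched with it as below. To keep the example self-contained, it loads scikit-learn's bundled copy (column names like `sepal width (cm)`, integer targets) rather than `Iris.csv`; with the notebook's `iris` DataFrame you would filter on `Species` and plot `SepalWidthCm` against `SepalLengthCm` instead.

```python
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris

data = load_iris(as_frame=True)
df = data.frame  # four measurement columns plus an integer "target" column

fig, ax = plt.subplots()
# One scatter call per species so each gets its own color and legend entry
for label, name in enumerate(data.target_names):
    subset = df[df["target"] == label]
    ax.scatter(subset["sepal width (cm)"], subset["sepal length (cm)"], label=name)

ax.set_xlabel("Sepal width (cm)")
ax.set_ylabel("Sepal length (cm)")
ax.legend(title="Species")
# plt.show() displays the figure when running outside a notebook
```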

# **Iris Classification Model**

Now let’s train a machine learning model for the task of classifying iris species. Here, I will first encode the species labels and split the data into training and test sets, then standardize the features and train the iris classification model with the k-nearest neighbors (KNN) classification algorithm:

In [7]:
# split the data into features and target variable
# note: X still includes the Id column, which follows the species order of the file
X = iris.drop("Species", axis=1)
y = iris["Species"]
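Keeping `Id` as a feature is risky because the rows are stored in blocks of 50 flowers per species, so the row number alone predicts the label. The Kaggle file appears to follow this standard ordering (Id 1 to 5 are all setosa in the `head()` output above). The sketch below uses scikit-learn's bundled copy, which has the same ordering, and shows that a rule using nothing but an Id-like row number labels every flower correctly. Dropping the column with `iris.drop(columns=["Id", "Species"])` avoids this, at the cost of giving the demo input at the end four values instead of five.

```python
import numpy as np
from sklearn.datasets import load_iris

# Labels in file order: rows 1-50 setosa (0), 51-100 versicolor (1), 101-150 virginica (2)
y_ordered = load_iris().target

# A "classifier" that looks only at an Id-like row number and ignores every measurement
row_id = np.arange(1, len(y_ordered) + 1)  # 1, 2, ..., 150
pred_from_id = (row_id - 1) // 50          # 1-50 -> 0, 51-100 -> 1, 101-150 -> 2

print("accuracy from Id alone:", (pred_from_id == y_ordered).mean())
```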

In [8]:
# encode the y variable
from sklearn.preprocessing import LabelEncoder
le = LabelEncoder()
y = le.fit_transform(y)
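`LabelEncoder` assigns integer codes in sorted (alphabetical) order of the class names, which is why 0, 1 and 2 in the outputs below stand for `Iris-setosa`, `Iris-versicolor` and `Iris-virginica`. A small standalone check on a made-up list of labels:

```python
from sklearn.preprocessing import LabelEncoder

# Made-up labels in arbitrary order
species = ["Iris-virginica", "Iris-setosa", "Iris-versicolor", "Iris-setosa"]

le_demo = LabelEncoder()
codes = le_demo.fit_transform(species)

# classes_ is sorted, and each label's code is its position in classes_
print(list(le_demo.classes_))
print(codes.tolist())
# inverse_transform maps codes back to the original names
print(le_demo.inverse_transform([0, 2]).tolist())
```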

In [9]:
# split X and y into training (80%) and test (20%) sets; random_state makes the shuffle reproducible
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

In [10]:
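# standardize the features: fit on the training set only, then apply the same transform to the test set
scaler = StandardScaler()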
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
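Fitting the scaler on the training set only keeps test-set statistics out of preprocessing. `StandardScaler` computes z = (x - mean) / std per feature, using the training mean and the population standard deviation (ddof=0). The sketch below checks that on made-up toy arrays:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Toy training and test matrices (rows = samples, columns = features); values are made up
X_train_toy = np.array([[5.0, 1.0], [6.0, 2.0], [7.0, 6.0]])
X_test_toy = np.array([[6.5, 3.0]])

scaler_toy = StandardScaler().fit(X_train_toy)

# Manual standardization with the training mean and population std (ddof=0)
mean = X_train_toy.mean(axis=0)
std = X_train_toy.std(axis=0)
manual = (X_test_toy - mean) / std

print(np.allclose(scaler_toy.transform(X_test_toy), manual))
```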

In [11]:
classifier = KNeighborsClassifier(n_neighbors=5)
classifier.fit(X_train, y_train)
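To show what `KNeighborsClassifier(n_neighbors=5)` does at prediction time, here is a minimal from-scratch sketch: compute the distance from a query point to every training point, take the k closest, and return the majority label. It uses Euclidean distance with equal votes, which matches scikit-learn's defaults (`metric="minkowski"`, `p=2`, `weights="uniform"`), but it ignores scikit-learn's tie-breaking rules and fast neighbor-search structures. `knn_predict` and the toy points are illustrative only.

```python
from collections import Counter

import numpy as np


def knn_predict(X_train, y_train, x, k=5):
    """Classify one sample x by majority vote among its k nearest training samples."""
    # Euclidean distance from x to every training sample
    distances = np.linalg.norm(X_train - x, axis=1)
    # Indices of the k closest training samples
    nearest = np.argsort(distances)[:k]
    # Most common label among those neighbors
    return Counter(y_train[nearest].tolist()).most_common(1)[0][0]


# Made-up 2-D points: class 0 near the origin, class 1 near (5, 5)
X_toy = np.array([[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]], dtype=float)
y_toy = np.array([0, 0, 0, 1, 1, 1])

print(knn_predict(X_toy, y_toy, np.array([0.5, 0.5]), k=3))
print(knn_predict(X_toy, y_toy, np.array([4.5, 5.0]), k=3))
```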

### Now let’s use the trained model to predict the species of the flowers in the test set:

In [12]:
y_pred = classifier.predict(X_test)
print(f"Predicted species: {y_pred}")

Predicted species: [1 0 2 1 1 0 1 2 1 1 2 0 0 0 0 1 2 1 1 2 0 2 0 2 2 2 2 2 0 0]


In [13]:
print(confusion_matrix(y_test, y_pred))
print(classification_report(y_test, y_pred))

[[10  0  0]
 [ 0  9  0]
 [ 0  0 11]]
              precision    recall  f1-score   support

           0       1.00      1.00      1.00        10
           1       1.00      1.00      1.00         9
           2       1.00      1.00      1.00        11

    accuracy                           1.00        30
   macro avg       1.00      1.00      1.00        30
weighted avg       1.00      1.00      1.00        30



This output shows the performance of your Iris Classification model on the test set. Here’s a breakdown:

1. **Confusion Matrix** (top part with [[10, 0, 0], [0, 9, 0], [0, 0, 11]]):
   - This matrix shows the number of correct and incorrect predictions for each class.
   - Each row represents the actual class, and each column represents the predicted class.
   - The diagonal values (10, 9, 11) represent correct predictions for each class, while zeros elsewhere mean there were no incorrect predictions.

2. **Classification Report** (bottom part):
   - **Precision**: Of all samples the model predicted as a given class, the fraction that truly belong to it, TP / (TP + FP) (1.00 means no false positives).
   - **Recall**: Of all samples that actually belong to a class, the fraction the model identified correctly, TP / (TP + FN).
   - **F1-score**: The harmonic mean of precision and recall, where 1.00 indicates perfect precision and recall.
   - **Support**: The number of samples for each class in the test data (10, 9, and 11 samples, respectively).

3. **Overall Accuracy**: 1.00 or 100%, meaning the model predicted all test samples correctly.
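The per-class numbers in the report can be recomputed directly from a confusion matrix. With the notebook's diagonal matrix every value is 1.00, so the sketch below uses a made-up, imperfect 3-class matrix to make the arithmetic visible; rows are actual classes and columns are predicted classes, as in scikit-learn's `confusion_matrix`.

```python
import numpy as np

# Hypothetical confusion matrix (not from this notebook): rows = actual, columns = predicted
cm = np.array([[10, 0, 0],
               [0, 7, 2],
               [0, 1, 10]])

tp = np.diag(cm)                 # correctly classified samples per class
precision = tp / cm.sum(axis=0)  # TP / (TP + FP): column totals = all predictions of the class
recall = tp / cm.sum(axis=1)     # TP / (TP + FN): row totals = all actual members (support)
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of precision and recall
accuracy = tp.sum() / cm.sum()   # overall fraction of correct predictions

for cls in range(len(cm)):
    print(f"class {cls}: precision={precision[cls]:.2f} recall={recall[cls]:.2f} f1={f1[cls]:.2f}")
print(f"accuracy={accuracy:.2f}")  # accuracy=0.90
```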

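Your model has achieved perfect classification on the test set, which is great! However, the test set holds only 30 samples, so a single split gives a noisy estimate, and a perfect score on it does not by itself rule out overfitting or a lucky split. Before relying on the model in real-world applications, it is worth confirming the result with cross-validation.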
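One way to get a steadier estimate than a single 30-sample split is k-fold cross-validation. The sketch below runs 5-fold cross-validation on scikit-learn's bundled Iris data (which has no `Id` column), with `StandardScaler` and the same `KNeighborsClassifier(n_neighbors=5)` wrapped in a pipeline so the scaler is refitted on each training fold. It illustrates the technique and is not a re-run of this notebook, so no scores are quoted here.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Bundled Iris data: four measurement columns, no Id column
X_cv, y_cv = load_iris(return_X_y=True)

# The scaler inside the pipeline is refitted on each training fold,
# so the held-out fold never influences the scaling statistics
model = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))

# cv=5 uses stratified 5-fold splitting for classifiers
scores = cross_val_score(model, X_cv, y_cv, cv=5)
print(scores.round(3), f"mean={scores.mean():.3f}, std={scores.std():.3f}")
```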

---

### Explanation:
- Replace the values in `demo_data` with any measurements you’d like to test, keeping them in the same column order as the training features `X`.
- We scale `demo_data` with the fitted `scaler` so it matches the training data.
- The classifier then predicts the species, and the label is decoded back to the original species name for easier interpretation.

This will give you the species prediction for any new set of flower measurements you input!

In [14]:
# Create demo data for prediction
# X still contains the Id column, so the model expects five values in the order of
# X.columns: Id, SepalLengthCm, SepalWidthCm, PetalLengthCm, PetalWidthCm.
# Id=1 plus the measurements of the first setosa flower; replace with any values you want to test.
demo_data = pd.DataFrame([[1, 5.1, 3.5, 1.4, 0.2]], columns=X.columns)

# Scale the demo data with the scaler fitted on the training set
demo_data_scaled = scaler.transform(demo_data)

# Predict the species for the demo data
predicted_species = classifier.predict(demo_data_scaled)

# Decode the prediction back to the original species label
predicted_species_label = le.inverse_transform(predicted_species)
print(f"Predicted species for demo data: {predicted_species_label[0]}")

Predicted species for demo data: Iris-setosa
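Passing measurements by column name rather than by position guards against values landing in the wrong column. Below is a minimal sketch of a hypothetical helper, `predict_species`, built on a scaler + KNN pipeline trained on scikit-learn's bundled Iris data, so the column names are `sepal length (cm)` and so on rather than the CSV names.

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

data = load_iris(as_frame=True)
model = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))
model.fit(data.data, data.target)  # four measurement columns, no Id


def predict_species(measurements):
    """Hypothetical helper: predict a species name from a {column name: value} dict."""
    # Select columns in training order; a missing measurement raises a KeyError
    row = pd.DataFrame([measurements])[list(data.data.columns)]
    label = model.predict(row)[0]
    return str(data.target_names[label])


# Keys may be given in any order
sample = {"petal width (cm)": 0.2, "petal length (cm)": 1.4,
          "sepal width (cm)": 3.5, "sepal length (cm)": 5.1}
print(predict_species(sample))
```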
