# Iris Flower Classification

## About the Project

This project focuses on classifying the Iris flower species using various machine learning models. The goal is to accurately predict the species of an Iris flower based on its features.

## About the Dataset

The Iris dataset is a classic dataset in the machine learning community. It contains 150 samples of Iris flowers, with 50 samples each from three species: Iris-setosa, Iris-versicolor, and Iris-virginica. Each sample includes four features:
- Sepal length
- Sepal width
- Petal length
- Petal width
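
As a quick check of the figures above, the following sketch loads the copy of the dataset bundled with scikit-learn and counts the samples per species. It assumes scikit-learn's naming, where the species appear as `setosa`, `versicolor`, and `virginica`; the variable names (`iris_check`, `species_counts`) are illustrative only.

```python
from collections import Counter

from sklearn.datasets import load_iris

iris_check = load_iris()

# Map each integer label to its species name and count samples per species.
species_counts = Counter(
    str(iris_check.target_names[label]) for label in iris_check.target
)

n_samples, n_features = iris_check.data.shape
print(n_samples, n_features)
print(dict(species_counts))
```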

## Import Libraries

To start, we need to import the necessary libraries:

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

import sklearn
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier
from sklearn.svm import SVC
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RandomizedSearchCV
from catboost import CatBoostClassifier
from xgboost import XGBClassifier

## Load Dataset

We load the Iris dataset from `sklearn`:

In [2]:
iris = load_iris(as_frame=True)
data = iris['data']
data['target'] = iris['target']
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 150 entries, 0 to 149
Data columns (total 5 columns):
 #   Column             Non-Null Count  Dtype  
---  ------             --------------  -----  
 0   sepal length (cm)  150 non-null    float64
 1   sepal width (cm)   150 non-null    float64
 2   petal length (cm)  150 non-null    float64
 3   petal width (cm)   150 non-null    float64
 4   target             150 non-null    int32  
dtypes: float64(4), int32(1)
memory usage: 5.4 KB


## Splitting Features and Target

Separate the features and the target variable:


In [3]:
X = data.drop(columns=['target'])
y = data['target']
X

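| | sepal length (cm) | sepal width (cm) | petal length (cm) | petal width (cm) |
|---|---|---|---|---|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 |
| 3 | 4.6 | 3.1 | 1.5 | 0.2 |
| 4 | 5.0 | 3.6 | 1.4 | 0.2 |
| ... | ... | ... | ... | ... |
| 145 | 6.7 | 3.0 | 5.2 | 2.3 |
| 146 | 6.3 | 2.5 | 5.0 | 1.9 |
| 147 | 6.5 | 3.0 | 5.2 | 2.0 |
| 148 | 6.2 | 3.4 | 5.4 | 2.3 |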


In [4]:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X,y,test_size=0.2,random_state=42)
X_train.shape, X_test.shape

((120, 4), (30, 4))
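
The split above is purely random, so the 30 test samples need not contain equal numbers of each species. One optional variation, not used in this project, is to pass `stratify` to `train_test_split` so that both partitions keep the 50/50/50 class balance. The sketch below uses illustrative variable names (`X_demo`, `X_tr`, and so on) to avoid clashing with the notebook's own.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X_demo, y_demo = load_iris(return_X_y=True, as_frame=True)

# stratify=y_demo preserves the class proportions in both partitions.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)

# Number of test samples per class label.
test_counts = y_te.value_counts().sort_index().to_dict()
print(X_tr.shape, X_te.shape)
print(test_counts)
```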

## Create Function to Evaluate Models

We create a function to evaluate the models using accuracy, precision, recall, and F1 score:

In [5]:
def evaluate_model(true, predicted):
    accuracy = accuracy_score(true, predicted)
    precision = precision_score(true, predicted, average='weighted')
    recall = recall_score(true, predicted, average='weighted')
    f1 = f1_score(true, predicted, average='weighted')
    return accuracy, precision, recall, f1
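
To make `average='weighted'` concrete, here is a small worked example on hand-made labels (not Iris data). It computes the per-class precision, averages it by class support manually, and compares the result with scikit-learn's weighted score. It also illustrates why weighted recall always equals accuracy, which matches the identical Accuracy and Recall values in the outputs further down.

```python
import numpy as np
from sklearn.metrics import accuracy_score, precision_score, recall_score

y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 1, 2, 2, 2]  # one class-1 sample is mislabelled as class 2

# Per-class precision, then the support-weighted mean computed by hand.
per_class_precision = precision_score(y_true, y_pred, average=None)
support = np.bincount(y_true)  # number of true samples in each class
manual_weighted = np.average(per_class_precision, weights=support)

weighted_precision = precision_score(y_true, y_pred, average='weighted')
weighted_recall = recall_score(y_true, y_pred, average='weighted')
accuracy = accuracy_score(y_true, y_pred)

print(per_class_precision)
print(manual_weighted, weighted_precision)
print(weighted_recall, accuracy)
```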

## Models Created, Ran, and Evaluated

We create, run, and evaluate the following models:

- **K-Nearest Neighbors**
- **Decision Tree**
- **Random Forest**
- **AdaBoost**
- **Support Vector Machine**
- **Logistic Regression**
- **CatBoost**
- **XGBoost**

The loop below trains each model and prints its performance on the training and test sets:

In [6]:
models = {
    "Logistic Regression": LogisticRegression(),
    "K-Neighbors Classifier": KNeighborsClassifier(),
    "Decision Tree": DecisionTreeClassifier(),
    "Random Forest Classifier": RandomForestClassifier(),
    "SVC": SVC(),
    "XGBClassifier": XGBClassifier(),
    "CatBoost Classifier": CatBoostClassifier(verbose=False),
    "AdaBoost Classifier": AdaBoostClassifier()
}

model_list = []
accuracy_list = []

for name, model in models.items():
    model.fit(X_train, y_train)  # Train model

    # Make predictions
    y_train_pred = model.predict(X_train)
    y_test_pred = model.predict(X_test)
    
    # Evaluate Train and Test dataset
    model_train_accuracy, model_train_precision, model_train_recall, model_train_f1 = evaluate_model(y_train, y_train_pred)
    model_test_accuracy, model_test_precision, model_test_recall, model_test_f1 = evaluate_model(y_test, y_test_pred)

    print(name)
    model_list.append(name)
    
    print('Model performance for Training set')
    print("- Accuracy: {:.4f}".format(model_train_accuracy))
    print("- Precision: {:.4f}".format(model_train_precision))
    print("- Recall: {:.4f}".format(model_train_recall))
    print("- F1 Score: {:.4f}".format(model_train_f1))

    print('----------------------------------')
    
    print('Model performance for Test set')
    print("- Accuracy: {:.4f}".format(model_test_accuracy))
    print("- Precision: {:.4f}".format(model_test_precision))
    print("- Recall: {:.4f}".format(model_test_recall))
    print("- F1 Score: {:.4f}".format(model_test_f1))
    accuracy_list.append(model_test_accuracy)
    
    print('='*35)
    print('\n')

Logistic Regression
Model performance for Training set
- Accuracy: 0.9750
- Precision: 0.9768
- Recall: 0.9750
- F1 Score: 0.9750
----------------------------------
Model performance for Test set
- Accuracy: 1.0000
- Precision: 1.0000
- Recall: 1.0000
- F1 Score: 1.0000


K-Neighbors Classifier
Model performance for Training set
- Accuracy: 0.9667
- Precision: 0.9675
- Recall: 0.9667
- F1 Score: 0.9667
----------------------------------
Model performance for Test set
- Accuracy: 1.0000
- Precision: 1.0000
- Recall: 1.0000
- F1 Score: 1.0000


Decision Tree
Model performance for Training set
- Accuracy: 1.0000
- Precision: 1.0000
- Recall: 1.0000
- F1 Score: 1.0000
----------------------------------
Model performance for Test set
- Accuracy: 1.0000
- Precision: 1.0000
- Recall: 1.0000
- F1 Score: 1.0000


Random Forest Classifier
Model performance for Training set
- Accuracy: 1.0000
- Precision: 1.0000
- Recall: 1.0000
- F1 Score: 1.0000
----------------------------------
Model performance for Test set
[output truncated]
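
The loop above collects `model_list` and `accuracy_list` but never displays them. One way to summarise such results is a small pandas table sorted by test accuracy. The sketch below is self-contained and therefore retrains a reduced set of two models; `demo_models`, `rows`, and `summary` are illustrative names, and `max_iter=1000` is an assumption made here to give the logistic regression solver room to converge.

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X_all, y_all = load_iris(return_X_y=True, as_frame=True)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_all, y_all, test_size=0.2, random_state=42
)

# A reduced model dictionary, just to keep the sketch short.
demo_models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "K-Neighbors Classifier": KNeighborsClassifier(),
}

rows = []
for name, clf in demo_models.items():
    clf.fit(X_tr, y_tr)
    test_accuracy = accuracy_score(y_te, clf.predict(X_te))
    rows.append({"Model": name, "Test accuracy": test_accuracy})

# One row per model, best test accuracy first.
summary = (
    pd.DataFrame(rows)
    .sort_values("Test accuracy", ascending=False)
    .reset_index(drop=True)
)
print(summary)
```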

In [7]:
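# Function to get user input
def get_user_input():
    print("Please enter the following features of the Iris flower:")
    sepal_length = float(input("Sepal length (cm): "))
    sepal_width = float(input("Sepal width (cm): "))
    petal_length = float(input("Petal length (cm): "))
    petal_width = float(input("Petal width (cm): "))
    # Return a one-row DataFrame with the training column names so that
    # models fitted on a DataFrame do not warn about missing feature names
    return pd.DataFrame(
        [[sepal_length, sepal_width, petal_length, petal_width]],
        columns=X.columns,
    )

# Function to choose model
def choose_model():
    print("Choose a model to make the prediction:")
    model_names = list(models.keys())
    for i, model_name in enumerate(model_names, 1):
        print(f"{i}. {model_name}")
    choice = int(input("Enter the number corresponding to the model: "))
    # Reject out-of-range choices (0 would otherwise silently pick the last model)
    if not 1 <= choice <= len(model_names):
        raise ValueError(f"Choice must be between 1 and {len(model_names)}")
    model_name = model_names[choice - 1]
    return models[model_name], model_name

# Get the measurements, pick a model, and make the prediction
X_new = get_user_input()
model, model_name = choose_model()
prediction = model.predict(X_new)
# Flatten the result, since some models (e.g. CatBoost) return a 2-D array
predicted_label = int(np.ravel(prediction)[0])
species = {0: 'Iris-setosa', 1: 'Iris-versicolor', 2: 'Iris-virginica'}
print(f"The predicted species using {model_name} is: {species[predicted_label]}")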


Please enter the following features of the Iris flower:


Sepal length (cm):  5.1
Sepal width (cm):  3.5
Petal length (cm):  1.4
Petal width (cm):  0.2


Choose a model to make the prediction:
1. Logistic Regression
2. K-Neighbors Classifier
3. Decision Tree
4. Random Forest Classifier
5. SVC
6. XGBClassifier
7. CatBoost Classifier
8. AdaBoost Classifier


Enter the number corresponding to the model:  3


The predicted species using Decision Tree is: Iris-setosa
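
For scripts or tests, where `input()` prompts are impractical, the same prediction step can be wrapped in a plain function. The following is a minimal sketch: `predict_species` is a hypothetical helper, the decision tree is fitted on the full dataset purely for brevity, and the species names come from scikit-learn's `target_names` rather than the dictionary used above.

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

iris_demo = load_iris(as_frame=True)
X_demo, y_demo = iris_demo.data, iris_demo.target

# Fitted on all 150 samples only to keep the sketch short.
tree = DecisionTreeClassifier(random_state=0).fit(X_demo, y_demo)

def predict_species(model, measurements):
    """Return the species name for one flower given four measurements in cm."""
    # A one-row DataFrame with the training column names keeps the input
    # consistent with how the model was fitted.
    row = pd.DataFrame([measurements], columns=X_demo.columns)
    label = int(model.predict(row)[0])
    return str(iris_demo.target_names[label])

result = predict_species(tree, [5.1, 3.5, 1.4, 0.2])
print(result)
```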




## Conclusion

This project demonstrates the use of eight machine learning models to classify Iris flower species. Each model was evaluated on accuracy, precision, recall, and F1 score. In the output shown above, Logistic Regression, K-Neighbors, and Decision Tree each score 1.0000 on all four metrics for the 30-sample test set.

## Acknowledgements

This project was completed as part of the Let's Grow More Virtual Internship Program. Special thanks to the Let's Grow More team for providing this opportunity.

---

