# Iris Flower Classification with Decision Tree

This project demonstrates how to use a Decision Tree Classifier to predict the species of Iris flowers based on their measurements.

## Project Overview

The goal of this project is to build a machine learning model that can accurately classify Iris flowers into one of three species: setosa, versicolor, or virginica, using the well-known Iris dataset.

The workflow includes:

1.  **Loading the Dataset**: The Iris dataset, which is included in the scikit-learn library, is loaded.
2.  **Data Preparation**: The data is split into features (the four measurements) and the target (species). The target is mapped from its numeric codes (0, 1, 2) to species names for readability.
3.  **Data Splitting**: The dataset is divided into training and testing sets to train and evaluate the model.
4.  **Model Training**: A Decision Tree Classifier is trained on the training data.
5.  **Model Evaluation**: The trained model is evaluated on the test data using accuracy and a classification report.
6.  **Prediction**: The model is used to predict the species of a new, unseen Iris flower based on its measurements.
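
Steps 1 to 3 can be sketched compactly as follows. This is a minimal illustration, not the project's full script: the 80/20 split and `random_state=42` mirror the values used in the code later in this document, and the variable names are chosen only for this example.

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Step 1: load the Iris dataset bundled with scikit-learn.
iris = load_iris()

# Step 2: the features are the four measurements; the target is mapped
# from numeric codes (0, 1, 2) to plain-string species names.
X = pd.DataFrame(iris.data, columns=iris.feature_names)
code_to_name = {i: str(name) for i, name in enumerate(iris.target_names)}
y = pd.Series(iris.target, name="species").map(code_to_name)

# Step 3: hold out 20% of the 150 samples for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

print(X_train.shape, X_test.shape)  # (120, 4) (30, 4)
```

Mapping the codes to names before splitting means every later step (training, the report, and the final prediction) works with readable labels.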

## Requirements

*   Python 3.6+
*   scikit-learn
*   pandas
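
One quick way to confirm these requirements are met is to import each package and print its version. This is only a convenience sketch and assumes the packages are already installed in the active environment.

```python
import sys

import pandas
import sklearn

# Report the interpreter and package versions found in this environment.
print(f"Python {sys.version_info.major}.{sys.version_info.minor}")
print(f"scikit-learn {sklearn.__version__}")
print(f"pandas {pandas.__version__}")

# The requirements list Python 3.6 as the minimum version.
assert sys.version_info >= (3, 6), "Python 3.6 or newer is required"
```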



# Import libraries
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import classification_report, accuracy_score
import pandas as pd

def run_iris_classification():
    """
    Run the complete Iris classification workflow:
    1. Load the Iris dataset and build the features (X) and target (y).
    2. Split the data into training and testing sets.
    3. Train a Decision Tree classifier.
    4. Evaluate the model's accuracy and print a classification report.
    5. Predict the species of a new flower from its measurements.
    """

    # 1. Load the Iris dataset
    iris = load_iris()

    # Wrap the data in pandas objects for readability:
    # X holds the four measurements, y the numeric species codes.
    X = pd.DataFrame(iris.data, columns=iris.feature_names)
    y = pd.Series(iris.target, name='species')

    # Map target numbers to actual species names
    species_names = iris.target_names
    y = y.map({0: species_names[0], 1: species_names[1], 2: species_names[2]})

    print("--- Iris Dataset Loaded ---")
    print("First 5 rows of features (X):")
    print(X.head())
    print("\nFirst 5 rows of target (y):")
    print(y.head())
    print("-" * 29 + "\n")

    # 2. Split the data into training and testing sets
    # We'll use 80% of the data for training and 20% for testing.
    # `random_state` ensures that the split is the same every time we run the code.
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

    print("--- Data Split ---")
    print(f"Training set size: {len(X_train)} samples")
    print(f"Testing set size: {len(X_test)} samples")
    print("-" * 20 + "\n")

    # 3. Train the Decision Tree Classifier model
    # We create an instance of the classifier and fit it to our training data.
    # The model learns the patterns from the features (X_train) and the corresponding labels (y_train).
    model = DecisionTreeClassifier(random_state=42)
    model.fit(X_train, y_train)

    print("--- Model Training Complete ---")
    print("Decision Tree Classifier has been trained on the training data.")
    print("-" * 31 + "\n")

    # 4. Evaluate the model using the test dataset
    y_pred = model.predict(X_test)

    # Calculate the accuracy and generate a per-class report.
    # y_test already holds species names, so the report rows are labeled from the data.
    accuracy = accuracy_score(y_test, y_pred)
    report = classification_report(y_test, y_pred)

    print("--- Model Evaluation ---")
    print(f"Accuracy on Test Data: {accuracy:.2f}")
    print("\nClassification Report:")
    print(report)
    print("-" * 24 + "\n")

    # 5. Use the trained model to predict the species for new input values
    # Measurements in cm: sepal length, sepal width, petal length, petal width.
    new_flower_data = [[5.0, 3.4, 1.5, 0.2]]

    # Create a DataFrame for the new data with correct column names
    new_flower_df = pd.DataFrame(new_flower_data, columns=iris.feature_names)

    prediction = model.predict(new_flower_df)

    print("--- Prediction for New Flower ---")
    print(f"Input measurements: {new_flower_data[0]}")
    print(f"Predicted Species: {prediction[0]}")
    print("-" * 33)

# Run the main function when the script is executed
if __name__ == "__main__":
    run_iris_classification()

--- Iris Dataset Loaded ---
First 5 rows of features (X):
   sepal length (cm)  sepal width (cm)  petal length (cm)  petal width (cm)
0                5.1               3.5                1.4               0.2
1                4.9               3.0                1.4               0.2
2                4.7               3.2                1.3               0.2
3                4.6               3.1                1.5               0.2
4                5.0               3.6                1.4               0.2

First 5 rows of target (y):
0    setosa
1    setosa
2    setosa
3    setosa
4    setosa
Name: species, dtype: object
-----------------------------

--- Data Split ---
Training set size: 120 samples
Testing set size: 30 samples
--------------------

--- Model Training Complete ---
Decision Tree Classifier has been trained on the training data.
-------------------------------

--- Model Evaluation ---
Accuracy on Test Data: 1.00

Classification Report:
              precision    rec