# Iris Species Classification using Decision Trees

In this notebook, we'll work with the famous Iris Species dataset to build a Decision Tree Classifier. We'll go through the following steps:

1. Data Loading and Exploration
2. Data Preprocessing
   - Dropping the unused `Id` column
   - Label encoding
   - Train/test split
3. Model Training
4. Model Evaluation

Let's begin by importing the necessary libraries.

In [1]:
# Import required libraries
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.preprocessing import LabelEncoder
from sklearn.metrics import accuracy_score, precision_score, recall_score, classification_report
import matplotlib.pyplot as plt
import seaborn as sns

## 1. Data Loading and Exploration

Let's load the Iris dataset from our data folder and take a look at its structure.

In [2]:
# Load the dataset
df = pd.read_csv('data/Iris.csv')

# Display the first few rows and basic information about the dataset
print("First few rows of the dataset:")
print(df.head())
print("\nDataset information:")
df.info()  # info() prints its summary directly and returns None, so it needs no print() wrapper
print("\nChecking for missing values:")
print(df.isnull().sum())

First few rows of the dataset:
   Id  SepalLengthCm  SepalWidthCm  PetalLengthCm  PetalWidthCm      Species
0   1            5.1           3.5            1.4           0.2  Iris-setosa
1   2            4.9           3.0            1.4           0.2  Iris-setosa
2   3            4.7           3.2            1.3           0.2  Iris-setosa
3   4            4.6           3.1            1.5           0.2  Iris-setosa
4   5            5.0           3.6            1.4           0.2  Iris-setosa

Dataset information:
<
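The exploration step above can be sketched in a self-contained form. This is only an illustration: it assumes scikit-learn's bundled copy of Iris as a stand-in for `data/Iris.csv` (the two may differ slightly), and the names `df_demo`, `n_missing`, and `class_counts` are hypothetical, not part of the notebook.

```python
import pandas as pd
from sklearn.datasets import load_iris

# Assumption: scikit-learn's bundled Iris data stands in for data/Iris.csv.
iris = load_iris(as_frame=True)
df_demo = iris.frame.copy()

# Rename the columns to match the CSV layout shown above (the Id column is omitted).
df_demo.columns = ["SepalLengthCm", "SepalWidthCm", "PetalLengthCm", "PetalWidthCm", "target"]
df_demo["Species"] = df_demo["target"].map(lambda i: f"Iris-{iris.target_names[i]}")
df_demo = df_demo.drop(columns="target")

# Total number of missing cells across the whole frame.
n_missing = int(df_demo.isnull().sum().sum())
print("Missing values:", n_missing)

# Number of samples per class, to check whether the classes are balanced.
class_counts = df_demo["Species"].value_counts()
print(class_counts)

# Summary statistics (count, mean, std, quartiles) for the numeric features.
print(df_demo.describe())
```

Checking the class counts before splitting is useful because it shows whether accuracy alone is a fair metric for the dataset.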

## 2. Data Preprocessing

Now we'll prepare our data for training:
1. Remove any unnecessary columns
2. Encode the target variable (Species)
3. Split the data into features (X) and target (y)
4. Split the data into training and testing sets

In [3]:
# Remove the Id column as it's not needed for prediction
df = df.drop('Id', axis=1)

# Separate features and target
X = df.drop('Species', axis=1)
y = df['Species']

# Encode the target variable
le = LabelEncoder()
y = le.fit_transform(y)

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

print("Training set shape:", X_train.shape)
print("Testing set shape:", X_test.shape)
print("\nUnique classes:", le.classes_)

Training set shape: (120, 4)
Testing set shape: (30, 4)

Unique classes: ['Iris-setosa' 'Iris-versicolor' 'Iris-virginica']
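To make the encoding step concrete, the sketch below shows which integer `LabelEncoder` assigns to each species and how to map predictions back to names. It also shows one possible variation: passing `stratify` to `train_test_split` so the test set keeps the class proportions. The notebook itself does not stratify, which is why its test set holds 10, 9, and 11 samples per class. The arrays `species` and `X_demo` are placeholders assumed for illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder

# Placeholder labels: 50 samples per species, as in the Iris dataset.
species = np.array(["Iris-setosa"] * 50 + ["Iris-versicolor"] * 50 + ["Iris-virginica"] * 50)

# LabelEncoder sorts the class names and assigns 0, 1, 2 in that order.
le_demo = LabelEncoder()
y_demo = le_demo.fit_transform(species)
mapping = {str(name): int(code) for code, name in enumerate(le_demo.classes_)}
print("Encoding:", mapping)

# inverse_transform turns integer codes back into species names.
decoded = le_demo.inverse_transform([0, 2])
print("Decoded:", list(decoded))

# Placeholder feature matrix; only the labels matter for this illustration.
X_demo = np.arange(150).reshape(-1, 1)

# stratify=y_demo keeps the class proportions equal in both splits.
_, _, _, y_test_strat = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)
test_counts = np.bincount(y_test_strat)
print("Test samples per class:", test_counts)
```

Stratifying matters more on small or imbalanced datasets, where a purely random split can leave a class under-represented in the test set.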


## 3. Model Training

Now we'll create and train our Decision Tree Classifier. We'll keep scikit-learn's default hyperparameters for simplicity and set only `random_state` so that the results are reproducible.

In [4]:
# Create and train the Decision Tree Classifier
dt_classifier = DecisionTreeClassifier(random_state=42)
dt_classifier.fit(X_train, y_train)

# Make predictions on the test set
y_pred = dt_classifier.predict(X_test)
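One advantage of a decision tree is that the fitted model can be inspected directly. The sketch below is a hedged illustration of how that might look. It assumes scikit-learn's bundled Iris data in place of `data/Iris.csv`, and `tree_demo` and the other `_demo` names are hypothetical.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

# Assumption: scikit-learn's bundled Iris data stands in for data/Iris.csv.
iris = load_iris()
feature_names = ["SepalLengthCm", "SepalWidthCm", "PetalLengthCm", "PetalWidthCm"]

# Same split settings as the notebook: 80/20 with random_state=42.
X_train_demo, X_test_demo, y_train_demo, y_test_demo = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42
)

tree_demo = DecisionTreeClassifier(random_state=42)
tree_demo.fit(X_train_demo, y_train_demo)

# Size of the learned tree; with default settings it grows until the leaves are pure.
print("Depth:", tree_demo.get_depth())
print("Leaves:", tree_demo.get_n_leaves())

# Impurity-based importances, one per feature; they sum to 1.
importances = tree_demo.feature_importances_
for name, importance in zip(feature_names, importances):
    print(f"{name}: {importance:.3f}")

# Human-readable dump of the split rules.
rules = export_text(tree_demo, feature_names=feature_names)
print(rules)
```

If the tree looks deeper than the problem warrants, parameters such as `max_depth` or `min_samples_leaf` are the usual way to limit its growth.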

## 4. Model Evaluation

Let's evaluate our model's performance using multiple metrics:
- Accuracy
- Precision (weighted average across the three classes)
- Recall (weighted average across the three classes)
- Classification Report
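For multiclass problems, precision and recall are computed per class and then averaged. The toy example below uses made-up labels (not taken from the Iris data) to show what `average='weighted'` does: each class score is weighted by its support, the number of true samples in that class.

```python
import numpy as np
from sklearn.metrics import accuracy_score, precision_score, recall_score

# Made-up labels for illustration: 4 samples of class 0, 2 of class 1, 4 of class 2.
y_true_toy = np.array([0, 0, 0, 0, 1, 1, 2, 2, 2, 2])
y_pred_toy = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2, 0])

# average=None returns one precision value per class.
per_class_precision = precision_score(y_true_toy, y_pred_toy, average=None)
support = np.bincount(y_true_toy)

# Weighted average computed by hand, then by the library.
manual_weighted = np.average(per_class_precision, weights=support)
library_weighted = precision_score(y_true_toy, y_pred_toy, average="weighted")

# The macro average ignores support and treats every class equally.
macro = precision_score(y_true_toy, y_pred_toy, average="macro")

# Support-weighted recall always equals plain accuracy.
weighted_recall = recall_score(y_true_toy, y_pred_toy, average="weighted")
toy_accuracy = accuracy_score(y_true_toy, y_pred_toy)

print("Per-class precision:", per_class_precision)
print(f"Weighted precision: {manual_weighted:.4f} (library: {library_weighted:.4f})")
print(f"Macro precision: {macro:.4f}")
print(f"Weighted recall: {weighted_recall:.4f}, accuracy: {toy_accuracy:.4f}")
```

Because weighted recall is identical to accuracy, the two add no separate information. The per-class rows of the classification report are where differences between classes show up.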

In [5]:
# Calculate evaluation metrics
accuracy = accuracy_score(y_test, y_pred)
precision = precision_score(y_test, y_pred, average='weighted')
recall = recall_score(y_test, y_pred, average='weighted')

print(f"Accuracy: {accuracy:.4f}")
print(f"Precision: {precision:.4f}")
print(f"Recall: {recall:.4f}")

print("\nDetailed Classification Report:")
print(classification_report(y_test, y_pred, target_names=le.classes_))

Accuracy: 1.0000
Precision: 1.0000
Recall: 1.0000

Detailed Classification Report:
                 precision    recall  f1-score   support

    Iris-setosa       1.00      1.00      1.00        10
Iris-versicolor       1.00      1.00      1.00         9
 Iris-virginica       1.00      1.00      1.00        11

       accuracy                           1.00        30
      macro avg       1.00      1.00      1.00        30
   weighted avg       1.00      1.00      1.00        30
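A perfect score on 30 test samples is encouraging, but a single small split gives a noisy estimate. One common way to get a steadier one is k-fold cross-validation, sketched below. It assumes scikit-learn's bundled Iris data in place of `data/Iris.csv`, and the choice of 5 stratified folds is an illustrative assumption, not something the notebook prescribes.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Assumption: scikit-learn's bundled Iris data stands in for data/Iris.csv.
X_all, y_all = load_iris(return_X_y=True)

# 5 stratified folds: each fold keeps the class proportions of the full dataset.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

# A fresh tree is fit on 4 folds and scored on the held-out fold, 5 times over.
scores = cross_val_score(
    DecisionTreeClassifier(random_state=42), X_all, y_all, cv=cv, scoring="accuracy"
)

print("Fold accuracies:", scores)
print(f"Mean accuracy: {scores.mean():.4f} (std: {scores.std():.4f})")
```

The spread of the fold scores gives a rough sense of how much the accuracy depends on which samples happen to land in the test set.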

