## 1 Import Libraries

In [1]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score, precision_score, recall_score, classification_report


## 2 Load and Inspect Dataset

In [2]:
# Load dataset
iris = pd.read_csv(r"C:\Users\DIANA\Downloads\Iris.csv")

# Display first few rows
iris.head()


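|   | Id | SepalLengthCm | SepalWidthCm | PetalLengthCm | PetalWidthCm | Species |
|---|----|---------------|--------------|---------------|--------------|---------|
| 0 | 1 | 5.1 | 3.5 | 1.4 | 0.2 | Iris-setosa |
| 1 | 2 | 4.9 | 3.0 | 1.4 | 0.2 | Iris-setosa |
| 2 | 3 | 4.7 | 3.2 | 1.3 | 0.2 | Iris-setosa |
| 3 | 4 | 4.6 | 3.1 | 1.5 | 0.2 | Iris-setosa |
| 4 | 5 | 5.0 | 3.6 | 1.4 | 0.2 | Iris-setosa |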


## 3 Data Preprocessing

In [3]:
# Drop the 'Id' column as it does not contribute to prediction
iris1 = iris.drop(columns=["Id"])

iris1

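|   | SepalLengthCm | SepalWidthCm | PetalLengthCm | PetalWidthCm | Species |
|---|---------------|--------------|---------------|--------------|---------|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 | Iris-setosa |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 | Iris-setosa |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 | Iris-setosa |
| 3 | 4.6 | 3.1 | 1.5 | 0.2 | Iris-setosa |
| 4 | 5.0 | 3.6 | 1.4 | 0.2 | Iris-setosa |
| ... | ... | ... | ... | ... | ... |
| 145 | 6.7 | 3.0 | 5.2 | 2.3 | Iris-virginica |
| 146 | 6.3 | 2.5 | 5.0 | 1.9 | Iris-virginica |
| 147 | 6.5 | 3.0 | 5.2 | 2.0 | Iris-virginica |
| 148 | 6.2 | 3.4 | 5.4 | 2.3 | Iris-virginica |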


In [4]:
# Encode the categorical target variable 'Species'
label_encoder = LabelEncoder()
iris1["Species"] = label_encoder.fit_transform(iris1["Species"])
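
The encoding step replaces each species name with an integer, so it can help to see which integer stands for which species. The following is a minimal sketch on a small hand-written list of labels (the list and the names `species`, `encoder`, and `mapping` are illustrative assumptions, not part of the notebook). `LabelEncoder` assigns codes in sorted order of the class names.

```python
from sklearn.preprocessing import LabelEncoder

# Illustrative labels only; the notebook encodes the full 'Species' column.
species = ["Iris-setosa", "Iris-versicolor", "Iris-virginica", "Iris-setosa"]

encoder = LabelEncoder()
codes = encoder.fit_transform(species)

# classes_ holds the original names; the position of each name is its code.
mapping = {str(name): int(code) for code, name in enumerate(encoder.classes_)}
print(mapping)

# inverse_transform recovers the original names from the integer codes.
decoded = [str(name) for name in encoder.inverse_transform(codes)]
print(decoded)
```

The same `classes_` attribute is what the evaluation cell passes as `target_names` so that the report shows species names instead of integers.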



In [5]:
# Split data into features and target
X = iris1.drop("Species", axis=1)
y = iris1["Species"]



In [6]:
# Train-test split (80% training, 20% testing)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
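
The split above is purely random, so the three species need not appear in equal numbers in the test set. A possible variation, sketched here under the assumption that scikit-learn's bundled `load_iris` data can stand in for `Iris.csv`, is to pass `stratify` so that each class keeps its share in both partitions. The `_demo`, `_tr`, and `_te` names are hypothetical and chosen to avoid clashing with the notebook's variables.

```python
from collections import Counter

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Assumption: the bundled Iris data stands in for the CSV file.
X_demo, y_demo = load_iris(return_X_y=True)

# stratify=y_demo keeps the class proportions equal in train and test.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)

test_counts = Counter(y_te.tolist())
print("Test samples per class:", dict(test_counts))
```

With 50 samples per species and a 20% test size, stratification gives every species the same number of test samples.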


## 4 Train Decision Tree Model

In [7]:
# Initialize and train Decision Tree Classifier
dt_model = DecisionTreeClassifier(random_state=42)
dt_model.fit(X_train, y_train)


DecisionTreeClassifier(random_state=42)

## 5 Evaluate Model

In [8]:
# Make predictions
y_pred = dt_model.predict(X_test)

# Compute metrics
accuracy = accuracy_score(y_test, y_pred)
precision = precision_score(y_test, y_pred, average="weighted")
recall = recall_score(y_test, y_pred, average="weighted")

# Display results
print("Accuracy:", accuracy)
print("Precision:", precision)
print("Recall:", recall)
print("\nClassification Report:\n", classification_report(y_test, y_pred, target_names=label_encoder.classes_))


Accuracy: 1.0
Precision: 1.0
Recall: 1.0

Classification Report:
                  precision    recall  f1-score   support

    Iris-setosa       1.00      1.00      1.00        10
Iris-versicolor       1.00      1.00      1.00         9
 Iris-virginica       1.00      1.00      1.00        11

       accuracy                           1.00        30
      macro avg       1.00      1.00      1.00        30
   weighted avg       1.00      1.00      1.00        30



## 6 Results Interpretation

The model achieved perfect scores on every evaluation metric. An accuracy of 1.0 (100%) means that it correctly classified all 30 flowers in the test set. The precision, recall, and F1-score of 1.00 for each of the three Iris species (setosa, versicolor, and virginica) confirm that there were no false positives or false negatives for any class.
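
A confusion matrix is one way to check the per-class errors directly. The sketch below is not part of the original notebook; it assumes scikit-learn's bundled `load_iris` data in place of `Iris.csv` and reuses the same split and model settings, with hypothetical variable names.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Assumption: the bundled Iris data stands in for the CSV file.
X_demo, y_demo = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42
)

clf = DecisionTreeClassifier(random_state=42).fit(X_tr, y_tr)
cm = confusion_matrix(y_te, clf.predict(X_te))

# Rows are true classes, columns are predicted classes.
print(cm)

# Off-diagonal entries are misclassifications.
n_errors = int(cm.sum() - cm.trace())
print("Misclassified test samples:", n_errors)
```

If every test sample is classified correctly, all counts sit on the diagonal and `n_errors` is zero.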

These results suggest that the Decision Tree Classifier captured the relationships between the input features (sepal length, sepal width, petal length, and petal width) and the target variable (species) well enough to classify every held-out sample. The perfect scores can be attributed largely to how well separated the classes are in the Iris dataset, a small, well-structured dataset commonly used for testing classification algorithms.
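
To see which feature relationships a tree of this kind relies on, the learned rules can be printed as text. This is a sketch under the assumption that scikit-learn's bundled `load_iris` data stands in for `Iris.csv`, so feature names and thresholds may not match the notebook's model exactly; the variable names are hypothetical.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

# Assumption: the bundled Iris data stands in for the CSV file.
data = load_iris()
X_tr, X_te, y_tr, y_te = train_test_split(
    data.data, data.target, test_size=0.2, random_state=42
)

tree = DecisionTreeClassifier(random_state=42).fit(X_tr, y_tr)

# Human-readable if/else rules, one line per split or leaf.
rules = export_text(tree, feature_names=list(data.feature_names))
print(rules)

# Relative contribution of each feature to the splits (sums to 1).
importances = dict(zip(data.feature_names, tree.feature_importances_))
for name, value in importances.items():
    print(f"{name}: {value:.3f}")
```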

In real-world scenarios, however, 100% accuracy is rare, and on complex or noisy datasets it may indicate overfitting. In this case the result is plausible because the Iris dataset is clean, balanced, and easy for tree-based models to separate, although a single 30-sample test split is a limited basis for judging how well the model generalizes.
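
One way to check whether a perfect score depends on a particular split is k-fold cross-validation, which was not part of the original analysis. The sketch below assumes scikit-learn's bundled `load_iris` data in place of `Iris.csv`; the 5-fold setting and the variable names are illustrative choices.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Assumption: the bundled Iris data stands in for the CSV file.
X_demo, y_demo = load_iris(return_X_y=True)

# Five stratified folds: each sample is used for testing exactly once.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
scores = cross_val_score(
    DecisionTreeClassifier(random_state=42),
    X_demo,
    y_demo,
    cv=cv,
    scoring="accuracy",
)

print("Fold accuracies:", scores)
print(f"Mean accuracy: {scores.mean():.3f} (std {scores.std():.3f})")
```

A mean accuracy close to the single-split result, with a small spread across folds, would support the reading that the model generalizes rather than benefiting from a favorable split.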