### Introduction

This notebook evaluates our custom implementation of the ID3 Decision Tree algorithm on the well-known Iris dataset.<br>
The goal is to measure the classification accuracy of each model on a held-out test set (30% of the data). If accuracy is high (e.g., above 0.90), we consider the implementation reliable enough for use in the remaining parts of this project.


### Import Statements

In [8]:
import os
import sys

import pandas as pd
from sklearn.model_selection import train_test_split

# Add the project root (parent of the notebook's working directory) to the
# import path so the local DecisionTree package can be imported.
p = os.path.abspath(os.path.join(os.getcwd(), '..'))
sys.path.append(p)

from DecisionTree.ID3Tree import ID3Tree
from DecisionTree.Ruleset import Ruleset
from DecisionTree.Bootstrap_Aggregating import Bagging

### Load the Iris Dataset

The Iris dataset, provided via Moodle, was downloaded and saved in the <b>datasets</b> directory.<br>
In the cell below, the dataset is loaded into the variable <b>iris</b>, and the ID column is removed since it only represents the row index and is not a relevant feature for classification.


In [9]:
iris_path = os.path.join(p, 'datasets', 'iris.csv')
iris = pd.read_csv(iris_path)
iris.drop(columns=['ID'], inplace=True)
iris

| | sepallength | sepalwidth | petallength | petalwidth | class |
|---|---|---|---|---|---|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 | Iris-setosa |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 | Iris-setosa |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 | Iris-setosa |
| 3 | 4.6 | 3.1 | 1.5 | 0.2 | Iris-setosa |
| 4 | 5.0 | 3.6 | 1.4 | 0.2 | Iris-setosa |
| ... | ... | ... | ... | ... | ... |
| 145 | 6.7 | 3.0 | 5.2 | 2.3 | Iris-virginica |
| 146 | 6.3 | 2.5 | 5.0 | 1.9 | Iris-virginica |
| 147 | 6.5 | 3.0 | 5.2 | 2.0 | Iris-virginica |
| 148 | 6.2 | 3.4 | 5.4 | 2.3 | Iris-virginica |


### Data Structure

Our ID3 Decision Tree implementation expects its inputs in specific Python types, so the following code cells convert the data into the structure below. The constructors themselves receive <b>X</b> and <b>y</b> joined into rows, with each label appended as the last element of its feature row.<br>

<ul>
<li><b>X</b> → A list of lists, where each inner list represents the feature values for one instance.</li>
<li><b>y</b> → A list of class labels, one per instance, in the same order as <b>X</b>.</li>
<li><b>feature_names</b> → A list containing <b>only</b> the names of the training attributes.</li>
<li><b>type_map</b> → A dictionary in the format { attribute: variable type }, where the type is either <i>continuous</i> or <i>discrete</i>.</li>
</ul>
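As a quick sanity check, the hypothetical helper `check_structure` below (not part of the project's `DecisionTree` package) verifies the four inputs described above on two rows copied from the Iris table, then builds the combined rows, label last, that the classifiers are given. It is a sketch of the expected shapes, assuming the constructors need nothing beyond what the list states.

```python
def check_structure(X, y, feature_names, type_map):
    """Hypothetical helper: raise ValueError if the inputs do not match the expected shapes."""
    if not isinstance(X, list) or not all(isinstance(row, list) for row in X):
        raise ValueError("X must be a list of lists")
    if not isinstance(y, list) or len(y) != len(X):
        raise ValueError("y must be a list with one label per row of X")
    if any(len(row) != len(feature_names) for row in X):
        raise ValueError("every row of X needs one value per feature name")
    if set(type_map) != set(feature_names):
        raise ValueError("type_map needs exactly one entry per feature name")
    unknown = {t for t in type_map.values() if t not in ("continuous", "discrete")}
    if unknown:
        raise ValueError(f"unknown attribute types: {sorted(unknown)}")


# Two rows copied from the Iris table above; labels are kept separate from features.
X_demo = [[5.1, 3.5, 1.4, 0.2], [6.3, 2.5, 5.0, 1.9]]
y_demo = ["Iris-setosa", "Iris-virginica"]
names_demo = ["sepallength", "sepalwidth", "petallength", "petalwidth"]
types_demo = {name: "continuous" for name in names_demo}

check_structure(X_demo, y_demo, names_demo, types_demo)  # passes silently

# Combined rows, label last, as passed to the classifiers in this notebook.
rows_demo = [x + [label] for x, label in zip(X_demo, y_demo)]
print(rows_demo[0])  # [5.1, 3.5, 1.4, 0.2, 'Iris-setosa']
```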


In [10]:

X, y = iris[iris.columns[:-1]], iris[iris.columns[-1]]
X = X.to_numpy().tolist()
y = y.to_numpy().tolist()
feature_names = iris.columns[:-1].tolist()
print("Feature names: ", feature_names)

Feature names:  ['sepallength', 'sepalwidth', 'petallength', 'petalwidth']


In [11]:
# Combine features and labels
data = [x + [label] for x, label in zip(X, y)]

# Split into 70% training and 30% test rows (fixed seed for reproducibility)
train_data, test_data = train_test_split(data, test_size=0.3, random_state=42)

# Define attribute types (all continuous in iris)
type_map = {attr: 'continuous' for attr in feature_names}

### Decision Tree Implementations

In this project, we compare three classifiers based on the ID3 algorithm: a plain ID3 tree and two variants that each add a technique intended to improve generalization:

<ul>
<li><b>ID3</b> → A standard implementation of the ID3 algorithm that builds a single decision tree, choosing at each node the split with the highest information gain.</li>
<li><b>Ruleset</b> → A pruned version of the ID3 tree, aimed at reducing overfitting and improving the model’s ability to generalize to unseen data.</li>
<li><b>Bagging</b> → A Bootstrap Aggregating (bagging) ensemble of 10 ID3 trees, each trained on a bootstrap sample of the training data (rows drawn at random with replacement). Each tree predicts a label, and the final label is decided by majority vote.</li>
</ul>
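The information-gain criterion behind an ID3 split, IG(S, A) = H(S) minus the size-weighted entropy of the subsets produced by A, can be sketched as follows. Because every Iris attribute is marked continuous in `type_map`, a split has to be a threshold test. The notebook does not show how `ID3Tree` chooses thresholds, so this sketch assumes one common approach: try the midpoint between each pair of neighbouring distinct values and keep the one with the largest gain. `entropy` and `best_threshold` are illustrative names, not the project's API.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels: sum of p * log2(1/p)."""
    total = len(labels)
    return sum((c / total) * math.log2(total / c) for c in Counter(labels).values())


def best_threshold(values, labels):
    """Return (threshold, gain) for the best binary test `value <= threshold`.

    Returns (None, 0.0) when no threshold yields a positive gain,
    e.g. when all labels are identical.
    """
    base = entropy(labels)
    pairs = sorted(zip(values, labels))
    best_t, best_gain = None, 0.0
    for i in range(1, len(pairs)):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue  # no threshold fits between equal values
        t = (lo + hi) / 2  # candidate: midpoint between neighbouring values
        left = [lab for v, lab in pairs if v <= t]
        right = [lab for v, lab in pairs if v > t]
        # Weighted average entropy of the two subsets after the split.
        remainder = (len(left) * entropy(left) + len(right) * entropy(right)) / len(pairs)
        gain = base - remainder
        if gain > best_gain:
            best_t, best_gain = t, gain
    return best_t, best_gain


# Toy petal lengths: a threshold of 3.0 separates the two classes perfectly.
print(best_threshold([1.5, 1.0, 5.0, 4.5], ["setosa", "setosa", "versicolor", "versicolor"]))  # (3.0, 1.0)
```

A perfect split removes all uncertainty, so its gain equals the entropy of the parent node (1 bit for two equally frequent classes).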


### ID3 Training & Testing



In [12]:
print("\nTesting ID3Tree:")
tree = ID3Tree(feature_names, train_data, default=0, type_map=type_map)  # default label 0 (not an Iris class name)
tree.train()

# Build rules once after training
rules = tree.build_rules()

correct = 0
for row in test_data:
    # Iterate through all rules to find a matching prediction
    pred_label = None
    for rule in rules:
        pred_label = rule.predict(row)
        if pred_label is not None:
            break  # Stop at the first matching rule

    # Fall back to the default label if no rule matches. Note that 0 is not
    # one of the Iris class names, so such rows always count as errors.
    if pred_label is None:
        pred_label = 0

    if pred_label == row[-1]:
        correct += 1

print(f"Accuracy: {correct / len(test_data):.2f}")


Testing ID3Tree:
Accuracy: 0.91
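The prediction loop above returns the label of the first rule that matches and falls back to a default otherwise. The sketch below reproduces that logic with a hypothetical `Rule` class (a list of `(feature_index, op, threshold)` tests plus a label); the project's rule objects are only known to expose `predict(row)`, returning a label or `None`. The thresholds are illustrative, not learned from this data. A shared `accuracy` helper like this one could also replace the counting loop repeated in the three evaluation cells.

```python
import operator
from dataclasses import dataclass

# Supported comparison operators for a rule condition.
OPS = {"<=": operator.le, ">": operator.gt}


@dataclass
class Rule:
    """Hypothetical rule: all conditions must hold for the rule to fire."""
    conditions: list  # e.g. [(2, "<=", 2.45)] means row[2] <= 2.45
    label: str

    def predict(self, row):
        """Return self.label if every condition holds for row, else None."""
        for index, op, threshold in self.conditions:
            if not OPS[op](row[index], threshold):
                return None
        return self.label


def predict_first_match(rules, row, default):
    """Label of the first rule that fires, or `default` if none does."""
    for rule in rules:
        label = rule.predict(row)
        if label is not None:
            return label
    return default


def accuracy(predict, rows):
    """Fraction of rows whose last element (the true label) equals predict(row)."""
    correct = sum(1 for row in rows if predict(row) == row[-1])
    return correct / len(rows)


# Illustrative rules on petal length (index 2) and petal width (index 3).
rules = [
    Rule([(2, "<=", 2.45)], "Iris-setosa"),
    Rule([(2, ">", 2.45), (3, "<=", 1.75)], "Iris-versicolor"),
    Rule([(2, ">", 2.45), (3, ">", 1.75)], "Iris-virginica"),
]
test_rows = [
    [5.1, 3.5, 1.4, 0.2, "Iris-setosa"],
    [6.4, 3.2, 4.5, 1.5, "Iris-versicolor"],
    [6.3, 2.5, 5.0, 1.9, "Iris-virginica"],
    [4.9, 2.5, 4.5, 1.7, "Iris-virginica"],  # fires the versicolor rule, so it is wrong
]
print(accuracy(lambda r: predict_first_match(rules, r, default=0), test_rows))  # 0.75
```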


### Ruleset Training & Testing

In [13]:
# Train and test Ruleset
print("\nTesting Ruleset:")
ruleset = Ruleset(feature_names, train_data, 0, type_map)  # default label 0 (not an Iris class name)
ruleset.train()
correct = 0
for row in test_data:
    pred, _ = ruleset.predict(row)
    if pred == row[-1]:
        correct += 1
print(f"Accuracy: {correct / len(test_data):.2f}")



Testing Ruleset:
Accuracy: 0.96
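The notebook does not show how `Ruleset` prunes. One common variant of rule post-pruning turns each root-to-leaf path into a rule and greedily drops conditions as long as the rule's accuracy on held-out validation rows does not decrease. The sketch below illustrates that idea on toy rows; `prune_rule`, `rule_accuracy`, and the `(feature_index, op, threshold)` condition format are assumptions, not the project's `Ruleset` API.

```python
import operator

# Supported comparison operators for a rule condition.
OPS = {"<=": operator.le, ">": operator.gt}


def rule_accuracy(conditions, label, rows):
    """Accuracy of one rule on the rows it covers, or None if it covers none."""
    covered = [r for r in rows if all(OPS[op](r[i], t) for i, op, t in conditions)]
    if not covered:
        return None
    return sum(1 for r in covered if r[-1] == label) / len(covered)


def prune_rule(conditions, label, validation_rows):
    """Greedily drop conditions while validation accuracy does not decrease."""
    conditions = list(conditions)
    best = rule_accuracy(conditions, label, validation_rows) or 0.0
    improved = True
    while improved and conditions:
        improved = False
        for cond in list(conditions):
            candidate = [c for c in conditions if c != cond]
            score = rule_accuracy(candidate, label, validation_rows)
            if score is not None and score >= best:
                # Removing this condition does not hurt: keep the shorter rule.
                conditions, best, improved = candidate, score, True
                break
    return conditions, best


# An over-specific toy rule for Iris-virginica (the sepal-length test is spurious).
rule_conditions = [(2, ">", 2.45), (3, ">", 1.75), (0, "<=", 7.0)]
validation = [
    [6.3, 2.5, 5.0, 1.9, "Iris-virginica"],
    [7.7, 3.0, 6.1, 2.3, "Iris-virginica"],
    [6.4, 3.2, 4.5, 1.5, "Iris-versicolor"],
    [5.1, 3.5, 1.4, 0.2, "Iris-setosa"],
]
print(prune_rule(rule_conditions, "Iris-virginica", validation))  # ([(3, '>', 1.75)], 1.0)
```

Dropping conditions makes a rule cover more rows; keeping a removal only when validation accuracy holds is what lets pruning trade exact fit to the training data for generality.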


### Bagging Training & Testing

In [14]:
print("\nTesting Bagging:")
bagging = Bagging(feature_names, train_data, 0, type_map)  # default label 0 (not an Iris class name)
bagging.train()
correct = 0
for row in test_data:
    pred, _ = bagging.predict(row)
    if pred == row[-1]:
        correct += 1
print(f"Accuracy: {correct / len(test_data):.2f}")


Testing Bagging:
Training classifier #1
Training classifier #2
Training classifier #3
Training classifier #4
Training classifier #5
Training classifier #6
Training classifier #7
Training classifier #8
Training classifier #9
Training classifier #10
Accuracy: 1.00
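The ten "Training classifier" lines correspond to the ten trees described earlier. Standard bagging can be sketched as below: each model is trained on a bootstrap sample (as many rows as the training set, drawn with replacement) and predictions are combined by majority vote. To keep the example self-contained, a toy nearest-class-mean learner on petal length (`train_centroid_stump`, hypothetical) stands in for an ID3 tree; whether `Bagging` samples exactly this way is an assumption.

```python
import random
from collections import Counter


def bootstrap_sample(rows, rng):
    """Draw len(rows) rows uniformly at random *with replacement*."""
    return [rng.choice(rows) for _ in rows]


def train_bagging(rows, train_fn, n_models=10, seed=0):
    """Train n_models predictors, each on its own bootstrap sample."""
    rng = random.Random(seed)
    return [train_fn(bootstrap_sample(rows, rng)) for _ in range(n_models)]


def bagging_predict(models, row):
    """Majority vote; on a tie, Counter.most_common keeps the label seen first."""
    votes = Counter(model(row) for model in models)
    return votes.most_common(1)[0][0]


def train_centroid_stump(rows, feature=2):
    """Toy base learner (stand-in for ID3): nearest class mean of one feature."""
    by_class = {}
    for r in rows:
        by_class.setdefault(r[-1], []).append(r[feature])
    means = {label: sum(v) / len(v) for label, v in by_class.items()}

    def predict(row):
        return min(means, key=lambda label: abs(row[feature] - means[label]))

    return predict


train_rows = [
    [5.1, 3.5, 1.4, 0.2, "Iris-setosa"],
    [4.9, 3.0, 1.4, 0.2, "Iris-setosa"],
    [7.0, 3.2, 4.7, 1.4, "Iris-versicolor"],
    [6.4, 3.2, 4.5, 1.5, "Iris-versicolor"],
    [6.3, 3.3, 6.0, 2.5, "Iris-virginica"],
    [5.8, 2.7, 5.1, 1.9, "Iris-virginica"],
]
models = train_bagging(train_rows, train_centroid_stump, n_models=10, seed=42)
print(bagging_predict(models, [5.0, 3.4, 1.5, 0.2]))
```

Because each bootstrap sample omits some rows and repeats others, the individual models differ; the vote averages out part of that variance, which is the usual motivation for bagging unstable learners such as decision trees.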
