DecisionTreeClassifier: a custom-built class that implements the decision tree classification logic.

src.shared provides: \
numpy as np \
pandas as pd \
matplotlib.pyplot as plt \
utilities such as load_dataset and metric calculations.

In [1]:
import sys
from pathlib import Path

project_root = str(Path.cwd().parents[1])

if project_root not in sys.path:
    sys.path.insert(0, project_root)

from src.classification.decision_tree import DecisionTreeClassifier
from src.shared import *
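
The cell above relies on `Path.cwd().parents[1]`, which only works when the notebook is run from a directory exactly two levels below the project root. As a rough alternative, here is a minimal sketch that walks up from the working directory until it finds a folder containing `src`. The helper name `find_project_root` and the `marker` parameter are hypothetical and not part of this project.

```python
from pathlib import Path


def find_project_root(start, marker="src"):
    """Return the nearest ancestor of `start` (inclusive) that contains `marker`.

    Hypothetical helper: `marker` is assumed to be a directory that only
    exists at the project root.
    """
    start = Path(start).resolve()
    for candidate in (start, *start.parents):
        if (candidate / marker).is_dir():
            return candidate
    raise FileNotFoundError(f"No ancestor of {start} contains a '{marker}' directory")
```

With this in place, `project_root = str(find_project_root(Path.cwd()))` could replace the hard-coded `parents[1]` lookup, so the notebook keeps working if it is moved deeper or shallower in the tree.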

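I chose the moons dataset because it suits binary classification: it has 200 samples split evenly between two classes, and the X1 and X2 ranges of the classes overlap, so no single threshold on either feature separates them. \
That makes it a good fit for demonstrating the decision tree classifier.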
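
`load_dataset` is the project's own helper and its internals are not shown here. For readers without the repository, the following is a minimal sketch of how a comparable dataset could be generated with NumPy alone: two interleaving half-circles with Gaussian noise, plus a low-amplitude third column. The function name `make_moons_like`, the noise level, and the assumption that X3 is unrelated to the label are all illustrative guesses, not the project's actual data source.

```python
import numpy as np
import pandas as pd


def make_moons_like(n_samples=200, noise=0.1, seed=0):
    """Hypothetical stand-in for load_dataset('moons'); not the project's code."""
    rng = np.random.default_rng(seed)
    n_per_class = n_samples // 2
    n_total = 2 * n_per_class
    theta = rng.uniform(0.0, np.pi, size=n_per_class)

    # Class 0: upper half-circle centred at the origin.
    upper = np.column_stack([np.cos(theta), np.sin(theta)])
    # Class 1: lower half-circle, shifted so the two arcs interleave.
    lower = np.column_stack([1.0 - np.cos(theta), 0.5 - np.sin(theta)])

    # Add Gaussian noise to both coordinates.
    xy = np.vstack([upper, lower]) + rng.normal(0.0, noise, size=(n_total, 2))
    # Assumption: X3 is a small uniform feature that carries no class signal.
    x3 = rng.uniform(-0.1, 0.1, size=n_total)
    label = np.repeat([0, 1], n_per_class)

    return pd.DataFrame({"X1": xy[:, 0], "X2": xy[:, 1], "X3": x3, "label": label})
```

The returned frame keeps the label in the last column, so it can be dropped in wherever `df = load_dataset('moons')` is used. The exact values will differ from the statistics printed in this notebook.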

In [None]:
# --- 1. Load Data ---
df = load_dataset('moons')

# --- 2. Data Cleaning ---
# no need to clean as the dataset is already clean


# --- 3. Analysis ---
print("Statistics:")
print(df.describe())

print("\nClass Distribution:")
print(df.iloc[:, -1].value_counts())

print()

# Per-class (min, max) of each feature; the label is the last column.
labels = df.iloc[:, -1]

for col in df.columns[:-1]:
    for cls in (0, 1):
        values = df.loc[labels == cls, col]
        print(f"Feature {col} Range - Class {cls}: {(values.min(), values.max())}")

Statistics:
               X1          X2          X3       label
count  200.000000  200.000000  200.000000  200.000000
mean     0.499625    0.242255   -0.007900    0.500000
std      0.864680    0.505837    0.060073    0.501255
min     -1.082877   -0.671388   -0.099842    0.000000
25%     -0.066660   -0.182419   -0.059922    0.000000
50%      0.517917    0.215962   -0.010587    0.500000
75%      1.044591    0.696222    0.040636    1.000000
max      2.148725    1.168358    0.099933    1.000000

Class Distribution:
label
0    100
1    100
Name: count, dtype: int64

Feature X1 Range - Class 0: (np.float64(-1.0828773739554052), np.float64(1.1730813763478447))
Feature X1 Range - Class 1: (np.float64(-0.1884596119744169), np.float64(2.1487246149192694))
Feature X2 Range - Class 0: (np.float64(-0.1110730078386911), np.float64(1.1683578387775635))
Feature X2 Range - Class 1: (np.float64(-0.6713875239673394), np.float64(0.5720729725657026))
Feature X3 Range - Class 0: (np.float64(-0.09797243533