# Decision Tree Usage Demonstration

This notebook demonstrates the usage of our custom Decision Tree implementation across all four combinations of input and output types:

1. **Real Input & Real Output** (Regression)
2. **Real Input & Discrete Output** (Classification)
3. **Discrete Input & Discrete Output** (Classification)
4. **Discrete Input & Real Output** (Regression)

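For each case, we fit the tree with both the Information Gain and Gini Index criteria, report the relevant metrics, and display the fitted tree. The display code tries a graphviz rendering first and falls back to a text rendering if that fails.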
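
As a reminder of what the two criteria measure for a discrete target, here is a minimal sketch of entropy, Gini impurity, and the impurity reduction of a binary split. The function names (`entropy`, `gini_index`, `impurity_reduction`) are illustrative assumptions and are not taken from `tree.base`, whose internals are not shown in this notebook.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gini_index(labels):
    """Gini impurity of a sequence of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def impurity_reduction(parent, left, right, impurity):
    """Parent impurity minus the size-weighted impurity of the two children."""
    n = len(parent)
    weighted = (len(left) / n) * impurity(left) + (len(right) / n) * impurity(right)
    return impurity(parent) - weighted


# Hypothetical toy labels: a split that separates the two classes perfectly.
parent = [0, 0, 1, 1]
left, right = [0, 0], [1, 1]
print(impurity_reduction(parent, left, right, entropy))     # 1.0
print(impurity_reduction(parent, left, right, gini_index))  # 0.5
```

With entropy as the impurity, the reduction is the information gain of the split. Both criteria rank this perfect split as the best possible one, although their numeric scales differ.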

## Import Libraries

In [5]:
"""
The code below is provided for Assignment 1.
You will be expected to use this to make trees for:
> discrete input, discrete output
> real input, real output
> real input, discrete output
> discrete input, real output
"""

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from tree.base import DecisionTree
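# Import only the metric functions used in this notebook instead of a wildcard import
from metrics import accuracy, precision, recall, rmse, mae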

np.random.seed(42)
print("Libraries imported successfully!")

Libraries imported successfully!


## Test Case 1: Real Input and Real Output (Regression)

This case uses continuous features and continuous target values. We'll use RMSE and MAE as evaluation metrics.
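
For reference, RMSE and MAE can be sketched as below. This is a minimal sketch of the standard definitions, not the code in the `metrics` module (which is not shown here). The names `rmse_sketch` and `mae_sketch` are hypothetical, and the argument order `(y_hat, y)` is assumed from the calls later in this notebook.

```python
import math


def rmse_sketch(y_hat, y):
    """Root mean squared error between predictions and targets."""
    errors = [(p - t) ** 2 for p, t in zip(y_hat, y)]
    return math.sqrt(sum(errors) / len(errors))


def mae_sketch(y_hat, y):
    """Mean absolute error between predictions and targets."""
    errors = [abs(p - t) for p, t in zip(y_hat, y)]
    return sum(errors) / len(errors)


# Hypothetical toy values: only the last prediction is off, by 2.0.
y_true = [1.0, 2.0, 3.0]
y_pred = [1.0, 2.0, 5.0]
print(rmse_sketch(y_pred, y_true))  # sqrt(4/3)
print(mae_sketch(y_pred, y_true))   # 2/3
```

RMSE penalizes the single large error more heavily than MAE does, which is why the two values differ here.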

In [6]:
print()
print("TEST CASE 1: REAL INPUT AND REAL OUTPUT (REGRESSION)")
print()

# Generate data
N = 30  # Number of samples
P = 5   # Number of features
X = pd.DataFrame(np.random.randn(N, P))
y = pd.Series(np.random.randn(N))

print(f"Data shape: X={X.shape}, y={y.shape}")
y = y.rename("Target")
print("data: ")
print(pd.concat([X, y], axis=1))
print()
print(f"Feature types: {[X[col].dtype for col in X.columns]}")
print(f"Target type: {y.dtype}")
print(f"Target range: [{y.min():.2f}, {y.max():.2f}]")


TEST CASE 1: REAL INPUT AND REAL OUTPUT (REGRESSION)

Data shape: X=(30, 5), y=(30,)
data: 
           0         1         2         3         4    Target
0   0.496714 -0.138264  0.647689  1.523030 -0.234153  0.250493
1  -0.234137  1.579213  0.767435 -0.469474  0.542560  0.346448
2  -0.463418 -0.465730  0.241962 -1.913280 -1.724918 -0.680025
3  -0.562288 -1.012831  0.314247 -0.908024 -1.412304  0.232254
4   1.465649 -0.225776  0.067528 -1.424748 -0.544383  0.293072
5   0.110923 -1.150994  0.375698 -0.600639 -0.291694 -0.714351
6  -0.601707  1.852278 -0.013497 -1.057711  0.822545  1.865775
7  -1.220844  0.208864 -1.959670 -1.328186  0.196861  0.473833
8   0.738467  0.171368 -0.115648 -0.301104 -1.478522 -1.191303
9  -0.719844 -0.460639  1.057122  0.343618 -1.763040  0.656554
10  0.324084 -0.385082 -0.676922  0.611676  1.031000 -0.974682
11  0.931280 -0.839218 -0.309212  0.331263  0.975545  0.787085
12 -0.479174 -0.185659 -1.106335 -1.196207  0.812526  1.158596
13  1.356240 -0.072010  1

In [7]:
# Test with Information Gain
print("\n--- Testing with Information Gain ---")
tree_ig = DecisionTree(criterion="information_gain")
tree_ig.fit(X, y)
y_hat_ig = tree_ig.predict(X)

print("Criteria: information_gain")
print(f"RMSE: {rmse(y_hat_ig, y):.4f}")
print(f"MAE: {mae(y_hat_ig, y):.4f}")

# Display tree visualization
print("\nDisplaying graphical tree for information_gain...")
try:
    graph = tree_ig.create_graph(
        feature_names=[f'feature_{i}' for i in range(len(X.columns))]
    )
    if not graph:
        print("Graphviz not available, falling back to text display:")
        tree_ig.plot()
except Exception as e:
    print(f"Error creating graph: {e}")
    print("Falling back to text display:")
    tree_ig.plot()


--- Testing with Information Gain ---
Criteria: information_gain
RMSE: 0.3601
MAE: 0.2407

Displaying graphical tree for information_gain...
Error creating graph: 'DecisionTree' object has no attribute 'create_graph'
Falling back to text display:
Tree visualization already exists: tree_real_input_real_output_information_gain.png (skipping regeneration)

Decision Tree Structure:
Root: 1 ≤ -1.194 (samples: 30)
│
├─ value: 2.720 (samples: 1)
│
└─ 4 ≤ 0.387 (samples: 29)
    ├─ 0 ≤ 1.508 (samples: 21)
    │   ├─ 4 ≤ -1.744 (samples: 19)
    │   │   ├─ 1 ≤ 0.813 (samples: 3)
    │   │   │   ├─ value: 0.535 (samples: 2)
    │   │   │   └─ value: 0.963 (samples: 1)
    │   │   └─ 2 ≤ 1.241 (samples: 16)
    │   │       ├─ value: -0.346 (samples: 15)
    │   │       └─ value: 0.822 (samples: 1)
    │   └─ 0 ≤ 1.870 (samples: 2)
    │       ├─ value: 1.454 (samples: 1)
    │       └─ value: 0.827 (samples: 1)
    └─ 0 ≤ -0.357 (samples: 8)
        ├─ 0 ≤ -0.490 (samples: 3)
        │   ├─ 0 ≤ 

In [8]:
# Test with Gini Index
print("\n--- Testing with Gini Index ---")
tree_gini = DecisionTree(criterion="gini_index")
tree_gini.fit(X, y)
y_hat_gini = tree_gini.predict(X)

print("Criteria: gini_index")
print(f"RMSE: {rmse(y_hat_gini, y):.4f}")
print(f"MAE: {mae(y_hat_gini, y):.4f}")

# Display tree visualization
print("\nDisplaying graphical tree for gini_index...")
try:
    graph = tree_gini.create_graph(
        feature_names=[f'feature_{i}' for i in range(len(X.columns))]
    )
    if not graph:
        print("Graphviz not available, falling back to text display:")
        tree_gini.plot()
except Exception as e:
    print(f"Error creating graph: {e}")
    print("Falling back to text display:")
    tree_gini.plot()

print("-" * 50)


--- Testing with Gini Index ---
Criteria: gini_index
RMSE: 0.3601
MAE: 0.2407

Displaying graphical tree for gini_index...
Error creating graph: 'DecisionTree' object has no attribute 'create_graph'
Falling back to text display:
Tree visualization saved as: tree_real_input_real_output_gini_index.png

Decision Tree Structure:
Root: 1 ≤ -1.194 (samples: 30)
│
├─ value: 2.720 (samples: 1)
│
└─ 4 ≤ 0.387 (samples: 29)
    ├─ 0 ≤ 1.508 (samples: 21)
    │   ├─ 4 ≤ -1.744 (samples: 19)
    │   │   ├─ 1 ≤ 0.813 (samples: 3)
    │   │   │   ├─ value: 0.535 (samples: 2)
    │   │   │   └─ value: 0.963 (samples: 1)
    │   │   └─ 2 ≤ 1.241 (samples: 16)
    │   │       ├─ value: -0.346 (samples: 15)
    │   │       └─ value: 0.822 (samples: 1)
    │   └─ 0 ≤ 1.870 (samples: 2)
    │       ├─ value: 1.454 (samples: 1)
    │       └─ value: 0.827 (samples: 1)
    └─ 0 ≤ -0.357 (samples: 8)
        ├─ 0 ≤ -0.490 (samples: 3)
        │   ├─ 0 ≤ -0.552 (samples: 2)
        │   │   ├─ value: 1.866 (s

## Test Case 2: Real Input and Discrete Output (Classification)

This case uses continuous features and categorical target values. We'll use accuracy, precision, and recall as evaluation metrics.
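
The three classification metrics can be sketched as follows. These are the standard per-class definitions, written under the assumption that the `metrics` module follows the same `(y_hat, y, cls)` argument order used in the cells below. The `_sketch` names are hypothetical, and returning 0.0 when a class is never predicted (or never occurs) is an assumed convention.

```python
def accuracy_sketch(y_hat, y):
    """Fraction of predictions that match the true labels."""
    pairs = list(zip(y_hat, y))
    return sum(p == t for p, t in pairs) / len(pairs)


def precision_sketch(y_hat, y, cls):
    """Of the samples predicted as `cls`, the fraction whose true label is `cls`."""
    truths = [t for p, t in zip(y_hat, y) if p == cls]
    # Assumed convention: 0.0 when `cls` is never predicted.
    return sum(t == cls for t in truths) / len(truths) if truths else 0.0


def recall_sketch(y_hat, y, cls):
    """Of the samples whose true label is `cls`, the fraction predicted as `cls`."""
    preds = [p for p, t in zip(y_hat, y) if t == cls]
    # Assumed convention: 0.0 when `cls` never occurs in the true labels.
    return sum(p == cls for p in preds) / len(preds) if preds else 0.0


# Hypothetical toy labels
y_true = [0, 0, 1, 1, 1]
y_pred = [0, 1, 1, 1, 0]
print(accuracy_sketch(y_pred, y_true))      # 0.6
print(precision_sketch(y_pred, y_true, 0))  # 0.5
print(recall_sketch(y_pred, y_true, 0))     # 0.5
```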

In [9]:
print()
print("TEST CASE 2: REAL INPUT AND DISCRETE OUTPUT (CLASSIFICATION)")
print()

# Generate data
N = 30
P = 5
X = pd.DataFrame(np.random.randn(N, P))
# P is reused here as the number of classes, so labels fall in 0..P-1
y = pd.Series(np.random.randint(P, size=N), dtype="category")

y = y.rename("Target")
print("data: ")
print(pd.concat([X, y], axis=1))
print()
print(f"Data shape: X={X.shape}, y={y.shape}")
print(f"Feature types: {[X[col].dtype for col in X.columns]}")
print(f"Target type: {y.dtype}")
print(f"Target classes: {sorted(y.unique())}")
print(f"Class distribution: {y.value_counts().sort_index().to_dict()}")


TEST CASE 2: REAL INPUT AND DISCRETE OUTPUT (CLASSIFICATION)

data: 
           0         1         2         3         4 Target
0   0.625667 -0.857158 -1.070892  0.482472 -0.223463      4
1   0.714000  0.473238 -0.072829 -0.846794 -1.514847      4
2  -0.446515  0.856399  0.214094 -1.245739  0.173181      1
3   0.385317 -0.883857  0.153725  0.058209 -1.142970      1
4   0.357787  0.560785  1.083051  1.053802 -1.377669      1
5  -0.937825  0.515035  0.513786  0.515048  3.852731      4
6   0.570891  1.135566  0.954002  0.651391 -0.315269      2
7   0.758969 -0.772825 -0.236819 -0.485364  0.081874      4
8   2.314659 -1.867265  0.686260 -1.612716 -0.471932      2
9   1.088951  0.064280 -1.077745 -0.715304  0.679598      2
10 -0.730367  0.216459  0.045572 -0.651600  2.143944      1
11  0.633919 -2.025143  0.186454 -0.661786  0.852433      3
12 -0.792521 -0.114736  0.504987  0.865755 -1.200296      0
13 -0.334501 -0.474945 -0.653329  1.765454  0.404982      1
14 -1.260884  0.917862  2.1221

In [10]:
# Test with Information Gain
print("\n--- Testing with Information Gain ---")
tree_ig = DecisionTree(criterion="information_gain")
tree_ig.fit(X, y)
y_hat_ig = tree_ig.predict(X)

print("Criteria: information_gain")
print(f"Accuracy: {accuracy(y_hat_ig, y):.4f}")
for cls in sorted(y.unique()):
    print(f"Precision (class {cls}): {precision(y_hat_ig, y, cls):.4f}")
    print(f"Recall (class {cls}): {recall(y_hat_ig, y, cls):.4f}")

# Display tree visualization
print("\nDisplaying graphical tree for information_gain...")
try:
    graph = tree_ig.create_graph(
        feature_names=[f'feature_{i}' for i in range(len(X.columns))]
    )
    if not graph:
        print("Graphviz not available, falling back to text display:")
        tree_ig.plot()
except Exception as e:
    print(f"Error creating graph: {e}")
    print("Falling back to text display:")
    tree_ig.plot()


--- Testing with Information Gain ---
Criteria: information_gain
Accuracy: 0.9000
Precision (class 0): 1.0000
Recall (class 0): 0.3333
Precision (class 1): 0.8182
Recall (class 1): 0.9000
Precision (class 2): 0.8333
Recall (class 2): 1.0000
Precision (class 3): 1.0000
Recall (class 3): 1.0000
Precision (class 4): 1.0000
Recall (class 4): 1.0000

Displaying graphical tree for information_gain...
Error creating graph: 'DecisionTree' object has no attribute 'create_graph'
Falling back to text display:
Tree visualization saved as: tree_real_input_discrete_output_information_gain.png

Decision Tree Structure:
Root: 0 ≤ 0.516 (samples: 30)
│
├─ 1 ≤ 1.008 (samples: 22)
│   ├─ 2 ≤ -0.450 (samples: 17)
│   │   ├─ 0 ≤ -0.631 (samples: 5)
│   │   │   ├─ class: 0 (samples: 1)
│   │   │   └─ 0 ≤ -0.271 (samples: 4)
│   │   │       ├─ class: 1 (samples: 1)
│   │   │       └─ class: 2 (samples: 3)
│   │   └─ 2 ≤ -0.132 (samples: 12)
│   │       ├─ class: 4 (samples: 1)
│   │       └─ 4 ≤ 2.998 (samp

In [11]:
# Test with Gini Index
print("\n--- Testing with Gini Index ---")
tree_gini = DecisionTree(criterion="gini_index")
tree_gini.fit(X, y)
y_hat_gini = tree_gini.predict(X)

print("Criteria: gini_index")
print(f"Accuracy: {accuracy(y_hat_gini, y):.4f}")
for cls in sorted(y.unique()):
    print(f"Precision (class {cls}): {precision(y_hat_gini, y, cls):.4f}")
    print(f"Recall (class {cls}): {recall(y_hat_gini, y, cls):.4f}")

# Display tree visualization
print("\nDisplaying graphical tree for gini_index...")
try:
    graph = tree_gini.create_graph(
        feature_names=[f'feature_{i}' for i in range(len(X.columns))]
    )
    if not graph:
        print("Graphviz not available, falling back to text display:")
        tree_gini.plot()
except Exception as e:
    print(f"Error creating graph: {e}")
    print("Falling back to text display:")
    tree_gini.plot()



--- Testing with Gini Index ---
Criteria: gini_index
Accuracy: 0.8667
Precision (class 0): 1.0000
Recall (class 0): 0.3333
Precision (class 1): 0.7143
Recall (class 1): 1.0000
Precision (class 2): 1.0000
Recall (class 2): 0.8000
Precision (class 3): 1.0000
Recall (class 3): 1.0000
Precision (class 4): 1.0000
Recall (class 4): 0.9000

Displaying graphical tree for gini_index...
Error creating graph: 'DecisionTree' object has no attribute 'create_graph'
Falling back to text display:
Tree visualization saved as: tree_real_input_discrete_output_gini_index.png

Decision Tree Structure:
Root: 0 ≤ 0.516 (samples: 30)
│
├─ 1 ≤ 1.008 (samples: 22)
│   ├─ 2 ≤ -0.736 (samples: 17)
│   │   ├─ 0 ≤ -0.415 (samples: 2)
│   │   │   ├─ class: 0 (samples: 1)
│   │   │   └─ class: 2 (samples: 1)
│   │   └─ 4 ≤ 2.998 (samples: 15)
│   │       ├─ 2 ≤ -0.132 (samples: 14)
│   │       │   ├─ class: 1 (samples: 4)
│   │       │   └─ class: 1 (samples: 10)
│   │       └─ class: 4 (samples: 1)
│   └─ 2 ≤ -0.65

## Test Case 3: Discrete Input and Discrete Output (Classification)

This case uses categorical features and categorical target values. We'll use accuracy, precision, and recall as evaluation metrics.
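
How `DecisionTree` handles categorical columns internally is not shown in this notebook. One common way to make unordered categories usable by threshold-based splits is one-hot encoding. The sketch below illustrates it with `pd.get_dummies` on a small hypothetical frame; the column names and values are made up, and this is not necessarily what `tree.base` does.

```python
import pandas as pd

# Hypothetical categorical features
X_demo = pd.DataFrame({
    "color": pd.Series(["red", "blue", "red"], dtype="category"),
    "size": pd.Series(["S", "L", "M"], dtype="category"),
})

# Each category becomes its own indicator column
X_encoded = pd.get_dummies(X_demo)
print(list(X_encoded.columns))
print(X_encoded.shape)
```

After encoding, a split on a single indicator column separates one category from all the others, without imposing an order on the category values.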

In [12]:
print()
print("TEST CASE 3: DISCRETE INPUT AND DISCRETE OUTPUT (CLASSIFICATION)")
print()

# Generate data
N = 30
P = 5
X = pd.DataFrame({i: pd.Series(np.random.randint(P, size=N), dtype="category") for i in range(P)})
y = pd.Series(np.random.randint(P, size=N), dtype="category")

y = y.rename("Target")
print("data: ")
print(pd.concat([X, y], axis=1))
print()
print(f"Data shape: X={X.shape}, y={y.shape}")
print(f"Feature types: {[X[col].dtype for col in X.columns]}")
print(f"Target type: {y.dtype}")
print(f"Target classes: {sorted(y.unique())}")
print(f"Class distribution: {y.value_counts().sort_index().to_dict()}")
print(f"Feature value ranges: {[sorted(X[col].unique()) for col in X.columns]}")


TEST CASE 3: DISCRETE INPUT AND DISCRETE OUTPUT (CLASSIFICATION)

data: 
    0  1  2  3  4 Target
0   0  3  3  0  4      0
1   3  0  3  1  4      0
2   0  0  4  1  4      4
3   0  0  1  1  4      4
4   4  4  3  2  1      1
5   3  1  3  4  1      2
6   3  3  1  4  2      2
7   3  4  1  0  0      3
8   2  4  3  0  4      1
9   4  4  1  1  0      1
10  3  4  3  0  0      1
11  2  4  3  2  2      1
12  1  2  4  4  4      2
13  1  3  0  1  4      2
14  2  4  3  0  3      1
15  2  3  2  2  0      3
16  4  2  0  2  0      0
17  4  2  0  0  1      0
18  1  3  0  4  3      3
19  3  0  4  0  1      1
20  1  1  3  1  1      2
21  3  0  4  0  1      0
22  3  0  3  2  2      4
23  4  0  4  0  2      4
24  0  4  4  4  1      3
25  0  2  2  3  3      1
26  2  0  4  0  0      0
27  4  2  1  4  3      1
28  3  3  2  4  4      0
29  0  1  4  2  2      3

Data shape: X=(30, 5), y=(30,)
Feature types: [CategoricalDtype(categories=[0, 1, 2, 3, 4], ordered=False, categories_dtype=int32), CategoricalDtype(c

In [13]:
# Test with Information Gain
print("\n--- Testing with Information Gain ---")
tree_ig = DecisionTree(criterion="information_gain")
tree_ig.fit(X, y)
y_hat_ig = tree_ig.predict(X)

print("Criteria: information_gain")
print(f"Accuracy: {accuracy(y_hat_ig, y):.4f}")
for cls in sorted(y.unique()):
    print(f"Precision (class {cls}): {precision(y_hat_ig, y, cls):.4f}")
    print(f"Recall (class {cls}): {recall(y_hat_ig, y, cls):.4f}")

# Display tree visualization
print("\nDisplaying graphical tree for information_gain...")
try:
    graph = tree_ig.create_graph(
        feature_names=[f'feature_{i}' for i in range(len(X.columns))]
    )
    if not graph:
        print("Graphviz not available, falling back to text display:")
        tree_ig.plot()
except Exception as e:
    print(f"Error creating graph: {e}")
    print("Falling back to text display:")
    tree_ig.plot()


--- Testing with Information Gain ---
Criteria: information_gain
Accuracy: 0.4333
Precision (class 0): 0.5000
Recall (class 0): 0.4286
Precision (class 1): 0.3636
Recall (class 1): 0.8889
Precision (class 2): 0.0000
Recall (class 2): 0.0000
Precision (class 3): 0.0000
Recall (class 3): 0.0000
Precision (class 4): 1.0000
Recall (class 4): 0.5000

Displaying graphical tree for information_gain...
Error creating graph: 'DecisionTree' object has no attribute 'create_graph'
Falling back to text display:
Tree visualization already exists: tree_real_input_discrete_output_information_gain.png (skipping regeneration)

Decision Tree Structure:
Root: 1 ≤ 0.000 (samples: 30)
│
├─ 4 ≤ 1.000 (samples: 8)
│   ├─ class: 0 (samples: 2)
│   └─ 0 ≤ 2.000 (samples: 6)
│       ├─ class: 0 (samples: 1)
│       └─ 0 ≤ 3.000 (samples: 5)
│           ├─ 3 ≤ 1.000 (samples: 2)
│           │   ├─ class: 0 (samples: 1)
│           │   └─ class: 4 (samples: 1)
│           └─ class: 4 (samples: 3)
│
└─ 1 ≤ 4.000 (

In [14]:
# Test with Gini Index
print("\n--- Testing with Gini Index ---")
tree_gini = DecisionTree(criterion="gini_index")
tree_gini.fit(X, y)
y_hat_gini = tree_gini.predict(X)

print("Criteria: gini_index")
print(f"Accuracy: {accuracy(y_hat_gini, y):.4f}")
for cls in sorted(y.unique()):
    print(f"Precision (class {cls}): {precision(y_hat_gini, y, cls):.4f}")
    print(f"Recall (class {cls}): {recall(y_hat_gini, y, cls):.4f}")

# Display tree visualization
print("\nDisplaying graphical tree for gini_index...")
try:
    graph = tree_gini.create_graph(
        feature_names=[f'feature_{i}' for i in range(len(X.columns))]
    )
    if not graph:
        print("Graphviz not available, falling back to text display:")
        tree_gini.plot()
except Exception as e:
    print(f"Error creating graph: {e}")
    print("Falling back to text display:")
    tree_gini.plot()



--- Testing with Gini Index ---
Criteria: gini_index
Accuracy: 0.3000
Precision (class 0): 0.0000
Recall (class 0): 0.0000
Precision (class 1): 0.3000
Recall (class 1): 1.0000
Precision (class 2): 0.0000
Recall (class 2): 0.0000
Precision (class 3): 0.0000
Recall (class 3): 0.0000
Precision (class 4): 0.0000
Recall (class 4): 0.0000

Displaying graphical tree for gini_index...
Error creating graph: 'DecisionTree' object has no attribute 'create_graph'
Falling back to text display:
Tree visualization already exists: tree_real_input_discrete_output_gini_index.png (skipping regeneration)

Decision Tree Structure:
Root: 1 ≤ 4.000 (samples: 30)
│
├─ 2 ≤ 3.000 (samples: 8)
│   ├─ class: 1 (samples: 5)
│   └─ 0 ≤ 4.000 (samples: 3)
│       ├─ class: 1 (samples: 1)
│       └─ class: 3 (samples: 2)
│
└─ 1 ≤ 0.000 (samples: 22)
    ├─ 4 ≤ 1.000 (samples: 8)
    │   ├─ class: 0 (samples: 2)
    │   └─ 0 ≤ 2.000 (samples: 6)
    │       ├─ class: 0 (samples: 1)
    │       └─ 0 ≤ 3.000 (samples: 

## Test Case 4: Discrete Input and Real Output (Regression)

This case uses categorical features and continuous target values. We'll use RMSE and MAE as evaluation metrics.
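
How the implementation scores candidate splits for a real-valued target is not shown in this notebook. A common choice is the reduction in mean squared error (the variance of the target within each side of the split). The sketch below assumes that choice and a one-category-versus-rest split; `mse` and `mse_reduction` are hypothetical names.

```python
def mse(values):
    """Mean squared deviation of the values from their own mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def mse_reduction(feature, target, category):
    """MSE reduction from splitting rows into `feature == category` vs. the rest."""
    inside = [t for f, t in zip(feature, target) if f == category]
    outside = [t for f, t in zip(feature, target) if f != category]
    if not inside or not outside:
        return 0.0  # one side is empty, so the split gains nothing
    n = len(target)
    weighted = (len(inside) / n) * mse(inside) + (len(outside) / n) * mse(outside)
    return mse(target) - weighted


# Hypothetical toy data: the category fully determines the target.
feature = ["a", "a", "b", "b"]
target = [1.0, 1.0, 3.0, 3.0]
print(mse_reduction(feature, target, "a"))  # 1.0
```

If a tree uses a variance-based score like this for every regression problem, the `criterion` argument would have no effect on regression trees. That is one possible reading, not a confirmed one, of any identical results between the two criteria.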

In [15]:
print()
print("TEST CASE 4: DISCRETE INPUT AND REAL OUTPUT (REGRESSION)")
print()

# Generate data
N = 30
P = 5
X = pd.DataFrame({i: pd.Series(np.random.randint(P, size=N), dtype="category") for i in range(P)})
y = pd.Series(np.random.randn(N))

y = y.rename("Target")
print("data: ")
print(pd.concat([X, y], axis=1))
print()
print(f"Data shape: X={X.shape}, y={y.shape}")
print(f"Feature types: {[X[col].dtype for col in X.columns]}")
print(f"Target type: {y.dtype}")
print(f"Target range: [{y.min():.2f}, {y.max():.2f}]")
print(f"Feature value ranges: {[sorted(X[col].unique()) for col in X.columns]}")


TEST CASE 4: DISCRETE INPUT AND REAL OUTPUT (REGRESSION)

data: 
    0  1  2  3  4    Target
0   3  4  2  1  0  1.006293
1   4  0  2  0  2 -0.576892
2   1  4  2  3  0  0.835692
3   4  0  2  2  1 -1.129707
4   1  3  3  1  2  0.529804
5   4  1  4  0  1  1.441569
6   2  1  1  4  2 -2.471645
7   2  0  1  3  4 -0.796895
8   2  1  2  1  3  0.577072
9   2  4  2  1  4 -0.203045
10  0  2  0  2  1  0.371146
11  2  0  4  2  3 -0.603985
12  3  1  3  4  2  0.086590
13  3  0  1  4  3 -0.155677
14  0  0  0  0  0  1.167782
15  2  2  0  0  3  0.254421
16  2  4  1  4  0  0.337603
17  2  0  3  4  3 -0.411877
18  4  1  0  3  0 -0.487606
19  1  3  0  2  1 -0.432558
20  4  0  4  0  4  0.394452
21  1  0  3  2  2 -0.420984
22  2  2  0  2  3  0.289775
23  2  4  3  4  4  2.075401
24  4  3  1  3  2  0.871125
25  4  1  2  1  2 -0.326024
26  1  3  0  3  0  1.201214
27  3  1  4  3  1 -0.408075
28  1  4  1  2  1 -2.038125
29  4  1  3  3  4 -1.008086

Data shape: X=(30, 5), y=(30,)
Feature types: [CategoricalDtype(c

In [16]:
# Test with Information Gain
print("\n--- Testing with Information Gain ---")
tree_ig = DecisionTree(criterion="information_gain")
tree_ig.fit(X, y)
y_hat_ig = tree_ig.predict(X)

print("Criteria: information_gain")
print(f"RMSE: {rmse(y_hat_ig, y):.4f}")
print(f"MAE: {mae(y_hat_ig, y):.4f}")

# Display tree visualization
print("\nDisplaying graphical tree for information_gain...")
try:
    graph = tree_ig.create_graph(
        feature_names=[f'feature_{i}' for i in range(len(X.columns))]
    )
    if not graph:
        print("Graphviz not available, falling back to text display:")
        tree_ig.plot()
except Exception as e:
    print(f"Error creating graph: {e}")
    print("Falling back to text display:")
    tree_ig.plot()


--- Testing with Information Gain ---
Criteria: information_gain
RMSE: 1.8072
MAE: 1.4146

Displaying graphical tree for information_gain...
Error creating graph: 'DecisionTree' object has no attribute 'create_graph'
Falling back to text display:
Tree visualization already exists: tree_real_input_real_output_information_gain.png (skipping regeneration)

Decision Tree Structure:
Root: 2 ≤ 1.000 (samples: 30)
│
├─ 1 ≤ 1.000 (samples: 6)
│   ├─ value: -2.472 (samples: 1)
│   └─ 0 ≤ 1.000 (samples: 5)
│       ├─ value: -2.038 (samples: 1)
│       └─ 1 ≤ 0.000 (samples: 4)
│           ├─ 0 ≤ 2.000 (samples: 2)
│           │   ├─ value: -0.797 (samples: 1)
│           │   └─ value: -0.156 (samples: 1)
│           └─ 0 ≤ 2.000 (samples: 2)
│               ├─ value: 0.338 (samples: 1)
│               └─ value: 0.871 (samples: 1)
│
└─ 1 ≤ 4.000 (samples: 24)
    ├─ 2 ≤ 2.000 (samples: 4)
    │   ├─ 0 ≤ 2.000 (samples: 3)
    │   │   ├─ value: -0.203 (samples: 1)
    │   │   └─ 0 ≤ 3.000 (sampl

In [17]:
# Test with Gini Index
print("\n--- Testing with Gini Index ---")
tree_gini = DecisionTree(criterion="gini_index")
tree_gini.fit(X, y)
y_hat_gini = tree_gini.predict(X)

print("Criteria: gini_index")
print(f"RMSE: {rmse(y_hat_gini, y):.4f}")
print(f"MAE: {mae(y_hat_gini, y):.4f}")

# Display tree visualization
print("\nDisplaying graphical tree for gini_index...")
try:
    graph = tree_gini.create_graph(
        feature_names=[f'feature_{i}' for i in range(len(X.columns))]
    )
    if not graph:
        print("Graphviz not available, falling back to text display:")
        tree_gini.plot()
except Exception as e:
    print(f"Error creating graph: {e}")
    print("Falling back to text display:")
    tree_gini.plot()



--- Testing with Gini Index ---
Criteria: gini_index
RMSE: 1.8072
MAE: 1.4146

Displaying graphical tree for gini_index...
Error creating graph: 'DecisionTree' object has no attribute 'create_graph'
Falling back to text display:
Tree visualization already exists: tree_real_input_real_output_gini_index.png (skipping regeneration)

Decision Tree Structure:
Root: 2 ≤ 1.000 (samples: 30)
│
├─ 1 ≤ 1.000 (samples: 6)
│   ├─ value: -2.472 (samples: 1)
│   └─ 0 ≤ 1.000 (samples: 5)
│       ├─ value: -2.038 (samples: 1)
│       └─ 1 ≤ 0.000 (samples: 4)
│           ├─ 0 ≤ 2.000 (samples: 2)
│           │   ├─ value: -0.797 (samples: 1)
│           │   └─ value: -0.156 (samples: 1)
│           └─ 0 ≤ 2.000 (samples: 2)
│               ├─ value: 0.338 (samples: 1)
│               └─ value: 0.871 (samples: 1)
│
└─ 1 ≤ 4.000 (samples: 24)
    ├─ 2 ≤ 2.000 (samples: 4)
    │   ├─ 0 ≤ 2.000 (samples: 3)
    │   │   ├─ value: -0.203 (samples: 1)
    │   │   └─ 0 ≤ 3.000 (samples: 2)
    │   │       ├

## Summary

This notebook ran our Decision Tree implementation on all four combinations of input and output types:

**Real Input & Real Output** - Regression with continuous features  
**Real Input & Discrete Output** - Classification with continuous features  
**Discrete Input & Discrete Output** - Classification with categorical features  
**Discrete Input & Real Output** - Regression with categorical features  

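Each case was run with both the Information Gain and Gini Index splitting criteria, with automatic detection of regression vs classification problems. In the two regression cases, both criteria produced identical RMSE and MAE values. In every run the `create_graph` call failed because `DecisionTree` has no such attribute, so the trees were shown through the `plot()` fallback, which printed a text rendering and saved or reused a PNG file, rather than through the graphviz path.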
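
The detection of regression versus classification could, for example, key off the dtype of the target Series. The sketch below is one way to do that, assuming float targets mean regression and everything else (including `category`) means classification. The helper name `is_regression_target` is hypothetical, and the actual rule in `tree.base` may differ.

```python
import pandas as pd


def is_regression_target(y: pd.Series) -> bool:
    """Treat float targets as regression and all other dtypes as classification."""
    return pd.api.types.is_float_dtype(y)


print(is_regression_target(pd.Series([0.1, 0.2])))                # True
print(is_regression_target(pd.Series([0, 1], dtype="category")))  # False
```

This matches how the targets are built in this notebook: `np.random.randn` yields float targets for the regression cases, and the classification cases cast their targets to `category`.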