# Decision Tree

This notebook demonstrates how to use the `DecisionTree` classifier from the `rice2025.supervised_learning.decision_tree` module.

## Setup
Import the necessary modules and load the data. This example uses the Wine dataset from scikit-learn.

The Wine dataset is a small classification dataset that has:

- **Samples:** 178  
- **Features:** 13 numeric chemical properties of wines  
- **Classes:** 3 types of wine  

**Goal:** Predict the type of wine based on its chemical features.  

In [1]:
# import library
from rice2025.supervised_learning import decision_tree
import rice2025.utilities as util

# load dataset
from sklearn.datasets import load_wine
data = load_wine()
X, y = data.data, data.target
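
As a quick sanity check, the dataset summary above can be confirmed directly from the loaded arrays. This is a minimal sketch that uses only scikit-learn and NumPy, not the `rice2025` library.

```python
import numpy as np
from sklearn.datasets import load_wine

data = load_wine()
X, y = data.data, data.target

print(X.shape)                     # (178, 13): 178 samples, 13 features
print(data.target_names.tolist())  # names of the 3 wine classes
print(np.bincount(y).tolist())     # [59, 71, 48]: samples per class
```

The classes are not perfectly balanced, which is worth keeping in mind when reading per-class metrics later on.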

## Data Pre-Processing
Before training, we split the dataset into **training** and **test** sets using `train_test_split` from `rice2025.utilities`, holding out 20% of the samples for testing. We can verify the split by printing the shape of each feature array.

In [2]:
# split dataset
X_train, X_test, y_train, y_test = util.train_test_split(X, y, test_size=.2)
print(f"Train size: {X_train.shape}, Test size: {X_test.shape}")


Train size: (142, 13), Test size: (36, 13)
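
To illustrate what a shuffled split does, here is a minimal NumPy sketch. The helper `shuffle_split`, its `seed` argument, and the choice to round the test size up are assumptions made for illustration; they are not taken from `rice2025.utilities`, whose implementation may differ. Rounding up happens to reproduce the 142/36 sizes shown above.

```python
import numpy as np

def shuffle_split(X, y, test_size=0.2, seed=0):
    """Hypothetical helper: shuffle row indices, then slice off a test set."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    # Assumption: the test set size is rounded up (0.2 * 178 = 35.6 -> 36).
    n_test = int(np.ceil(test_size * len(X)))
    test_idx, train_idx = idx[:n_test], idx[n_test:]
    return X[train_idx], X[test_idx], y[train_idx], y[test_idx]

# Placeholder arrays with the same shape as the Wine data.
X = np.arange(178 * 13, dtype=float).reshape(178, 13)
y = np.arange(178) % 3

X_train, X_test, y_train, y_test = shuffle_split(X, y, test_size=0.2)
print(X_train.shape, X_test.shape)  # (142, 13) (36, 13)
```

Fixing the seed makes the split repeatable, so metrics can be compared across runs.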


## Initializing and Training the Decision Tree Model

`DecisionTree` supports typical tree hyperparameters such as:

- `max_depth`
- `min_samples_split`

We'll use the default parameters, `max_depth=100` and `min_samples_split=2`.

In [5]:
model = decision_tree.DecisionTree()
model.fit(X_train, y_train)
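
The two hyperparameters typically act as stopping rules while the tree grows. The sketch below shows the conventional meaning of each one. The function `should_stop` is hypothetical and is not part of `rice2025`; the library's actual stopping logic is not shown in this notebook and may differ.

```python
import numpy as np

def should_stop(y, depth, max_depth=100, min_samples_split=2):
    """Hypothetical rule: return True when a node should become a leaf."""
    return bool(
        depth >= max_depth             # the tree is already as deep as allowed
        or len(y) < min_samples_split  # too few samples to attempt a split
        or len(np.unique(y)) == 1      # the node is pure, nothing left to separate
    )

print(should_stop(np.array([0, 0, 1]), depth=0))  # False: can still split
print(should_stop(np.array([0, 1]), depth=100))   # True: depth limit reached
print(should_stop(np.array([1]), depth=3))        # True: fewer than 2 samples
```

Under this reading, lowering `max_depth` or raising `min_samples_split` yields a smaller tree that is less likely to overfit.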

## Making Predictions
Once the model is trained, the `predict()` method can be used to classify new data points.

In [6]:
y_pred = model.predict(X_test)

## Evaluating the Model

The model's performance can be summarized by its overall **accuracy** or broken down per class in a **classification report**.  
scikit-learn provides both through its `accuracy_score` and `classification_report` functions.

In [7]:
from sklearn.metrics import accuracy_score, classification_report

# Calculate accuracy
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy on test set: {accuracy:.2f}")

# Detailed report
print("\nClassification Report:")
print(classification_report(y_test, y_pred, target_names=data.target_names))

Accuracy on test set: 0.83

Classification Report:
              precision    recall  f1-score   support

     class_0       0.92      0.92      0.92        12
     class_1       0.92      0.73      0.81        15
     class_2       0.67      0.89      0.76         9

    accuracy                           0.83        36
   macro avg       0.83      0.85      0.83        36
weighted avg       0.85      0.83      0.84        36
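
The summary rows of the report follow from the per-class rows. As a worked check, the sketch below recomputes the accuracy and the two averaged recall values from the `recall` and `support` columns printed above. The inputs are the rounded figures from the report, so the results are only accurate to two decimals.

```python
import numpy as np

# Values copied from the recall and support columns of the report.
support = np.array([12, 15, 9])
recall = np.array([0.92, 0.73, 0.89])

# recall * support recovers the number of correct predictions per class.
correct = np.rint(recall * support).astype(int)
accuracy = correct.sum() / support.sum()

macro_recall = recall.mean()                                # unweighted mean
weighted_recall = (recall * support).sum() / support.sum()  # weighted by support

print(correct.tolist())          # [11, 11, 8]
print(f"{accuracy:.2f}")         # 0.83
print(f"{macro_recall:.2f}")     # 0.85
print(f"{weighted_recall:.2f}")  # 0.83
```

The support-weighted recall equals the accuracy, because both count correct predictions over all 36 test samples.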



## Hyperparameter Tuning
We can also search over combinations of `max_depth` and `min_samples_split` to try to improve accuracy.

In [14]:
best_acc = 0
best_params = None

for depth in [2, 4, 6, 8, 10]:
    for split in [2, 4, 6, 8, 10]:
        model = decision_tree.DecisionTree(max_depth=depth, min_samples_split=split)
        model.fit(X_train, y_train)
        y_pred = model.predict(X_test)
        acc = accuracy_score(y_test, y_pred)
        if acc > best_acc:
            best_acc = acc
            best_params = {"max_depth": depth, "min_samples_split": split}

print(best_params, best_acc)

{'max_depth': 6, 'min_samples_split': 2} 0.8333333333333334
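
The loop above picks hyperparameters by scoring on the test set, which tends to make the reported accuracy optimistic. A common alternative is to tune on a separate validation split and touch the test set only once at the end. The sketch below shows that pattern using scikit-learn's `DecisionTreeClassifier` and `train_test_split` as stand-ins, so it runs without `rice2025`. It assumes `decision_tree.DecisionTree` could be swapped in with the same `fit`/`predict` calls used earlier; the `random_state` and `stratify` arguments belong to scikit-learn and are not known to exist in `rice2025`.

```python
from itertools import product

from sklearn.datasets import load_wine
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier  # stand-in for DecisionTree

X, y = load_wine(return_X_y=True)

# Hold out the test set first, then carve a validation set out of the rest.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)
X_fit, X_val, y_fit, y_val = train_test_split(
    X_train, y_train, test_size=0.25, random_state=0, stratify=y_train
)

best_acc, best_params = -1.0, None
for depth, split in product([2, 4, 6, 8, 10], [2, 4, 6, 8, 10]):
    clf = DecisionTreeClassifier(
        max_depth=depth, min_samples_split=split, random_state=0
    )
    clf.fit(X_fit, y_fit)
    acc = accuracy_score(y_val, clf.predict(X_val))  # score on validation only
    if acc > best_acc:
        best_acc = acc
        best_params = {"max_depth": depth, "min_samples_split": split}

# Refit on the full training set and evaluate once on the untouched test set.
final = DecisionTreeClassifier(**best_params, random_state=0)
final.fit(X_train, y_train)
test_acc = accuracy_score(y_test, final.predict(X_test))
print(best_params, f"validation={best_acc:.2f}", f"test={test_acc:.2f}")
```

With only 178 samples the validation split is small, so k-fold cross-validation on the training set would give a steadier estimate at the cost of more fits.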
