# Decision Tree from Scratch: Module Demo & Testing


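## 1. Introduction
A Decision Tree is a supervised learning algorithm used for both classification and regression tasks. It recursively splits the dataset on feature values, choosing at each node the feature and threshold that best separate the targets, to build a tree-like model of decisions. Because each prediction follows a single path of explicit rules from root to leaf, the model is highly interpretable and easy to visualize.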
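To make the splitting step concrete, here is a minimal sketch of how one split on a single numeric feature could be scored. It assumes Gini impurity as the criterion and midpoints between distinct values as candidate thresholds; the custom `DecisionTree` module is not shown in this notebook and may use a different criterion (for example entropy). The helper names `gini` and `best_split` are hypothetical and exist only for illustration.

```python
import numpy as np

def gini(labels):
    """Gini impurity of a 1-D array of non-negative integer class labels."""
    if len(labels) == 0:
        return 0.0
    counts = np.bincount(labels)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def best_split(feature, labels):
    """Return (threshold, weighted_impurity) of the best binary split on one feature.

    The threshold is None when no candidate split improves on the parent impurity.
    """
    best_thr, best_score = None, gini(labels)
    values = np.unique(feature)
    # Candidate thresholds: midpoints between consecutive distinct values
    for thr in (values[:-1] + values[1:]) / 2:
        left = labels[feature <= thr]
        right = labels[feature > thr]
        # Impurity of the two children, weighted by their sizes
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if score < best_score:
            best_thr, best_score = thr, score
    return best_thr, best_score

# Toy data (made up for illustration): the two classes separate cleanly
feature = np.array([1.0, 2.0, 3.0, 10.0, 11.0, 12.0])
labels = np.array([0, 0, 0, 1, 1, 1])
threshold, impurity = best_split(feature, labels)
print(threshold, impurity)
```

A full tree would repeat this search over every feature, split on the best one, and recurse into each child until a stopping rule such as a maximum depth or a minimum number of samples is reached.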

Decision Trees can handle both numerical and categorical data and are often used in fields like medicine, marketing, and finance for decision-making tasks.

In this notebook, we apply a custom Decision Tree implementation (built from scratch) to the Wine dataset and evaluate its classification performance on a held-out test set.

## 2. Import Libraries

In [8]:
import numpy as np
from sklearn import datasets
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split

In [2]:
# Connect to Google Drive and access my custom DecisionTree model
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [3]:
import sys
sys.path.append('/content/drive/My Drive/scratch/')

In [4]:
from models.decision_tree import DecisionTree

## 3. Load Dataset

In [5]:
# Load wine dataset
wine = datasets.load_wine()
X = wine.data
y = wine.target

## 4. Train-Test Split


In [9]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
print(X_train.shape, X_test.shape)

(142, 13) (36, 13)


## 5. Train the Model

In [11]:
model = DecisionTree(max_depth=20, min_sample_split=5)
model.fit(X_train, y_train)
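As a sanity check for a from-scratch model, it can help to train scikit-learn's `DecisionTreeClassifier` on the same split and compare scores. The sketch below is one way to do that. It assumes the custom `min_sample_split` plays the same role as scikit-learn's `min_samples_split`, which cannot be confirmed from this notebook, and `random_state=0` on the classifier is an arbitrary choice made only for reproducibility. The two models need not agree exactly, since the split criterion and tie-breaking may differ.

```python
from sklearn import datasets
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Recreate the same 80/20 split used in this notebook
wine = datasets.load_wine()
X_train, X_test, y_train, y_test = train_test_split(
    wine.data, wine.target, test_size=0.2, random_state=42
)

# Assumption: min_samples_split mirrors the custom model's min_sample_split
baseline = DecisionTreeClassifier(max_depth=20, min_samples_split=5, random_state=0)
baseline.fit(X_train, y_train)

baseline_pred = baseline.predict(X_test)
baseline_accuracy = accuracy_score(y_test, baseline_pred)
print(f"scikit-learn baseline accuracy: {baseline_accuracy:.2f}")
```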

## 6. Predict

In [12]:
y_pred = model.predict(X_test)
y_pred

array([0, 0, 1, 0, 1, 0, 1, 2, 1, 0, 0, 1, 0, 1, 0, 1, 1, 1, 0, 1, 0, 1,
       1, 2, 2, 2, 1, 1, 1, 0, 0, 1, 2, 0, 0, 0])

## 7. Evaluate

In [13]:
print(classification_report(y_test, y_pred))

              precision    recall  f1-score   support

           0       0.93      1.00      0.97        14
           1       0.88      1.00      0.93        14
           2       1.00      0.62      0.77         8

    accuracy                           0.92        36
   macro avg       0.94      0.88      0.89        36
weighted avg       0.93      0.92      0.91        36



In [14]:
print(confusion_matrix(y_test, y_pred))

[[14  0  0]
 [ 0 14  0]
 [ 1  2  5]]
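In scikit-learn's convention, rows of the confusion matrix are true classes and columns are predicted classes. The last row therefore shows that three of the eight class-2 test samples were misclassified (one as class 0, two as class 1), which explains the lower recall for that class. As a cross-check, the per-class figures in the classification report can be recomputed directly from the matrix; the sketch below copies the printed values by hand.

```python
import numpy as np

# Confusion matrix copied from the output above (rows: true, columns: predicted)
cm = np.array([[14, 0, 0],
               [0, 14, 0],
               [1, 2, 5]])

recall = np.diag(cm) / cm.sum(axis=1)     # correct / all true samples of the class
precision = np.diag(cm) / cm.sum(axis=0)  # correct / all predictions of the class
accuracy = np.trace(cm) / cm.sum()        # 33 correct out of 36

print("recall:   ", np.round(recall, 3))
print("precision:", np.round(precision, 3))
print("accuracy: ", round(accuracy, 3))
```

Class 2 recall works out to 5/8 = 0.625 and overall accuracy to 33/36 ≈ 0.917, consistent with the rounded values in the report.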
