## Grid Search Decision Trees

Hyperparameter tuning is critical for decision trees because it controls model complexity, preventing overfitting and underfitting and managing the bias–variance trade‑off. 

Key hyperparameters include:

- maximum depth
- minimum samples per leaf or per split
- the split criterion (Gini or entropy)

These shape how deep the tree grows and how finely it fits the data.
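To make the split criterion concrete, here is a minimal sketch of the two impurity measures computed by hand. The helper names `gini` and `entropy` are illustrative and are not library functions; the sketch only assumes the standard textbook definitions.

```python
import math
from collections import Counter


def gini(labels):
    """Gini impurity: 1 - sum(p_k^2) over the class proportions p_k."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def entropy(labels):
    """Shannon entropy in bits: sum(p_k * log2(1 / p_k))."""
    n = len(labels)
    return sum((count / n) * math.log2(n / count) for count in Counter(labels).values())


pure = ["a", "a", "a", "a"]    # one class only
mixed = ["a", "a", "b", "b"]   # two classes, evenly split

print(gini(pure), entropy(pure))    # 0.0 0.0
print(gini(mixed), entropy(mixed))  # 0.5 1.0
```

Both measures are zero for a pure node and largest for an even class mix, so a split is preferred when it lowers the impurity of the child nodes.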

To choose good values, different search strategies are used with cross‑validation: 

- GridSearchCV exhaustively tries all combinations in a predefined grid.
- RandomizedSearchCV tests a random subset of combinations for efficiency.
- Bayesian optimization (e.g., with scikit‑optimize) uses a probabilistic model to explore promising regions of the hyperparameter space more intelligently.

All three aim to find hyperparameters that maximize a chosen score (such as accuracy or F1) while keeping computation manageable.
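The cost difference between exhaustive and random search can be estimated before running anything. The sketch below counts candidates and cross-validation fits in plain Python, assuming the parameter grid, `cv=5`, and `n_iter=10` used in the example later in this section. It samples without replacement and ignores the final refit on the full training set, so treat it as an approximation of the search cost rather than a description of the library internals.

```python
import itertools
import random

# Same values as the grid used in the example in this section
param_grid = {
    "criterion": ["gini", "entropy"],
    "max_depth": [3, 5, 7, None],
    "min_samples_split": [2, 5, 10],
}
cv_folds = 5   # folds per candidate
n_iter = 10    # candidates drawn by the random search

# Exhaustive search: the Cartesian product of all value lists
names = list(param_grid)
all_candidates = [
    dict(zip(names, values))
    for values in itertools.product(*param_grid.values())
]

# Random search: a subset of that product, drawn without replacement
rng = random.Random(0)
random_candidates = rng.sample(all_candidates, n_iter)

grid_fits = len(all_candidates) * cv_folds
random_fits = len(random_candidates) * cv_folds

print(len(all_candidates), grid_fits)       # 24 120
print(len(random_candidates), random_fits)  # 10 50
```

Adding one more value to any hyperparameter multiplies the grid size, while the random search budget stays fixed at `n_iter`.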

[https://businessanalyticsinstitute.com/implementing-decision-trees-with-scikit-learn/]

#### Example: comparing four tuning strategies on the Iris dataset

The example below tunes a DecisionTreeClassifier with four search strategies:

- GridSearchCV
- RandomizedSearchCV
- HalvingGridSearchCV
- HalvingRandomSearchCV

The halving searches use “successive halving”, which evaluates many candidates on a small budget first, then focuses resources on the best ones.
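The idea of successive halving can be sketched in a few lines of plain Python. This is a simplified illustration, not scikit-learn's implementation: the function `successive_halving`, the toy scoring function `toy_score`, and the budget values are all hypothetical. Each round scores every surviving candidate on the current budget, keeps roughly the top `1/factor`, and multiplies the budget by `factor`.

```python
import math


def successive_halving(candidates, score_fn, min_resources, max_resources, factor=3):
    """Return the best candidate and a log of (n_candidates, resources) per round."""
    survivors = list(candidates)
    resources = min_resources
    history = []
    while True:
        # Rank the survivors using the current budget (higher score is better)
        ranked = sorted(survivors, key=lambda c: score_fn(c, resources), reverse=True)
        history.append((len(survivors), resources))
        # Stop when one candidate is left or the next budget would exceed the cap
        if len(survivors) == 1 or resources * factor > max_resources:
            return ranked[0], history
        survivors = ranked[: math.ceil(len(survivors) / factor)]
        resources *= factor


def toy_score(candidate, resources):
    """Hypothetical score: best at candidate == 7, with an error that shrinks as resources grow."""
    pseudo_noise = ((candidate * 37) % 11 - 5) / resources
    return -abs(candidate - 7) + pseudo_noise


best, history = successive_halving(range(24), toy_score, min_resources=10, max_resources=90)
print(best)     # 7
print(history)  # [(24, 10), (8, 30), (3, 90)]
```

With 24 candidates and `factor=3`, the toy run evaluates 24, then 8, then 3 candidates while the budget grows from 10 to 90. scikit-learn's halving estimators apply additional rules when choosing the starting budget and the number of rounds, so their schedule on a real dataset may differ from this sketch.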

```python
# Basic imports
import time
import numpy as np

from sklearn.datasets import load_iris
from sklearn.model_selection import (
    train_test_split,
    GridSearchCV,
    RandomizedSearchCV,
)
from sklearn.experimental import enable_halving_search_cv  # noqa: F401
from sklearn.model_selection import HalvingGridSearchCV, HalvingRandomSearchCV
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# 1. Load data
iris = load_iris()
X, y = iris.data, iris.target

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)
```

```python
# 2. Common parameter space for all searches
param_grid = {
    "criterion": ["gini", "entropy"],
    "max_depth": [3, 5, 7, None],
    "min_samples_split": [2, 5, 10],
}

# Helper to run a search and report time + scores
def run_search(name, search_obj):
    print(f"\n=== {name} ===")
    # perf_counter is a monotonic clock, better suited to short timings
    start = time.perf_counter()
    search_obj.fit(X_train, y_train)
    end = time.perf_counter()

    print("Best params:", search_obj.best_params_)
    print("Best CV score:", search_obj.best_score_)

    # Evaluate the refitted best model on the held-out test set
    best_model = search_obj.best_estimator_
    y_pred = best_model.predict(X_test)
    test_acc = accuracy_score(y_test, y_pred)
    print("Test accuracy:", test_acc)
    print("Elapsed time (s):", round(end - start, 2))
```

```python
# 3. GridSearchCV (exhaustive grid)
grid = GridSearchCV(
    estimator=DecisionTreeClassifier(random_state=0),
    param_grid=param_grid,
    cv=5,
    n_jobs=-1,
)
run_search("GridSearchCV", grid)
```

```text
=== GridSearchCV ===
Best params: {'criterion': 'gini', 'max_depth': 3, 'min_samples_split': 2}
Best CV score: 0.9523809523809523
Test accuracy: 0.9333333333333333
Elapsed time (s): 0.07
```


```python
# 4. RandomizedSearchCV (random subset of combinations)
#    n_iter controls how many random combinations are tried
rand = RandomizedSearchCV(
    estimator=DecisionTreeClassifier(random_state=0),
    param_distributions=param_grid,
    n_iter=10,        # fewer than total combinations -> faster
    cv=5,
    n_jobs=-1,
    random_state=0,
)
run_search("RandomizedSearchCV", rand)
```

```text
=== RandomizedSearchCV ===
Best params: {'min_samples_split': 10, 'max_depth': None, 'criterion': 'gini'}
Best CV score: 0.9523809523809523
Test accuracy: 0.9333333333333333
Elapsed time (s): 0.04
```


```python
# 5. HalvingGridSearchCV (successive halving over grid)
halving_grid = HalvingGridSearchCV(
    estimator=DecisionTreeClassifier(random_state=0),
    param_grid=param_grid,
    factor=3,         # how aggressively to cut candidates each round
    cv=5,
    n_jobs=-1,
)
run_search("HalvingGridSearchCV", halving_grid)
```

```text
=== HalvingGridSearchCV ===
Best params: {'criterion': 'entropy', 'max_depth': 5, 'min_samples_split': 5}
Best CV score: 0.9555555555555555
Test accuracy: 0.8888888888888888
Elapsed time (s): 0.11
```


```python
# 6. HalvingRandomSearchCV (successive halving over random samples)
halving_rand = HalvingRandomSearchCV(
    estimator=DecisionTreeClassifier(random_state=0),
    param_distributions=param_grid,
    factor=3,
    cv=5,
    n_jobs=-1,
    random_state=0,
)
run_search("HalvingRandomSearchCV", halving_rand)
```

```text
=== HalvingRandomSearchCV ===
Best params: {'min_samples_split': 5, 'max_depth': None, 'criterion': 'entropy'}
Best CV score: 0.9444444444444443
Test accuracy: 0.8888888888888888
Elapsed time (s): 0.04
```


Notes:

- param_grid and param_distributions use the same values so that the comparison is fair.
- GridSearchCV tests all 24 combinations; RandomizedSearchCV tests only the n_iter=10 random ones.
- HalvingGridSearchCV and HalvingRandomSearchCV start with many candidates but give each one few resources (by default, training samples) at first, then keep only the best and allocate more resources in later iterations. On larger datasets this often finds good hyperparameters faster; on a dataset as small as Iris, the timings above show no consistent speed-up.