## Decision Tree Example

Data Source: sample_data/diabetes.csv<br/>

Train a decision tree classifier to predict diabetes cases.<br/>
Use `RandomizedSearchCV` to sample hyperparameter settings for a `DecisionTreeClassifier` and keep the setting with the best cross-validated score.<br/>

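`RandomizedSearchCV` does not try every combination. It draws a fixed number of candidate settings from the search space, and scikit-learn's `ParameterSampler` is the helper it uses for that draw. The sketch below reuses the same `param_dist` as the notebook and prints the candidates. The fixed `random_state=0` is an assumption added only so the draw is repeatable.

```python
from scipy.stats import randint
from sklearn.model_selection import ParameterSampler

# Same search space as the notebook: lists are sampled uniformly,
# scipy distributions are sampled through their .rvs() method
param_dist = {"max_depth": [3, None],
              "max_features": randint(1, 9),      # integers 1..8 (upper bound exclusive)
              "min_samples_leaf": randint(1, 9),
              "criterion": ["gini", "entropy"]}

# Draw 10 candidates, matching RandomizedSearchCV's default n_iter=10
candidates = list(ParameterSampler(param_dist, n_iter=10, random_state=0))

for params in candidates:
    print(params)
```

Each printed dictionary is one setting that the search would score with cross-validation.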
```python
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
```

```python
df = pd.read_csv('sample_data/diabetes.csv')

# Features: every column except the target; target: the 'diabetes' column
X = df.drop(['diabetes'], axis=1).values
y = df['diabetes'].values
```

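If `sample_data/diabetes.csv` is not at hand, a synthetic stand-in with the same layout lets the remaining cells run. This is a sketch under assumptions: the 8 feature columns are inferred from `max_features=randint(1, 9)` below, and the column names, row count, and class weights are invented placeholders, not properties of the real file.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_classification

# Hypothetical stand-in for sample_data/diabetes.csv: 8 numeric features plus a 0/1 label
X_syn, y_syn = make_classification(n_samples=500, n_features=8, n_informative=4,
                                   weights=[0.65], random_state=0)
feature_names = [f"feature_{i}" for i in range(X_syn.shape[1])]  # placeholder names
df = pd.DataFrame(X_syn, columns=feature_names)
df["diabetes"] = y_syn

# Same feature/target split as the notebook
X = df.drop(['diabetes'], axis=1).values
y = df['diabetes'].values

print(X.shape, y.shape)          # (500, 8) (500,)
print(np.bincount(y) / len(y))   # proportion of each class
```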
### RandomizedSearchCV (cheaper than an exhaustive GridSearchCV)

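The saving comes from the number of model fits. `GridSearchCV` fits every combination on every fold, while `RandomizedSearchCV` fits only `n_iter` sampled settings (10 by default). The sketch below counts both for a discrete version of this notebook's search space with `cv=5`; both searches additionally refit the best setting once on the full data when `refit=True` (the default).

```python
from sklearn.model_selection import ParameterGrid

# Discrete equivalent of param_dist: randint(1, 9) covers the integers 1..8
grid = {"max_depth": [3, None],
        "max_features": list(range(1, 9)),
        "min_samples_leaf": list(range(1, 9)),
        "criterion": ["gini", "entropy"]}

n_folds = 5
n_grid_candidates = len(ParameterGrid(grid))   # 2 * 8 * 8 * 2 = 256
n_random_candidates = 10                       # RandomizedSearchCV's default n_iter

grid_fits = n_grid_candidates * n_folds        # 1280
random_fits = n_random_candidates * n_folds    # 50
print(f"GridSearchCV:       {n_grid_candidates} candidates x {n_folds} folds = {grid_fits} fits")
print(f"RandomizedSearchCV: {n_random_candidates} candidates x {n_folds} folds = {random_fits} fits")
```

The trade-off is coverage: ten random draws may miss the best of the 256 combinations, which is why `n_iter` is worth raising when fits are cheap.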
```python
from scipy.stats import randint
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import RandomizedSearchCV

# Set up the parameters and distributions to sample from: param_dist
# (randint(1, 9) draws integers from 1 to 8; the upper bound is exclusive)
param_dist = {"max_depth": [3, None],
              "max_features": randint(1, 9),
              "min_samples_leaf": randint(1, 9),
              "criterion": ["gini", "entropy"]}

# Instantiate a Decision Tree classifier: tree
tree = DecisionTreeClassifier()

# Instantiate the RandomizedSearchCV object: tree_cv
# (samples n_iter=10 settings by default, each scored with 5-fold cross-validation)
tree_cv = RandomizedSearchCV(tree, param_dist, cv=5)

# Fit it to the data
tree_cv.fit(X, y)

# Print the tuned parameters and score
print("Tuned Decision Tree Parameters: {}".format(tree_cv.best_params_))
print("Best score is {}".format(tree_cv.best_score_))
```


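```text
Tuned Decision Tree Parameters: {'criterion': 'gini', 'max_depth': 3, 'max_features': 6, 'min_samples_leaf': 6}
Best score is 0.7434895833333334
```

`best_score_` is the mean accuracy over the 5 cross-validation folds for the best sampled setting (accuracy is the default score for scikit-learn classifiers). Because `RandomizedSearchCV` is created without a `random_state`, rerunning the cell can sample different candidates and report different values.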
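To judge whether a cross-validated accuracy of about 0.74 is good, it helps to compare it with a classifier that always predicts the majority class. The sketch below uses scikit-learn's `DummyClassifier` with the same 5-fold setup. It runs on synthetic stand-in data (an assumption, so the block is self-contained); replace `X, y` with the notebook's arrays to get the baseline for the real file.

```python
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.model_selection import cross_val_score

# Synthetic stand-in data (assumption); substitute the notebook's X and y
X, y = make_classification(n_samples=500, n_features=8, weights=[0.65], random_state=0)

# Always predicts the most frequent class seen in each training fold
baseline = DummyClassifier(strategy="most_frequent")

# Same 5-fold CV and the same default metric (accuracy) as the search above
baseline_scores = cross_val_score(baseline, X, y, cv=5)
print("Majority-class baseline accuracy: {:.3f}".format(baseline_scores.mean()))
```

The tuned tree is only useful to the extent that its score clears this baseline.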

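The search above is fitted on every row, so `best_score_` is measured on the same data used to choose the hyperparameters and can be slightly optimistic. A minimal sketch of a reproducible variant that holds out a stratified test set and scores the refitted `best_estimator_` on it follows. The synthetic data, the 20% split, and `random_state=42` are assumptions for illustration, not the notebook's settings.

```python
from scipy.stats import randint
from sklearn.datasets import make_classification
from sklearn.model_selection import RandomizedSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (assumption); substitute the notebook's X and y
X, y = make_classification(n_samples=500, n_features=8, random_state=0)

# Hold out 20% of rows, preserving the class ratio, for a final unbiased check
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

param_dist = {"max_depth": [3, None],
              "max_features": randint(1, 9),
              "min_samples_leaf": randint(1, 9),
              "criterion": ["gini", "entropy"]}

# random_state fixes both the sampled candidates and the trees' own randomness
tree_cv = RandomizedSearchCV(DecisionTreeClassifier(random_state=42), param_dist,
                             n_iter=10, cv=5, random_state=42)
tree_cv.fit(X_train, y_train)

# With refit=True (the default), best_estimator_ is retrained on all of X_train
test_acc = tree_cv.best_estimator_.score(X_test, y_test)
print("Best CV score:  {:.3f}".format(tree_cv.best_score_))
print("Held-out score: {:.3f}".format(test_acc))
```

A held-out score close to the CV score suggests the selected setting generalizes; a large drop points to overfitting the search to the training rows.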