### CellStrat Hub Pack - Deep Learning

#### DL10 - Gradient Boosted Decision Tree (GBDT)

Implement a Gradient Boosted Decision Tree (GBDT) model with TensorFlow. This example uses the Boston Housing dataset as training samples. It supports both classification (2 classes: median home value >= $23,000 or not) and regression (raw home value as the target).
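To make the idea behind boosting concrete before turning to the TensorFlow Estimator API, here is a minimal from-scratch sketch in plain Python. It is an illustration under simplifying assumptions, not how `tf.estimator.BoostedTreesClassifier` or `BoostedTreesRegressor` work internally: it uses squared loss (the regression case), depth-1 trees ("stumps"), and hypothetical helper names `fit_stump` and `gradient_boost`. Each round fits a stump to the current residuals, which are the negative gradients of squared loss, and adds it to the ensemble scaled by the learning rate.

```python
def fit_stump(xs, residuals):
    """Fit a depth-1 regression tree (stump) to the residuals by minimizing SSE."""
    best = None
    for j in range(len(xs[0])):
        for threshold in sorted({row[j] for row in xs}):
            left = [r for row, r in zip(xs, residuals) if row[j] <= threshold]
            right = [r for row, r in zip(xs, residuals) if row[j] > threshold]
            if not left or not right:
                continue
            left_val = sum(left) / len(left)
            right_val = sum(right) / len(right)
            sse = (sum((r - left_val) ** 2 for r in left)
                   + sum((r - right_val) ** 2 for r in right))
            if best is None or sse < best[0]:
                best = (sse, j, threshold, left_val, right_val)
    if best is None:
        # No valid split (every feature is constant): use a single constant leaf.
        mean = sum(residuals) / len(residuals)
        return lambda row: mean
    _, j, threshold, left_val, right_val = best
    return lambda row: left_val if row[j] <= threshold else right_val


def gradient_boost(xs, ys, n_trees=10, learning_rate=0.5):
    """Additive model F(x) = sum_k learning_rate * tree_k(x)."""
    trees = []

    def predict(row):
        return sum(learning_rate * tree(row) for tree in trees)

    for _ in range(n_trees):
        # For squared loss, the negative gradient is the residual y - F(x).
        residuals = [y - predict(row) for row, y in zip(xs, ys)]
        trees.append(fit_stump(xs, residuals))
    return predict


xs = [[1.0], [2.0], [3.0], [4.0]]
ys = [1.0, 1.0, 3.0, 3.0]
model = gradient_boost(xs, ys, n_trees=10, learning_rate=0.5)
print([round(model(row), 3) for row in xs])
```

On this toy data every round removes half of the remaining residual (learning_rate=0.5), so after 10 rounds the predictions are within about 0.1% of the targets.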


## Boston Housing Dataset

**Link:** https://www.cs.toronto.edu/~delve/data/boston/bostonDetail.html

**Description:**

The dataset contains information collected by the U.S. Census Service concerning housing in the area of Boston, Mass. It was obtained from the StatLib archive (http://lib.stat.cmu.edu/datasets/boston) and has been used extensively throughout the literature to benchmark algorithms. However, these comparisons were primarily done outside of Delve and are thus somewhat suspect. The dataset is small, with only 506 cases.

The data was originally published in Harrison, D. and Rubinfeld, D.L., 'Hedonic prices and the demand for clean air', *J. Environ. Economics & Management*, vol. 5, 81-102, 1978.

*For the full features list, please see the link above*

In [1]:
from __future__ import print_function

# Ignore all GPUs (current TF GBDT does not support GPU).
import os
os.environ["CUDA_VISIBLE_DEVICES"] = ""
os.environ['TF_CPP_MIN_LOG_LEVEL'] = "1"

import tensorflow as tf
import numpy as np
import copy

In [2]:
# Dataset parameters.
num_classes = 2 # Total classes: greater than or equal to $23,000, or not.
num_features = 13 # Number of features per sample.

# Training parameters.
max_steps = 800
batch_size = 256
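learning_rate = 1.0 # Shrinkage applied to each tree when it is added to the model.
l1_regul = 0.0 # L1 regularization on leaf weights.
l2_regul = 0.1 # L2 regularization on leaf weights.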

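# GBDT parameters.
num_batches_per_layer = 1000 # Batches whose statistics are accumulated before growing each tree layer.
num_trees = 10 # Number of trees in the ensemble.
max_depth = 4 # Maximum depth of each tree.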
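TF's documentation describes `l1_regularization` and `l2_regularization` as multipliers on the absolute and squared leaf weights, and `learning_rate` as the shrinkage applied when a tree is added. In the standard second-order formulation (popularized by XGBoost), a leaf's weight comes from the summed gradients G and Hessians H of the examples routed to it: w = -sign(G) * max(|G| - l1, 0) / (H + l2), then scaled by the learning rate. The sketch below assumes TF's BoostedTrees follows the same closed form; `leaf_weight` is a hypothetical helper.

```python
import math


def leaf_weight(grad_sum, hess_sum, l1=0.0, l2=0.1, learning_rate=1.0):
    """Regularized Newton step for one leaf (XGBoost-style closed form).

    Assumption: TF's BoostedTrees uses the same formula; illustrative only.
    """
    # L1 soft-thresholds the gradient sum: if |G| <= l1 the leaf weight is zero.
    shrunk = math.copysign(max(abs(grad_sum) - l1, 0.0), grad_sum)
    # L2 is added to the Hessian sum in the denominator, shrinking the weight.
    return -learning_rate * shrunk / (hess_sum + l2)


print(leaf_weight(-4.0, 2.0))          # uses the notebook's l2_regul = 0.1
print(leaf_weight(-4.0, 2.0, l1=5.0))  # |G| < l1, so the leaf is zeroed
```

Larger `l2_regul` values pull every leaf toward zero smoothly, while `l1_regul` can switch small leaves off entirely.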

In [3]:
# Prepare Boston Housing Dataset.
from tensorflow.keras.datasets import boston_housing
(x_train, y_train), (x_test, y_test) = boston_housing.load_data()

# For classification, build 2 classes: a median home value of at least $23,000
# (label 1) or below it (label 0). Targets are given in thousands of dollars.
def to_binary_class(y):
    # Vectorized threshold; returns a new array with the same dtype as the input.
    return (y >= 23.0).astype(y.dtype)

y_train_binary = to_binary_class(y_train)
y_test_binary = to_binary_class(y_test)

Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/boston_housing.npz
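The class threshold also determines the `accuracy_baseline` reported later during evaluation, i.e. the accuracy of always predicting the majority class. The plain-Python check below shows both; the prices are hypothetical toy values (in thousands of dollars), not taken from the dataset, and the helper names are illustrative.

```python
def to_binary(values, threshold=23.0):
    # Label 1 for a median value of at least $23,000, else 0.
    return [1.0 if v >= threshold else 0.0 for v in values]


def majority_baseline(labels):
    # Accuracy of always predicting the more frequent class.
    positive_rate = sum(labels) / len(labels)
    return max(positive_rate, 1.0 - positive_rate)


toy_prices = [10.0, 23.0, 30.5, 22.9, 50.0]  # hypothetical values
labels = to_binary(toy_prices)
print(labels, majority_baseline(labels))
```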


### GBDT Classifier

In [4]:
# Build the input function.
train_input_fn = tf.compat.v1.estimator.inputs.numpy_input_fn(
    x={'x': x_train}, y=y_train_binary,
    batch_size=batch_size, num_epochs=None, shuffle=True)
test_input_fn = tf.compat.v1.estimator.inputs.numpy_input_fn(
    x={'x': x_test}, y=y_test_binary,
    batch_size=batch_size, num_epochs=1, shuffle=False)
test_train_input_fn = tf.compat.v1.estimator.inputs.numpy_input_fn(
    x={'x': x_train}, y=y_train_binary,
    batch_size=batch_size, num_epochs=1, shuffle=False)
# GBDT models from TF Estimator require features described as feature columns (`tf.feature_column`).
feature_columns = [tf.feature_column.numeric_column(key='x', shape=(num_features,))]
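The three input functions differ only in `num_epochs` and `shuffle`: training repeats shuffled data indefinitely (`num_epochs=None`), while the evaluation functions make one ordered pass (`num_epochs=1`). The generator below is a rough plain-Python stand-in written to illustrate those two settings; it is not TF's implementation, and the exact batching and shuffling in `numpy_input_fn` may differ (for example, this version never mixes two epochs in one batch). With 404 training examples and `batch_size = 256`, one epoch yields 2 batches.

```python
import random


def numpy_style_batches(features, labels, batch_size, num_epochs=None,
                        shuffle=True, seed=0):
    """Hypothetical approximation of numpy_input_fn batching.

    num_epochs=None repeats the data forever (training);
    num_epochs=1 makes a single pass (evaluation).
    """
    rng = random.Random(seed)
    indices = list(range(len(labels)))
    epoch = 0
    while num_epochs is None or epoch < num_epochs:
        if shuffle:
            rng.shuffle(indices)
        for start in range(0, len(indices), batch_size):
            chunk = indices[start:start + batch_size]
            # Features are keyed by 'x', matching the feature column above.
            yield {'x': [features[i] for i in chunk]}, [labels[i] for i in chunk]
        epoch += 1


feats = [[float(i)] for i in range(404)]
labs = [0.0] * 404
one_pass = numpy_style_batches(feats, labs, 256, num_epochs=1, shuffle=False)
print([len(y) for _, y in one_pass])
```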




In [5]:
gbdt_classifier = tf.estimator.BoostedTreesClassifier(
    n_batches_per_layer=num_batches_per_layer,
    feature_columns=feature_columns, 
    n_classes=num_classes,
    learning_rate=learning_rate, 
    n_trees=num_trees,
    max_depth=max_depth,
    l1_regularization=l1_regul, 
    l2_regularization=l2_regul
)

INFO:tensorflow:Using default config.
INFO:tensorflow:Using config: {'_model_dir': '/tmp/tmpejokbny3', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': None, '_save_checkpoints_secs': 600, '_session_config': allow_soft_placement: true
graph_options {
  rewrite_options {
    meta_optimizer_iterations: ONE
  }
}
, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': 100, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_experimental_max_worker_delay_secs': None, '_session_creation_timeout_secs': 7200, '_checkpoint_save_graph_def': True, '_service': None, '_cluster_spec': ClusterSpec({}), '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1}
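For binary classification, a boosted ensemble produces a logit by summing the (learning-rate-scaled) outputs of its trees and maps it to a probability with a sigmoid. The sketch below assumes no separate bias term (`bias` defaults to 0), and `ensemble_probability` is a hypothetical helper. An ensemble with no trees yet outputs a logit of 0, i.e. a probability of 0.5 for every example.

```python
import math


def ensemble_probability(tree_outputs, bias=0.0):
    """Sigmoid of the summed per-tree outputs (already scaled by the learning rate)."""
    logit = bias + sum(tree_outputs)
    return 1.0 / (1.0 + math.exp(-logit))


print(ensemble_probability([]))                  # empty ensemble
print(ensemble_probability([0.75, -0.25, 0.5]))  # logit = 1.0
```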


In [6]:
gbdt_classifier.train(train_input_fn, max_steps=max_steps)

Instructions for updating:
Use Variable.read_value. Variables in 2.X are initialized automatically both in eager and graph (inside tf.defun) contexts.
Instructions for updating:
To construct input pipelines, use the `tf.data` module.
Instructions for updating:
To construct input pipelines, use the `tf.data` module.
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Create CheckpointSaverHook.
'_Resource' object has no attribute 'name'
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
Instructions for updating:
To construct input pipelines, use the `tf.data` module.


Exception ignored in: <bound method CapturableResource.__del__ of <tensorflow.python.ops.boosted_trees_ops.TreeEnsemble object at 0x7fa3cb9456a0>>
Traceback (most recent call last):
  File "/home/ec2-user/anaconda3/envs/python3/lib/python3.6/site-packages/tensorflow/python/training/tracking/tracking.py", line 269, in __del__
    with self._destruction_context():
AttributeError: 'TreeEnsemble' object has no attribute '_destruction_context'


'_Resource' object has no attribute 'name'
INFO:tensorflow:Calling checkpoint listeners before saving checkpoint 0...
INFO:tensorflow:Saving checkpoints for 0 into /tmp/tmpejokbny3/model.ckpt.
'_Resource' object has no attribute 'name'
INFO:tensorflow:Calling checkpoint listeners after saving checkpoint 0...
INFO:tensorflow:loss = 0.6931473, step = 0
INFO:tensorflow:loss = 0.6931473, step = 0 (0.398 sec)
INFO:tensorflow:loss = 0.6931473, step = 0 (0.182 sec)
INFO:tensorflow:loss = 0.6931473, step = 0 (0.184 sec)
INFO:tensorflow:loss = 0.6931473, step = 0 (0.181 sec)
INFO:tensorflow:loss = 0.6931473, step = 0 (0.179 sec)
INFO:tensorflow:loss = 0.6931473, step = 0 (0.188 sec)
INFO:tensorflow:loss = 0.6931473, step = 0 (0.180 sec)
INFO:tensorflow:loss = 0.6931473, step = 0 (0.182 sec)
INFO:tensorflow:loss = 0.6931473, step = 0 (0.183 sec)
INFO:tensorflow:loss = 0.6931473, step = 0 (0.177 sec)
INFO:tensorflow:global_step/sec: 43.2957
INFO:tensorflow:loss = 0.6931473, step = 100 (0.280 sec)

<tensorflow_estimator.python.estimator.canned.boosted_trees.BoostedTreesClassifier at 0x7fa3c6504780>
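The training log reports `loss = 0.6931473` at every logged step. That value is the binary cross-entropy of predicting probability 0.5, which equals ln 2 whatever the label is; the small check below (plain Python, hypothetical helper `log_loss`) reproduces it.

```python
import math


def log_loss(y, p):
    # Binary cross-entropy for a 0/1 label y and predicted probability p.
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))


print(log_loss(1.0, 0.5), log_loss(0.0, 0.5), math.log(2))
```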

In [7]:
gbdt_classifier.evaluate(test_train_input_fn)

INFO:tensorflow:Calling model_fn.
Instructions for updating:
The value of AUC returned by this may race with the update so this is deprecated. Please use tf.keras.metrics.AUC instead.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Starting evaluation at 2021-05-24T09:45:58
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Restoring parameters from /tmp/tmpejokbny3/model.ckpt-800
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.


INFO:tensorflow:Inference Time : 0.48577s
INFO:tensorflow:Finished evaluation at 2021-05-24-09:45:58
INFO:tensorflow:Saving dict for global step 800: accuracy = 0.6311881, accuracy_baseline = 0.63118815, auc = 0.5, auc_precision_recall = 0.6844059, average_loss = 0.69314724, global_step = 800, label/mean = 0.36881188, loss = 0.69314724, precision = 0.0, prediction/mean = 0.5, recall = 0.0
'_Resource' object has no attribute 'name'
INFO:tensorflow:Saving 'checkpoint_path' summary for global step 800: /tmp/tmpejokbny3/model.ckpt-800


{'accuracy': 0.6311881,
 'accuracy_baseline': 0.63118815,
 'auc': 0.5,
 'auc_precision_recall': 0.6844059,
 'average_loss': 0.69314724,
 'label/mean': 0.36881188,
 'loss': 0.69314724,
 'precision': 0.0,
 'prediction/mean': 0.5,
 'recall': 0.0,
 'global_step': 800}
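These metrics are exactly those of a classifier that assigns every example to class 0: accuracy equals `accuracy_baseline`, AUC is 0.5, and precision and recall are 0. The sketch below reproduces them, assuming Keras's default split of 404 training examples (so `label/mean = 0.36881188` corresponds to 149 positives) and assuming a probability of exactly 0.5 is mapped to class 0. `binary_metrics` is a hypothetical helper; reporting 0 when a ratio is undefined mirrors the values shown above.

```python
def binary_metrics(labels, probabilities, threshold=0.5):
    """Accuracy, precision, recall and label mean for 0/1 labels.

    Probabilities equal to the threshold are assigned to class 0 (assumption).
    """
    preds = [1 if p > threshold else 0 for p in probabilities]
    pairs = list(zip(labels, preds))
    tp = sum(1 for y, q in pairs if y == 1 and q == 1)
    fp = sum(1 for y, q in pairs if y == 0 and q == 1)
    fn = sum(1 for y, q in pairs if y == 1 and q == 0)
    correct = sum(1 for y, q in pairs if y == q)
    return {
        'accuracy': correct / len(pairs),
        # Report 0.0 when undefined (no predicted or no actual positives).
        'precision': tp / (tp + fp) if tp + fp else 0.0,
        'recall': tp / (tp + fn) if tp + fn else 0.0,
        'label/mean': sum(labels) / len(labels),
    }


# 149 positives out of 404 training examples (assumed split, see above).
train_labels = [1] * 149 + [0] * 255
constant = binary_metrics(train_labels, [0.5] * len(train_labels))
print(constant)
```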

In [8]:
gbdt_classifier.evaluate(test_input_fn)

INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Starting evaluation at 2021-05-24T09:45:59
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Restoring parameters from /tmp/tmpejokbny3/model.ckpt-800
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.


INFO:tensorflow:Inference Time : 0.46106s
INFO:tensorflow:Finished evaluation at 2021-05-24-09:45:59
INFO:tensorflow:Saving dict for global step 800: accuracy = 0.5588235, accuracy_baseline = 0.5588235, auc = 0.5, auc_precision_recall = 0.7205882, average_loss = 0.6931472, global_step = 800, label/mean = 0.44117647, loss = 0.6931472, precision = 0.0, prediction/mean = 0.5, recall = 0.0
INFO:tensorflow:Saving 'checkpoint_path' summary for global step 800: /tmp/tmpejokbny3/model.ckpt-800


{'accuracy': 0.5588235,
 'accuracy_baseline': 0.5588235,
 'auc': 0.5,
 'auc_precision_recall': 0.7205882,
 'average_loss': 0.6931472,
 'label/mean': 0.44117647,
 'loss': 0.6931472,
 'precision': 0.0,
 'prediction/mean': 0.5,
 'recall': 0.0,
 'global_step': 800}
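Both classifier evaluations report the majority-class baseline, AUC 0.5 and `prediction/mean = 0.5`, which suggests the ensemble contains no trees. One plausible explanation, based on the documented meaning of `n_batches_per_layer` (the number of batches used to collect statistics per layer, which the docs relate to the number of examples divided by the batch size), is that `num_batches_per_layer = 1000` exceeds `max_steps = 800`, so training stops before the first layer is grown. The sketch below assumes each training step consumes one batch; both helpers are hypothetical, and rounding up is a choice made here so every example is seen once per layer.

```python
import math


def suggested_batches_per_layer(num_examples, batch_size):
    # Roughly one pass over the training data per layer (rounded up).
    return max(1, math.ceil(num_examples / batch_size))


def batches_to_grow_ensemble(n_trees, max_depth, batches_per_layer):
    # Upper bound: every tree grows all max_depth layers.
    return n_trees * max_depth * batches_per_layer


current = batches_to_grow_ensemble(n_trees=10, max_depth=4, batches_per_layer=1000)
suggested = suggested_batches_per_layer(num_examples=404, batch_size=256)
print(current, suggested, batches_to_grow_ensemble(10, 4, suggested))
```

Under these assumptions, growing the full 10-tree, depth-4 ensemble would need up to 40,000 batches with the current setting, but only 80 with 2 batches per layer, well within `max_steps`.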

### GBDT Regressor

In [9]:
# Build the input function.
train_input_fn = tf.compat.v1.estimator.inputs.numpy_input_fn(
    x={'x': x_train}, y=y_train,
    batch_size=batch_size, num_epochs=None, shuffle=True)
test_input_fn = tf.compat.v1.estimator.inputs.numpy_input_fn(
    x={'x': x_test}, y=y_test,
    batch_size=batch_size, num_epochs=1, shuffle=False)
# GBDT models from TF Estimator require features described as feature columns (`tf.feature_column`).
feature_columns = [tf.feature_column.numeric_column(key='x', shape=(num_features,))]

In [10]:
gbdt_regressor = tf.estimator.BoostedTreesRegressor(
    n_batches_per_layer=num_batches_per_layer,
    feature_columns=feature_columns, 
    learning_rate=learning_rate, 
    n_trees=num_trees,
    max_depth=max_depth,
    l1_regularization=l1_regul, 
    l2_regularization=l2_regul
)

INFO:tensorflow:Using default config.
INFO:tensorflow:Using config: {'_model_dir': '/tmp/tmpcsgx6kt2', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': None, '_save_checkpoints_secs': 600, '_session_config': allow_soft_placement: true
graph_options {
  rewrite_options {
    meta_optimizer_iterations: ONE
  }
}
, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': 100, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_experimental_max_worker_delay_secs': None, '_session_creation_timeout_secs': 7200, '_checkpoint_save_graph_def': True, '_service': None, '_cluster_spec': ClusterSpec({}), '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1}


In [11]:
gbdt_regressor.train(train_input_fn, max_steps=max_steps)

INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Create CheckpointSaverHook.
'_Resource' object has no attribute 'name'
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.


'_Resource' object has no attribute 'name'
INFO:tensorflow:Calling checkpoint listeners before saving checkpoint 0...
INFO:tensorflow:Saving checkpoints for 0 into /tmp/tmpcsgx6kt2/model.ckpt.
'_Resource' object has no attribute 'name'
INFO:tensorflow:Calling checkpoint listeners after saving checkpoint 0...
INFO:tensorflow:loss = 571.1727, step = 0
INFO:tensorflow:loss = 545.48486, step = 0 (0.336 sec)
INFO:tensorflow:loss = 648.48956, step = 0 (0.178 sec)
INFO:tensorflow:loss = 574.75226, step = 0 (0.180 sec)
INFO:tensorflow:loss = 569.2014, step = 0 (0.200 sec)
INFO:tensorflow:loss = 569.6898, step = 0 (0.192 sec)
INFO:tensorflow:loss = 603.42944, step = 0 (0.176 sec)
INFO:tensorflow:loss = 514.92896, step = 0 (0.195 sec)
INFO:tensorflow:loss = 588.2447, step = 0 (0.206 sec)
INFO:tensorflow:loss = 574.8129, step = 0 (0.205 sec)
INFO:tensorflow:loss = 618.4296, step = 0 (0.184 sec)
INFO:tensorflow:global_step/sec: 43.1398
INFO:tensorflow:loss = 589.30444, step = 100 (0.270 sec)

<tensorflow_estimator.python.estimator.canned.boosted_trees.BoostedTreesRegressor at 0x7fa3b21d2860>

In [12]:
gbdt_regressor.evaluate(test_input_fn)

INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Starting evaluation at 2021-05-24T09:46:04
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Restoring parameters from /tmp/tmpcsgx6kt2/model.ckpt-800
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.


INFO:tensorflow:Inference Time : 0.19375s
INFO:tensorflow:Finished evaluation at 2021-05-24-09:46:04
INFO:tensorflow:Saving dict for global step 800: average_loss = 615.85785, global_step = 800, label/mean = 23.078432, loss = 615.85785, prediction/mean = 0.0
'_Resource' object has no attribute 'name'
INFO:tensorflow:Saving 'checkpoint_path' summary for global step 800: /tmp/tmpcsgx6kt2/model.ckpt-800


{'average_loss': 615.85785,
 'label/mean': 23.078432,
 'loss': 615.85785,
 'prediction/mean': 0.0,
 'global_step': 800}
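`prediction/mean = 0.0` suggests the regressor's ensemble is also empty, so every prediction would be 0. Under that assumption the reported squared-error loss is the mean of y², and since mean(y²) = var(y) + mean(y)², the two reported numbers imply the test-set variance, i.e. the loss a model predicting the constant test-set mean would reach. The arithmetic below uses only values from the evaluation output above.

```python
# Values reported by the evaluation above (test set, global step 800).
average_loss = 615.85785
label_mean = 23.078432

# If every prediction is 0, MSE = mean(y**2) = var(y) + mean(y)**2.
implied_variance = average_loss - label_mean ** 2
implied_std = implied_variance ** 0.5
print(round(implied_variance, 2), round(implied_std, 2))
```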