<h1> Hyperparameter tuning </h1>

This notebook is Lab 4b of CPB 102, Google's course on machine learning using Cloud ML.

This notebook builds on Lab 4a, adding hyperparameter tuning to the feature engineering done in that lab.  To save time, we will start from the preprocessed output of Lab 4a.

In [5]:
import google.cloud.ml as ml
import tensorflow as tf
print tf.__version__
print ml.sdk_location

0.11.0rc0
gs://cloud-ml/sdk/cloudml-0.1.6-alpha.dataflow.tar.gz


<h2> Environment variables for project and bucket </h2>

Change the cell below to reflect your Project ID and bucket name. See Lab 3a for setup instructions.

In [2]:
import os
PROJECT = 'cloud-training-demos'    # CHANGE THIS
BUCKET = 'cloud-training-demos-ml'  # CHANGE THIS

os.environ['PROJECT'] = PROJECT # for bash
os.environ['BUCKET'] = BUCKET # for bash

<h1> Retrieving preprocessed data </h1>

To save time, we start by copying my Lab 4a results, which were produced on a 10-million-row dataset (in Lab 4a, you ran the preprocessing on just 20,000 records).

Tuning is carried out over a segment of the training data (you should not use the validation data for this).
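
The copy step in this section picks its segment with two shard-name globs, `features_train-0000*` and `features_train-0002*`. The sketch below shows which shard files those globs would match. It assumes 301 training shards numbered 00000 to 00300, which is inferred from the `-of-00301` suffix in the file names; the helper `select_shards` is hypothetical and is not part of the lab code.

```python
import fnmatch

# Assumption: 301 training shards, numbered 00000..00300, named like the
# files that appear in this lab's gsutil output.
NUM_SHARDS = 301
SHARDS = ['features_train-%05d-of-%05d.tfrecord.gz' % (i, NUM_SHARDS)
          for i in range(NUM_SHARDS)]


def select_shards(pattern, names=SHARDS):
    """Return the shard names that match a shell-style glob pattern."""
    return [name for name in names if fnmatch.fnmatchcase(name, pattern)]


# The two globs used by the copy loop in this lab.
tuning_train = select_shards('features_train-0000*')  # shards 00000..00009
tuning_eval = select_shards('features_train-0002*')   # shards 00020..00029

print('train shards: %d, eval shards: %d, total: %d'
      % (len(tuning_train), len(tuning_eval), len(SHARDS)))
```

Under that assumption, each glob selects ten shards and the two sets do not overlap, so the tuning job evaluates on training shards that it never trains on.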

In [4]:
%bash
SOURCE=gs://cloud-training-demos-ml/taxifare/taxi_preproc4a_full
gsutil -m rm -rf gs://$BUCKET/taxifare/taxi_preproc4b/
gsutil cp $SOURCE/metadata.yaml gs://$BUCKET/taxifare/taxi_preproc4b/metadata.yaml
for file in features_train-0000* features_train-0002*; do
    gsutil -m cp $SOURCE/$file gs://$BUCKET/taxifare/taxi_preproc4b/
done

CommandException: 1 files/objects could not be removed.
Copying gs://cloud-training-demos-ml/taxifare/taxi_preproc4a_full/metadata.yaml [Content-Type=text/plain]...
/ [1 files][  2.4 KiB/  2.4 KiB]
Operation completed over 1 objects/2.4 KiB.
Copying gs://cloud-training-demos-ml/taxifare/taxi_preproc4a_full/features_train-00005-of-00301.tfrecord.gz...
/ [0/10 files][    0.0 B/446.9 MiB]   0% Done
Copying gs://cloud-training-demos-ml/taxifare/taxi_preproc4a_full/features_train-00006-of-00301.tfrecord.gz...
/ [0/10 files][    0.0 B/446.9 MiB]   0% Done
Copying gs://cloud-training-demos-ml/taxifare/taxi_preproc4a_full/features_train-00008-of-00301.tfrecord.gz...
Copying gs://cloud-training-demos-ml/taxifare/taxi_preproc4a_full/features_train-00007-of-00301

<h2> Modify TensorFlow code </h2>

We want the number of buckets and the number of hidden nodes to be tunable parameters.
To do this, the trainer has to read them from the command line.

The grep below shows all the code that now references the number_buckets hyperparameter.

In [5]:
%bash
grep -3 number_buckets taxifare/trainer/*.py

taxifare/trainer/task.py-  parser.add_argument('--metadata_path', type=str)
taxifare/trainer/task.py-  parser.add_argument('--output_path', type=str)
taxifare/trainer/task.py-  parser.add_argument('--max_steps', type=int, default=2000)
taxifare/trainer/task.py:  parser.add_argument('--number_buckets', type=int, default=5)
taxifare/trainer/task.py-  parser.add_argument('--hidden_layer1_size', type=int, default=256)
taxifare/trainer/task.py-  parser.add_argument('--batch_size', type=int, default=128)
taxifare/trainer/task.py-  parser.add_argument('--learning_rate', type=float, default=0.01)
--
taxifare/trainer/task.py-  HYPERPARAMS['hidden_layer1_size'] = args.hidden_layer1_size
taxifare/trainer/task.py-  HYPERPARAMS['hidden_layer2_size'] = args.hidden_layer1_size / 2
taxifare/trainer/task.py-  HYPERPARAMS['hidden_layer3_size'] = args.hidden_layer1_size / 4
taxifare/trainer/task.py:  HYPERPARAMS['number_buckets'] = args.number_buckets
taxifare/trainer/task.py-  
taxifare/trainer/task.py-
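
As a minimal, self-contained sketch of the pattern in the grep output, the function below parses the tunable flags with `argparse` and derives the second and third layer sizes from the first. The function name `parse_hyperparams` is hypothetical, and storing `batch_size` and `learning_rate` in the same dictionary is an assumption, since the grep output does not show where they go. The sketch uses `//` where `task.py` uses `/`; the two agree on Python 2, but only `//` keeps the sizes integral on Python 3.

```python
import argparse


def parse_hyperparams(argv):
    """Parse the tunable flags and derive the hidden-layer sizes."""
    parser = argparse.ArgumentParser()
    # Flag names and defaults are taken from the grep output of task.py.
    parser.add_argument('--number_buckets', type=int, default=5)
    parser.add_argument('--hidden_layer1_size', type=int, default=256)
    parser.add_argument('--batch_size', type=int, default=128)
    parser.add_argument('--learning_rate', type=float, default=0.01)
    args = parser.parse_args(argv)

    # `//` is integer division on both Python 2 and Python 3.
    return {
        'number_buckets': args.number_buckets,
        'hidden_layer1_size': args.hidden_layer1_size,
        'hidden_layer2_size': args.hidden_layer1_size // 2,
        'hidden_layer3_size': args.hidden_layer1_size // 4,
        # Assumption: these two are kept alongside the others.
        'batch_size': args.batch_size,
        'learning_rate': args.learning_rate,
    }


# Example: the values used for the final training run in this lab.
hp = parse_hyperparams(['--number_buckets=19', '--hidden_layer1_size=147'])
print(hp['hidden_layer1_size'], hp['hidden_layer2_size'], hp['hidden_layer3_size'])
```

With a first layer of 147 nodes, integer division gives 73 and 36 nodes for the second and third layers.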

We also have to add a summary metric named <b>training/hptuning/metric</b> to the TensorFlow graph. The hyperparameter tuning service reads this metric to compare trials.

In [6]:
%bash
grep -3 hptuning taxifare/trainer/task.py

      global_step = tf.Variable(0, name='global_step', trainable=False)

    tf.scalar_summary('rmse', rmse_op)
    tf.scalar_summary('training/hptuning/metric', rmse_op)
    summary = tf.merge_all_summaries() # make sure all scalar summaries are produced

    saver = tf.train.Saver()
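
The definition of `rmse_op` is not shown in the grep output. Assuming it computes the usual root-mean-squared error, the quantity being reported to the tuner can be sketched in plain Python as follows; the function `rmse` and the sample fares are illustrative only.

```python
import math


def rmse(predictions, targets):
    """Root-mean-squared error between two equal-length, non-empty sequences."""
    if not predictions or len(predictions) != len(targets):
        raise ValueError('need two non-empty sequences of the same length')
    squared_errors = [(p - t) ** 2 for p, t in zip(predictions, targets)]
    return math.sqrt(sum(squared_errors) / float(len(squared_errors)))


# Made-up fares in dollars, purely for illustration.
print(rmse([10.0, 12.0, 8.0], [11.0, 12.0, 6.0]))
```

Because a lower RMSE means a better fit, a tuning job that optimizes this metric should be configured to minimize it.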


<h2> Train once </h2>

Here, we package the code and run one ordinary training job on local copies of the data before starting the tuning job.

In [7]:
%bash
rm -rf taxifare.tar.gz taxi_trained
tar cvfz taxifare.tar.gz taxifare
gsutil cp taxifare.tar.gz gs://$BUCKET/taxifare/source4b/taxifare.tar.gz

taxifare/
taxifare/PKG-INFO
taxifare/setup.cfg
taxifare/setup.py
taxifare/trainer/
taxifare/trainer/__init__.py
taxifare/trainer/task.py
taxifare/trainer/taxifare.py
taxifare/trainer.egg-info/
taxifare/trainer.egg-info/dependency_links.txt
taxifare/trainer.egg-info/PKG-INFO
taxifare/trainer.egg-info/SOURCES.txt
taxifare/trainer.egg-info/top_level.txt


Copying file://taxifare.tar.gz [Content-Type=application/x-tar]...
/ [1 files][  7.2 KiB/  7.2 KiB]
Operation completed over 1 objects/7.2 KiB.


In [None]:
%bash
gsutil -m cp -R gs://$BUCKET/taxifare/taxi_preproc4b /content/training-data-analyst/CPB102/lab4b

In [None]:
%%mlalpha train
package_uris: /content/training-data-analyst/CPB102/lab4b/taxifare.tar.gz
python_module: trainer.task
scale_tier: BASIC
region: us-central1
args:
  train_data_paths: /content/training-data-analyst/CPB102/lab4b/taxi_preproc4b/features_train-0000*
  eval_data_paths:  /content/training-data-analyst/CPB102/lab4b/taxi_preproc4b/features_train-0002*
  metadata_path: /content/training-data-analyst/CPB102/lab4b/taxi_preproc4b/metadata.yaml
  output_path: /content/training-data-analyst/CPB102/lab4b/taxi_trained
  max_steps: 200
  hidden_layer1_size: 8
  number_buckets: 2
  learning_rate: 0.01
  batch_size: 128

In [1]:
%mlalpha summary --dir /content/training-data-analyst/CPB102/lab4b/taxi_trained/eval --name training/hptuning/metric accuracy --step

<h2> Hyperparameter tuning </h2>

Now we run the training again, this time on the cloud and with a hyperparameter search specification.

In [14]:
!gsutil -m -q rm -r gs://$BUCKET/taxifare/taxi_trained4b

CommandException: 1 files/objects could not be removed.


In [None]:
# Set up parameters for the mlalpha command.
package_uris = 'gs://' + BUCKET + '/taxifare/source4b/taxifare.tar.gz'
train_data_paths = 'gs://' + BUCKET + '/taxifare/taxi_preproc4b/features_train-0000*'
eval_data_paths = 'gs://' + BUCKET + '/taxifare/taxi_preproc4b/features_train-0002*'
metadata_path = 'gs://' + BUCKET + '/taxifare/taxi_preproc4b/metadata.yaml'
output_path = 'gs://' + BUCKET + '/taxifare/taxi_trained4b'

In [15]:
%%mlalpha train --cloud
package_uris: $package_uris
python_module: trainer.task
scale_tier: BASIC
region: us-central1
args:
  train_data_paths: $train_data_paths
  eval_data_paths: $eval_data_paths
  metadata_path: $metadata_path
  output_path: $output_path
  max_steps: 2500
hyperparameters:
  goal: MINIMIZE
  max_trials: 100
  max_parallel_trials: 3
  params:
    - parameter_name: hidden_layer1_size
      type: INTEGER
      min_value: 128
      max_value: 256
      scale_type: UNIT_LINEAR_SCALE  
    - parameter_name: number_buckets
      type: INTEGER
      min_value: 10
      max_value: 25
      scale_type: UNIT_LINEAR_SCALE  
    - parameter_name: batch_size
      type: DISCRETE
      discrete_values: [128, 256, 512, 1024]  
    - parameter_name: learning_rate
      type: DOUBLE
      min_value: 0.001
      max_value: 0.1
      scale_type: UNIT_LOG_SCALE  
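
To make the search space above concrete, here is a rough sketch that draws random trials from it. This is plain random search and is only an illustration: the notebook does not say which search algorithm the service uses. The sketch also assumes that `UNIT_LOG_SCALE` means candidates are spread uniformly over the logarithm of the range, so small learning rates get as much coverage as large ones. The names `SEARCH_SPACE` and `sample_trial` are hypothetical.

```python
import math
import random

# The ranges mirror the `params` section of the tuning request.
SEARCH_SPACE = {
    'hidden_layer1_size': ('INTEGER', 128, 256),
    'number_buckets': ('INTEGER', 10, 25),
    'batch_size': ('DISCRETE', [128, 256, 512, 1024]),
    'learning_rate': ('DOUBLE_LOG', 0.001, 0.1),
}


def sample_trial(rng):
    """Draw one random set of hyperparameters from SEARCH_SPACE."""
    trial = {}
    for name, spec in sorted(SEARCH_SPACE.items()):
        kind = spec[0]
        if kind == 'INTEGER':
            # randint includes both end points.
            trial[name] = rng.randint(spec[1], spec[2])
        elif kind == 'DISCRETE':
            trial[name] = rng.choice(spec[1])
        else:
            # Log scale: sample the exponent uniformly, then clamp to the
            # range to guard against floating-point rounding at the edges.
            low, high = math.log10(spec[1]), math.log10(spec[2])
            value = 10 ** rng.uniform(low, high)
            trial[name] = min(max(value, spec[1]), spec[2])
    return trial


rng = random.Random(0)  # fixed seed so the draws are repeatable
trials = [sample_trial(rng) for _ in range(3)]
for trial in trials:
    print(trial)
```

Sampling the exponent rather than the value itself means that roughly half of the learning-rate draws fall between 0.001 and 0.01, instead of about a tenth of them under a linear scale.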

In [20]:
%mlalpha jobs --name  trainer_task_161008_000023
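
Since the goal of the tuning job is `MINIMIZE`, the trial to carry forward is the one with the lowest reported metric. A minimal sketch of that selection is shown below. The `(params, metric)` pair structure and the function `best_trial` are hypothetical and do not reflect the format the service returns, and the example numbers are invented placeholders, not results from this lab.

```python
def best_trial(trials, goal='MINIMIZE'):
    """Pick the (params, metric) pair whose metric is best for the goal."""
    if goal not in ('MINIMIZE', 'MAXIMIZE'):
        raise ValueError('goal must be MINIMIZE or MAXIMIZE')
    if not trials:
        raise ValueError('no trials to choose from')
    chooser = min if goal == 'MINIMIZE' else max
    return chooser(trials, key=lambda pair: pair[1])


# Placeholder trials: parameter values and metrics are invented.
example_trials = [
    ({'number_buckets': 10, 'learning_rate': 0.01}, 9.5),
    ({'number_buckets': 20, 'learning_rate': 0.05}, 8.7),
    ({'number_buckets': 25, 'learning_rate': 0.10}, 9.1),
]
params, metric = best_trial(example_trials)
print(params, metric)
```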

<h2> Final training </h2>

Use the best hyperparameter values found by the tuning job to retrain on the full dataset and create the final model.

In [1]:
!gsutil ls gs://$BUCKET/taxifare/taxi_preproc4a_full | head -5

gs://cloud-training-demos-ml/taxifare/taxi_preproc4a_full/features_eval-00000-of-00196.tfrecord.gz
gs://cloud-training-demos-ml/taxifare/taxi_preproc4a_full/features_eval-00001-of-00196.tfrecord.gz
gs://cloud-training-demos-ml/taxifare/taxi_preproc4a_full/features_eval-00002-of-00196.tfrecord.gz
gs://cloud-training-demos-ml/taxifare/taxi_preproc4a_full/features_eval-00003-of-00196.tfrecord.gz
gs://cloud-training-demos-ml/taxifare/taxi_preproc4a_full/features_eval-00004-of-00196.tfrecord.gz


In [21]:
!gsutil -m -q rm -r gs://$BUCKET/taxifare/taxi_trained4b_final



CommandException: 1 files/objects could not be removed.


In [None]:
# Set up parameters for the mlalpha command.
package_uris = 'gs://' + BUCKET + '/taxifare/source4b/taxifare.tar.gz'
train_data_paths = 'gs://' + BUCKET + '/taxifare/taxi_preproc4a_full/features_train-*'
eval_data_paths = 'gs://' + BUCKET + '/taxifare/taxi_preproc4a_full/features_eval-*'
metadata_path = 'gs://' + BUCKET + '/taxifare/taxi_preproc4a_full/metadata.yaml'
output_path = 'gs://' + BUCKET + '/taxifare/taxi_trained4b_final'

In [None]:
%%mlalpha train --cloud
package_uris:  $package_uris
python_module: trainer.task
scale_tier: BASIC
region: us-central1
args:
  train_data_paths: $train_data_paths
  eval_data_paths: $eval_data_paths
  metadata_path: $metadata_path
  output_path: $output_path
  max_steps: 2500
  hidden_layer1_size: 147
  number_buckets: 19
  learning_rate: 0.047
  batch_size: 512

Copyright 2016 Google Inc. Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.