<h1> Hyper-parameter tuning </h1>

**Learning Objectives**
1. Understand various approaches to hyperparameter tuning
2. Automate hyperparameter tuning using CMLE HyperTune

In the previous notebook we achieved an RMSE of 4.79 using just two features after some feature engineering. Let's see if we can improve upon that by tuning our hyperparameters.

Hyperparameters are parameters that are set *prior* to training a model, as opposed to parameters which are learned *during* training. 

These include learning rate and batch size, but also model design parameters such as type of activation function and number of hidden units.
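
To make the distinction concrete, here is a minimal sketch in plain Python. It is not part of the taxi fare model: the toy data and the function name `fit_slope` are made up for illustration. The learning rate and number of steps are hyperparameters fixed before training, while the slope `w` is a parameter learned during training.

```python
def fit_slope(learning_rate, num_steps):
    """Fit y = w * x by gradient descent on mean squared error."""
    xs = [1.0, 2.0, 3.0, 4.0]
    ys = [2.0, 4.0, 6.0, 8.0]  # toy data generated with a true slope of 2
    w = 0.0  # parameter: learned during training
    for _ in range(num_steps):
        # Gradient of mean((w*x - y)^2) with respect to w
        grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
        w -= learning_rate * grad
    return w

# Same model and data, different hyperparameters
print(fit_slope(learning_rate=0.01, num_steps=100))  # ends up close to 2
print(fit_slope(learning_rate=0.2, num_steps=10))    # step size too large, moves away from 2
```

The second call shows why tuning matters: a poorly chosen learning rate prevents the parameter from converging at all.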

Here are the four most common ways to find the ideal hyperparameters:
1. Manual
2. Grid Search
3. Random Search
4. Bayesian Optimization

**1. Manual**

Traditionally, hyperparameter tuning is a manual trial-and-error process. A data scientist has some intuition about suitable hyperparameters, which they use as a starting point. They then observe the result and use that information to choose a new set of hyperparameters, trying to beat the existing performance.

Pros
- Educational, builds up your intuition as a data scientist
- Inexpensive because only one trial is conducted at a time

Cons
- Requires a lot of time and patience

**2. Grid Search**

At the other extreme, we can use Grid Search: define a discrete set of values to try for each hyperparameter, then try every possible combination.

Pros
- Can run hundreds of trials in parallel using the cloud
- Guaranteed to find the best solution within the search space

Cons
- Expensive
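
As a rough sketch of what "every possible combination" means, the snippet below enumerates a grid with `itertools.product`. The search space values and the helper name `grid_trials` are hypothetical and chosen only for illustration.

```python
import itertools

# Hypothetical search space: a few discrete values per hyperparameter
search_space = {
    'learning_rate': [0.001, 0.01, 0.1],
    'hidden_units_1': [32, 128, 512],
    'dropout': [0.0, 0.5],
}

def grid_trials(space):
    """Return one dict per combination of hyperparameter values."""
    names = sorted(space)
    return [
        dict(zip(names, values))
        for values in itertools.product(*(space[name] for name in names))
    ]

trials = grid_trials(search_space)
print(len(trials))  # 3 * 3 * 2 = 18 trials
print(trials[0])
```

The number of trials is the product of the number of values per hyperparameter, which is why the cost grows quickly as more hyperparameters are added.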

**3. Random Search**

Alternatively, define a range for each hyperparameter (e.g. 0-256) and sample uniformly at random from that range.

Pros
- Can run hundreds of trials in parallel using the cloud
- Requires fewer trials than Grid Search to find a good solution

Cons
- Expensive (but less so than Grid Search)
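
A minimal sketch of random search sampling, using only the standard library. The ranges and the helper name `random_trials` are assumptions made for illustration, and a fixed seed is used so the sketch is reproducible.

```python
import random

# Hypothetical integer ranges (inclusive) for two hyperparameters
search_ranges = {
    'hidden_units_1': (10, 512),
    'hidden_units_2': (10, 512),
}

def random_trials(ranges, num_trials, seed=0):
    """Sample each hyperparameter uniformly at random from its range."""
    rng = random.Random(seed)  # seeded so repeated runs give the same trials
    return [
        {name: rng.randint(low, high) for name, (low, high) in ranges.items()}
        for _ in range(num_trials)
    ]

trials = random_trials(search_ranges, num_trials=5)
for trial in trials:
    print(trial)
```

Unlike a grid, the number of trials here is chosen directly and does not depend on how many hyperparameters are being tuned.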

**4. Bayesian Optimization**

Unlike Grid Search and Random Search, Bayesian Optimization takes into account information from past trials to select parameters for future trials. The details of how this is done are beyond the scope of this notebook, but if you're interested you can read how it works [here](https://cloud.google.com/blog/products/gcp/hyperparameter-tuning-cloud-machine-learning-engine-using-bayesian-optimization).

Pros
- Picks values intelligently based on results from past trials
- Less expensive because requires fewer trials to get a good result

Cons
- Requires sequential trials for best results, takes longer

**CMLE HyperTune**

CMLE HyperTune, powered by [Google Vizier](https://ai.google/research/pubs/pub46180), uses Bayesian Optimization by default, but [also supports](https://cloud.google.com/ml-engine/docs/tensorflow/hyperparameter-tuning-overview#search_algorithms) Grid Search and Random Search. 


When tuning just a few hyperparameters (say fewer than 4), Grid Search and Random Search work well, but when tuning several hyperparameters over a large search space, Bayesian Optimization is best.

In [7]:
PROJECT='cloud-training-demos' # CHANGE TO YOUR PROJECT
BUCKET = PROJECT # CHANGE TO YOUR BUCKET 
REGION = 'us-central1' # optionally change, see https://cloud.google.com/ml-engine/docs/tensorflow/regions
TFVERSION = '1.12' # TF version for CMLE to use

## Move Code into Python Package

Let's move our updated code with feature engineering into a Python package so it's compatible with CMLE.

In [8]:
%%bash
mkdir -p taxifaremodel
touch taxifaremodel/__init__.py


## Model.py

Note that any hyperparameters we want to tune need to be exposed as command line arguments. These include the number of embedding dimensions, learning rate, dropout rate, and number of hidden units.

In [15]:
%%writefile taxifaremodel/model.py
import tensorflow as tf
import shutil
print(tf.__version__)

NEMBEDS = 3 # number of embedding dimensions, will be overwritten by task.py

#1. Train and Evaluate Input Functions
CSV_COLUMN_NAMES = ['fare_amount','dayofweek','hourofday','pickuplon','pickuplat','dropofflon','dropofflat','passengers']
CSV_DEFAULTS = [[0.0],[1],[0],[-74.0], [40.0], [-74.0], [40.7], [1]]

def read_dataset(csv_path):
    def _parse_row(row):
        # Decode the CSV row into list of TF tensors
        fields = tf.decode_csv(row, record_defaults=CSV_DEFAULTS)

        # Pack the result into a dictionary
        features = dict(zip(CSV_COLUMN_NAMES, fields))
        
        # NEW: Add engineered features
        features = add_engineered_features(features)
        
        # Separate the label from the features
        label = features.pop('fare_amount') # remove label from features and store

        return features, label
    
    # Create a dataset containing the text lines.
    dataset = tf.data.Dataset.list_files(csv_path) # (i.e. data_file_*.csv)
    dataset = dataset.flat_map(lambda filename:tf.data.TextLineDataset(filename).skip(1))

    # Parse each CSV row into correct (features,label) format for Estimator API
    dataset = dataset.map(_parse_row)
    
    return dataset

def train_input_fn(csv_path, batch_size=128):
    #1. Convert CSV into tf.data.Dataset  with (features,label) format
    dataset = read_dataset(csv_path)
      
    #2. Shuffle, repeat, and batch the examples.
    dataset = dataset.shuffle(1000).repeat().batch(batch_size)
   
    return dataset

def eval_input_fn(csv_path, batch_size=128):
    #1. Convert CSV into tf.data.Dataset  with (features,label) format
    dataset = read_dataset(csv_path)

    #2.Batch the examples.
    dataset = dataset.batch(batch_size)
   
    return dataset
  
#2. Feature Engineering
def add_engineered_features(features):
    latdiff = features['pickuplat'] - features['dropofflat']
    londiff = features['pickuplon'] - features['dropofflon']
    euclidean_dist = tf.sqrt(latdiff**2 + londiff**2)
    
    features['euclidean_dist'] = euclidean_dist
    return features

fc_distance = tf.feature_column.numeric_column('euclidean_dist')
fc_dayofweek = tf.feature_column.categorical_column_with_identity('dayofweek', num_buckets = 8)
fc_hourofday = tf.feature_column.categorical_column_with_identity('hourofday', num_buckets = 24)
fc_day_hr = tf.feature_column.crossed_column([fc_dayofweek, fc_hourofday], 24 * 7)
fc_day_hr_embedded = tf.feature_column.embedding_column(fc_day_hr, NEMBEDS)

feature_cols = [fc_distance,fc_day_hr_embedded]

#3. Serving Input Receiver Function
def serving_input_receiver_fn():
    receiver_tensors = {
        'dayofweek' : tf.placeholder(tf.int32, shape=[None]), # shape is vector to allow batch of requests
        'hourofday' : tf.placeholder(tf.int32, shape=[None]),
        'pickuplon' : tf.placeholder(tf.float32, shape=[None]), 
        'pickuplat' : tf.placeholder(tf.float32, shape=[None]),
        'dropofflat' : tf.placeholder(tf.float32, shape=[None]),
        'dropofflon' : tf.placeholder(tf.float32, shape=[None]),
        'passengers' : tf.placeholder(tf.int32, shape=[None]),
    }
    
    features = add_engineered_features(receiver_tensors) # 'features' is what is passed on to the model

    return tf.estimator.export.ServingInputReceiver(features, receiver_tensors)
  
#4. Train and Evaluate
def train_and_evaluate(params):
  OUTDIR = params['output_dir']

  model = tf.estimator.DNNRegressor(
    hidden_units = [params['hidden_units_1'],params['hidden_units_2']], # specify neural architecture
    feature_columns = feature_cols, 
    model_dir = OUTDIR,
    optimizer = tf.train.AdamOptimizer(params['learning_rate']), # NEW
    dropout = params['dropout'], # NEW
    config = tf.estimator.RunConfig(
          tf_random_seed=1, # for reproducibility
          save_checkpoints_steps=max(100,params['train_steps']//10) # checkpoint every N steps
    ) 
  )
  
  # Add custom evaluation metric
  def my_rmse(labels, predictions):
    pred_values = tf.squeeze(predictions['predictions'],axis=-1)
    return {'rmse': tf.metrics.root_mean_squared_error(labels, pred_values)}
  model = tf.contrib.estimator.add_metrics(model, my_rmse)  

  train_spec=tf.estimator.TrainSpec(
                     input_fn = lambda:train_input_fn(params['train_data_path']),
                     max_steps = params['train_steps'])

  exporter = tf.estimator.FinalExporter('exporter', serving_input_receiver_fn) # export SavedModel once at the end of training
  # Note: alternatively use tf.estimator.BestExporter to export at every checkpoint that has lower loss than the previous checkpoint


  eval_spec=tf.estimator.EvalSpec(
                     input_fn=lambda:eval_input_fn(params['eval_data_path']),
                     steps = None,
                     start_delay_secs=1, # wait at least N seconds before first evaluation (default 120)
                     throttle_secs=1, # wait at least N seconds before each subsequent evaluation (default 600)
                     exporters = exporter) # export SavedModel once at the end of training

  tf.logging.set_verbosity(tf.logging.INFO) # so loss is printed during training
  shutil.rmtree(OUTDIR, ignore_errors = True) # start fresh each time

  tf.estimator.train_and_evaluate(model, train_spec, eval_spec)

Overwriting taxifaremodel/model.py
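
The custom `rmse` metric added in `train_and_evaluate` is the value HyperTune will later be asked to minimize. As a plain-Python sketch of what it measures (the TensorFlow version is a streaming metric accumulated over evaluation batches; the helper below is only the underlying formula on made-up numbers):

```python
import math

def rmse(labels, predictions):
    """Root mean squared error between two equal-length sequences."""
    squared_errors = [(p - y) ** 2 for y, p in zip(labels, predictions)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Errors of +3 and -4 give sqrt((9 + 16) / 2)
print(rmse([10.0, 20.0], [13.0, 16.0]))
```

Because the errors are squared before averaging, large misses on individual fares are penalized more heavily than many small ones.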


## Task.py

When doing hyperparameter tuning we need to make sure the output directory is different for each run, otherwise successive runs will overwrite previous runs. 

One way to do this is to append the trial ID to the output path. Look for that code below.

In [16]:
%%writefile taxifaremodel/task.py
import argparse
import json
import os

from . import model

if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument(
        '--nembeds',
        help = 'Embedding dimensions for day_hr (default: 3)',
        type = int,
        default = 3
    )
    parser.add_argument(
        '--dropout',
        help = 'Fraction of units to drop in each hidden layer (default: 0.0)',
        type = float,
        default = 0.0
    )
    parser.add_argument(
        '--hidden_units_1',
        help = 'Units in first hidden layer of DNN (default: 10)',
        type = int,
        default = 10
    )
    parser.add_argument(
        '--hidden_units_2',
        help = 'Units in second hidden layer of DNN (default: 10)',
        type = int,
        default = 10
    )
    parser.add_argument(
        '--learning_rate',
        help = 'Learning rate for Adam optimizer (default: 0.1)',
        type = float,
        default = 0.1
    )
    parser.add_argument(
        '--train_data_path',
        help = 'GCS or local path to training data',
        required = True
    )
    parser.add_argument(
        '--train_steps',
        help = 'Steps to run the training job for (default: 1000)',
        type = int,
        default = 1000
    )
    parser.add_argument(
        '--eval_data_path',
        help = 'GCS or local path to evaluation data',
        required = True
    )
    parser.add_argument(
        '--output_dir',
        help = 'GCS location to write checkpoints and export models',
        required = True
    )
    parser.add_argument(
        '--job-dir',
        help='This is not used by our model, but it is required by gcloud',
    )
    args = parser.parse_args().__dict__
    
    # NEW: Append trial_id to path so trials don't overwrite each other
    # This code can be removed if you are not using hyperparameter tuning
    args['output_dir'] = os.path.join(
        args['output_dir'],
        json.loads(
            os.environ.get('TF_CONFIG', '{}')
        ).get('task', {}).get('trial', '')
    ) 
    
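    # NEW: set the module-level global in model.py, then rebuild the feature
    # columns that depend on it. They are created when model.py is imported,
    # so updating NEMBEDS alone would leave the embedding dimension at its default.
    model.NEMBEDS = args['nembeds']
    model.fc_day_hr_embedded = model.tf.feature_column.embedding_column(model.fc_day_hr, model.NEMBEDS)
    model.feature_cols = [model.fc_distance, model.fc_day_hr_embedded]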
    
    # Run the training job
    model.train_and_evaluate(args)

Overwriting taxifaremodel/task.py
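
To see what the trial ID logic in `task.py` does in isolation, here is a small sketch that takes the `TF_CONFIG` contents as a string instead of reading the environment. The helper name `trial_output_dir`, the bucket path, and the sample `TF_CONFIG` values are hypothetical; the sketch assumes, as `task.py` does, that the trial ID appears under `task.trial`.

```python
import json
import os

def trial_output_dir(base_dir, tf_config_json):
    """Append the trial ID found in TF_CONFIG (if any) to base_dir."""
    tf_config = json.loads(tf_config_json or '{}')
    trial_id = tf_config.get('task', {}).get('trial', '')
    return os.path.join(base_dir, trial_id)

# Hypothetical TF_CONFIG for a hyperparameter tuning trial
with_trial = '{"task": {"type": "master", "index": 0, "trial": "7"}}'
print(trial_output_dir('gs://my-bucket/out', with_trial))

# Without hyperparameter tuning there is no trial key, so the base path is kept
print(trial_output_dir('gs://my-bucket/out', '{}'))
```

Falling back to an empty string means the same `task.py` works both for tuning jobs and for ordinary training jobs.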


## Create HyperTune configuration 

We specify:
1. How many trials to run (`maxTrials`) and how many of those trials can be run in parallel (`maxParallelTrials`) 
2. Which metric to optimize (`hyperparameterMetricTag`)
3. The search region in which to constrain the hyperparameter search

Full specification [here](https://cloud.google.com/ml-engine/reference/rest/v1/projects.jobs#HyperparameterSpec).

In [20]:
%%writefile hyperparam.yaml
trainingInput:
  scaleTier: STANDARD_1
  hyperparameters:
    goal: MINIMIZE
    maxTrials: 30
    maxParallelTrials: 5
    hyperparameterMetricTag: rmse
    enableTrialEarlyStopping: True
    params:
    - parameterName: learning_rate
      type: DOUBLE
      minValue: .00001
      maxValue: 1
      scaleType: UNIT_LOG_SCALE
    - parameterName: dropout
      type: DOUBLE
      minValue: 0.0
      maxValue: 0.9
      scaleType: UNIT_LINEAR_SCALE
    - parameterName: hidden_units_1
      type: INTEGER
      minValue: 10
      maxValue: 512
      scaleType: UNIT_LINEAR_SCALE
    - parameterName: hidden_units_2
      type: INTEGER
      minValue: 10
      maxValue: 512
      scaleType: UNIT_LINEAR_SCALE  
    - parameterName: nembeds
      type: INTEGER
      minValue: 2
      maxValue: 24
      scaleType: UNIT_LINEAR_SCALE

Overwriting hyperparam.yaml
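
The configuration uses `UNIT_LOG_SCALE` for `learning_rate` and `UNIT_LINEAR_SCALE` for the other parameters. The sketch below is only an intuition aid for why a log scale suits a range spanning several orders of magnitude. It is not how HyperTune selects values internally, and the helper names are made up for illustration.

```python
import math
import random

def sample_linear(rng, low, high):
    """Uniform over the raw range."""
    return rng.uniform(low, high)

def sample_log(rng, low, high):
    """Uniform over the exponent, so each order of magnitude is equally likely."""
    return 10 ** rng.uniform(math.log10(low), math.log10(high))

rng = random.Random(0)
linear = [sample_linear(rng, 1e-5, 1.0) for _ in range(1000)]
logs = [sample_log(rng, 1e-5, 1.0) for _ in range(1000)]

# Share of samples below 0.01 under each scheme
share_small_linear = sum(v < 0.01 for v in linear) / len(linear)
share_small_log = sum(v < 0.01 for v in logs) / len(logs)
print(share_small_linear, share_small_log)
```

On a linear scale almost every sample lands between 0.01 and 1, so small learning rates are barely explored. On a log scale, a majority of samples fall below 0.01.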


<h1> Run the training job </h1>

Same as before, with the addition of `--config=hyperparam.yaml` to reference the file we just created.

**This will take 1 hour.** Go to [cloud console](https://console.cloud.google.com/mlengine/jobs) and click on the job id. As trials are completed, the chosen hyperparameters and resulting objective value (RMSE in this case) will be shown. Trials will be sorted from best to worst. 

In [21]:
OUTDIR='gs://{}/taxifare/trained_hp_tune'.format(BUCKET)
!gsutil -m rm -rf {OUTDIR} # start fresh each time
!gcloud ml-engine jobs submit training taxifare_$(date -u +%y%m%d_%H%M%S) \
   --package-path=taxifaremodel \
   --module-name=taxifaremodel.task \
   --config=hyperparam.yaml \
   --job-dir=gs://{BUCKET}/taxifare \
   --python-version=3.5 \
   --runtime-version={TFVERSION} \
   --region={REGION} \
   -- \
   --train_data_path=gs://{BUCKET}/taxifare/smallinput/taxi-train.csv \
   --eval_data_path=gs://{BUCKET}/taxifare/smallinput/taxi-valid.csv  \
   --train_steps=5000 \
   --output_dir={OUTDIR}

CommandException: 1 files/objects could not be removed.
Job [taxifare_190107_232158] submitted successfully.
Your job is still active. You may view the status of your job with the command

  $ gcloud ml-engine jobs describe taxifare_190107_232158

or continue streaming the logs with the command

  $ gcloud ml-engine jobs stream-logs taxifare_190107_232158
jobId: taxifare_190107_232158
state: QUEUED


## Results

- objectiveValue: 4.185
- learning_rate: 0.029
- dropout: 
- hidden_units_1: 26
- hidden_units_2: 256
- nembeds: 5
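
If you prefer to pick the best trial programmatically rather than reading it off the console, a sketch like the following could work on the job description once it has been parsed into a Python dict. It assumes the description exposes trials under `trainingOutput.trials`, each with `trialId`, `hyperparameters`, and `finalMetric.objectiveValue`. The sample values below are invented for illustration and are not results from this notebook.

```python
def best_trial(job, goal='MINIMIZE'):
    """Return the trial with the best objective value among trials that reported one."""
    trials = [t for t in job['trainingOutput']['trials'] if 'finalMetric' in t]
    objective = lambda t: t['finalMetric']['objectiveValue']
    return min(trials, key=objective) if goal == 'MINIMIZE' else max(trials, key=objective)

# Hypothetical job description with made-up values
sample_job = {
    'trainingOutput': {
        'trials': [
            {'trialId': '1',
             'hyperparameters': {'learning_rate': '0.5', 'nembeds': '3'},
             'finalMetric': {'trainingStep': '5000', 'objectiveValue': 9.1}},
            {'trialId': '2',
             'hyperparameters': {'learning_rate': '0.01', 'nembeds': '8'},
             'finalMetric': {'trainingStep': '5000', 'objectiveValue': 5.2}},
            {'trialId': '3',
             'hyperparameters': {'learning_rate': '0.9', 'nembeds': '2'}},  # no metric reported
        ]
    }
}

best = best_trial(sample_job)
print(best['trialId'], best['hyperparameters'], best['finalMetric']['objectiveValue'])
```

Trials without a reported metric are skipped so that an incomplete trial cannot be chosen by accident.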



Now that we have our ideal hyperparameters, let's run on our larger dataset to see if it helps. 

Note that the hyperparameter values are passed via the command line.

In [None]:
OUTDIR='gs://{}/taxifare/trained_large_tuned'.format(BUCKET)
!gsutil -m rm -rf {OUTDIR} # start fresh each time
!gcloud ml-engine jobs submit training taxifare_large_$(date -u +%y%m%d_%H%M%S) \
   --package-path=taxifaremodel \
   --module-name=taxifaremodel.task \
   --job-dir=gs://{BUCKET}/taxifare \
   --python-version=3.5 \
   --runtime-version=1.12 \
   --region={REGION} \
   --scale-tier=STANDARD_1 \
   -- \
   --train_data_path=gs://cloud-training-demos/taxifare/large/taxi-train*.csv \
   --eval_data_path=gs://cloud-training-demos/taxifare/small/taxi-valid.csv  \
   --train_steps=50000 \
   --output_dir={OUTDIR} \
   --learning_rate=?? \
   --dropout=?? \
   --hidden_units_1=?? \
   --hidden_units_2=?? \
   --nembeds=??
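
One way to avoid typing the flags by hand is to format them from a dictionary. This is only a sketch: the helper name `to_flags` is made up, and the values shown are placeholders rather than the tuned values, which you should fill in from your own results.

```python
def to_flags(hyperparameters):
    """Format a dict of hyperparameters as --name=value command line flags."""
    return ['--{}={}'.format(name, value)
            for name, value in sorted(hyperparameters.items())]

# Placeholder values for illustration only; substitute your tuned values
chosen = {'learning_rate': 0.01, 'hidden_units_1': 64}
print(' '.join(to_flags(chosen)))
```

The flag names must match the arguments defined in `task.py`, since that is where they are parsed.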

## Analysis

Our final RMSE, after feature engineering, hyperparameter tuning, and running on a large dataset is **??**. 

**Challenge Exercise**

Try to beat this! Some tips:

- We only used two features in our model. Try adding more.
- There are more hyperparameters we could tune, such as the number of hidden layers.

Copyright 2016 Google Inc. Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License