<h1> Hyper-parameter tuning </h1>

This notebook will show you how to carry out hyper-parameter tuning.


---
Before you start, **make sure that you are logged in with your student account**. Otherwise you may incur Google Cloud charges for using this notebook. 

---


Also, remember to uncheck "Reset all runtimes before running" when executing the next cell.

Resetting the runtime deletes any files you have on the notebook file system.

![](https://i.imgur.com/9dgw0h0.png)


In [0]:
#@markdown Copy-paste your GCP Project ID in the following field:

PROJECT = "" #@param {type: "string"}

#@markdown Next, use Shift-Enter to run this cell and complete authentication.

try:
  from google.colab import auth
  auth.authenticate_user()
  print("AUTHENTICATED")
except Exception as e:  # for example, when not running inside Colab
  print("FAILED to authenticate:", e)

# Modify the following to use a different bucket and/or region
# for Google Cloud Storage and for Cloud MLE.
# By default the bucket name is the same as the project ID.
BUCKET = PROJECT
REGION = "us-central1"

# Copy taxi-*.csv files from github if they are missing from the runtime.
!wget --quiet -nc https://github.com/osipov/training-data-analyst/raw/master/bootcamps/serverless_ml/taxi-11k-datasets.zip
!unzip -q -n taxi-11k-datasets.zip 

The following cells are based on code from the earlier lab on Cloud MLE. Revisit that lab to review what they do.

In [0]:
# for bash
import os
os.environ['PROJECT'] = PROJECT
os.environ['BUCKET'] = BUCKET
os.environ['REGION'] = REGION
os.environ['TF_VERSION'] = '1.12'  # Latest TensorFlow version supported by Cloud MLE

In [0]:
%%bash
gcloud config set project $PROJECT
gcloud config set compute/region $REGION

In [0]:
%%bash
rm -rf taxifare
mkdir -p taxifare/trainer

for file in taxifare/setup.py \
            taxifare/trainer/__init__.py \
            taxifare/trainer/model.py \
            taxifare/trainer/task.py
do
  wget --quiet -nc \
  https://github.com/osipov/training-data-analyst/raw/master/bootcamps/serverless_ml/hparams/$file \
  -O $file
done

find taxifare

<h1> 1. Command-line parameters to task.py </h1>

Review the command-line parameters that task.py accepts. Any of them can be tuned as a hyperparameter.

In [0]:
!grep -A 2 add_argument taxifare/trainer/task.py
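
To make the link between flags and hyperparameters concrete, here is a minimal sketch of how a trainer could accept tunable values on the command line. It is not the notebook's task.py: the function name `parse_hparams` and the default values are assumptions made for illustration, and only the three flag names are taken from the hyperparam.yaml file used later in this notebook.

```python
import argparse


def parse_hparams(argv):
    """Parse the tunable flags and return them as a dict (illustrative only)."""
    parser = argparse.ArgumentParser()
    # Flag names mirror the parameterName entries in hyperparam.yaml.
    # The defaults are assumptions, not the values used by task.py.
    parser.add_argument("--train_steps", type=int, default=1000)
    parser.add_argument("--train_batch_size", type=int, default=512)
    # The hidden layer sizes arrive as one space-separated string.
    parser.add_argument("--hidden_units", type=str, default="128 32")
    args = parser.parse_args(argv)

    hparams = vars(args)
    hparams["hidden_units"] = [int(n) for n in hparams["hidden_units"].split()]
    return hparams


# Example: flags that one tuning trial might pass to the trainer.
hparams = parse_hparams(
    ["--train_steps=950", "--train_batch_size=128", "--hidden_units=256 128 16"]
)
print(hparams)
```

Because every tunable value is an ordinary flag, the same trainer runs unchanged whether the values come from you or from the tuning service.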

<h1>2. Evaluation metric</h1>

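Hyperparameter tuning requires an evaluation metric that the tuning service can optimize. It can be any objective function, as long as the trainer reports it under the name given as hyperparameterMetricTag in the tuning configuration (rmse in this notebook).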

In [0]:
!grep -A 5 add_eval_metrics taxifare/trainer/model.py
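
As a reminder of what an rmse objective measures, the following plain-Python sketch computes the root mean squared error of a few predictions. It is an illustration only: the notebook's model.py computes its metric inside TensorFlow, and the function name `rmse` and the sample fares here are made up.

```python
import math


def rmse(labels, predictions):
    """Root mean squared error between two equal-length, non-empty sequences."""
    if len(labels) != len(predictions) or not labels:
        raise ValueError("labels and predictions must be non-empty and equal in length")
    squared_errors = [(y - p) ** 2 for y, p in zip(labels, predictions)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical fares and predictions, for illustration only.
print(rmse([10.0, 12.0], [7.0, 16.0]))
```

A lower value means predictions closer to the true fares, which is why the tuning goal for this metric is MINIMIZE.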


<h1> 3. Make sure outputs do not clobber each other </h1>

The trial number is used as part of the path to the output directory, so each trial writes to its own subdirectory and no trial overwrites the output of another.

In [0]:
!grep -A 5 "trial" taxifare/trainer/task.py
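
The idea can be sketched as follows, assuming the tuning service exposes the trial number through the TF_CONFIG environment variable under task.trial. The helper name `trial_output_dir` and the bucket path are hypothetical; check the grep output above for what task.py actually does.

```python
import json
import os
import posixpath


def trial_output_dir(output_dir, tf_config=None):
    """Append the trial id found in TF_CONFIG to output_dir, if there is one."""
    if tf_config is None:
        tf_config = os.environ.get("TF_CONFIG", "{}")
    trial = str(json.loads(tf_config).get("task", {}).get("trial", ""))
    # Outside a tuning job there is no trial id, so the path is unchanged.
    return posixpath.join(output_dir, trial) if trial else output_dir


# Hypothetical TF_CONFIG value, as a tuning trial might see it.
print(trial_output_dir("gs://my-bucket/out", '{"task": {"trial": "3"}}'))
```

The sketch uses `posixpath.join` rather than `os.path.join` so that Cloud Storage paths keep forward slashes on any operating system.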

<h1> 4. Create hyper-parameter configuration </h1>

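The file below specifies the search region in parameter space: a range and a scale for each numeric parameter, and a list of allowed values for each categorical one. Cloud MLE runs a smart search within these constraints, so it does not try out every single value. Note that with maxTrials set to 1 only a single trial runs; raise maxTrials (and optionally maxParallelTrials) to explore more of the space.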

In [0]:
%%writefile hyperparam.yaml
trainingInput:
  scaleTier: BASIC
  hyperparameters:
    goal: MINIMIZE
    maxTrials: 1
    maxParallelTrials: 1
    hyperparameterMetricTag: rmse
    params:
    - parameterName: train_steps
      type: INTEGER
      minValue: 900
      maxValue: 1000
      scaleType: UNIT_LINEAR_SCALE      
    - parameterName: train_batch_size
      type: INTEGER
      minValue: 64
      maxValue: 512
      scaleType: UNIT_LOG_SCALE
    - parameterName: hidden_units
      type: CATEGORICAL
      categoricalValues: ["128 32", "256 128 16", "64 64 64 8"]       
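
To get a feel for the search space above, the sketch below draws one candidate from it by plain random sampling. This is not the algorithm Cloud MLE uses. It only illustrates one reading of the three parameter types, on the assumption that UNIT_LOG_SCALE means the range is explored uniformly in its logarithm. The function name `sample_trial` is hypothetical.

```python
import math
import random


def sample_trial(rng):
    """Draw one candidate from the search space by plain random sampling."""
    # UNIT_LINEAR_SCALE: uniform over the raw integer range.
    train_steps = rng.randint(900, 1000)
    # UNIT_LOG_SCALE (assumed reading): uniform over the logarithm of the
    # range, so 64..128 is drawn about as often as 256..512.
    log_batch = rng.uniform(math.log(64), math.log(512))
    train_batch_size = min(512, max(64, round(math.exp(log_batch))))
    # CATEGORICAL: one of the listed values, passed on as a string.
    hidden_units = rng.choice(["128 32", "256 128 16", "64 64 64 8"])
    return {
        "train_steps": train_steps,
        "train_batch_size": train_batch_size,
        "hidden_units": hidden_units,
    }


print(sample_trial(random.Random(0)))
```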

In [0]:
%%bash
gsutil -m rm -rf gs://${BUCKET}/taxifare/11k/*.csv
gsutil -m cp ${PWD}/*.csv gs://${BUCKET}/taxifare/11k/

<h1> 5. Run the training job </h1>

Just add --config to the usual training command.

In [0]:
%%bash
OUTDIR=gs://${BUCKET}/taxifare/11k/taxi_trained_hparams
JOBNAME=mle_hparams_$(date -u +%y%m%d_%H%M%S)
echo $OUTDIR $REGION $JOBNAME

gsutil -m rm -rf $OUTDIR

gcloud ml-engine jobs submit training $JOBNAME \
   --region=$REGION \
   --module-name=trainer.task \
   --package-path=${PWD}/taxifare/trainer \
   --job-dir=$OUTDIR \
   --staging-bucket=gs://$BUCKET \
   --scale-tier=BASIC \
   --runtime-version=${TF_VERSION} \
   --config=hyperparam.yaml \
   -- \
   --train_data_paths="gs://${BUCKET}/taxifare/11k/taxi-train*" \
   --eval_data_paths="gs://${BUCKET}/taxifare/11k/taxi-valid*" \
   --output_dir=$OUTDIR

To monitor the progress of hyperparameter tuning from the GCP user interface, navigate to the [Jobs](https://console.cloud.google.com/mlengine/jobs) page of the Cloud ML Engine service and look for a job whose Type is listed as **Hyperparameter tuning**. Use the "View Logs" link to see the details.
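
Once the job has finished, its description (for example the JSON printed by `gcloud ml-engine jobs describe` with `--format=json`) lists each completed trial with its hyperparameters and final objective value. The sketch below picks the best trial from such a description. The helper name `best_trial` is hypothetical, and the sample job is made-up data that only mirrors the field layout; it is not output from a real run.

```python
def best_trial(job, goal="MINIMIZE"):
    """Return the trial with the best final objective value, or None."""
    trials = [
        t for t in job.get("trainingOutput", {}).get("trials", [])
        if "finalMetric" in t  # skip trials that reported no metric
    ]
    if not trials:
        return None

    def objective(trial):
        return trial["finalMetric"]["objectiveValue"]

    return min(trials, key=objective) if goal == "MINIMIZE" else max(trials, key=objective)


# Made-up job description for illustration; the values are not real results.
sample_job = {
    "trainingOutput": {
        "trials": [
            {"trialId": "1",
             "hyperparameters": {"train_batch_size": "128", "hidden_units": "128 32"},
             "finalMetric": {"trainingStep": "950", "objectiveValue": 10.4}},
            {"trialId": "2",
             "hyperparameters": {"train_batch_size": "256", "hidden_units": "256 128 16"},
             "finalMetric": {"trainingStep": "980", "objectiveValue": 9.8}},
        ]
    }
}

best = best_trial(sample_job)
print(best["trialId"], best["hyperparameters"])
```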

Copyright 2019 Counter Factual .AI LLC Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License