# Model hyper-parameter tuning & evaluation

## Abstract

This notebook tunes the hyperparameters of the production model, retrains the model with the best values found, and evaluates it with TensorFlow Model Analysis (TFMA).

The production model is a deep & wide neural network regressor.
<br>The wide part of the model uses the following features:
- `TripStartYear`
- `TripStartMonth`
- `TripStartDay`
- `TripStartHour`
- `TripStartMinute`
- `month_day`: feature cross of `TripStartMonth` & `TripStartDay`
- `day_hour`: feature cross of `TripStartDay` & `TripStartHour`
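
The notebook does not show how the feature crosses are built, so the following is only an illustrative sketch of what a cross such as `day_hour` represents: one categorical id per combination of the two inputs. It assumes `TripStartDay` is a day-of-week index in `0..6` and `TripStartHour` an hour in `0..23`; these encodings and the helper name `day_hour` are assumptions made for the example. Crossing utilities in ML libraries commonly hash the pair into a fixed number of buckets instead of using a dense index as done here.

```python
# Assumed encodings (not stated in the notebook): day-of-week 0..6, hour 0..23.
NUM_DAYS = 7
NUM_HOURS = 24


def day_hour(day, hour):
    """Return one categorical id per (day, hour) pair (dense cross)."""
    if not (0 <= day < NUM_DAYS and 0 <= hour < NUM_HOURS):
        raise ValueError("day or hour out of range")
    return day * NUM_HOURS + hour


# Every (day, hour) combination gets its own id, so the wide part can
# learn a separate weight for, say, "Saturday at 23h".
ids = {day_hour(d, h) for d in range(NUM_DAYS) for h in range(NUM_HOURS)}
print(len(ids))  # 168
```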

The deep part of the model uses the following features:
- `historical_tripDuration`
- `histOneWeek_tripDuration`
- `historical_tripDistance`
- `histOneWeek_tripDistance`
- `rawDistance`
- `pickup_census_tract` (embedded)
- `dropoff_census_tract` (embedded)

The hyperparameters to tune are:
- `batch-size`: the batch size used by the optimisation method, chosen from the discrete set `[64, 128, 256, 512]`
- `hidden-units`: the layer sizes of the deep part of the model, chosen from the discrete set `["100,50,25,5", "256,128,32,8"]`
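
To get a feel for the size of this search space, the two discrete sets can be enumerated. This is a sketch only: the names `BATCH_SIZES`, `HIDDEN_UNITS` and `search_space` are made up for the example, and the tuning service does not necessarily try every combination.

```python
from itertools import product

BATCH_SIZES = [64, 128, 256, 512]
HIDDEN_UNITS = ["100,50,25,5", "256,128,32,8"]


def search_space():
    """Enumerate every (batch-size, hidden-units) combination."""
    return [
        {"batch-size": batch, "hidden-units": units}
        for batch, units in product(BATCH_SIZES, HIDDEN_UNITS)
    ]


grid = search_space()
print(len(grid))  # 8 candidate configurations
```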

__RMSE__ (root mean squared error) is the metric used to compare trials during hyperparameter tuning.
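
For reference, RMSE is the square root of the mean squared difference between labels and predictions. A minimal sketch in plain Python follows; the function name `rmse` is chosen for the example and is not taken from the trainer code.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between two equally long sequences."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / float(len(squared_errors)))
```

Lower is better, and the value is expressed in the same unit as the label.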

## Hyperparameter tuning

In [4]:
%%bash

sh hyperopt.sh

jobId: chicago_taxi_ml_hypertune_model_20191010_110510
state: QUEUED


Job [chicago_taxi_ml_hypertune_model_20191010_110510] submitted successfully.
Your job is still active. You may view the status of your job with the command

  $ gcloud ai-platform jobs describe chicago_taxi_ml_hypertune_model_20191010_110510

or continue streaming the logs with the command

  $ gcloud ai-platform jobs stream-logs chicago_taxi_ml_hypertune_model_20191010_110510


According to the [AI Platform job details](https://console.cloud.google.com/ai-platform/jobs/chicago_taxi_ml_hypertune_model_20191010_110510/charts/cpu?project=szilard-kalosi-sandbox), the best trial is __Trial ID 5__, with the following hyperparameter values:
- `batch-size` = 256
- `hidden-units` = "256,128,32,8"
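
The `hidden-units` value is passed as a comma-separated string, so the trainer presumably converts it into a list of layer sizes. The trainer code is not shown in this notebook; the helper below, with the hypothetical name `parse_hidden_units`, is one way such parsing could look.

```python
def parse_hidden_units(spec):
    """Turn a string such as "256,128,32,8" into [256, 128, 32, 8]."""
    units = [int(part) for part in spec.split(",")]
    if any(u <= 0 for u in units):
        raise ValueError("layer sizes must be positive integers")
    return units


print(parse_hidden_units("256,128,32,8"))  # [256, 128, 32, 8]
```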

First, we train the deep & wide model with these hyperparameter values.
<br>Then we evaluate the trained model with [TensorFlow Model Analysis](https://www.tensorflow.org/tfx/model_analysis/get_started) (TFMA).
<br>More specifically, TFMA runs the model on the test set for the final evaluation and provides a visual interface that exposes its predictive weaknesses.

## Training

In [6]:
%%bash

#!/usr/bin/env bash

TF_TRAINING_OUTPUT="gs://szilard_aliz_sandbox/pierre_tasks/demo1/model_optimised"
TFDV_OUTPUT="gs://szilard_aliz_sandbox/pierre_tasks/demo1/tfdv"
TFT_OUTPUT="gs://szilard_aliz_sandbox/pierre_tasks/demo1/tft"

# Submit the training job with the best hyperparameter values (Trial ID 5).
# Everything after the bare "--" is passed to trainer.task as user arguments.
gcloud ai-platform jobs submit training \
    "chicago_taxi_ml_train_model_$(date +%Y%m%d_%H%M%S)" \
    --region us-central1 \
    --package-path trainer \
    --module-name trainer.task \
    --job-dir "$TF_TRAINING_OUTPUT" \
    --config train.yaml \
    -- \
    --tfdv-output "$TFDV_OUTPUT" \
    --tft-output "$TFT_OUTPUT" \
    --train-size 56948734 \
    --train-epochs 2 \
    --throttle-secs 60 \
    --batch-size 256 \
    --hidden-units "256,128,32,8" \
    --eval-steps 20


jobId: chicago_taxi_ml_train_model_20191010_124441
state: QUEUED


Job [chicago_taxi_ml_train_model_20191010_124441] submitted successfully.
Your job is still active. You may view the status of your job with the command

  $ gcloud ai-platform jobs describe chicago_taxi_ml_train_model_20191010_124441

or continue streaming the logs with the command

  $ gcloud ai-platform jobs stream-logs chicago_taxi_ml_train_model_20191010_124441


## Run TFMA

In [1]:
%%bash

#!/usr/bin/env bash

echo Starting distributed TFMA computation...

JOB_ID="demo1-tfma-model-optimised-$(date +%Y%m%d-%H%M%S)"
TFMA_OUTPUT="gs://szilard_aliz_sandbox/pierre_tasks/demo1/tfma_model_optimised"
TRAINED_MODEL_LOC="gs://szilard_aliz_sandbox/pierre_tasks/demo1/model_optimised"
TFDV_OUTPUT="gs://szilard_aliz_sandbox/pierre_tasks/demo1/tfdv"
TEMP_PATH=$TFMA_OUTPUT/tmp/
MYPROJECT=$(gcloud config list --format 'value(core.project)' 2>/dev/null)

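# Run the TFMA evaluation on Dataflow.
# --project, --region, --temp_location, --job_name, --save_main_session,
# --setup_file and --runner are standard Apache Beam pipeline options.
python model_analysis/analyse_model.py \
    --tfma-output "$TFMA_OUTPUT" \
    --trained-model-loc "$TRAINED_MODEL_LOC" \
    --tfdv-output "$TFDV_OUTPUT" \
    --project "$MYPROJECT" \
    --region us-central1 \
    --temp_location "$TEMP_PATH" \
    --job_name "$JOB_ID" \
    --save_main_session \
    --setup_file model_analysis/setup.py \
    --runner DataflowRunner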


Starting distributed TFMA computation...



TFMA needs notebook extensions in order to render its visualisations.
<br>To enable them on AI Platform, switch from the standard JupyterLab interface to the classic Jupyter Notebook interface.
<br>To do so, go to `Help > Launch Classic Notebook`.

Furthermore, TFMA visuals cannot be saved, either in the notebook or in an exported HTML version.
<br>The cell has to be re-run every time we want to visualize the metrics.

In [1]:
import tensorflow_model_analysis as tfma

print('TFMA version: {}'.format(tfma.version.VERSION_STRING))


TFMA version: 0.14.0


In [2]:
train_result = tfma.load_eval_result(output_path='gs://szilard_aliz_sandbox/pierre_tasks/demo1/tfma_model_optimised/train/')
eval_result = tfma.load_eval_result(output_path='gs://szilard_aliz_sandbox/pierre_tasks/demo1/tfma_model_optimised/eval/')
test_result = tfma.load_eval_result(output_path='gs://szilard_aliz_sandbox/pierre_tasks/demo1/tfma_model_optimised/test/')



### Train set

In [5]:
tfma.view.render_slicing_metrics(train_result)



### Eval set

In [4]:
tfma.view.render_slicing_metrics(eval_result)



### Test set

In [3]:
tfma.view.render_slicing_metrics(test_result)



In [3]:
tfma.view.render_slicing_metrics(test_result, slicing_column='TripStartMonth')



In [4]:
tfma.view.render_slicing_metrics(test_result, slicing_column='TripStartDay')

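
Slicing by a column means computing the metric separately for each distinct value of that column. The sketch below reproduces the idea in plain Python on a handful of made-up rows; it does not use TFMA, and the row layout (`label`, `prediction`) and the function name `rmse_by_slice` are assumptions for the example.

```python
import math
from collections import defaultdict


def rmse_by_slice(rows, slice_key, label_key="label", pred_key="prediction"):
    """Compute the RMSE separately for each value of `slice_key`."""
    squared_error = defaultdict(float)
    count = defaultdict(int)
    for row in rows:
        key = row[slice_key]
        squared_error[key] += (row[label_key] - row[pred_key]) ** 2
        count[key] += 1
    return {key: math.sqrt(squared_error[key] / count[key]) for key in count}


# Toy rows, invented for illustration only.
rows = [
    {"TripStartDay": 6, "label": 10.0, "prediction": 7.0},
    {"TripStartDay": 6, "label": 5.0, "prediction": 9.0},
    {"TripStartDay": 1, "label": 3.0, "prediction": 3.0},
]
per_day = rmse_by_slice(rows, "TripStartDay")
```

A slice whose RMSE stands out from the others, as one day does in this toy data, is the kind of weakness the sliced view is meant to expose.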


In conclusion, there is no noticeable discrepancy in the model's performance across the training, evaluation and test sets.

The model's performance on each set, in terms of __RMSE__:
- training: __2.511__
- evaluation: __2.52__
- test: __2.624__

The model behaves well, with no sign of overfitting: the __RMSE__ on the test set is only slightly higher than on the training set.
<br>Furthermore, whether on the training, evaluation or test set, the model's performance is very stable across the slices of `TripStartMonth` & `TripStartDay`, except for __Saturday__, where the __RMSE__ worsens from __~2.57__ to __3.045__.

Compared to the baseline model, the performance improvement on the holdout test set is about __9.3%__.
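
The percentages above can be put in perspective with a little arithmetic. The sketch assumes that the improvement is measured as a relative reduction in RMSE, `(baseline - new) / baseline`; the notebook does not state the baseline RMSE, so the value computed below is only what that assumption implies, not a reported result. The function names are hypothetical.

```python
def relative_change(before, after):
    """Relative change from `before` to `after` (positive means an increase)."""
    return (after - before) / before


def implied_baseline(new_rmse, improvement):
    """Baseline RMSE implied by a relative RMSE reduction of `improvement`."""
    return new_rmse / (1.0 - improvement)


# Saturday slice: RMSE goes from ~2.57 to 3.045, i.e. about 18.5% worse.
saturday_degradation = relative_change(2.57, 3.045)

# A 9.3% reduction ending at a test RMSE of 2.624 implies a baseline of ~2.893.
baseline_rmse = implied_baseline(2.624, 0.093)
```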