<b>Define environment variables</b>

These variables are used in the training steps that follow. Note that the bucket named by BUCKET_NAME must exist in the GCP project.

In [16]:
%env BUCKET_NAME=ml-workshop-chicago-taxi-demo
%env LOCAL_JOB_DIR=local-training-output
%env JOB_NAME=keras_job_1
%env REGION=us-central1
%env MODEL_NAME=keras_model_1
%env MODEL_VERSION=v4
%env PROJECT_ID=mwe-sanofi-ml-workshop

env: BUCKET_NAME=ml-workshop-chicago-taxi-demo
env: LOCAL_JOB_DIR=local-training-output
env: JOB_NAME=keras_job_6_0611
env: REGION=us-central1
env: MODEL_NAME=keras_model_1_0611
env: MODEL_VERSION=v4
env: PROJECT_ID=mwe-sanofi-ml-workshop
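
Because every later cell depends on these variables, it can help to confirm they are set before submitting a job. The following is a minimal sketch, not part of the workshop code: the helper name `missing_env_vars` is hypothetical, and it relies only on the fact that `%env` writes to the kernel's `os.environ`.

```python
import os

# Variable names taken from the %env cell above.
REQUIRED_VARS = (
    "BUCKET_NAME", "LOCAL_JOB_DIR", "JOB_NAME", "REGION",
    "MODEL_NAME", "MODEL_VERSION", "PROJECT_ID",
)


def missing_env_vars(required=REQUIRED_VARS, env=None):
    """Return the names in `required` that are unset or empty in `env`."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]


# Report anything that still needs to be defined in this kernel.
missing = missing_env_vars()
if missing:
    print("Unset variables:", ", ".join(missing))
```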


In [3]:
# Create the bucket named by BUCKET_NAME. Skip this cell if the bucket already exists (gsutil mb reports an error in that case).
!gsutil mb gs://${BUCKET_NAME}

Creating gs://ml-workshop-chicago-taxi-demo/...


<b>Perform training locally with default parameters</b>

Training details are written locally to the folder given by the job-dir parameter.

Note: creating the data takes some time, because the MinMax normalizer must be fit over more than 100 million training rows.
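
To illustrate why the fit is slow but still feasible at this scale, here is a rough sketch of a min-max scaler fit incrementally, one chunk of rows at a time, so the full table never has to sit in memory. The class `StreamingMinMax` is hypothetical; how the workshop's `create_scaler_func.py` actually fits its scaler is not shown here.

```python
class StreamingMinMax:
    """Track per-column minima and maxima across chunks of rows."""

    def __init__(self):
        self.min_ = None
        self.max_ = None

    def partial_fit(self, rows):
        """Update the running min/max with one chunk of numeric rows."""
        for row in rows:
            if self.min_ is None:
                self.min_ = list(row)
                self.max_ = list(row)
            else:
                self.min_ = [min(a, b) for a, b in zip(self.min_, row)]
                self.max_ = [max(a, b) for a, b in zip(self.max_, row)]
        return self

    def transform(self, rows):
        """Scale each column to [0, 1]; constant columns map to 0.0."""
        return [
            [
                (x - lo) / (hi - lo) if hi > lo else 0.0
                for x, lo, hi in zip(row, self.min_, self.max_)
            ]
            for row in rows
        ]


scaler = StreamingMinMax()
scaler.partial_fit([[0.0, 10.0], [5.0, 30.0]])  # first chunk
scaler.partial_fit([[10.0, 20.0]])              # second chunk
print(scaler.transform([[5.0, 20.0]]))          # [[0.5, 0.5]]
```

Each chunk needs only one pass, so the cost grows linearly with the number of rows, which is consistent with a long first run over the full training set.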

In [15]:
# Data creation only needs to run once: set --create-data=False after the first successful run.
!gcloud ai-platform local train \
  --package-path trainer \
  --module-name trainer.task \
  --job-dir $LOCAL_JOB_DIR \
  -- \
  --create-data=True

<b>Perform training on AI Platform</b>

The training job can also be run on AI Platform.  Note that in order for AI Platform to be able to complete the training job, the "Google Cloud ML Engine Service Agent" service account must be granted Cloud Storage and BigQuery admin roles.

Important: a single training job (either local or on AI Platform) must complete with the create-data flag set to true before the remaining steps can complete.
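
A flag passed as `--create-data=True` or `--create-data=False` arrives as a string, and with `argparse` a plain `type=bool` would treat the string "False" as true because it is non-empty. The sketch below shows one way a trainer could parse such a flag safely. It is an assumption about how this could be done, not the code in `trainer/task.py`; the function names and the default value are hypothetical.

```python
import argparse


def str_to_bool(value):
    """Convert common true/false spellings to a real boolean."""
    text = value.strip().lower()
    if text in ("true", "t", "yes", "1"):
        return True
    if text in ("false", "f", "no", "0"):
        return False
    raise argparse.ArgumentTypeError("expected a boolean, got %r" % value)


def parse_args(argv=None):
    """Parse the data-creation flag (default chosen for illustration only)."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--create-data", type=str_to_bool, default=False)
    return parser.parse_args(argv)


print(parse_args(["--create-data=False"]).create_data)  # False
print(parse_args(["--create-data=True"]).create_data)   # True
```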

In [6]:
# Data creation only needs to run once: set --create-data=False after the first successful run.
!gcloud ai-platform jobs submit training $JOB_NAME \
  --package-path trainer/ \
  --module-name trainer.task \
  --region $REGION \
  --python-version 3.5 \
  --runtime-version 1.13 \
  --job-dir gs://${BUCKET_NAME}/keras-job-dir-${JOB_NAME} \
  -- \
  --create-data=True

Job [keras_job_1_0611] submitted successfully.
Your job is still active. You may view the status of your job with the command

  $ gcloud ai-platform jobs describe keras_job_1_0611

or continue streaming the logs with the command

  $ gcloud ai-platform jobs stream-logs keras_job_1_0611
jobId: keras_job_1_0611
state: QUEUED
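
AI Platform job names must be unique within a project, start with a letter, and contain only letters, numbers, and underscores, so resubmitting with an unchanged JOB_NAME fails. The job names in the output above carry a suffix for that reason. A small sketch of generating such a name follows; the helper `unique_job_name` and the timestamp format are assumptions, not part of the workshop code.

```python
import re
from datetime import datetime


def unique_job_name(prefix, now=None):
    """Append a month-day and time suffix to `prefix` and validate the result."""
    now = datetime.now() if now is None else now
    name = "%s_%s" % (prefix, now.strftime("%m%d_%H%M%S"))
    # Letters, digits, and underscores only, starting with a letter.
    if not re.fullmatch(r"[A-Za-z][A-Za-z0-9_]*", name):
        raise ValueError("invalid job name: %r" % name)
    return name


print(unique_job_name("keras_job"))
```

The result can be assigned to JOB_NAME before each submission, for example by setting `os.environ["JOB_NAME"]` in the notebook kernel.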


<b>Perform hyperparameter tuning on AI Platform</b>

Training details are written to Cloud Storage, in the folder given by the job-dir parameter.

In [12]:
!gcloud ai-platform jobs submit training ${JOB_NAME}_hpt \
  --config hptuning_config.yaml \
  --package-path trainer/ \
  --module-name trainer.task \
  --region $REGION \
  --python-version 3.5 \
  --runtime-version 1.13 \
  --job-dir gs://${BUCKET_NAME}/keras-job-dir-${JOB_NAME}_hpt

Job [keras_job_4_0611_hpt] submitted successfully.
Your job is still active. You may view the status of your job with the command

  $ gcloud ai-platform jobs describe keras_job_4_0611_hpt

or continue streaming the logs with the command

  $ gcloud ai-platform jobs stream-logs keras_job_4_0611_hpt
jobId: keras_job_4_0611_hpt
state: QUEUED


<b>Complete training on AI Platform</b>

Now that hyperparameters have been tuned, perform deeper training with the optimal hyperparameters in place.  Note that we've explicitly increased the train-steps and num-epochs parameters in addition to the tuned hyperparameters.

In [17]:
# Data creation only needs to run once: set --create-data=False after the first successful run.
!gcloud ai-platform jobs submit training $JOB_NAME \
  --package-path trainer/ \
  --module-name trainer.task \
  --region $REGION \
  --python-version 3.5 \
  --runtime-version 1.13 \
  --job-dir gs://${BUCKET_NAME}/keras-job-dir-${JOB_NAME} \
  -- \
  --project-id $PROJECT_ID \
  --bucket-name ${BUCKET_NAME} \
  --create-data=True \
  --test-files gs://${BUCKET_NAME}/data/full_test_results.csv \
  --train-files gs://${BUCKET_NAME}/data/full_train_results.csv \
  --eval-files gs://${BUCKET_NAME}/data/full_val_results.csv \
  --num-deep-layers=2 \
  --first-deep-layer-size=30 \
  --first-wide-layer-size=1233 \
  --learning-rate=0.003 \
  --wide-scale-factor=0.094 \
  --train-batch-size=132 \
  --dropout-rate=0.4 \
  --train-steps=10 \
  --num-epochs=5

Job [keras_job_6_0611] submitted successfully.
Your job is still active. You may view the status of your job with the command

  $ gcloud ai-platform jobs describe keras_job_6_0611

or continue streaming the logs with the command

  $ gcloud ai-platform jobs stream-logs keras_job_6_0611
jobId: keras_job_6_0611
state: QUEUED


<b>Host the trained model on AI Platform</b>

Because we're passing a list of numpy arrays and not a single numpy array as input for inference, we'll need to establish a custom prediction module.  
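
A custom prediction routine on AI Platform is a class that exposes a `predict(instances, **kwargs)` method and a `from_path(model_dir)` class method. The sketch below shows that shape and the regrouping step that turns a list of instances into one batch per model input. It is not the workshop's `predictor.MyPredictor`: the class names, the assumed instance layout (`[wide_features, deep_features]`), and the stub model are all hypothetical, and a real routine would convert each batch to a numpy array and load the saved Keras model.

```python
class _StubModel:
    """Stand-in for a trained multi-input model (hypothetical)."""

    def predict(self, inputs):
        # `inputs` holds one batch per model input; sum each row's features.
        return [sum(sum(part) for part in row) for row in zip(*inputs)]


class SketchPredictor:
    """Follows the custom prediction routine interface."""

    def __init__(self, model):
        self._model = model

    def predict(self, instances, **kwargs):
        # Each instance is assumed to be [wide_features, deep_features].
        # Regroup so that every model input receives its own batch.
        inputs = [list(batch) for batch in zip(*instances)]
        outputs = self._model.predict(inputs)
        return [float(value) for value in outputs]

    @classmethod
    def from_path(cls, model_dir):
        # A real routine would load the exported model from model_dir.
        return cls(_StubModel())


predictor = SketchPredictor.from_path("unused")
print(predictor.predict([[[1, 2], [3]], [[4, 5], [6]]]))  # [6.0, 15.0]
```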

First, run the setup script to create a distribution tarball.

In [18]:
!python setup.py sdist --formats=gztar

running sdist
running egg_info
creating trainer.egg-info
writing trainer.egg-info/PKG-INFO
writing dependency_links to trainer.egg-info/dependency_links.txt
writing requirements to trainer.egg-info/requires.txt
writing top-level names to trainer.egg-info/top_level.txt
writing manifest file 'trainer.egg-info/SOURCES.txt'
reading manifest file 'trainer.egg-info/SOURCES.txt'
writing manifest file 'trainer.egg-info/SOURCES.txt'
running check


creating trainer-0.1
creating trainer-0.1/trainer
creating trainer-0.1/trainer.egg-info
copying files to trainer-0.1...
copying README.md -> trainer-0.1
copying predictor.py -> trainer-0.1
copying setup.py -> trainer-0.1
copying trainer/__init__.py -> trainer-0.1/trainer
copying trainer/create_data_func.py -> trainer-0.1/trainer
copying trainer/create_scaler_func.py -> trainer-0.1/trainer
copying trainer/model.py -> trainer-0.1/trainer
copying trainer/task.py -> trainer-0.1/trainer
copying trainer.egg-info/PKG-INFO -> trainer-0.1/trainer.egg-info
cop

Copy the tarball over to Cloud Storage

In [19]:
!gsutil cp dist/trainer-0.1.tar.gz gs://${BUCKET_NAME}/staging-dir/trainer-0.1.tar.gz

Copying file://dist/trainer-0.1.tar.gz [Content-Type=application/x-tar]...
/ [1 files][  9.2 KiB/  9.2 KiB]                                                
Operation completed over 1 objects/9.2 KiB.                                      


Next, create a new model on AI Platform

In [20]:
!gcloud ai-platform models create $MODEL_NAME --regions $REGION

Created ml engine model [projects/mwe-sanofi-ml-workshop/models/keras_model_1_0611].


Next, create a new version using the trained model.

In [21]:
!gcloud beta ai-platform versions create $MODEL_VERSION \
  --model $MODEL_NAME \
  --runtime-version 1.13 \
  --python-version 3.5 \
  --origin gs://${BUCKET_NAME}/keras-job-dir-${JOB_NAME} \
  --package-uris gs://${BUCKET_NAME}/staging-dir/trainer-0.1.tar.gz \
  --prediction-class predictor.MyPredictor

Creating version (this might take a few minutes)......done.                    


<b>Prepare a sample for inference</b>

Note that we are using the same preprocessing methods used for training.

In [25]:
!python create_sample.py

Using TensorFlow backend.
Downloading: 100%|██████████████████████████████| 7/7 [00:00<00:00, 68.52rows/s]
Downloading: 100%|██████████████████████████████| 2/2 [00:00<00:00, 17.27rows/s]
Downloading: 100%|██████████████████████████████| 5/5 [00:00<00:00, 47.36rows/s]
Downloading: 100%|██████████████████████████████| 4/4 [00:00<00:00, 60.69rows/s]
Downloading: 100%|███████████████████████████| 61/61 [00:00<00:00, 563.05rows/s]
Produced sample with label 720 seconds.


<b>Make an inference on a new sample.</b>

Pass the sample object to the model hosted in AI Platform to return a prediction.

In [26]:
!gcloud ai-platform predict \
  --model $MODEL_NAME \
  --version $MODEL_VERSION \
  --json-instances input_sample.json

{
  "predictions": "26.667124910417414"
}
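
The file passed to `--json-instances` is newline-delimited JSON, with one instance per line. As a rough illustration of producing such a file, the sketch below writes each instance on its own line. The function name and the instance layout are hypothetical; the contents that `create_sample.py` actually writes are not shown here.

```python
import json


def write_json_instances(instances, path):
    """Write one JSON-encoded instance per line to `path`."""
    with open(path, "w") as handle:
        for instance in instances:
            handle.write(json.dumps(instance) + "\n")
```

For example, `write_json_instances([[wide_features, deep_features]], "my_sample.json")` would write a single-line file, where `wide_features` and `deep_features` are placeholder lists of preprocessed values.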


<b>Approximate the Mean Absolute Percentage Error (MAPE) for the test set</b>

Note that we used a log transformation on our target variable, so any metrics reported by the model during training describe how well it predicts the <i>log</i> of the trip duration, not the trip duration itself.  In order to calculate metrics for predicting the trip duration in seconds, we'll need to make predictions from the test set using our trained model.
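
To make the distinction concrete, the sketch below maps a log-space prediction back to seconds and measures the percentage error there. It assumes a natural-log transform with no offset, which the text does not confirm, and the function names are hypothetical.

```python
import math


def seconds_from_log(log_prediction):
    """Invert the assumed natural-log transform of the target."""
    return math.exp(log_prediction)


def absolute_percentage_error(label_seconds, log_prediction):
    """Percentage error in seconds for one log-space prediction."""
    predicted = seconds_from_log(log_prediction)
    return abs(label_seconds - predicted) / label_seconds * 100.0


# An error of 0.1 in log space is roughly a 10.5% error in seconds.
print(absolute_percentage_error(720.0, math.log(720.0) + 0.1))
```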

Ideally, we would use batch prediction within AI Platform.  However, batch prediction is not currently available with the custom predictor module we've implemented.

As an alternative, we'll approximate the MAPE by randomly sampling values from the test set.
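
The sampling idea can be sketched as follows: draw a random subset of test rows, request a prediction for each, and average the absolute percentage errors. This is an illustration under assumptions, not the contents of `calc_mape.py`. The function names are hypothetical, `predict_fn` stands in for a call to the hosted model, and labels are assumed to be nonzero durations in seconds.

```python
import random


def mape(labels, predictions):
    """Mean absolute percentage error, in percent."""
    errors = [abs(y - p) / y for y, p in zip(labels, predictions)]
    return 100.0 * sum(errors) / len(errors)


def sampled_mape(labels, predict_fn, num_samples, seed=None):
    """Estimate MAPE from a random sample of row indices.

    `predict_fn(i)` must return the prediction, in seconds, for row i.
    """
    rng = random.Random(seed)
    count = min(num_samples, len(labels))
    indices = rng.sample(range(len(labels)), count)
    sampled_labels = [labels[i] for i in indices]
    predictions = [predict_fn(i) for i in indices]
    return mape(sampled_labels, predictions)


labels = [480.0, 720.0, 960.0, 1500.0]
# Toy predictor that always overshoots by 10%.
print(sampled_mape(labels, lambda i: labels[i] * 1.1, num_samples=3, seed=0))
```

A larger sample gives a steadier estimate, at the cost of more online prediction requests.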

In [27]:
!python calc_mape.py \
  --num-samples=1000 \
  --model=$MODEL_NAME \
  --version=$MODEL_VERSION \
  --project-id $PROJECT_ID \
  --bucket-name ${BUCKET_NAME}

Using TensorFlow backend.
Downloading: 100%|██████████████████████████████| 7/7 [00:00<00:00, 61.68rows/s]
Downloading: 100%|██████████████████████████████| 2/2 [00:00<00:00,  7.39rows/s]
Downloading: 100%|██████████████████████████████| 5/5 [00:00<00:00, 48.22rows/s]
Downloading: 100%|██████████████████████████████| 4/4 [00:00<00:00, 40.01rows/s]
Downloading: 100%|███████████████████████████| 61/61 [00:00<00:00, 524.94rows/s]
Returned sample with label 3480 and prediction 73.
Returned sample with label 1020 and prediction 66.
Returned sample with label 1140 and prediction 75.
Returned sample with label 1920 and prediction 135.
Returned sample with label 1500 and prediction 44.
Returned sample with label 1860 and prediction 243.
Returned sample with label 840 and prediction 34.
Returned sample with label 720 and prediction 25.
Returned sample with label 960 and prediction 27.
Returned sample with label 480 and prediction 25.
Returned sample with label 360 and prediction 39.
Returned sa