<b>Define environment variables</b>

These variables are used in the training steps that follow. Note that the bucket named by BUCKET_NAME must exist in the GCP project (the next cell tries to create it if it does not).

In [1]:
# Append date and time to model names to make them unique.
now = !date +"%Y%m%d_%H%M%S"

%env BUCKET_NAME=ml-workshop-chicago-taxi-demo
%env LOCAL_JOB_DIR=local-training-output
%env JOB_NAME=chicago_taxi_keras_job_$now.s
%env REGION=us-central1
# %env MODEL_NAME=keras_model_$now.s
%env MODEL_VERSION=v1
%env PROJECT_ID=mwpmltr

env: BUCKET_NAME=ml-workshop-chicago-taxi-demo
env: LOCAL_JOB_DIR=local-training-output
env: JOB_NAME=chicago_taxi_keras_job_20220526_151029
env: REGION=us-central1
env: MODEL_VERSION=v1
env: PROJECT_ID=mwpmltr
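The `now = !date +"%Y%m%d_%H%M%S"` idiom relies on IPython shell capture, and `$now.s` joins the captured output into a single string. A minimal pure-Python equivalent, assuming you would rather not shell out, builds the same timestamped name with `datetime.strftime` and exports it through `os.environ`, which is what `%env` sets for the kernel process. The helper name `unique_name` is illustrative, not part of this project.

```python
import os
from datetime import datetime


def unique_name(prefix, now=None):
    """Return prefix + a YYYYmmdd_HHMMSS timestamp, like `date +"%Y%m%d_%H%M%S"`."""
    if now is None:
        now = datetime.now()
    return f"{prefix}_{now.strftime('%Y%m%d_%H%M%S')}"


# Equivalent to: %env JOB_NAME=chicago_taxi_keras_job_$now.s
os.environ["JOB_NAME"] = unique_name("chicago_taxi_keras_job")
print(os.environ["JOB_NAME"])
```

Shell commands run with `!` inherit `os.environ`, so `$JOB_NAME` in later cells would see the value set this way.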


In [2]:
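# Create the bucket named by BUCKET_NAME in the project. This fails if the
# bucket already exists or if you lack permission to create buckets.
!gsutil mb -p ${PROJECT_ID} gs://${BUCKET_NAME}

import os
from pathlib import Path

# Create the local output directory used for local training runs.
Path(os.environ["LOCAL_JOB_DIR"]).mkdir(exist_ok=True)

# Remove output from previous runs, if any (-f ignores files that do not exist).
!rm -f input_sample.json x_scaler
!rm -rf ./${LOCAL_JOB_DIR}/export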

Creating gs://ml-workshop-chicago-taxi-demo/...
OSError: Permission denied.
rm: input_sample.json: No such file or directory
rm: x_scaler: No such file or directory
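The `rm` errors above only mean there was nothing to clean up from a previous run. If you prefer to do the cleanup in Python without error noise, the sketch below uses `pathlib` and `shutil`; the file and directory names come from the cell above, and the helper name `clean_previous_outputs` is hypothetical.

```python
import shutil
from pathlib import Path


def clean_previous_outputs(base_dir=".", job_dir="local-training-output"):
    """Delete artifacts from earlier runs; paths that do not exist are ignored."""
    base = Path(base_dir)
    for name in ("input_sample.json", "x_scaler"):
        # missing_ok=True (Python 3.8+) makes a missing file a no-op.
        (base / name).unlink(missing_ok=True)
    # ignore_errors=True makes this a no-op when the directory does not exist.
    shutil.rmtree(base / job_dir / "export", ignore_errors=True)
    # Recreate the local job directory used by local training.
    (base / job_dir).mkdir(exist_ok=True)
```

Calling `clean_previous_outputs()` from the notebook directory performs the same cleanup as the shell commands, and running it twice is harmless.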


<b>Perform training locally with default parameters</b>

Training output is written locally to the folder referenced by the --job-dir parameter.

Note: creating the data takes some time, because the MinMax scaler must be fit over more than 100 million training rows.
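Min-max scaling maps each feature x to (x - min) / (max - min), so fitting it only needs the per-feature minimum and maximum. Those can be accumulated in one pass over chunks rather than by loading all 100M+ rows at once. How `trainer/create_scaler_func.py` actually fits its scaler is not shown here; the plain-Python sketch below (the class name `StreamingMinMax` is hypothetical) only illustrates the chunked idea. scikit-learn's `MinMaxScaler.partial_fit` applies the same approach to NumPy arrays.

```python
class StreamingMinMax:
    """Accumulate per-feature min/max over chunks, then scale values to [0, 1]."""

    def __init__(self):
        self.data_min = None
        self.data_max = None

    def partial_fit(self, rows):
        for row in rows:
            if self.data_min is None:
                # The first row initializes both bounds.
                self.data_min = list(row)
                self.data_max = list(row)
                continue
            for i, value in enumerate(row):
                self.data_min[i] = min(self.data_min[i], value)
                self.data_max[i] = max(self.data_max[i], value)
        return self

    def transform(self, rows):
        scaled = []
        for row in rows:
            out = []
            for value, lo, hi in zip(row, self.data_min, self.data_max):
                span = hi - lo
                # Constant features map to 0.0 instead of dividing by zero.
                out.append((value - lo) / span if span else 0.0)
            scaled.append(out)
        return scaled


# Usage: feed the training data chunk by chunk (toy values here).
scaler = StreamingMinMax()
for chunk in ([[1.0, 10.0], [3.0, 30.0]], [[2.0, 50.0]]):
    scaler.partial_fit(chunk)
print(scaler.transform([[2.0, 30.0]]))  # [[0.5, 0.5]]
```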

In [3]:
# Run this cell once with --create-data True to create the training data and scaler.
# !gcloud ai-platform local train \
#   --package-path trainer \
#   --module-name trainer.task \
#   --job-dir $LOCAL_JOB_DIR \
#   -- \
#   --project-id $PROJECT_ID \
#   --bucket-name ${BUCKET_NAME} \
#   --create-data True \
#   --test-files gs://${BUCKET_NAME}/data/full_test_results.csv \
#   --train-files gs://${BUCKET_NAME}/data/full_train_results.csv \
#   --eval-files gs://${BUCKET_NAME}/data/full_val_results.csv \
#   --num-epochs 5

<b>Perform training on AI Platform</b>

The training job can also be run on AI Platform. Note that for AI Platform to complete the training job, the "Google Cloud ML Engine Service Agent" service account must be granted the Cloud Storage and BigQuery admin roles.

Important: at least one training job (local or on AI Platform) must complete with the --create-data flag set to True for the remaining steps to work.
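The cells pass this flag as the string `True` or `False` (for example `--create-data True`). `trainer/task.py` is not shown, so how it parses the value is unknown. Note that argparse's `type=bool` would treat any non-empty string, including `"False"`, as true. A common, safer pattern is an explicit converter; the sketch below uses a hypothetical helper named `str_to_bool`.

```python
import argparse


def str_to_bool(value):
    """Convert 'True'/'False'-style strings to bool; reject anything else."""
    lowered = value.strip().lower()
    if lowered in ("true", "t", "yes", "1"):
        return True
    if lowered in ("false", "f", "no", "0"):
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean, got {value!r}")


parser = argparse.ArgumentParser()
# argparse stores --create-data under the attribute name create_data.
parser.add_argument("--create-data", type=str_to_bool, default=False)

args = parser.parse_args(["--create-data", "False"])
print(args.create_data)  # False
```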

In [8]:
now = !date +"%Y%m%d_%H%M%S"
%env JOB_NAME=chicago_taxi_keras_job_$now.s

!gcloud ai-platform jobs submit training $JOB_NAME \
  --package-path trainer/ \
  --module-name trainer.task \
  --region $REGION \
  --python-version 3.7 \
  --runtime-version 2.5 \
  --job-dir gs://${BUCKET_NAME}/keras-job-dir-${JOB_NAME} \
  -- \
  --project-id $PROJECT_ID \
  --bucket-name ${BUCKET_NAME} \
  --create-data True \
  --test-files gs://${BUCKET_NAME}/data/full_test_results.csv \
  --train-files gs://${BUCKET_NAME}/data/full_train_results.csv \
  --eval-files gs://${BUCKET_NAME}/data/full_val_results.csv \
  --train-steps 1 \
  --num-epochs 1
                
# Stream logs so that training is done before subsequent cells are run.
# Remove '> /dev/null' to see step-by-step output of the model build steps.
!gcloud ai-platform jobs stream-logs $JOB_NAME > /dev/null

# The job should finish with state "SUCCEEDED".
!gcloud ai-platform jobs describe $JOB_NAME --format="value(state)"

env: JOB_NAME=chicago_taxi_keras_job_20220526_160153
Job [chicago_taxi_keras_job_20220526_160153] submitted successfully.
Your job is still active. You may view the status of your job with the command

  $ gcloud ai-platform jobs describe chicago_taxi_keras_job_20220526_160153

or continue streaming the logs with the command

  $ gcloud ai-platform jobs stream-logs chicago_taxi_keras_job_20220526_160153
jobId: chicago_taxi_keras_job_20220526_160153
state: QUEUED
SUCCEEDED
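`stream-logs` blocks until the job finishes, which is why the cell waits before checking the state. If you would rather not stream logs, a polling loop around the same `gcloud ai-platform jobs describe ... --format="value(state)"` call also works. This is a sketch: `wait_for_job` and its parameters are hypothetical, and the terminal states listed are the AI Platform job states SUCCEEDED, FAILED and CANCELLED.

```python
import subprocess
import time

TERMINAL_STATES = {"SUCCEEDED", "FAILED", "CANCELLED"}


def gcloud_job_state(job_name):
    """Return the job state reported by gcloud, e.g. 'QUEUED' or 'RUNNING'."""
    result = subprocess.run(
        ["gcloud", "ai-platform", "jobs", "describe", job_name,
         "--format=value(state)"],
        check=True, capture_output=True, text=True,
    )
    return result.stdout.strip()


def wait_for_job(job_name, get_state=gcloud_job_state, poll_seconds=60,
                 sleep=time.sleep):
    """Poll until the job reaches a terminal state, then return that state."""
    while True:
        state = get_state(job_name)
        if state in TERMINAL_STATES:
            return state
        sleep(poll_seconds)
```

In the notebook this could be used as `state = wait_for_job(os.environ["JOB_NAME"])` followed by a check that `state == "SUCCEEDED"`. The `get_state` and `sleep` parameters exist so the loop can be exercised without calling gcloud.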


<b>Perform hyperparameter tuning on AI Platform</b>

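Training output is written to Cloud Storage in the folder referenced by the --job-dir parameter. The search space for the tuning job is defined in hptuning_config.yaml, which is passed with --config.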

In [13]:
now = !date +"%Y%m%d_%H%M%S"
%env JOB_NAME=chicago_taxi_keras_job_$now.s

!gcloud ai-platform jobs submit training ${JOB_NAME} \
    --config hptuning_config.yaml \
    --package-path trainer/ \
    --module-name trainer.task \
    --region $REGION \
    --python-version 3.7 \
    --runtime-version 2.5 \
    --job-dir gs://${BUCKET_NAME}/keras-job-dir-${JOB_NAME} \
    -- \
    --project-id $PROJECT_ID \
    --bucket-name ${BUCKET_NAME} \
    --create-data False \
    --test-files gs://${BUCKET_NAME}/data/full_test_results.csv \
    --train-files gs://${BUCKET_NAME}/data/full_train_results.csv \
    --eval-files gs://${BUCKET_NAME}/data/full_val_results.csv \
    --train-steps 1 \
    --num-epochs 1

# Stream logs so that training is done before subsequent cells are run.
# Remove '> /dev/null' to see step-by-step output of the model build steps.
!gcloud ai-platform jobs stream-logs ${JOB_NAME} > /dev/null

# The job should finish with state "SUCCEEDED".
!gcloud ai-platform jobs describe ${JOB_NAME} --format="value(state)"

env: JOB_NAME=chicago_taxi_keras_job_20220526_170300
Job [chicago_taxi_keras_job_20220526_170300] submitted successfully.
Your job is still active. You may view the status of your job with the command

  $ gcloud ai-platform jobs describe chicago_taxi_keras_job_20220526_170300

or continue streaming the logs with the command

  $ gcloud ai-platform jobs stream-logs chicago_taxi_keras_job_20220526_170300
jobId: chicago_taxi_keras_job_20220526_170300
state: QUEUED
SUCCEEDED
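`hptuning_config.yaml` is not included in this section. For orientation, the sketch below builds a config of the general shape AI Platform expects (a `trainingInput.hyperparameters` block with a goal, a metric tag, trial limits and parameter specs) and prints it as JSON, which `--config` also accepts. Every value is hypothetical: the metric tag `val_loss`, the trial counts and the ranges are placeholders, and the parameter names are assumed to match the trainer's flags without the leading dashes, since AI Platform passes each trial's values to the trainer as command-line arguments.

```python
import json


def build_hptuning_config():
    """Return a hypothetical AI Platform hyperparameter-tuning job config."""
    return {
        "trainingInput": {
            "hyperparameters": {
                "goal": "MINIMIZE",
                "hyperparameterMetricTag": "val_loss",  # placeholder metric name
                "maxTrials": 20,
                "maxParallelTrials": 4,
                "params": [
                    {"parameterName": "learning-rate", "type": "DOUBLE",
                     "minValue": 0.0001, "maxValue": 0.01,
                     "scaleType": "UNIT_LOG_SCALE"},
                    {"parameterName": "dropout-rate", "type": "DOUBLE",
                     "minValue": 0.1, "maxValue": 0.5,
                     "scaleType": "UNIT_LINEAR_SCALE"},
                    {"parameterName": "num-deep-layers", "type": "INTEGER",
                     "minValue": 1, "maxValue": 4,
                     "scaleType": "UNIT_LINEAR_SCALE"},
                    {"parameterName": "train-batch-size", "type": "INTEGER",
                     "minValue": 32, "maxValue": 256,
                     "scaleType": "UNIT_LINEAR_SCALE"},
                ],
            }
        }
    }


print(json.dumps(build_hptuning_config(), indent=2))
```

The printed JSON could be saved to a file and passed with `--config` in place of the YAML file.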


<b>Complete training on AI Platform</b>

Now that the hyperparameters have been tuned, run a deeper training job with the optimal values in place. For a full training run, also increase --train-steps and --num-epochs; the cell below leaves both at 1.

In [23]:
# --create-data True regenerates the data and scaler; set it to False if an earlier run has already created them.
now = !date +"%Y%m%d_%H%M%S"
%env JOB_NAME=chicago_taxi_keras_job_$now.s

!gcloud ai-platform jobs submit training $JOB_NAME \
  --package-path trainer/ \
  --module-name trainer.task \
  --region $REGION \
  --python-version 3.7 \
  --runtime-version 2.5 \
  --job-dir gs://${BUCKET_NAME}/keras-job-dir-${JOB_NAME} \
  -- \
  --project-id $PROJECT_ID \
  --bucket-name ${BUCKET_NAME} \
  --create-data True \
  --test-files gs://${BUCKET_NAME}/data/full_test_results.csv \
  --train-files gs://${BUCKET_NAME}/data/full_train_results.csv \
  --eval-files gs://${BUCKET_NAME}/data/full_val_results.csv \
  --num-deep-layers 2 \
  --first-deep-layer-size 5 \
  --first-wide-layer-size 30 \
  --learning-rate 0.003 \
  --wide-scale-factor 0.094 \
  --train-batch-size 132 \
  --dropout-rate 0.4 \
  --train-steps 1 \
  --num-epochs 1
                
# Stream logs so that training is done before subsequent cells are run.
# Remove '> /dev/null' to see step-by-step output of the model build steps.
!gcloud ai-platform jobs stream-logs ${JOB_NAME} > /dev/null

# The job should finish with state "SUCCEEDED".
!gcloud ai-platform jobs describe ${JOB_NAME} --format="value(state)"

env: JOB_NAME=chicago_taxi_keras_job_20220527_103915
Job [chicago_taxi_keras_job_20220527_103915] submitted successfully.
Your job is still active. You may view the status of your job with the command

  $ gcloud ai-platform jobs describe chicago_taxi_keras_job_20220527_103915

or continue streaming the logs with the command

  $ gcloud ai-platform jobs stream-logs chicago_taxi_keras_job_20220527_103915
jobId: chicago_taxi_keras_job_20220527_103915
state: QUEUED
SUCCEEDED
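Copying tuned values into a long command by hand is error-prone. The values themselves should be listed under `trainingOutput.trials` in the output of `gcloud ai-platform jobs describe` for the tuning job. A small sketch (the helper `to_cli_args` is hypothetical) turns a dict of the tuned values from the cell above into the user-argument list that follows `--`. That list could then be appended to the `gcloud ai-platform jobs submit training` arguments and run with `subprocess.run`.

```python
def to_cli_args(params):
    """Convert {'learning-rate': 0.003, ...} into ['--learning-rate', '0.003', ...]."""
    args = []
    for name, value in params.items():
        args.extend([f"--{name}", str(value)])
    return args


# Tuned values copied from the training cell above.
tuned = {
    "num-deep-layers": 2,
    "first-deep-layer-size": 5,
    "first-wide-layer-size": 30,
    "learning-rate": 0.003,
    "wide-scale-factor": 0.094,
    "train-batch-size": 132,
    "dropout-rate": 0.4,
}
print(" ".join(to_cli_args(tuned)))
```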


<b>Host the trained model on AI Platform</b>

Because the model takes a list of NumPy arrays rather than a single NumPy array as input for inference, it needs a custom prediction routine.

First, run the setup script to create a source distribution tarball.

In [24]:
!python setup.py sdist --formats=gztar

running sdist
running egg_info
writing trainer.egg-info/PKG-INFO
writing dependency_links to trainer.egg-info/dependency_links.txt
writing requirements to trainer.egg-info/requires.txt
writing top-level names to trainer.egg-info/top_level.txt
reading manifest file 'trainer.egg-info/SOURCES.txt'
adding license file 'LICENSE'
writing manifest file 'trainer.egg-info/SOURCES.txt'
running check


creating trainer-0.1
creating trainer-0.1/trainer
creating trainer-0.1/trainer.egg-info
copying files to trainer-0.1...
copying LICENSE -> trainer-0.1
copying README.md -> trainer-0.1
copying predictor.py -> trainer-0.1
copying setup.py -> trainer-0.1
copying trainer/__init__.py -> trainer-0.1/trainer
copying trainer/create_data_func.py -> trainer-0.1/trainer
copying trainer/create_scaler_func.py -> trainer-0.1/trainer
copying trainer/model.py -> trainer-0.1/trainer
copying trainer/task.py -> trainer-0.1/trainer
copying trainer.egg-info/PKG-INFO -> trainer-0.1/trainer.egg-info
copying trainer.egg-i

Copy the tarball to Cloud Storage.

In [8]:
!gsutil cp dist/trainer-0.1.tar.gz gs://${BUCKET_NAME}/staging-dir/trainer-0.1.tar.gz

Copying file://dist/trainer-0.1.tar.gz [Content-Type=application/x-tar]...
/ [1 files][  9.3 KiB/  9.3 KiB]                                                
Operation completed over 1 objects/9.3 KiB.                                      


Next, create a new model on AI Platform.

In [25]:
now = !date +"%Y%m%d_%H%M%S"
%env MODEL_NAME=chicago_taxi_keras_model_$now.s
!gcloud ai-platform models create $MODEL_NAME --regions $REGION

env: MODEL_NAME=chicago_taxi_keras_model_20220527_112900
Using endpoint [https://ml.googleapis.com/]
Created ai platform model [projects/mwpmltr/models/chicago_taxi_keras_model_20220527_112900].


Next, create a new version from the trained model.

In [26]:
!gcloud beta ai-platform versions create $MODEL_VERSION \
  --model $MODEL_NAME \
  --runtime-version 2.5 \
  --python-version 3.7 \
  --origin gs://${BUCKET_NAME}/keras-job-dir-${JOB_NAME} \
  --package-uris gs://${BUCKET_NAME}/staging-dir/trainer-0.1.tar.gz \
  --prediction-class predictor.MyPredictor \
  --region global \
  --project $PROJECT_ID

Using endpoint [https://ml.googleapis.com/]
Creating version (this might take a few minutes)......done.                    
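The version is created with `--prediction-class predictor.MyPredictor`, and the sdist output shows that `predictor.py` is packaged at the top level of the tarball; its contents are not shown here. AI Platform's custom prediction routine interface expects a class with a `from_path(cls, model_dir)` class method and a `predict(self, instances, **kwargs)` method that returns a JSON-serializable list. The sketch below is hypothetical: it assumes each instance is a list holding one feature vector per model input, and that the SavedModel sits directly in `model_dir`. It also leaves out any conversion of the log-scale prediction back to seconds, since the real predictor's handling of that is not specified.

```python
class MyPredictor(object):
    """Hypothetical custom prediction routine for a multi-input Keras model."""

    def __init__(self, model):
        self._model = model

    def predict(self, instances, **kwargs):
        # Assumption: each instance looks like [input_0_features, input_1_features, ...].
        num_inputs = len(instances[0])
        # Regroup row-wise instances into one batch per model input.
        batches = [[instance[i] for instance in instances] for i in range(num_inputs)]
        outputs = self._model.predict(batches)
        # Return plain floats so the response is JSON-serializable.
        return [float(row[0]) for row in outputs]

    @classmethod
    def from_path(cls, model_dir):
        # Imported lazily: these are only needed when serving on AI Platform.
        import numpy as np
        import tensorflow as tf

        keras_model = tf.keras.models.load_model(model_dir)

        class _KerasAdapter(object):
            def predict(self, batches):
                # Keras expects one array per model input.
                arrays = [np.asarray(batch, dtype=np.float32) for batch in batches]
                return keras_model.predict(arrays)

        return cls(_KerasAdapter())
```

Keeping the model behind a small adapter lets `predict` be tested with a stand-in object, without TensorFlow installed.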


<b>Prepare a sample for inference</b>

The sample is prepared with the same preprocessing methods used for training.

In [33]:
!python create_sample.py \
  --project-id ${PROJECT_ID} \
  --bucket-name ${BUCKET_NAME}

Produced sample with label 1260 seconds.
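`gcloud ai-platform predict --json-instances` reads a newline-delimited JSON file with one instance per line. `create_sample.py` is not shown, so the exact contents of `input_sample.json` are unknown. The sketch below writes instances in that file format, assuming each instance holds one list of already-scaled features per model input; the helper name `write_json_instances` and any feature values you pass are placeholders.

```python
import json


def write_json_instances(instances, path):
    """Write one JSON-encoded instance per line, the format --json-instances reads."""
    with open(path, "w") as f:
        for instance in instances:
            f.write(json.dumps(instance) + "\n")
```

For example, `write_json_instances([[[0.12, 0.5], [0.3]]], "my_sample.json")` writes a single-instance file that can be passed to `--json-instances`.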


<b>Make an inference on a new sample</b>

Send the sample to the model hosted on AI Platform to get a prediction.

In [29]:
!gcloud ai-platform predict \
  --model $MODEL_NAME \
  --version $MODEL_VERSION \
  --json-instances input_sample.json \
  --region global

Using endpoint [https://ml.googleapis.com/]
{
  "error": "Prediction failed: unknown error."
}


<b>Approximate the Mean Absolute Percentage Error (MAPE) for the test set</b>

Note that we used a log transformation on the target variable, so any metrics reported by the model during training relate to predicting the <i>log</i> of the trip duration, not the trip duration itself. To calculate metrics for the trip duration in seconds, we need to make predictions on the test set with the trained model and transform them back to seconds.
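Concretely, if the model predicts z = log(duration), each prediction is mapped back with exp(z) before being compared with the true duration y in seconds: MAPE = (100 / n) * sum(|y_i - exp(z_i)| / |y_i|). The sketch assumes a natural log; if the target was transformed with `log1p` instead, use `math.expm1`. The function name is hypothetical and not necessarily what `calc_mape.py` does.

```python
import math


def mape_from_log_predictions(actual_seconds, predicted_log):
    """MAPE in percent, after undoing a natural-log transform on the predictions."""
    if not actual_seconds or len(actual_seconds) != len(predicted_log):
        raise ValueError("inputs must be non-empty and the same length")
    errors = []
    for y, z in zip(actual_seconds, predicted_log):
        if y == 0:
            raise ValueError("MAPE is undefined when an actual value is 0")
        y_hat = math.exp(z)  # back to seconds; use math.expm1 if log1p was used
        errors.append(abs(y - y_hat) / abs(y))
    return 100.0 * sum(errors) / len(errors)


print(mape_from_log_predictions([100, 200], [math.log(110), math.log(180)]))
```

Both toy predictions are off by 10%, so the printed value is approximately 10.0.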

Ideally we would use AI Platform batch prediction here. However, batch prediction is not currently available for custom prediction routines like the one we've implemented.

As an alternative, we approximate the MAPE from a random sample of rows drawn from the test set.
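Drawing a uniform random sample from a file with 100M+ rows does not require loading it: reservoir sampling (Algorithm R) keeps only k rows in memory during a single pass. `calc_mape.py` may sample differently; this is one way to pick the `--num-samples` rows. The helper name is hypothetical, and it accepts any iterable of rows, such as a `csv.reader` over a local copy of the test file.

```python
import random


def reservoir_sample(rows, k, seed=None):
    """Return k rows chosen uniformly at random from an iterable, in one pass."""
    rng = random.Random(seed)
    reservoir = []
    for i, row in enumerate(rows):
        if i < k:
            reservoir.append(row)
        else:
            # Keep row i with probability k / (i + 1), replacing a random slot.
            j = rng.randint(0, i)
            if j < k:
                reservoir[j] = row
    return reservoir
```

If the input has fewer than k rows, all of them are returned; passing a fixed `seed` makes the sample reproducible.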

In [37]:
!python calc_mape.py \
  --num-samples=10 \
  --model=$MODEL_NAME \
  --version=${MODEL_VERSION} \
  --project-id ${PROJECT_ID} \
  --bucket-name ${BUCKET_NAME} \
  --region global

Using endpoint [https://ml.googleapis.com/]
{'error': 'Prediction failed: unknown error.'}
Traceback (most recent call last):
  File "calc_mape.py", line 87, in <module>
    pred = int(round(eval(output_json['predictions'])))
KeyError: 'predictions'
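Both the `predict` cell and `calc_mape.py` received a response containing `error` instead of `predictions`, and `calc_mape.py` then fails with `KeyError: 'predictions'`. Checking for the error key first surfaces the service's message instead. The sketch below is hypothetical: it uses the Google API Python client's `ml` v1 `projects().predict()` call on the global endpoint (matching `--region global` above) and assumes `google-api-python-client` is installed and credentials are configured. It does not address the underlying prediction failure, whose cause is not visible in this output.

```python
def extract_predictions(response):
    """Return response['predictions'], raising a clear error if the call failed."""
    if "error" in response:
        raise RuntimeError(f"AI Platform returned an error: {response['error']}")
    if "predictions" not in response:
        raise RuntimeError(f"Unexpected response: {response!r}")
    return response["predictions"]


def predict_json(project, model, instances, version=None):
    """Send instances to an AI Platform model (optionally a specific version)."""
    # Imported lazily so extract_predictions works without the client library.
    from googleapiclient import discovery

    service = discovery.build("ml", "v1")
    name = f"projects/{project}/models/{model}"
    if version is not None:
        name += f"/versions/{version}"
    response = service.projects().predict(
        name=name, body={"instances": instances}
    ).execute()
    return extract_predictions(response)
```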
