<b>Define environment variables</b>

These variables are used in the training and deployment steps that follow.  Note that the bucket named by BUCKET_NAME below must exist in the GCP project before training runs.


In [5]:
%env BUCKET_NAME=ml-workshop-black-friday
%env JOB_NAME=black_friday_trial_1

%env TRAINING_PACKAGE_PATH=./trainer/
%env MAIN_TRAINER_MODULE=trainer.rf_trainer
%env REGION=us-central1
%env RUNTIME_VERSION=1.14
%env PYTHON_VERSION=3.5
%env SCALE_TIER=CUSTOM

%env MODEL_NAME=black_friday_mod_trial_1
%env PROJECT_ID=mwe-sanofi-ml-workshop
%env DATASET_ID=black_friday
%env VERSION_NAME=v1
%env FRAMEWORK=SCIKIT_LEARN

env: BUCKET_NAME=ml-workshop-black-friday
env: JOB_NAME=black_friday_trial_1
env: TRAINING_PACKAGE_PATH=./trainer/
env: MAIN_TRAINER_MODULE=trainer.rf_trainer
env: REGION=us-central1
env: RUNTIME_VERSION=1.14
env: PYTHON_VERSION=3.5
env: SCALE_TIER=CUSTOM
env: MODEL_NAME=black_friday_mod_trial_1
env: PROJECT_ID=mwe-sanofi-ml-workshop
env: DATASET_ID=black_friday
env: VERSION_NAME=v1
env: FRAMEWORK=SCIKIT_LEARN
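Since every later cell depends on these variables, it can help to confirm they are all set before continuing. The following is a minimal sketch, not part of the original notebook: the helper name `missing_env_vars` is hypothetical, and the list of names is copied from the `%env` cell above.

```python
import os

# Variable names copied from the %env cell above.
REQUIRED_VARS = [
    "BUCKET_NAME", "JOB_NAME", "TRAINING_PACKAGE_PATH", "MAIN_TRAINER_MODULE",
    "REGION", "RUNTIME_VERSION", "PYTHON_VERSION", "SCALE_TIER",
    "MODEL_NAME", "PROJECT_ID", "DATASET_ID", "VERSION_NAME", "FRAMEWORK",
]


def missing_env_vars(names=REQUIRED_VARS, environ=None):
    """Return the names that are unset or empty (hypothetical helper)."""
    env = os.environ if environ is None else environ
    return [name for name in names if not env.get(name)]
```

Calling `missing_env_vars()` in a notebook cell returns an empty list when everything is defined, and otherwise the names that still need a `%env` line.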


In [10]:
# Training and testing files must be in a Cloud Storage bucket before training runs.
# If the bucket already exists, `gsutil mb` reports a 409 error and the copies still proceed.
!gsutil mb gs://${BUCKET_NAME}
!gsutil cp train.csv  gs://${BUCKET_NAME}
!gsutil cp test.csv  gs://${BUCKET_NAME}

Creating gs://ml-workshop-black-friday/...
ServiceException: 409 Bucket ml-workshop-black-friday already exists.
Copying file://train.csv [Content-Type=text/csv]...
- [1 files][ 24.3 MiB/ 24.3 MiB]                                                
Operation completed over 1 objects/24.3 MiB.                                     
Copying file://test.csv [Content-Type=text/csv]...
- [1 files][ 11.0 MiB/ 11.0 MiB]                                                
Operation completed over 1 objects/11.0 MiB.                                     
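The 409 error above is harmless, but it can be avoided by creating the bucket only when it is missing. Below is a rough sketch of that idea, assuming `gsutil` is on the PATH; the function name `ensure_bucket` and its injectable `run` parameter are hypothetical and exist only to make the logic easy to dry-run.

```python
import subprocess


def ensure_bucket(bucket_name, run=subprocess.run):
    """Create gs://<bucket_name> only if `gsutil ls -b` cannot find it.

    Returns True if the bucket was created, False if it already existed.
    `run` defaults to subprocess.run; pass a stand-in to dry-run the logic.
    """
    url = "gs://{}".format(bucket_name)
    # `gsutil ls -b` exits non-zero when the bucket cannot be listed.
    probe = run(["gsutil", "ls", "-b", url],
                stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    if probe.returncode == 0:
        return False
    run(["gsutil", "mb", url], check=True)
    return True
```

One caveat: the probe also fails when the bucket exists but is not accessible to the current account, in which case the `mb` call would still raise.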


<b>Perform training locally with default parameters</b>

In [12]:
# Grant this project's service account the "Editor" role in IAM so that all users of this
# environment have BigQuery access. This is needed when --create-data=True is set below.
# If the table already exists, set --create-data=False.
!gcloud ai-platform local train \
  --package-path $TRAINING_PACKAGE_PATH \
  --module-name $MAIN_TRAINER_MODULE \
  -- \
  --create-data=True

1it [00:15, 15.24s/it]
Downloading: 100%|█████████████████| 627921/627921 [00:41<00:00, 15111.11rows/s]
Downloading: 100%|█████████████████| 155746/155746 [00:10<00:00, 14316.64rows/s]
Copying file://model.pkl [Content-Type=application/octet-stream]...
/ [1 files][  9.1 MiB/  9.1 MiB]                                                
Operation completed over 1 objects/9.1 MiB.                                      
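In the command above, everything after the bare `--` is forwarded to the trainer module as its own arguments. The sketch below shows one way a trainer could parse such flags with `argparse`. It is an assumption about how `trainer.rf_trainer` might be written, not its actual code: the flag names come from this notebook, while the `False` defaults and the `str2bool` helper are hypothetical.

```python
import argparse


def str2bool(value):
    """Parse 'True'/'False' style strings; type=bool would treat 'False' as True."""
    text = str(value).strip().lower()
    if text in ("true", "t", "yes", "1"):
        return True
    if text in ("false", "f", "no", "0"):
        return False
    raise argparse.ArgumentTypeError("expected a boolean, got {!r}".format(value))


def parse_args(argv=None):
    """Parse the user arguments that gcloud forwards after the bare `--`."""
    parser = argparse.ArgumentParser(description="Trainer arguments (sketch)")
    # Defaults below are assumptions, not taken from the real trainer.
    parser.add_argument("--create-data", type=str2bool, default=False)
    parser.add_argument("--hp-tune", type=str2bool, default=False)
    parser.add_argument("--job-id")
    parser.add_argument("--project-id")
    parser.add_argument("--bucket-name")
    parser.add_argument("--dataset-id")
    return parser.parse_args(argv)
```

With this parser, `parse_args(["--create-data=True"]).create_data` evaluates to `True`, and argparse exposes `--hp-tune` as the attribute `hp_tune`.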


<b>Perform training on AI Platform</b>

The training job can also be run on AI Platform. 

Important: At least one training job (either locally or on AI Platform) must complete with the --create-data and --hp-tune flags set to True for the remaining functionality to work.

Note that this job uses the CUSTOM scale tier with a larger master machine type (n1-highcpu-16) to give training more compute.

In [13]:
!gcloud ai-platform jobs submit training $JOB_NAME \
  --job-dir gs://${BUCKET_NAME}/rf-job-dir \
  --package-path $TRAINING_PACKAGE_PATH \
  --module-name $MAIN_TRAINER_MODULE \
  --region $REGION \
  --runtime-version=$RUNTIME_VERSION \
  --python-version=$PYTHON_VERSION \
  --scale-tier $SCALE_TIER \
  --master-machine-type n1-highcpu-16 \
  -- \
  --job-id $JOB_NAME \
  --project-id $PROJECT_ID \
  --bucket-name $BUCKET_NAME \
  --dataset-id $DATASET_ID 

Job [black_friday_trial_1] submitted successfully.
Your job is still active. You may view the status of your job with the command

  $ gcloud ai-platform jobs describe black_friday_trial_1

or continue streaming the logs with the command

  $ gcloud ai-platform jobs stream-logs black_friday_trial_1
jobId: black_friday_trial_1
state: QUEUED


In [14]:
# Stream logs so that training is done before subsequent cells are run.
# Remove  '> /dev/null' to see step-by-step output of the model build steps.
!gcloud ai-platform jobs stream-logs $JOB_NAME > /dev/null

In [19]:
!gcloud ai-platform jobs describe $JOB_NAME

createTime: '2020-06-10T21:29:48Z'
endTime: '2020-06-10T21:34:21Z'
errorMessage: |-
  The replica master 0 exited with a non-zero status of 1. 
  Traceback (most recent call last):
    [...]
    File "/usr/local/lib/python3.5/dist-packages/google/resumable_media/requests/download.py", line 153, in consume
      self._process_response(result)
    File "/usr/local/lib/python3.5/dist-packages/google/resumable_media/_download.py", line 171, in _process_response
      response, _ACCEPTABLE_STATUS_CODES, self._get_status_code
    File "/usr/local/lib/python3.5/dist-packages/google/resumable_media/_helpers.py", line 96, in require_status_code
      *status_codes
  google.resumable_media.common.InvalidResponse: ('Request failed with status code', 404, 'Expected one of', <HTTPStatus.OK: 200>, <HTTPStatus.PARTIAL_CONTENT: 206>)

  During handling of the above exception, another exception occurred:

  Traceback (most recent call last):
    File "/usr/lib/python3.5/runpy.py", line 184, in _run_mod
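Because a failed job like the one above leaves no model behind, it is worth checking the job state before moving on to deployment. The sketch below assumes the job description is fetched as JSON with `gcloud ai-platform jobs describe $JOB_NAME --format=json`; the helper `summarize_job` is hypothetical, and the field names `state` and `errorMessage` match those printed by the describe command.

```python
import json

# Job states after which the job will not change any further.
TERMINAL_STATES = {"SUCCEEDED", "FAILED", "CANCELLED"}


def summarize_job(describe_json):
    """Summarize the JSON text printed by `jobs describe --format=json`."""
    job = json.loads(describe_json)
    state = job.get("state", "STATE_UNSPECIFIED")
    return {
        "state": state,
        "done": state in TERMINAL_STATES,
        "ok": state == "SUCCEEDED",
        "error": job.get("errorMessage", ""),
    }
```

In a notebook, the JSON text could be captured from the shell command and passed to `summarize_job`; the deployment cells should only be run when `ok` is true.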

<b>Host the trained model on AI Platform</b>

Because our raw prediction output from the model is a numpy array that needs to be converted into a product category, we'll need to implement a custom prediction module.
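As an illustration, a custom prediction class for AI Platform generally exposes a `predict` method and a `from_path` class method. The sketch below is not the workshop's actual `predictor.py`: it assumes the model is the `model.pkl` file written by training, that the raw predictions are integer category ids, and that the label format is "Product Category N".

```python
import os
import pickle


class MyPredictor(object):
    """Sketch of a custom prediction routine (assumed, not the original code)."""

    def __init__(self, model):
        self._model = model

    def predict(self, instances, **kwargs):
        """Run the model and turn each raw label into a readable category."""
        raw = self._model.predict(instances)
        # Assumption: raw labels are integer category ids.
        return ["Product Category {}".format(int(label)) for label in raw]

    @classmethod
    def from_path(cls, model_dir):
        """Load model.pkl from the directory AI Platform copies the model to."""
        with open(os.path.join(model_dir, "model.pkl"), "rb") as f:
            model = pickle.load(f)
        return cls(model)
```

Keeping the label conversion inside `predict` means callers receive category names directly rather than a numeric array.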

First, execute the setup script to create a distribution tarball:

In [15]:
!python setup.py sdist --formats=gztar

running sdist
running egg_info
creating trainer.egg-info
writing trainer.egg-info/PKG-INFO
writing dependency_links to trainer.egg-info/dependency_links.txt
writing requirements to trainer.egg-info/requires.txt
writing top-level names to trainer.egg-info/top_level.txt
writing manifest file 'trainer.egg-info/SOURCES.txt'
reading manifest file 'trainer.egg-info/SOURCES.txt'
writing manifest file 'trainer.egg-info/SOURCES.txt'

running check


creating trainer-0.1
creating trainer-0.1/trainer
creating trainer-0.1/trainer.egg-info
copying files to trainer-0.1...
copying predictor.py -> trainer-0.1
copying setup.py -> trainer-0.1
copying trainer/__init__.py -> trainer-0.1/trainer
copying trainer/create_data_func.py -> trainer-0.1/trainer
copying trainer/hp_tuning.py -> trainer-0.1/trainer
copying trainer/rf_trainer.py -> trainer-0.1/trainer
copying trainer.egg-info/PKG-INFO -> trainer-0.1/trainer.egg-info
copying trainer.egg-info/SOURCES.txt -> trainer-0.1/trainer.egg-info
copying trainer.e

Next, copy the tarball over to Cloud Storage:

In [16]:
!gsutil cp dist/trainer-0.1.tar.gz gs://${BUCKET_NAME}/staging-dir/trainer-0.1.tar.gz

Copying file://dist/trainer-0.1.tar.gz [Content-Type=application/x-tar]...
/ [1 files][  5.2 KiB/  5.2 KiB]                                                
Operation completed over 1 objects/5.2 KiB.                                      


Create a new model on AI Platform.  Note that this only needs to be done once; later iterations are saved as "versions" of the model.

In [17]:
!gcloud ai-platform models create $MODEL_NAME --regions $REGION

Created ml engine model [projects/mwe-sanofi-ml-workshop/models/black_friday_mod_trial_1].


Next, we create a new version using our trained model:

In [18]:
!gcloud beta ai-platform versions create $VERSION_NAME \
  --model $MODEL_NAME \
  --origin gs://${BUCKET_NAME}/black_friday_${JOB_NAME}/ \
  --runtime-version=$RUNTIME_VERSION \
  --python-version=$PYTHON_VERSION \
  --package-uris gs://${BUCKET_NAME}/staging-dir/trainer-0.1.tar.gz \
  --prediction-class predictor.MyPredictor

ERROR: (gcloud.beta.ai-platform.versions.create) FAILED_PRECONDITION: Field: version.deployment_uri Error: The provided URI for model files doesn't contain any objects.
- '@type': type.googleapis.com/google.rpc.BadRequest
  fieldViolations:
  - description: The provided URI for model files doesn't contain any objects.
    field: version.deployment_uri
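The error above means nothing was found under the `--origin` URI, which is what one would expect when the training job did not finish successfully. A quick pre-check can catch this before the version is created. The following is a hedged sketch assuming `gsutil` is on the PATH; the name `list_objects` and its injectable `run` parameter are hypothetical.

```python
import subprocess


def list_objects(uri, run=subprocess.run):
    """Return the object URLs that `gsutil ls` finds under `uri`.

    Returns an empty list when gsutil exits non-zero (for example, when
    the URL matches no objects). `run` can be replaced for dry runs.
    """
    result = run(["gsutil", "ls", uri],
                 stdout=subprocess.PIPE, stderr=subprocess.PIPE,
                 universal_newlines=True)
    if result.returncode != 0:
        return []
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]
```

If the list for the origin URI is empty, the model file has not been written yet and the version creation will fail. Note that this sketch treats every gsutil failure, including authentication problems, as "no objects".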


<b>Prepare a sample for inference</b>

In [9]:
!python generate_sample.py
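The contents of `generate_sample.py` are not shown here. The `--json-instances` flag used in the next step reads a newline-delimited JSON file with one instance per line, so a script of this kind could end with something like the sketch below. The helper name `write_json_instances` is hypothetical, and the instance values must come from the real feature pipeline.

```python
import json


def write_json_instances(path, instances):
    """Write one JSON-encoded instance per line; return the number written."""
    count = 0
    with open(path, "w") as f:
        for instance in instances:
            f.write(json.dumps(instance) + "\n")
            count += 1
    return count
```

For example, `write_json_instances("input.json", [features])` would write a single-instance file, where `features` stands for one row in the format the model expects.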

<b>Make an inference on a new sample</b>

Pass the sample object to the model hosted in AI Platform to return a prediction.

In [10]:
# Make an online prediction.
!gcloud ai-platform predict --model $MODEL_NAME \
  --version $VERSION_NAME --json-instances input.json

{
  "predictions": "Product Category 1"
}
