<b>Define environment variables</b>

These variables are used in the training and deployment steps below. Note that the bucket named by BUCKET_NAME must exist in the GCP project; the next cell attempts to create it.


In [1]:
%env BUCKET_NAME=ml-workshop-black-friday
%env JOB_NAME=black_friday_trial_1

%env TRAINING_PACKAGE_PATH=./trainer/
%env MAIN_TRAINER_MODULE=trainer.rf_trainer
%env REGION=us-central1
%env RUNTIME_VERSION=1.14
%env PYTHON_VERSION=3.5
%env SCALE_TIER=CUSTOM

%env MODEL_NAME=black_friday_mod_trial_1
%env PROJECT_ID=mwe-sanofi-ml-workshop
%env DATASET_ID=black_friday
%env VERSION_NAME=v1
%env FRAMEWORK=SCIKIT_LEARN

env: BUCKET_NAME=ml-workshop-black-friday
env: JOB_NAME=black_friday_trial_1
env: TRAINING_PACKAGE_PATH=./trainer/
env: MAIN_TRAINER_MODULE=trainer.rf_trainer
env: REGION=us-central1
env: RUNTIME_VERSION=1.14
env: PYTHON_VERSION=3.5
env: SCALE_TIER=CUSTOM
env: MODEL_NAME=black_friday_mod_trial_1
env: PROJECT_ID=mwe-sanofi-ml-workshop
env: DATASET_ID=black_friday
env: VERSION_NAME=v1
env: FRAMEWORK=SCIKIT_LEARN
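
Because later cells depend on these variables, it can help to confirm they are all set before continuing, for example after a kernel restart. The following is a minimal sketch; the helper name `missing_env_vars` is hypothetical and not part of this notebook, and the variable list is simply copied from the cell above.

```python
import os

# Names copied from the %env cell above.
REQUIRED_VARS = [
    "BUCKET_NAME", "JOB_NAME", "TRAINING_PACKAGE_PATH", "MAIN_TRAINER_MODULE",
    "REGION", "RUNTIME_VERSION", "PYTHON_VERSION", "SCALE_TIER",
    "MODEL_NAME", "PROJECT_ID", "DATASET_ID", "VERSION_NAME", "FRAMEWORK",
]


def missing_env_vars(required=REQUIRED_VARS, environ=None):
    """Return the names in `required` that are unset or empty in `environ`."""
    if environ is None:
        environ = os.environ
    return [name for name in required if not environ.get(name)]


# Hypothetical helper: report anything that still needs to be defined.
missing = missing_env_vars()
if missing:
    print("Missing environment variables: " + ", ".join(missing))
else:
    print("All environment variables are set.")
```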


In [2]:
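# Training and test files must be in a Cloud Storage bucket before training runs.
# If the bucket already exists, gsutil mb reports a 409 error that is safe to ignore.
!gsutil mb gs://${BUCKET_NAME}
!gsutil cp train.csv  gs://${BUCKET_NAME}
!gsutil cp test.csv  gs://${BUCKET_NAME}

# Remove output from previous runs, if any. An error here only means there was nothing to remove.
!rm input.json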

Creating gs://ml-workshop-black-friday/...
ServiceException: 409 Bucket ml-workshop-black-friday already exists.
Copying file://train.csv [Content-Type=text/csv]...
- [1 files][ 24.3 MiB/ 24.3 MiB]                                                
Operation completed over 1 objects/24.3 MiB.                                     
Copying file://test.csv [Content-Type=text/csv]...
/ [1 files][ 11.0 MiB/ 11.0 MiB]                                                
Operation completed over 1 objects/11.0 MiB.                                     
rm: cannot remove 'input.json': No such file or directory


<b>Perform training locally with default parameters</b>

In [3]:
# For all users of this environment to have BigQuery access, give this project's service account
# the "Editor" role in IAM. The first time this cell is run, set create-data and hp-tune to True.
# This creates the input files and makes the results of hyperparameter tuning available. You can
# set both flags to False for subsequent runs.
!gcloud ai-platform local train \
  --package-path $TRAINING_PACKAGE_PATH \
  --module-name $MAIN_TRAINER_MODULE \
  -- \
  --project-id $PROJECT_ID \
  --bucket-name $BUCKET_NAME \
  --create-data=True \
  --hp-tune=True \
  --num-hp-iterations=3

1it [00:28, 28.95s/it]
Downloading: 100%|██████████████████████| 1000/1000 [00:00<00:00, 4783.19rows/s]
Downloading: 100%|██████████████████████| 1000/1000 [00:00<00:00, 3969.09rows/s]
Copying file://model.pkl [Content-Type=application/octet-stream]...
/ [1 files][  1.7 MiB/  1.7 MiB]                                                
Operation completed over 1 objects/1.7 MiB.                                      
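
The flags after the bare `--` are passed through to the trainer module, which must parse values such as `--create-data=True` itself. How `trainer.rf_trainer` actually does this is not shown here. The sketch below is one possible approach under that assumption; `str_to_bool`, `build_parser`, and the default values are hypothetical. The explicit converter matters because `argparse` with `type=bool` would treat the string "False" as true.

```python
import argparse


def str_to_bool(value):
    """Parse 'True'/'False' style strings into real booleans."""
    lowered = value.strip().lower()
    if lowered in ("true", "t", "yes", "1"):
        return True
    if lowered in ("false", "f", "no", "0"):
        return False
    raise argparse.ArgumentTypeError("expected a boolean, got {!r}".format(value))


def build_parser():
    """Hypothetical parser mirroring the flags used in the training cells."""
    parser = argparse.ArgumentParser(description="Sketch of trainer arguments")
    parser.add_argument("--project-id", required=True)
    parser.add_argument("--bucket-name", required=True)
    parser.add_argument("--dataset-id", default=None)
    parser.add_argument("--job-id", default=None)
    # Defaults below are assumptions, not taken from the real trainer.
    parser.add_argument("--create-data", type=str_to_bool, default=False)
    parser.add_argument("--hp-tune", type=str_to_bool, default=False)
    parser.add_argument("--num-hp-iterations", type=int, default=3)
    return parser


# Parse an explicit argument list instead of sys.argv for illustration.
args = build_parser().parse_args([
    "--project-id", "my-project",
    "--bucket-name", "my-bucket",
    "--create-data=False",
    "--hp-tune=True",
])
print(args.create_data, args.hp_tune, args.num_hp_iterations)
# False True 3
```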


<b>Perform training on AI Platform</b>

The training job can also be run on AI Platform. 

Important: At least one training job (either local or on AI Platform) must complete with the --create-data and --hp-tune flags set to True for the remaining steps to work.

Note that this job uses the CUSTOM scale tier with an n1-highcpu-16 master machine to give training more compute.

In [4]:
# The first time this cell is run, set create-data and hp-tune to True. This creates the
# input files and makes the results of hyperparameter tuning available. You can set both
# flags to False for subsequent runs.
now = !date +"%Y%m%d_%H%M%S"
%env JOB_NAME=black_friday_job_$now.s

!gcloud ai-platform jobs submit training $JOB_NAME \
  --job-dir gs://${BUCKET_NAME}/rf-job-dir \
  --package-path $TRAINING_PACKAGE_PATH \
  --module-name $MAIN_TRAINER_MODULE \
  --region $REGION \
  --runtime-version=$RUNTIME_VERSION \
  --python-version=$PYTHON_VERSION \
  --scale-tier $SCALE_TIER \
  --master-machine-type n1-highcpu-16 \
  -- \
  --job-id $JOB_NAME \
  --project-id $PROJECT_ID \
  --bucket-name $BUCKET_NAME \
  --dataset-id $DATASET_ID \
  --create-data=True \
  --hp-tune=True \
  --num-hp-iterations=2
    
# Stream logs so that training is done before subsequent cells are run.
# Remove  '> /dev/null' to see step-by-step output of the model build steps.
!gcloud ai-platform jobs stream-logs $JOB_NAME > /dev/null

# The job should finish with state "SUCCEEDED".
!gcloud ai-platform jobs describe $JOB_NAME --format="value(state)"

env: JOB_NAME=black_friday_job_20200615_074523
Job [black_friday_job_20200615_074523] submitted successfully.
Your job is still active. You may view the status of your job with the command

  $ gcloud ai-platform jobs describe black_friday_job_20200615_074523

or continue streaming the logs with the command

  $ gcloud ai-platform jobs stream-logs black_friday_job_20200615_074523
jobId: black_friday_job_20200615_074523
state: QUEUED
SUCCEEDED
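
The cell above builds a unique job name by appending a shell timestamp. The same idea can be sketched in Python, which may be convenient when submitting jobs from a script. `make_job_name` is a hypothetical helper, and the validation pattern reflects an assumption that job names may contain only letters, digits, and underscores and must start with a letter.

```python
import re
from datetime import datetime

# Assumed naming rule: letters, digits and underscores only, starting with a letter.
_JOB_ID_PATTERN = re.compile(r"^[A-Za-z][A-Za-z0-9_]*$")


def make_job_name(prefix="black_friday_job", now=None):
    """Return '<prefix>_<YYYYmmdd_HHMMSS>', mirroring the `date` call in the cell above."""
    if now is None:
        now = datetime.now()
    job_name = "{}_{}".format(prefix, now.strftime("%Y%m%d_%H%M%S"))
    if not _JOB_ID_PATTERN.match(job_name):
        raise ValueError("invalid job name: {!r}".format(job_name))
    return job_name


print(make_job_name(now=datetime(2020, 6, 15, 7, 45, 23)))
# black_friday_job_20200615_074523
```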


<b>Host the trained model on AI Platform</b>

Because the model's raw prediction output is a NumPy array that must be converted into a product category, we need to implement a custom prediction module.

First, execute the setup script to create a distribution tarball.

In [5]:
!python setup.py sdist --formats=gztar

running sdist
running egg_info
writing trainer.egg-info/PKG-INFO
writing dependency_links to trainer.egg-info/dependency_links.txt
writing requirements to trainer.egg-info/requires.txt
writing top-level names to trainer.egg-info/top_level.txt
reading manifest file 'trainer.egg-info/SOURCES.txt'
writing manifest file 'trainer.egg-info/SOURCES.txt'

running check


creating trainer-0.1
creating trainer-0.1/trainer
creating trainer-0.1/trainer.egg-info
copying files to trainer-0.1...
copying predictor.py -> trainer-0.1
copying setup.py -> trainer-0.1
copying trainer/__init__.py -> trainer-0.1/trainer
copying trainer/create_data_func.py -> trainer-0.1/trainer
copying trainer/hp_tuning.py -> trainer-0.1/trainer
copying trainer/rf_trainer.py -> trainer-0.1/trainer
copying trainer.egg-info/PKG-INFO -> trainer-0.1/trainer.egg-info
copying trainer.egg-info/SOURCES.txt -> trainer-0.1/trainer.egg-info
copying trainer.egg-info/dependency_links.txt -> trainer-0.1/trainer.egg-info
copying trainer.eg

Next, copy the tarball to Cloud Storage.

In [6]:
!gsutil cp dist/trainer-0.1.tar.gz gs://${BUCKET_NAME}/staging-dir/trainer-0.1.tar.gz

Copying file://dist/trainer-0.1.tar.gz [Content-Type=application/x-tar]...
/ [1 files][  5.2 KiB/  5.2 KiB]                                                
Operation completed over 1 objects/5.2 KiB.                                      


Create a new model on AI Platform.  Note that this needs to be done just once, and future iterations are saved as "versions" of the model.

In [7]:
%env MODEL_NAME=black_friday_model_$now.s
!gcloud ai-platform models create $MODEL_NAME --regions $REGION

env: MODEL_NAME=black_friday_model_20200615_074523
Created ml engine model [projects/mwe-sanofi-ml-workshop/models/black_friday_model_20200615_074523].


Next, create a new version from the trained model.

In [8]:
!gcloud beta ai-platform versions create $VERSION_NAME \
  --model $MODEL_NAME \
  --origin gs://${BUCKET_NAME}/black_friday_${JOB_NAME}/ \
  --runtime-version=$RUNTIME_VERSION \
  --python-version=$PYTHON_VERSION \
  --package-uris gs://${BUCKET_NAME}/staging-dir/trainer-0.1.tar.gz \
  --prediction-class predictor.MyPredictor

Creating version (this might take a few minutes)......done.                    


<b>Prepare a sample for inference</b>

In [9]:
!python generate_sample.py \
  --project-id $PROJECT_ID \
  --bucket-name ${BUCKET_NAME}
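
The contents of generate_sample.py are not shown here. The prediction step later reads input.json through `--json-instances`, which expects newline-delimited JSON with one instance per line. The sketch below shows one way such a file could be written; `write_json_instances` is a hypothetical helper and the feature values are placeholders, not real Black Friday features.

```python
import json
import os
import tempfile


def write_json_instances(instances, path="input.json"):
    """Write one JSON-encoded instance per line (newline-delimited JSON)."""
    with open(path, "w") as f:
        for instance in instances:
            f.write(json.dumps(instance) + "\n")


# Placeholder feature vector; the real sample comes from generate_sample.py.
sample_path = os.path.join(tempfile.mkdtemp(), "input.json")
write_json_instances([[1, 0, 3, 25.0]], sample_path)

with open(sample_path) as f:
    print(f.read())
```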

<b>Make an inference on a new sample</b>

Pass the sample object to the model hosted in AI Platform to return a prediction.

In [10]:
# Make an online prediction.
!gcloud ai-platform predict \
  --model $MODEL_NAME \
  --version $VERSION_NAME \
  --json-instances input.json

{
  "predictions": "Product Category 1"
}
