<b>Define environment variables</b>

These variables are used in the training steps that follow. Note that the bucket named in BUCKET_NAME below must exist in the GCP project.


In [None]:
%env BUCKET_NAME=<ADD DETAILS HERE>
%env JOB_NAME=<ADD DETAILS HERE>

%env TRAINING_PACKAGE_PATH=./trainer/
%env MAIN_TRAINER_MODULE=trainer.rf_trainer
%env REGION=<ADD DETAILS HERE>
%env RUNTIME_VERSION=<ADD DETAILS HERE>
%env PYTHON_VERSION=<ADD DETAILS HERE>
%env SCALE_TIER=<ADD DETAILS HERE>

%env MODEL_NAME=<ADD DETAILS HERE>
%env PROJECT_ID=<ADD DETAILS HERE>
%env DATASET_ID=<ADD DETAILS HERE>
%env VERSION_NAME=<ADD DETAILS HERE>
%env FRAMEWORK=<ADD DETAILS HERE>
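
Before moving on, it can help to confirm that none of the variables above still holds the `<ADD DETAILS HERE>` placeholder. The following is a minimal sketch and not part of the original lab; the helper name `unset_variables` is hypothetical.

```python
import os

# Names of the variables defined in the cell above.
REQUIRED = [
    "BUCKET_NAME", "JOB_NAME", "TRAINING_PACKAGE_PATH", "MAIN_TRAINER_MODULE",
    "REGION", "RUNTIME_VERSION", "PYTHON_VERSION", "SCALE_TIER",
    "MODEL_NAME", "PROJECT_ID", "DATASET_ID", "VERSION_NAME", "FRAMEWORK",
]
PLACEHOLDER = "<ADD DETAILS HERE>"


def unset_variables(env=None, required=REQUIRED):
    """Return the names that are missing, empty, or still contain the placeholder."""
    env = os.environ if env is None else env
    return [
        name for name in required
        if not env.get(name) or PLACEHOLDER in env[name]
    ]
```

Calling `unset_variables()` with no arguments inspects the current environment and returns the names that still need attention; an empty list means every variable has been filled in.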

In [None]:
# Training and testing files must be in a cloud storage bucket before training runs.
!gsutil mb gs://${BUCKET_NAME}
!gsutil cp train.csv  gs://${BUCKET_NAME}
!gsutil cp test.csv  gs://${BUCKET_NAME}
    
# Remove output from previous runs, if any.
!rm -f input.json

<b>Perform training locally with default parameters</b>

[Using AI Platform for Local Training](https://cloud.google.com/sdk/gcloud/reference/ai-platform/local/train)


In [None]:
# Give the service account for this project the "Editor" role in IAM so that all users of this
# environment have BigQuery access. The first time this cell is run, set create-data and hp-tune
# to True. This creates the input files and makes the results of hyperparameter tuning available.
# You can set them to False for subsequent runs.

# Fill in the missing details to train the model locally.

!gcloud ai-platform local train \
  --<ADD DETAILS HERE> \
  --<ADD DETAILS HERE> \
  -- \
  --<ADD DETAILS HERE> \
  --<ADD DETAILS HERE> \
  --create-data=True \
  --hp-tune=True \
  --num-hp-iterations=3
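
Everything after the bare `--` is passed through to the trainer module rather than interpreted by `gcloud`. The trainer's own argument parsing is not shown in this notebook; the sketch below assumes it uses `argparse` and shows one way flags such as `--create-data=True` could be read as booleans. The function names and defaults are illustrative only.

```python
import argparse


def str_to_bool(value):
    """Interpret 'True'/'False' (any case) passed on the command line."""
    lowered = value.strip().lower()
    if lowered in ("true", "1", "yes"):
        return True
    if lowered in ("false", "0", "no"):
        return False
    raise argparse.ArgumentTypeError("expected True or False, got %r" % value)


def build_parser():
    """Hypothetical parser for the user flags passed after the bare `--`."""
    parser = argparse.ArgumentParser(description="Hypothetical trainer flags")
    parser.add_argument("--create-data", type=str_to_bool, default=False)
    parser.add_argument("--hp-tune", type=str_to_bool, default=False)
    parser.add_argument("--num-hp-iterations", type=int, default=3)
    return parser


args = build_parser().parse_args(
    ["--create-data=True", "--hp-tune=True", "--num-hp-iterations=3"]
)
print(args.create_data, args.hp_tune, args.num_hp_iterations)
```

A plain `type=bool` would not work for these flags, because `bool("False")` evaluates to `True` in Python; an explicit converter avoids that trap.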

<b>Perform training on AI Platform</b>

The training job can also be run on AI Platform. 

Important: At least one training job (run either locally or on AI Platform) must complete with the --create-data and --hp-tune flags set to True for the remaining functionality to work.

Note that we've increased the compute allocated to the master machine for this job (`n1-highcpu-16`) to give it more processing power.

In [None]:
# The first time this cell is run, set create-data and hp-tune to True. This creates the
# input files and makes the results of hyperparameter tuning available. You can set them
# to False for subsequent runs.
now = !date +"%Y%m%d_%H%M%S"
%env JOB_NAME=black_friday_job_$now.s

!gcloud ai-platform jobs submit training $JOB_NAME \
  --job-dir gs://${BUCKET_NAME}/rf-job-dir \
  --<ADD DETAILS HERE> \
  --<ADD DETAILS HERE> \
  --<ADD DETAILS HERE> \
  --<ADD DETAILS HERE> \
  --<ADD DETAILS HERE> \
  --<ADD DETAILS HERE> \
  --master-machine-type n1-highcpu-16 \
  -- \
  --<ADD DETAILS HERE> \
  --<ADD DETAILS HERE> \
  --<ADD DETAILS HERE> \
  --dataset-id $DATASET_ID \
  --<ADD DETAILS HERE> \
  --<ADD DETAILS HERE> \
  --<ADD DETAILS HERE>
    
# Stream logs. This blocks until the job finishes, so training is done before subsequent cells run.
# Remove '> /dev/null' to see step-by-step output of the model build steps.
!gcloud ai-platform jobs stream-logs $JOB_NAME > /dev/null

# The job should finish with state "SUCCEEDED".
!gcloud ai-platform jobs describe $JOB_NAME --format="value(state)"
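
The job name at the top of the cell above is built with the shell `date` command. An equivalent in pure Python might look like the sketch below. The helper name `make_job_name` is hypothetical, and the format string mirrors the `date` call. As far as the AI Platform documentation describes it, a job name should start with a letter and contain only letters, numbers, and underscores, so the sketch checks that as well.

```python
import re
from datetime import datetime


def make_job_name(prefix="black_friday_job", now=None):
    """Build a timestamped job name such as black_friday_job_20200131_235959."""
    now = datetime.now() if now is None else now
    name = "{}_{}".format(prefix, now.strftime("%Y%m%d_%H%M%S"))
    # Assumed naming rule: starts with a letter; letters, digits, underscores only.
    if not re.fullmatch(r"[A-Za-z][A-Za-z0-9_]*", name):
        raise ValueError("invalid job name: " + name)
    return name


print(make_job_name())
```

Because the timestamp has one-second resolution, two submissions within the same second would produce the same name and the second submission would be rejected as a duplicate.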

<b>Host the trained model on AI Platform</b>

Because the raw prediction output from the model is a NumPy array that must be converted into a product category, we need to implement a custom prediction module.
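
The project's own `predictor.py` is not shown in this notebook. As a rough sketch of what a custom prediction class can look like, AI Platform expects a class with a `predict` method and a `from_path` class method. The file names `model.pkl` and `categories.pkl`, and the assumption that the model outputs integer category indices, are illustrative guesses rather than details taken from the project.

```python
import os
import pickle


class MyPredictor(object):
    """Sketch of a custom prediction routine that maps raw outputs to labels."""

    def __init__(self, model, categories):
        self._model = model
        # Hypothetical lookup: position in the list -> product category label.
        self._categories = categories

    def predict(self, instances, **kwargs):
        """Called by AI Platform with the list of instances from the request."""
        raw = self._model.predict(instances)
        # Convert each raw output into a plain, JSON-serializable label.
        return [self._categories[int(index)] for index in raw]

    @classmethod
    def from_path(cls, model_dir):
        """Called once at load time with the directory holding the model files."""
        # The file names below are assumptions, not taken from the project.
        with open(os.path.join(model_dir, "model.pkl"), "rb") as f:
            model = pickle.load(f)
        with open(os.path.join(model_dir, "categories.pkl"), "rb") as f:
            categories = pickle.load(f)
        return cls(model, categories)
```

Returning plain Python values rather than NumPy types matters here, because the prediction response has to be serialized to JSON.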

First, execute the setup script to create a distribution tarball.

In [None]:
!python setup.py sdist --formats=gztar

Next, copy the tarball to Cloud Storage.

In [None]:
!gsutil cp dist/trainer-0.1.tar.gz gs://${BUCKET_NAME}/staging-dir/trainer-0.1.tar.gz

Create a new model on AI Platform. This needs to be done only once; later iterations are saved as "versions" of the model.

In [None]:
# Write the command to create a model on AI Platform.
<ADD DETAILS HERE>

Next, create a new version of the model from our trained model.

In [None]:
!gcloud beta ai-platform versions create $VERSION_NAME \
  --model $MODEL_NAME \
  --origin gs://${BUCKET_NAME}/black_friday_${JOB_NAME}/ \
  --runtime-version=1.14 \
  --python-version=3.5 \
  --package-uris gs://${BUCKET_NAME}/staging-dir/trainer-0.1.tar.gz \
  --prediction-class predictor.MyPredictor

<b>Prepare a sample for inference</b>

In [None]:
!python generate_sample.py \
  --project-id $PROJECT_ID \
  --bucket-name ${BUCKET_NAME}
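
`generate_sample.py` is not shown here, but the `--json-instances` flag used for online prediction expects a file with one JSON instance per line. The sketch below writes such a file. The feature values and the helper name `write_instances` are made up for illustration; the real features come from the dataset.

```python
import json


def write_instances(instances, path="input.json"):
    """Write one JSON-encoded instance per line (newline-delimited JSON)."""
    with open(path, "w") as f:
        for instance in instances:
            f.write(json.dumps(instance) + "\n")


# Hypothetical feature vector, for illustration only.
write_instances([[0, 1, 25, 3]], path="input.json")
```

Each line must be a complete JSON value on its own, so a pretty-printed, multi-line JSON document would not be a valid input file.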

<b>Make an inference on a new sample</b>

Pass the sample object to the model hosted in AI Platform to return a prediction.

In [None]:
# make an online prediction
!gcloud ai-platform predict \
  --model $MODEL_NAME \
  --version $VERSION_NAME \
  --json-instances input.json