<a href="https://colab.research.google.com/github/jys-gg/ml-on-gcp/blob/master/GCP_AI_Platform_Image_Classification_Built_in_Algorithm.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [0]:
# Copyright 2019 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# GCP AI Platform Built-in Algorithm: Image Classification

In this tutorial, we train an image classification model on Cloud TPU with the AI Platform built-in image classification algorithm, and then deploy the trained model to AI Platform for online prediction.

## Setting up the environment

### Set up your GCP project following the [instructions](https://cloud.google.com/ml-engine/docs/tensorflow/getting-started-training-prediction#setup):



*   Select or create a GCP project.
*   Make sure that billing is enabled for your Google Cloud Platform project.
*   Enable the AI Platform ("Cloud Machine Learning Engine") and Compute Engine APIs. 


In [0]:
!pip install google-cloud
from google.colab import auth
auth.authenticate_user()

In [0]:
# Set the GCP project ID, region, and output bucket.

# Use your own project ID.
PROJECT_ID=''
REGION='us-central1'

!gcloud config set project {PROJECT_ID}
!gcloud config set compute/region {REGION}

# Create the output bucket (it is OK if the bucket already exists).
JOB_OUTPUT_BUCKET="gs://{}_image_classification".format(PROJECT_ID)
!echo $JOB_OUTPUT_BUCKET
!gsutil mb -p $PROJECT_ID -l $REGION $JOB_OUTPUT_BUCKET
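
Because the bucket name is derived from `PROJECT_ID`, an empty project ID produces a name like `_image_classification`, which Cloud Storage rejects. A minimal sketch of a pre-flight check, assuming a simplified subset of the Cloud Storage bucket naming rules; `check_bucket_uri` is a hypothetical helper, not part of any Google library:

```python
import re

# A simplified subset of the Cloud Storage bucket naming rules (an assumption,
# not the full rule set): lowercase letters, digits, '-', '_' and '.', starting
# and ending with a letter or digit, 3-63 characters long.
_BUCKET_NAME_RE = re.compile(r'[a-z0-9][a-z0-9_.-]{1,61}[a-z0-9]')


def check_bucket_uri(bucket_uri):
    """Return the bucket name, or raise ValueError if the gs:// URI looks invalid."""
    prefix = 'gs://'
    if not bucket_uri.startswith(prefix):
        raise ValueError('Bucket URI must start with gs://: {!r}'.format(bucket_uri))
    name = bucket_uri[len(prefix):]
    if not _BUCKET_NAME_RE.fullmatch(name):
        raise ValueError('Invalid bucket name {!r}; is PROJECT_ID set?'.format(name))
    if name.startswith('goog'):
        raise ValueError('Bucket names cannot start with "goog": {!r}'.format(name))
    return name


# With an empty PROJECT_ID, the derived name starts with '_' and is rejected.
try:
    check_bucket_uri('gs://_image_classification')
except ValueError as err:
    print(err)
```

In the notebook, calling `check_bucket_uri(JOB_OUTPUT_BUCKET)` right after the bucket URI is built would surface a missing project ID before any `gsutil` or `gcloud` call runs.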

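### Set up TPU for training

AI Platform runs TPU training under a Cloud TPU service account. The cell below looks up that account with the AI Platform `projects.getConfig` API and grants it the `roles/ml.serviceAgent` role on your project.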

In [0]:
import json

!curl -H "Authorization: Bearer $(gcloud auth print-access-token)"  \
    https://ml.googleapis.com/v1/projects/$PROJECT_ID:getConfig > ./project_config.json

# The response holds the TPU service account under config.tpuServiceAccount.
with open('project_config.json', 'r') as f:
  TPU_SERVICE_ACCOUNT=json.load(f)['config']['tpuServiceAccount']

!echo "Adding TPU service account for $TPU_SERVICE_ACCOUNT"

!gcloud projects add-iam-policy-binding $PROJECT_ID \
    --member serviceAccount:$TPU_SERVICE_ACCOUNT --role roles/ml.serviceAgent
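
If the `curl` call fails (for example, because `PROJECT_ID` is empty or the credentials are wrong), the response contains an `error` object instead of `config`, and the lookup above fails with a bare `KeyError`. A minimal sketch of a more explicit parser; `extract_tpu_service_account` and the sample response below are hypothetical:

```python
import json


def extract_tpu_service_account(config_text):
    """Return config.tpuServiceAccount from a projects.getConfig JSON response."""
    response = json.loads(config_text)
    try:
        return response['config']['tpuServiceAccount']
    except (KeyError, TypeError):
        # Error responses carry an 'error' object instead of 'config'.
        raise ValueError(
            'No tpuServiceAccount in response: {}'.format(response)) from None


# Hypothetical sample response with a placeholder account name.
sample = '{"config": {"tpuServiceAccount": "tpu-sa@example.iam.gserviceaccount.com"}}'
print(extract_tpu_service_account(sample))
```

To use it in the notebook, pass it the contents of the file written by the `curl` command instead of indexing the parsed JSON directly.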

## Submitting a training job

To submit a job, we need to specify some basic training-job arguments and some arguments specific to the algorithm.

Let's start by setting up the arguments and using `gcloud` to submit the job.
* Training job arguments: `job_id`, `scale-tier`, `master-image-uri`, `region`. In the cell below, the scale tier and machine types are set in `config.yaml`.
* Algorithm arguments:
  *   `training_data_path`: A TFRecord file path pattern used for training.
  *   `validation_data_path`: A TFRecord file path pattern used for validation.
  *   `job-dir`: The path where the model, checkpoints, and other training artifacts are stored.
  *   `num_classes`: The number of classes in the training/validation data.
  *   `max_steps`: The number of steps that the training job will run.
  *   `train_batch_size`: The number of images used in one training step.
  *   `num_eval_images`: The total number of images used for evaluation. It must be less than or equal to the total number of images in `validation_data_path`.
  *   `learning_rate_decay_type`: How the learning rate decays during training (for example, `cosine` or `stepwise`).
  *   `warmup_learning_rate`: The initial learning rate during the warm-up phase.
  *   `warmup_steps`: The number of warm-up steps, i.e., the number of steps taken to go from `warmup_learning_rate` to `initial_learning_rate`.
  *   `initial_learning_rate`: The initial learning rate after the warm-up period.
  *   `stepwise_learning_rate_steps`: The steps at which the learning rate changes for the `stepwise` decay type. For example, `100,200` means the learning rate changes (according to `stepwise_learning_rate_levels`) at step 100 and step 200. Only used when `learning_rate_decay_type` is `stepwise`.
  *   `stepwise_learning_rate_levels`: The learning rate values for the steps in `stepwise_learning_rate_steps`. Only used when `learning_rate_decay_type` is `stepwise`.
  *   `optimizer_type`: The optimizer used for training. It must be one of `{momentum, adam, rmsprop}`.
  *   `optimizer_arguments`: The arguments for the optimizer, as a comma-separated list of `name=value` pairs compatible with `optimizer_type`. For example, the Momentum optimizer accepts `momentum=0.9` (see `tf.train.MomentumOptimizer`), the Adam optimizer accepts `beta1=0.9,beta2=0.999` (see `tf.train.AdamOptimizer`), and the RMSProp optimizer accepts `decay=0.9,momentum=0.1,epsilon=1e-10` (see `tf.train.RMSPropOptimizer`).
  *   `image_size`: The image size (width and height) used for training.
  *   `model_type`: The model architecture used for training. It must be one of `{resnet-(18|34|50|101|152|200), efficientnet-(b0|b1|b2|b3|b4|b5|b6|b7)}`.
  *   `label_smoothing`: The label smoothing parameter used in `softmax_cross_entropy`.
  *   `weight_decay`: The weight decay coefficient for L2 regularization, e.g., `loss = cross_entropy + params['weight_decay']*l2_loss`.
  *   `pretrained_checkpoint_path`: The path to pretrained checkpoints.
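
The notebook does not spell out the exact formulas behind the learning-rate arguments. A minimal sketch of how they could combine, assuming a linear warm-up from `warmup_learning_rate` to `initial_learning_rate` over `warmup_steps`, then cosine decay to zero at `max_steps`; for `stepwise`, it assumes `initial_learning_rate` applies until the first entry of `stepwise_learning_rate_steps`. Both helpers are hypothetical illustrations, not the built-in algorithm's code:

```python
import bisect
import math


def warmup_cosine_lr(step, initial_learning_rate, max_steps,
                     warmup_steps=0, warmup_learning_rate=0.0):
    """Assumed schedule: linear warm-up, then cosine decay to zero at max_steps."""
    if step < warmup_steps:
        # Interpolate linearly from warmup_learning_rate to initial_learning_rate.
        frac = step / warmup_steps
        return warmup_learning_rate + frac * (initial_learning_rate - warmup_learning_rate)
    # Fraction of the post-warm-up steps completed, clipped to at most 1.
    progress = min((step - warmup_steps) / max(1, max_steps - warmup_steps), 1.0)
    return 0.5 * initial_learning_rate * (1.0 + math.cos(math.pi * progress))


def stepwise_lr(step, initial_learning_rate, steps, levels):
    """Assumed rule: initial_learning_rate before steps[0], then levels[i] from steps[i]."""
    i = bisect.bisect_right(steps, step)
    return initial_learning_rate if i == 0 else levels[i - 1]


# Settings from the training cell below: 500 warm-up steps, 50,000 steps total.
for step in (0, 250, 500, 25250, 50000):
    print(step, warmup_cosine_lr(step, 0.128, 50000, warmup_steps=500))

# Hypothetical stepwise settings: steps "100,200" with levels "0.01,0.001".
print([stepwise_lr(s, 0.1, [100, 200], [0.01, 0.001]) for s in (50, 100, 150, 250)])
```

With these assumptions, the rate climbs to 0.128 at step 500, falls to half of that midway through the decay, and reaches zero at `max_steps`.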


In [0]:
from time import gmtime, strftime
import json

DATASET_NAME = 'flower'
ALGORITHM = 'classification'
MODEL_TYPE = 'efficientnetb4'
MODEL_NAME =  '{}_{}_{}'.format(DATASET_NAME, ALGORITHM, MODEL_TYPE)
print(MODEL_NAME)

# Give your AI Platform training job a unique name.
timestamp = strftime("%Y%m%d%H%M%S", gmtime())
JOB_ID='{}_{}'.format(MODEL_NAME, timestamp)

# This is where all your model related files will be saved.
JOB_DIR='{}/{}'.format(JOB_OUTPUT_BUCKET, JOB_ID)

# Sets the machine configuration of training jobs.
TRAINING_INPUT = """trainingInput:
  scaleTier: CUSTOM
  masterType: n1-highmem-16
  masterConfig:
    imageUri: gcr.io/cloud-ml-algos/image_classification:latest
  workerType:  cloud_tpu
  workerConfig:
   imageUri: gcr.io/cloud-ml-algos/image_classification:latest
   acceleratorConfig:
     type: TPU_V2
     count: 8
  workerCount: 1"""
with open('config.yaml', 'w') as f:
  f.write(TRAINING_INPUT)

# Launch AI platform training job.
! gcloud ai-platform jobs submit training $JOB_ID \
  --region=$REGION \
  --config=config.yaml \
  --master-image-uri=gcr.io/cloud-ml-algos/image_classification:latest \
  -- \
  --training_data_path=gs://builtin-algorithm-data-public/flowers/flowers_train* \
  --validation_data_path=gs://builtin-algorithm-data-public/flowers/flowers_validation* \
  --job-dir=$JOB_DIR \
  --max_steps=50000 \
  --train_batch_size=128 \
  --num_eval_images=10 \
  --num_classes=5 \
  --initial_learning_rate=0.128 \
  --optimizer_type='momentum' \
  --optimizer_arguments='momentum=0.9,use_nesterov=False' \
  --learning_rate_decay_type=cosine \
  --warmup_steps=500 \
  --model_type='efficientnet-b4'
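
The cell above mixes two kinds of flags: everything before the bare `--` configures the AI Platform job, and everything after it is forwarded unchanged to the image classification algorithm. A minimal sketch that makes the split explicit by building the same command from a Python dictionary; `build_training_command` and the job ID are hypothetical, and actually submitting requires an installed, authenticated Cloud SDK:

```python
import shlex


def build_training_command(job_id, region, config_path, image_uri, algorithm_args):
    """Return the argv list for `gcloud ai-platform jobs submit training`.

    Flags before the bare '--' configure the AI Platform job; everything after
    it is forwarded unchanged to the built-in algorithm.
    """
    cmd = [
        'gcloud', 'ai-platform', 'jobs', 'submit', 'training', job_id,
        '--region={}'.format(region),
        '--config={}'.format(config_path),
        '--master-image-uri={}'.format(image_uri),
        '--',
    ]
    # Algorithm arguments use the same --name=value form as the cell above.
    cmd += ['--{}={}'.format(name, value) for name, value in algorithm_args.items()]
    return cmd


cmd = build_training_command(
    job_id='flower_classification_example',  # hypothetical job ID
    region='us-central1',
    config_path='config.yaml',
    image_uri='gcr.io/cloud-ml-algos/image_classification:latest',
    algorithm_args={'num_classes': 5, 'max_steps': 50000,
                    'model_type': 'efficientnet-b4'},
)
# Print a shell-safe version; subprocess.run(cmd, check=True) would submit it.
print(' '.join(shlex.quote(arg) for arg in cmd))
```

Keeping the algorithm arguments in a dictionary makes it easy to reuse them between the plain and the hyperparameter-tuning jobs.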

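## Submitting a training job with hyperparameter tuning

This job reuses the setup above but adds a `hyperparameters` section to `config.yaml`: AI Platform runs up to 6 trials (at most 3 in parallel), searching `initial_learning_rate` between 0.001 and 0.1 on a log scale to maximize `top_1_accuracy`.
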
In [0]:
from time import gmtime, strftime
import json

DATASET_NAME = 'flower'
ALGORITHM = 'classification'
MODEL_TYPE = 'efficientnetb4'
MODEL_NAME =  '{}_{}_{}'.format(DATASET_NAME, ALGORITHM, MODEL_TYPE)

# Give your AI Platform training job a unique name.
timestamp = strftime("%Y%m%d%H%M%S", gmtime())
JOB_ID='{}_{}'.format(MODEL_NAME, timestamp)

# This is where all your model-related files will be saved.
# It reuses JOB_OUTPUT_BUCKET from the setup cell.
JOB_DIR='{}/{}'.format(JOB_OUTPUT_BUCKET, JOB_ID)

# Sets the machine configuration of training jobs.
TRAINING_INPUT = """trainingInput:
  scaleTier: CUSTOM
  masterType: n1-highmem-16
  masterConfig:
    imageUri: gcr.io/cloud-ml-algos/image_classification:latest
  workerType:  cloud_tpu
  workerConfig:
   imageUri: gcr.io/cloud-ml-algos/image_classification:latest
   acceleratorConfig:
     type: TPU_V2
     count: 8
  workerCount: 1
  # The following are hyper-parameter configs.
  hyperparameters:
   goal: MAXIMIZE
   hyperparameterMetricTag: "top_1_accuracy"
   maxTrials: 6
   maxParallelTrials: 3
   enableTrialEarlyStopping: True
   params:
   - parameterName: initial_learning_rate
     type: DOUBLE
     minValue: 0.001
     maxValue: 0.1
     scaleType: UNIT_LOG_SCALE
  """
with open('config.yaml', 'w') as f:
  f.write(TRAINING_INPUT)

# Launch AI platform training job.
! gcloud ai-platform jobs submit training $JOB_ID \
  --region=us-central1 \
  --config=config.yaml \
  --master-image-uri=gcr.io/cloud-ml-algos/image_classification:latest \
  -- \
  --training_data_path=gs://builtin-algorithm-data-public/flowers/flowers_train* \
  --validation_data_path=gs://builtin-algorithm-data-public/flowers/flowers_validation* \
  --job-dir=$JOB_DIR \
  --max_steps=300000 \
  --train_batch_size=128 \
  --eval_batch_size=100 \
  --num_eval_images=0 \
  --num_classes=5 \
  --decay_steps=300000 \
  --optimizer_type='momentum' \
  --optimizer_arguments='momentum=0.9,use_nesterov=False' \
  --learning_rate_decay_type=cosine \
  --warmup_steps=2000 \
  --model_type='efficientnet-b4'
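
With `scaleType: UNIT_LOG_SCALE`, the tuning service scales the search space for `initial_learning_rate` logarithmically, so the range 0.001 to 0.01 is covered as densely as 0.01 to 0.1. A minimal sketch of that mapping from the unit interval to the parameter range; it illustrates the scaling only, not how the service chooses which points to try, and `unit_log_scale` is a hypothetical helper:

```python
import math


def unit_log_scale(u, min_value, max_value):
    """Map u in [0, 1] to [min_value, max_value] on a logarithmic scale."""
    if not 0.0 <= u <= 1.0:
        raise ValueError('u must be in [0, 1]')
    log_min, log_max = math.log(min_value), math.log(max_value)
    return math.exp(log_min + u * (log_max - log_min))


# Evenly spaced points in the unit interval are evenly spaced in log space.
for u in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(u, unit_log_scale(u, 0.001, 0.1))
```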

## Monitor submitted training job

In [0]:
!gcloud ai-platform jobs describe {JOB_ID}
!gcloud ai-platform jobs stream-logs {JOB_ID}
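
`stream-logs` keeps the cell busy until the job ends, which is why it must be stopped before TensorBoard can start. As an alternative, a minimal sketch that polls the job state instead; it assumes an installed, authenticated Cloud SDK, uses the gcloud output projection `--format=value(state)`, and treats `SUCCEEDED`, `FAILED`, and `CANCELLED` as terminal states. `get_job_state` and `wait_for_job` are hypothetical helpers:

```python
import subprocess
import time

# Terminal states of an AI Platform training job.
TERMINAL_STATES = {'SUCCEEDED', 'FAILED', 'CANCELLED'}


def get_job_state(job_id):
    """Return the job's state string (e.g. 'RUNNING') using the gcloud CLI."""
    return subprocess.check_output(
        ['gcloud', 'ai-platform', 'jobs', 'describe', job_id,
         '--format=value(state)'],
        text=True).strip()


def wait_for_job(job_id, poll_seconds=60, get_state=get_job_state):
    """Poll until the job reaches a terminal state, then return that state."""
    while True:
        state = get_state(job_id)
        print('{}: {}'.format(job_id, state))
        if state in TERMINAL_STATES:
            return state
        time.sleep(poll_seconds)
```

In the notebook, `wait_for_job(JOB_ID)` would block like `stream-logs` but print only state changes; the `get_state` parameter lets the polling loop be exercised without calling `gcloud`.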

## Track training progress with TensorBoard

In [0]:
# Use TensorBoard to monitor training progress.
# You may need to wait a few minutes until TensorFlow metrics are available.
# Stop the running cell under `Monitor submitted training job` before starting TensorBoard.

%load_ext tensorboard
%tensorboard --logdir $JOB_DIR

## Run prediction locally



In [0]:
# Copy the SavedModel to the local disk. Wait until training has exported a SavedModel.

# If the runtime was restarted, uncomment the next line and set your job directory.
# JOB_DIR=''
!gsutil cp -r $JOB_DIR/model .

# Use the following command if it is a hyperparameter tuning job.
# TRIAL_ID=1
# !gsutil cp -r $JOB_DIR/{TRIAL_ID}/model .


print('\nThe generated SavedModel has the following signature:')
!saved_model_cli show --dir model --tag_set serve --signature_def classify

In [0]:
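# Run prediction locally.
# Note: this cell uses TensorFlow 1.x APIs (tf.contrib.predictor, tf.gfile).
import tensorflow as tf

# Load the exported SavedModel with its 'classify' signature.
predict_fn = tf.contrib.predictor.from_saved_model(
    export_dir='model',
    signature_def_key='classify')

# Any JPEG works as input; since the model was trained on flowers,
# a flower photo gives a more meaningful result.
IMAGE_URI='gs://download.tensorflow.org/example_images/grace_hopper.jpg'

# Read the raw encoded image bytes; the model's graph decodes the image.
with tf.gfile.GFile(IMAGE_URI, 'rb') as img_file:
  img_data = img_file.read()

# The signature takes a batch of image bytes and a matching batch of keys.
predictions = predict_fn({
    'image_bytes': [img_data],
    'key': ['test_key']
})
print('The predicted class is {}'.format(predictions['classes']))
print('The class probabilities are {}'.format(predictions['probabilities']))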


## Deploying the trained model to AI Platform for online prediction

After training finishes, you should see the following directory structure under your `JOB_DIR`:

*   model/
  * saved_model.pb
  * variables
  * deployment_config.yaml

The `deployment_config.yaml` file should contain something like:
```
deploymentUri: gs://JOB_DIR/model
framework: TENSORFLOW
labels:
  job_id: {JOB_NAME}
  global_step: '1000'
runtimeVersion: '1.13'
```

Let's use this file to deploy our model to AI Platform Prediction and then request predictions from it.

For more details on deploying models on AI Platform, see [how to deploy a TensorFlow model](https://cloud.google.com/ml-engine/docs/tensorflow/deploying-models).



In [0]:
# Let's copy the file to our local directory and take a look at the file.
!gsutil cp {JOB_DIR}/model/deployment_config.yaml .

# Use the following command if it is a hyperparameter tuning job.
# TRIAL_ID=1
# !gsutil cp {JOB_DIR}/{TRIAL_ID}/model/deployment_config.yaml .

print('\nThe job deployment_config.yaml is:')
!cat deployment_config.yaml

Let's create the model and version in AI Platform:

In [0]:
!gcloud ai-platform models create {MODEL_NAME} --regions {REGION}

# Create a version of the model using deployment_config.yaml.
VERSION_NAME=JOB_ID

!echo "Deployment takes a couple of minutes. You can watch your deployment here: https://console.cloud.google.com/mlengine/models/{MODEL_NAME}"

!gcloud ai-platform versions create {VERSION_NAME} \
  --model {MODEL_NAME} \
  --config deployment_config.yaml

Now we can request predictions from the deployed model.

In [0]:
import json
import base64

# Read the image and base64-encode it; binary inputs are sent as {"b64": ...}.
with tf.gfile.Open(IMAGE_URI, 'rb') as image_file:
  encoded_string = base64.b64encode(image_file.read()).decode('utf-8')

image_bytes = {'b64': encoded_string}
instances = {'image_bytes': image_bytes, 'key': '1'}
# --json-instances expects one JSON instance per line.
with open("prediction_instances.json","w") as f:
  f.write(json.dumps(instances) + '\n')

!gcloud ai-platform predict --model $MODEL_NAME \
  --version $VERSION_NAME \
  --json-instances prediction_instances.json
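
The `gcloud` command reads one instance per line from a file. When calling the prediction service over REST instead (the `projects.predict` method, `POST https://ml.googleapis.com/v1/projects/PROJECT_ID/models/MODEL_NAME/versions/VERSION_NAME:predict`), the instances are wrapped in a single JSON object under an `instances` key. A minimal sketch of building that body with the standard library; `build_predict_request` is a hypothetical helper, and the placeholder bytes are not a real image:

```python
import base64
import json


def build_predict_request(images, keys):
    """Build a JSON body for an online prediction request.

    Each item in `images` is raw image bytes; it is base64-encoded and wrapped
    as {"b64": ...} under the `image_bytes` input, paired with a `key` string.
    """
    instances = [
        {'image_bytes': {'b64': base64.b64encode(data).decode('utf-8')}, 'key': key}
        for data, key in zip(images, keys)
    ]
    return json.dumps({'instances': instances})


# Placeholder bytes stand in for the contents of a real JPEG file.
print(build_predict_request([b'not-a-real-jpeg'], ['1']))
```

Wrapping binary data as `{"b64": ...}` under an input whose name ends in `_bytes` (here `image_bytes`) is what lets the service decode it before passing it to the model.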
