<a href="https://colab.research.google.com/github/jys-gg/ml-on-gcp/blob/master/tutorials/builtin_algorithms/GCP_AI_Platform_Object_Detection_Built_in_Algorithm.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [0]:
# Copyright 2019 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# GCP AI Platform Built-in Algorithm: Image Object Detection

In this tutorial, we train an object detection model on a TPU and then deploy it to AI Platform for online prediction.

## Setting up Environment

### Set up your GCP project following the [instructions](https://cloud.google.com/ml-engine/docs/tensorflow/getting-started-training-prediction#setup):



*   Select or create a GCP project.
*   Make sure that billing is enabled for your Google Cloud Platform project.
*   Enable the AI Platform ("Cloud Machine Learning Engine") and Compute Engine APIs. 


In [0]:
!pip install google-cloud
from google.colab import auth
auth.authenticate_user()

In [0]:
# Set the GCP project-id and region, and bucket.

# Please use your own project_id.
PROJECT_ID=''
REGION='us-central1'

!gcloud config set project {PROJECT_ID}
!gcloud config set compute/region {REGION}

# Create bucket (it is ok if the bucket has already been created)
JOB_OUTPUT_BUCKET="gs://{}_image_detection".format(PROJECT_ID)
!echo $JOB_OUTPUT_BUCKET
!gsutil mb -p $PROJECT_ID $JOB_OUTPUT_BUCKET

### Set up TPU for training

In [0]:
import json

!curl -H "Authorization: Bearer $(gcloud auth print-access-token)"  \
    https://ml.googleapis.com/v1/projects/$PROJECT_ID:getConfig | cat > ./access_token.json
    
with open('access_token.json', 'r') as f:
  TPU_SERVICE_ACCOUNT=json.load(f)['config']['tpuServiceAccount']    

!echo "Adding TPU service account for $TPU_SERVICE_ACCOUNT"

!gcloud projects add-iam-policy-binding $PROJECT_ID \
    --member serviceAccount:$TPU_SERVICE_ACCOUNT --role roles/ml.serviceAgent
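
The cell above reads `config.tpuServiceAccount` from the `getConfig` response. If the request fails (for example, because the API is not enabled yet), that field will be missing and the lookup raises a bare `KeyError`. Below is a minimal sketch of a more defensive lookup, assuming the response is a JSON object shaped like the one parsed above. The helper name `extract_tpu_service_account` and the sample payload are hypothetical.

```python
import json


def extract_tpu_service_account(response_text):
    """Return config.tpuServiceAccount from a getConfig response body.

    Raises ValueError that includes the raw payload when the field is
    missing, which is easier to debug than a bare KeyError.
    """
    payload = json.loads(response_text)
    account = payload.get('config', {}).get('tpuServiceAccount')
    if not account:
        raise ValueError('No tpuServiceAccount in response: %s' % response_text)
    return account


# Hypothetical payload for illustration only; real values come from the API.
sample = '{"config": {"tpuServiceAccount": "service-123@example.iam.gserviceaccount.com"}}'
print(extract_tpu_service_account(sample))
```

In the notebook, the function could be called with the contents of `access_token.json` in place of `sample`.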

## Submitting a training Job

To submit a job we need to specify some basic training arguments and some basic arguments related to our algorithm.

Let's start with setting up arguments and using gcloud to submit the Job.
* Training Job arguments: `job_id, job-dir, scale-tier, master-image-uri, region`.
* Algorithm Arguments: 
  *   `training_data_path`: Path to a TFRecord file pattern used for training.
  *   `validation_data_path`: Path to a TFRecord file pattern used for validation.
  *   `pretrained_checkpoint_path`: The path of the pretrained checkpoints.
  *   `num_classes`: The number of classes in the training/validation data.
  *   `max_steps`: The number of steps that the training job will run.
  *   `train_batch_size`: The number of images used in one training step.
  *   `num_eval_images`: The total number of images used for evaluation. Its value must be less than or equal to the number of images in `validation_data_path`.
  *   `learning_rate_decay_type`: How the learning rate decays during training. Needs to be one of `{cosine, stepwise}`.
  *   `warmup_learning_rate`: The initial learning rate during the warm-up phase.
  *   `warmup_steps`: The number of warm-up steps, during which the learning rate moves from `warmup_learning_rate` to `initial_learning_rate`.
  *   `initial_learning_rate`: The initial learning rate after the warm-up period.
  *   `stepwise_learning_rate_steps`: The steps at which the learning rate changes for the stepwise decay type. For example, `100,200` means the learning rate changes (to the values in `stepwise_learning_rate_levels`) at step 100 and step 200. Only used when `learning_rate_decay_type` is set to `stepwise`.
  *   `stepwise_learning_rate_levels`: The learning rate value for each step listed in `stepwise_learning_rate_steps`. Only used when `learning_rate_decay_type` is set to `stepwise`.
  *   `optimizer_type`: The optimizer used for training. Needs to be one of `{momentum, adam, adadelta, adagrad, rmsprop}`.
  *   `image_size`: The image size (height and width) used for training, e.g., `"640,640"`.
  *   `resnet_depth`: The depth of the ResNet backbone. Needs to be one of `{18,34,50,101,152,200}`.
  *   `anchor_size`: The scale of the base anchor size in the Feature Pyramid Network (FPN).
  *   `fpn_type`: The multi-level Feature Pyramid Network (FPN) type. Needs to be one of `{fpn, nasfpn}`.
  *   `bbox_aspect_ratios`: The aspect ratios of the base anchors added on each level. Each number is the ratio of width to height. For instance, `"1.0,2.0,0.5"` adds three anchors on each scale level.
  *   `max_num_bboxes_in_training`: The maximum number of proposed bboxes used for training.
  *   `max_num_bboxes_in_prediction`: The maximum number of proposed bboxes in prediction outputs.
  *   `nms_iou_threshold`: The IoU threshold used by non-maximum suppression to decide whether two bboxes overlap.
  *   `nms_score_threshold`: The threshold for deciding when to remove boxes based on score.
  *   `focal_loss_alpha`: Focal loss alpha (balancing param) value.
  *   `focal_loss_gamma`: Focal loss gamma (focusing param) value.
  *   `aug_scale_min`: The minimum scale applied during image augmentation. Its value is in the range `[0, 1.0]`.
  *   `aug_scale_max`: The maximum scale applied during image augmentation. Its value is in the range `[1.0, inf)`.
  *   `aug_rand_hflip`: If true, augment training with random horizontal flips.
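
To see how the warm-up and stepwise arguments fit together, here is a minimal sketch of a schedule built from them. It is an illustration, not the built-in algorithm's implementation: the linear warm-up is an assumption, and so is the `warmup_learning_rate` default of `0.0067`. The other defaults mirror the values passed in the training command in this notebook.

```python
def learning_rate_at(step,
                     warmup_learning_rate=0.0067,  # assumed value, not from this notebook
                     warmup_steps=1733,
                     initial_learning_rate=0.08,
                     stepwise_learning_rate_steps=(60000, 80000),
                     stepwise_learning_rate_levels=(0.008, 0.0008)):
    """Return the learning rate at a given training step.

    Assumes a linear ramp during warm-up, then piecewise-constant decay.
    """
    if step < warmup_steps:
        # Linear interpolation from warmup_learning_rate to initial_learning_rate.
        fraction = step / float(warmup_steps)
        return warmup_learning_rate + fraction * (
            initial_learning_rate - warmup_learning_rate)

    learning_rate = initial_learning_rate
    # Each boundary that has been reached switches to the matching level.
    for boundary, level in zip(stepwise_learning_rate_steps,
                               stepwise_learning_rate_levels):
        if step >= boundary:
            learning_rate = level
    return learning_rate


for step in (0, 1733, 59999, 60000, 80000):
    print(step, learning_rate_at(step))
```

With these values the rate stays at `0.08` from the end of warm-up until step 60000, drops to `0.008`, and drops again to `0.0008` at step 80000.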

In [0]:
from time import gmtime, strftime
import json

DATASET_NAME = 'coco'
ALGORITHM = 'object_detection'
MODEL_TYPE = 'retinanet'
MODEL_NAME =  '{}_{}_{}'.format(DATASET_NAME, ALGORITHM, MODEL_TYPE)

# Give a unique name to your Codeless Cloud ML Engine training job.
timestamp = strftime("%Y%m%d%H%M%S", gmtime())
JOB_ID='{}_{}'.format(MODEL_NAME, timestamp)

# This is where all your model related files will be saved. Make sure you have access to this GCS bucket.
JOB_DIR='{}/{}'.format(JOB_OUTPUT_BUCKET, JOB_ID)

# Sets the machine configuration of training jobs.
TRAINING_INPUT = """trainingInput:
  scaleTier: CUSTOM
  masterType: n1-highmem-16
  masterConfig:
    imageUri: gcr.io/cloud-ml-algos/image_object_detection:latest
    acceleratorConfig:
      type: NVIDIA_TESLA_P100
      count: 1
  workerType:  cloud_tpu
  workerConfig:
   imageUri: gcr.io/cloud-ml-algos/image_object_detection:latest
   acceleratorConfig:
     type: TPU_V2
     count: 8
  workerCount: 1"""
with open('config.yaml', 'w') as f:
  f.write(TRAINING_INPUT)

# Launch AI platform training job.
! gcloud ai-platform jobs submit training $JOB_ID \
  --region=us-central1 \
  --config=config.yaml \
  --job-dir=$JOB_DIR \
  -- \
  --training_data_path=gs://builtin-algorithm-data-public/coco/train* \
  --validation_data_path=gs://builtin-algorithm-data-public/coco/val* \
  --max_steps=90000 \
  --train_batch_size=64 \
  --num_eval_images=5000 \
  --num_classes=91 \
  --initial_learning_rate=0.08 \
  --learning_rate_decay_type=stepwise \
  --stepwise_learning_rate_levels="0.008,0.0008" \
  --stepwise_learning_rate_steps="60000,80000" \
  --warmup_steps=1733 \
  --aug_scale_min=0.8 \
  --aug_scale_max=1.2 \
  --fpn_type=fpn \
  --pretrained_checkpoint_path="gs://builtin-algorithm-data-public/pretrained_checkpoints/detection/resnet50/"
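
The job ID above is made unique by appending a UTC timestamp to `MODEL_NAME`. Below is a minimal sketch of the same construction with a sanity check before submission. The naming rule encoded in `JOB_ID_PATTERN` (letters, digits, and underscores, starting with a letter) is an assumption about what AI Platform accepts, and the helper `make_job_id` is hypothetical.

```python
import re
from time import gmtime, strftime

# Assumed naming rule: letters, digits and underscores, starting with a letter.
JOB_ID_PATTERN = re.compile(r'^[A-Za-z][A-Za-z0-9_]*$')


def make_job_id(model_name, now=None):
    """Build '<model_name>_<UTC timestamp>' and validate it against the assumed rule."""
    timestamp = strftime('%Y%m%d%H%M%S', now or gmtime())
    job_id = '{}_{}'.format(model_name, timestamp)
    if not JOB_ID_PATTERN.match(job_id):
        raise ValueError('Unexpected characters in job ID: %s' % job_id)
    return job_id


print(make_job_id('coco_object_detection_retinanet'))
```

A name containing a hyphen, such as `my-model`, fails the check locally instead of after the submit command has been sent.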

## Submitting a training Job with Hyperparameter Tuning 


###  **Note that to support parallel jobs during hyperparameter tuning, you may need to [request a quota increase](https://cloud.google.com/ml-engine/docs/quotas) to get more TPU/GPU resources.**



In [0]:
from time import gmtime, strftime
import json

DATASET_NAME = 'coco_hypertune'
ALGORITHM = 'detection'
MODEL_TYPE = 'retinanet'
MODEL_NAME =  '{}_{}_{}'.format(DATASET_NAME, ALGORITHM, MODEL_TYPE)

# Give a unique name to your Codeless Cloud ML Engine training job.
timestamp = strftime("%Y%m%d%H%M%S", gmtime())
JOB_ID='{}_{}'.format(MODEL_NAME, timestamp)

# This is where all your model related files will be saved. Make sure you have access to this GCS bucket.
JOB_DIR='{}/{}'.format(JOB_OUTPUT_BUCKET, JOB_ID)

# Sets the machine configuration of training jobs.
TRAINING_INPUT = """trainingInput:
  scaleTier: CUSTOM
  masterType: n1-highmem-16
  masterConfig:
    imageUri: gcr.io/cloud-ml-algos/image_object_detection:latest
    acceleratorConfig:
      type: NVIDIA_TESLA_P100
      count: 1
  workerType:  cloud_tpu
  workerConfig:
   imageUri: gcr.io/cloud-ml-algos/image_object_detection:latest
   acceleratorConfig:
     type: TPU_V2
     count: 8
  workerCount: 1
  # The following are hyper-parameter configs.
  hyperparameters:
   goal: MAXIMIZE
   hyperparameterMetricTag: "AP"
   maxTrials: 2
   maxParallelTrials: 2
   params:
    - parameterName: fpn_type
      type: CATEGORICAL
      categoricalValues:
      - fpn
      - nasfpn
  """


with open('config.yaml', 'w') as f:
  f.write(TRAINING_INPUT)

# Launch AI platform training job.
! gcloud ai-platform jobs submit training $JOB_ID \
  --region=us-central1 \
  --config=config.yaml \
  --job-dir=$JOB_DIR \
  -- \
  --training_data_path=gs://builtin-algorithm-data-public/coco/train* \
  --validation_data_path=gs://builtin-algorithm-data-public/coco/val* \
  --max_steps=90000 \
  --train_batch_size=64 \
  --num_eval_images=5000 \
  --num_classes=91 \
  --initial_learning_rate=0.08 \
  --learning_rate_decay_type=stepwise \
  --stepwise_learning_rate_levels="0.008,0.0008" \
  --stepwise_learning_rate_steps="60000,80000" \
  --warmup_steps=1733 \
  --aug_scale_min=0.8 \
  --aug_scale_max=1.2 \
  --fpn_type=fpn \
  --pretrained_checkpoint_path="gs://builtin-algorithm-data-public/pretrained_checkpoints/detection/resnet50/"

## Monitor submitted training job

In [0]:
!gcloud ai-platform jobs describe {JOB_ID}
!gcloud ai-platform jobs stream-logs {JOB_ID}

## Track training progress with TensorBoard

In [0]:
# Use tensorboard to monitor the progress. 
# May need to wait for a few minutes until tensorflow metrics are available.

%load_ext tensorboard
%tensorboard --logdir $JOB_DIR

## Copy `Best` SavedModel to local



In [0]:
# Copy SavedModel to local.

!gsutil cp -r $JOB_DIR/model .

# Use the following command if it is a hyperparameter tuning job.
# TRIAL_ID=1
# !gsutil cp -r $JOB_DIR/{TRIAL_ID}/model .


print('\nThe generated SavedModel has the following signature:')
!saved_model_cli show --dir model --tag_set serve --signature_def serving_default

## Run prediction locally

In [0]:
import tensorflow as tf
import os

# For downloading the image.
import matplotlib.pyplot as plt
import tempfile
from six.moves.urllib.request import urlopen
from six import BytesIO

# For drawing onto the image.
import numpy as np
from PIL import Image
from PIL import ImageColor
from PIL import ImageDraw
from PIL import ImageFont
from PIL import ImageOps

# For measuring the inference time.
import time

# Use coco sample image for prediction.
IMAGE_URI='gs://builtin-algorithm-data-public/testing_images/detection/coco_sample.jpeg'


# Check available GPU devices.
print("The following GPU devices are available: %s" % tf.test.gpu_device_name())
def display_image(image):
 fig = plt.figure(figsize=(20, 15))
 plt.grid(False)
 plt.imshow(image)


def draw_bounding_box_on_image(image,
                              ymin,
                              xmin,
                              ymax,
                              xmax,
                              color,
                              font,
                              thickness=4,
                              display_str_list=()):
 """Adds a bounding box to an image."""
 draw = ImageDraw.Draw(image)
 im_width, im_height = image.size
 (left, right, top, bottom) = (xmin * im_width, xmax * im_width,
                               ymin * im_height, ymax * im_height)
 draw.line([(left, top), (left, bottom), (right, bottom), (right, top),
            (left, top)],
           width=thickness,
           fill=color)

 # If the total height of the display strings added to the top of the bounding
 # box exceeds the top of the image, stack the strings below the bounding box
 # instead of above.
 display_str_heights = [font.getsize(ds)[1] for ds in display_str_list]
 # Each display_str has a top and bottom margin of 0.05x.
 total_display_str_height = (1 + 2 * 0.05) * sum(display_str_heights)

 if top > total_display_str_height:
   text_bottom = top
 else:
   text_bottom = bottom + total_display_str_height
 # Reverse list and print from bottom to top.
 for display_str in display_str_list[::-1]:
   text_width, text_height = font.getsize(display_str)
   margin = np.ceil(0.05 * text_height)
   draw.rectangle([(left, text_bottom - text_height - 2 * margin),
                   (left + text_width, text_bottom)],
                  fill=color)
   draw.text((left + margin, text_bottom - text_height - margin),
             display_str,
             fill="black",
             font=font)
   text_bottom -= text_height - 2 * margin

img = tf.io.read_file(IMAGE_URI)
img = tf.image.decode_jpeg(img, channels=3)

def draw_boxes(image, boxes, class_names, scores, max_boxes=10, min_score=0.1):
  """Overlay labeled boxes on an image with formatted scores and label names."""
  colors = list(ImageColor.colormap.values())

  try:
    font = ImageFont.truetype("/usr/share/fonts/truetype/liberation/LiberationSansNarrow-Regular.ttf",
                              25)
  except IOError:
    print("Font not found, using default font.")
    font = ImageFont.load_default()

  for i in range(min(boxes.shape[0], max_boxes)):
    if scores[i] >= min_score:
      ymin, xmin, ymax, xmax = tuple(boxes[i])
      display_str = "{}: {}%".format(class_names[i],
                                     int(100 * scores[i]))
      color = colors[hash(class_names[i]) % len(colors)]
      image_pil = Image.fromarray(np.uint8(image)).convert("RGB")
      draw_bounding_box_on_image(
          image_pil,
          ymin,
          xmin,
          ymax,
          xmax,
          color,
          font,
          display_str_list=[display_str])
      np.copyto(image, np.array(image_pil))
  return image


predict_fn = tf.contrib.predictor.from_saved_model(
    export_dir='model',
    signature_def_key='serving_default')


# Open in binary mode so the JPEG bytes are not decoded as text.
with tf.gfile.GFile(IMAGE_URI, 'rb') as img_file:
  img_data = img_file.read()
  image = [img_data]
  predictions = predict_fn({
      'encoded_image': image,
      'key': ['key']
  })

# print predictions

sess=tf.Session()
with sess.as_default():
#  display_image(img.eval())
 image_with_boxes = draw_boxes(
   img.eval(), predictions["detection_boxes"][0],
   predictions["detection_classes"][0], predictions["detection_scores"][0])

 display_image(image_with_boxes)
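
The drawing helpers above treat each box as `[ymin, xmin, ymax, xmax]` with coordinates normalized to `[0, 1]`, and `draw_boxes` skips detections whose score is below `min_score`. Below is a minimal sketch of the same conversion without any drawing, which can help when inspecting detections as pixel rectangles. The helper name `to_pixel_boxes` and the sample values are hypothetical.

```python
def to_pixel_boxes(boxes, scores, image_width, image_height, min_score=0.1):
    """Convert normalized [ymin, xmin, ymax, xmax] boxes to pixel rectangles.

    Returns (left, top, right, bottom, score) tuples for detections whose
    score is at least min_score.
    """
    results = []
    for (ymin, xmin, ymax, xmax), score in zip(boxes, scores):
        if score < min_score:
            continue
        results.append((xmin * image_width, ymin * image_height,
                        xmax * image_width, ymax * image_height, score))
    return results


# Hypothetical detections: the second one falls below the score threshold.
sample_boxes = [[0.25, 0.5, 0.75, 1.0], [0.0, 0.0, 0.5, 0.5]]
sample_scores = [0.9, 0.05]
print(to_pixel_boxes(sample_boxes, sample_scores, image_width=640, image_height=480))
```

With the local predictions, the inputs would be `predictions["detection_boxes"][0]` and `predictions["detection_scores"][0]`.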

## Deploying the trained model to AI Platform for online prediction

After training is done, you should expect the following directory structure under your `JOB_DIR`.

*   model/
  * saved_model.pb
  * variables
  * deployment_config.yaml

The `deployment_config.yaml` file should contain something like:
```
deploymentUri: gs://JOB_DIR/model
framework: TENSORFLOW
labels:
  job_id: {JOB_NAME}
  global_step: '1000'
runtimeVersion: '1.14'
```

Let's use this file to deploy our model for prediction and make predictions from it.

For more details on deploying to AI Platform, take a look at [how to deploy a TensorFlow model on CMLE](https://cloud.google.com/ml-engine/docs/tensorflow/deploying-models).



In [0]:
# Let's copy the file to our local directory and take a look at the file.
!gsutil cp {JOB_DIR}/model/deployment_config.yaml .


# Use the following command if it is a hyperparameter tuning job.
# TRIAL_ID=1
# !gsutil cp {JOB_DIR}/{TRIAL_ID}/model/deployment_config.yaml .

print('\nThe job deployment_config.yaml is:')
!cat deployment_config.yaml

Let's create the model and version in AI Platform:

In [0]:
!gcloud ai-platform models create {MODEL_NAME} --regions {REGION}

# Create a model and a version using the file above.
VERSION_NAME=JOB_ID

!echo "Deployment takes a couple of minutes. You can watch your deployment here: https://console.cloud.google.com/mlengine/models/{MODEL_NAME}"

# If {MODEL_NAME} and {VERSION_NAME} already exist, you can delete the version first:
# !gcloud ai-platform versions delete {VERSION_NAME} --model={MODEL_NAME}

!gcloud ai-platform versions create {VERSION_NAME} \
  --model {MODEL_NAME} \
  --config deployment_config.yaml



Now we can make a prediction using the deployed model and display the image with the detected bounding boxes.

In [0]:
import base64
import json 

with tf.gfile.Open(IMAGE_URI, 'rb') as image_file:
  encoded_string = base64.b64encode(image_file.read()).decode('utf-8')

image_bytes = {'b64': str(encoded_string)}
instances = {'encoded_image': image_bytes, 'key': '1'}
with open("prediction_instances.json","w") as f:
  f.write(json.dumps(instances)) 
  
!gcloud ai-platform predict --model $MODEL_NAME \
 --version $VERSION_NAME \
 --json-instances prediction_instances.json > ./output.txt
 
box_str=!tail -1 output.txt | sed 's/\]\]/\]\];/g' | cut -d ';' -f 1
class_str=!tail -1 output.txt | sed 's/\]\]/\]\];/g' | cut -d ';' -f 2- | cut -d ']' -f 1 | cut -d '[' -f 2-
score_str=!tail -1 output.txt | sed 's/\]\]/\]\];/g' | cut -d ';' -f 2- | cut -d ']' -f 2 | cut -d '[' -f 2-
cnt_str=!tail -1 output.txt | sed 's/\]\]/\]\];/g' | cut -d ';' -f 2- | cut -d ']' -f 3 |cut -d ' ' -f 5-
boxes=np.array(json.loads(box_str[0]))
# np.float was only an alias for the builtin float and is gone from newer NumPy releases.
classes=np.fromstring(class_str[0], dtype=float, sep=', ')
scores=np.fromstring(score_str[0], dtype=float, sep=', ')
cnt=np.fromstring(cnt_str[0], dtype=float, sep =' ')
sess=tf.Session()
with sess.as_default():
  image_with_boxes_clme = draw_boxes(img.eval(), boxes, classes, scores)
display_image(image_with_boxes_clme)
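
The request file written above holds one JSON instance whose image bytes are base64-encoded and wrapped as `{"b64": ...}`. Below is a minimal sketch of that step as a reusable function. The helper `build_instance` and the sample bytes are hypothetical, while the input names `encoded_image` and `key` are taken from the cell above.

```python
import base64
import json


def build_instance(image_bytes, key):
    """Return one JSON instance line with the image wrapped as {"b64": ...}."""
    encoded = base64.b64encode(image_bytes).decode('utf-8')
    return json.dumps({'encoded_image': {'b64': encoded}, 'key': key})


# Three placeholder bytes stand in for a real JPEG file.
line = build_instance(b'\xff\xd8\xff', '1')
print(line)
```

Writing one such line per image to `prediction_instances.json` keeps each instance on its own line.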
