# Training TensorPack Faster-RCNN/Mask-RCNN in Amazon SageMaker

This notebook shows how to train the [TensorPack Faster-RCNN/Mask-RCNN](https://github.com/tensorpack/tensorpack/tree/master/examples/FasterRCNN) example in [Amazon SageMaker](https://aws.amazon.com/sagemaker/).

First, we import the relevant Python packages and initialize a boto3 client. The ```role``` variable defined below can be set to a specific role ARN, or, as in the cell below, obtained from ```get_execution_role()```, which returns the IAM execution role attached to the SageMaker notebook instance. Creating a new role programmatically instead requires IAM permissions that allow the user to create a new role.

In [24]:
%%time
import boto3
import sagemaker
from sagemaker import get_execution_role
from sagemaker.estimator import Estimator

role = get_execution_role() # returns the notebook's execution role; alternatively, set a pre-existing role ARN here

client = boto3.client('sts')
account = client.get_caller_identity()['Account']

session = boto3.session.Session()
region = session.region_name

Next, we define the name and tag of the Docker image in Amazon ECR that SageMaker uses to launch the training job.

In [25]:
repo_name = 'tf_tp_hvd_sm'
image_tag = 'tf1.12-hvd0.15.2-tp2ce43d70456afd618167c9206d868c12e80e3f5a'
image_name = f'{repo_name}:{image_tag}'

training_image = f'{account}.dkr.ecr.{region}.amazonaws.com/{image_name}'
print(training_image)
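
The image URI above is assembled from four values, and `session.region_name` is `None` when no default region is configured, which would silently produce an invalid URI. As a minimal sketch, the check below fails early instead. The helper name `ecr_image_uri` and the account ID, region, and tag in the example call are placeholders for illustration, not part of the original notebook.

```python
def ecr_image_uri(account, region, repo_name, image_tag):
    """Build an ECR image URI, raising if any component is missing."""
    parts = {
        "account": account,
        "region": region,
        "repo_name": repo_name,
        "image_tag": image_tag,
    }
    # Collect the names of all empty or None components.
    missing = [name for name, value in parts.items() if not value]
    if missing:
        raise ValueError(f"missing value(s) for: {', '.join(missing)}")
    return f"{account}.dkr.ecr.{region}.amazonaws.com/{repo_name}:{image_tag}"


# Placeholder account ID, region, and tag, for illustration only.
example_uri = ecr_image_uri("123456789012", "us-west-2", "tf_tp_hvd_sm", "latest")
print(example_uri)
```

In the notebook itself, the same helper could be called with `account`, `region`, `repo_name`, and `image_tag` from the cells above.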

In the next step, we define the SageMaker input channels, using an S3 bucket as the data source.

In [26]:
bucket = '' # customize to your bucket name
prefix = "mask-rcnn/sagemaker" # customize to your prefix in your bucket

s3train = f's3://{bucket}/{prefix}/input/train'
s3code = f's3://{bucket}/{prefix}/input/code'


code = sagemaker.session.s3_input(s3code, distribution='FullyReplicated',
                                  content_type='text/plain', s3_data_type='S3Prefix')

train = sagemaker.session.s3_input(s3train, distribution='FullyReplicated',
                                   content_type='application/tfrecord', s3_data_type='S3Prefix')

data_channels = {'train': train, 'code': code}
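
Because `bucket` is left empty for the reader to fill in, the f-strings above would yield URIs such as `s3:///mask-rcnn/...` if it is forgotten. The following is a small sketch of a guard against that. The helper `s3_uri` and the bucket name `my-example-bucket` are hypothetical and only serve the illustration.

```python
def s3_uri(bucket, *parts):
    """Join a bucket name and key parts into an S3 URI, rejecting an empty bucket."""
    if not bucket:
        raise ValueError("bucket name is empty; set `bucket` before building S3 URIs")
    # Strip stray slashes so the parts join with exactly one '/' between them.
    key = "/".join(p.strip("/") for p in parts if p)
    return f"s3://{bucket}/{key}" if key else f"s3://{bucket}"


example_bucket = "my-example-bucket"    # hypothetical bucket name
example_prefix = "mask-rcnn/sagemaker"  # same prefix as in the cell above

example_train_uri = s3_uri(example_bucket, example_prefix, "input/train")
example_code_uri = s3_uri(example_bucket, example_prefix, "input/code")
print(example_train_uri)
print(example_code_uri)
```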

Next, we initialize the SageMaker session and define the model output location in the S3 bucket.

In [27]:
sess = sagemaker.Session()
s3_output_location = f's3://{bucket}/{prefix}/output'

Next, we define the hyper-parameters. The ```train_script``` hyper-parameter gives the path of the script that the SageMaker Docker image runs to launch the training job. The last element of the ```lr_schedule``` list specifies the total number of training steps. The ```eval_period``` hyper-parameter specifies how often, in epochs, evaluation is run, and ```steps_per_epoch``` specifies the number of steps per epoch.

In [28]:
hyperparameters = {
                    "train_script": "/opt/ml/input/data/code/train.py",
                    "batch_norm": "FreezeBN",
                    "mode_fpn": "True",
                    "mode_mask": "True",
                    "gpus_per_host": 8,
                    "eval_period": 4,
                    "lr_schedule": '[120000, 160000, 180000]',
                    "steps_per_epoch": 1875
                  }
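
To make the relationship between these values concrete, the sketch below derives the number of epochs and evaluation runs from the values used above. It takes the description at face value: it assumes the training script uses the last ```lr_schedule``` entry as the total step count without rescaling it (for example by GPU count) and evaluates once every ```eval_period``` epochs. The function name `schedule_summary` is made up for this illustration.

```python
import json

# Copy of the relevant hyper-parameters from the cell above.
example_hyperparameters = {
    "eval_period": 4,
    "lr_schedule": "[120000, 160000, 180000]",
    "steps_per_epoch": 1875,
}


def schedule_summary(hp):
    """Derive total steps, epochs, and evaluation count from the hyper-parameters."""
    lr_steps = json.loads(hp["lr_schedule"])  # the schedule is passed as a string
    total_steps = lr_steps[-1]                # last entry = total training steps
    total_epochs = total_steps // int(hp["steps_per_epoch"])
    return {
        "total_steps": total_steps,
        "total_epochs": total_epochs,
        "evaluations": total_epochs // int(hp["eval_period"]),
    }


summary = schedule_summary(example_hyperparameters)
print(summary)  # {'total_steps': 180000, 'total_epochs': 96, 'evaluations': 24}
```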

Next, we define the metric definitions: regular expressions that are matched against the training log output to extract the relevant metrics and publish them to Amazon CloudWatch.

In [29]:
metric_definitions=[
            {
                "Name": "fastrcnn_losses/box_loss",
                "Regex": ".*\\s+fastrcnn_losses/box_loss:\\s*(\\S+).*"
            },
            {
                "Name": "fastrcnn_losses/label_loss",
                "Regex": ".*\\s+fastrcnn_losses/label_loss:\\s*(\\S+).*"
            },
            {
                "Name": "fastrcnn_losses/label_metrics/accuracy",
                "Regex": ".*\\s+fastrcnn_losses/label_metrics/accuracy:\\s*(\\S+).*"
            },
            {
                "Name": "fastrcnn_losses/label_metrics/false_negative",
                "Regex": ".*\\s+fastrcnn_losses/label_metrics/false_negative:\\s*(\\S+).*"
            },
            {
                "Name": "fastrcnn_losses/label_metrics/fg_accuracy",
                "Regex": ".*\\s+fastrcnn_losses/label_metrics/fg_accuracy:\\s*(\\S+).*"
            },
            {
                "Name": "fastrcnn_losses/num_fg_label",
                "Regex": ".*\\s+fastrcnn_losses/num_fg_label:\\s*(\\S+).*"
            },
            {
                "Name": "maskrcnn_loss/accuracy",
                "Regex": ".*\\s+maskrcnn_loss/accuracy:\\s*(\\S+).*"
            },
            {
                "Name": "maskrcnn_loss/fg_pixel_ratio",
                "Regex": ".*\\s+maskrcnn_loss/fg_pixel_ratio:\\s*(\\S+).*"
            },
            {
                "Name": "maskrcnn_loss/maskrcnn_loss",
                "Regex": ".*\\s+maskrcnn_loss/maskrcnn_loss:\\s*(\\S+).*"
            },
            {
                "Name": "maskrcnn_loss/pos_accuracy",
                "Regex": ".*\\s+maskrcnn_loss/pos_accuracy:\\s*(\\S+).*"
            },
            {
                "Name": "mAP(bbox)/IoU=0.5",
                "Regex": ".*\\s+mAP\\(bbox\\)/IoU=0\\.5:\\s*(\\S+).*"
            },
            {
                "Name": "mAP(bbox)/IoU=0.5:0.95",
                "Regex": ".*\\s+mAP\\(bbox\\)/IoU=0\\.5:0\\.95:\\s*(\\S+).*"
            },
            {
                "Name": "mAP(bbox)/IoU=0.75",
                "Regex": ".*\\s+mAP\\(bbox\\)/IoU=0\\.75:\\s*(\\S+).*"
            },
            {
                "Name": "mAP(bbox)/large",
                "Regex": ".*\\s+mAP\\(bbox\\)/large:\\s*(\\S+).*"
            },
            {
                "Name": "mAP(bbox)/medium",
                "Regex": ".*\\s+mAP\\(bbox\\)/medium:\\s*(\\S+).*"
            },
            {
                "Name": "mAP(bbox)/small",
                "Regex": ".*\\s+mAP\\(bbox\\)/small:\\s*(\\S+).*"
            }
    ]
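
Before launching a long job, it can help to try the patterns on a few log lines locally. The sketch below uses Python's `re` module as a stand-in for however SageMaker evaluates the expressions, and the sample log lines are invented for illustration (the real TensorPack log format may differ). It copies three of the definitions above rather than all of them.

```python
import re

# Three of the definitions above, copied verbatim.
example_definitions = [
    {"Name": "fastrcnn_losses/box_loss",
     "Regex": ".*\\s+fastrcnn_losses/box_loss:\\s*(\\S+).*"},
    {"Name": "mAP(bbox)/IoU=0.5",
     "Regex": ".*\\s+mAP\\(bbox\\)/IoU=0\\.5:\\s*(\\S+).*"},
    {"Name": "mAP(bbox)/IoU=0.5:0.95",
     "Regex": ".*\\s+mAP\\(bbox\\)/IoU=0\\.5:0\\.95:\\s*(\\S+).*"},
]

# Hypothetical log lines; not taken from a real training run.
sample_lines = [
    "[0101 00:00:00 @monitor.py] fastrcnn_losses/box_loss: 0.1234",
    "[0101 00:00:00 @monitor.py] mAP(bbox)/IoU=0.5:0.95: 0.381",
]


def extract(definitions, line):
    """Return {metric name: captured text} for every pattern that matches the line."""
    captured = {}
    for definition in definitions:
        match = re.search(definition["Regex"], line)
        if match:
            captured[definition["Name"]] = match.group(1)
    return captured


results = [extract(example_definitions, line) for line in sample_lines]
for line, captured in zip(sample_lines, results):
    print(line, "->", captured)
```

On the second sample line, the `mAP(bbox)/IoU=0.5` pattern also matches and captures the non-numeric text `0.95:`, because `IoU=0.5:` is a prefix of `IoU=0.5:0.95:`. Whether this matters depends on how non-numeric captures are handled downstream, which this sketch does not verify. Requiring at least one whitespace character after the colon (`\\s+` instead of `\\s*`) would be one way to avoid the overlap.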

Next, we define the SageMaker Estimator, which specifies the configuration of the SageMaker training job.

In [30]:

obj_detection = Estimator(training_image,
                          role,
                          train_instance_count=2,
                          train_instance_type='ml.p3.16xlarge',
                          train_volume_size=200,   # EBS volume size in GB
                          train_max_run=400000,    # maximum run time in seconds
                          input_mode='File',
                          output_path=s3_output_location,
                          sagemaker_session=sess,
                          hyperparameters=hyperparameters,
                          metric_definitions=metric_definitions,
                          base_job_name="mask-rcnn")
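
The estimator settings and the ```gpus_per_host``` hyper-parameter need to agree with each other. As a rough sanity check, the sketch below computes the total GPU count and converts the run-time limit to days. The `GPUS_PER_INSTANCE` table and the `cluster_summary` helper are illustrative additions covering only the ml.p3 family.

```python
# GPUs per instance for the ml.p3 family (illustrative lookup table).
GPUS_PER_INSTANCE = {
    "ml.p3.2xlarge": 1,
    "ml.p3.8xlarge": 4,
    "ml.p3.16xlarge": 8,
}


def cluster_summary(instance_type, instance_count, gpus_per_host, max_run_seconds):
    """Check gpus_per_host against the instance type and summarize the cluster."""
    available = GPUS_PER_INSTANCE[instance_type]
    if gpus_per_host > available:
        raise ValueError(
            f"gpus_per_host={gpus_per_host} exceeds the {available} GPU(s) "
            f"on {instance_type}"
        )
    return {
        "total_gpus": instance_count * gpus_per_host,
        "max_run_days": round(max_run_seconds / 86400, 2),  # 86400 seconds per day
    }


# Values taken from the estimator and hyper-parameters above.
cluster = cluster_summary("ml.p3.16xlarge", 2, 8, 400000)
print(cluster)  # {'total_gpus': 16, 'max_run_days': 4.63}
```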



Finally, we launch the SageMaker training job.

In [None]:
obj_detection.fit(inputs=data_channels, logs=True)
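
Once the job completes, SageMaker uploads the model artifact under the configured output path. The sketch below shows where to look, assuming the usual layout of `<output_path>/<job name>/output/model.tar.gz`. The helper `model_artifact_uri`, the bucket name, and the job name in the example are hypothetical, since the actual job name is generated from ```base_job_name``` at launch time.

```python
def model_artifact_uri(output_path, job_name):
    """Expected S3 location of the model artifact for a finished training job."""
    return f"{output_path.rstrip('/')}/{job_name}/output/model.tar.gz"


# Hypothetical bucket and job name, for illustration only.
example_artifact = model_artifact_uri(
    "s3://my-example-bucket/mask-rcnn/sagemaker/output",
    "mask-rcnn-2019-01-01-00-00-00-000",
)
print(example_artifact)
```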