# Distributed Training of Mask-RCNN on Amazon SageMaker using FSx


---

This notebook's CI test result for us-west-2 is as follows. CI test results in other regions can be found at the end of the notebook. 

![This us-west-2 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/us-west-2/advanced_functionality|distributed_tensorflow_mask_rcnn|mask-rcnn-scriptmode-fsx.ipynb)

---


This notebook is a step-by-step tutorial on distributed training of [Mask R-CNN](https://arxiv.org/abs/1703.06870) implemented in the [TensorFlow](https://www.tensorflow.org/) framework.

Concretely, we describe the steps for training [TensorPack Faster-RCNN/Mask-RCNN](https://github.com/tensorpack/tensorpack/tree/master/examples/FasterRCNN) and [AWS Samples Mask R-CNN](https://github.com/aws-samples/mask-rcnn-tensorflow) on [Amazon SageMaker](https://aws.amazon.com/sagemaker/), using [Amazon S3](https://aws.amazon.com/s3/) and the [Amazon FSx for Lustre](https://aws.amazon.com/fsx/lustre/) file-system as data sources.

The outline of steps is as follows:

1. Stage COCO 2017 dataset on [Amazon S3](https://aws.amazon.com/s3/)
2. Create an Amazon FSx for Lustre file-system and import data into it from S3
3. Build Docker training image and push it to [Amazon ECR](https://aws.amazon.com/ecr/)
4. Configure data input channels
5. Configure hyper-parameters
6. Define training metrics
7. Define training job and start training


## Initialize SageMaker Session

First, specify the ```s3_bucket``` to use throughout the notebook. The bucket must be located in the same region as this notebook instance. If you leave `s3_bucket` set to `None`, **the default SageMaker bucket is used, if it exists**. We also initialize the SageMaker session.

In [None]:
import os
import time
import boto3
import sagemaker
from sagemaker import get_execution_role
from sagemaker.tensorflow.estimator import TensorFlow

s3_bucket  = None # your-s3-bucket-name

role = get_execution_role() # you may provide a pre-existing role ARN here
print(f"SageMaker Execution Role: {role}")

session = boto3.session.Session()
aws_region = session.region_name
print(f"AWS Region: {aws_region}")

sagemaker_session = sagemaker.session.Session(boto_session=session)

if s3_bucket is None:
    s3_bucket = sagemaker_session.default_bucket()
    
print(f"Using S3 bucket: {s3_bucket}")

try:
    s3_client = boto3.client('s3')
    response = s3_client.get_bucket_location(Bucket=s3_bucket)
    bucket_region = response['LocationConstraint']
    bucket_region = 'us-east-1' if bucket_region is None else bucket_region
    
    print(f"Bucket region: {bucket_region}")
except Exception as error:
    # Report the underlying error instead of silently swallowing it
    print(f"Access Error: Check if '{s3_bucket}' S3 bucket is in '{aws_region}' region ({error})")
    
sts = boto3.client("sts")
aws_account_id = sts.get_caller_identity()["Account"]

print(f"Account: {aws_account_id}")
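The cell above prints the bucket region but leaves the comparison with the notebook region to the reader. A minimal sketch of that comparison follows. The helper names `normalize_bucket_region` and `bucket_matches_region` are hypothetical and not part of the notebook. The handling of the legacy `EU` value is an addition that the notebook code does not perform.

```python
def normalize_bucket_region(location_constraint):
    """Map an S3 LocationConstraint value to a region name (hypothetical helper)."""
    # S3 reports buckets in us-east-1 with an empty (None) constraint.
    if not location_constraint:
        return "us-east-1"
    # Older buckets in eu-west-1 may report the legacy value "EU".
    if location_constraint == "EU":
        return "eu-west-1"
    return location_constraint


def bucket_matches_region(location_constraint, notebook_region):
    """Return True when the bucket is in the same region as the notebook."""
    return normalize_bucket_region(location_constraint) == notebook_region


# Example values only; in the notebook these would come from
# response['LocationConstraint'] and aws_region.
print(bucket_matches_region(None, "us-east-1"))
print(bucket_matches_region("us-west-2", "us-east-1"))
```

In the notebook, the result could be used to stop early with a clear message rather than failing later during the data upload.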

## Configure VPC Security Group and Subnet

To use the FSx for Lustre file-system, you need to specify a VPC security group and subnet. If you do not specify both, the notebook uses the Amazon S3 data channel.

In [None]:
security_group_id =  None # 'sg-xxxxxxxx'
subnet_id =  None # 'subnet-xxxxxxx'
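Because the choice between the FSx and S3 data channels depends on these two values, a quick sanity check can catch typos early. The sketch below is one way to do it. The function `select_data_source` is hypothetical. The ID patterns (a prefix followed by 8 or 17 hexadecimal characters) are an assumption about the usual shape of these IDs. Unlike the notebook, which quietly falls back to S3 when only one value is set, this sketch raises an error in that case.

```python
import re

# Assumed ID shapes: prefix followed by 8 or 17 lowercase hex characters.
_SG_PATTERN = re.compile(r"^sg-(?:[0-9a-f]{8}|[0-9a-f]{17})$")
_SUBNET_PATTERN = re.compile(r"^subnet-(?:[0-9a-f]{8}|[0-9a-f]{17})$")


def select_data_source(security_group_id, subnet_id):
    """Return 'fsx' when both IDs are set and well-formed, 's3' when neither is set."""
    if security_group_id is None and subnet_id is None:
        return "s3"
    if security_group_id is None or subnet_id is None:
        raise ValueError("Set both security_group_id and subnet_id, or neither")
    if not _SG_PATTERN.match(security_group_id):
        raise ValueError(f"Unexpected security group ID: {security_group_id}")
    if not _SUBNET_PATTERN.match(subnet_id):
        raise ValueError(f"Unexpected subnet ID: {subnet_id}")
    return "fsx"


print(select_data_source(None, None))
print(select_data_source("sg-0123abcd", "subnet-0123abcd"))
```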

## Stage COCO 2017 dataset on Amazon S3

We use the [COCO 2017 dataset](http://cocodataset.org/#home) for training. We download the COCO 2017 training and validation datasets to this notebook instance, extract the files from the dataset archives, and upload the extracted files to your Amazon [S3 bucket](https://docs.aws.amazon.com/AmazonS3/latest/gsg/CreatingABucket.html). This is done only once for a given S3 bucket. The expected time to execute this step is 30 minutes.

In [None]:
%%time

import sys, os, subprocess

key="mask-rcnn/sagemaker/input/train/pretrained-models/ImageNet-R50-AlignPadding.npz"
response = None

try:
    response = s3_client.head_object(Bucket=s3_bucket, Key=key)
except Exception:
    # A missing or inaccessible object means the data has not been staged yet
    pass

file_size = response.get('ContentLength', 0) if response else 0

if file_size == 0:
    print(f"Uploading data to s3://{s3_bucket}/mask-rcnn/sagemaker/input/train/")
    print("Estimated time: 30 minutes")
    subprocess.check_call(['./prepare-s3-bucket.sh', s3_bucket], 
                          stderr=subprocess.DEVNULL, stdout=subprocess.DEVNULL)
    print(f"Uploaded data to s3://{s3_bucket}/mask-rcnn/sagemaker/input/train/")
else:
    print("Nothing to do: S3 bucket already has the data")

## Create FSx Lustre file-system and import data from S3

Below, we use an [AWS CloudFormation stack](https://docs.aws.amazon.com/en_pv/AWSCloudFormation/latest/UserGuide/stacks.html) to create an FSx for Lustre file-system and import the COCO 2017 dataset into it from your S3 bucket. **If you did not specify a VPC security group and subnet above, the notebook will use the S3 data channel.**

#### Note
For this step, the [IAM Role](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles.html) attached to this notebook instance needs full access to the AWS CloudFormation and Amazon FSx services. If you created this notebook instance using the [stack-sm.sh](stack-sm.sh) script, the IAM Role attached to this notebook instance is already set up with the required access.

In [None]:
import time

fsx_stack_name = f"fsx-{str(time.time()).replace('.','-')}"

fsx_file_system_id = None
if security_group_id and subnet_id:
    print(f"Creating FSx stack: {fsx_stack_name}")
    subprocess.check_call(['./stack-fsx.sh', 
                           aws_region, 
                           f"s3://{s3_bucket}/mask-rcnn/sagemaker/input", 
                           "3600", 
                           subnet_id, 
                           security_group_id,
                           fsx_stack_name],
                          stderr=subprocess.DEVNULL)
    
    cloudformation = boto3.client("cloudformation")
    response = cloudformation.describe_stacks(StackName=fsx_stack_name)
    stacks = response.get('Stacks', None)
    
    if stacks:
        fsx_stack = stacks[0]
        outputs = fsx_stack['Outputs']
        for output in outputs:
            key = output['OutputKey']
            value = output['OutputValue']

            if key == 'FSxFileSystemId':
                fsx_file_system_id = value
                break

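    if fsx_file_system_id:
        print(f"Created FSx Lustre: {fsx_file_system_id}")
    else:
        print(f"FSx file-system ID not found in outputs of stack: {fsx_stack_name}")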
else:
    print(f"FSx Lustre not created. Security group: {security_group_id}, subnet: {subnet_id}")
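The lookup of the `FSxFileSystemId` output in the cell above can be factored into a small function that tolerates a response with no stacks or no `Outputs` key. The sketch below works on a plain dictionary shaped like a `describe_stacks` response. The function name `get_stack_output` and the sample values, including the file-system ID, are made up for illustration.

```python
def get_stack_output(describe_stacks_response, output_key):
    """Return the value of a named stack output, or None if it is absent."""
    stacks = describe_stacks_response.get("Stacks") or []
    if not stacks:
        return None
    # A stack may have no "Outputs" entry at all, so default to an empty list.
    for output in stacks[0].get("Outputs", []):
        if output.get("OutputKey") == output_key:
            return output.get("OutputValue")
    return None


# Hypothetical response; real values come from cloudformation.describe_stacks(...)
sample_response = {
    "Stacks": [
        {
            "StackName": "fsx-example",
            "Outputs": [
                {"OutputKey": "FSxFileSystemId", "OutputValue": "fs-0123456789abcdef0"}
            ],
        }
    ]
}

print(get_stack_output(sample_response, "FSxFileSystemId"))
print(get_stack_output({"Stacks": [{"StackName": "fsx-example"}]}, "FSxFileSystemId"))
```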
    

## Specify Model Type

We have a choice of two different models:

1. [TensorPack Faster-RCNN/Mask-RCNN](https://github.com/tensorpack/tensorpack/tree/master/examples/FasterRCNN) implementation supports a maximum per-GPU batch size of 1.

2. [AWS Samples Mask R-CNN](https://github.com/aws-samples/mask-rcnn-tensorflow) is an optimized implementation that supports a maximum per-GPU batch size of 4, assuming 32 GB of memory per GPU.

Below, set `model_type` to either `"aws-samples-mask-rcnn"` or `"tensorpack-mask-rcnn"`.


In [None]:
# Select the model type you want to use
model_type = "aws-samples-mask-rcnn" # "tensorpack-mask-rcnn"

## Build and push SageMaker Training Image to ECR

Next, we build the training image for the selected model type and push it to Amazon ECR. The first build on this notebook instance may take several minutes. We also set the `training_script` based on the selected model type.

**Note:**
For this step, the [IAM Role](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles.html) attached to this notebook instance needs full access to the Amazon ECR service. If you created this notebook instance using the [stack-sm.sh](stack-sm.sh) script, the IAM Role attached to this notebook instance is already set up with full access to the ECR service.

In [None]:
%%time
import sys, os, subprocess

with open("training-image-build.log", "w") as logfile:
    if "tensorpack" in model_type:
        print("Building and pushing Tensorpack Faster-RCNN/Mask-RCNN docker image to ECR")
        subprocess.check_call(['./container-script-mode/build_tools/build_and_push.sh', 
                               aws_region], stdout=logfile, stderr=subprocess.STDOUT)
        
        image_tag = !cat ./container-script-mode/build_tools/set_env.sh \
            | grep 'IMAGE_TAG' | sed 's/.*IMAGE_TAG=\(.*\)/\1/'
        
        image_name="mask-rcnn-tensorpack-sagemaker-script-mode"
        full_name=f"{aws_account_id}.dkr.ecr.{aws_region}.amazonaws.com/{image_name}"
        tensorpack_image = f"{full_name}:{image_tag[0]}"
        training_image = tensorpack_image
        training_script= "tensorpack-mask-rcnn.py"

    else:
        print("Building and pushing AWS Samples Mask R-CNN docker image to ECR")
        subprocess.check_call(['./container-optimized-script-mode/build_tools/build_and_push.sh',
                               aws_region], stdout=logfile, stderr=subprocess.STDOUT)
        
        image_tag = !cat ./container-optimized-script-mode/build_tools/set_env.sh \
            | grep 'IMAGE_TAG' | sed 's/.*IMAGE_TAG=\(.*\)/\1/'
        
        image_name="mask-rcnn-tensorflow-sagemaker-script-mode"
        full_name=f"{aws_account_id}.dkr.ecr.{aws_region}.amazonaws.com/{image_name}"
        aws_samples_image = f"{full_name}:{image_tag[0]}"
       
        training_image = aws_samples_image
        training_script= "aws-mask-rcnn.py" 

print(f"Training Image: {training_image}")
print(f"Training Script: {training_script}")
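The cell above reads `IMAGE_TAG` with a shell pipeline and then assembles the ECR image URI. The same two steps can be sketched in plain Python, which may be easier to test outside a notebook. The function names `parse_image_tag` and `ecr_image_uri` are hypothetical, and the contents of `sample_env` are invented, since the actual `set_env.sh` is not shown here. Unlike the `sed` expression in the cell, this sketch also strips surrounding quotes from the tag.

```python
import re


def parse_image_tag(set_env_text):
    """Return the value assigned to IMAGE_TAG in shell-style text, or None."""
    for line in set_env_text.splitlines():
        match = re.match(r"^\s*(?:export\s+)?IMAGE_TAG=(.*)$", line)
        if match:
            # Drop whitespace and optional quotes around the value.
            return match.group(1).strip().strip("'\"")
    return None


def ecr_image_uri(account_id, region, image_name, image_tag):
    """Build the image URI in the same format used in the notebook cell."""
    return f"{account_id}.dkr.ecr.{region}.amazonaws.com/{image_name}:{image_tag}"


# Invented file contents and account ID, for illustration only.
sample_env = "export OTHER=1\nexport IMAGE_TAG=example-tag\n"
tag = parse_image_tag(sample_env)
print(ecr_image_uri("123456789012", "us-west-2",
                    "mask-rcnn-tensorflow-sagemaker-script-mode", tag))
```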


## Define SageMaker Data Channels

We define a `train` data channel for Amazon S3 and, if the FSx for Lustre file-system is available, a second `train` data channel for Amazon FSx.

The training job uses the S3 data channel only if the FSx for Lustre file-system is not available.

### Define S3 Train Data Channel

We first define S3 `train` data channel below.

In [None]:
from sagemaker.inputs import TrainingInput

prefix = "mask-rcnn/sagemaker"  # prefix in your S3 bucket

s3train = f"s3://{s3_bucket}/{prefix}/input/train"
train_input = TrainingInput(
    s3_data=s3train, distribution="FullyReplicated", s3_data_type="S3Prefix", input_mode="File"
)

s3_data_channels = {"train": train_input}

### Define Amazon FSx Lustre Train Data Channel 

Next, we define the *train* data channel using FSx Lustre file-system, if Amazon FSx Lustre file-system is available.

In [None]:
from sagemaker.inputs import FileSystemInput

fsx_data_channels = None

if fsx_file_system_id:
    # Specify directory path for input data on the file system. 
    # You need to provide normalized and absolute path below.
    file_system_directory_path = '/fsx/mask-rcnn/sagemaker/input/train'
    print(f'FSx file-system data input path: {file_system_directory_path}')

    # Specify the access mode of the mount of the directory associated with the file system. 
    # Directory must be mounted 'ro'(read-only).
    file_system_access_mode = 'ro'

    # Specify your file system type.
    file_system_type = 'FSxLustre'

    train = FileSystemInput(file_system_id=fsx_file_system_id,
                            file_system_type=file_system_type,
                            directory_path=file_system_directory_path,
                            file_system_access_mode=file_system_access_mode)

    fsx_data_channels = {'train': train}

    # Optionally you can create a log channel on FSx file-system
    # To create a log channel, follow the steps below and uncomment relevant code
    # Specify directory path for log output on the file system.
    # This directory must exist, be empty and be writable
    # You need to provide normalized and absolute path below.
    # For example, '/fsx/mask-rcnn/sagemaker/output/log', 
    # assuming this directory exists, is empty and is writeable
    # file_system_directory_path = 


    # Specify the access mode of the mount of the directory associated with the file system. 
    # Directory must be mounted 'rw'(read-write).
    # file_system_access_mode = 'rw'

    # log = FileSystemInput(file_system_id=fsx_file_system_id,
    #                       file_system_type=file_system_type,
    #                       directory_path=file_system_directory_path,
    #                       file_system_access_mode=file_system_access_mode)

    #fsx_data_channels = {'train': train, 'log': log}
else:
    print("FSx for Lustre file-system is not available")
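The comments in the cell above ask for a normalized, absolute directory path. A local check along those lines can be sketched with the standard library. The helper `is_normalized_absolute` is hypothetical, and the service's own validation may be stricter or more lenient than this check.

```python
import posixpath


def is_normalized_absolute(path):
    """Return True for absolute POSIX paths with no '..', '.', '//' or trailing '/'."""
    if not path.startswith("/") or path.startswith("//"):
        return False
    # normpath collapses redundant separators and up-level references,
    # so a normalized path is unchanged by it.
    return posixpath.normpath(path) == path


for candidate in [
    "/fsx/mask-rcnn/sagemaker/input/train",
    "fsx/mask-rcnn/sagemaker/input/train",
    "/fsx/mask-rcnn/../mask-rcnn/sagemaker/input/train",
    "/fsx/mask-rcnn/sagemaker/input/train/",
]:
    print(candidate, is_normalized_absolute(candidate))
```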

### Define Model Output Location

Next, we define the model output location in S3 bucket.

In [None]:
prefix = "mask-rcnn/sagemaker"  # prefix in your bucket
s3_output_location = f"s3://{s3_bucket}/{prefix}/output"
print(f"Model output location: {s3_output_location}")

## Configure Hyper-parameters
Next, we define the hyper-parameters.

Note, some hyper-parameters are different between the two implementations. The batch size per GPU in TensorPack Faster-RCNN/Mask-RCNN is fixed at 1, but is configurable in AWS Samples Mask-RCNN. The learning rate schedule is specified in units of steps in TensorPack Faster-RCNN/Mask-RCNN, but in epochs in AWS Samples Mask-RCNN.

The default learning rate schedule values shown below correspond to training for a total of 24 epochs, at 120,000 images per epoch.
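The relation between a step-based schedule and epochs can be checked with a little arithmetic. The sketch below converts step boundaries to epochs. It assumes that the step values are expressed for a total batch size of 8 images per step. That reference batch size is an assumption, not something stated in this notebook, and the helper `steps_to_epochs` is hypothetical.

```python
IMAGES_PER_EPOCH = 120000      # default from the hyper-parameter tables
REFERENCE_BATCH_SIZE = 8       # assumption: schedule expressed for 8 images per step


def steps_to_epochs(steps, images_per_epoch=IMAGES_PER_EPOCH,
                    reference_batch_size=REFERENCE_BATCH_SIZE):
    """Convert a step count to epochs under the assumed reference batch size."""
    return steps * reference_batch_size / images_per_epoch


lr_schedule_steps = [240000, 320000, 360000]
for boundary in lr_schedule_steps:
    print(boundary, "steps ->", round(steps_to_epochs(boundary), 2), "epochs")
```

Under that assumption, the final boundary of 360,000 steps corresponds to 24 epochs, which matches the total stated above.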

<table align='left'>
    <caption>TensorPack Faster-RCNN/Mask-RCNN Hyper-parameters</caption>
    <tr>
    <th style="text-align:center">Hyper-parameter</th>
    <th style="text-align:center">Description</th>
    <th style="text-align:center">Default</th>
    </tr>
    <tr>
        <td style="text-align:center">mode_fpn</td>
        <td style="text-align:left">Flag to indicate use of Feature Pyramid Network (FPN) in the Mask R-CNN model backbone</td>
        <td style="text-align:center">"True"</td>
    </tr>
     <tr>
        <td style="text-align:center">mode_mask</td>
        <td style="text-align:left">A value of "False" means Faster-RCNN model, "True" means Mask R-CNN model</td>
        <td style="text-align:center">"True"</td>
    </tr>
     <tr>
        <td style="text-align:center">eval_period</td>
        <td style="text-align:left">Evaluation period during training, in epochs</td>
        <td style="text-align:center">1</td>
    </tr>
    <tr>
        <td style="text-align:center">lr_schedule</td>
        <td style="text-align:left">Learning rate schedule in training steps</td>
        <td style="text-align:center">'[240000, 320000, 360000]'</td>
    </tr>
    <tr>
        <td style="text-align:center">batch_norm</td>
        <td style="text-align:left">Batch normalization option ('FreezeBN', 'SyncBN', 'GN', 'None') </td>
        <td style="text-align:center">'FreezeBN'</td>
    </tr>
    <tr>
        <td style="text-align:center">images_per_epoch</td>
        <td style="text-align:left">Images per epoch </td>
        <td style="text-align:center">120000</td>
    </tr>
    <tr>
        <td style="text-align:center">data_train</td>
        <td style="text-align:left">Training data under data directory</td>
        <td style="text-align:center">'coco_train2017'</td>
    </tr>
    <tr>
        <td style="text-align:center">data_val</td>
        <td style="text-align:left">Validation data under data directory</td>
        <td style="text-align:center">'coco_val2017'</td>
    </tr>
    <tr>
        <td style="text-align:center">resnet_arch</td>
        <td style="text-align:left">Must be 'resnet50' or 'resnet101'</td>
        <td style="text-align:center">'resnet50'</td>
    </tr>
    <tr>
        <td style="text-align:center">backbone_weights</td>
        <td style="text-align:left">ResNet backbone weights</td>
        <td style="text-align:center">'ImageNet-R50-AlignPadding.npz'</td>
    </tr>
    <tr>
        <td style="text-align:center">load_model</td>
        <td style="text-align:left">Pre-trained model to load</td>
        <td style="text-align:center"></td>
    </tr>
    <tr>
        <td style="text-align:center">config:</td>
        <td style="text-align:left">Any hyper-parameter prefixed with <b>config:</b> is set as a model config parameter</td>
        <td style="text-align:center"></td>
    </tr>
</table>

    
<table align='left'>
    <caption>AWS Samples Mask-RCNN Hyper-parameters</caption>
    <tr>
    <th style="text-align:center">Hyper-parameter</th>
    <th style="text-align:center">Description</th>
    <th style="text-align:center">Default</th>
    </tr>
    <tr>
        <td style="text-align:center">mode_fpn</td>
        <td style="text-align:left">Flag to indicate use of Feature Pyramid Network (FPN) in the Mask R-CNN model backbone</td>
        <td style="text-align:center">"True"</td>
    </tr>
     <tr>
        <td style="text-align:center">mode_mask</td>
        <td style="text-align:left">A value of "False" means Faster-RCNN model, "True" means Mask R-CNN model</td>
        <td style="text-align:center">"True"</td>
    </tr>
     <tr>
        <td style="text-align:center">eval_period</td>
        <td style="text-align:left">Evaluation period during training, in epochs</td>
        <td style="text-align:center">1</td>
    </tr>
    <tr>
        <td style="text-align:center">lr_epoch_schedule</td>
        <td style="text-align:left">Learning rate schedule in epochs</td>
        <td style="text-align:center">'[(16, 0.1), (20, 0.01), (24, None)]'</td>
    </tr>
    <tr>
        <td style="text-align:center">batch_size_per_gpu</td>
        <td style="text-align:left">Batch size per GPU (minimum 1, maximum 4)</td>
        <td style="text-align:center">4</td>
    </tr>
    <tr>
        <td style="text-align:center">batch_norm</td>
        <td style="text-align:left">Batch normalization option ('FreezeBN', 'SyncBN', 'GN', 'None') </td>
        <td style="text-align:center">'FreezeBN'</td>
    </tr>
    <tr>
        <td style="text-align:center">images_per_epoch</td>
        <td style="text-align:left">Images per epoch </td>
        <td style="text-align:center">120000</td>
    </tr>
    <tr>
        <td style="text-align:center">data_train</td>
        <td style="text-align:left">Training data under data directory</td>
        <td style="text-align:center">'train2017'</td>
    </tr>
    <tr>
        <td style="text-align:center">data_val</td>
        <td style="text-align:left">Validation data under data directory</td>
        <td style="text-align:center">'val2017'</td>
    </tr>
    <tr>
        <td style="text-align:center">resnet_arch</td>
        <td style="text-align:left">Must be 'resnet50' or 'resnet101'</td>
        <td style="text-align:center">'resnet50'</td>
    </tr>
    <tr>
        <td style="text-align:center">backbone_weights</td>
        <td style="text-align:left">ResNet backbone weights</td>
        <td style="text-align:center">'ImageNet-R50-AlignPadding.npz'</td>
    </tr>
    <tr>
        <td style="text-align:center">load_model</td>
        <td style="text-align:left">Pre-trained model to load</td>
        <td style="text-align:center"></td>
    </tr>
    <tr>
        <td style="text-align:center">config:</td>
        <td style="text-align:left">Any hyper-parameter prefixed with <b>config:</b> is set as a model config parameter</td>
        <td style="text-align:center"></td>
    </tr>
</table>

In [None]:
hyperparameters = {
    "mode_fpn": "True",
    "mode_mask": "True",
    "eval_period": 1,
    "batch_norm": "FreezeBN"
}
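The dictionary above holds only the hyper-parameters shared by both implementations. As a sketch of how the model-specific learning rate schedule could be added, the hypothetical helper below picks the parameter name and default value from the tables above based on the model type. The values are written as strings, as in the tables. Whether the training scripts need them passed explicitly is not covered here.

```python
def with_schedule_defaults(base_hyperparameters, model_type):
    """Return a copy of the hyper-parameters with the model-specific LR schedule."""
    params = dict(base_hyperparameters)
    if "tensorpack" in model_type:
        # TensorPack implementation: schedule in training steps
        params["lr_schedule"] = "[240000, 320000, 360000]"
    else:
        # AWS Samples implementation: schedule in epochs
        params["lr_epoch_schedule"] = "[(16, 0.1), (20, 0.01), (24, None)]"
    return params


base = {"mode_fpn": "True", "mode_mask": "True", "eval_period": 1, "batch_norm": "FreezeBN"}
print(with_schedule_defaults(base, "aws-samples-mask-rcnn"))
print(with_schedule_defaults(base, "tensorpack-mask-rcnn"))
```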

## Define Training Metrics
Next, we define the regular expressions that SageMaker uses to extract algorithm metrics from the training logs and send them to [Amazon CloudWatch metrics](https://docs.aws.amazon.com/en_pv/AmazonCloudWatch/latest/monitoring/working_with_metrics.html). These algorithm metrics are visualized in the SageMaker console.

In [None]:
metric_definitions = [
    {"Name": "fastrcnn_losses/box_loss", "Regex": ".*fastrcnn_losses/box_loss:\\s*(\\S+).*"},
    {"Name": "fastrcnn_losses/label_loss", "Regex": ".*fastrcnn_losses/label_loss:\\s*(\\S+).*"},
    {
        "Name": "fastrcnn_losses/label_metrics/accuracy",
        "Regex": ".*fastrcnn_losses/label_metrics/accuracy:\\s*(\\S+).*",
    },
    {
        "Name": "fastrcnn_losses/label_metrics/false_negative",
        "Regex": ".*fastrcnn_losses/label_metrics/false_negative:\\s*(\\S+).*",
    },
    {
        "Name": "fastrcnn_losses/label_metrics/fg_accuracy",
        "Regex": ".*fastrcnn_losses/label_metrics/fg_accuracy:\\s*(\\S+).*",
    },
    {
        "Name": "fastrcnn_losses/num_fg_label",
        "Regex": ".*fastrcnn_losses/num_fg_label:\\s*(\\S+).*",
    },
    {"Name": "maskrcnn_loss/accuracy", "Regex": ".*maskrcnn_loss/accuracy:\\s*(\\S+).*"},
    {
        "Name": "maskrcnn_loss/fg_pixel_ratio",
        "Regex": ".*maskrcnn_loss/fg_pixel_ratio:\\s*(\\S+).*",
    },
    {"Name": "maskrcnn_loss/maskrcnn_loss", "Regex": ".*maskrcnn_loss/maskrcnn_loss:\\s*(\\S+).*"},
    {"Name": "maskrcnn_loss/pos_accuracy", "Regex": ".*maskrcnn_loss/pos_accuracy:\\s*(\\S+).*"},
    {"Name": "mAP(bbox)/IoU=0.5", "Regex": ".*mAP\\(bbox\\)/IoU=0\\.5:\\s*(\\S+).*"},
    {"Name": "mAP(bbox)/IoU=0.5:0.95", "Regex": ".*mAP\\(bbox\\)/IoU=0\\.5:0\\.95:\\s*(\\S+).*"},
    {"Name": "mAP(bbox)/IoU=0.75", "Regex": ".*mAP\\(bbox\\)/IoU=0\\.75:\\s*(\\S+).*"},
    {"Name": "mAP(bbox)/large", "Regex": ".*mAP\\(bbox\\)/large:\\s*(\\S+).*"},
    {"Name": "mAP(bbox)/medium", "Regex": ".*mAP\\(bbox\\)/medium:\\s*(\\S+).*"},
    {"Name": "mAP(bbox)/small", "Regex": ".*mAP\\(bbox\\)/small:\\s*(\\S+).*"},
    {"Name": "mAP(segm)/IoU=0.5", "Regex": ".*mAP\\(segm\\)/IoU=0\\.5:\\s*(\\S+).*"},
    {"Name": "mAP(segm)/IoU=0.5:0.95", "Regex": ".*mAP\\(segm\\)/IoU=0\\.5:0\\.95:\\s*(\\S+).*"},
    {"Name": "mAP(segm)/IoU=0.75", "Regex": ".*mAP\\(segm\\)/IoU=0\\.75:\\s*(\\S+).*"},
    {"Name": "mAP(segm)/large", "Regex": ".*mAP\\(segm\\)/large:\\s*(\\S+).*"},
    {"Name": "mAP(segm)/medium", "Regex": ".*mAP\\(segm\\)/medium:\\s*(\\S+).*"},
    {"Name": "mAP(segm)/small", "Regex": ".*mAP\\(segm\\)/small:\\s*(\\S+).*"},
]
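To see what these patterns capture, they can be tried locally against sample log lines with Python's `re` module. The log lines below are invented for illustration and the helper `extract_metric` is hypothetical. Real training logs may be formatted differently, and how SageMaker applies the patterns internally is not shown here.

```python
import re


def extract_metric(regex, log_line):
    """Return the first captured group for a metric regex, or None if no match."""
    match = re.search(regex, log_line)
    return match.group(1) if match else None


accuracy_regex = ".*maskrcnn_loss/accuracy:\\s*(\\S+).*"
map_05_regex = ".*mAP\\(bbox\\)/IoU=0\\.5:\\s*(\\S+).*"

# Invented log lines
print(extract_metric(accuracy_regex, "maskrcnn_loss/accuracy: 0.8731"))
print(extract_metric(map_05_regex, "mAP(bbox)/IoU=0.5: 0.55"))

# The IoU=0.5 pattern also matches a line for IoU=0.5:0.95,
# where the captured text is not a number.
print(extract_metric(map_05_regex, "mAP(bbox)/IoU=0.5:0.95: 0.37"))
```

The last call shows that a pattern ending in `IoU=0\.5:` can also match lines for the `IoU=0.5:0.95` metric. Testing patterns this way before launching a long training job can surface such overlaps.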

## Define SageMaker Training Job

Next, we use SageMaker [Estimator](https://sagemaker.readthedocs.io/en/stable/estimators.html) API to define a SageMaker Training Job. 


### Define training job

We recommend using 16 GPUs for the training job. Each `ml.p3.16xlarge` instance has 8 GPUs, so we set ```instance_count=2```. We also recommend a 100 GB [Amazon EBS](https://aws.amazon.com/ebs/) storage volume for each training instance, so we set ```volume_size = 100```.

In [None]:

security_group_ids = [ security_group_id ] if security_group_id else None 
subnets = [ subnet_id ] if subnet_id else None

instance_type = 'ml.p3.16xlarge'  # You may optionally use 'ml.p3dn.24xlarge' or larger instance
assert instance_type in ['ml.p3.16xlarge', 'ml.p3dn.24xlarge']

if 'aws-samples' in model_type:
    hyperparameters['batch_size_per_gpu'] = 2 if instance_type == 'ml.p3.16xlarge' else 4

mpi_distribution = None
instance_count = 2 # Between 1 - 4
if instance_count > 1:
    device_min_sys_mem_mb = 2560
    custom_mpi_options = (
        "--verbose --output-filename /opt/ml/model/logs "
        f"-x TF_DEVICE_MIN_SYS_MEMORY_IN_MB={device_min_sys_mem_mb}"
    )
    mpi_distribution = {"mpi": { "enabled": True, "custom_mpi_options": custom_mpi_options } }   


mask_rcnn_estimator = TensorFlow(image_uri=training_image,
                                role=role, 
                                py_version='py3',
                                instance_count=instance_count, 
                                instance_type=instance_type,
                                distribution=mpi_distribution,
                                entry_point=training_script,
                                volume_size = 100,
                                max_run = 400000,
                                output_path=s3_output_location,
                                sagemaker_session=sagemaker_session, 
                                hyperparameters = hyperparameters,
                                metric_definitions = metric_definitions,
                                subnets=subnets,
                                security_group_ids=security_group_ids)
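The settings in the cell above determine the total number of GPUs and the global batch size. The sketch below makes that arithmetic explicit. The helper names are hypothetical. The table of GPUs per instance covers only the two instance types the cell allows, both of which have 8 GPUs.

```python
GPUS_PER_INSTANCE = {"ml.p3.16xlarge": 8, "ml.p3dn.24xlarge": 8}


def total_gpus(instance_type, instance_count):
    """Total number of GPUs across all training instances."""
    return GPUS_PER_INSTANCE[instance_type] * instance_count


def global_batch_size(instance_type, instance_count, batch_size_per_gpu):
    """Images processed per training step across all GPUs."""
    return total_gpus(instance_type, instance_count) * batch_size_per_gpu


# Two ml.p3.16xlarge instances with batch_size_per_gpu=2, as set in the cell above
print(total_gpus("ml.p3.16xlarge", 2))
print(global_batch_size("ml.p3.16xlarge", 2, 2))
```

With the TensorPack implementation, the per-GPU batch size is fixed at 1, so the global batch size equals the number of GPUs.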



### Launch Training Job

Finally, we launch the SageMaker training job. See ```Training Jobs``` in SageMaker console to monitor the training job.

In [None]:
import time

job_name = f"mask-rcnn-fsx-scriptmode-{int(time.time())}"
print(f"Launching Training Job: {job_name}")

data_channels = fsx_data_channels if fsx_data_channels else s3_data_channels

# set wait=True below if you want to print logs in cell output
mask_rcnn_estimator.fit(inputs=data_channels, job_name=job_name, logs="All", wait=False)  
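Because the job is launched with `wait=False`, the cell returns immediately. One way to follow the job from code is to poll its status. The sketch below takes a client object as a parameter, so it can be exercised with a stand-in. In the notebook, the client would be `boto3.client("sagemaker")`. The function name `wait_for_training_job`, the polling interval, and the `max_polls` limit are illustrative choices.

```python
import time

TERMINAL_STATES = {"Completed", "Failed", "Stopped"}


def wait_for_training_job(sm_client, job_name, poll_seconds=60, max_polls=None):
    """Poll describe_training_job until a terminal status or the poll limit is reached."""
    polls = 0
    while True:
        description = sm_client.describe_training_job(TrainingJobName=job_name)
        status = description["TrainingJobStatus"]
        if status in TERMINAL_STATES:
            return status
        polls += 1
        if max_polls is not None and polls >= max_polls:
            # Give up waiting and report the last status seen.
            return status
        time.sleep(poll_seconds)


# Usage in the notebook (not executed here):
#   sm_client = boto3.client("sagemaker")
#   final_status = wait_for_training_job(sm_client, job_name)
```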

## Notebook CI Test Results

This notebook was tested in multiple regions. The test results are as follows, except for us-west-2 which is shown at the top of the notebook.

![This us-east-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/us-east-1/advanced_functionality|distributed_tensorflow_mask_rcnn|mask-rcnn-scriptmode-fsx.ipynb)

![This us-east-2 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/us-east-2/advanced_functionality|distributed_tensorflow_mask_rcnn|mask-rcnn-scriptmode-fsx.ipynb)

![This us-west-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/us-west-1/advanced_functionality|distributed_tensorflow_mask_rcnn|mask-rcnn-scriptmode-fsx.ipynb)

![This ca-central-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/ca-central-1/advanced_functionality|distributed_tensorflow_mask_rcnn|mask-rcnn-scriptmode-fsx.ipynb)

![This sa-east-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/sa-east-1/advanced_functionality|distributed_tensorflow_mask_rcnn|mask-rcnn-scriptmode-fsx.ipynb)

![This eu-west-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/eu-west-1/advanced_functionality|distributed_tensorflow_mask_rcnn|mask-rcnn-scriptmode-fsx.ipynb)

![This eu-west-2 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/eu-west-2/advanced_functionality|distributed_tensorflow_mask_rcnn|mask-rcnn-scriptmode-fsx.ipynb)

![This eu-west-3 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/eu-west-3/advanced_functionality|distributed_tensorflow_mask_rcnn|mask-rcnn-scriptmode-fsx.ipynb)

![This eu-central-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/eu-central-1/advanced_functionality|distributed_tensorflow_mask_rcnn|mask-rcnn-scriptmode-fsx.ipynb)

![This eu-north-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/eu-north-1/advanced_functionality|distributed_tensorflow_mask_rcnn|mask-rcnn-scriptmode-fsx.ipynb)

![This ap-southeast-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/ap-southeast-1/advanced_functionality|distributed_tensorflow_mask_rcnn|mask-rcnn-scriptmode-fsx.ipynb)

![This ap-southeast-2 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/ap-southeast-2/advanced_functionality|distributed_tensorflow_mask_rcnn|mask-rcnn-scriptmode-fsx.ipynb)

![This ap-northeast-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/ap-northeast-1/advanced_functionality|distributed_tensorflow_mask_rcnn|mask-rcnn-scriptmode-fsx.ipynb)

![This ap-northeast-2 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/ap-northeast-2/advanced_functionality|distributed_tensorflow_mask_rcnn|mask-rcnn-scriptmode-fsx.ipynb)

![This ap-south-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/ap-south-1/advanced_functionality|distributed_tensorflow_mask_rcnn|mask-rcnn-scriptmode-fsx.ipynb)
