### Automated Shiptrack detection model hyperparameter tuning

Train and deploy the shiptrack detection model using the built-in TensorFlow containers provided by Amazon SageMaker.

This notebook is heavily based on the keras-05-keras-blog-post_Fashion MNIST-SageMaker example.

In [1]:
import sagemaker

sess = sagemaker.Session()
role = sagemaker.get_execution_role()

## Train with TensorFlow on a GPU instance

In [2]:
from sagemaker.tensorflow import TensorFlow

# Regular expressions that SageMaker applies to the training logs to extract each metric.
metric_definitions = [
    {'Name': 'loss', 'Regex': 'loss: ([0-9\\.]+)'},
    {'Name': 'iou_score', 'Regex': 'iou_score: ([0-9\\.]+)'},
    {'Name': 'val_loss', 'Regex': 'val_loss: ([0-9\\.]+)'},
    {'Name': 'val_iou_score', 'Regex': 'val_iou_score: ([0-9\\.]+)'},
    {'Name': 'Test loss', 'Regex': 'Test loss    : ([0-9\\.]+)'},
    {'Name': 'Test accuracy', 'Regex': 'Test accuracy: ([0-9\\.]+)'}
]
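Before launching a job, it can help to check what these regular expressions capture. The following is a minimal sketch using Python's `re` module; the sample log line is hypothetical (written in the style Keras prints at the end of an epoch) and is not output from this notebook's training job. The helper name `extract_metrics` is also an assumption made for illustration.

```python
import re

# Copy of the Keras-related metric definitions from the cell above.
keras_metric_definitions = [
    {'Name': 'loss', 'Regex': 'loss: ([0-9\\.]+)'},
    {'Name': 'iou_score', 'Regex': 'iou_score: ([0-9\\.]+)'},
    {'Name': 'val_loss', 'Regex': 'val_loss: ([0-9\\.]+)'},
    {'Name': 'val_iou_score', 'Regex': 'val_iou_score: ([0-9\\.]+)'},
]

# Hypothetical log line, for illustration only.
sample_line = ("100/100 - 42s - loss: 0.5312 - iou_score: 0.4120 "
               "- val_loss: 0.6021 - val_iou_score: 0.3877")


def extract_metrics(line, definitions):
    """Return {metric name: value} using the first match of each regex."""
    found = {}
    for definition in definitions:
        match = re.search(definition['Regex'], line)
        if match:
            found[definition['Name']] = float(match.group(1))
    return found


print(extract_metrics(sample_line, keras_metric_definitions))
```

Note that these patterns are unanchored, so under Python's `re` the `loss` pattern also matches inside `val_loss: ...` (and `iou_score` inside `val_iou_score: ...`). On the sample line the first match happens to be the intended one, but a line containing only `val_loss` would also yield a `loss` value. How SageMaker resolves such overlaps is not covered here.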


In [9]:
# training_input_path = 's3://imiracli-data/MODIS_deep_cloud/compressed_training/nocrop_combined_points'
training_input_path = 's3://imiracli-data/MODIS_deep_cloud/compressed_training/nocrop_combined_points_typed_niremi'

# Things that could still be going wrong:
## TF / Keras version
## Arguments
## The data in S3 isn't the same as what I have locally

tf_estimator = TensorFlow(entry_point='shiptrack.py', 
                          role=role,
                          train_instance_count=1, 
                          train_instance_type='ml.p3.8xlarge',
                          framework_version='1.13', 
                          py_version='py3',
                          source_dir='./shiptrack-detection/',
                          script_mode=True,
                          model_dir='/opt/ml/model',
                          metric_definitions=metric_definitions,
                          enable_cloudwatch_metrics=True,
                          hyperparameters={
                              'epochs': 60,
                              'batch-size': 8,
                              'learning-rate': 0.1,
                              'augment': True,
                              'encoder-freeze': False,
                              'backbone': "resnet152",
                              'test-prop': 5,
                              'loss': "bce_jaccard_loss"}
                         )
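In script mode, SageMaker passes the `hyperparameters` dictionary to the entry point as command-line arguments (for example `--epochs 60 --batch-size 8`). The contents of `shiptrack.py` are not shown in this notebook, so the following is only a sketch of how such a script might parse them. The function names `parse_args` and `str2bool` and the default values are assumptions; `SM_MODEL_DIR` and `SM_CHANNEL_TRAINING` are the environment variables the training container sets for the model directory and the default `training` input channel.

```python
import argparse
import os


def str2bool(value):
    """Interpret 'True'/'False' style strings as booleans (assumed helper)."""
    return str(value).lower() in ('true', '1', 'yes')


def parse_args(argv=None):
    """Parse hyperparameters the way a script-mode entry point might."""
    parser = argparse.ArgumentParser()
    parser.add_argument('--epochs', type=int, default=60)
    parser.add_argument('--batch-size', type=int, default=8)
    parser.add_argument('--learning-rate', type=float, default=0.1)
    parser.add_argument('--augment', type=str2bool, default=True)
    parser.add_argument('--encoder-freeze', type=str2bool, default=False)
    parser.add_argument('--backbone', type=str, default='resnet152')
    parser.add_argument('--test-prop', type=int, default=5)
    parser.add_argument('--loss', type=str, default='bce_jaccard_loss')
    # Locations provided by the container; the fallbacks are the usual paths.
    parser.add_argument('--model_dir', type=str,
                        default=os.environ.get('SM_MODEL_DIR', '/opt/ml/model'))
    parser.add_argument('--training', type=str,
                        default=os.environ.get('SM_CHANNEL_TRAINING',
                                               '/opt/ml/input/data/training'))
    # Ignore any extra arguments the container may add.
    args, _ = parser.parse_known_args(argv)
    return args


args = parse_args(['--epochs', '30', '--augment', 'False'])
print(args.epochs, args.augment, args.encoder_freeze)
```

Boolean hyperparameters arrive as strings, which is why the sketch uses a converter instead of `type=bool`: in Python, `bool('False')` evaluates to `True`. A mismatch of this kind would fall under the "Arguments" item in the list of things that could be going wrong.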

In [None]:
tf_estimator.fit(training_input_path)

'create_image_uri' will be deprecated in favor of 'ImageURIProvider' class in SageMaker Python SDK v2.
's3_input' class will be renamed to 'TrainingInput' in SageMaker Python SDK v2.


2020-06-19 15:55:47 Starting - Starting the training job......
2020-06-19 15:56:18 Starting - Launching requested ML instances......
2020-06-19 15:57:22 Starting - Preparing the instances for training......
2020-06-19 15:58:20 Downloading - Downloading input data......
2020-06-19 15:59:35 Training - Downloading the training image..
2020-06-19 15:59:57,396 sagemaker-containers INFO     Imported framework sagemaker_tensorflow_container.training
2020-06-19 15:59:58,513 sagemaker-containers INFO     Installing module with the following command:
/usr/local/bin/python3.6 -m pip install -U . -r requirements.txt
Processing /opt/ml/code
Collecting segmentation-models (from -r requirements.txt (line 2))
  Downloading https://files.pythonhosted.org/packages/da/b9/4a183518c21689a56b834eaaa45cad242d9ec09a4360b5b10139f23c63f4/segmentation_models-1.0.1-py3-none-any.whl
Collecting efficientnet==1.0.0 (from segmentation-models->-r requirements.t

## Configure Automatic Model Tuning

In [10]:
tf_estimator = TensorFlow(entry_point='shiptrack.py', 
                          role=role,
                          train_instance_count=1, 
                          train_instance_type='ml.p3.8xlarge',
                          framework_version='1.13', 
                          py_version='py3',
                          source_dir='./shiptrack-detection/',
                          script_mode=True,
                          metric_definitions=metric_definitions,
                          enable_cloudwatch_metrics=True,
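                          # Only fixed values are set here; the commented-out entries
                          # are supplied per job by the tuner's hyperparameter ranges.
                          hyperparameters={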
#                               'epochs': 30,
                              'batch-size': 8,
#                               'learning-rate': 0.1,
                              'augment': False,
#                               'encoder-freeze': False,
#                               'backbone': "resnet152",
                              'test-prop': 5,
                              'loss': "bce_jaccard_loss"
                          }
                         )

In [11]:
from sagemaker.tuner import IntegerParameter, CategoricalParameter, ContinuousParameter, HyperparameterTuner

In [12]:
hyperparameter_ranges = {
    'epochs':         IntegerParameter(20, 60),
    'learning-rate':  ContinuousParameter(0.001, 0.1, scaling_type='Logarithmic'),
#     'batch-size':     IntegerParameter(32, 1024),
    'backbone':       CategoricalParameter(['resnet152', 'resnet50', 'resnet18']),
#     'augment':        CategoricalParameter([True, False]),
    'encoder-freeze': CategoricalParameter([True, False])
}

objective_metric_name = 'val_iou_score'
objective_type = 'Maximize'

tuner = HyperparameterTuner(tf_estimator,
                            objective_metric_name,
                            hyperparameter_ranges,
                            metric_definitions=metric_definitions,
                            max_jobs=10,
                            max_parallel_jobs=2,
                            objective_type=objective_type)
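The `learning-rate` range above spans two orders of magnitude and uses `scaling_type='Logarithmic'`. The sketch below illustrates what logarithmic scaling means for such a range: values are spread evenly in log space, so the decade 0.001 to 0.01 receives about as much attention as 0.01 to 0.1. It is plain Python and only an illustration; it does not reproduce SageMaker's search strategy (Bayesian by default), and the function name `sample_log_uniform` is an assumption.

```python
import math
import random


def sample_log_uniform(low, high, rng):
    """Draw a value whose logarithm is uniform on [log(low), log(high)]."""
    return math.exp(rng.uniform(math.log(low), math.log(high)))


rng = random.Random(0)  # fixed seed so the sketch is repeatable
samples = [sample_log_uniform(0.001, 0.1, rng) for _ in range(10000)]

# With log scaling, roughly half of the draws fall in the lower decade.
fraction_below_001 = sum(s < 0.01 for s in samples) / len(samples)
print(f"fraction of samples below 0.01: {fraction_below_001:.2f}")
```

With linear scaling on the same range, only about 9% of uniform draws would fall below 0.01, so small learning rates would rarely be explored within a budget of `max_jobs=10`.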

In [13]:
tuner.fit(training_input_path)

'create_image_uri' will be deprecated in favor of 'ImageURIProvider' class in SageMaker Python SDK v2.
