This notebook builds on preprocessing that was performed locally. The training and validation image sets and their LST files have already been created and uploaded to S3 in the required structure.

With the data in place, we can train the multiclass image classification model.

# Import

## Libraries

In [1]:
import time as t
import os
import json
import sagemaker
import boto3
from sagemaker import get_execution_role
from sagemaker.amazon.amazon_estimator import get_image_uri

# Set-Up

## Establish AWS Parameters
This step establishes the AWS parameters used throughout this notebook.

In [2]:
role = 'arn:aws:iam::726963482731:role/dsba_6190_team_project'

bucket = "dsba-6190-final-team-project"
prefix_1 = "channels"
prefix_file_type = "rec"

sess_sage = sagemaker.Session()
sm_client = boto3.client('sagemaker')

print(role)

arn:aws:iam::726963482731:role/dsba_6190_team_project


## Import Sagemaker Model
This step retrieves the container image URI for the latest version of the Amazon SageMaker Image Classification algorithm.

In [3]:
training_image = get_image_uri(sess_sage.boto_region_name, 'image-classification', repo_version="latest")
print(training_image)

811284229777.dkr.ecr.us-east-1.amazonaws.com/image-classification:latest


# Model Training
Two different data sets have been uploaded to S3. One is the complete dataset. The other is a 10% sample of the dataset. The 10% sample is for troubleshooting training and deployment of the Sagemaker Image Classification algorithm.

There are only two differences between training the model with the sample or complete dataset:

* __Input Location__: We need to point the algorithm to different S3 locations. We will do this with the **prefix_dataset** variable, which will be defined at the beginning of each dataset's notebook section.
* __Number of Training Samples__: The number of training samples differs between the complete dataset and the sample. These values are available in the Jupyter Notebook used to split the data and upload it to S3.

We will define the number of **training** samples for each dataset below. 

**Note**: *Currently this is a manual process. Future iterations of this process will automate this calculation.*

In [4]:
num_training_samples_complete = 15686
num_training_samples_10 = 1567
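
The manual count above could be replaced by reading the training LST file directly. The following is only a minimal sketch: it assumes the LST file holds one tab-separated record per image (index, label, relative path), and the function name `count_lst_samples` and the demo file are hypothetical, not part of the original notebooks.

```python
import os
import tempfile


def count_lst_samples(lst_path):
    """Count image records in an LST file (one non-empty line per image)."""
    with open(lst_path) as lst_file:
        return sum(1 for line in lst_file if line.strip())


# Demonstration on a small, made-up LST file (index, label, path).
demo_lines = ["0\t3\tc3/img_1.jpg\n", "1\t0\tc0/img_2.jpg\n", "2\t7\tc7/img_3.jpg\n"]
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "demo_train.lst")
    with open(demo_path, "w") as demo_file:
        demo_file.writelines(demo_lines)
    demo_count = count_lst_samples(demo_path)

print(demo_count)
```

Pointing the function at the real training LST file for each dataset would give the values that are currently typed in by hand.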

## Define Dataset
This section defines the parameters of the dataset. Setting the split prefix and dataset prefix directs the algorithm to the correct training and validation inputs. 

There are two variables which require definition:

1. **Dataset**: The dataset is either the complete dataset, or it is the 10% sample dataset. The 10% sample was created for troubleshooting purposes. Final production will use the complete dataset.
2. **Train/Validation Split Method**: Two different methods were developed to split the data into a training and validation set. See the image processing notebook for more detail.
 * split_random: A random split of the images, performed with the **im2rec.py** tool.
 * split_driver: The drivers are first divided into a training group and a validation group. All the images associated with a driver are then placed in that driver's set, so no driver appears in both sets.
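
The driver-based split can be illustrated with a short sketch. This is not the code from the image processing notebook; the function name `split_by_driver`, the 20% validation fraction, and the driver and image names are all assumptions chosen for illustration.

```python
import random


def split_by_driver(images_by_driver, validation_fraction=0.2, seed=0):
    """Assign whole drivers to train or validation so no driver is in both."""
    drivers = sorted(images_by_driver)
    random.Random(seed).shuffle(drivers)
    n_validation = max(1, int(len(drivers) * validation_fraction))
    held_out = set(drivers[:n_validation])
    train, validation = [], []
    for driver, images in images_by_driver.items():
        target = validation if driver in held_out else train
        target.extend((driver, image) for image in images)
    return train, validation


# Made-up driver IDs and image names, for illustration only.
demo_images = {
    "driver_a": ["img_1.jpg", "img_2.jpg"],
    "driver_b": ["img_3.jpg"],
    "driver_c": ["img_4.jpg", "img_5.jpg"],
    "driver_d": ["img_6.jpg"],
    "driver_e": ["img_7.jpg"],
}
train_set, validation_set = split_by_driver(demo_images)
train_drivers = {driver for driver, _ in train_set}
validation_drivers = {driver for driver, _ in validation_set}
print(train_drivers & validation_drivers)  # the intersection is empty
```

Splitting on drivers rather than on images keeps the validation set free of people the model has already seen, which a purely random split cannot guarantee.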

In [5]:
# Define Lists and Dictionary
list_dataset = ["complete", "sample"]
list_split_method = ["split_random", "split_driver"]

training_sample_dict = {
    "sample-split_random" : num_training_samples_10,
    "sample-split_driver": num_training_samples_10, 
    "complete-split_random": num_training_samples_complete,
    "complete-split_driver": num_training_samples_complete    
}

# Define Data Inputs
prefix_dataset = list_dataset[1] #0 = complete / 1 = sample
prefix_split_type = list_split_method[0]  #0 = split_random / 1 = split_driver

# Extract Number of Training Samples
key_training_sample = prefix_dataset + "-" + prefix_split_type
num_training_samples = training_sample_dict[key_training_sample]

print("The following are the inputs for the model:")
print("Split Method:\t\t\t{}".format(prefix_split_type))
print("Dataset:\t\t\t{}".format(prefix_dataset))
print("# of Training Samples:\t\t{}".format(num_training_samples))

The following are the inputs for the model:
Split Method:			split_random
Dataset:			sample
# of Training Samples:		1567


## Model Inputs

### Model Output Location

In [6]:
s3_output_location = 's3://{}/output'.format(bucket)
print(s3_output_location)

s3://dsba-6190-final-team-project/output


### Model Input Location

First we establish the data input channels. Because the data is in RecordIO format, only two channels (train and validation) are required.

In [7]:
s3train = 's3://{}/{}/{}/{}/train/'.format(bucket, prefix_1, prefix_split_type, prefix_dataset)
s3validation = 's3://{}/{}/{}/{}/validation/'.format(bucket, prefix_1, prefix_split_type, prefix_dataset)

print("The input data is pulled from the following S3 locations:")
print("Training:\t{}".format(s3train))
print("Validation:\t{}".format(s3validation))

The input data is pulled from the following S3 locations:
Training:	s3://dsba-6190-final-team-project/channels/split_random/sample/train/
Validation:	s3://dsba-6190-final-team-project/channels/split_random/sample/validation/


Then we define the channels as inputs into the image classification model.

In [8]:
train_data = sagemaker.session.s3_input(s3train, 
                                        distribution='FullyReplicated', 
                                        content_type='application/x-recordio', 
                                        s3_data_type='S3Prefix')

validation_data = sagemaker.session.s3_input(s3validation, 
                                             distribution='FullyReplicated', 
                                             content_type='application/x-recordio', 
                                             s3_data_type='S3Prefix')

data_channels = {'train': train_data, 
                 'validation': validation_data}

print(data_channels)

{'train': <sagemaker.inputs.s3_input object at 0x7f74b406b208>, 'validation': <sagemaker.inputs.s3_input object at 0x7f74b406b128>}
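
For reference, each channel defined above corresponds roughly to one entry of the `InputDataConfig` list accepted by the low-level boto3 `create_training_job` call. The sketch below builds that structure as plain dictionaries; the helper name `build_channel` is hypothetical, and the SageMaker SDK may add or omit optional fields when it generates the real request.

```python
def build_channel(channel_name, s3_uri):
    """Build one InputDataConfig entry in the shape create_training_job accepts."""
    return {
        "ChannelName": channel_name,
        "DataSource": {
            "S3DataSource": {
                "S3DataType": "S3Prefix",
                "S3Uri": s3_uri,
                "S3DataDistributionType": "FullyReplicated",
            }
        },
        "ContentType": "application/x-recordio",
    }


demo_bucket = "dsba-6190-final-team-project"
demo_base = "s3://{}/channels/split_random/sample".format(demo_bucket)
input_data_config = [
    build_channel("train", demo_base + "/train/"),
    build_channel("validation", demo_base + "/validation/"),
]
print([channel["ChannelName"] for channel in input_data_config])
```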


## Train Model

### Set Up Instance Types

In [9]:
# Available Instances
available_instances = ['ml.p2.xlarge',     # $1.26/hr
                       'ml.p3.2xlarge'     # $4.284/hr
                      ]

# Initialize Instance
train_instance_type = available_instances[1]

# Print Check
print("This training session used the following instance: {}".format(train_instance_type))

This training session used the following instance: ml.p3.2xlarge


### Initialize
#### Parameters
The following steps define the algorithm parameters and hyperparameters.

In [10]:
dist_drive_ic = sagemaker.estimator.Estimator(training_image,
                                              role, 
                                              train_instance_count=1, 
                                              train_instance_type=train_instance_type,
                                              train_volume_size = 50,
                                              train_max_run = 360000,
                                              input_mode= 'File',
                                              output_path=s3_output_location,
                                              sagemaker_session=sess_sage)
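
The `train_max_run` value of 360000 seconds is 100 hours, so it is worth checking what the job could cost if it ran to that limit. The sketch below is a rough upper bound only: it uses the hourly prices noted in the instance list above (which may be out of date) and assumes cost is proportional to run time. The function name `max_training_cost` is hypothetical.

```python
HOURLY_PRICE_USD = {
    "ml.p2.xlarge": 1.26,    # price noted in this notebook, may be out of date
    "ml.p3.2xlarge": 4.284,  # price noted in this notebook, may be out of date
}


def max_training_cost(instance_type, max_run_seconds, instance_count=1):
    """Upper bound on cost if the job runs until the max-run limit."""
    hours = max_run_seconds / 3600
    return HOURLY_PRICE_USD[instance_type] * hours * instance_count


worst_case = max_training_cost("ml.p3.2xlarge", 360000)
print("Worst case: ${:.2f}".format(worst_case))
```

A two-epoch run on the sample dataset should finish far below this ceiling, so a smaller `train_max_run` would be a cheap safeguard while troubleshooting.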

#### Hyper-Parameters

In [11]:
dist_drive_ic.set_hyperparameters(num_layers = 18,
                                  use_pretrained_model = 1,
                                  image_shape = "3,210,280", #RGB Pictures, 210 x 280
                                  num_classes = 10,
                                  mini_batch_size = 128,
                                  epochs = 2,
                                  learning_rate = 0.01,
                                  num_training_samples = num_training_samples,
                                  precision_dtype = 'float16')
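
The `num_training_samples` and `mini_batch_size` hyperparameters together determine how many mini-batches make up one epoch. The arithmetic can be sketched as follows; whether the algorithm keeps or drops a final partial batch is not stated in this notebook, so both cases are shown, and the function name `batches_per_epoch` is hypothetical.

```python
import math


def batches_per_epoch(num_samples, mini_batch_size, drop_last=False):
    """Number of mini-batches needed to pass over the data once."""
    if drop_last:
        return num_samples // mini_batch_size
    return math.ceil(num_samples / mini_batch_size)


for name, count in [("sample", 1567), ("complete", 15686)]:
    kept = batches_per_epoch(count, 128)
    dropped = batches_per_epoch(count, 128, drop_last=True)
    print(name, kept, dropped)
```

With a mini-batch size of 128, the 10% sample yields only about a dozen batches per epoch, which is one reason it is suitable for troubleshooting rather than for judging final accuracy.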

### Run Model
With the data inputs defined and the parameters and hyperparameters initialized, we can run the training job.

In [12]:
dist_drive_ic.fit(inputs = data_channels, logs = True)

2020-04-22 20:37:28 Starting - Starting the training job...
2020-04-22 20:37:30 Starting - Launching requested ML instances......
2020-04-22 20:38:36 Starting - Preparing the instances for training......
2020-04-22 20:39:45 Downloading - Downloading input data...
2020-04-22 20:40:01 Training - Downloading the training image.....
Docker entrypoint called with argument(s): train
[04/22/2020 20:41:05 INFO 140660904576832] Reading default configuration from /opt/amazon/lib/python2.7/site-packages/image_classification/default-input.json: {u'beta_1': 0.9, u'gamma': 0.9, u'beta_2': 0.999, u'optimizer': u'sgd', u'use_pretrained_model': 0, u'eps': 1e-08, u'epochs': 30, u'lr_scheduler_factor': 0.1, u'num_layers': 152, u'image_shape': u'3,224,224', u'precision_dtype': u'float32', u'mini_batch_size': 32, u'weight_decay': 0.0001, u'learning_rate': 0.1, u'momentum': 0}
[04/22/2020 20:41:05 INFO 140660904576832] Merging with provided configuration from /opt/ml/input/config/hyper

## Model

### Establish Parameters

In [13]:
# Define Tag For Future Use
aws_component_name = "image-classification-drivers"

In [14]:
# Training Job
training_job_name = dist_drive_ic.latest_training_job.name  # public accessor instead of the private _current_job_name
print("Training Job Name: {}".format(training_job_name))
print()

# Extract Training Job Information
info = sm_client.describe_training_job(TrainingJobName=training_job_name)
print("Training Job Information:")
#print(info)
print()

# Define S3 Location for Model Artifacts
model_s3_loc = info['ModelArtifacts']['S3ModelArtifacts']
print("Model S3 Location: {}".format(model_s3_loc))

# Define Primary Container
primary_container = {
    'Image': training_image,
    'ModelDataUrl': model_s3_loc,
}

Training Job Name: image-classification-2020-04-22-20-37-28-601

Training Job Information:

Model S3 Location: s3://dsba-6190-final-team-project/output/image-classification-2020-04-22-20-37-28-601/output/model.tar.gz
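
If the model artifact needs to be inspected or downloaded, the S3 URI has to be split into a bucket and a key first. A minimal sketch using only the standard library is shown below; the helper name `split_s3_uri` is hypothetical.

```python
from urllib.parse import urlparse


def split_s3_uri(s3_uri):
    """Split 's3://bucket/key' into (bucket, key)."""
    parsed = urlparse(s3_uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError("Not an S3 URI: {}".format(s3_uri))
    return parsed.netloc, parsed.path.lstrip("/")


artifact_uri = ("s3://dsba-6190-final-team-project/output/"
                "image-classification-2020-04-22-20-37-28-601/output/model.tar.gz")
artifact_bucket, artifact_key = split_s3_uri(artifact_uri)
print(artifact_bucket)
print(artifact_key)
```

The two values can then be passed to a boto3 S3 client, for example as the `Bucket` and `Key` arguments of `download_file`.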


### Create Model

In [15]:
timestamp = t.strftime('-%Y-%m-%d-%H-%M-%S', t.gmtime())
model_name = aws_component_name + "-model" + timestamp

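from botocore.exceptions import ClientError

try:
    create_model_response = sm_client.create_model(
        ModelName = model_name,
        ExecutionRoleArn = role,
        PrimaryContainer = primary_container)
    print("Initial creation of model")
    model_arn = create_model_response['ModelArn']

except ClientError as error:
    # Assumption: a failure here means a model with this name already exists.
    # If describe_model also fails, the printed error shows the real cause.
    print("Model not created ({}); looking up the existing model.".format(error))
    model_arn = sm_client.describe_model(ModelName = model_name)['ModelArn']

print()
print("Model Name: {}".format(model_name))
print("Model ARN: {}".format(model_arn))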

Initial creation of model

Model Name: image-classification-drivers-model-2020-04-22-17-33-10
Model ARN: arn:aws:sagemaker:us-east-1:726963482731:model/image-classification-drivers-model-2020-04-22-17-33-10


## Endpoint Configuration

### Establish Parameters

In [16]:
timestamp = t.strftime('-%Y-%m-%d-%H-%M-%S', t.gmtime())
endpoint_config_name = aws_component_name + '-epc' + timestamp
variant_name = "AllTraffic"
print('Endpoint Configuration name: {}'.format(endpoint_config_name))

Endpoint Configuration name: image-classification-drivers-epc-2020-04-22-17-33-10


### Create Endpoint Configuration

In [17]:
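from botocore.exceptions import ClientError

try:
    endpoint_config_response = sm_client.create_endpoint_config(
        EndpointConfigName = endpoint_config_name,
        ProductionVariants = [
            {
                'InstanceType': 'ml.m4.xlarge',
                'InitialInstanceCount': 1,
                'ModelName': model_name,
                'VariantName': variant_name,
                'InitialVariantWeight': 1
            }
        ])
    print("Initial creation of endpoint configuration.")
    endpoint_config_arn = endpoint_config_response['EndpointConfigArn']

except ClientError as error:
    # Assumption: a failure here means the configuration already exists.
    print('Endpoint configuration not created ({}); looking up the existing one.'.format(error))
    endpoint_config_arn = sm_client.describe_endpoint_config(
        EndpointConfigName = endpoint_config_name)['EndpointConfigArn']

print()
print('Endpoint configuration name: {}'.format(endpoint_config_name))
print('Endpoint configuration arn:  {}'.format(endpoint_config_arn))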

Initial creation of endpoint configuration.

Endpoint configuration name: image-classification-drivers-epc-2020-04-22-17-33-10
Endpoint configuration arn:  arn:aws:sagemaker:us-east-1:726963482731:endpoint-config/image-classification-drivers-epc-2020-04-22-17-33-10


## Endpoint

### Establish Parameters

In [18]:
endpoint_name = aws_component_name + '-endpoint'
print('Endpoint name: {}'.format(endpoint_name))

endpoint_params = {
    'EndpointName': endpoint_name,
    'EndpointConfigName': endpoint_config_name,
}

Endpoint name: image-classification-drivers-endpoint


### Create / Update Endpoint

To update an existing endpoint, we first need to verify that it is in service. The cell below therefore polls the endpoint status until it reports InService before calling the update. If the endpoint does not exist yet, it is created instead.

In [19]:
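from botocore.exceptions import ClientError

# Check whether the endpoint already exists; describe_endpoint raises a
# ClientError when it does not.
try:
    status = sm_client.describe_endpoint(EndpointName = endpoint_name)['EndpointStatus']
    endpoint_exists = True
except ClientError:
    endpoint_exists = False

if endpoint_exists:
    # Poll with a pause between calls until the endpoint is in service
    while status != "InService":
        if status == "Failed":
            # Assumption: stop rather than wait forever on a failed endpoint
            raise RuntimeError("Endpoint is in Failed status; inspect it before updating.")
        t.sleep(30)
        status = sm_client.describe_endpoint(EndpointName = endpoint_name)['EndpointStatus']

    endpoint_response = sm_client.update_endpoint(**endpoint_params)
    print('Endpoint updated.')
    print()
else:
    endpoint_response = sm_client.create_endpoint(**endpoint_params)
    print("Initial creation of endpoint.")
    print()

print('EndpointArn = {}'.format(endpoint_response['EndpointArn']))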

Initial creation of endpoint.

EndpointArn = arn:aws:sagemaker:us-east-1:726963482731:endpoint/image-classification-drivers-endpoint
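
The wait-until-in-service logic can also be factored into a small reusable helper with a timeout. The following is a sketch under stated assumptions: the function name `wait_for_status`, the 30-second poll interval, and the 30-minute timeout are illustrative choices, and the demonstration uses a fake status sequence so that it runs without an AWS connection.

```python
import time


def wait_for_status(get_status, target="InService", failed=("Failed",),
                    poll_seconds=30, timeout_seconds=1800, sleep=time.sleep):
    """Poll get_status() until it returns target; raise on failure or timeout."""
    waited = 0
    while True:
        status = get_status()
        if status == target:
            return status
        if status in failed:
            raise RuntimeError("Endpoint entered status {}".format(status))
        if waited >= timeout_seconds:
            raise TimeoutError("Still {} after {} seconds".format(status, waited))
        sleep(poll_seconds)
        waited += poll_seconds


# Offline demonstration with a fake status sequence instead of a real endpoint.
fake_statuses = iter(["Creating", "Creating", "InService"])
final_status = wait_for_status(lambda: next(fake_statuses),
                               sleep=lambda seconds: None)
print(final_status)
```

Against a real endpoint, `get_status` could be a lambda that calls `sm_client.describe_endpoint` and returns its `EndpointStatus` field. The boto3 SageMaker client also provides an `endpoint_in_service` waiter that may serve the same purpose.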


# Auto Scaling

## Establish Boto3 Client - Application Autoscaling

In [20]:
scale_client = boto3.client("application-autoscaling")

## Set Global Parameters

In [21]:
service_name_space = 'sagemaker'
# Join with "/" explicitly: os.path.join would use "\" on Windows
resource_id = "/".join(["endpoint", endpoint_name, "variant", variant_name])
scalable_dimension = 'sagemaker:variant:DesiredInstanceCount'

## Register Scalable Target - Sagemaker Endpoint

### Define Parameters

In [22]:
scalable_target_params = {
    'ServiceNamespace' : service_name_space,
    'ResourceId' : resource_id,
    'ScalableDimension' : scalable_dimension,
    'MinCapacity' : 1,
    'MaxCapacity' : 4,
    'RoleARN' : "arn:aws:iam::726963482731:role/aws-service-role/sagemaker.application-autoscaling.amazonaws.com/AWSServiceRoleForApplicationAutoScaling_SageMakerEndpoint"
}

json_form = json.dumps(scalable_target_params, indent=3)
print(json_form)

{
   "ServiceNamespace": "sagemaker",
   "ResourceId": "endpoint/image-classification-drivers-endpoint/variant/AllTraffic",
   "ScalableDimension": "sagemaker:variant:DesiredInstanceCount",
   "MinCapacity": 1,
   "MaxCapacity": 4,
   "RoleARN": "arn:aws:iam::726963482731:role/aws-service-role/sagemaker.application-autoscaling.amazonaws.com/AWSServiceRoleForApplicationAutoScaling_SageMakerEndpoint"
}


### Apply To Object

In [27]:
response = scale_client.register_scalable_target(**scalable_target_params)
response

## Put Scaling Policy

### Define Parameters

In [23]:
scaling_policy_params = {
    'PolicyName' : 'image-classification-driver-endpoint-scaling',
    'ServiceNamespace' : service_name_space,
    'ResourceId' : resource_id,
    'ScalableDimension' : scalable_dimension,
    'PolicyType' :  'TargetTrackingScaling',
    'TargetTrackingScalingPolicyConfiguration' : {
        'TargetValue' : 3e4,
        'PredefinedMetricSpecification' : {
            'PredefinedMetricType' : 'SageMakerVariantInvocationsPerInstance'
        },
        'ScaleInCooldown': 600,
        'ScaleOutCooldown': 300
    }
}

scale_policy_json_form = json.dumps(scaling_policy_params, indent=3)
print(scale_policy_json_form)

{
   "PolicyName": "image-classification-driver-endpoint-scaling",
   "ServiceNamespace": "sagemaker",
   "ResourceId": "endpoint/image-classification-drivers-endpoint/variant/AllTraffic",
   "ScalableDimension": "sagemaker:variant:DesiredInstanceCount",
   "PolicyType": "TargetTrackingScaling",
   "TargetTrackingScalingPolicyConfiguration": {
      "TargetValue": 30000.0,
      "PredefinedMetricSpecification": {
         "PredefinedMetricType": "SageMakerVariantInvocationsPerInstance"
      },
      "ScaleInCooldown": 600,
      "ScaleOutCooldown": 300
   }
}
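
To get a feel for what a target value of 30,000 invocations per instance means, the steady-state instance count can be approximated with simple arithmetic. This is a simplified model, not the actual Application Auto Scaling algorithm: it ignores alarms, cooldown periods, and metric averaging, and the function name `estimated_instance_count` is hypothetical. The capacity limits of 1 and 4 come from the scalable target registered above.

```python
import math


def estimated_instance_count(invocations_per_minute, target_per_instance,
                             min_capacity=1, max_capacity=4):
    """Instances needed to keep per-instance invocations at or below target."""
    needed = math.ceil(invocations_per_minute / target_per_instance)
    return max(min_capacity, min(max_capacity, needed))


for load in (1000, 45000, 100000, 500000):
    print(load, estimated_instance_count(load, 3e4))
```

Under this approximation the endpoint stays at a single instance until traffic exceeds the target, and it never grows beyond the registered maximum regardless of load.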


### Apply To Object

In [28]:
response = scale_client.put_scaling_policy(**scaling_policy_params)
response

{'PolicyARN': 'arn:aws:autoscaling:us-east-1:726963482731:scalingPolicy:97e51a19-cf1d-4553-b11f-d49c6c6865ce:resource/sagemaker/endpoint/image-classification-drivers-endpoint/variant/AllTraffic:policyName/image-classification-driver-endpoint-scaling',
 'Alarms': [{'AlarmName': 'TargetTracking-endpoint/image-classification-drivers-endpoint/variant/AllTraffic-AlarmHigh-2f450b5c-5931-4f78-8ee7-24f890fa7cf3',
   'AlarmARN': 'arn:aws:cloudwatch:us-east-1:726963482731:alarm:TargetTracking-endpoint/image-classification-drivers-endpoint/variant/AllTraffic-AlarmHigh-2f450b5c-5931-4f78-8ee7-24f890fa7cf3'},
  {'AlarmName': 'TargetTracking-endpoint/image-classification-drivers-endpoint/variant/AllTraffic-AlarmLow-43a33091-3707-4e8b-b751-9b30501fe607',
   'AlarmARN': 'arn:aws:cloudwatch:us-east-1:726963482731:alarm:TargetTracking-endpoint/image-classification-drivers-endpoint/variant/AllTraffic-AlarmLow-43a33091-3707-4e8b-b751-9b30501fe607'}],
 'ResponseMetadata': {'RequestId': '50efe947-82ab-43a6-

## Verify Scaling Policy Was Attached

In [29]:
scalable_policy_search_params = {
    'ServiceNamespace' : service_name_space,
}

response = scale_client.describe_scaling_policies(**scalable_policy_search_params)
response

{'ScalingPolicies': [{'PolicyARN': 'arn:aws:autoscaling:us-east-1:726963482731:scalingPolicy:97e51a19-cf1d-4553-b11f-d49c6c6865ce:resource/sagemaker/endpoint/image-classification-drivers-endpoint/variant/AllTraffic:policyName/image-classification-driver-endpoint-scaling',
   'PolicyName': 'image-classification-driver-endpoint-scaling',
   'ServiceNamespace': 'sagemaker',
   'ResourceId': 'endpoint/image-classification-drivers-endpoint/variant/AllTraffic',
   'ScalableDimension': 'sagemaker:variant:DesiredInstanceCount',
   'PolicyType': 'TargetTrackingScaling',
   'TargetTrackingScalingPolicyConfiguration': {'TargetValue': 30000.0,
    'PredefinedMetricSpecification': {'PredefinedMetricType': 'SageMakerVariantInvocationsPerInstance'},
    'ScaleOutCooldown': 300,
    'ScaleInCooldown': 600},
   'Alarms': [{'AlarmName': 'TargetTracking-endpoint/image-classification-drivers-endpoint/variant/AllTraffic-AlarmHigh-2f450b5c-5931-4f78-8ee7-24f890fa7cf3',
     'AlarmARN': 'arn:aws:cloudwatch:u