This notebook builds on preprocessing that was already performed locally. The training and validation sets of images and LST files have been created and uploaded to S3 in the required folder structure.

With the data in place, we can now perform the multiclass image classification training.

# Import

## Libraries

In [81]:
import sagemaker
import boto3
from sagemaker import get_execution_role
from sagemaker.amazon.amazon_estimator import get_image_uri

# Set-Up

## Establish AWS Parameters
This step establishes the AWS parameters used throughout this notebook.

In [82]:
role = get_execution_role()
print(role)

bucket = "dsba-6190-final-team-project"
prefix_1 = "channels"
prefix_file_type = "rec"

sess_sage = sagemaker.Session()

arn:aws:iam::726963482731:role/sagemaker_execution


## Import SageMaker Model
This step retrieves the container image URI for the latest version of the Amazon SageMaker built-in Image Classification algorithm.

In [83]:
training_image = get_image_uri(sess_sage.boto_region_name, 'image-classification', repo_version="latest")
print(training_image)

811284229777.dkr.ecr.us-east-1.amazonaws.com/image-classification:latest


# Model Training
Two different datasets have been uploaded to S3: the complete dataset and a 10% sample of it. The 10% sample is for troubleshooting the training and deployment of the SageMaker Image Classification algorithm.

There are only two differences between training the model with the sample or the complete dataset:

* __Input Location__: The algorithm must point to different S3 locations. We do this with the **prefix_dataset** variable, which is set in the Define Dataset section below.
* __Number of Training Samples__: The number of training samples differs between the complete and the sample datasets. These values are available in the Jupyter notebook used to split the data and upload it to S3.

We will define the number of **training** samples for each dataset below. 

**Note**: *Currently this is a manual process. Future iterations of this process will automate this calculation.*
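Until that automation exists, one way to obtain these counts is to count the records in each training .lst file, since im2rec.py writes one line per image (an index, one or more labels, then the relative image path, separated by tabs). The sketch below assumes that layout; `count_lst_records` and `count_lst_file` are hypothetical helpers, not part of the original workflow.

```python
def count_lst_records(lines):
    """Count image records in the lines of an im2rec-style .lst file.

    Each non-blank line is expected to hold at least three tab-separated
    fields: an integer index, one or more labels, and the image path.
    """
    count = 0
    for line_number, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # tolerate blank lines, e.g. a trailing newline
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            raise ValueError(
                f"line {line_number}: expected index, label(s) and path, got {line!r}"
            )
        count += 1
    return count


def count_lst_file(path):
    """Open a .lst file and count its records (the path is up to the caller)."""
    with open(path) as handle:
        return count_lst_records(handle)
```

For example, `num_training_samples_complete = count_lst_file(...)` with the path to the complete dataset's training .lst file would replace the hard-coded value, provided that file matches the RecordIO file actually used for training.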

In [84]:
num_training_samples_complete = 15686
num_training_samples_10 = 1567

## Define Dataset
This section defines the dataset parameters. Setting the split prefix and the dataset prefix directs the algorithm to the correct training and validation inputs.

There are two variables that require definition:

1. **Dataset**: Either the complete dataset or the 10% sample dataset. The 10% sample was created for troubleshooting purposes; final production will use the complete dataset.
2. **Train/Validation Split Method**: Two different methods were developed to split the training data into training and validation sets. See the image processing notebook for more detail.
 * split_random: A random split, made with the **im2rec.py** tool.
 * split_driver: The drivers are divided into a training group and a validation group, and every image of a driver goes into that driver's set. No driver appears in both sets.
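The split_driver method can be sketched in plain Python as a group-wise split: shuffle the driver IDs, assign a fraction of them to validation, then let every image follow its driver. This assumes each image can be mapped to a driver ID; `split_by_driver`, its arguments, and the 20% default are illustrative choices, not the code used in the image processing notebook.

```python
import random
from collections import defaultdict


def split_by_driver(records, validation_fraction=0.2, seed=0):
    """Split (driver_id, image_path) pairs so that no driver is in both sets.

    Drivers, not images, are shuffled and partitioned; each image then
    follows its driver into the training or validation list.
    """
    images_by_driver = defaultdict(list)
    for driver_id, image_path in records:
        images_by_driver[driver_id].append(image_path)

    drivers = sorted(images_by_driver)  # sort first so a given seed is reproducible
    random.Random(seed).shuffle(drivers)
    n_validation = max(1, round(len(drivers) * validation_fraction))
    validation_drivers = set(drivers[:n_validation])

    train, validation = [], []
    for driver_id in drivers:
        target = validation if driver_id in validation_drivers else train
        target.extend((driver_id, path) for path in images_by_driver[driver_id])
    return train, validation
```

Because the fraction applies to drivers rather than images, the validation set's share of images only approximates `validation_fraction` when drivers have different numbers of images.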

In [85]:
# Define Lists and Dictionary
list_dataset = ["complete", "sample"]
list_split_method = ["split_random", "split_driver"]

training_sample_dict = {
    "sample-split_random" : num_training_samples_10,
    "sample-split_driver": num_training_samples_10, 
    "complete-split_random": num_training_samples_complete,
    "complete-split_driver": num_training_samples_complete    
}

# Define Data Inputs
prefix_dataset = list_dataset[0]          # 0 = complete / 1 = sample
prefix_split_type = list_split_method[0]  # 0 = split_random / 1 = split_driver

# Extract Number of Training Samples
key_training_sample = prefix_dataset + "-" + prefix_split_type
num_training_samples = training_sample_dict[key_training_sample]

print("The following are the inputs for the model:")
print("Split Method:\t\t\t{}".format(prefix_split_type))
print("Dataset:\t\t\t{}".format(prefix_dataset))
print("# of Training Samples:\t\t{}".format(num_training_samples))

The following are the inputs for the model:
Split Method:			split_random
Dataset:			complete
# of Training Samples:		15686
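The four dictionary entries above follow a single rule (one count per dataset, repeated for each split method), so they can also be generated from the two lists. A minimal sketch, assuming the same counts as the cells above; `samples_per_dataset` and `lookup_training_samples` are illustrative names, and the lookup raises a `KeyError` that lists the valid keys instead of a bare one.

```python
from itertools import product

list_dataset = ["complete", "sample"]
list_split_method = ["split_random", "split_driver"]
samples_per_dataset = {"complete": 15686, "sample": 1567}

# One "<dataset>-<split method>" key per combination; the count depends only on the dataset.
training_sample_dict = {
    f"{dataset}-{split}": samples_per_dataset[dataset]
    for dataset, split in product(list_dataset, list_split_method)
}


def lookup_training_samples(dataset, split_method):
    """Return the training-sample count, failing loudly on an unknown combination."""
    key = f"{dataset}-{split_method}"
    if key not in training_sample_dict:
        raise KeyError(f"unknown combination {key!r}; expected one of {sorted(training_sample_dict)}")
    return training_sample_dict[key]
```

Keeping the split method in the key leaves room for the two splits to carry different counts, which would be needed if the driver-based split puts a different number of images into training.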


## Model Inputs

### Model Output Location

In [86]:
s3_output_location = 's3://{}/{}/{}/{}/{}/output'.format(bucket, prefix_1, prefix_file_type, prefix_split_type, prefix_dataset)
print(s3_output_location)

s3://dsba-6190-final-team-project/channels/rec/split_random/complete/output


### Model Input Location

First, we establish the data input channels. Because we are using the RecordIO data format, only two channels (train and validation) are required.
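For comparison, the sketch below records which channels the built-in image classification algorithm expects for each input format, as we read the SageMaker documentation: RecordIO needs only `train` and `validation`, while the raw image format also needs `train_lst` and `validation_lst`. `REQUIRED_CHANNELS` and `missing_channels` are hypothetical names for a pre-flight check; verify the content types against the current documentation before relying on them.

```python
# Channel name -> content type, per input format (our reading of the docs).
REQUIRED_CHANNELS = {
    "recordio": {
        "train": "application/x-recordio",
        "validation": "application/x-recordio",
    },
    "image": {
        "train": "application/x-image",
        "validation": "application/x-image",
        "train_lst": "application/x-image",
        "validation_lst": "application/x-image",
    },
}


def missing_channels(input_format, channel_names):
    """Return the required channels that are absent from channel_names, sorted."""
    return sorted(set(REQUIRED_CHANNELS[input_format]) - set(channel_names))


print(missing_channels("recordio", ["train", "validation"]))
print(missing_channels("image", ["train", "validation"]))
```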

In [87]:
s3train = 's3://{}/{}/{}/{}/train/'.format(bucket, prefix_1, prefix_split_type, prefix_dataset)
s3validation = 's3://{}/{}/{}/{}/validation/'.format(bucket, prefix_1, prefix_split_type, prefix_dataset)

print("The input data is pulled from the following S3 locations:")
print("Training:\t{}".format(s3train))
print("Validation:\t{}".format(s3validation))

The input data is pulled from the following S3 locations:
Training:	s3://dsba-6190-final-team-project/channels/split_random/complete/train/
Validation:	s3://dsba-6190-final-team-project/channels/split_random/complete/validation/
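Building these URIs with `str.format` requires the number of `{}` placeholders to match the arguments exactly; one placeholder too many raises an `IndexError`. A hedged alternative is to join the path components, as in this sketch. `s3_uri` is a hypothetical helper, and the values reproduce the paths printed above, where the output location includes the `rec` file-type prefix but the input locations do not.

```python
def s3_uri(bucket, *parts, trailing_slash=False):
    """Join a bucket name and key components into an s3:// URI."""
    key = "/".join(part.strip("/") for part in parts)
    return f"s3://{bucket}/{key}" + ("/" if trailing_slash else "")


bucket = "dsba-6190-final-team-project"
prefix_1, prefix_file_type = "channels", "rec"
prefix_split_type, prefix_dataset = "split_random", "complete"

# Input channels (no file-type prefix, matching the printed paths).
s3train = s3_uri(bucket, prefix_1, prefix_split_type, prefix_dataset, "train", trailing_slash=True)
s3validation = s3_uri(bucket, prefix_1, prefix_split_type, prefix_dataset, "validation", trailing_slash=True)

# Output location (includes the "rec" prefix, matching the printed path).
s3_output_location = s3_uri(bucket, prefix_1, prefix_file_type, prefix_split_type, prefix_dataset, "output")

print(s3train, s3validation, s3_output_location, sep="\n")
```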


Then we define the channels as inputs into the image classification model.

In [88]:
train_data = sagemaker.session.s3_input(s3train, 
                                        distribution='FullyReplicated', 
                                        content_type='application/x-recordio', 
                                        s3_data_type='S3Prefix')

validation_data = sagemaker.session.s3_input(s3validation, 
                                             distribution='FullyReplicated', 
                                             content_type='application/x-recordio', 
                                             s3_data_type='S3Prefix')

data_channels = {'train': train_data, 
                 'validation': validation_data}

print(data_channels)

{'train': <sagemaker.inputs.s3_input object at 0x7f13586910f0>, 'validation': <sagemaker.inputs.s3_input object at 0x7f1358691048>}


## Train Model

### Set Up Instance Types

In [89]:
# Available Instances
available_instances = ['ml.p2.xlarge',     # $1.26/hr
                       'ml.p3.2xlarge'     # $4.284/hr
                      ]

# Initialize Instance
train_instance_type = available_instances[0]

# Print Check
print("This training session used the following instance: {}".format(train_instance_type))

This training session used the following instance: ml.p2.xlarge


### Initialize
#### Parameters
The following steps define the algorithm parameters and hyperparameters.

In [90]:
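dist_drive_ic = sagemaker.estimator.Estimator(training_image,
                                              role,
                                              train_instance_count=1,
                                              train_instance_type=train_instance_type,
                                              train_volume_size=50,      # GB of storage attached to the training instance
                                              train_max_run=360000,      # seconds (100 hours) before SageMaker stops the job
                                              input_mode='File',         # copy the input data to the instance before training
                                              output_path=s3_output_location,
                                              sagemaker_session=sess_sage)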
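`train_max_run` is expressed in seconds, so 360000 allows up to 100 hours before SageMaker stops the job. Combined with the hourly prices noted in the instance list, that gives a rough worst-case compute cost. This sketch uses the notebook's price comments as given, assumes a single instance, and ignores storage for the 50 GB training volume; actual billing depends on region and current pricing. `HOURLY_PRICE` and `worst_case_cost` are illustrative names.

```python
# Hourly prices (USD) copied from the comments in the instance list above.
HOURLY_PRICE = {"ml.p2.xlarge": 1.26, "ml.p3.2xlarge": 4.284}


def worst_case_cost(price_per_hour, max_run_seconds, instance_count=1):
    """Compute cost if the job runs until SageMaker stops it at max_run_seconds."""
    max_hours = max_run_seconds / 3600
    return price_per_hour * max_hours * instance_count


train_max_run = 360000  # seconds, as passed to the Estimator

for instance, price in HOURLY_PRICE.items():
    hours = train_max_run / 3600
    cost = worst_case_cost(price, train_max_run)
    print(f"{instance}: up to {hours:.0f} h, about ${cost:,.2f}")
```

A tighter `train_max_run` caps this exposure, at the risk of stopping a job that legitimately needs longer.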

#### Hyperparameters

In [91]:
dist_drive_ic.set_hyperparameters(num_layers=18,                # ResNet depth
                                  use_pretrained_model=1,       # start from pretrained weights
                                  image_shape="3,210,280",      # channels,height,width: RGB, 210 px high x 280 px wide
                                  num_classes=10,
                                  mini_batch_size=128,
                                  epochs=5,
                                  learning_rate=0.01,
                                  num_training_samples=num_training_samples,
                                  precision_dtype='float16')    # train in float16 (mixed precision)
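Two of these hyperparameters are easy to sanity-check by hand. `image_shape` is read as channels, height, width, so `"3,210,280"` means RGB images 210 pixels high and 280 wide. `num_training_samples` together with `mini_batch_size` determines how many mini-batches make up one epoch; the SageMaker documentation warns that a mismatch with the real training-set size can disturb the learning-rate schedule. The helpers below are illustrative, and they assume a final partial batch counts as one batch, which may not match the algorithm's internal handling.

```python
import math


def parse_image_shape(image_shape):
    """Split an image_shape string such as "3,210,280" into (channels, height, width)."""
    channels, height, width = (int(value) for value in image_shape.split(","))
    return channels, height, width


def batches_per_epoch(num_training_samples, mini_batch_size):
    """Mini-batches needed to see every training image once; a partial last batch counts."""
    return math.ceil(num_training_samples / mini_batch_size)


channels, height, width = parse_image_shape("3,210,280")
steps = batches_per_epoch(15686, 128)  # complete dataset, batch size from the cell above
print(f"{channels} channels, {height}x{width} px, {steps} batches/epoch, {steps * 5} over 5 epochs")
```

With the complete dataset this works out to 123 mini-batches per epoch (15686 / 128 is about 122.5), or 615 over the 5 configured epochs.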

### Run Model
With the data inputs defined and the parameters and hyperparameters initialized, we can run the model.

In [None]:
%%time
dist_drive_ic.fit(inputs=data_channels, logs=True)

2020-04-21 00:02:27 Starting - Starting the training job...
2020-04-21 00:02:29 Starting - Launching requested ML instances......
2020-04-21 00:03:37 Starting - Preparing the instances for training.........
2020-04-21 00:05:13 Downloading - Downloading input data......
2020-04-21 00:05:59 Training - Downloading the training image...
2020-04-21 00:06:48 Training - Training image download completed. Training in progress.
Docker entrypoint called with argument(s): train
[04/21/2020 00:06:51 INFO 140609063552832] Reading default configuration from /opt/amazon/lib/python2.7/site-packages/image_classification/default-input.json: {u'beta_1': 0.9, u'gamma': 0.9, u'beta_2': 0.999, u'optimizer': u'sgd', u'use_pretrained_model': 0, u'eps': 1e-08, u'epochs': 30, u'lr_scheduler_factor': 0.1, u'num_layers': 152, u'image_shape': u'3,224,224', u'precision_dtype': u'float32', u'mini_batch_size': 32, u'weight_decay': 0.0001, u'learning_rate': 0.1, u'momentum': 0}
[04/21/2020 00:06:
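The log line above lists the algorithm's built-in defaults (for example `num_layers` 152 and `epochs` 30), not the values used for this job. Assuming the algorithm overlays the hyperparameters sent by `set_hyperparameters()` on those defaults, which is what the log suggests, the effective configuration can be previewed with a dictionary merge. The dictionaries below are copied by hand from the log and the hyperparameter cell; this is an illustration, not the algorithm's own code.

```python
# Subset of the defaults printed in the training log.
defaults = {
    "num_layers": 152, "use_pretrained_model": 0, "image_shape": "3,224,224",
    "mini_batch_size": 32, "epochs": 30, "learning_rate": 0.1,
    "precision_dtype": "float32", "optimizer": "sgd",
}

# Values passed to set_hyperparameters() for this run (complete dataset).
provided = {
    "num_layers": 18, "use_pretrained_model": 1, "image_shape": "3,210,280",
    "num_classes": 10, "mini_batch_size": 128, "epochs": 5,
    "learning_rate": 0.01, "num_training_samples": 15686,
    "precision_dtype": "float16",
}

# Keys in `provided` override the defaults; keys only in `defaults` are kept.
effective = {**defaults, **provided}

for name in sorted(effective):
    origin = "set" if name in provided else "default"
    print(f"{name:22} {effective[name]!s:12} ({origin})")
```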

## Compile - AWS Neo