This notebook builds on preprocessing performed earlier. The training and validation sets of images and LST files have been created and uploaded to S3 in the required directory structure.

We can now train the multiclass image classification model.

## Import Libraries

In [84]:
%%time
import sagemaker
from sagemaker import get_execution_role
from sagemaker.amazon.amazon_estimator import get_image_uri

CPU times: user 13 µs, sys: 2 µs, total: 15 µs
Wall time: 17.4 µs


# Set-Up

## Establish AWS Conditions

In [85]:
role = get_execution_role()
print(role)

bucket = "dsba-6190-final-team-project"
prefix_1 = "channels"
prefix_file_type = "rec"

sess_sage = sagemaker.Session()

arn:aws:iam::726963482731:role/sagemaker_execution


## Import Sagemaker Model

In [86]:
training_image = get_image_uri(sess_sage.boto_region_name, 'image-classification', repo_version="latest")
print(training_image)

811284229777.dkr.ecr.us-east-1.amazonaws.com/image-classification:latest


# Model Training
Two different data sets have been uploaded to S3. One is the complete dataset. The other is a 10% sample of the dataset. The 10% sample is for troubleshooting the algorithm.

There are only two differences between fitting the model with the sample data and the complete dataset:

* Input Location: We need to point the algorithm to two different S3 locations. We will do this with the **prefix_dataset** variable, which will be defined at the beginning of each dataset's notebook section.
* Number of Training Samples: The complete dataset and the sample contain different numbers of training samples. These values are available in the Jupyter Notebook used to split the data and upload it to S3. We define the number of **training** samples for each dataset below.

In [87]:
num_training_samples_complete = 15686
num_training_samples_10 = 1567

## Define Dataset
This section defines the dataset. By setting the split prefix and dataset prefix, it will direct the algorithm to the correct training and validation inputs. 

In [88]:
# Define Lists and Dictionary
list_dataset = ["sample", "complete"]
list_split_method = ["split_im2rec", "split_drivers"]

training_sample_dict = {
    "sample-split_im2rec": 1567,
    "sample-split_drivers": 10,
    "complete-split_im2rec": 15686,
    "complete-split_drivers": 15686
}

# Define Data Inputs
prefix_dataset = list_dataset[1]
prefix_split_type = list_split_method[0]

# Extract Number of Training Samples
key_training_sample = prefix_dataset + "-" + prefix_split_type
num_training_samples = training_sample_dict[key_training_sample]

print("The following are the dataset inputs into the Image Classification Algorithm:")
print("Split Method:\t\t\t{}".format(prefix_split_type))
print("Dataset:\t\t\t{}".format(prefix_dataset))
print("# of Training Samples:\t\t{}".format(num_training_samples))

The following are the dataset inputs into the Image Classification Algorithm:
Split Method:			split_im2rec
Dataset:			complete
# of Training Samples:		15686
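
The dictionary lookup above fails with a bare `KeyError` if the dataset or split name is mistyped. As a minimal sketch, the lookup could be wrapped in a helper that reports the valid keys. The function name `lookup_training_samples` and the constant `TRAINING_SAMPLES` are hypothetical and are not part of the original notebook; the counts are copied from the cell above.

```python
# Counts copied from the notebook's training_sample_dict.
TRAINING_SAMPLES = {
    "sample-split_im2rec": 1567,
    "sample-split_drivers": 10,
    "complete-split_im2rec": 15686,
    "complete-split_drivers": 15686,
}


def lookup_training_samples(dataset, split_method, table=TRAINING_SAMPLES):
    """Return the training-sample count for a dataset/split pair.

    Raises a ValueError listing the valid keys if the pair is unknown.
    (Hypothetical helper, shown for illustration only.)
    """
    key = "{}-{}".format(dataset, split_method)
    if key not in table:
        raise ValueError(
            "Unknown combination '{}'. Valid keys: {}".format(key, sorted(table))
        )
    return table[key]


num_training_samples = lookup_training_samples("complete", "split_im2rec")
print("# of Training Samples: {}".format(num_training_samples))
```

With this helper, a typo such as `"full"` instead of `"complete"` produces an error message that names the accepted combinations.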


## Model Inputs

### Model Output Location

In [89]:
s3_output_location = 's3://{}/{}/{}/{}/{}/output'.format(bucket, prefix_1, prefix_file_type, prefix_split_type, prefix_dataset)
print(s3_output_location)

s3://dsba-6190-final-team-project/channels/rec/split_im2rec/complete/output


### Data Paths

First we establish the two channels: training and validation.

In [90]:
s3train = 's3://{}/{}/{}/{}/{}/train/'.format(bucket, prefix_1, prefix_file_type, prefix_split_type, prefix_dataset)
s3validation = 's3://{}/{}/{}/{}/{}/validation/'.format(bucket, prefix_1, prefix_file_type, prefix_split_type, prefix_dataset)

print("The input data is pulled from the following S3 locations:")
print("Training:\t{}".format(s3train))
print("Validation:\t{}".format(s3validation))

The input data is pulled from the following S3 locations:
Training:	s3://dsba-6190-final-team-project/channels/rec/split_im2rec/complete/train/
Validation:	s3://dsba-6190-final-team-project/channels/rec/split_im2rec/complete/validation/
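
The output location and both channel paths share the same bucket and prefix components, so the three format strings could be replaced by one small helper. The following is only a sketch: `build_s3_uri` is a hypothetical function name, and the bucket and prefix values are the ones printed above.

```python
def build_s3_uri(bucket, *parts, trailing_slash=False):
    """Join a bucket name and key components into an s3:// URI.

    Hypothetical helper for illustration; it only builds strings and
    does not contact S3.
    """
    # Strip stray slashes so components can be passed with or without them.
    cleaned = [p.strip("/") for p in parts if p.strip("/")]
    uri = "s3://{}".format("/".join([bucket.strip("/")] + cleaned))
    return uri + "/" if trailing_slash else uri


bucket = "dsba-6190-final-team-project"
base_parts = ("channels", "rec", "split_im2rec", "complete")

s3_output_location = build_s3_uri(bucket, *base_parts, "output")
s3train = build_s3_uri(bucket, *base_parts, "train", trailing_slash=True)
s3validation = build_s3_uri(bucket, *base_parts, "validation", trailing_slash=True)

print(s3_output_location)
print(s3train)
print(s3validation)
```

Keeping the shared components in one tuple means that switching the dataset or split method changes all three paths together.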


Then we define the channels as inputs into the image classification model.

In [91]:
train_data = sagemaker.session.s3_input(s3train, 
                                        distribution='FullyReplicated', 
                                        content_type='application/x-recordio', 
                                        s3_data_type='S3Prefix')

validation_data = sagemaker.session.s3_input(s3validation, 
                                             distribution='FullyReplicated', 
                                             content_type='application/x-recordio', 
                                             s3_data_type='S3Prefix')

data_channels = {'train': train_data, 
                 'validation': validation_data}

print(data_channels)

{'train': <sagemaker.inputs.s3_input object at 0x7f4da0e863c8>, 'validation': <sagemaker.inputs.s3_input object at 0x7f4da0e86470>}


## Train Model Price Tests

To see whether a more expensive instance is worth paying for, I am going to test the two least expensive **Accelerated Computing – Current Generation** instances.

**Test on 3/30/20**

*Prices (as of 3/30/20, 12:57 PM):*

* ml.p2.xlarge: $1.26/hr
* ml.p3.2xlarge: $4.284/hr

*Algorithm Settings:*

* epochs: 2
* learning_rate: 0.1
* instance count: 1
* num layers: 18
* mini batch size: 128
* use pretrained model: 1

**Training Times:**

*Split Method: im2rec*

Data: 10% Sample

ml.p2.xlarge: 101 secs

ml.p3.2xlarge: 86 secs

Data: Complete

ml.p2.xlarge: # secs

ml.p3.2xlarge: # secs
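
To compare the two instances on cost rather than time alone, the recorded training times for the 10% sample can be combined with the hourly prices listed above. This is a rough sketch that assumes the cost is simply the hourly price times the training seconds; it ignores any other billed time (for example instance start-up or data download), and `training_cost` is a hypothetical helper name.

```python
# Hourly prices and 10% sample training times, as recorded in this notebook.
PRICE_PER_HOUR = {"ml.p2.xlarge": 1.26, "ml.p3.2xlarge": 4.284}
TRAIN_SECONDS_SAMPLE = {"ml.p2.xlarge": 101, "ml.p3.2xlarge": 86}


def training_cost(instance_type, seconds):
    """Estimate cost in dollars, assuming cost = hourly price * hours used."""
    return PRICE_PER_HOUR[instance_type] * seconds / 3600.0


for instance_type, seconds in TRAIN_SECONDS_SAMPLE.items():
    cost = training_cost(instance_type, seconds)
    print("{}: {} secs -> ${:.3f}".format(instance_type, seconds, cost))

# Ratios of the more expensive instance relative to the cheaper one.
speedup = TRAIN_SECONDS_SAMPLE["ml.p2.xlarge"] / TRAIN_SECONDS_SAMPLE["ml.p3.2xlarge"]
cost_ratio = (training_cost("ml.p3.2xlarge", 86) /
              training_cost("ml.p2.xlarge", 101))
print("Speed-up: {:.2f}x, cost ratio: {:.2f}x".format(speedup, cost_ratio))
```

Under these assumptions, the ml.p3.2xlarge finishes the sample run in roughly 15% less time but costs roughly three times as much. The comparison may look different on the complete dataset, for which times are not yet recorded.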

In [92]:
# Available Instances
available_instances = ['ml.p2.xlarge',   # $1.26/hr
                       'ml.p3.2xlarge']  # $4.284/hr

# Initialize Instance
train_instance_type = available_instances[0]

# Print Check
print("This training session used the following instance: {}".format(train_instance_type))

This training session used the following instance: ml.p2.xlarge


### Initialize Parameters

In [96]:
dist_drive_ic = sagemaker.estimator.Estimator(training_image,
                                              role,
                                              train_instance_count=1,
                                              train_instance_type=train_instance_type,
                                              train_volume_size=50,    # EBS volume size in GB
                                              train_max_run=360000,    # Maximum run time in seconds (100 hours)
                                              input_mode='File',
                                              output_path=s3_output_location,
                                              sagemaker_session=sess_sage)

### Initialize Hyper-Parameters

In [97]:
dist_drive_ic.set_hyperparameters(num_layers = 18,
                                  use_pretrained_model = 1,
                                  image_shape = "3,210,280", #RGB Pictures, 210 x 280
                                  num_classes = 10,
                                  mini_batch_size = 128,
                                  epochs = 2,
                                  learning_rate = 0.1,
                                  num_training_samples = num_training_samples,
                                  precision_dtype = 'float32')
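
As a sanity check on `mini_batch_size` and `num_training_samples`, the number of mini-batches per epoch can be estimated for each dataset. This sketch assumes a final partial batch is counted as a batch (ceiling division); how the algorithm actually handles the last partial batch is not stated in this notebook, and `batches_per_epoch` is a hypothetical helper name.

```python
import math


def batches_per_epoch(num_training_samples, mini_batch_size):
    """Estimate mini-batches per epoch, counting a final partial batch."""
    if num_training_samples <= 0 or mini_batch_size <= 0:
        raise ValueError("Both arguments must be positive integers.")
    return math.ceil(num_training_samples / mini_batch_size)


mini_batch_size = 128  # Value passed to set_hyperparameters above.

print(batches_per_epoch(15686, mini_batch_size))  # complete dataset -> 123
print(batches_per_epoch(1567, mini_batch_size))   # 10% sample -> 13
```

Note that the `sample-split_drivers` entry lists only 10 training samples, which is smaller than a single mini-batch of 128, so that combination may need a smaller `mini_batch_size`.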

### Run Model

In [98]:
%%time
dist_drive_ic.fit(inputs = data_channels, logs = True)

2020-03-30 18:45:47 Starting - Starting the training job...
2020-03-30 18:45:48 Starting - Launching requested ML instances.........
2020-03-30 18:47:20 Starting - Preparing the instances for training.........
2020-03-30 18:49:09 Downloading - Downloading input data...
2020-03-30 18:49:42 Training - Downloading the training image.....
Docker entrypoint called with argument(s): train
[03/30/2020 18:50:28 INFO 140391108474688] Reading default configuration from /opt/amazon/lib/python2.7/site-packages/image_classification/default-input.json: {u'beta_1': 0.9, u'gamma': 0.9, u'beta_2': 0.999, u'optimizer': u'sgd', u'use_pretrained_model': 0, u'eps': 1e-08, u'epochs': 30, u'lr_scheduler_factor': 0.1, u'num_layers': 152, u'image_shape': u'3,224,224', u'precision_dtype': u'float32', u'mini_batch_size': 32, u'weight_decay': 0.0001, u'learning_rate': 0.1, u'momentum': 0}
[03/30/2020 18:50:28 INFO 140391108474688] Merging with provided configuration from /opt/ml/input/config