This notebook builds on preprocessing performed earlier. The training and validation image sets, along with their .lst files, have already been created and uploaded to S3 in the directory structure the algorithm expects.

With that in place, we can proceed directly to training the multiclass image classification model.

## Import Libraries

In [29]:
%%time
import sagemaker
from sagemaker import get_execution_role
from sagemaker.amazon.amazon_estimator import get_image_uri

CPU times: user 12 µs, sys: 2 µs, total: 14 µs
Wall time: 17.6 µs
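The import cell relies on SageMaker Python SDK v1 APIs. It uses `get_image_uri` here, and `sagemaker.session.s3_input` and the `train_*` Estimator arguments appear later. SDK v2 deprecated or renamed these: `sagemaker.image_uris.retrieve`, `sagemaker.inputs.TrainingInput`, and unprefixed argument names such as `instance_count`. The notebook is therefore written for a 1.x release and may fail or emit deprecation warnings on 2.x. Below is a minimal version guard. It assumes `sagemaker.__version__` uses the usual dotted `major.minor.patch` form, and the helper name is hypothetical:

```python
def is_v1_sdk(version: str) -> bool:
    """Return True if a SageMaker Python SDK version string belongs to the 1.x series."""
    # The major version is the first dot-separated component, e.g. "1.51.4" -> 1
    major = int(version.split(".")[0])
    return major < 2
```

For example, `assert is_v1_sdk(sagemaker.__version__)` at the top of the notebook fails fast on a 2.x environment instead of partway through training setup.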


# Set-Up

## Establish AWS Configuration

In [30]:
role = get_execution_role()
print(role)

bucket = "dsba-6190-final-team-project"
prefix = "imgs"

sess_sage = sagemaker.Session()

arn:aws:iam::726963482731:role/sagemaker_execution


## Retrieve the SageMaker Image Classification Container

In [31]:
# Look up the container image URI of the built-in image classification algorithm for this region
training_image = get_image_uri(sess_sage.boto_region_name, 'image-classification', repo_version="latest")
print(training_image)

811284229777.dkr.ecr.us-east-1.amazonaws.com/image-classification:latest


# Model Training

## Model Inputs

### Model Output Location

In [32]:
s3_output_location = 's3://{}/{}/output'.format(bucket, prefix)
print(s3_output_location)

s3://dsba-6190-final-team-project/imgs/output


### Training Job Parameters

In [33]:
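train_instance_count = 1
train_instance_type = 'ml.p2.xlarge'   # single-GPU instance
train_volume_size = 50                 # EBS volume size, in GB
train_max_run = 360000                 # maximum training time, in seconds (100 hours)
input_mode = 'File'                    # copy the data to the instance before training starts
output_path = s3_output_location
sagemaker_session = sess_sage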

### Model Hyper-Parameters

In [34]:
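num_layers = 18                             # ResNet depth
use_pretrained_model = 1                    # start from pretrained weights (transfer learning)
image_shape = "3,224,224"                   # channels, height, width fed to the network
num_classes = 10
mini_batch_size = 128
resize = 256                                # shorter image side is resized to this before cropping
epochs = 5
learning_rate = 0.1
num_training_samples = 15686                # Manually verified in S3 (Action > Get Size)
use_weighted_loss = 1                       # weighted cross-entropy; only applies when multi_label=1
augmentation_type = 'crop_color_transform'  # random crop/flip, color jitter, rotation/shear/aspect ratio
precision_dtype = 'float16'                 # mixed-precision training
multi_label = 1                             # NOTE: 1 enables multi-label mode; single-label multiclass data normally uses 0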
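`num_training_samples` should match the number of images listed in the training .lst file. Together with `mini_batch_size`, it determines how many mini-batches make up an epoch. The sketch below is an alternative to checking the count in the S3 console. It assumes a copy of the training .lst file is available locally, and the function names and path are hypothetical:

```python
import math


def count_lst_entries(lst_path: str) -> int:
    """Count the non-blank lines in a .lst file; each line describes one image."""
    with open(lst_path, encoding="utf-8") as f:
        return sum(1 for line in f if line.strip())


def batches_per_epoch(num_samples: int, batch_size: int) -> int:
    """Mini-batches needed to see every sample once (the last batch may be partial)."""
    return math.ceil(num_samples / batch_size)


# With the values above: ceil(15686 / 128) = 123 batches per epoch, 615 over 5 epochs
print(batches_per_epoch(15686, 128), batches_per_epoch(15686, 128) * 5)
```

For example, `count_lst_entries("train.lst") == num_training_samples` should hold before the job is launched.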

### Data Paths

First, we define the S3 locations of the four input channels: the training and validation images, and their corresponding .lst files.

In [35]:
s3train = 's3://{}/{}/train/'.format(bucket, prefix)
s3validation = 's3://{}/{}/validation/'.format(bucket, prefix)
s3train_lst = 's3://{}/{}/train_lst/'.format(bucket, prefix)
s3validation_lst = 's3://{}/{}/validation_lst/'.format(bucket, prefix)
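The two `_lst` channels hold the .lst index files produced during preprocessing. Each line is tab-separated and contains three fields: an integer image index, the class label, and the image path relative to the image channel's root. The sketch below shows how such lines could be generated. It assumes (hypothetically) that images sit in class folders named `c0`, `c1`, and so on, as the `train/c1/...` path in the training error at the end of this notebook suggests, and that the folder number is the label:

```python
from pathlib import PurePosixPath
from typing import Iterable, List


def make_lst_lines(rel_paths: Iterable[str]) -> List[str]:
    """Return tab-separated .lst lines: index, class label, relative path.

    Hypothetical convention: the first path component is a class folder
    named "c<k>" (e.g. "c1/img_56720.jpg"), and <k> is the class label.
    """
    lines = []
    for index, rel in enumerate(sorted(rel_paths)):
        folder = PurePosixPath(rel).parts[0]  # e.g. "c1"
        label = int(folder[1:])               # "c1" -> 1
        lines.append(f"{index}\t{label}\t{rel}")
    return lines
```

As we understand the algorithm's documentation, `multi_label=1` changes this format. Each line then carries one 0/1 label column per class between the index and the path, so single-label lines like these would not match that setting.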

Next, we wrap each location in an `s3_input` object so it can be passed to the training job as a named channel.

In [36]:
train_data = sagemaker.session.s3_input(s3train, distribution='FullyReplicated', 
                                        content_type='application/x-image', 
                                        s3_data_type='S3Prefix')

validation_data = sagemaker.session.s3_input(s3validation, distribution='FullyReplicated', 
                                             content_type='application/x-image', 
                                             s3_data_type='S3Prefix')

train_data_lst = sagemaker.session.s3_input(s3train_lst, distribution='FullyReplicated', 
                                            content_type='application/x-image', 
                                            s3_data_type='S3Prefix')

validation_data_lst = sagemaker.session.s3_input(s3validation_lst, distribution='FullyReplicated', 
                                                 content_type='application/x-image', 
                                                 s3_data_type='S3Prefix')

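Finally, we map each channel name to its input and print the dictionary to verify. This confirms the input objects were created, but not that the S3 prefixes actually contain data.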

In [37]:
data_channels = {'train': train_data, 'validation': validation_data, 'train_lst': train_data_lst, 
                 'validation_lst': validation_data_lst}

print(data_channels)

{'train': <sagemaker.inputs.s3_input object at 0x7f982fe0de10>, 'validation': <sagemaker.inputs.s3_input object at 0x7f982fe0dd68>, 'train_lst': <sagemaker.inputs.s3_input object at 0x7f982fe0dda0>, 'validation_lst': <sagemaker.inputs.s3_input object at 0x7f982fe0de48>}
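Printing the dictionary only shows that the `s3_input` objects exist. It does not confirm that each prefix contains data. The sketch below is a stricter check. It assumes boto3 is available and the notebook's role can list the bucket, and the helper names are hypothetical:

```python
from typing import Any, Tuple
from urllib.parse import urlparse


def split_s3_uri(uri: str) -> Tuple[str, str]:
    """Split "s3://bucket/some/prefix/" into ("bucket", "some/prefix/")."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3":
        raise ValueError(f"not an S3 URI: {uri}")
    return parsed.netloc, parsed.path.lstrip("/")


def prefix_has_objects(s3_client: Any, uri: str) -> bool:
    """Return True if at least one object exists under the S3 prefix `uri`.

    `s3_client` is assumed to be a boto3 S3 client, e.g. boto3.client("s3").
    """
    bucket, prefix = split_s3_uri(uri)
    response = s3_client.list_objects_v2(Bucket=bucket, Prefix=prefix, MaxKeys=1)
    return response.get("KeyCount", 0) > 0
```

For example, `prefix_has_objects(boto3.client("s3"), s3train)` should return `True` for each of the four channel locations before `fit` is called.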


## Train Model

### Initialize the Estimator

In [38]:
dist_drive_ic = sagemaker.estimator.Estimator(training_image,
                                              role,
                                              train_instance_count=train_instance_count,
                                              train_instance_type=train_instance_type,
                                              train_volume_size=train_volume_size,
                                              train_max_run=train_max_run,
                                              input_mode=input_mode,
                                              output_path=s3_output_location,
                                              sagemaker_session=sess_sage)
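The `train_`-prefixed keyword arguments above are SDK v1 names. If this cell is ported to SDK v2, they become `instance_count`, `instance_type`, `volume_size`, and `max_run`. The small illustrative helper below applies those renames to a keyword dictionary. The mapping covers only the arguments used in this cell, and the names are hypothetical:

```python
from typing import Any, Dict

# SDK v1 Estimator argument names used in this notebook -> SDK v2 names
V1_TO_V2_ESTIMATOR_ARGS = {
    "train_instance_count": "instance_count",
    "train_instance_type": "instance_type",
    "train_volume_size": "volume_size",
    "train_max_run": "max_run",
}


def to_v2_kwargs(kwargs: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of kwargs with v1 argument names renamed; other keys pass through."""
    return {V1_TO_V2_ESTIMATOR_ARGS.get(key, key): value for key, value in kwargs.items()}


v1_kwargs = {
    "train_instance_count": 1,
    "train_instance_type": "ml.p2.xlarge",
    "train_volume_size": 50,
    "train_max_run": 360000,
    "input_mode": "File",
}
print(to_v2_kwargs(v1_kwargs))
```

On a 2.x installation, the renamed dictionary could then be unpacked into the Estimator alongside `training_image`, `role`, `output_path`, and `sagemaker_session`.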

### Initialize Hyper-Parameters

In [39]:
dist_drive_ic.set_hyperparameters(num_layers=num_layers,
                                  use_pretrained_model=use_pretrained_model,
                                  image_shape=image_shape,
                                  num_classes=num_classes,
                                  mini_batch_size=mini_batch_size,
                                  resize=resize,
                                  epochs=epochs,
                                  learning_rate=learning_rate,
                                  num_training_samples=num_training_samples,
                                  use_weighted_loss=use_weighted_loss,
                                  augmentation_type=augmentation_type,
                                  precision_dtype=precision_dtype,
                                  multi_label=multi_label)
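Some of these settings depend on each other. Based on our reading of the built-in algorithm's hyperparameter documentation, three dependencies matter here:

- `use_weighted_loss` only takes effect with `multi_label=1`.
- `multi_label=1` expects multi-hot labels in the .lst files.
- `resize` should be at least as large as the crop in `image_shape`.

The sketch below is a partial, illustrative consistency check using a hypothetical helper. It is not the algorithm's own validation:

```python
from typing import Dict, List


def check_ic_hyperparameters(hp: Dict[str, object]) -> List[str]:
    """Return warnings for a few inconsistent image-classification settings."""
    warnings = []
    # image_shape is "channels,height,width", e.g. "3,224,224"
    _, height, width = (int(x) for x in str(hp["image_shape"]).split(","))
    multi_label = int(hp.get("multi_label", 0))

    if int(hp.get("use_weighted_loss", 0)) == 1 and multi_label == 0:
        warnings.append("use_weighted_loss has no effect unless multi_label=1")
    if multi_label == 1:
        warnings.append("multi_label=1: .lst files need one 0/1 label column per class")
    # After the shorter side is resized to `resize`, the crop must still fit
    if "resize" in hp and int(hp["resize"]) < max(height, width):
        warnings.append("resize is smaller than the image_shape crop")
    return warnings
```

This notebook's values are `multi_label=1`, `use_weighted_loss=1`, `resize=256`, and `image_shape="3,224,224"`. With these, only the multi-label reminder is raised, which is worth reconciling with the multiclass framing at the top of the notebook.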

### Run Model

In [40]:
%%time
dist_drive_ic.fit(inputs=data_channels, logs=True)

2020-03-20 02:22:01 Starting - Starting the training job...
2020-03-20 02:22:02 Starting - Launching requested ML instances......
2020-03-20 02:23:11 Starting - Preparing the instances for training.........
2020-03-20 02:24:45 Downloading - Downloading input data..................
2020-03-20 02:27:58 Training - Downloading the training image..
Docker entrypoint called with argument(s): train
[03/20/2020 02:28:17 INFO 139658910345024] Reading default configuration from /opt/amazon/lib/python2.7/site-packages/image_classification/default-input.json: {u'beta_1': 0.9, u'gamma': 0.9, u'beta_2': 0.999, u'optimizer': u'sgd', u'use_pretrained_model': 0, u'eps': 1e-08, u'epochs': 30, u'lr_scheduler_factor': 0.1, u'num_layers': 152, u'image_shape': u'3,224,224', u'precision_dtype': u'float32', u'mini_batch_size': 32, u'weight_decay': 0.0001, u'learning_rate': 0.1, u'momentum': 0}
[03/20/2020 02:28:17 INFO 139658910345024] Merging with provided configuration from /opt/ml/inp

UnexpectedStatusException: Error for Training job image-classification-2020-03-20-02-22-01-103: Failed. Reason: ClientError: imread read blank (None) image for file: /opt/ml/input/data/train/c1/img_56720.jpg
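The job failed because the algorithm could not read `train/c1/img_56720.jpg`. A blank (None) read usually means one of three things: the file is missing at the path listed in the .lst file, it is empty, or it is not a decodable image. The sketch below screens a local copy of the training data before re-uploading. It assumes `train.lst` and the `train/` image folders have been downloaded (for example with `aws s3 sync`), with `image_root` pointing at the local copy of `train/`. The function name is hypothetical:

```python
from pathlib import Path
from typing import List, Tuple

# Leading "magic" bytes of JPEG and PNG files
JPEG_MAGIC = b"\xff\xd8\xff"
PNG_MAGIC = b"\x89PNG\r\n\x1a\n"


def find_bad_images(lst_path: str, image_root: str) -> List[Tuple[str, str]]:
    """Flag .lst entries whose image file is missing, empty, or not JPEG/PNG.

    Heuristic only: a file with a valid header can still be truncated or corrupt.
    """
    bad = []
    root = Path(image_root)
    with open(lst_path, encoding="utf-8") as f:
        for line in f:
            if not line.strip():
                continue  # skip blank lines
            rel_path = line.rstrip("\r\n").split("\t")[-1]  # path is the last column
            image_path = root / rel_path
            if not image_path.is_file():
                bad.append((rel_path, "missing"))
                continue
            with open(image_path, "rb") as img:
                header = img.read(8)
            if not header:
                bad.append((rel_path, "empty"))
            elif not (header.startswith(JPEG_MAGIC) or header.startswith(PNG_MAGIC)):
                bad.append((rel_path, "unrecognized format"))
    return bad
```

Once the flagged files are removed or replaced, the .lst file and `num_training_samples` would need to be regenerated before re-running `fit`. This check does not catch every file the algorithm's decoder might reject, only the common cases listed above.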