# AWS SageMaker Modeling Script

> *I used this [post](https://blog.betomorrow.com/keras-in-the-cloud-with-amazon-sagemaker-67cf11fb536) from Paul Breton and the corresponding GitHub [repo](https://github.com/Pravez/KerasSageMaker) for guidance on using Keras with SageMaker.*

## Libraries to Import

In [1]:
# Standard library
import gc
import os
import warnings

# Image handling
from PIL import Image

# Keras / TensorFlow (the model architecture lives in a separate script)
import keras
import tensorflow
from keras.layers import Dropout
from tensorflow.keras import models, layers, optimizers
from tensorflow.keras.preprocessing import image
from tensorflow.python.keras.preprocessing.image import load_img

# SageMaker SDK
import sagemaker
from sagemaker.tensorflow import TensorFlow

# Silence library warnings to keep the notebook output readable
warnings.filterwarnings('ignore')




Using TensorFlow backend.
No handlers could be found for logger "sagemaker"


## Create SageMaker Training Job

Create a SageMaker session and retrieve the execution role ARN.

In [10]:
sagemaker_session = sagemaker.Session()

In [11]:
role = sagemaker.get_execution_role()

Next, define the S3 bucket name and the path to the training and validation sets, and set the hyperparameters that are passed to the Keras model script.

In [12]:
bucket = "sagemaker-all-cnn"
key = "Data"                            # Path from the bucket's root to the dataset
key_output = "output"                   # Output prefix (not referenced in the cells shown here)
train_instance_type='ml.m4.xlarge'      # The type of EC2 instance which will be used for training
deploy_instance_type='ml.m4.xlarge'     # The type of EC2 instance which will be used for deployment
hyperparameters={
    "learning_rate": 0.001,
    "decay": 0.0001
}
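To get a feel for what these two values imply together, the sketch below assumes the model script hands them to a Keras optimizer that applies classic time-based decay, where the effective rate after `step` updates is `learning_rate / (1 + decay * step)`. Whether `Model_2C2D.py` actually uses them this way is an assumption, and `decayed_learning_rate` is a hypothetical helper, not part of Keras or SageMaker.

```python
def decayed_learning_rate(learning_rate, decay, step):
    """Effective learning rate after `step` updates under time-based decay.

    Assumption: the optimizer uses lr / (1 + decay * step).
    """
    return learning_rate / (1.0 + decay * step)


hyperparameters = {"learning_rate": 0.001, "decay": 0.0001}

# Print the effective rate at a few points during training.
for step in (0, 100, 10000):
    lr = decayed_learning_rate(
        hyperparameters["learning_rate"], hyperparameters["decay"], step
    )
    print(step, lr)
```

Under this assumption the rate barely moves over the first hundred steps and roughly halves after 10,000 steps.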

Specify locations for our data within the S3 bucket.

In [13]:
train_input_path = "s3://{}/{}/training/".format(bucket, key)
validation_input_path = "s3://{}/{}/validation/".format(bucket, key)
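If more channels are added later (for example a test set), the same URIs can be built with a small helper. This is only a sketch: `s3_uri` is a hypothetical function name, and it produces the same strings as the `.format()` calls above.

```python
def s3_uri(bucket, *parts):
    """Join a bucket name and key segments into an s3:// URI ending in '/'."""
    key = "/".join(p.strip("/") for p in parts if p)
    return "s3://{}/{}/".format(bucket, key)


bucket = "sagemaker-all-cnn"
key = "Data"

train_input_path = s3_uri(bucket, key, "training")
validation_input_path = s3_uri(bucket, key, "validation")

print(train_input_path)
print(validation_input_path)
```

Stripping slashes from each segment avoids accidental double slashes when a prefix is written as `"Data/"`.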

Create the actual training job. In addition to the hyperparameters above, we set the number of training and evaluation steps and point the estimator at a separate Python script (found in the Model_Scripts directory of this repo) that defines the model architecture with the standard Keras API.

In [18]:
estimator = TensorFlow(
  entry_point=os.path.join(os.path.dirname('__file__'), "Model_Scripts/Model_2C2D.py"),
  role=role,
  framework_version="1.12.0",               # TensorFlow's version
  hyperparameters=hyperparameters,
  training_steps=100,
  evaluation_steps=30,
  train_instance_count=1,                   # Number of training instances to launch
  train_instance_type=train_instance_type,
)

Call the .fit() method to begin the training job. Progress is tracked here, but the actual computation runs as a separate SageMaker training job in AWS on an ml.m4.xlarge instance (a setting we can change, provided the account's resource quota for the chosen instance type allows it).

Depending on the model architecture, training takes from 10 minutes to 1 hour with 4000 images and 100-150 epochs.
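Because the estimator counts training in steps (batches) while the figures above are given in epochs, it can help to convert between the two. The sketch below does that arithmetic for 4000 images; the batch size of 32 is a hypothetical value, since the real one is set inside the model script and is not shown here.

```python
import math


def steps_per_epoch(num_images, batch_size):
    """Number of batches needed to see every image once."""
    return math.ceil(num_images / batch_size)


def total_steps(num_images, batch_size, epochs):
    """Number of batches processed over the given number of epochs."""
    return steps_per_epoch(num_images, batch_size) * epochs


num_images = 4000
batch_size = 32  # hypothetical; the actual batch size lives in the model script

print(steps_per_epoch(num_images, batch_size))   # 125
print(total_steps(num_images, batch_size, 100))  # 12500
```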

When you run this cell, status messages like the following appear over roughly 3 minutes. Most errors surface during the final "Training in progress." step, so wait until that step completes before navigating away.

> "Training ... 

>2020-09-22 16:09:18 Starting - Starting the training job...

>2020-09-22 16:09:20 Starting - Launching requested ML instances......

>2020-09-22 16:10:30 Starting - Preparing the instances for training...

>2020-09-22 16:11:19 Downloading - Downloading input data.........

>2020-09-22 16:12:40 Training - Training image download completed. Training in progress."
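As a quick check on the "about 3 minutes" figure, the status lines above can be parsed and timed. This is a minimal sketch that assumes each line follows the `<date> <time> <status> - <message>` layout shown here; `parse_status_line` is a hypothetical helper, not part of the SageMaker SDK.

```python
from datetime import datetime

sample_log = """\
2020-09-22 16:09:18 Starting - Starting the training job...
2020-09-22 16:09:20 Starting - Launching requested ML instances......
2020-09-22 16:10:30 Starting - Preparing the instances for training...
2020-09-22 16:11:19 Downloading - Downloading input data.........
2020-09-22 16:12:40 Training - Training image download completed. Training in progress.
"""


def parse_status_line(line):
    """Split one status line into (timestamp, status, message)."""
    stamp, rest = line[:19], line[20:]
    status, message = rest.split(" - ", 1)
    return datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S"), status, message


events = [parse_status_line(l) for l in sample_log.splitlines() if l.strip()]
elapsed = events[-1][0] - events[0][0]

print(events[-1][1])            # Training
print(elapsed.total_seconds())  # 202.0
```

The 202 seconds between the first and last message match the roughly 3 minutes of setup before training starts.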

In [19]:
print("Training ...")
estimator.fit({'training': train_input_path, 'eval': validation_input_path})

Training ...
2020-09-23 19:59:14 Starting - Starting the training job...
2020-09-23 19:59:16 Starting - Launching requested ML instances......
2020-09-23 20:00:22 Starting - Preparing the instances for training.........
2020-09-23 20:01:56 Downloading - Downloading input data......
2020-09-23 20:03:10 Training - Downloading the training image...
2020-09-23 20:03:29 Training - Training image download completed. Training in progress.
2020-09-23 20:03:30,237 INFO - root - running container entrypoint
2020-09-23 20:03:30,237 INFO - root - starting train task
2020-09-23 20:03:30,253 INFO - container_support.training - Training starting
Downloading s3://sagemaker-us-east-2-526614695497/sagemaker-tensorflow-2020-09-23-19-59-14-065/source/sourcedir.tar.gz to /tmp/script.tar.gz
2020-09-23 20:03:34,762 INFO - tf_container - ----------------------TF_CONFIG--------------------------
2020-09-23 20:03:34,762 INFO - tf_container - {"environment": "clou

Model checkpoints and the final trained model are uploaded to an S3 bucket that SageMaker creates to store all models from this notebook instance. Billable seconds are also calculated for the duration of training and added to the relevant totals in the AWS Billing dashboard.
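For a rough sense of what a run costs, billable seconds can be converted to dollars with the instance's hourly rate. The sketch below shows only the arithmetic: `training_cost` is a hypothetical helper, and the rate used is a placeholder rather than a real AWS price, so look up the current ml.m4.xlarge training rate for your region.

```python
def training_cost(billable_seconds, hourly_rate_usd):
    """Approximate cost of a training job billed per second."""
    return billable_seconds / 3600.0 * hourly_rate_usd


# Placeholder rate for illustration only, NOT an actual AWS price.
example_hourly_rate_usd = 0.25

# A 30-minute job at the placeholder rate.
print(training_cost(1800, example_hourly_rate_usd))  # 0.125
```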