This notebook shows how to train a model with SageMaker using a custom container.

SageMaker training requires the following two inputs:
 - **data**: A SageMaker training job requires all data inputs to be stored in an S3 bucket, so you will upload your data to S3.
 - **training image**: SageMaker training requires a Docker image in which your model is trained. You will need the URI of that image.
 

In [2]:
import sagemaker
from sagemaker import get_execution_role

In [3]:
sagemaker_session = sagemaker.Session()
role = get_execution_role()

### Training, Validation, Test data

A folder called **data** has been created, and under it three additional folders (**train**, **validation**, **test**). Move your training, validation, and test data into the matching folders.
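
Before uploading, it can help to confirm that each folder actually contains data. The snippet below is a minimal sketch, assuming the folder layout described above; `missing_data_folders` is a hypothetical helper name and not part of the SageMaker SDK.

```python
import os

def missing_data_folders(data_root="data", channels=("train", "validation", "test")):
    """Return the channel folders under data_root that are absent or empty."""
    missing = []
    for channel in channels:
        folder = os.path.join(data_root, channel)
        # A folder is usable only if it exists and holds at least one entry.
        if not os.path.isdir(folder) or not os.listdir(folder):
            missing.append(channel)
    return missing
```

Calling `missing_data_folders()` from the notebook's working directory returns the names of folders that still need data; an empty list means all three are populated.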



### The following bucket and directories will be used to store data and source code

Change the values of **s3_bucket** and **s3_prefix** to reflect your bucket and prefix names.

In [4]:
s3_bucket = '<bucket name>'
s3_prefix = '<prefix name>'

# S3 location for the training data
traindata_s3_prefix = '{}/data/train'.format(s3_prefix)

# S3 location for the validation data
validationdata_s3_prefix = '{}/data/validation'.format(s3_prefix)

# S3 location for the test data
testdata_s3_prefix = '{}/data/test'.format(s3_prefix)

# S3 location where model will be saved
output_s3 = 's3://{}/{}/models/'.format(s3_bucket, s3_prefix)

# S3 location for the training scripts
code_location_s3 = 's3://{}/{}/codes'.format(s3_bucket, s3_prefix)
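
The locations above are built with plain string formatting, so a prefix typed with a leading or trailing slash would produce a double slash in the key. As a hedged alternative, the sketch below joins the parts while trimming stray slashes; `s3_uri` is a hypothetical helper introduced here for illustration.

```python
def s3_uri(bucket, *parts):
    """Join a bucket and key parts into an s3:// URI without duplicate slashes."""
    # Drop empty parts and trim slashes so "proj/" + "/data" does not yield "//".
    cleaned = [p.strip("/") for p in parts if p and p.strip("/")]
    return "s3://" + "/".join([bucket.strip("/")] + cleaned)
```

For example, `s3_uri(s3_bucket, s3_prefix, "models")` would give the same location as `output_s3`, minus the trailing slash.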

## Upload data to S3

In [None]:
local_data_path = 'data'

# Upload everything under the local data folder to s3://<bucket>/<prefix>/data
inputs = sagemaker_session.upload_data(path=local_data_path,
                                       bucket=s3_bucket,
                                       key_prefix='{}/data'.format(s3_prefix))

print(inputs)

Next, we need to set up the data channels that the training job will use later. Modify the code below to reflect the channels you have. Note that you cannot have a data channel without data in it. For example, if you don't have a test channel, modify the **data_channel** assignment line below to remove **"test": s3_input_test**.

In [6]:
s3_input_train = "s3://" + s3_bucket + "/" + traindata_s3_prefix
s3_input_validation = "s3://" + s3_bucket + "/"  + validationdata_s3_prefix
s3_input_test = "s3://" + s3_bucket + "/"  + testdata_s3_prefix

data_channel = {"train":s3_input_train, "validation": s3_input_validation, "test": s3_input_test}
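
Because an empty channel is not allowed, one option is to build the dictionary only from folders that contain data. This is a minimal sketch, assuming the local **data** folder mirrors what was uploaded to S3; `build_data_channels` is a hypothetical helper, not a SageMaker API.

```python
import os

def build_data_channels(bucket, prefix, local_root="data",
                        channels=("train", "validation", "test")):
    """Map channel name -> S3 URI, keeping only channels that have local data."""
    data_channel = {}
    for name in channels:
        local_dir = os.path.join(local_root, name)
        # Skip channels whose local folder is missing or empty.
        if os.path.isdir(local_dir) and os.listdir(local_dir):
            data_channel[name] = "s3://{}/{}/data/{}".format(bucket, prefix, name)
    return data_channel
```

With this approach, `data_channel = build_data_channels(s3_bucket, s3_prefix)` would replace the hand-edited assignment. Inside the training container, SageMaker makes each channel available under `/opt/ml/input/data/<channel name>`.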


## Configure hyperparameters

Change the hyperparameters below to reflect your own. The **epochs** hyperparameter is only an example. If you don't have any hyperparameters, remove the cell below.

In [11]:
hyperparameters = {"epochs": 1}
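
On the container side, SageMaker writes the hyperparameters of a training job to `/opt/ml/input/config/hyperparameters.json`, with every value serialized as a string. The sketch below shows one way a training script in a custom container might read them; the helper names `load_hyperparameters` and `get_int` are hypothetical.

```python
import json

def load_hyperparameters(path="/opt/ml/input/config/hyperparameters.json"):
    """Load the hyperparameters file; values arrive as strings."""
    with open(path) as f:
        return json.load(f)

def get_int(params, name, default):
    """Read an integer hyperparameter, falling back to a default if absent."""
    return int(params.get(name, default))
```

In a training script this might be used as `epochs = get_int(load_hyperparameters(), "epochs", 1)`.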

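## Get the Docker image URI
Next, we will build the URI of the Docker image in ECR. Set **image_tag** to the name of your image. Note that, despite its name, **image_tag** holds the image (repository) name; the tag itself is fixed to `latest` in the code below.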

In [None]:
image_tag = "<name of the image>"
account = sagemaker_session.boto_session.client('sts').get_caller_identity()['Account']
region = sagemaker_session.boto_session.region_name
image = '{}.dkr.ecr.{}.amazonaws.com/{}:latest'.format(account, region, image_tag)
print(image)
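
The cell above always points at the `latest` tag. If you push versioned images, a small helper that takes the tag as a parameter may be handy. This is a sketch under that assumption; `ecr_image_uri` is a hypothetical function name, and the URI pattern is the one used in the cell above.

```python
def ecr_image_uri(account, region, repository, tag="latest"):
    """Build <account>.dkr.ecr.<region>.amazonaws.com/<repository>:<tag>."""
    return "{}.dkr.ecr.{}.amazonaws.com/{}:{}".format(account, region, repository, tag)
```

For example, `ecr_image_uri(account, region, image_tag)` reproduces the value of `image`, while passing `tag="v1"` selects a specific version.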

## Start training job

Now that the data is uploaded to the S3 bucket, the hyperparameter values are set, and we have the URI of the training image in ECR, we are ready to set up the SageMaker Estimator and start the training job.

In [None]:
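# train_instance_count / train_instance_type are the SageMaker Python SDK v1 names;
# SDK v2 renames them to instance_count / instance_type.
sk_estimator = sagemaker.estimator.Estimator(image,
                       role=role,
                       train_instance_count=1,
                       train_instance_type='ml.c4.2xlarge',
                       output_path=output_s3,
                       # Pass the hyperparameters defined earlier; remove this
                       # argument if you deleted the hyperparameters cell.
                       hyperparameters=hyperparameters,
                       sagemaker_session=sagemaker_session)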

sk_estimator.fit(data_channel)
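
When the job finishes, SageMaker packages whatever the container wrote to `/opt/ml/model` and stores it under the output path, in a folder named after the training job. The sketch below computes where the archive is expected to land; `expected_model_artifact` is a hypothetical helper, and the job name must be supplied by you.

```python
def expected_model_artifact(output_path, job_name):
    """Return the expected S3 URI of the model archive for a training job."""
    # Trim a trailing slash so the joined URI has no double slash.
    return "{}/{}/output/model.tar.gz".format(output_path.rstrip("/"), job_name)
```

After `fit` completes, the estimator's `model_data` attribute should report the actual location, which you can compare against this value.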