## SageMaker Training Job 

### Please go through this notebook only after you have finished Parts 1 to 4 of the tutorial.

---
#### Step 1: Import packages, get IAM role, get the region and set the S3 bucket.

In [1]:
import os
import boto3
import re
import copy
import time
from time import gmtime, strftime
from sagemaker import get_execution_role

role = get_execution_role()

region = boto3.Session().region_name

bucket = 'keras-sagemaker-train'  # Replace with your S3 bucket name
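
A mistyped bucket name only surfaces later, when the training job tries to read from or write to S3. The sketch below is a partial, local check of the basic S3 bucket naming rules (3 to 63 characters; lowercase letters, digits, dots and hyphens; starts and ends with a letter or digit; no adjacent dots). The helper name `looks_like_valid_bucket_name` is hypothetical. The check does not confirm that the bucket exists or that you can access it.

```python
import re

# Partial check of the S3 bucket naming rules:
# 3-63 characters, lowercase letters, digits, dots and hyphens only,
# starting and ending with a letter or digit.
_BUCKET_RE = re.compile(r"[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]")


def looks_like_valid_bucket_name(name):
    """Return True if `name` passes the basic naming rules above."""
    # fullmatch() requires the whole string to match the pattern;
    # adjacent dots are rejected separately.
    return bool(_BUCKET_RE.fullmatch(name)) and ".." not in name


print(looks_like_valid_bucket_name("keras-sagemaker-train"))
```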

---
#### Step 2: Create the algorithm image and push to Amazon ECR.

In [2]:
%%sh

# The name of our algorithm
algorithm_name=keras-sagemaker-train

chmod +x src/*

account=$(aws sts get-caller-identity --query Account --output text)

# Get the region defined in the current configuration (default to us-west-2 if none defined)
region=$(aws configure get region)
region=${region:-us-west-2}

fullname="${account}.dkr.ecr.${region}.amazonaws.com/${algorithm_name}:latest"

# If the repository doesn't exist in ECR, create it.

aws ecr describe-repositories --repository-names "${algorithm_name}" > /dev/null 2>&1

if [ $? -ne 0 ]
then
    aws ecr create-repository --repository-name "${algorithm_name}" > /dev/null
fi

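# Get the login command from ECR and execute it directly.
# Note: `aws ecr get-login` exists only in AWS CLI v1. With AWS CLI v2, use instead:
#   aws ecr get-login-password --region ${region} | \
#     docker login --username AWS --password-stdin "${account}.dkr.ecr.${region}.amazonaws.com"
$(aws ecr get-login --region ${region} --no-include-email)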

# Build the docker image locally with the image name and then push it to ECR
# with the full name.

# On a SageMaker Notebook Instance, the docker daemon may need to be restarted in order
# to detect your network configuration correctly.  (This is a known issue.)
if [ -d "/home/ec2-user/SageMaker" ]; then
  sudo service docker restart
fi

# Build the CPU image. Comment out the line below if you want the GPU image instead.
docker build  -t ${algorithm_name} -f Dockerfile.cpu .

# Uncomment the line below to build the GPU image.
#docker build  -t ${algorithm_name} -f Dockerfile.gpu .

docker tag ${algorithm_name} ${fullname}

docker push ${fullname}

Login Succeeded
Stopping docker: [  OK  ]
Starting docker:	.[  OK  ]
Sending build context to Docker daemon  11.47MB
Step 1/6 : FROM phenompeople/centos-python:3.6.3
3.6.3: Pulling from phenompeople/centos-python
7dc0dca2b151: Pulling fs layer
4e13d20dd920: Pulling fs layer
4e03286a7322: Pulling fs layer
1460c005753b: Pulling fs layer
b6d4fd0c5aa4: Pulling fs layer
b291b7062d5c: Pulling fs layer
1460c005753b: Waiting
b6d4fd0c5aa4: Waiting
b291b7062d5c: Waiting
4e03286a7322: Verifying Checksum
4e03286a7322: Download complete
4e13d20dd920: Verifying Checksum
4e13d20dd920: Download complete
7dc0dca2b151: Verifying Checksum
7dc0dca2b151: Download complete
1460c005753b: Verifying Checksum
1460c005753b: Download complete
b291b7062d5c: Verifying Checksum
b291b7062d5c: Download complete
b6d4fd0c5aa4: Verifying Checksum
b6d4fd0c5aa4: Download complete
7dc0dca2b151: Pull complete
4e13d20dd920: Pull complete
4e03286a7322: Pull complete
1460c005753b: Pull complete
b6d4fd0c5aa4: Pull complete
b

https://docs.docker.com/engine/reference/commandline/login/#credentials-store
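
The shell cell in this step checks whether the ECR repository exists and creates it if the lookup fails. A rough Python equivalent of that check is sketched below, assuming a boto3 ECR client is passed in. The function name `ensure_repository` is hypothetical. Unlike the shell version, which treats any non-zero exit code as "missing", this sketch creates the repository only when ECR reports that it was not found.

```python
def ensure_repository(ecr_client, repository_name):
    """Create the ECR repository if it does not exist yet.

    Returns True if a repository was created, False if it already existed.
    `ecr_client` is assumed to be a boto3 ECR client, i.e. boto3.client("ecr").
    """
    try:
        ecr_client.describe_repositories(repositoryNames=[repository_name])
        return False
    except ecr_client.exceptions.RepositoryNotFoundException:
        # Only a "not found" error triggers creation; other errors
        # (for example missing permissions) propagate to the caller.
        ecr_client.create_repository(repositoryName=repository_name)
        return True


# Usage on a machine with AWS credentials (not executed here):
# import boto3
# created = ensure_repository(boto3.client("ecr"), "keras-sagemaker-train")
```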



---
#### Step 3: Define variables with data location and output location in S3 bucket.

In [3]:
data_location = 's3://{}/data'.format(bucket)
print("data location - " + data_location)

output_location = 's3://{}/output'.format(bucket)
print("output location - " + output_location)

data location - s3://keras-sagemaker-train/data
output location - s3://keras-sagemaker-train/output
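
SageMaker takes these locations as `s3://` URIs, while most boto3 S3 calls take a bucket name and a key separately. If you want to inspect the data or output prefix yourself, a small helper like the one below can split a URI into its two parts. This is a minimal sketch; the name `split_s3_uri` is hypothetical.

```python
from urllib.parse import urlparse


def split_s3_uri(uri):
    """Split 's3://bucket/key/prefix' into (bucket, key_prefix)."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError("not an S3 URI: {}".format(uri))
    # The bucket is the network location; the key is the path
    # without its leading slash.
    return parsed.netloc, parsed.path.lstrip("/")


print(split_s3_uri("s3://keras-sagemaker-train/data"))
# prints ('keras-sagemaker-train', 'data')
```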


---
#### Step 4: Create a SageMaker session.

In [4]:
import sagemaker as sage
sess = sage.Session()

---
#### Step 5: Define variables for account, region and algorithm image.

In [5]:
account = sess.boto_session.client('sts').get_caller_identity()['Account'] # aws account 
region = sess.boto_session.region_name # aws server region
image = '{}.dkr.ecr.{}.amazonaws.com/keras-sagemaker-train'.format(account, region) # algorithm image path in ECR
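
The image path in this step has no tag, so it relies on the default `latest` tag that Step 2 pushed. If you later push several versions, it can help to spell the tag out. The sketch below builds the same path with an explicit tag. The helper name `ecr_image_uri` and the account ID in the example are placeholders, and the `amazonaws.com` domain is an assumption that does not hold in every AWS partition (the China regions use a different domain).

```python
def ecr_image_uri(account, region, repository, tag="latest"):
    """Build the full ECR image path, including an explicit tag."""
    return "{}.dkr.ecr.{}.amazonaws.com/{}:{}".format(
        account, region, repository, tag
    )


# "123456789012" is a placeholder account ID, not a real account.
print(ecr_image_uri("123456789012", "us-west-2", "keras-sagemaker-train"))
```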

---
#### Step 6: Define hyperparameters to be passed to your algorithm. 
In this project the training code reads two hyperparameters. Using hyperparameters is optional.

In [6]:
hyperparameters = {"batch_size":128, "epochs":30}
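
Inside a custom training container, SageMaker makes the hyperparameters available as a JSON file at `/opt/ml/input/config/hyperparameters.json`, and every value arrives as a string. The sketch below shows one way a training script might read and convert them. It is not necessarily how this tutorial's training code does it; the function name `load_hyperparameters` and the fallback defaults are assumptions for illustration.

```python
import json

# Path where SageMaker places the hyperparameters inside the container.
HYPERPARAMETERS_PATH = "/opt/ml/input/config/hyperparameters.json"


def load_hyperparameters(path=HYPERPARAMETERS_PATH):
    """Read the hyperparameters file and convert the values to integers."""
    with open(path) as f:
        raw = json.load(f)
    # Values arrive as strings (for example "128"), so cast them explicitly.
    # The fallback defaults below are illustrative assumptions.
    return {
        "batch_size": int(raw.get("batch_size", 128)),
        "epochs": int(raw.get("epochs", 30)),
    }
```

Calling `load_hyperparameters()` with no argument only works inside the training container, where the file exists. For a local test, pass the path of a JSON file you wrote yourself.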

---
#### Step 7: Create the training job using SageMaker Estimator.

In [7]:
classifier = sage.estimator.Estimator(image_name=image, 
                                      role=role,
                                      train_instance_count=1, 
                                      train_instance_type='ml.c5.2xlarge',
                                      hyperparameters=hyperparameters,
                                      output_path=output_location,
                                      sagemaker_session=sess)
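
The keyword arguments in this step follow version 1 of the SageMaker Python SDK. Version 2 of the SDK renamed several of them: `image_name` became `image_uri`, `train_instance_count` became `instance_count`, and `train_instance_type` became `instance_type`. If you run this notebook with SDK v2, a small translation like the sketch below may help. The helper name `to_v2_estimator_kwargs` is hypothetical, and it covers only the three renamed arguments used in this notebook.

```python
# Renamed Estimator arguments (SDK v1 name -> SDK v2 name).
# Only the arguments used in this notebook are listed.
_V1_TO_V2 = {
    "image_name": "image_uri",
    "train_instance_count": "instance_count",
    "train_instance_type": "instance_type",
}


def to_v2_estimator_kwargs(v1_kwargs):
    """Return a copy of `v1_kwargs` with the renamed keys translated."""
    return {_V1_TO_V2.get(key, key): value for key, value in v1_kwargs.items()}


v2_kwargs = to_v2_estimator_kwargs({
    "image_name": "<your-image-path>",  # placeholder
    "train_instance_count": 1,
    "train_instance_type": "ml.c5.2xlarge",
})
print(sorted(v2_kwargs))
```

With SDK v2 installed, the result could be passed on as `sage.estimator.Estimator(role=role, **v2_kwargs, ...)` together with the remaining arguments from this step, which keep their names.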

---
#### Step 8: Run the training job by passing the data location.

In [8]:
classifier.fit(data_location)

2019-06-14 09:15:48 Starting - Starting the training job...
2019-06-14 09:15:51 Starting - Launching requested ML instances......
2019-06-14 09:17:02 Starting - Preparing the instances for training...
2019-06-14 09:17:43 Downloading - Downloading input data
2019-06-14 09:17:43 Training - Downloading the training image....
2019-06-14 09:18:20.768618: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 AVX512F FMA
2019-06-14 09:18:20.799047: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 3000000000 Hz
2019-06-14 09:18:20.800177: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x4826440 executing computations on platform Host. Devices:
2019-06-14 09:18:20.800196: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (0): <undefined>, <undefined>
Using TensorFlow backend.
Instructions for upda


2019-06-14 09:18:46 Uploading - Uploading generated training model
2019-06-14 09:18:46 Completed - Training job completed
Epoch 25/30

 128/8000 [..............................] - ETA: 0s - loss: 0.2170 - acc: 0.9375
1152/8000 [===>..........................] - ETA: 0s - loss: 0.1464 - acc: 0.9583
Epoch 26/30

 128/8000 [..............................] - ETA: 0s - loss: 0.0881 - acc: 0.9688
1280/8000 [===>..........................] - ETA: 0s - loss: 0.1911 - acc: 0.9398
Epoch 27/30

 128/8000 [..............................] - ETA: 0s - loss: 0.1441 - acc: 0.9531
1152/8000 [===>..........................] - ETA: 0s - loss: 0.1517 - acc: 0.9540
Epoch 28/30

 128/8000 [..............................] - ETA: 0s - loss: 0.0929 - acc: 0.9766
1152/8000 [===>..........................] - ETA: 0s - loss: 0.1259 - acc: 0.9661
Epoch 29/30

 128/8000 [..............................] - ETA: 0s - loss: 0.1427 - acc: 0.96
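
Once the job completes, SageMaker uploads the trained model to `<output_path>/<job-name>/output/model.tar.gz`. Step 1 imports `gmtime` and `strftime` but never uses them. One common use is building a readable, timestamped job name that can be passed to `fit` through its `job_name` argument. The sketch below shows both ideas. The helper names `make_job_name` and `model_artifact_uri` are hypothetical.

```python
from time import gmtime, strftime


def make_job_name(prefix, now=None):
    """Build a job name such as 'keras-sagemaker-train-2019-06-14-09-15-48'.

    `now` is an optional time.struct_time; it defaults to the current UTC time.
    """
    stamp = strftime("%Y-%m-%d-%H-%M-%S", now if now is not None else gmtime())
    return "{}-{}".format(prefix, stamp)


def model_artifact_uri(output_location, job_name):
    """Return the S3 location where SageMaker stores the trained model."""
    return "{}/{}/output/model.tar.gz".format(output_location.rstrip("/"), job_name)


job_name = make_job_name("keras-sagemaker-train")
print(model_artifact_uri("s3://keras-sagemaker-train/output", job_name))
```

With a name built this way, the call in Step 8 could become `classifier.fit(data_location, job_name=job_name)`. After a successful run, `classifier.model_data` should report the same artifact location.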

## Congratulations! The training job ran successfully in Amazon SageMaker.
#### Please return to the tutorial for Part 6, where we will run a training job on a GPU.