## SageMaker Training Job 

### Please go through this notebook only if you have finished Part 1 to Part 4 of the tutorial.

---
#### Step 1: Upload the training data set to the S3 bucket.

In [None]:
%%sh
# Do not execute this in the demo
pwd
cd local_test/test_dir/input/data/training/
ls -ltr
aws s3 cp data_set s3://eu.com.syngenta-datascience-model-training/Preetam.Balijepalli@syngenta.com/keras-sagemaker/train/data/data_set

---
#### Step 2: Create the algorithm image and push to Amazon ECR.

In [None]:
%%sh
# Do not execute this in the demo

chmod +x create_container.sh 

./create_container.sh keras-sagemaker

---
#### Step 3: Define variables with data location and output location in S3 bucket.

In [1]:
schema = 's3://'
bucket = 'eu.com.syngenta-datascience-model-training'

user = 'Preetam.Balijepalli@syngenta.com'
experiment = 'keras-sagemaker'

# S3 prefix that holds the training data
data_location = f'{schema}{bucket}/{user}/{experiment}/train/data'
print("data location - " + data_location)

# S3 prefix where SageMaker writes the training output
output_location = f'{schema}{bucket}/{user}/{experiment}/output'
print("output location - " + output_location)

data location - s3://eu.com.syngenta-datascience-model-training/Preetam.Balijepalli@syngenta.com/keras-sagemaker/train/data
output location - s3://eu.com.syngenta-datascience-model-training/Preetam.Balijepalli@syngenta.com/keras-sagemaker/output
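
A malformed prefix (for example a single slash after `s3:`) only surfaces later, when the training job fails to find its data. As a minimal sketch, the check below splits a location into bucket and key prefix using only the standard library. The helper name `split_s3_uri` is hypothetical and not part of the tutorial code.

```python
from urllib.parse import urlparse


def split_s3_uri(uri):
    """Split an s3:// URI into (bucket, key_prefix); hypothetical helper."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an S3 URI: {uri!r}")
    # The parsed path keeps its leading slash; S3 keys do not start with one.
    return parsed.netloc, parsed.path.lstrip("/")


data_location = (
    "s3://eu.com.syngenta-datascience-model-training/"
    "Preetam.Balijepalli@syngenta.com/keras-sagemaker/train/data"
)
bucket_name, key_prefix = split_s3_uri(data_location)
print(bucket_name)
print(key_prefix)
```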


---
#### Step 4: Create a SageMaker session.

In [2]:
import sagemaker as sage

sagemaker_session = sage.Session()

# The call below requires the iam:GetRole permission, so it is left commented out;
# the role ARN is set explicitly in Step 7 instead.
#role = sage.get_execution_role()
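
One possible way to keep both options is to try the role lookup first and fall back to an explicit ARN when it fails. This is only a sketch: `resolve_role` is a hypothetical helper, and catching every exception is a simplification chosen for illustration.

```python
def resolve_role(get_role, fallback_arn):
    """Return get_role() if it succeeds, otherwise fallback_arn (hypothetical helper)."""
    try:
        return get_role()
    except Exception as error:  # for example, a missing iam:GetRole permission
        print(f"Role lookup failed ({error}); using the fallback ARN.")
        return fallback_arn


# In the notebook this could be called as:
#   role = resolve_role(sage.get_execution_role,
#                       'arn:aws:iam::170605107178:role/SYN-Datascience-SageMaker-Role')
```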

---
#### Step 5: Define variables for account, region and algorithm image.

In [3]:
account = sagemaker_session.boto_session.client('sts').get_caller_identity()['Account']  # AWS account ID
region = sagemaker_session.boto_session.region_name  # AWS region of the session
tag = 'gpu'
image = '{}.dkr.ecr.{}.amazonaws.com/datascience-model-training:{}'.format(account, region, tag)  # algorithm image URI in ECR
#keras-sagemaker-train
print(account)
print(region)
print(image)

170605107178
eu-central-1
170605107178.dkr.ecr.eu-central-1.amazonaws.com/datascience-model-training:gpu
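
If the same notebook is reused for several image tags, the URI format above can be wrapped in a small function. The sketch below assumes a hypothetical helper name, `ecr_image_uri`, and adds a basic check that the account ID is a 12-digit string.

```python
def ecr_image_uri(account, region, repository, tag):
    """Build an ECR image URI from its parts (hypothetical helper)."""
    account = str(account)
    if len(account) != 12 or not account.isdigit():
        raise ValueError(f"AWS account IDs are 12 digits, got {account!r}")
    return f"{account}.dkr.ecr.{region}.amazonaws.com/{repository}:{tag}"


gpu_image = ecr_image_uri("170605107178", "eu-central-1",
                          "datascience-model-training", "gpu")
print(gpu_image)
```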


---
#### Step 6: Define hyperparameters to be passed to your algorithm. 
In this project the training code reads two hyperparameters. Using hyperparameters is optional.

In [4]:
hyperparameters = {"batch_size":128, "epochs":30}
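
Inside the training container, SageMaker exposes these values in `/opt/ml/input/config/hyperparameters.json`, with every value serialized as a string. The sketch below shows one way the training script might read them. The function name and the default values are assumptions for illustration, since the container code is not shown in this notebook.

```python
import json
import os
import tempfile

# Assumed defaults, used when a hyperparameter is not supplied.
DEFAULTS = {"batch_size": 128, "epochs": 30}


def load_hyperparameters(path="/opt/ml/input/config/hyperparameters.json"):
    """Read hyperparameters and cast them back to int (hypothetical helper)."""
    params = dict(DEFAULTS)
    if os.path.exists(path):
        with open(path) as handle:
            raw = json.load(handle)
        for name in DEFAULTS:
            if name in raw:
                # Values arrive as strings such as "128".
                params[name] = int(raw[name])
    return params


# Local demonstration with a temporary file instead of the container path.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "hyperparameters.json")
    with open(demo_path, "w") as handle:
        json.dump({"batch_size": "128", "epochs": "30"}, handle)
    demo_params = load_hyperparameters(demo_path)
print(demo_params)
```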

---
#### Step 7: Create the training job using SageMaker Estimator.

In [5]:
role = 'arn:aws:iam::170605107178:role/SYN-Datascience-SageMaker-Role'

subnets_config = ['subnet-0bdd33f41f946b22a', 'subnet-0c7c8959343746db7']

security_groups_config = [ "sg-1ad4ea70",
              "sg-99d781f1",
              "sg-dfa1f7b7"]

# Instance count and type used for training. Setting the type to 'local' runs the container locally and does not trigger a job on SageMaker.
instance_count = 1
instance_type = 'ml.p3.2xlarge'

# Pricing references:
# https://aws.amazon.com/sagemaker/pricing/
# https://aws.amazon.com/sagemaker/pricing/instance-types/
#
# ml.p3.2xlarge  Accelerated Computing - Current Generation  8 CPU cores  61 GiB RAM  1x V100 GPU (16 GiB GPU memory)
# ml.c5.2xlarge  Compute Instances - Current Generation      8 CPU cores  16 GiB RAM  $0.543
# ml.m4.xlarge   Compute Instances - Standard Generation     4 CPU cores  16 GiB RAM  $0.336


# Argument names follow version 1 of the SageMaker Python SDK.
classifier = sage.estimator.Estimator(image_name=image,
                                      role=role,
                                      train_instance_count=instance_count,
                                      train_instance_type=instance_type,
                                      hyperparameters=hyperparameters,
                                      output_path=output_location,
                                      subnets=subnets_config,
                                      security_group_ids=security_groups_config,
                                      sagemaker_session=sagemaker_session)
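
The `Estimator` call above uses the argument names from version 1 of the SageMaker Python SDK. Version 2 renamed three of them: `image_name` became `image_uri`, `train_instance_count` became `instance_count`, and `train_instance_type` became `instance_type`. The sketch below translates a dictionary of keyword arguments; the helper `to_sdk_v2_kwargs` is hypothetical and covers only the arguments used in this notebook.

```python
# Renames between SDK v1 and v2 for the arguments used in this notebook.
V1_TO_V2 = {
    "image_name": "image_uri",
    "train_instance_count": "instance_count",
    "train_instance_type": "instance_type",
}


def to_sdk_v2_kwargs(v1_kwargs):
    """Rename v1 Estimator keyword arguments to their v2 names (hypothetical helper)."""
    return {V1_TO_V2.get(name, name): value for name, value in v1_kwargs.items()}


v1_kwargs = {
    "image_name": "170605107178.dkr.ecr.eu-central-1.amazonaws.com/datascience-model-training:gpu",
    "role": "arn:aws:iam::170605107178:role/SYN-Datascience-SageMaker-Role",
    "train_instance_count": 1,
    "train_instance_type": "ml.p3.2xlarge",
}
v2_kwargs = to_sdk_v2_kwargs(v1_kwargs)
print(sorted(v2_kwargs))
```

With SDK v2 installed, the result could be passed as `sage.estimator.Estimator(**v2_kwargs, ...)` together with the unchanged arguments.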

---
#### Step 8: Run the training job by passing the data location.

In [6]:
from datetime import datetime

job_prefix_name = 'datascience-model-training'
training_job_name = job_prefix_name + '-'  + datetime.now().strftime("%Y-%m-%d-%H-%M-%S")

print(training_job_name)

datascience-model-training-2020-04-17-07-48-53
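
Training job names are limited to 63 characters and may contain only letters, digits and hyphens, starting and ending with a letter or digit. A minimal sketch that builds the name and checks it before calling `fit` could look like this. The helper `make_job_name` is hypothetical, and it uses UTC rather than local time so that names sort consistently across machines.

```python
import re
from datetime import datetime, timezone

# Pattern documented for TrainingJobName in the CreateTrainingJob API.
JOB_NAME_PATTERN = re.compile(r"^[a-zA-Z0-9](-*[a-zA-Z0-9]){0,62}$")


def make_job_name(prefix, now=None):
    """Build '<prefix>-<timestamp>' and validate it (hypothetical helper)."""
    now = now or datetime.now(timezone.utc)
    name = f"{prefix}-{now.strftime('%Y-%m-%d-%H-%M-%S')}"
    if len(name) > 63:
        raise ValueError(f"job name longer than 63 characters: {name!r}")
    if not JOB_NAME_PATTERN.match(name):
        raise ValueError(f"job name has invalid characters: {name!r}")
    return name


example_name = make_job_name("datascience-model-training",
                             datetime(2020, 4, 17, 7, 48, 53))
print(example_name)
```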


In [7]:
classifier.fit(inputs=data_location, 
              logs=True, 
              job_name=training_job_name)

2020-04-17 07:48:56 Starting - Starting the training job...
2020-04-17 07:48:58 Starting - Launching requested ML instances...
2020-04-17 07:49:56 Starting - Preparing the instances for training......
2020-04-17 07:51:01 Downloading - Downloading input data
2020-04-17 07:51:01 Training - Downloading the training image............
2020-04-17 07:52:54 Training - Training image download completed. Training in progress.
2020-04-17 07:52:55.776323: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libnvinfer.so.6
2020-04-17 07:52:55.819870: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libnvinfer_plugin.so.6
2020-04-17 07:52:57.279365: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2020-04-17 07:52:57.314500: I tensorflow/core/platform/profile_utils/cpu_utils.c


2020-04-17 07:53:25 Uploading - Uploading generated training model
2020-04-17 07:53:25 Completed - Training job completed
Training seconds: 160
Billable seconds: 160
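
Once the job completes, SageMaker stores the model archive under the output location, in a folder named after the training job. As a sketch, the expected artifact URI can be derived as follows; the helper name `model_artifact_uri` is hypothetical.

```python
def model_artifact_uri(output_path, job_name):
    """Return the S3 URI where SageMaker places model.tar.gz (hypothetical helper)."""
    return f"{output_path.rstrip('/')}/{job_name}/output/model.tar.gz"


output_location = (
    "s3://eu.com.syngenta-datascience-model-training/"
    "Preetam.Balijepalli@syngenta.com/keras-sagemaker/output"
)
artifact = model_artifact_uri(output_location,
                              "datascience-model-training-2020-04-17-07-48-53")
print(artifact)
```

The same value should also be available from the estimator itself as `classifier.model_data` after `fit` returns.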


## Congratulations! You have successfully run a training job in Amazon SageMaker.
#### Please return to the tutorial for Part 6, where we will run a training job on a GPU.