## Model training on AWS SageMaker

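This walkthrough follows the [PyTorch LSTM word language model example notebook](https://github.com/aws/amazon-sagemaker-examples/blob/master/sagemaker-python-sdk/pytorch_lstm_word_language_model/pytorch_rnn.ipynb) from the `amazon-sagemaker-examples` repository.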

In [10]:
import sagemaker
import boto3
import os
os.chdir('..')

In [2]:
sagemaker_session = sagemaker.Session()

In [4]:
bucket = sagemaker_session.default_bucket()
print(f"Bucket Name: {bucket}")

Bucket Name: sagemaker-us-east-1-594409465357


In [7]:
try:
    role = sagemaker.get_execution_role()
except ValueError:
    iam = boto3.client('iam')
    role = iam.get_role(RoleName = 'mom-aws')['Role']['Arn']
print(f"Execution Role: {role}")

Couldn't call 'get_role' to get Role ARN from role name shaunkhoo to get Role path.


Execution Role: arn:aws:iam::594409465357:role/mom-aws


In [15]:
prefix = "sagemaker/ssoc-autocoder"  # S3 key prefix; matches the upload location printed below
inputs = sagemaker_session.upload_data(path = "Data/Processed/Training/train-aws",
                                       bucket = bucket,
                                       key_prefix = prefix)

In [16]:
print(f"Inputs stored in: {inputs}")

Inputs stored in: s3://sagemaker-us-east-1-594409465357/sagemaker/ssoc-autocoder
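
The URI returned by `upload_data` combines the bucket name and the key prefix. When a later step needs them separately (for example, to list the uploaded objects with `boto3`), the URI can be split with the standard library. This is a minimal sketch; `split_s3_uri` is a hypothetical helper, not part of the SageMaker SDK.

```python
from urllib.parse import urlparse


def split_s3_uri(uri):
    """Split an s3://bucket/key URI into (bucket, key_prefix)."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"Not an S3 URI: {uri!r}")
    # The path component starts with a slash that is not part of the key.
    return parsed.netloc, parsed.path.lstrip("/")


# URI copied from the output above.
bucket_name, key_prefix = split_s3_uri(
    "s3://sagemaker-us-east-1-594409465357/sagemaker/ssoc-autocoder"
)
print(bucket_name)
print(key_prefix)
```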


In [22]:
env = {
    'SAGEMAKER_REQUIREMENTS': '../Notebooks/requirements.txt', # path relative to `source_dir` below.
}
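
Only the contents of `source_dir` are packaged and uploaded for the training job, so a requirements path that resolves outside that directory may not exist inside the training container. The check below is a rough sketch, assuming `source_dir` is `ssoc_autocoder` as in the estimator cell; `resolves_inside` is a hypothetical helper.

```python
import posixpath


def resolves_inside(source_dir, relative_path):
    """Return True if relative_path, taken from source_dir, stays inside it."""
    root = posixpath.normpath(source_dir)
    resolved = posixpath.normpath(posixpath.join(root, relative_path))
    return resolved == root or resolved.startswith(root + "/")


# Path from the env cell above: it climbs out of source_dir.
outside = resolves_inside("ssoc_autocoder", "../Notebooks/requirements.txt")
# A requirements file placed directly in source_dir stays inside it.
inside = resolves_inside("ssoc_autocoder", "requirements.txt")
print(outside, inside)
```

If the check fails, one option is to place `requirements.txt` directly inside `source_dir`; the SageMaker PyTorch containers are documented to install a requirements file found next to the training script.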

In [46]:
from sagemaker.pytorch import PyTorch

estimator = PyTorch(
    entry_point = "train_aws.py",
    role = role,
    framework_version = "1.8.1",
    instance_count = 1,
    instance_type = "ml.p2.xlarge",
    source_dir = "ssoc_autocoder",
    py_version = "py3",
    env = env,
    # hyperparameter names carried over from the referenced example notebook: emsize, nhid,
    # nlayers, lr, clip, epochs, batch_size, bptt, dropout, tied, seed, log_interval
    hyperparameters = {"epochs": 1, "tied": True},
)
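
In script mode, SageMaker passes each entry in `hyperparameters` to the entry point as a command-line argument (here, roughly `--epochs 1 --tied True`) and exposes the data channel and model output locations through environment variables such as `SM_CHANNEL_TRAINING` and `SM_MODEL_DIR`. The contents of `train_aws.py` are not shown in this notebook, so the following is only a sketch of how such a script might read them; `parse_args` and `str_to_bool` are hypothetical helpers, and the fallback directories are placeholders for running locally.

```python
import argparse
import os


def str_to_bool(value):
    """Hyperparameters arrive as strings, so 'True' must be converted explicitly."""
    return str(value).strip().lower() in ("true", "1", "yes")


def parse_args(argv=None):
    parser = argparse.ArgumentParser()
    # Hyperparameters set on the estimator.
    parser.add_argument("--epochs", type=int, default=1)
    parser.add_argument("--tied", type=str_to_bool, default=False)
    # Locations provided by the SageMaker container; the defaults are local placeholders.
    parser.add_argument("--model-dir", default=os.environ.get("SM_MODEL_DIR", "model"))
    parser.add_argument("--train-dir", default=os.environ.get("SM_CHANNEL_TRAINING", "data"))
    return parser.parse_args(argv)


# Simulate the arguments generated from {"epochs": 1, "tied": True}.
args = parse_args(["--epochs", "1", "--tied", "True"])
print(args.epochs, args.tied)
```

The channel name `training` used in `estimator.fit({"training": inputs})` is what determines the `SM_CHANNEL_TRAINING` variable name.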

In [47]:
estimator.fit({"training": inputs})

2021-10-11 03:28:39 Starting - Starting the training job...
2021-10-11 03:29:02 Starting - Launching requested ML instancesProfilerReport-1633922942: InProgress
......
2021-10-11 03:30:03 Starting - Preparing the instances for training......
2021-10-11 03:31:27 Downloading - Downloading input data...
2021-10-11 03:32:03 Training - Downloading the training image.......................
bash: cannot set terminal process group (-1): Inappropriate ioctl for device
bash: no job control in this shell
2021-10-11 03:36:16,415 sagemaker-training-toolkit INFO     Imported framework sagemaker_pytorch_container.training
2021-10-11 03:36:16,440 sagemaker_pytorch_container.training INFO     Block until all host DNS lookups succeed.
2021-10-11 03:36:16,452 sagemaker_pytorch_container.training INFO     Invoking user training script.
2021-10-11 03:36:16,960 sagemaker-training-toolkit INFO     Invoking user script

Training Env:

{
    "a

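The status lines at the top of a training log carry a timestamp and a phase name (`Starting`, `Downloading`, `Training`), which makes it possible to estimate how long the job spent provisioning before the training image began to download. The sketch below parses a few status lines copied from the run above; `phase_starts` is a hypothetical helper, and the regular expression assumes the `YYYY-MM-DD HH:MM:SS Phase - message` layout seen in this output.

```python
import re
from datetime import datetime

# Matches lines such as "2021-10-11 03:28:39 Starting - Starting the training job..."
STATUS_LINE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) (\w+) - ")


def phase_starts(log_text):
    """Map each phase name to the timestamp of its first status line."""
    starts = {}
    for line in log_text.splitlines():
        match = STATUS_LINE.match(line)
        if match:
            stamp, phase = match.groups()
            starts.setdefault(phase, datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S"))
    return starts


log_text = """\
2021-10-11 03:28:39 Starting - Starting the training job...
2021-10-11 03:30:03 Starting - Preparing the instances for training......
2021-10-11 03:31:27 Downloading - Downloading input data...
2021-10-11 03:32:03 Training - Downloading the training image...
"""

starts = phase_starts(log_text)
startup_seconds = (starts["Training"] - starts["Starting"]).total_seconds()
print(f"Time before the Training phase: {startup_seconds:.0f} s")
```
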
In [43]:
estimator.fit({"training": inputs})

2021-10-11 02:32:39 Starting - Starting the training job...
2021-10-11 02:33:02 Starting - Launching requested ML instancesProfilerReport-1633919582: InProgress
......
2021-10-11 02:34:13 Starting - Preparing the instances for training.........
2021-10-11 02:36:03 Downloading - Downloading input data...
2021-10-11 02:36:23 Training - Downloading the training image...........................
2021-10-11 02:41:45 Training - Training image download completed. Training in progress.
bash: cannot set terminal process group (-1): Inappropriate ioctl for device
bash: no job control in this shell
2021-10-11 02:41:39,012 sagemaker-training-toolkit INFO     Imported framework sagemaker_pytorch_container.training
2021-10-11 02:41:39,040 sagemaker_pytorch_container.training INFO     Block until all host DNS lookups succeed.
2021-10-11 02:41:39,049 sagemaker_pytorch_container.training INFO     Invoking user training script.
2021-10-11 02:41:39,756 sage