# Word-level language modeling RNN

In [1]:
import os
import boto3
import sagemaker
from sagemaker.pytorch import PyTorch
from sagemaker import get_execution_role

sagemaker_session = sagemaker.Session()

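# Hard-coded role ARN; inside a SageMaker notebook instance, get_execution_role() can be used instead
role = 'arn:aws:iam::142577830533:role/SageMakerRole'  # get_execution_role()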

# Download training and test data
We use raw data from the wikitext-2 dataset:
https://www.salesforce.com/products/einstein/ai-research/the-wikitext-dependency-language-modeling-dataset/


In [2]:
# Locate the local wikitext-2 data directory (this cell only resolves paths; it does not download anything)
import os
if 'workbookDir' not in globals():
    workbookDir = os.getcwd()
print('workbookDir: ' + workbookDir)
data_dir = os.path.join(workbookDir, 'data', 'wikitext-2')
print('data_dir: ' + data_dir)


workbookDir: /workplace/nadzeya/sagemaker-pytorch-containers/notebooks/rnn
data_dir: /workplace/nadzeya/sagemaker-pytorch-containers/notebooks/rnn/data/wikitext-2


# Uploading the data
We use the sagemaker.Session.upload_data function to upload our datasets to an S3 location. The return value, inputs, identifies that location; we will use it later when we start the training job.



In [3]:
inputs = sagemaker_session.upload_data(path=data_dir, key_prefix='data/DEMO-pytorch-rnn')
print('input spec (in this case, just an S3 path): {}'.format(inputs))

input spec (in this case, just an S3 path): s3://sagemaker-us-west-2-142577830533/data/DEMO-pytorch-rnn
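Since the input spec here is just an S3 URI, it can be useful to split it into a bucket and a key prefix, for example to inspect the uploaded objects with other tooling. The following is a minimal sketch using only the standard library; the helper name split_s3_uri is hypothetical and not part of the SageMaker SDK.

```python
from urllib.parse import urlparse


def split_s3_uri(uri):
    """Split an s3://bucket/key URI into (bucket, key_prefix).

    Hypothetical helper for illustration; not a SageMaker SDK function.
    """
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError("not an S3 URI: {!r}".format(uri))
    # The bucket is the network location; the key prefix is the path
    # without its leading slash.
    return parsed.netloc, parsed.path.lstrip("/")


# The URI printed by the upload cell above.
bucket, prefix = split_s3_uri(
    "s3://sagemaker-us-west-2-142577830533/data/DEMO-pytorch-rnn"
)
print("bucket:", bucket)
print("prefix:", prefix)
```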


# Run the training script on SageMaker
The PyTorch class allows us to run our training function as a distributed training job on SageMaker infrastructure. We need to configure it with our training script, an IAM role, the number of training instances, and the training instance type. In this case we will run our training job on a single ml.p2.8xlarge instance.

In [6]:
estimator = PyTorch(entry_point="train.py",
                    role=role,
                    framework_version='0.4.0',
                    train_instance_count=1,
                    train_instance_type='ml.p2.8xlarge',
                    source_dir='source',
                    hyperparameters={'batch_size': 30, 'epochs': 50})

After we've constructed our PyTorch object, we can fit it using the data we uploaded to S3. SageMaker makes sure our data is available in the local filesystem, so our training script can simply read the data from disk.

In [7]:
estimator.fit({'wikitext-2': inputs})

INFO:sagemaker:Creating training-job with name: sagemaker-pytorch-2018-04-30-23-23-17-597


.....................
2018-04-30 23:28:39,397 INFO - root - running container entrypoint
2018-04-30 23:28:39,397 INFO - root - starting train task
2018-04-30 23:28:39,460 INFO - container_support.app - started training: {'train_fn': <function train at 0x7fd36fe9b488>}
Downloading s3://sagemaker-us-west-2-142577830533/sagemaker-pytorch-2018-04-30-23-23-17-597/source/sourcedir.tar.gz to /tmp/script.tar.gz
2018-04-30 23:28:39,600 INFO - botocore.vendored.requests.packages.urllib3.connectionpool - Starting new HTTP connection (1): 169.254.170.2
2018-04-30 23:28:39,687 INFO - botocore.vendored.requests.packages.urllib3.connectionpool - Starting new HTTPS connection (1): sagemaker-us-west-2-142577830533.s3.amazonaws.com
2018-04-30 23:28:39,723 INFO - botocore.vendored.requests.packages.urllib3.connectionpool - Starting new HTTPS connection (2): sagemaker-us-west-2-142577830533.s3.amazonaws.com
2018-04-30 23:28:39,740 INFO - 

===== Job Complete =====
Billable seconds: 806
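
Because the channel data ends up on the container's local filesystem, the training script needs nothing beyond ordinary file I/O to consume it. The sketch below shows one way a word-level vocabulary might be built from a corpus file in such a directory. The file name train.txt, the `<eos>` marker, and the helper build_vocab are assumptions made for illustration, and a temporary directory stands in for the channel directory so the example runs anywhere.

```python
import os
import tempfile


def build_vocab(data_dir, filename="train.txt"):
    """Map each distinct word in a corpus file to an integer id.

    Hypothetical helper; the file name is an assumption about the dataset layout.
    """
    word2idx = {}
    with open(os.path.join(data_dir, filename), encoding="utf-8") as f:
        for line in f:
            # Treat every line as a sentence and append an end-of-sentence marker.
            for word in line.split() + ["<eos>"]:
                # Assign the next free id the first time a word is seen.
                word2idx.setdefault(word, len(word2idx))
    return word2idx


# A temporary directory stands in for the channel directory on the training host.
with tempfile.TemporaryDirectory() as channel_dir:
    with open(os.path.join(channel_dir, "train.txt"), "w", encoding="utf-8") as f:
        f.write("the cat sat\nthe dog sat\n")
    vocab = build_vocab(channel_dir)

print(vocab)
```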


# Implement the training function
We need to provide a training script that can run on the SageMaker platform. The training script is essentially the same as one you would write for local training, except that it must provide a train function. When SageMaker calls this function, it passes in arguments that describe the training environment. Check the script below to see how this works.
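
As a rough orientation, a train function could take the shape sketched here. The parameter names channel_input_dirs and hyperparameters, the default values, and the local path are assumptions chosen for illustration; the arguments the container actually passes depend on its version, so treat this as a sketch rather than the container's definitive interface.

```python
def train(channel_input_dirs, hyperparameters, **kwargs):
    """Hypothetical train entry point.

    channel_input_dirs: assumed dict mapping channel name -> local data directory.
    hyperparameters:    assumed dict of values set on the estimator.
    kwargs:             absorbs any other environment arguments not used here.
    """
    # The channel name matches the key used in estimator.fit({'wikitext-2': inputs}).
    data_dir = channel_input_dirs["wikitext-2"]
    # Defaults below are placeholders, not values taken from the original script.
    batch_size = int(hyperparameters.get("batch_size", 20))
    epochs = int(hyperparameters.get("epochs", 10))
    # A real script would build the model and run the training loop here.
    return {"data_dir": data_dir, "batch_size": batch_size, "epochs": epochs}


# Local dry run with hand-built arguments in place of the ones SageMaker would pass.
summary = train(
    channel_input_dirs={"wikitext-2": "/opt/ml/input/data/wikitext-2"},
    hyperparameters={"batch_size": 30, "epochs": 50},
    num_gpus=0,
)
print(summary)
```

Accepting `**kwargs` keeps the sketch tolerant of extra environment arguments, which makes the same function easy to exercise locally before submitting a job.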