# DLAD Exercise 2: Multitask Learning

## Setup

This notebook was created and tested on an ml.p2.xlarge notebook instance.

Let's start by creating a SageMaker session and specifying:

- The S3 bucket and prefix that you want to use for training and model data. The bucket should be in the same region as the notebook instance, training, and hosting.
- The IAM role ARN used to give training and hosting access to your data.

In [None]:
import sagemaker

sagemaker_session = sagemaker.Session()

bucket = sagemaker_session.default_bucket()
prefix = 'sagemaker/DLAD-ex2-baseline'

role = sagemaker.get_execution_role()

## Data
### Getting the data

In [None]:
from torchvision import datasets

data_dir = '../data'
url_dataset = 'https://s3.amazonaws.com/dlad-miniscapes/miniscapes.zip'
filename = 'miniscapes.zip'
datasets.utils.download_url(url_dataset, data_dir, filename=filename)

### Uploading the data to S3
We are going to use the `sagemaker.Session.upload_data` function to upload our datasets to an S3 location. The return value, `inputs`, identifies that location; we will use it later when we start the training job.


In [None]:
inputs = sagemaker_session.upload_data(path=data_dir, bucket=bucket, key_prefix=prefix)
print('input spec (in this case, just an S3 path): {}'.format(inputs))
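
As a rough illustration of what `inputs` is expected to look like, the sketch below composes an S3 URI from a bucket and a key prefix. It assumes that uploading a directory returns `s3://<bucket>/<key_prefix>` and that files keep their names under that prefix. The helper names and the bucket name are hypothetical and are not part of the SageMaker SDK.

```python
def expected_upload_uri(bucket, key_prefix):
    """Hypothetical helper: the URI we expect upload_data to return for a directory."""
    return 's3://{}/{}'.format(bucket, key_prefix)


def expected_object_uri(bucket, key_prefix, filename):
    """Hypothetical helper: where a single uploaded file is expected to land."""
    return '{}/{}'.format(expected_upload_uri(bucket, key_prefix), filename)


# Placeholder bucket name; the real one comes from sagemaker_session.default_bucket().
example_bucket = 'sagemaker-us-east-1-123456789012'
example_prefix = 'sagemaker/DLAD-ex2-baseline'

print(expected_upload_uri(example_bucket, example_prefix))
print(expected_object_uri(example_bucket, example_prefix, 'miniscapes.zip'))
```

Comparing the printed `inputs` value against this shape is a quick way to confirm that the archive went where the training job will look for it.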

## Train

### Run training in SageMaker

The `PyTorch` class allows us to run our training function as a training job on SageMaker infrastructure. We need to configure it with our training script, an IAM role, the number of training instances, the training instance type, and hyperparameters. 

The `hyperparameters` parameter is a dict of command-line parameters, which you would normally pass on the command line as `--key <value>`. All command-line parameters have default values, given in `mtl/utils/config.py`. You can override any subset of them by specifying new values in the `hyperparameters` dictionary.

In [None]:
from sagemaker.pytorch import PyTorch

estimator = PyTorch(
    source_dir='./',
    entry_point='train_sagemaker.py',
    role=role,
    framework_version='1.4.0',
    train_instance_count=1,
    train_instance_type='ml.p2.xlarge',
    train_volume_size=30,
    train_use_spot_instances=True,
    train_max_run=86000,
    train_max_wait=86400,
    hyperparameters={
        # 'batch_size': <new value>,       # an example how to override default value of batch_size
        'tensorboard_daemon_start': True,  # tensorboard server is run alongside the training code (you need this)
        'ngrok_daemon_start': True,        # ngrok forwarding server is run alongside tensorboard and the training code
        'ngrok_auth_token': '<AUTHTOKEN>', # replace with your personal ngrok.com account authtoken to access tensorboard
    }
)
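
To make the mapping from the `hyperparameters` dict to command-line parameters concrete, here is a minimal sketch of the receiving side. It assumes each entry arrives as `--key value` with the value rendered as a string. The parser, its default values, and the helper names are hypothetical; the real definitions live in `mtl/utils/config.py`.

```python
import argparse


def hyperparameters_to_argv(hyperparameters):
    """Flatten a dict into ['--key', 'value', ...] (assumed serialization)."""
    argv = []
    for key, value in hyperparameters.items():
        argv += ['--' + key, str(value)]
    return argv


def str2bool(text):
    """Parse 'True'/'False' strings; plain bool('False') would evaluate to True."""
    return text.lower() in ('true', '1', 'yes')


def build_parser():
    # Hypothetical defaults, for illustration only.
    parser = argparse.ArgumentParser()
    parser.add_argument('--batch_size', type=int, default=4)
    parser.add_argument('--tensorboard_daemon_start', type=str2bool, default=False)
    return parser


argv = hyperparameters_to_argv({'batch_size': 8, 'tensorboard_daemon_start': True})
args = build_parser().parse_args(argv)
print(argv)
print(args.batch_size, args.tensorboard_daemon_start)
```

Any key left out of the dict simply keeps the parser's default, which is why overriding only a subset of the parameters works.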

After we've constructed our `PyTorch` object, we can fit it using the data we uploaded to S3. SageMaker makes sure our data is available in the local filesystem, so our training script can simply read the data from disk.


In [None]:
estimator.fit({'training': inputs})
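
Inside the training container, SageMaker exposes each input channel through an environment variable named `SM_CHANNEL_<CHANNEL NAME>`, so the `training` channel passed to `fit` is available as `SM_CHANNEL_TRAINING`. The sketch below shows one way a script could locate the data. The `resolve_data_dir` helper and its local fallback are assumptions for illustration, not necessarily what `train_sagemaker.py` does.

```python
import os


def resolve_data_dir(channel='training', default='../data'):
    """Return the channel's local directory, or a fallback when run outside SageMaker."""
    env_name = 'SM_CHANNEL_' + channel.upper()
    return os.environ.get(env_name, default)


data_dir = resolve_data_dir()
print('reading training data from: {}'.format(data_dir))
```

With a fallback like this, the same script can be run locally against `../data` and on SageMaker against the downloaded channel without any changes.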