In [85]:
import sagemaker
import boto3
from datetime import datetime
import ipywidgets as widgets

# Step X: Configure Amazon Sagemaker and Train Model

Amazon Sagemaker is a managed machine learning service which allows us to easily experiment with different configurations of input data without having to worry about the computational infrastructure involved in training a complex model on large volumes of data. We use this tool to produce our trained models. 

Sagemaker works by ingesting a pre-packaged set of **training and inference code**, along with the necessary environment, and running this package on hardware you specify. This package comes in the form of a Docker container. We have created this container such that it contains all of our custom ML code (you can see the specification in `../pipeline/sagemaker/Dockerfile`), and it lives in the Amazon Elastic Container Registry with the following tag: `675906086666.dkr.ecr.us-west-2.amazonaws.com/planet-snowcover:latest`. This container is not publicly accessible, but is accessible to anyone with AWS credentials. 

We have to do a bit of configuration, then we can begin training our models!

## Configuration 
### Environment

So the AWS machinery knows who you are, you must specify your AWS profile name, which describes a set of credentials configured via the AWS command line interface. If you've run `aws configure`, you'll have an AWS credentials file containing your credentials. Running the following cell will tell you which profile names have associated credentials currently configured on your computer. ⚠️ **Note**: if the following cell has no output, you don't have AWS credentials configured. Go back to the [Deployment Guide](../deployment/README.md) to learn how to do this.

In [88]:
profiles = boto3.session.Session().available_profiles
print('\n'.join(profiles))

default
esip


Given that information, choose a profile from the dropdown below. 

In [91]:
aws_profile = widgets.Dropdown(options = profiles, value = 'default', description="AWS Profile")
aws_profile

Dropdown(description='AWS Profile', options=('default', 'esip'), value='default')

### Sagemaker

In [100]:
# configure sagemaker session with AWS profile
botosess = boto3.Session(profile_name=aws_profile.value, region_name = 'us-west-2') # need us-west-2 to access sagemaker image
sage_session = sagemaker.Session(boto_session=botosess)

### ⚠️ Model Specification 

This is the most important part of the machine learning process. Our algorithm and infrastructure tools rely on configuration files to specify which data to use when training, among other parameters relevant to the machine learning process.

**Important**: to give our machine learning algorithm data to train with, all data must be stored in Amazon S3 buckets that you have access to, in the Spherical Web Mercator tile format. If you've completed the previous two steps in this tutorial, that's going to be the case for you. 
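
For orientation, here is a minimal sketch of how a point on the ground maps to a tile in this format. It uses the standard Web Mercator ("slippy map") tile formula. The function names and the example image id are hypothetical and are not part of the planet-snowcover code; the `<image-id>/<z>/<x>/<y>.tif` layout is the one this tutorial describes for the imagery bucket.

```python
import math


def lonlat_to_tile(lon_deg, lat_deg, zoom):
    """Return the (x, y) indices of the Web Mercator tile containing a point."""
    n = 2 ** zoom  # number of tiles along each axis at this zoom level
    lat_rad = math.radians(lat_deg)
    x = int((lon_deg + 180.0) / 360.0 * n)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    # Clamp so points on the eastern or southern edge stay inside the grid.
    x = min(max(x, 0), n - 1)
    y = min(max(y, 0), n - 1)
    return x, y


def tile_key(image_id, zoom, x, y):
    """Build the object key of one tile inside a bucket (hypothetical helper)."""
    return f"{image_id}/{zoom}/{x}/{y}.tif"


# Example: the tile covering a point in the Sierra Nevada at zoom 15.
x, y = lonlat_to_tile(-119.5, 37.8, 15)
print(tile_key("example-image-id", 15, x, y))
```
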

We've created a configuration file template, which resides at [`/experiments/CONFIG-TEMPLATE.toml`](./experiments/CONFIG-TEMPLATE.toml). **Duplicate this file**, and name it something meaningful. Then, open it in a text editor. 

The most important part of the file to pay attention to is this: 

```
# Planet-Snowcover Model Configuration File – TEMPLATE
# Tony Cannistra, 2019. tonycan@uw.edu.
# University of Washington.


#@@@@@@@ LOOK HERE! @@@@@@@@
[dataset]
  ### This defines the IMAGERY and SNOW MASK locations that we need to access to 
  ### complete training, as well as our AWS credentials and other parameters.
  
  ## CREDENTIALS
  aws_profile = "esip" # your aws profile as stored in ~/.aws/credentials. Look there to see your stored profiles. 
  ## DATA - IMAGERY
  image_bucket = "planet-snowcover-imagery" # The S3 bucket where your imagery is stored.
  # regex defines each slippy-map directory, for buckets with many
  imagery_directory_regex = '2018042\d_.*_tiled' # A Regular Expression to select individual image directories.
  
  ## DATA – MASK
  mask_bucket = "planet-snowcover-snow/ASO_3M_SD_USCAJW_20180423"
  mask_directory_regex = "ASO_3M_SD_USCAJW_20180423-MASK_02-COG$" # $ = end of string = only dirs (no .tif)
  
  ## ML – CONFIG
  train_percent = 0.7 # percentage of imagery used for training (1 - train_percent used for validation). 


```

You want to edit the following lines: 

| Variable                  | Description                                                                                | Notes                                                                                                                                                                   |
|---------------------------|--------------------------------------------------------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| `aws_profile`             | The named AWS credentials profile you chose in the step above.                             | Verify that the named profile is in your `~/.aws/credentials` file.                                                                                                     |
| `image_bucket`            | The named S3 bucket (that you have access to!) where all of the imagery is stored.         | The code requires that this bucket contain sub-directories, each of which holds a slippy-map directory structure (e.g. `<aws_bucket>/<image-id>/<z>/<x>/<y>.tif`).      |
| `imagery_directory_regex` | A regular expression that selects which directories within `image_bucket` to train on.     | For help building this regular expression, check out [RegExr.com](https://regexr.com).                                                                                  |
| `mask_bucket`             | The named S3 bucket where the masks are stored, similar to above.                          | Same idea as the imagery bucket structure above. This structure allows for multiple images and multiple masks to be considered in training.                             |
| `mask_directory_regex`    | A regular expression that selects which directories within `mask_bucket` to train on.      | See `imagery_directory_regex` above.                                                                                                                                    |

    
**Note** that this format of regular expressions means that **you can specify multiple masks and images** for a single training run. The ML code automagically checks for overlapping image and mask tiles, and only selects image tiles which have a matching mask tile for ground truth. This means you don't need to be *too* careful how you specify your image paths. However, if you provide too many image tiles for the code to sift through, this matching process will take a long time. 
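
To make the selection and matching idea concrete, here is a small illustrative sketch. It is not the project's actual implementation: the helper names and directory names are hypothetical, and it assumes the regular expression is applied with `re.search` and that tiles are compared by their `(z, x, y)` indices.

```python
import re


def select_directories(names, pattern):
    """Keep only the directory names that the regular expression matches."""
    regex = re.compile(pattern)
    return [name for name in names if regex.search(name)]


def matched_tiles(image_tiles, mask_tiles):
    """Return the (z, x, y) tiles present in both the imagery and the mask."""
    return sorted(set(image_tiles) & set(mask_tiles))


# Hypothetical bucket listing: two tiled directories and one stray GeoTIFF.
listing = ["20180423_a_tiled", "20180501_b_tiled", "20180429_c_tiled.tif"]
print(select_directories(listing, r"2018042\d_.*_tiled$"))

# Only tiles that also have a mask tile are usable as training examples.
image_tiles = [(15, 5500, 12700), (15, 5501, 12700)]
mask_tiles = [(15, 5501, 12700), (15, 5502, 12700)]
print(matched_tiles(image_tiles, mask_tiles))
```

The trailing `$` in the pattern plays the same role as in the template's `mask_directory_regex`: it excludes names that continue past the directory name, such as `.tif` files.
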
    
Once you've saved this file somewhere, we'll upload the file to another bucket so the algorithm has access to it. 

    
### Upload Model Specification
    
Next, you'll need to create an S3 bucket to contain your experimentation. This is where we'll put our model specifications, and optionally our training results. For help, check out ["*How Do I Create an S3 Bucket?*"](https://docs.aws.amazon.com/AmazonS3/latest/user-guide/create-bucket.html). Once you've done that, run the following cell and choose the bucket that you'd like to use. 

In [116]:
buckets = [b['Name'] for b in botosess.client('s3').list_buckets()['Buckets']]
model_bucket = widgets.Dropdown(description='Bucket', options=buckets)
model_bucket

Dropdown(description='Bucket', options=('planet-snowcover-analysis', 'planet-snowcover-experiments', 'planet-s…

Next, specify the **absolute location** of the configuration file you've just created above. For example, if the file is called "`config1.toml`" and it's in the `experiments` folder, you might give `/Users/you/planet-snowcover/experiments/config1.toml`.  

In [118]:
config_location = widgets.Text(description="Config Path")
config_location

Text(value='', description='Config Path')

Finally, we upload this file to the specified S3 bucket. 

In [121]:
! aws s3 --profile {aws_profile.value} cp {config_location.value} s3://{model_bucket.value}

upload: ../experiments/CONFIG-TEMPLATE.toml to s3://planet-snowcover-experiments/CONFIG-TEMPLATE.toml
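
If you would rather stay in Python than shell out to the AWS CLI, the same upload can be sketched with boto3's `upload_file`. The `upload_config` helper below is hypothetical. It assumes you pass in a `boto3.Session` such as `botosess` from earlier, and that the file should land at the top level of the bucket under its own file name, as in the CLI command above.

```python
from pathlib import Path


def upload_config(boto_session, bucket, config_path):
    """Upload a config file to the top level of `bucket` and return its S3 URI."""
    key = Path(config_path).name  # keep only the file name as the object key
    boto_session.client("s3").upload_file(str(config_path), bucket, key)
    return f"s3://{bucket}/{key}"
```

In this notebook, a call might look like `upload_config(botosess, model_bucket.value, config_location.value)`. The returned URI is the kind of value passed as the `config` channel when training starts.
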


---

In [17]:
sess = boto3.Session(profile_name='esip', region_name='us-west-2')
sage_sess = sagemaker.Session(boto_session=sess)

In [21]:
account = sage_sess.boto_session.client('sts').get_caller_identity()['Account']
region = sage_sess.boto_session.region_name
image = '{}.dkr.ecr.{}.amazonaws.com/planet-snowcover:latest'.format(account, region)

In [43]:
role = 'arn:aws:iam::675906086666:role/service-role/AmazonSageMaker-ExecutionRole-20190501T152299'


In [22]:
image

'675906086666.dkr.ecr.us-west-2.amazonaws.com/planet-snowcover:latest'

In [26]:
sage_sess.default_bucket()

'sagemaker-us-west-2-675906086666'

In [59]:
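# Build an estimator from our custom container image.
# Positional arguments: image, IAM role, instance count (1), instance type.
e = sagemaker.estimator.Estimator(image, 
                             role, 1, "ml.p2.xlarge", 
                             output_path = "s3://planet-snowcover-models",  # where model artifacts are written
                             sagemaker_session = sage_sess)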

In [60]:
e.fit({'config': "s3://planet-snowcover-experiments/ASO_3M_SD_USCATE_20180528.toml"}, wait=False)

In [73]:
sage_sess.logs_for_job(e.latest_training_job.job_name, wait=True)

2019-10-24 23:16:49 Starting - Preparing the instances for training
2019-10-24 23:16:49 Downloading - Downloading input data
2019-10-24 23:16:49 Training - Training image download completed. Training in progress.
2019-10-24 23:16:49 Stopping - Stopping the training job
Initiating training with config .
/opt/ml/input/data/config/ASO_3M_SD_USCATE_20180528.toml
{'dataset': {'aws_profile': 'esip', 'image_bucket': 'planet-snowcover-imagery', 'imagery_directory_regex': '20180528_.*', 'mask_bucket': 'planet-snowcover-snow', 'mask_directory_regex': 'ASO_3M_SD_USCATE_20180528_binary$', 'train_percent': 0.7}, 'classes': [{'name': 'snow', 'color': 'white'}], 'channels': [{'bands': [1, 2, 3, 4], 'mean': [0.485, 0.456, 0.406, 1.0], 'std': [0.229, 0.224, 0.225, 1.0]}], 'model': {'name': 'albunet', 'encoder': 'resnet50', 'pretrained': True, 'loss': 'lovasz', 'batch_size': 7, 'tile_size': 512, 'epochs': 50, 'lr': 2.5e-05, 'data_augmentation': 0.75, 'decay': 0.0}}
RoboSat

KeyboardInterrupt: 

In [62]:
sageclient = boto3.client('sagemaker')

In [67]:
sageclient.stop_training_job(TrainingJobName = name)

NameError: name 'name' is not defined

In [65]:
e.latest_training_job.name

'planet-snowcover-2019-10-24-22-54-59-196'

In [72]:
sess.client('sagemaker').stop_training_job(TrainingJobName = e.latest_training_job.name)

{'ResponseMetadata': {'RequestId': '542f72be-0f54-4be1-9e11-022ce70222ae',
  'HTTPStatusCode': 200,
  'HTTPHeaders': {'x-amzn-requestid': '542f72be-0f54-4be1-9e11-022ce70222ae',
   'content-type': 'application/x-amz-json-1.1',
   'content-length': '0',
   'date': 'Thu, 24 Oct 2019 23:16:49 GMT'},
  'RetryAttempts': 0}}
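
A stop request returns immediately, while the job itself passes through `Stopping` before it reaches `Stopped`. The following is a minimal sketch of polling until the job settles, using the `describe_training_job` call of the boto3 SageMaker client. The `wait_for_training_job` helper, its polling interval, and its poll limit are assumptions made for illustration.

```python
import time

# Statuses after which a training job will not change any further.
TERMINAL_STATES = {"Completed", "Failed", "Stopped"}


def wait_for_training_job(client, job_name, poll_seconds=30, max_polls=120):
    """Poll a SageMaker client until the job reaches a terminal status.

    Returns the last status seen, which may be non-terminal if max_polls runs out.
    """
    status = None
    for _ in range(max_polls):
        response = client.describe_training_job(TrainingJobName=job_name)
        status = response["TrainingJobStatus"]
        if status in TERMINAL_STATES:
            break
        time.sleep(poll_seconds)
    return status
```

With the session used above, a call might look like `wait_for_training_job(sess.client('sagemaker'), e.latest_training_job.name)`.
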

In [74]:
e.training_job_analytics

KeyError: 'MetricDefinitions'

In [76]:
sage_sess.logs_for_job(e.latest_training_job.job_name)

2019-10-24 23:19:02 Starting - Preparing the instances for training
2019-10-24 23:19:02 Downloading - Downloading input data
2019-10-24 23:19:02 Training - Training image download completed. Training in progress.
2019-10-24 23:19:02 Stopping - Stopping the training job
2019-10-24 23:19:02 Uploading - Uploading generated training model
Initiating training with config .
/opt/ml/input/data/config/ASO_3M_SD_USCATE_20180528.toml
[31m{'dataset': {'aws_profile': 'esip', 'image_bucket': 'planet-snowcover-imagery', 'imagery_directory_regex': '20180528_.*', 'mask_bucket': 'planet-snowcover-snow', 'mask_directory_regex': 'ASO_3M_SD_USCATE_20180528_binary$', 'train_percent': 0.7}, 'classes': [{'name': 'snow', 'color': 'white'}], 'channels': [{'bands': [1, 2, 3, 4], 'mean': [0.485, 0.456, 0.406, 1.0], 'std': [0.229, 0.224, 0.225, 1.0]}], 'model': {'name': 'albunet', 'encoder': 'resnet50', 'pretrained': True, 'loss': 'lovasz', 'batch_size': 7, 'tile_size': 512, 'epochs': 50, 'lr': 