In [2]:
import sagemaker
import boto3
from datetime import datetime
import ipywidgets as widgets
from os import path



# Step X: Configure Amazon SageMaker and Train Model

Amazon SageMaker is a managed machine learning service that lets us experiment with different configurations of input data without having to manage the computational infrastructure needed to train a complex model on large volumes of data. We use it to produce our trained models.

SageMaker works by ingesting a pre-packaged set of **training and inference code**, along with the environment it needs, and running this package on hardware you specify. The package takes the form of a Docker container. We built ours to contain all of our custom ML code (see the specification in `../pipeline/sagemaker/Dockerfile`), and it lives in Amazon Elastic Container Registry (ECR) under the following tag: `675906086666.dkr.ecr.us-west-2.amazonaws.com/planet-snowcover:latest`. The container is not public, but it is accessible to anyone with AWS credentials.

We have to do a bit of configuration, then we can begin training our models!

## Configuration 
### Environment

To tell AWS who you are, you must specify your AWS profile name, which identifies a set of credentials configured via the AWS command line interface (CLI). If you've run `aws configure`, you'll have an AWS credentials file containing your credentials, most likely under the `default` profile. Running the following cell lists the profile names that have credentials configured on your computer. ⚠️ **Note**: if the following cell has no output, you don't have AWS credentials configured. Go back to the [Deployment Guide](../deployment/README.md) to learn how to set them up.

In [50]:
profiles = boto3.session.Session().available_profiles
print('\n'.join(profiles))

default
esip
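The profile names above come from AWS's shared configuration files. As a rough illustration, the sketch below reads the credentials file directly with Python's standard `configparser`, where each `[section]` header names one profile. It assumes the default location `~/.aws/credentials` and reads only that file, so it may list fewer profiles than boto3, which also consults `~/.aws/config`. The function name is our own.

```python
import configparser
from pathlib import Path


def list_credential_profiles(credentials_path=None):
    """Return the profile names defined in an AWS shared credentials file.

    Sketch only: reads the INI-style credentials file directly; boto3 may
    report additional profiles that are defined in ~/.aws/config.
    """
    if credentials_path is None:
        credentials_path = Path.home() / ".aws" / "credentials"
    parser = configparser.ConfigParser()
    # read() silently skips missing files, so an absent file yields []
    parser.read(credentials_path)
    # Each [section] is one profile, e.g. [default] or [esip]
    return parser.sections()
```

For example, `list_credential_profiles()` on a machine configured like the one above would include `default` and `esip`.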


Given that information, choose a profile from the dropdown below. 

In [51]:
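from IPython.display import display
aws_profile = widgets.Dropdown(options=profiles, value='default' if 'default' in profiles else profiles[0], description="AWS Profile")
display(aws_profile)  # pick your profile from the dropdown
aws_profile.value = 'esip'  # hard-coded override used for this run; remove it to keep your dropdown choice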

### AWS Credentials

In [52]:
# configure sagemaker session with AWS profile
botosess = boto3.Session(profile_name=aws_profile.value, region_name = 'us-west-2') # need us-west-2 to access sagemaker image

### ⚠️ Model Specification 

This is the most important part of the machine learning process. Our algorithm and infrastructure tools rely on configuration files to specify which data to use when training, among other parameters relevant to the machine learning process.

**Important**: in order to give our machine learning algorithm data to train with, all data must be stored in Amazon S3 buckets that you have access to, in the Spherical Web Mercator tile format. If you've completed the previous two steps in this tutorial, this will already be the case. For more information on the tools needed to do this, see the [`preprocess`](../preprocess) toolkit.
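For orientation, a slippy-map tile is addressed by a zoom level `z` and column/row indices `x` and `y` on the Web Mercator grid. The sketch below applies the standard OpenStreetMap tile-numbering formula to find the tile covering a longitude/latitude, and builds an object key in the `<image-id>/<z>/<x>/<y>.tif` layout described in the configuration notes. Both function names are hypothetical, and the `preprocess` toolkit may compute tiles differently.

```python
import math


def lonlat_to_tile(lon, lat, zoom):
    """Return the (x, y) slippy-map tile indices covering a lon/lat point."""
    n = 2 ** zoom  # number of tiles along each axis at this zoom level
    x = int((lon + 180.0) / 360.0 * n)
    lat_rad = math.radians(lat)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    # Clamp so points on the far east/south edge stay inside the grid
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)


def tile_key(image_id, zoom, x, y, ext="tif"):
    """Hypothetical helper: build a key of the form <image-id>/<z>/<x>/<y>.<ext>."""
    return f"{image_id}/{zoom}/{x}/{y}.{ext}"
```

At zoom 1 the world is split into a 2 x 2 grid, so the point (0°, 0°) lands in tile `(1, 1)`.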

We've created a configuration file template, which resides at [`/experiments/CONFIG-TEMPLATE.toml`](./experiments/CONFIG-TEMPLATE.toml). **Duplicate this file**, and name it something meaningful. Then, open it in a text editor. 

The most important part of the file to pay attention to is this: 

```
#@@@@@@@ LOOK HERE! @@@@@@@@
[dataset]
  ### This defines the IMAGERY and SNOW MASK locations that we need to access to 
  ### complete training, as well as our AWS credentials and other parameters.
  
  ## CREDENTIALS
  aws_profile = "esip" # your aws profile as stored in ~/.aws/credentials. Look there to see your stored profiles. 
  ## DATA - IMAGERY
  image_bucket = "planet-snowcover-imagery" # The S3 bucket where your imagery is stored.
  # regex defines each slippy-map directory, for buckets with many
  imagery_directory_regex = '2018042\d_.*_tiled' # A Regular Expression to select individual image directories.
  
  ## DATA – MASK
  mask_bucket = "planet-snowcover-snow/ASO_3M_SD_USCAJW_20180423"
  mask_directory_regex = "ASO_3M_SD_USCAJW_20180423-MASK_02-COG$" # $ = end of string = only dirs (no .tif)
  
  ## ML – CONFIG
  train_percent = 0.7 # percentage of imagery used for training (1 - train_percent used for validation). 


```

You want to edit the following lines: 

| Variable                  | Description                                                                                  | Notes                                                                                                                                                       |
|---------------------------|----------------------------------------------------------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------|
| `aws_profile`             | The named AWS credentials profile you chose in the step above.                               | Verify that the named profile is in your `~/.aws/credentials` file.                                                                                         |
| `image_bucket`            | The named S3 bucket (that you have access to!) where all of the imagery is stored.           | The code requires that this bucket contain sub-directories, each holding a slippy-map directory structure (e.g. `<aws_bucket>/<image-id>/<z>/<x>/<y>.tif`). |
| `imagery_directory_regex` | A regular expression selecting which directories within `image_bucket` to use for training. | For help building this regular expression, check out [RegExr.com](https://regexr.com).                                                                      |
| `mask_bucket`             | The named S3 bucket where the masks are stored, similar to above.                            | Same structure as the imagery bucket above. This structure allows multiple images and multiple masks to be considered in training.                          |
| `mask_directory_regex`    | A regular expression selecting which directories within `mask_bucket` to use.                | In the template, the trailing `$` restricts matches to directory names, excluding `.tif` files.                                                             |

    
**Note** that this format of regular expressions means that **you can specify multiple masks and images** to a single training run. The ML code automagically checks for overlapping image and mask tiles, and only selects image tiles which have a matching mask tile for ground truth. This means you don't need to be *too* careful how you specify your image paths. However, if you provide too many image tiles for the code to sift through, this matching process will take a long time. 
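The project's own matching code isn't shown in this notebook, so the following is only a sketch of the idea, with hypothetical function names: select directory names using the configured regular expressions, collect each side's `(z, x, y)` tile coordinates, and keep the image tiles whose coordinates also appear in a mask. It assumes `re.search` semantics, which is consistent with the `$` anchor the template uses in `mask_directory_regex`.

```python
import re


def select_directories(names, pattern):
    """Keep directory names matching the configured regular expression."""
    regex = re.compile(pattern)
    return [name for name in names if regex.search(name)]


def tile_coords(keys):
    """Parse keys like '<dir>/<z>/<x>/<y>.tif' into a set of (z, x, y) ints."""
    coords = set()
    for key in keys:
        parts = key.rsplit("/", 3)
        if len(parts) != 4 or not parts[3].endswith(".tif"):
            continue  # skip anything that is not a tile
        z, x, y = parts[1], parts[2], parts[3][: -len(".tif")]
        if z.isdigit() and x.isdigit() and y.isdigit():
            coords.add((int(z), int(x), int(y)))
    return coords


def matched_image_tiles(image_keys, mask_keys):
    """Return image tile keys that have a mask tile at the same (z, x, y)."""
    mask_coords = tile_coords(mask_keys)
    return [k for k in image_keys if tile_coords([k]) & mask_coords]
```

Comparing coordinate sets rather than full keys is what lets several image directories and several mask directories be mixed in one run, since only the `z/x/y` part has to agree.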
    
Once you've saved this file, we'll upload it to an S3 bucket so that the training code can access it.

    
### Upload Model Specification
    
Next, you'll need to create an S3 bucket to hold your experiments. This is where we'll put our model specifications and, optionally, our training results. For help, check out ["*How Do I Create an S3 Bucket?*"](https://docs.aws.amazon.com/AmazonS3/latest/user-guide/create-bucket.html). **Be sure to create this bucket in the `us-west-2` region!** Once you've done that, run the following cell and choose the bucket that you'd like to use. 

In [53]:
buckets = [b['Name'] for b in botosess.client('s3').list_buckets()['Buckets']]
model_bucket = widgets.Dropdown(description='Bucket', options=buckets)
model_bucket.value = 'planet-snowcover-experiments'
model_bucket.value

'planet-snowcover-experiments'
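If you'd rather create the experiment bucket from Python than from the console, boto3's S3 `create_bucket` call can do it. Outside `us-east-1`, S3 requires a `LocationConstraint` naming the region, which is why the sketch builds the request explicitly. The helper names, and the bucket name in the usage line, are hypothetical.

```python
def create_bucket_request(bucket_name, region="us-west-2"):
    """Build the keyword arguments for S3 CreateBucket in a specific region."""
    request = {"Bucket": bucket_name}
    if region != "us-east-1":
        # Outside us-east-1, S3 requires an explicit LocationConstraint
        request["CreateBucketConfiguration"] = {"LocationConstraint": region}
    return request


def create_experiment_bucket(bucket_name, profile_name, region="us-west-2"):
    """Create the bucket with boto3 using the given AWS profile."""
    import boto3  # imported here so create_bucket_request works without boto3

    s3 = boto3.Session(profile_name=profile_name, region_name=region).client("s3")
    return s3.create_bucket(**create_bucket_request(bucket_name, region))
```

Usage: `create_experiment_bucket("my-snowcover-experiments", aws_profile.value)`, then re-run the bucket cell so the new bucket appears in the dropdown.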

Next, specify the **absolute location** of the configuration file you've just created above. For example, if the file is called "`config1.toml`" and it's in the `experiments` folder, you might give `/Users/you/planet-snowcover/experiments/config1.toml`.  

In [54]:
config_location = widgets.Text(description="Config Path")
config_location.value ='/home/ubuntu/planet-snowcover/experiments/co-train-veg-validate-aug.toml'

Finally, we upload this file to the specified S3 bucket. 

In [55]:
! aws s3 --profile {aws_profile.value} cp {config_location.value} s3://{model_bucket.value}

upload: ../experiments/co-train-veg-validate-aug.toml to s3://planet-snowcover-experiments/co-train-veg-validate-aug.toml
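If you'd rather stay in Python than shell out to the AWS CLI, boto3's `upload_file` does the same job. The sketch below (helper names are our own) mirrors the `cp` command above: it places the file at the root of the bucket under its base name and returns the corresponding `s3://` URL, which is the form the training step uses later.

```python
from pathlib import Path


def config_s3_url(bucket, config_path):
    """S3 URL of a config file stored at the bucket root under its base name."""
    return f"s3://{bucket}/{Path(config_path).name}"


def upload_config(config_path, bucket, profile_name, region="us-west-2"):
    """Upload a config file to the bucket root with boto3 and return its S3 URL."""
    import boto3  # deferred so config_s3_url can be used without boto3

    s3 = boto3.Session(profile_name=profile_name, region_name=region).client("s3")
    # upload_file(Filename, Bucket, Key)
    s3.upload_file(str(config_path), bucket, Path(config_path).name)
    return config_s3_url(bucket, config_path)
```

Usage: `upload_config(config_location.value, model_bucket.value, aws_profile.value)`.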


## SageMaker Configuration

Now that we've set up the infrastructure needed to train our models, we'll tell Amazon SageMaker about our training specifications and preferences and use them to train a model. We will use the SageMaker Python SDK for this. 

First, we'll need one piece of information from the Terraform configuration that you completed in the [Deployment Guide](../deployment/README.md): an AWS identity known as an "IAM Role," which gives the SageMaker service permission to access your S3 data and lets you control SageMaker from the API. 

We've already configured an IAM Role for Sagemaker during the deployment process. To access the role, `cd` into the `/deployment` directory from the command line and run `terraform output`. You should see the following: 
    
    sagemaker_role_arn = arn:aws:iam::.....
   
Copy this value starting with `arn:aws:iam:` and paste it in the box below. 

In [56]:
sageRole = widgets.Text(description="Role")
sageRole.value='arn:aws:iam::675906086666:role/service-role/AmazonSageMaker-ExecutionRole-20190501T152299'
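Instead of copying the ARN by hand, you could read it from Terraform's machine-readable output. `terraform output -json` prints every output as an object holding its `value`, `type`, and `sensitive` flag. This sketch assumes the Terraform CLI is on your `PATH`, that the deployment directory has been initialized, and that the output is named `sagemaker_role_arn` as shown above; the helper names are hypothetical.

```python
import json
import subprocess


def role_arn_from_outputs(outputs_json, name="sagemaker_role_arn"):
    """Extract an output value from `terraform output -json` text."""
    outputs = json.loads(outputs_json)
    # Each output looks like {"sensitive": ..., "type": ..., "value": ...}
    return outputs[name]["value"]


def read_role_arn(deployment_dir):
    """Run `terraform output -json` in the deployment directory and return the role ARN."""
    result = subprocess.run(["terraform", "output", "-json"], cwd=deployment_dir,
                            capture_output=True, text=True, check=True)
    return role_arn_from_outputs(result.stdout)
```

Usage: `sageRole.value = read_role_arn("../deployment")`, adjusting the path to wherever your deployment directory lives.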

Next, we'll configure a SageMaker "estimator." An Estimator is a Python object that links into the SageMaker training infrastructure. We have to specify the following parameters: 

| Parameter           | Description                                                                                                         |
|---------------------|---------------------------------------------------------------------------------------------------------------------|
| Image               | The Docker image, stored in Amazon ECR, containing all training code and its environment. (We provide this.)       |
| Role                | The SageMaker role ARN that you provided above.                                                                     |
| Instance Count      | The number of instances you'd like SageMaker to use (1 is all we can handle at this time).                          |
| Instance Type       | The type of AWS EC2 instance used for training (`ml.p2.xlarge` is ideal, since it's GPU-enabled).                   |
| `train_volume_size` | Size, in GB, of the storage volume attached to the training instance (150 in the cell below).                       |
| `output_path`       | The S3 location where you'd like to store the output of training (including the trained model).                    |
| `train_max_run`     | Maximum training time in seconds (259200 s, i.e. 72 hours, below); SageMaker stops the job once this limit is hit.  |
| `sagemaker_session` | The credentialed SageMaker session.                                                                                 |
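The estimator cell further down passes the instance count and type positionally and uses SageMaker Python SDK v1 keyword names (`train_volume_size`, `train_max_run`). SDK v2 renamed several of these parameters (for example `image_name` to `image_uri`, `train_instance_type` to `instance_type`, and `train_max_run` to `max_run`). As a sketch, assuming SDK v2, the same settings can be collected as explicit keyword arguments. The helper name is hypothetical, and the values are copied from that cell.

```python
def estimator_kwargs_v2(image_uri, role_arn, bucket, session=None):
    """Keyword arguments for sagemaker.estimator.Estimator under SDK v2 naming."""
    return {
        "image_uri": image_uri,            # training image in Amazon ECR
        "role": role_arn,                  # SageMaker execution role ARN
        "instance_count": 1,               # single instance, as in this notebook
        "instance_type": "ml.p2.xlarge",   # GPU-enabled instance type
        "volume_size": 150,                # storage volume size in GB
        "max_run": 72 * 60 * 60,           # 259200 s = 72 h training time limit
        "output_path": f"s3://{bucket}",   # where SageMaker writes model artifacts
        "sagemaker_session": session,      # None lets the SDK create a default session
    }
```

Usage with SDK v2: `e = sagemaker.estimator.Estimator(**estimator_kwargs_v2(image, sageRole.value, model_bucket.value, sage_sess))`. Naming every argument makes the call robust to positional-order changes between SDK versions.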

First we'll configure the session with the AWS profile specified above:

In [57]:
sess = boto3.Session(profile_name=aws_profile.value, region_name='us-west-2')
sage_sess = sagemaker.Session(boto_session=sess)

Then we specify the image name. We've created this image for you; it contains all the code necessary for training.

In [58]:
image = '675906086666.dkr.ecr.us-west-2.amazonaws.com/planet-snowcover:latest'
image

'675906086666.dkr.ecr.us-west-2.amazonaws.com/planet-snowcover:latest'

In [59]:
model_bucket.value

'planet-snowcover-experiments'

Finally, we create the estimator:

In [3]:
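e = sagemaker.estimator.Estimator(image,                   # training image in Amazon ECR
                                  sageRole.value,          # SageMaker execution role ARN
                                  1,                       # instance count
                                  "ml.p2.xlarge",          # GPU-enabled instance type
                                  train_volume_size=150,   # SDK v1 name: storage volume size in GB
                                  output_path="s3://" + model_bucket.value,
                                  train_max_run=259200,    # SDK v1 name: time limit in seconds (72 h)
                                  sagemaker_session=sage_sess)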


## Model Training

The final step in this notebook is to actually train (or "fit") the ML model. All of the preparation we've done makes this a one-line operation, but it's worth explaining. 

The `Estimator` we just created has a `fit` method, which accepts a dictionary of "input channels," in SageMaker parlance. Our Docker container is configured to receive one channel named `config`, which points to the S3 location of the configuration file you created earlier. Here we'll build the URL for that location: 

In [61]:
config_url = "s3://{}/{}".format(model_bucket.value, path.split(config_location.value)[-1])
config_url

's3://planet-snowcover-experiments/co-train-veg-validate-aug.toml'
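On the container side, in SageMaker's default File input mode, each input channel is downloaded to `/opt/ml/input/data/<channel_name>/` before training starts, so the `config` channel's file should appear under `/opt/ml/input/data/config/`. The project's training entry point isn't shown here; the sketch below is a hypothetical illustration of how code inside the container could locate that file.

```python
from pathlib import Path

# In File input mode, SageMaker places channel data under this directory
INPUT_DATA_DIR = Path("/opt/ml/input/data")


def find_channel_config(channel="config", root=INPUT_DATA_DIR):
    """Return the single .toml file delivered on the given input channel."""
    channel_dir = Path(root) / channel
    if not channel_dir.is_dir():
        raise FileNotFoundError(f"no input channel directory at {channel_dir}")
    candidates = sorted(channel_dir.rglob("*.toml"))
    if len(candidates) != 1:
        raise FileNotFoundError(
            f"expected one .toml file in {channel_dir}, found {len(candidates)}")
    return candidates[0]
```

Requiring exactly one match is a deliberate choice: the channel URL is treated as an S3 prefix, so a stray object whose key starts with the same name would otherwise be picked up silently.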

**Finally**: we'll train the model. 

🚨 **WARNING** 🚨: Running the next cell **will cost you money!** To stop the training job, go to the [AWS Web console](https://us-west-2.console.aws.amazon.com/sagemaker/home?region=us-west-2#/jobs) or use the appendix at the end of this notebook.  

In [62]:
e.fit({
    'config': config_url
}, wait=False)

This operation will take a while (at least **6 hours**, if using the default configuration). To check out the initial progress, run the next cell to get the logs. 

In [63]:
sage_sess.logs_for_job(e.latest_training_job.job_name, wait=False) # change wait=True to see live updates. 

2020-06-10 08:39:44 Starting - Starting the training job.

In [64]:
e.latest_training_job.name

'planet-snowcover-2020-06-10-08-39-44-083'
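Because the job runs on SageMaker rather than in this notebook, you can also check on it later from any Python session with your credentials. The sketch below polls the `DescribeTrainingJob` API through a boto3 SageMaker client until the job reaches a terminal status. The helper name and the five-minute default interval are our own choices, and `sleep` is a parameter so the loop is easy to try against a stub client.

```python
import time

# DescribeTrainingJob statuses after which the job will not change again
TERMINAL_STATUSES = {"Completed", "Failed", "Stopped"}


def wait_for_training_job(sm_client, job_name, poll_seconds=300, sleep=time.sleep):
    """Poll DescribeTrainingJob until the job reaches a terminal status."""
    while True:
        desc = sm_client.describe_training_job(TrainingJobName=job_name)
        status = desc["TrainingJobStatus"]
        # SecondaryStatus gives finer detail, e.g. "Downloading" or "Training"
        print(f"{job_name}: {status} ({desc.get('SecondaryStatus', '')})")
        if status in TERMINAL_STATUSES:
            return status
        sleep(poll_seconds)
```

Usage: `wait_for_training_job(sess.client('sagemaker'), e.latest_training_job.name)`.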

**Since this is running on separate SageMaker infrastructure**, you can shut down this notebook/lab instance. Just sit back and have some coffee ☕️. 

### Appendix: Stop the Job

If you'd like to stop the currently-running training job, uncomment and run the cell below. You can also do this in the [AWS Web console](https://us-west-2.console.aws.amazon.com/sagemaker/home?region=us-west-2#/jobs).

In [65]:
#sess.client('sagemaker').stop_training_job(TrainingJobName = e.latest_training_job.name)