# Chapter-3: Create and train an ML Segmentation Model using AWS SageMaker

This chapter introduces you to training a model with Amazon SageMaker. You will upload the prepared HLD dataset to S3, configure a SageMaker TensorFlow estimator for a U-Net segmentation model, train the model, and monitor training with TensorBoard.

### AWS SageMaker:
Amazon SageMaker helps data scientists and developers to prepare, build, train, and deploy high-quality machine learning (ML) models quickly by bringing together a broad set of capabilities purpose-built for ML.

Simply put, it is a set of cloud-based (specifically, AWS) apps that focus on labeling, training, testing, and deploying models.

## How it works in the context of our HLD problem:

### Prepare:

SageMaker provides in-house labeling and data-wrangling tools for some common ML workflows. Owing to the spatiotemporal nature of the HLD dataset, we skip this SageMaker functionality and instead use the IMPACT-built ImageLabeler tool to identify and label features. This workflow is covered in Chapter-0 and Chapter-1 of this workshop.

### Build, Train & Tune:

SageMaker provides access to cloud-hosted Jupyter notebooks along with pre-built ML models. For this demo, we use the U-Net segmentation model (https://lmb.informatik.uni-freiburg.de/people/ronneber/u-net/). Its architecture is a stack of convolutions followed by deconvolutions. The model assigns a class label to each pixel of the input and produces an output that matches the input's size. Once trained with high-latitude dust (HLD) masks, it segments any given image into HLD and non-HLD pixels. We cover this process in this chapter (Chapter-3).
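To make "a class label for each pixel" concrete, here is a minimal NumPy sketch of the last step of segmentation. It assumes, hypothetically, that the model emits one score per class for every pixel in an array of shape (height, width, classes), with class 0 as non-HLD and class 1 as HLD; the actual training script may use a different output layout. The helper `scores_to_mask` is illustrative only.

```python
import numpy as np


def scores_to_mask(scores):
    """Convert per-pixel class scores of shape (H, W, C) into an (H, W) label mask.

    Assumes (hypothetically) that channel 0 is non-HLD and channel 1 is HLD.
    """
    scores = np.asarray(scores)
    if scores.ndim != 3:
        raise ValueError(f"expected (H, W, C) scores, got shape {scores.shape}")
    # Each pixel takes the label of its highest-scoring class.
    return np.argmax(scores, axis=-1).astype(np.uint8)


# Toy 2x2 "image" with two classes; the mask keeps the input's height and width.
toy_scores = np.array([
    [[0.9, 0.1], [0.2, 0.8]],
    [[0.4, 0.6], [0.7, 0.3]],
])
mask = scores_to_mask(toy_scores)
hld_fraction = mask.mean()  # share of pixels labeled HLD
print(mask)
print(hld_fraction)
```

For a two-class problem, a single sigmoid output channel thresholded at 0.5 is a common alternative; which one applies depends on how the model's final layer is defined.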

### Deploy and Manage:

SageMaker provides endpoints for running inference with trained models. This functionality is showcased in Chapter-4.


In [None]:
!pip install -r src/requirements.txt

## Required Imports

In [None]:
import boto3
import fiona

import math
import numpy as np
import os
import random
import rasterio.features
import re
import requests
import shutil

from datetime import datetime
from glob import glob
from io import BytesIO
from IPython.display import Image as Display
from PIL import Image


## From Chapter-2: set up access, download, and process data into an ML-ready format.

In [None]:
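# AWS account and notebook IAM role set up in Chapter-2
ACCOUNT_NUMBER = "350996086543"
ROLE_NAME = "notebookAccessRole"
ROLE_ARN = f"arn:aws:iam::{ACCOUNT_NUMBER}:role/{ROLE_NAME}"
SOURCE_BUCKET = "impact-datashare"
# Bucket for training outputs (model artifacts and TensorBoard logs)
BUCKET_NAME = f"{ACCOUNT_NUMBER}-model-bucket"
DESTINATION_BUCKET = f"s3://{BUCKET_NAME}"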

## Initialize a SageMaker session and upload the data to S3

In [None]:
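import sagemaker

sagemaker_session = sagemaker.Session()
# Upload each split to its own S3 prefix in the session's default bucket.
# upload_data() defaults to key_prefix='data', so without distinct prefixes all three
# calls would write to (and return) the same S3 location.
train_images = sagemaker_session.upload_data(path='data/train', key_prefix='hld/train')
val_images = sagemaker_session.upload_data(path='data/val', key_prefix='hld/val')
test_images = sagemaker_session.upload_data(path='data/test', key_prefix='hld/test')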

## Use the TensorFlow estimator provided by SageMaker to train a U-Net model

The SageMaker Python SDK's TensorFlow estimators and models, together with SageMaker's open-source TensorFlow containers, make it easier to write a TensorFlow script and run it in SageMaker.

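We define a SageMaker estimator using the `sagemaker.tensorflow.TensorFlow` class. Its key parameters are:

- `entry_point` points to the script containing the underlying TensorFlow model implementation.
- `source_dir` points to the folder that contains the `entry_point`.
- `role` should be the notebook access role we created in Chapter-2.
- `instance_count` and `instance_type` define the number and type of EC2 instances used for compute.
- `output_path` dictates where the output files from training the model are stored.
- `image_uri` should point to the appropriate TensorFlow ECR container image.
- `hyperparameters` are passed to the entry point script; here, `log_dir` sets where the TensorBoard logs are written.
- `metric_definitions` lists regular expressions that SageMaker applies to the training logs to extract metrics such as loss and accuracy.
- `distribution` configures distributed training; here it enables parameter-server mode.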


In [None]:
from sagemaker.tensorflow import TensorFlow
keras_metric_definition = [
    {"Name": "train:loss", "Regex": ".*loss: ([0-9\\.]+) - accuracy: [0-9\\.]+.*"},
    {"Name": "train:accuracy", "Regex": ".*loss: [0-9\\.]+ - accuracy: ([0-9\\.]+).*"},
    {
        "Name": "validation:accuracy",
        "Regex": ".*step - loss: [0-9\\.]+ - accuracy: [0-9\\.]+ - val_loss: [0-9\\.]+ - val_accuracy: ([0-9\\.]+).*",
    },
    {
        "Name": "validation:loss",
        "Regex": ".*step - loss: [0-9\\.]+ - accuracy: [0-9\\.]+ - val_loss: ([0-9\\.]+) - val_accuracy: [0-9\\.]+.*",
    },
    {
        "Name": "sec/steps",
        "Regex": ".* (\\d+)[mu]s/step - loss: [0-9\\.]+ - accuracy: [0-9\\.]+ - val_loss: [0-9\\.]+ - val_accuracy: [0-9\\.]+",
    },
]

LOG_FOLDER = "tensorboard_logs"

TENSORFLOW_LOGS_PATH = f"s3://{BUCKET_NAME}/{LOG_FOLDER}"

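estimator = TensorFlow(
    entry_point='hld_sagemaker_demo.py',  # training script inside source_dir
    source_dir="/home/ec2-user/SageMaker/workshop_notebooks/chapter-3/src",
    role=ROLE_NAME,  # an IAM role name or its full ARN (ROLE_ARN) is accepted
    instance_count=1,
    instance_type='ml.p3.2xlarge',  # single-GPU training instance
    py_version='py3',
    hyperparameters={"log_dir": TENSORFLOW_LOGS_PATH},  # passed to the script as --log_dir
    output_path=DESTINATION_BUCKET,
    # Prebuilt TensorFlow 2.4.1 GPU training image hosted in us-east-1
    image_uri='763104351884.dkr.ecr.us-east-1.amazonaws.com/tensorflow-training:2.4.1-gpu-py37-cu110-ubuntu18.04',
    metric_definitions=keras_metric_definition,  # the parameter name is plural
    distribution={
        'parameter_server': {'enabled': True}
    }
)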
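SageMaker searches the training job's log output with each `Regex` and records the captured value under the metric's `Name`. The sketch below checks the same patterns locally with Python's `re` module against a Keras progress line. The sample line is made up for illustration, and `extract_metrics` is a hypothetical helper, not part of the SageMaker SDK.

```python
import re

# Same patterns as keras_metric_definition, written as raw strings.
METRIC_DEFINITIONS = [
    {"Name": "train:loss", "Regex": r".*loss: ([0-9\.]+) - accuracy: [0-9\.]+.*"},
    {"Name": "train:accuracy", "Regex": r".*loss: [0-9\.]+ - accuracy: ([0-9\.]+).*"},
    {"Name": "validation:accuracy",
     "Regex": r".*step - loss: [0-9\.]+ - accuracy: [0-9\.]+ - val_loss: [0-9\.]+ - val_accuracy: ([0-9\.]+).*"},
    {"Name": "validation:loss",
     "Regex": r".*step - loss: [0-9\.]+ - accuracy: [0-9\.]+ - val_loss: ([0-9\.]+) - val_accuracy: [0-9\.]+.*"},
    {"Name": "sec/steps",
     "Regex": r".* (\d+)[mu]s/step - loss: [0-9\.]+ - accuracy: [0-9\.]+ - val_loss: [0-9\.]+ - val_accuracy: [0-9\.]+"},
]


def extract_metrics(log_line, definitions=METRIC_DEFINITIONS):
    """Return {metric name: value} for every definition whose regex matches the line."""
    metrics = {}
    for definition in definitions:
        match = re.search(definition["Regex"], log_line)
        if match:
            metrics[definition["Name"]] = float(match.group(1))
    return metrics


# Hypothetical end-of-epoch line in the Keras progress-bar format.
sample = ("10/10 [==============================] - 5s 500ms/step - loss: 0.6931 "
          "- accuracy: 0.5000 - val_loss: 0.6900 - val_accuracy: 0.5200")
print(extract_metrics(sample))
```

Mid-epoch progress lines, which carry no validation values, match only the two training patterns. Note also that the `sec/steps` pattern captures the number before `ms/step` or `us/step`, so the recorded value is in milliseconds or microseconds per step, not seconds.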


## Train the model

Training the model is as simple as calling `estimator.fit()` with the input channels expected by `hld_sagemaker_demo.py`, the entry point for training the custom model (configured in the previous step).

In [None]:
estimator.fit(
    {
        'train': train_images,
        'eval': val_images, 
        'test': test_images
    }
)
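`hld_sagemaker_demo.py` itself is not shown in this chapter. As a hedged sketch of how a SageMaker script-mode entry point typically receives these inputs: each key passed to `estimator.fit()` becomes a channel whose local path inside the training container is exposed in an `SM_CHANNEL_<NAME>` environment variable (for example `SM_CHANNEL_TRAIN`), and each estimator hyperparameter arrives as a command-line flag such as `--log_dir`. The function `parse_training_args` is hypothetical; the real script may read its inputs differently.

```python
import argparse
import os


def parse_training_args(argv=None, environ=None):
    """Hypothetical argument parsing for a SageMaker script-mode entry point."""
    environ = os.environ if environ is None else environ
    parser = argparse.ArgumentParser()
    # Estimator hyperparameters are passed as flags, e.g. --log_dir s3://...
    parser.add_argument("--log_dir", type=str, default=None)
    # Local paths of the channels given to estimator.fit() ('train', 'eval', 'test').
    parser.add_argument("--train", type=str, default=environ.get("SM_CHANNEL_TRAIN"))
    parser.add_argument("--eval", type=str, default=environ.get("SM_CHANNEL_EVAL"))
    parser.add_argument("--test", type=str, default=environ.get("SM_CHANNEL_TEST"))
    # Files saved here are packaged by SageMaker as the model artifact.
    parser.add_argument("--sm-model-dir", type=str, default=environ.get("SM_MODEL_DIR"))
    # Ignore any extra flags (such as --model_dir) the TensorFlow estimator adds.
    args, _unknown = parser.parse_known_args(argv)
    return args
```

Using `parse_known_args` instead of `parse_args` keeps the script from failing when the container passes flags the script does not declare.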

## Attach a Tensorboard session

TensorBoard is an interactive visualization tool for model training. It lets us monitor the training and validation losses over time, which indicates how well the model is training.

In [32]:
aws_region = sagemaker_session.boto_region_name
!AWS_REGION={aws_region} tensorboard --logdir {TENSORFLOW_LOGS_PATH}



## Note the trained model name in the `DESTINATION_BUCKET` dashboard

You will use this name in the next chapter to deploy the model and run inference with it.
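After `estimator.fit()` finishes, `estimator.model_data` holds the S3 URI of the model artifact, which SageMaker writes as `<output_path>/<training job name>/output/model.tar.gz`. The hypothetical helper below splits such a URI into the bucket, training job name, and object key so they can be noted for Chapter-4; the example URI and job name are made up.

```python
from urllib.parse import urlparse


def split_model_artifact_uri(uri):
    """Split s3://<bucket>/.../<job-name>/output/model.tar.gz into its parts.

    Returns (bucket, training_job_name, key); raises ValueError for other layouts.
    """
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an S3 URI: {uri}")
    key = parsed.path.lstrip("/")
    parts = key.split("/")
    # The training job name is the segment just before "output/model.tar.gz".
    if len(parts) < 3 or parts[-2:] != ["output", "model.tar.gz"]:
        raise ValueError(f"unexpected model artifact layout: {uri}")
    return parsed.netloc, parts[-3], key


# Made-up example URI in the layout described above.
example = ("s3://350996086543-model-bucket/"
           "tensorflow-training-2021-06-01-12-00-00-000/output/model.tar.gz")
bucket, job_name, key = split_model_artifact_uri(example)
print(bucket, job_name, key)

# In the notebook, after training: split_model_artifact_uri(estimator.model_data)
```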