# Chapter-3: Create and train an ML segmentation model using AWS SageMaker

The objectives in this chapter introduce you to training a model with SageMaker's model training tools.

### AWS SageMaker:
Amazon SageMaker helps data scientists and developers prepare, build, train, and deploy high-quality machine learning (ML) models quickly by bringing together a broad set of capabilities purpose-built for ML.

Simply put, it is a set of cloud-based (specifically, AWS) apps that focus on labeling, training, testing, and deploying models.

## How it works in the context of our HLD problem:

### Prepare

SageMaker provides in-house labelling and data wrangling tools for some common ML workflows. Owing to the spatiotemporal nature of the HLD dataset, we skip this SageMaker functionality and instead use the ImageLabeler tool, built by IMPACT, to identify and label features. This workflow is covered in Chapter-0 and Chapter-1 of this workshop.

### Build, train, and tune

SageMaker provides access to cloud-hosted Jupyter notebooks along with pre-built ML models. For our purposes, the model we are using for this demo is the UNet segmentation model (https://lmb.informatik.uni-freiburg.de/people/ronneber/u-net/). The architecture is a stack of convolutions followed by de-convolutions. This model assigns a class label to each pixel of the input and gives an output matching the size of the input. The resulting output, once trained with high-latitude dust (HLD) masks, will segment any given image into HLD and non-HLD pixels. We will be covering the process in this chapter (Chapter-3).

### Deploy and manage

SageMaker provides endpoints to infer from the trained models. This functionality will be showcased in Chapter-4.


In [3]:
!pip install -r src/requirements.txt

Collecting tensorboard<2.5
  Using cached tensorboard-2.4.1-py3-none-any.whl (10.6 MB)
Installing collected packages: tensorboard
  Attempting uninstall: tensorboard
    Found existing installation: tensorboard 2.3.0
    Uninstalling tensorboard-2.3.0:
      Successfully uninstalled tensorboard-2.3.0
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
tensorflow-serving-api 2.1.0 requires tensorflow~=2.1.0, but you have tensorflow 2.4.1 which is incompatible.
tensorflow-cpu 2.1.3 requires gast==0.2.2, but you have gast 0.3.3 which is incompatible.
tensorflow-cpu 2.1.3 requires keras-preprocessing==1.1.0, but you have keras-preprocessing 1.1.2 which is incompatible.
tensorflow-cpu 2.1.3 requires numpy<1.19.0,>=1.16.0, but you have numpy 1.19.5 which is incompatible.
tensorflow-cpu 2.1.3 requires tensorboard<2.2.0,>=2.1.0, but you have tensorboard 2.4.1 which is

## Required Imports

In [4]:
import boto3
import fiona

import math
import numpy as np
import os
import random
import rasterio.features
import re
import requests
import shutil

from datetime import datetime
from glob import glob
from io import BytesIO
from IPython.display import Image as Display
from PIL import Image


## From Chapter-2: set up access, download, and process data into ML-ready format

In [5]:
ACCOUNT_NUMBER = "<your account number>"  # replace with your 12-digit AWS account ID (a string)
ROLE_NAME = "notebookAccessRole"
ROLE_ARN = f"arn:aws:iam::{ACCOUNT_NUMBER}:role/{ROLE_NAME}"
SOURCE_BUCKET = "impact-datashare"
BUCKET_NAME = f"{ACCOUNT_NUMBER}-model-bucket"
DESTINATION_BUCKET = f"s3://{BUCKET_NAME}"

## Initialize a SageMaker session to upload data

In [6]:
import sagemaker
sagemaker_session = sagemaker.Session()
train_images = sagemaker_session.upload_data(path='data/train', bucket=BUCKET_NAME, key_prefix='data/train')
val_images = sagemaker_session.upload_data(path='data/val', bucket=BUCKET_NAME, key_prefix='data/val')
test_images = sagemaker_session.upload_data(path='data/test', bucket=BUCKET_NAME, key_prefix='data/test')

## Deep Learning
Deep learning refers to neural networks with multiple hidden layers that can learn increasingly abstract representations of the input data.

Deep learning has led to major advances in computer vision. We’re now able to classify images, find objects in them, and even label them with captions. To do so, deep neural networks with many hidden layers sequentially learn more complex features from the raw input image:
* The first hidden layers might only learn local edge patterns.
* Then, each subsequent layer (or filter) learns more complex representations.
* Finally, the last layer can classify the image as a cat or kangaroo.

The deep neural networks most commonly used this way for images are called Convolutional Neural Networks.

## Convolutional Neural Networks
Convolutional Neural Networks (CNNs) are multi-layer neural networks (sometimes 17 layers or more) that assume the input data are images.

<img src="Feature_maps.png">

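Because of this assumption, a CNN can slide the same small filters (convolution kernels) across every position of the image instead of learning a separate weight for every pixel, which drastically reduces the number of parameters that need to be tuned. Therefore, CNNs can efficiently handle the high dimensionality of raw images.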
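
To make the savings concrete, here is a small Python sketch that counts trainable parameters for a fully connected layer and for a 3x3 convolution that both produce 64 outputs from the same input. The 256x256 RGB input and the layer sizes are illustrative assumptions, not the dimensions of the HLD images.

```python
def dense_params(height: int, width: int, channels: int, units: int) -> int:
    """Parameters of a fully connected layer applied to a flattened image."""
    inputs = height * width * channels
    return inputs * units + units  # one weight per (input, unit) pair, plus one bias per unit


def conv2d_params(kernel: int, in_channels: int, filters: int) -> int:
    """Parameters of a 2D convolution; this does not depend on the image size."""
    return kernel * kernel * in_channels * filters + filters  # shared kernels, plus biases


# Illustrative example: a 256x256 RGB image mapped to 64 features.
print(dense_params(256, 256, 3, 64))  # 12582976
print(conv2d_params(3, 3, 64))        # 1792
```

The convolution's count stays the same whatever the image size, which is why CNNs scale to large images.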

There are many neural network architectures built from convolutional layers, each suited to different tasks. For the image segmentation task, we use the U-Net model:

## U-Net Segmentation model

The model we are using for this demo is the U-Net segmentation model (https://lmb.informatik.uni-freiburg.de/people/ronneber/u-net/). The architecture is a stack of convolutions followed by de-convolutions, which gives it its U shape.
<img src="u-net-architecture.png">

This model assigns a class label to each pixel of the input and gives an output matching the size of the input. The resulting output, once trained with high-latitude dust (HLD) masks, will segment any given image into HLD and non-HLD pixels.
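
The output matches the input size only when every convolution uses `same` padding; the original U-Net paper uses unpadded convolutions, so its output is smaller than its input. Assuming `same` padding, four 2x2 down-sampling steps, and four matching up-sampling steps, the sketch below traces the spatial size of the feature maps through the U shape. `unet_shapes` is a hypothetical helper, not part of the workshop code.

```python
from typing import List


def unet_shapes(size: int, depth: int = 4) -> List[int]:
    """Trace the spatial size of the feature maps through a U-Net-style network.

    Assumes 'same'-padded convolutions (size unchanged), 2x2 max pooling on the
    way down (size halved), and 2x2 up-sampling on the way up (size doubled).
    """
    if size % (2 ** depth) != 0:
        raise ValueError("input size must be divisible by 2**depth")
    sizes = [size]
    for _ in range(depth):  # contracting (encoder) path
        sizes.append(sizes[-1] // 2)
    for _ in range(depth):  # expanding (decoder) path
        sizes.append(sizes[-1] * 2)
    return sizes


print(unet_shapes(256))  # [256, 128, 64, 32, 16, 32, 64, 128, 256]
```

Because the decoder undoes every halving of the encoder, the final layer can emit one class score per input pixel.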




## Keras: A Deep Learning Framework
Keras is a library for deep learning in Python. Its minimalistic, modular approach makes it easy to get deep neural networks up and running. You can read more about it here: https://keras.io/

Assuming Keras is installed in your Python environment, the main usage of the library is as follows:

### Import Keras modules

```
from keras.layers import Dense, Dropout, Activation, Flatten
```
### Preprocess / Load Images.

Using an image library such as PIL (imported above), load the images and convert them into NumPy arrays.

We use the Keras `Sequence` class (https://www.tensorflow.org/api_docs/python/tf/keras/utils/Sequence) to load only a subset of the input images into memory at a time and process them in batches. This ensures we do not run out of memory by trying to load all images at once. The imgaug library (https://github.com/aleju/imgaug) is used to augment images with translations and random noise, which increases the variability of the input images and helps the model generalize.
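
The batching and augmentation ideas can be sketched without TensorFlow or imgaug. The class below is a hypothetical stand-in that follows the `Sequence` protocol (`__len__` returns the number of batches, `__getitem__` returns one batch); a real loader would subclass `tf.keras.utils.Sequence` and read files from disk. The NumPy translation and noise are a simplified substitute for imgaug, and the shift range and noise level are assumptions.

```python
import numpy as np


def shift(arr, dy, dx):
    """Translate the first two axes of `arr` by (dy, dx) pixels, filling with zeros."""
    out = np.zeros_like(arr)
    h, w = arr.shape[:2]
    if abs(dy) >= h or abs(dx) >= w:
        return out  # shifted completely out of frame
    src_y = slice(max(0, -dy), min(h, h - dy))
    dst_y = slice(max(0, dy), min(h, h + dy))
    src_x = slice(max(0, -dx), min(w, w - dx))
    dst_x = slice(max(0, dx), min(w, w + dx))
    out[dst_y, dst_x] = arr[src_y, src_x]
    return out


class SegmentationBatches:
    """Minimal stand-in for a tf.keras.utils.Sequence that yields (images, masks) batches.

    A real implementation would subclass tf.keras.utils.Sequence and read each
    batch from disk; here the data is passed in as arrays to keep the sketch short.
    """

    def __init__(self, images, masks, batch_size, augment=False, max_shift=4,
                 noise_std=0.01, seed=0):
        self.images = images  # shape (N, H, W, C)
        self.masks = masks    # shape (N, H, W)
        self.batch_size = batch_size
        self.augment = augment
        self.max_shift = max_shift
        self.noise_std = noise_std
        self.rng = np.random.default_rng(seed)

    def __len__(self):
        # Batches per epoch; the last batch may be smaller than batch_size.
        return int(np.ceil(len(self.images) / self.batch_size))

    def __getitem__(self, idx):
        start = idx * self.batch_size
        x = self.images[start:start + self.batch_size].astype(np.float32)
        y = self.masks[start:start + self.batch_size].copy()
        if self.augment:
            x, y = self._augment(x, y)
        return x, y

    def _augment(self, x, y):
        # Apply the same random translation to each image and its mask,
        # then add Gaussian noise to the images only.
        for i in range(len(x)):
            dy, dx = self.rng.integers(-self.max_shift, self.max_shift + 1, size=2)
            x[i] = shift(x[i], dy, dx)
            y[i] = shift(y[i], dy, dx)
        noise = self.rng.normal(0.0, self.noise_std, size=x.shape).astype(np.float32)
        return x + noise, y
```

Shifting the mask together with its image keeps the labels aligned with the pixels they describe, while noise is applied to the image only so the labels stay binary.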

### Define model architecture

We define a neural network model as a set of keras layer objects, starting from Input to Output, and hidden layers in between. Here are some example models: https://keras.io/examples/

### Add Regularization methods to prevent overfitting
Overfitting occurs when the model fits the training data so closely that it fails to reproduce that performance on new, real-world (test) data. Regularization techniques are measures used to prevent this.
#### Dropouts
Dropout randomly sets a fraction of a layer's outputs to zero at each training step, so the network cannot rely too heavily on any single unit. This regularizes the model and helps prevent overfitting.
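
A minimal NumPy sketch of dropout as Keras applies it: during training, dropped units are zeroed and the survivors are scaled by `1 / (1 - rate)` ("inverted" dropout), and at inference the input passes through unchanged. The function name and the 0.5 rate are illustrative.

```python
import numpy as np


def dropout(activations, rate, rng, training=True):
    """Inverted dropout: zero a fraction `rate` of units and rescale the survivors.

    Rescaling by 1 / (1 - rate) keeps the expected activation unchanged, so no
    adjustment is needed at inference time, when dropout is simply skipped.
    """
    if not training or rate == 0.0:
        return activations
    keep = rng.random(activations.shape) >= rate  # True for units that survive
    return activations * keep / (1.0 - rate)


rng = np.random.default_rng(42)
a = np.ones((4, 4))
out = dropout(a, rate=0.5, rng=rng)
# Each entry of `out` is either 0.0 (dropped) or 2.0 (kept and rescaled).
```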
#### MaxPooling
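MaxPooling2D reduces the spatial size of the feature maps by sliding a 2x2 window (with a stride of 2, by default) across the previous layer and keeping only the maximum of the 4 values in each window. The pooling layer itself has no trainable parameters, but halving the height and width reduces the computation in later layers (and the number of parameters in any fully connected layers that follow).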
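
A NumPy sketch of 2x2 max pooling with stride 2 on a single-channel feature map makes the operation concrete. It assumes the height and width are even, and `max_pool_2x2` is a hypothetical helper rather than the Keras implementation.

```python
import numpy as np


def max_pool_2x2(feature_map):
    """2x2 max pooling with stride 2 on an (H, W) array with even H and W."""
    h, w = feature_map.shape
    # Group pixels into non-overlapping 2x2 windows, then keep each window's maximum.
    windows = feature_map.reshape(h // 2, 2, w // 2, 2)
    return windows.max(axis=(1, 3))


fm = np.array([[1, 3, 2, 0],
               [4, 2, 1, 5],
               [0, 1, 7, 2],
               [6, 2, 3, 4]])
print(max_pool_2x2(fm))
# [[4 5]
#  [6 7]]
```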

#### BatchNormalization

Batch normalization is a technique for training very deep neural networks that standardizes the inputs to a layer for each mini-batch. This has the effect of stabilizing the learning process and dramatically reducing the number of training epochs required to train deep networks.
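
The sketch below shows the training-time computation of batch normalization for a batch of feature maps in NumPy. It assumes channels-last input and uses `eps=1e-3`, the Keras default; at inference time Keras instead uses moving averages of the mean and variance collected during training, which this sketch omits.

```python
import numpy as np


def batch_norm_train(x, gamma, beta, eps=1e-3):
    """Training-mode batch normalization over a mini-batch of shape (N, H, W, C).

    Each channel is standardized with the mean and variance computed over the
    batch and spatial axes, then scaled by gamma and shifted by beta (both learned).
    """
    mean = x.mean(axis=(0, 1, 2), keepdims=True)
    var = x.var(axis=(0, 1, 2), keepdims=True)
    x_hat = (x - mean) / np.sqrt(var + eps)
    return gamma * x_hat + beta


rng = np.random.default_rng(0)
batch = rng.normal(loc=5.0, scale=3.0, size=(8, 16, 16, 4))
out = batch_norm_train(batch, gamma=np.ones(4), beta=np.zeros(4))
# Per-channel mean is ~0 and standard deviation is ~1 after normalization.
```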

### Compile model with loss metric

When we compile the model, we declare the loss function and the optimizer (SGD, Adam, etc.).
Keras has a variety of [loss functions](https://keras.io/api/losses/) and out-of-the-box [optimizers](https://keras.io/api/optimizers/) to choose from.

The loss function measures how far the model's estimate is from the actual output (the truth value), and the optimizer adjusts the model's variables so that the estimate moves closer to the actual output. For reference, the signature of `Model.compile` is:

```
Model.compile(
    optimizer="rmsprop",
    loss=None,
    metrics=None,
    loss_weights=None,
    weighted_metrics=None,
    run_eagerly=None,
    steps_per_execution=None,
    **kwargs
)
```
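
For a two-class (HLD versus non-HLD) mask, binary cross-entropy is one common loss choice; whether `hld_sagemaker_demo.py` uses it is an assumption here. The NumPy sketch below computes the mean per-pixel binary cross-entropy and shows that predictions closer to the true mask give a lower loss.

```python
import numpy as np


def binary_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean per-pixel binary cross-entropy between a true mask and predicted probabilities."""
    y_pred = np.clip(y_pred, eps, 1.0 - eps)  # avoid log(0)
    losses = -(y_true * np.log(y_pred) + (1.0 - y_true) * np.log(1.0 - y_pred))
    return losses.mean()


mask = np.array([[1.0, 0.0], [0.0, 1.0]])  # 1 = HLD pixel, 0 = non-HLD pixel
good = np.array([[0.9, 0.1], [0.2, 0.8]])  # close to the mask -> low loss
bad = np.array([[0.1, 0.9], [0.8, 0.2]])   # far from the mask -> high loss
print(binary_crossentropy(mask, good) < binary_crossentropy(mask, bad))  # True
```

The optimizer's job is then to nudge the model's weights in the direction that lowers this number.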

### Define Callbacks
Callbacks are objects whose methods Keras calls at set points during training, for example at the end of every epoch (one pass through all available input images). Keras provides callbacks for tasks such as stopping training early if the model stops improving (`EarlyStopping`), saving only the weights of the best model (`ModelCheckpoint`), and logging metrics to plot the training curves and better understand how the model learns (`TensorBoard`).
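
The stopping rule behind an early-stopping callback fits in a few lines. The plain-Python sketch below mirrors the idea of `tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=...)` on a list of per-epoch validation losses; it is a simplified illustration, not the Keras source, and the loss values are made up.

```python
def early_stopping_epoch(val_losses, patience, min_delta=0.0):
    """Return the 0-based epoch at which training would stop, or None if it never stops.

    Stops once the monitored loss has not improved by more than `min_delta`
    for `patience` consecutive epochs.
    """
    best = float("inf")
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best - min_delta:
            best = loss  # improvement: remember it and reset the counter
            wait = 0
        else:
            wait += 1    # no improvement this epoch
            if wait >= patience:
                return epoch
    return None


losses = [0.90, 0.70, 0.60, 0.61, 0.62, 0.60, 0.63]
print(early_stopping_epoch(losses, patience=3))  # 5
```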

### Training the model with the generators

Finally, training the model is as simple as calling the `model.fit()` method.

```
Model.fit(
    x=None,
    y=None,
    batch_size=None,
    epochs=1,
    verbose="auto",
    callbacks=None,
    validation_split=0.0,
    validation_data=None,
    shuffle=True,
    class_weight=None,
    sample_weight=None,
    initial_epoch=0,
    steps_per_epoch=None,
    validation_steps=None,
    validation_batch_size=None,
    validation_freq=1,
    max_queue_size=10,
    workers=1,
    use_multiprocessing=False,
)
``` 
Some notes:

`steps_per_epoch` defines the number of times the generator should be called for each epoch. This is the number of input samples (54) divided by the batch size (4): 54 / 4 = 13.5, so roughly 13 to 14 steps. Similarly, `validation_steps` is the number of images in the validation split (38) divided by the batch size: 38 / 4 = 9.5, or about 10.

`epochs` just needs to be sufficiently large, since we use the `EarlyStopping` callback to stop training early if the loss does not improve for several epochs.
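
A simple way to compute these step counts is to round the division up, so that a final partial batch is still seen each epoch; this is what a `Sequence` whose `__len__` uses a ceiling would report. `steps_for` is a hypothetical helper.

```python
import math


def steps_for(num_samples, batch_size):
    """Batches needed to cover every sample once, counting a final partial batch."""
    return math.ceil(num_samples / batch_size)


print(steps_for(54, 4))   # 14 (54 / 4 = 13.5)
print(steps_for(38, 4))   # 10 (38 / 4 = 9.5)
print(steps_for(54, 20))  # 3  (with batch_size=20, as in the estimator hyperparameters below)
```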

### Test Model:

Predictions can be made on any input image by calling `model.predict()` with the list of input images to predict.
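
What `model.predict()` returns depends on the model's final layer. Assuming either a single-channel sigmoid output or a multi-channel softmax output (with channel 1 meaning HLD), the hypothetical helper below converts predicted probabilities into a binary HLD mask. The 0.5 threshold is an assumption you may want to tune.

```python
import numpy as np


def probabilities_to_mask(probs, threshold=0.5):
    """Turn per-pixel probabilities into a binary mask (1 = HLD, 0 = non-HLD).

    Handles two common output layouts:
      (N, H, W, 1) sigmoid output -> threshold the single channel
      (N, H, W, K) softmax output -> take the most likely class per pixel
    """
    if probs.shape[-1] == 1:
        return (probs[..., 0] >= threshold).astype(np.uint8)
    return probs.argmax(axis=-1).astype(np.uint8)


# Hypothetical stand-in for model.predict(images) on a batch of one 2x2 image.
sigmoid_out = np.array([[[[0.8], [0.3]], [[0.1], [0.6]]]])
mask = probabilities_to_mask(sigmoid_out)
print(mask.shape)  # (1, 2, 2)
```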


## Use the TensorFlow wrapper provided by SageMaker to train a UNet model

TensorFlow is a lower-level deep learning library, developed and maintained by Google, on which Keras runs.

The SageMaker Python SDK TensorFlow estimators and models and the SageMaker open-source TensorFlow containers make writing a TensorFlow script and running it in SageMaker easier.

We define a SageMaker estimator using the `sagemaker.tensorflow.TensorFlow` class.

- `entry_point` should point to the script containing the underlying TensorFlow model implementation.
- `source_dir` points to the folder that contains the `entry_point`.
- `role` should be given the notebook access role we created in Chapter-2 (either its name or its full ARN).
- `instance_count` and `instance_type` define the number and type of EC2 instances used for compute.
- `hyperparameters` are passed to the `entry_point` script as command-line arguments.
- `output_path` dictates where the output files from training the model will reside.
- `image_uri` should point to the appropriate TensorFlow ECR container image.
- `metric_definitions` lists regular expressions that SageMaker applies to the training logs to extract metrics such as loss and accuracy.


In [9]:
from sagemaker.tensorflow import TensorFlow
keras_metric_definition = [
    {"Name": "train:loss", "Regex": ".*loss: ([0-9\\.]+) - accuracy: [0-9\\.]+.*"},
    {"Name": "train:accuracy", "Regex": ".*loss: [0-9\\.]+ - accuracy: ([0-9\\.]+).*"},
    {
        "Name": "validation:accuracy",
        "Regex": ".*step - loss: [0-9\\.]+ - accuracy: [0-9\\.]+ - val_loss: [0-9\\.]+ - val_accuracy: ([0-9\\.]+).*",
    },
    {
        "Name": "validation:loss",
        "Regex": ".*step - loss: [0-9\\.]+ - accuracy: [0-9\\.]+ - val_loss: ([0-9\\.]+) - val_accuracy: [0-9\\.]+.*",
    },
    {
        "Name": "sec/steps",
        "Regex": ".* (\\d+)[mu]s/step - loss: [0-9\\.]+ - accuracy: [0-9\\.]+ - val_loss: [0-9\\.]+ - val_accuracy: [0-9\\.]+",
    },
]

LOG_FOLDER = "tensorboard_logs"

TENSORBOARD_LOGS_PATH = f"s3://{BUCKET_NAME}/{LOG_FOLDER}/"

estimator = TensorFlow(
    entry_point='hld_sagemaker_demo.py',
    source_dir="/home/ec2-user/SageMaker/workshop_notebooks/chapter-3/src",
    role=ROLE_NAME,
    instance_count=1,
    instance_type='ml.p3.2xlarge',
    py_version='py3',
    hyperparameters={"log_dir": TENSORBOARD_LOGS_PATH, 'epochs': 35, 'batch_size': 20, 'learning_rate': 0.01},
    output_path=DESTINATION_BUCKET,
    image_uri='763104351884.dkr.ecr.us-east-1.amazonaws.com/tensorflow-training:2.4.1-gpu-py37-cu110-ubuntu18.04',
    metric_definitions=keras_metric_definition,  # note the plural parameter name
    distribution={
        'parameter_server': {'enabled': True}
    }
)
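
SageMaker applies each `Regex` in `metric_definitions` to the training job's log output and records the captured group as the metric value. The snippet below checks two of the expressions above against a made-up Keras progress line, which is handy for debugging a pattern before launching a job. The sample line is hypothetical, and the patterns are written as raw strings, which is equivalent to the escaped strings in the cell.

```python
import re

# Two of the regexes from keras_metric_definition above, written as raw strings.
TRAIN_LOSS = r".*loss: ([0-9\.]+) - accuracy: [0-9\.]+.*"
VAL_ACCURACY = (r".*step - loss: [0-9\.]+ - accuracy: [0-9\.]+ - val_loss: [0-9\.]+"
                r" - val_accuracy: ([0-9\.]+).*")

# Hypothetical Keras progress line of the form printed at the end of an epoch.
line = ("3/3 [==============================] - 2s 650ms/step - loss: 0.4321 "
        "- accuracy: 0.8765 - val_loss: 0.5123 - val_accuracy: 0.8456")

print(re.match(TRAIN_LOSS, line).group(1))    # 0.4321
print(re.match(VAL_ACCURACY, line).group(1))  # 0.8456
```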


## Train the model

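Training the model is as simple as calling `estimator.fit()` with the inputs expected by `hld_sagemaker_demo.py`, the entry point for training the custom model (defined in the previous step). The dictionary keys (`train`, `eval`, `test`) name the input channels, and the values are the S3 locations of the data uploaded earlier; SageMaker copies each channel into the training container before running the script.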
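
The contents of `hld_sagemaker_demo.py` are not shown here, so the following is only a hypothetical sketch of how a script-mode entry point can receive these inputs: each `hyperparameters` entry arrives as a command-line flag, and each channel's local directory is exposed through an `SM_CHANNEL_<NAME>` environment variable. The defaults mirror the values set on the estimator above.

```python
import argparse
import os


def parse_args(argv=None):
    """Read hyperparameters and input channel paths as a SageMaker script-mode entry point can."""
    parser = argparse.ArgumentParser()
    # Each key in the estimator's `hyperparameters` dict arrives as a command-line flag.
    parser.add_argument("--epochs", type=int, default=35)
    parser.add_argument("--batch_size", type=int, default=20)
    parser.add_argument("--learning_rate", type=float, default=0.01)
    parser.add_argument("--log_dir", type=str, default="logs")
    # Each key passed to estimator.fit() becomes a channel whose local path is
    # exposed in an SM_CHANNEL_<NAME> environment variable inside the container.
    parser.add_argument("--train", type=str, default=os.environ.get("SM_CHANNEL_TRAIN"))
    parser.add_argument("--eval", type=str, default=os.environ.get("SM_CHANNEL_EVAL"))
    parser.add_argument("--test", type=str, default=os.environ.get("SM_CHANNEL_TEST"))
    # parse_known_args ignores extra flags SageMaker may add (for example --model_dir).
    args, _unknown = parser.parse_known_args(argv)
    return args
```

For example, `parse_args(["--epochs", "5"])` returns a namespace with `epochs=5` and the remaining defaults.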

In [10]:
estimator.fit(
    {
        'train': train_images,
        'eval': val_images, 
        'test': test_images
    }
)

2021-06-02 17:54:56 Starting - Starting the training job...
2021-06-02 17:54:58 Starting - Launching requested ML instancesProfilerReport-1622656495: InProgress
......
2021-06-02 17:56:12 Starting - Preparing the instances for training.........
2021-06-02 17:57:52 Downloading - Downloading input data...
2021-06-02 17:58:12 Training - Downloading the training image...........
2021-06-02 17:59:59.764654: W tensorflow/core/profiler/internal/smprofiler_timeline.cc:460] Initializing the SageMaker Profiler.
2021-06-02 17:59:59.769554: W tensorflow/core/profiler/internal/smprofiler_timeline.cc:105] SageMaker Profiler is not enabled. The timeline writer thread will not be started, future recorded events will be dropped.
2021-06-02 17:59:59.880204: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
2021-06-02 17:59:59.998353: W tensorflow/core/profiler/internal/smprofiler_timeline.cc:460] Initializ

## Attach a TensorBoard session

TensorBoard is an interactive model training visualization tool. It lets us monitor the training and validation losses over time and gives an idea of how well the model is training. (Since we write the logs to S3, we can read them directly from our local setup.)
Here's how you'd connect TensorBoard to your S3 bucket:
1. Install TensorBoard version 2.4.1 using `pip install tensorboard==2.4.1`.
2. Run `AWS_REGION=<the region you selected> tensorboard --logdir s3://<your account number>-model-bucket/tensorboard_logs/`.
3. Navigate to localhost:6006 to view the model metrics, model graph timeline, and more.


You can also try running it via this very notebook using the following cell. 

In [None]:
# The corresponding TensorBoard can be accessed at `https://<your-notebook-instance-name>/proxy/6006/`; the https and the trailing slash are both required.
aws_region = sagemaker_session.boto_region_name
!AWS_REGION={aws_region} tensorboard --logdir {TENSORBOARD_LOGS_PATH}

## Note the trained model name in the DESTINATION_BUCKET dashboard

This will be used in the next chapter to deploy and infer from the model.