# Semantic Segmentation of Water using U-Net
# Part 7 - Hyperparameter Tuning

In [1]:
%matplotlib inline
import tensorflow as tf
from tensorflow.keras.layers import Input, Conv2D, MaxPooling2D, UpSampling2D
from tensorflow.keras.layers import concatenate, Conv2DTranspose
from tensorflow.keras.models import Model
from tensorflow.keras.preprocessing.image import ImageDataGenerator, load_img, save_img
import numpy as np
import json, os
from random import shuffle
from PIL import Image
import matplotlib
import matplotlib.pyplot as plt
import pickle
import pandas as pd
import warnings
import re
import time

from unetlib.metrics import BinaryMeanIoU
from unetlib.model import UNet
from unetlib.preprocessing import get_lakes_with_masks, make_dataframes_for_flow, make_img_msk_flows
import unetlib.visualisation as vs
from unetlib.pipelines import train_unet

To tune the learning rate hyperparameter, and any others for that matter, a selection of values should be tried and whichever yields the lowest validation loss should be kept. It is not practical or efficient to test every possible value, so a common strategy in the literature is to test powers of 10, e.g. 0.001, 0.01, 0.1.
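As a rough sketch of that search, the loop below tries learning rates at successive powers of 10 and keeps the one with the lowest validation loss. The names `sweep_learning_rates`, `evaluate` and `toy_validation_loss` are made up for illustration; in practice `evaluate` would train the model at the given rate and return its validation loss.

```python
import math


def sweep_learning_rates(evaluate, exponents=range(-5, 0)):
    """Try learning rates 10**e and return (best_lr, results).

    `evaluate` is a hypothetical callable mapping a learning rate to a
    validation loss, e.g. a wrapper around a full training run.
    """
    results = {}
    for e in exponents:
        lr = 10.0 ** e
        results[lr] = evaluate(lr)
    # Keep the learning rate whose validation loss is lowest.
    best_lr = min(results, key=results.get)
    return best_lr, results


def toy_validation_loss(lr):
    """Stand-in for a real training run: loss is lowest near lr = 1e-3."""
    return (math.log10(lr) + 3.0) ** 2


best_lr, results = sweep_learning_rates(toy_validation_loss)
print(best_lr, results)
```

A coarse sweep like this only narrows down the order of magnitude; a finer search around the winner can follow if needed.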

Another common approach is to allow the learning rate to be decreased during the training process. This means that earlier steps can make larger movements but, as the model converges, the learning rate gets smaller so that small steps can be made to avoid overshooting the minimum. One way of implementing this is to use a learning rate scheduler to reduce the rate after a certain number of epochs, though it can be difficult to determine at which epochs the rate should be reduced. An alternative is to reduce the learning rate whenever the loss doesn't improve for a certain number of epochs.
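Both strategies can be sketched in plain Python to show the logic. `step_decay` and `PlateauReducer` are illustrative names, and the defaults are arbitrary. Keras provides the callbacks `tf.keras.callbacks.LearningRateScheduler` and `tf.keras.callbacks.ReduceLROnPlateau` for the same two ideas, so this code is only meant to show what they do conceptually.

```python
def step_decay(epoch, initial_lr=0.01, drop=0.5, every=10):
    """Fixed schedule: multiply the rate by `drop` once every `every` epochs."""
    return initial_lr * drop ** (epoch // every)


class PlateauReducer:
    """Shrink the learning rate when the monitored loss stops improving."""

    def __init__(self, lr, factor=0.1, patience=3, min_lr=1e-6):
        self.lr = lr
        self.factor = factor      # multiplier applied on each reduction
        self.patience = patience  # epochs without improvement to tolerate
        self.min_lr = min_lr      # lower bound on the learning rate
        self.best = float("inf")
        self.wait = 0

    def step(self, loss):
        """Record one epoch's loss and return the rate for the next epoch."""
        if loss < self.best:
            self.best = loss
            self.wait = 0
        else:
            self.wait += 1
            if self.wait >= self.patience:
                self.lr = max(self.lr * self.factor, self.min_lr)
                self.wait = 0
        return self.lr


# Made-up validation losses that stall at 0.7 for three epochs.
losses = [1.0, 0.8, 0.7, 0.7, 0.7, 0.7, 0.6]
reducer = PlateauReducer(lr=0.01, factor=0.1, patience=3)
history = [reducer.step(loss) for loss in losses]
print(history)
```

The plateau approach avoids having to guess the epochs in advance, at the cost of two extra knobs (`factor` and `patience`).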

The `RMSprop` optimiser I'm using also has a `momentum` parameter. This essentially allows gradient descent to build up speed and can help it pass local minima or saddle points. RMSprop also has a decay factor, `rho`, which controls the moving average of squared gradients used to scale each step; this scaling damps large updates and helps to avoid overshooting the minimum.
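To make the roles of `rho` and `momentum` concrete, here is a simplified one-dimensional version of the update. It is a sketch of the general RMSprop-with-momentum idea, not a copy of the Keras implementation, which may differ in details such as where `epsilon` is added. The function name `rmsprop_minimise` and the toy objective f(x) = x² are assumptions made for illustration.

```python
import math


def rmsprop_minimise(grad, x0, lr=0.01, rho=0.9, momentum=0.0,
                     epsilon=1e-7, steps=1000):
    """Minimise a 1-D function given its gradient, RMSprop style."""
    x, avg_sq, velocity = x0, 0.0, 0.0
    for _ in range(steps):
        g = grad(x)
        # rho sets how quickly the running mean of squared gradients forgets.
        avg_sq = rho * avg_sq + (1.0 - rho) * g * g
        # The gradient is scaled by its recent magnitude; momentum carries
        # over a fraction of the previous update.
        velocity = momentum * velocity + lr * g / (math.sqrt(avg_sq) + epsilon)
        x -= velocity
    return x


def grad_x_squared(x):
    """Gradient of the toy objective f(x) = x**2."""
    return 2.0 * x


x_plain = rmsprop_minimise(grad_x_squared, x0=5.0)
x_momentum = rmsprop_minimise(grad_x_squared, x0=5.0, momentum=0.9)
print(x_plain, x_momentum)
```

In Keras the same knobs are exposed as the `learning_rate`, `rho` and `momentum` arguments of `tf.keras.optimizers.RMSprop`.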

Speed up convergence - batch norm / learning rate / momentum

Dropout (if overfitting)

Activations - could try sigmoid or tanh (adjust batch norm appropriately)

Ensemble? e.g. train several smaller models and average the predictions

In [None]:
model = UNet_BN(n_filters=64, n_blocks=4, bn_pos=bn_pos, model_name='deepwide')

In [2]:
# Imagery directories
nwpu_data_dir = 'nwpu_lake_images/data/'
nwpu_mask_dir = 'nwpu_lake_images/masks/'

In [None]:
train_unet(model, optimiser='adam', save_as=os.path.join(output_dir, model.name))