# Comparing instance segmentation of trees and watershed-based instance segmentation of semantically segmented trees

## To-dos
- **Perform grid search to determine best hyperparameters (batch size, epochs, learning rate)**
    - Don't forget to also incorporate KFolds CV!
- **Improve quality of watershed labels**
    - Review the labels and remove non-tree labels
- **Generate boundaries on label data**
    - Only applicable if using the U-Net approach (I think?)
    - Consider using a positive boundary (dilate) and a negative (erode) and compare
- **Incorporate KFolds (10 folds) cross-validation into model training**
    - I.e., leave a test set alone and then use KFolds on the training set to derive train/val subsets
- **Consider the Mask RCNN approach: https://github.com/matterport/Mask_RCNN**
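
For the boundary to-do above, here is a minimal sketch of the dilate/erode comparison, assuming binary (tree vs. background) labels and using `scipy.ndimage`. The function name `boundary_masks` and the `width` parameter are hypothetical and not part of this project.

```python
import numpy as np
from scipy import ndimage as ndi


def boundary_masks(label, width=1):
    """Return (outer, inner) boundary masks for a binary label image."""
    mask = label > 0
    dilated = ndi.binary_dilation(mask, iterations=width)
    eroded = ndi.binary_erosion(mask, iterations=width)
    outer = dilated & ~mask  # "positive" boundary: ring just outside the trees
    inner = mask & ~eroded   # "negative" boundary: ring just inside the trees
    return outer, inner


label = np.zeros((7, 7), dtype=int)
label[2:5, 2:5] = 1  # one small square "tree"
outer, inner = boundary_masks(label)
print(outer.sum(), inner.sum())
```

Touching crowns merge in a binary mask, so for instance labels the same operations would probably need to run once per label ID.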
   
---

## Instance segmentation

---

### Imports and function defs

In [None]:
import glob

import numpy as np
import tifffile as tiff
from patchify import patchify
from sklearn.model_selection import train_test_split
from tensorflow import keras
from tensorflow.keras.preprocessing.image import load_img


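def patch_train_label(raster, labels, img_size, channels=False, merge_channel=False):
    """Cut rasters and their label images into square, non-overlapping patches.

    raster, labels: lists of TIFF paths in matching order (square tiles assumed).
    img_size: side length of each patch in pixels.
    channels: number of leading raster channels to keep (default: all).
    merge_channel: optional list of paths whose bands are appended (e.g. NIR).
    Returns (data_train scaled by 1/255, binary data_label with a channel axis).
    """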
    samp_rast = tiff.imread(raster[0])
    img_base_size = samp_rast.shape[0]
    n = len(raster)
    m = (img_base_size // img_size) ** 2

    if not channels:
        channels = samp_rast.shape[-1]

    if merge_channel:
        channels += tiff.imread(merge_channel[0]).shape[-1]

    data_train = np.zeros((n * m, img_size, img_size, channels))
    data_label = np.zeros((n * m, img_size, img_size))

    for k in range(n):
        if merge_channel:
            r = np.concatenate(
                (tiff.imread(raster[k]), tiff.imread(merge_channel[k])), axis=-1
            )
        else:
            # Only read in the specified number of channels from input raster
            r = tiff.imread(raster[k])[..., :channels]

        patches_train = patchify(
            r,
            (img_size, img_size, channels),
            step=img_size,
        )
        patches_label = patchify(
            tiff.imread(labels[k]), (img_size, img_size), step=img_size
        )
        data_train[k * m : (k + 1) * m, :, :, :] = patches_train.reshape(
            -1, img_size, img_size, channels
        )
        data_label[k * m : (k + 1) * m, :, :] = patches_label.reshape(
            -1, img_size, img_size
        )

    data_label = (data_label > 0).astype("int")
    data_label = np.expand_dims(data_label, axis=-1)
    data_train = data_train.astype("float") / 255

    print(
        f"\nData sizes:\ndata_train: {data_train.shape}\ndata_label: {data_label.shape}\n"
    )

    return data_train, data_label
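
To make the patch bookkeeping in `patch_train_label` concrete: with `step=img_size` the patches do not overlap, so each square tile yields `(img_base_size // img_size) ** 2` patches. The NumPy-only sketch below mimics that non-overlapping tiling. It is not how `patchify` is implemented, and `tile_image` is a hypothetical helper. The 512 px input is an assumption based on the `watershed/512/` directory name.

```python
import numpy as np


def tile_image(image, size):
    """Cut an (H, W, C) array into non-overlapping (size, size, C) tiles."""
    rows, cols = image.shape[0] // size, image.shape[1] // size
    # Drop any remainder that does not fill a whole tile
    image = image[: rows * size, : cols * size]
    tiles = image.reshape(rows, size, cols, size, -1).swapaxes(1, 2)
    return tiles.reshape(rows * cols, size, size, -1)


tile = np.zeros((512, 512, 4), dtype=np.uint8)  # stand-in for one RGBI tile
patches = tile_image(tile, 128)
print(patches.shape)  # (16, 128, 128, 4)
```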

### Load, patchify, and split the data

In [None]:
# Patchify hand-labeled data PLUS NIR data
data_dir = "../data/"
hand_rgb_dir = f"{data_dir}train_rgb/"
hand_nir_dir = f"{data_dir}train_nir/"
hand_label_dir = f"{data_dir}label/"

patch_rgb = glob.glob(f"{hand_rgb_dir}*.tif")
patch_nir = glob.glob(f"{hand_nir_dir}*.tif")
patch_label = glob.glob(f"{hand_label_dir}*.tif")
# Sort all three lists so RGB, NIR, and label files pair up by index
patch_rgb.sort()
patch_nir.sort()
patch_label.sort()

print("Patchifying RGB + NIR data...")
data_train, data_label = patch_train_label(
    patch_rgb, patch_label, 128, merge_channel=patch_nir
)

# Patchify watershed data (pre-patchified)
patched_watershed_rgbi_dir = f"{data_dir}watershed/512/rgbi/"
patched_watershed_label_dir = f"{data_dir}watershed/512/labels/"

watershed_rgbi = glob.glob(f"{patched_watershed_rgbi_dir}*.tif")
watershed_labels = glob.glob(f"{patched_watershed_label_dir}*.tif")
watershed_rgbi.sort()
watershed_labels.sort()

print("Patchifying watershed data...")
data_train_ws, data_label_ws = patch_train_label(watershed_rgbi, watershed_labels, 128)

data_train = np.vstack((data_train, data_train_ws))
data_label = np.vstack((data_label, data_label_ws))

print(
    f"\nSizes after adding watershed data:\n\
data_train: {data_train.shape}\n\
data_label: {data_label.shape}\n"
)

x_train, x_test, y_train, y_test = train_test_split(
    data_train, data_label, test_size=0.1, random_state=157
)

print(
    f"\nSizes after splitting data:\n\
x_train: {x_train.shape}\n\
y_train: {y_train.shape}\n\
x_test: {x_test.shape}\n\
y_test: {y_test.shape}"
)
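
The K-fold to-do could build on this split: keep the 10% test set untouched and derive train/validation subsets from the remaining 90%. Below is a minimal sketch with `sklearn.model_selection.KFold`, using placeholder arrays in place of `data_train`/`data_label`. The fold count comes from the to-do list and the seed from the split above; both are assumptions about how it would eventually be set up.

```python
import numpy as np
from sklearn.model_selection import KFold, train_test_split

# Placeholder data standing in for data_train / data_label
x = np.random.rand(100, 8, 8, 4)
y = np.random.randint(0, 2, size=(100, 8, 8, 1))

# Hold out the test set first; it is never used for model selection
x_trainval, x_test, y_trainval, y_test = train_test_split(
    x, y, test_size=0.1, random_state=157
)

kfold = KFold(n_splits=10, shuffle=True, random_state=157)
fold_sizes = []
for fold, (train_idx, val_idx) in enumerate(kfold.split(x_trainval)):
    x_tr, y_tr = x_trainval[train_idx], y_trainval[train_idx]
    x_val, y_val = x_trainval[val_idx], y_trainval[val_idx]
    # Train a fresh model on (x_tr, y_tr) and score it on (x_val, y_val) here
    fold_sizes.append((len(train_idx), len(val_idx)))

print(fold_sizes[0])  # (81, 9)
```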

## Messing around

In [None]:
im_fn = glob.glob("../data/watershed/512/labels/*.tif")
im = tiff.imread(im_fn[0])

In [None]:
import sys

# Print full arrays instead of a truncated summary
np.set_printoptions(threshold=sys.maxsize)

In [None]:
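import matplotlib.pyplot as plt

# Convert to float first: NaN cannot be stored in an integer array
im = im.astype("float")
im[im == 0] = np.nan  # mask background so it is not colored
plt.imshow(im, cmap=plt.cm.tab20c)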

## Semantic watershed segmentation

## Mask R-CNN Training

### 08 Oct, 2022
---

#### Run 1 (well, first run taking notes)

- Full dataset
- Split 0.1
- Seed = 1009
- Weights: Coco

In [None]:
class Config2(tree.TreeConfig):

    # Number of images to train with on each GPU. A 12GB GPU can typically
    # handle 2 images of 1024x1024px.
    # Adjust based on your GPU memory and image sizes. Use the highest
    # number that your GPU can handle for best performance.
    IMAGES_PER_GPU = 8

    # Number of training steps per epoch
    # This doesn't need to match the size of the training set. Tensorboard
    # updates are saved at the end of each epoch, so setting this to a
    # smaller number means getting more frequent TensorBoard updates.
    # Validation stats are also calculated at each epoch end and they
    # might take a while, so don't set this too small to avoid spending
    # a lot of time on validation stats.
    STEPS_PER_EPOCH = 100

    # Reduce validation steps because the epoch is also reduced
    VALIDATION_STEPS = 10

    # Number of classification classes (including background)
    NUM_CLASSES = 1 + 1  # Background + tree
    
    TRAIN_ROIS_PER_IMAGE = 200

    # Length of square anchor side in pixels
    RPN_ANCHOR_SCALES = (16, 32, 64, 128)
    
    # Ratios of anchors at each cell (width/height)
    # A value of 1 represents a square anchor, and 0.5 is a tall anchor
    # (width is half the height)
    RPN_ANCHOR_RATIOS = [0.5, 1, 1.5]

    # Non-max suppression threshold to filter RPN proposals.
    # You can increase this during training to generate more proposals.
    RPN_NMS_THRESHOLD = 0.9

    # How many anchors per image to use for RPN training
    RPN_TRAIN_ANCHORS_PER_IMAGE = 64

    # Input image resizing
    # Generally, use the "square" resizing mode for training and predicting
    # and it should work well in most cases. In this mode, images are scaled
    # up such that the small side is = IMAGE_MIN_DIM, but ensuring that the
    # scaling doesn't make the long side > IMAGE_MAX_DIM. Then the image is
    # padded with zeros to make it a square so multiple images can be put
    # in one batch.
    # Available resizing modes:
    # none:   No resizing or padding. Return the image unchanged.
    # square: Resize and pad with zeros to get a square image
    #         of size [max_dim, max_dim].
    # pad64:  Pads width and height with zeros to make them multiples of 64.
    #         If IMAGE_MIN_DIM or IMAGE_MIN_SCALE are not None, then it scales
    #         up before padding. IMAGE_MAX_DIM is ignored in this mode.
    #         The multiple of 64 is needed to ensure smooth scaling of feature
    #         maps up and down the 6 levels of the FPN pyramid (2**6=64).
    # crop:   Picks random crops from the image. First, scales the image based
    #         on IMAGE_MIN_DIM and IMAGE_MIN_SCALE, then picks a random crop of
    #         size IMAGE_MIN_DIM x IMAGE_MIN_DIM. Can be used in training only.
    #         IMAGE_MAX_DIM is not used in this mode.
    IMAGE_RESIZE_MODE = "crop"
    IMAGE_MIN_DIM = 128
    IMAGE_MAX_DIM = 128

    # Image mean (RGB)
    MEAN_PIXEL = np.array([107.0, 105.2, 101.5])
    
    # Max number of final detections
    DETECTION_MAX_INSTANCES = 100
    
    # Maximum number of ground truth instances to use in one image
    MAX_GT_INSTANCES = 101

    # Don't exclude based on confidence. Since we have two classes
    # then 0.5 is the minimum anyway as it picks between tree and BG
    DETECTION_MIN_CONFIDENCE = 0.5

    # If enabled, resizes instance masks to a smaller size to reduce
    # memory load. Recommended when using high-resolution images.
    USE_MINI_MASK = False
    MINI_MASK_SHAPE = (56, 56)  # (height, width) of the mini-mask
    
    # Weight decay regularization
    WEIGHT_DECAY = 0.005

    # Loss weights for more precise optimization.
    # Can be used for R-CNN training setup.
    LOSS_WEIGHTS = {
        "rpn_class_loss": 1.0,
        "rpn_bbox_loss": 1.0,
        "mrcnn_class_loss": 1.0,
        "mrcnn_bbox_loss": 1.0,
        "mrcnn_mask_loss": 1.0,
    }
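
To see what the anchor settings above imply on 128 px crops, the sketch below computes anchor heights and widths for each scale/ratio pair. It assumes the convention stated in the config comment (ratio = width/height) with the area held at `scale ** 2`, which matches my reading of the Matterport implementation. `anchor_shapes` is a hypothetical helper, not a library function.

```python
import numpy as np


def anchor_shapes(scales, ratios):
    """Return an (N, 2) array of (height, width) for every scale/ratio pair."""
    scales, ratios = np.meshgrid(np.asarray(scales, float), np.asarray(ratios, float))
    heights = scales / np.sqrt(ratios)
    widths = scales * np.sqrt(ratios)
    return np.stack([heights.ravel(), widths.ravel()], axis=1)


shapes = anchor_shapes((16, 32, 64, 128), [0.5, 1, 1.5])
for h, w in shapes:
    print(f"{h:6.1f} x {w:6.1f}")
```

Under these assumptions a ratio of 0.5 gives an anchor that is taller than it is wide, and the largest anchors (about 181 px on the long side) exceed the 128 px crop.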

In [None]:
# Training schedule

print("Train network heads")
model.train(dataset_train, dataset_val,
            learning_rate=config.LEARNING_RATE,
            epochs=20,
            augmentation=augmentation,
            layers='heads')
# Finetune layers from ResNet stage 4 and up
print("Fine tune Resnet stage 4 and up")
model.train(dataset_train, dataset_val,
            learning_rate=config.LEARNING_RATE,
            epochs=40,
            layers='4+')

print("Train all layers")
model.train(dataset_train, dataset_val,
            learning_rate=config.LEARNING_RATE/10,
            epochs=60,
            augmentation=augmentation,
            layers='all')

### 09 Oct, 2022
---

**Goals**
1. Explore yesterday's run
2. Decide if continuing with M-RCNN is worth it
3. Develop approach for applying watershed to semantic seg output
4. (optional) Explore GridSearch approach to optimizing semi-automatic segmentation

#### ~~Run 2~~ Canceled because I realized I set the steps per epoch way too high.

- Full dataset
- Split 0.1
- Seed = 1009
- Weights: Coco

##### Config

**Config Changes**
- Increase steps per epoch from 100 to 500
- Increase validation steps from 10 to 50

```python
class TreeConfig(Config):

    # Number of images to train with on each GPU. A 12GB GPU can typically
    # handle 2 images of 1024x1024px.
    # Adjust based on your GPU memory and image sizes. Use the highest
    # number that your GPU can handle for best performance.
    IMAGES_PER_GPU = 8

    # Number of training steps per epoch
    # This doesn't need to match the size of the training set. Tensorboard
    # updates are saved at the end of each epoch, so setting this to a
    # smaller number means getting more frequent TensorBoard updates.
    # Validation stats are also calculated at each epoch end and they
    # might take a while, so don't set this too small to avoid spending
    # a lot of time on validation stats.
    STEPS_PER_EPOCH = 500

    # Reduce validation steps because the epoch is also reduced
    VALIDATION_STEPS = 100

    # Number of classification classes (including background)
    NUM_CLASSES = 1 + 1  # Background + tree
    
    TRAIN_ROIS_PER_IMAGE = 200

    # Length of square anchor side in pixels
    RPN_ANCHOR_SCALES = (16, 32, 64, 128)
    
    # Ratios of anchors at each cell (width/height)
    # A value of 1 represents a square anchor, and 0.5 is a tall anchor
    # (width is half the height)
    RPN_ANCHOR_RATIOS = [0.5, 1, 1.5]

    # Non-max suppression threshold to filter RPN proposals.
    # You can increase this during training to generate more proposals.
    RPN_NMS_THRESHOLD = 0.9

    # How many anchors per image to use for RPN training
    RPN_TRAIN_ANCHORS_PER_IMAGE = 64

    # Input image resizing
    # Generally, use the "square" resizing mode for training and predicting
    # and it should work well in most cases. In this mode, images are scaled
    # up such that the small side is = IMAGE_MIN_DIM, but ensuring that the
    # scaling doesn't make the long side > IMAGE_MAX_DIM. Then the image is
    # padded with zeros to make it a square so multiple images can be put
    # in one batch.
    # Available resizing modes:
    # none:   No resizing or padding. Return the image unchanged.
    # square: Resize and pad with zeros to get a square image
    #         of size [max_dim, max_dim].
    # pad64:  Pads width and height with zeros to make them multiples of 64.
    #         If IMAGE_MIN_DIM or IMAGE_MIN_SCALE are not None, then it scales
    #         up before padding. IMAGE_MAX_DIM is ignored in this mode.
    #         The multiple of 64 is needed to ensure smooth scaling of feature
    #         maps up and down the 6 levels of the FPN pyramid (2**6=64).
    # crop:   Picks random crops from the image. First, scales the image based
    #         on IMAGE_MIN_DIM and IMAGE_MIN_SCALE, then picks a random crop of
    #         size IMAGE_MIN_DIM x IMAGE_MIN_DIM. Can be used in training only.
    #         IMAGE_MAX_DIM is not used in this mode.
    IMAGE_RESIZE_MODE = "crop"
    IMAGE_MIN_DIM = 128
    IMAGE_MAX_DIM = 128

    # Image mean (RGB)
    MEAN_PIXEL = np.array([107.0, 105.2, 101.5])
    
    # Max number of final detections
    DETECTION_MAX_INSTANCES = 100
    
    # Maximum number of ground truth instances to use in one image
    MAX_GT_INSTANCES = 101

    # Don't exclude based on confidence. Since we have two classes
    # then 0.5 is the minimum anyway as it picks between tree and BG
    DETECTION_MIN_CONFIDENCE = 0.5

    # If enabled, resizes instance masks to a smaller size to reduce
    # memory load. Recommended when using high-resolution images.
    USE_MINI_MASK = False
    MINI_MASK_SHAPE = (56, 56)  # (height, width) of the mini-mask
    
    # Weight decay regularization
    WEIGHT_DECAY = 0.005

    # Loss weights for more precise optimization.
    # Can be used for R-CNN training setup.
    LOSS_WEIGHTS = {
        "rpn_class_loss": 1.0,
        "rpn_bbox_loss": 1.0,
        "mrcnn_class_loss": 1.0,
        "mrcnn_bbox_loss": 1.0,
        "mrcnn_mask_loss": 1.0,
    }
```

##### Training Schedule

**Training schedule changes**
- Reduce Gaussian blur from 5.0 to 1.0
- Increased steps per epoch from config
```python
    # Image augmentation
    # http://imgaug.readthedocs.io/en/latest/source/augmenters.html
    if augmentation:
        augmentation = iaa.SomeOf(
            (0, 2),
            [
                iaa.Fliplr(0.5),
                iaa.Flipud(0.5),
                iaa.OneOf(
                    [
                        iaa.Affine(rotate=90),
                        iaa.Affine(rotate=180),
                        iaa.Affine(rotate=270),
                    ]
                ),
                iaa.Multiply((0.8, 1.5)),
                iaa.GaussianBlur(sigma=(0.0, 1.0)),
            ],
        )

    print("Train network heads")
    model.train(dataset_train, dataset_val,
                learning_rate=config.LEARNING_RATE,
                epochs=20,
                augmentation=augmentation,
                layers='heads')
    # Finetune layers from ResNet stage 4 and up
    print("Fine tune Resnet stage 4 and up")
    model.train(dataset_train, dataset_val,
                learning_rate=config.LEARNING_RATE,
                epochs=40,
                layers='4+')

    print("Train all layers")
    model.train(dataset_train, dataset_val,
                learning_rate=config.LEARNING_RATE/10,
                epochs=60,
                augmentation=augmentation,
                layers='all')
```

#### Run 3

- Full dataset
- Split 0.1
- Seed = 1009
- Weights: Coco
- Augmentation = True

##### Config

**Config Changes**
- Update steps/val steps per epoch to represent dataset size
- Use Resnet50

```python
class TreeConfig(Config):

    # Number of images to train with on each GPU. A 12GB GPU can typically
    # handle 2 images of 1024x1024px.
    # Adjust based on your GPU memory and image sizes. Use the highest
    # number that your GPU can handle for best performance.
    IMAGES_PER_GPU = 8

    # Number of training steps per epoch
    # This doesn't need to match the size of the training set. Tensorboard
    # updates are saved at the end of each epoch, so setting this to a
    # smaller number means getting more frequent TensorBoard updates.
    # Validation stats are also calculated at each epoch end and they
    # might take a while, so don't set this too small to avoid spending
    # a lot of time on validation stats.
    STEPS_PER_EPOCH = 324 // IMAGES_PER_GPU
    VALIDATION_STEPS = 36 // IMAGES_PER_GPU

    # Number of classification classes (including background)
    NUM_CLASSES = 1 + 1  # Background + tree
    
    TRAIN_ROIS_PER_IMAGE = 200

    # Length of square anchor side in pixels
    RPN_ANCHOR_SCALES = (16, 32, 64, 128)
    
    # Ratios of anchors at each cell (width/height)
    # A value of 1 represents a square anchor, and 0.5 is a tall anchor
    # (width is half the height)
    RPN_ANCHOR_RATIOS = [0.5, 1, 1.5]

    # Non-max suppression threshold to filter RPN proposals.
    # You can increase this during training to generate more proposals.
    RPN_NMS_THRESHOLD = 0.9

    # How many anchors per image to use for RPN training
    RPN_TRAIN_ANCHORS_PER_IMAGE = 64

    # Backbone network architecture
    # Supported values are: resnet50, resnet101
    BACKBONE = "resnet50"

    # Input image resizing
    # Generally, use the "square" resizing mode for training and predicting
    # and it should work well in most cases. In this mode, images are scaled
    # up such that the small side is = IMAGE_MIN_DIM, but ensuring that the
    # scaling doesn't make the long side > IMAGE_MAX_DIM. Then the image is
    # padded with zeros to make it a square so multiple images can be put
    # in one batch.
    # Available resizing modes:
    # none:   No resizing or padding. Return the image unchanged.
    # square: Resize and pad with zeros to get a square image
    #         of size [max_dim, max_dim].
    # pad64:  Pads width and height with zeros to make them multiples of 64.
    #         If IMAGE_MIN_DIM or IMAGE_MIN_SCALE are not None, then it scales
    #         up before padding. IMAGE_MAX_DIM is ignored in this mode.
    #         The multiple of 64 is needed to ensure smooth scaling of feature
    #         maps up and down the 6 levels of the FPN pyramid (2**6=64).
    # crop:   Picks random crops from the image. First, scales the image based
    #         on IMAGE_MIN_DIM and IMAGE_MIN_SCALE, then picks a random crop of
    #         size IMAGE_MIN_DIM x IMAGE_MIN_DIM. Can be used in training only.
    #         IMAGE_MAX_DIM is not used in this mode.
    IMAGE_RESIZE_MODE = "crop"
    IMAGE_MIN_DIM = 128
    IMAGE_MAX_DIM = 128

    # Image mean (RGB)
    MEAN_PIXEL = np.array([107.0, 105.2, 101.5])
    
    # Max number of final detections
    DETECTION_MAX_INSTANCES = 100
    
    # Maximum number of ground truth instances to use in one image
    MAX_GT_INSTANCES = 101

    # ROIs kept after non-maximum suppression (training and inference)
    POST_NMS_ROIS_TRAINING = 1000
    POST_NMS_ROIS_INFERENCE = 2000

    # Don't exclude based on confidence. Since we have two classes
    # then 0.5 is the minimum anyway as it picks between tree and BG
    DETECTION_MIN_CONFIDENCE = 0

    # If enabled, resizes instance masks to a smaller size to reduce
    # memory load. Recommended when using high-resolution images.
    USE_MINI_MASK = True
    MINI_MASK_SHAPE = (56, 56)  # (height, width) of the mini-mask
    
    # Weight decay regularization
    WEIGHT_DECAY = 0.005

    # Loss weights for more precise optimization.
    # Can be used for R-CNN training setup.
    LOSS_WEIGHTS = {
        "rpn_class_loss": 1.0,
        "rpn_bbox_loss": 1.0,
        "mrcnn_class_loss": 1.0,
        "mrcnn_bbox_loss": 1.0,
        "mrcnn_mask_loss": 1.0,
    }
```
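
The step counts in the config above work out as follows, presumably from 324 training and 36 validation images (a 0.1 split of 360). The sketch assumes a single GPU, so the effective batch size equals `IMAGES_PER_GPU`. `steps_for` is a hypothetical helper for the arithmetic only.

```python
def steps_for(n_images, images_per_gpu, gpu_count=1):
    """Whole batches needed to pass over n_images once (remainder dropped)."""
    return n_images // (images_per_gpu * gpu_count)


print(steps_for(324, 8))  # 40 training steps, 4 images left over
print(steps_for(36, 8))   # 4 validation steps, 4 images left over
```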

##### Training Schedule

**Training schedule changes**
- Reduce Gaussian blur from 5.0 to 1.0
- Increased the number of epochs for the `4+` and `all` stages from 20 each to 80 and 100, respectively (the `epochs` argument is cumulative, hence 100 and 200 in the calls below)
```python
    # Image augmentation
    # http://imgaug.readthedocs.io/en/latest/source/augmenters.html
    if augmentation:
        augmentation = iaa.SomeOf(
            (0, 2),
            [
                iaa.Fliplr(0.5),
                iaa.Flipud(0.5),
                iaa.OneOf(
                    [
                        iaa.Affine(rotate=90),
                        iaa.Affine(rotate=180),
                        iaa.Affine(rotate=270),
                    ]
                ),
                iaa.Multiply((0.8, 1.5)),
                iaa.GaussianBlur(sigma=(0.0, 1.0)),
            ],
        )

    print("Train network heads")
    model.train(dataset_train, dataset_val,
                learning_rate=config.LEARNING_RATE,
                epochs=20,
                augmentation=augmentation,
                layers='heads')
    # Finetune layers from ResNet stage 4 and up
    print("Fine tune Resnet stage 4 and up")
    model.train(dataset_train, dataset_val,
                learning_rate=config.LEARNING_RATE,
                epochs=100,
                layers='4+')

    print("Train all layers")
    model.train(dataset_train, dataset_val,
                learning_rate=config.LEARNING_RATE/10,
                epochs=200,
                augmentation=augmentation,
                layers='all')
```
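
As far as I can tell from the Matterport `model.train` docstring, `epochs` is the epoch number to train up to rather than the count for that call, so the per-stage counts have to be read off as differences. A small sketch of that arithmetic (`epochs_per_stage` is a hypothetical helper):

```python
def epochs_per_stage(cumulative_epochs):
    """Convert cumulative `epochs` targets into epochs actually run per stage."""
    per_stage = []
    previous = 0
    for target in cumulative_epochs:
        per_stage.append(target - previous)
        previous = target
    return per_stage


print(epochs_per_stage([20, 40, 60]))    # [20, 20, 20]  (heads, 4+, all in Run 2)
print(epochs_per_stage([20, 100, 200]))  # [20, 80, 100] (heads, 4+, all in Run 3)
```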

### 10 Oct, 2022
---

#### Goals
- Try to address the grappler `CropAndResize` error to see if that is causing issues
- Add Grid Search (or maybe Random Search) capability to `tree.py`
- Apply watershed to the output of `model_rgb.hdf5`
- Watch some more of the Stanford course

#### Grid Search plan
- Use low # of epochs for coarse exploration
- LR first
- LR decay
- Test anchor scales
- Weight decay
- Use Resnet101
- Train only classifiers???

**Grid Search 1**
- Update min conf to 0.6
- Use resnet101
- learning rates: 0.02, 0.0001, 0.00001
- ~~lr decay: use custom callback with `ReduceLROnPlateau` (true, false)~~
- anchor scales: (16, 32, 64, 128), (8, 16, 32, 64)
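
The grid above could be enumerated as sketched below. How `tree.py` handles `--gs` is not shown in these notes, so this is only an illustration. The dictionary keys reuse the `LEARNING_RATE` and `RPN_ANCHOR_SCALES` config attribute names, and everything else is hypothetical.

```python
from itertools import product

learning_rates = [0.02, 0.0001, 0.00001]
anchor_scales = [(16, 32, 64, 128), (8, 16, 32, 64)]

# One config override per (learning rate, anchor scales) combination
grid = [
    {"LEARNING_RATE": lr, "RPN_ANCHOR_SCALES": scales}
    for lr, scales in product(learning_rates, anchor_scales)
]

for i, overrides in enumerate(grid, start=1):
    print(f"Run {i}/{len(grid)}: {overrides}")
```

With three learning rates and two anchor-scale sets this gives six short runs, which fits the plan of using a low number of epochs for coarse exploration.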

#### Run 4 (13:34)

`python3 tree.py train --dataset=data/mrcnn --subset=hand --weights=coco --seed=1009 --gs=True`

#### Run 5 (17:17)

`python3 tree.py train --dataset=data/mrcnn --subset=hand --weights=coco --seed=1009 --gs=True`

```python
etas = [0.02, 0.001, 0.0001, 0.00001]
anchor_scales = [(16, 32, 64, 128), (8, 16, 32, 64)]
augmentation = None
LRD = False
```

### 11 Oct, 2022
---

#### Goals
- One more grid search with values that didn't get run
- Run detection with proper config (based on GS results)
- Apply watershed to semantic output
- More Stanford
- Consider Deep Forest detection instead
- Clean data

#### Run 6

`python3 tree.py train --dataset=data/mrcnn --subset=hand --weights=coco --seed=1009 --gs=True`

```python
etas = [0.0001, 0.00001]
anchor_scales = [(16, 32, 64, 128), (8, 16, 32, 64)]
augmentation = None
LRD = False
```

#### Watershed Instance Segmentation

Use labs/lab4/lusk_lab4-task4.py with RGB(I) weights file `model_rgb.hdf5`.
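
A minimal sketch of the watershed step, assuming the semantic model outputs a per-pixel tree probability map. This is the common distance-transform recipe from `scipy` and `scikit-image`, not necessarily what `lusk_lab4-task4.py` does. `watershed_instances`, the threshold, and `min_distance` are placeholders to tune.

```python
import numpy as np
from scipy import ndimage as ndi
from skimage.feature import peak_local_max
from skimage.segmentation import watershed


def watershed_instances(prob, threshold=0.5, min_distance=5):
    """Split a semantic tree mask into labeled instances (0 = background)."""
    mask = prob > threshold
    # Distance to the nearest background pixel peaks near crown centers
    distance = ndi.distance_transform_edt(mask)
    # One marker per local maximum of the distance map, inside the mask only
    peaks = peak_local_max(
        distance, min_distance=min_distance, labels=mask.astype(int)
    )
    markers = np.zeros(mask.shape, dtype=int)
    markers[tuple(peaks.T)] = np.arange(1, len(peaks) + 1)
    # Flood from the markers over the inverted distance map
    return watershed(-distance, markers, mask=mask)


# Two overlapping discs standing in for two touching crowns
yy, xx = np.mgrid[:64, :64]
disc_a = (yy - 32) ** 2 + (xx - 22) ** 2 <= 144
disc_b = (yy - 32) ** 2 + (xx - 42) ** 2 <= 144
prob = (disc_a | disc_b).astype(float)
instances = watershed_instances(prob)
print(np.unique(instances))
```

The `min_distance` value controls how close two crown centers can be before they are merged into one instance, so it would need tuning against the typical crown size in the imagery.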