# Farm ponds identification pipeline: Network Selection

**Make sure the satellite image tiles and their corresponding masks are in the matching training and validation folders.** Tiles and masks should follow the naming scheme ```tile_x_y.png```, where ```(x,y)``` is the top-left coordinate of the tile in the source image (in pixels). If you don't have a set of images in the train and val folders, or don't have the COCO JSON files, please visit the [generate-training notebook](../0_generate-training/generate-training.ipynb).
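
The naming scheme can be checked before training. The snippet below is a minimal sketch, assuming that `x` and `y` are non-negative integers and that each mask has exactly the same file name as its tile; `parse_tile_name` and `unmatched_tiles` are hypothetical helpers, not part of the pipeline.

```python
import re
from pathlib import Path

# Assumed pattern: tile_<x>_<y>.png with non-negative integer pixel offsets.
TILE_PATTERN = re.compile(r"^tile_(\d+)_(\d+)\.png$")


def parse_tile_name(filename):
    """Return the (x, y) top-left offset encoded in a tile name, or None if it does not match."""
    match = TILE_PATTERN.match(Path(filename).name)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def unmatched_tiles(tile_names, mask_names):
    """Return the sorted tile names that have no mask with the same file name."""
    masks = {Path(name).name for name in mask_names}
    tiles = (Path(name).name for name in tile_names)
    return sorted(name for name in tiles if name not in masks)
```

In the notebook you could pass `os.listdir(train_folder)` and `os.listdir(train_mask_folder)` to `unmatched_tiles`; an empty result means every tile has a mask.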

## Install the packages for the pipeline
Make sure your environment is set up so that the packages used in this notebook can be imported; see the [setup info](../../README.md). Training the instance segmentation model requires ```torch``` and ```detectron2```, so the cell below installs ```detectron2``` first.


In [None]:
!python3 -m pip install 'git+https://github.com/facebookresearch/detectron2.git'

## Import the libraries that are used in the pipeline
Some of the libraries and requirements used in this pipeline follow the Detectron2 documentation. You can find more details on the pretrained models in [Meta Research's GitHub repository](https://github.com/facebookresearch/detectron2).

In [None]:
import torch, detectron2

# Setup detectron2 logger
from detectron2.utils.logger import setup_logger
setup_logger()

import numpy as np
import os, cv2, random, sys
import matplotlib.pyplot as plt
import yaml
from detectron2 import model_zoo
from detectron2.config import get_cfg, LazyConfig
from detectron2.utils.visualizer import Visualizer
from detectron2.data import MetadataCatalog, DatasetCatalog
from detectron2.engine import DefaultTrainer
from detectron2.data import transforms as T
from detectron2.data import DatasetMapper, build_detection_train_loader
from detectron2.data.datasets import register_coco_instances
from detectron2.projects import point_rend  # only needed for PointRend models

TORCH_VERSION = ".".join(torch.__version__.split(".")[:2])
CUDA_VERSION = torch.__version__.split("+")[-1]
print("torch: ", TORCH_VERSION, "; cuda: ", CUDA_VERSION)
print("detectron2:", detectron2.__version__)

### 1. Setting up the folder paths and parameters
You normally don't need to change the folder paths. This cell only defines the paths; it does not create them, so make sure each folder exists before data is written to it. The data produced in the pipeline will be stored in the corresponding folders (e.g. training data in ```train```).

In [None]:
# Paths
ponds_root = os.path.dirname(os.path.dirname(os.getcwd())) 
if ponds_root not in sys.path:
    sys.path.append(ponds_root)
train_image_path = os.path.join(ponds_root, "data/train.png")  # Path to the input image
train_mask_path =  os.path.join(ponds_root,"data/train_mask.png")
train_folder =  os.path.join(ponds_root,"data/train/")  # Output folder for tiles
train_mask_folder =  os.path.join(ponds_root,"data/train_mask/")
train_not_used_folder =  os.path.join(ponds_root,"data/train_not_used/")
val_folder =  os.path.join(ponds_root,"data/val/")
val_mask_folder =   os.path.join(ponds_root,"data/val_mask/")
output_folder = os.path.join(ponds_root, "output/")
parameter_folder = os.path.join(output_folder, "parameters/")
model_folder = os.path.join(output_folder, "model/")
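
The cell above only builds path strings. One way to make sure the output folders exist before anything is written to them is sketched below; `ensure_folders` is a hypothetical helper, not part of the pipeline, and the demo runs in a temporary directory rather than in the project folders.

```python
import os
import tempfile


def ensure_folders(*folders):
    """Create each folder and any missing parents; existing folders are left untouched."""
    for folder in folders:
        os.makedirs(folder, exist_ok=True)
    return list(folders)


# Demo in a throwaway directory; in the notebook you would pass
# parameter_folder, model_folder and the other paths defined above.
with tempfile.TemporaryDirectory() as root:
    created = ensure_folders(
        os.path.join(root, "output", "parameters"),
        os.path.join(root, "output", "model"),
    )
    print(all(os.path.isdir(path) for path in created))
```

Because of `exist_ok=True`, calling the helper again on existing folders is harmless.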


### 2. Register custom COCO dataset

Import the COCO JSON files for the training and validation data here. Our goal is to identify farm ponds with certain artifacts, so we register a custom dataset that contains the farm pond training and validation image sets used as an example in [the previous step](../0_generate-training/generate-training.ipynb).

To make sure that the code properly registers the instance labels, place the ```train.json``` in ```train```, and ```val.json``` in ```val```.

In [None]:
register_coco_instances("pond_train", {}, os.path.join(train_folder,"train.json"), train_folder)
register_coco_instances("pond_val", {}, os.path.join(val_folder,"val.json"), val_folder)
pond_metadata = MetadataCatalog.get("pond_train")
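
Before registering, it can help to check the JSON files. The sketch below uses only the standard library and assumes the usual top-level COCO keys (`images`, `annotations`, `categories`); `summarize_coco` and `summarize_coco_file` are hypothetical helpers, and the category name `pond` in the usage note is illustrative.

```python
import json

REQUIRED_KEYS = ("images", "annotations", "categories")


def summarize_coco(coco):
    """Check the top-level COCO keys and return simple counts."""
    missing = [key for key in REQUIRED_KEYS if key not in coco]
    if missing:
        raise ValueError(f"COCO data is missing keys: {missing}")
    image_ids = {image["id"] for image in coco["images"]}
    # Annotations that point at an image id that is not listed under "images".
    orphans = [ann["id"] for ann in coco["annotations"] if ann["image_id"] not in image_ids]
    return {
        "images": len(coco["images"]),
        "annotations": len(coco["annotations"]),
        "categories": [category["name"] for category in coco["categories"]],
        "orphan_annotations": orphans,
    }


def summarize_coco_file(path):
    """Load a COCO JSON file from disk and summarize it."""
    with open(path) as f:
        return summarize_coco(json.load(f))
```

For example, `summarize_coco_file(os.path.join(train_folder, "train.json"))` should report a single category (such as `pond`) if the data matches the one-class setup used later in this notebook.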

### (Optional) Visualize masks in the train data

To examine the images in the training set, the code below shows three random images from the dataset. You can change the number of images shown by changing the variable ```number_of_images```.

In [None]:
dataset_dicts = DatasetCatalog.get("pond_train")
number_of_images = 3
for d in random.sample(dataset_dicts, number_of_images):
    img = cv2.imread(d["file_name"])
    visualizer = Visualizer(img[:, :, ::-1], metadata=pond_metadata, scale=0.5)
    out = visualizer.draw_dataset_dict(d)
    plt.imshow(out.get_image())  # already RGB, because the Visualizer was given an RGB image
    plt.show()


### 3. Set up training parameters

Here we configure a COCO-pretrained ResNet-50 Mask R-CNN model for training on the ```pond_train``` dataset. You can swap this model for other pretrained or custom models from the [model zoo](https://github.com/facebookresearch/detectron2/blob/main/MODEL_ZOO.md).

In [None]:
cfg = get_cfg()
#point_rend.add_pointrend_config(cfg) # uncomment if changing the model to PointRend
cfg.merge_from_file(model_zoo.get_config_file("COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml"))
#cfg.merge_from_file("detectron2_repo/projects/PointRend/configs/InstanceSegmentation/pointrend_rcnn_R_50_FPN_3x_coco.yaml") # uncomment if pointrend
cfg.DATASETS.TRAIN = ("pond_train",)  # trailing comma makes this a tuple rather than a plain string
cfg.DATASETS.TEST = ()
cfg.DATALOADER.NUM_WORKERS = 2
cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url("COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml")
#cfg.MODEL.WEIGHTS = "detectron2://PointRend/InstanceSegmentation/pointrend_rcnn_R_50_FPN_3x_coco/164955410/model_final_edd263.pkl" # uncomment if pointrend
cfg.SOLVER.IMS_PER_BATCH = 2  
cfg.SOLVER.BASE_LR = 0.0005  
cfg.SOLVER.MAX_ITER = 2000    
cfg.SOLVER.STEPS = []       
cfg.MODEL.ROI_HEADS.BATCH_SIZE_PER_IMAGE = 128
cfg.MODEL.ROI_HEADS.NUM_CLASSES = 1  

LazyConfig.save(cfg, os.path.join(parameter_folder, "config"))


### Set up multiple configurations to train the model

To optimize our model, we can use random search to tune the hyperparameters and pick the best-performing configuration in the validation step. To prepare for that, we create several configurations to test in the next steps.
An error message ```ERROR: Unable to serialize the config to yaml.``` will likely show up when the files are written. This is not a problem: the configuration is saved as a pkl file instead, which can be read when training the model.
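
The sampling step of the random search can be looked at in isolation. The sketch below assumes the same ranges as the cell further down; `sample_hyperparameters`, its `seed` argument (for reproducible draws) and the optional `log_scale` flag (learning rates drawn uniformly in log space) are illustrative additions, not part of the pipeline, which samples the learning rate uniformly.

```python
import math
import random


def sample_hyperparameters(num_experiments, lr_range, batch_size_range, seed=0, log_scale=False):
    """Draw one (learning rate, batch size) pair per experiment."""
    rng = random.Random(seed)  # private generator, so the global random state is untouched
    samples = []
    for num in range(1, num_experiments + 1):
        if log_scale:
            low, high = (math.log10(value) for value in lr_range)
            learning_rate = 10 ** rng.uniform(low, high)
        else:
            learning_rate = rng.uniform(*lr_range)
        batch_size = rng.randint(*batch_size_range)  # both ends inclusive
        samples.append({"num": num, "learning_rate": learning_rate, "batch_size": batch_size})
    return samples


for sample in sample_hyperparameters(5, [0.0001, 0.001], [2, 16], seed=42):
    print(sample)
```

With a fixed seed the same configurations are drawn on every run, which makes it easier to match a saved configuration file to the values that produced it.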

In [None]:
# Clone the repo to access the predefined configs in the PointRend project (only needed for PointRend)
!git clone --branch v0.6 https://github.com/facebookresearch/detectron2.git detectron2_repo

In [None]:
import random
from detectron2.projects import point_rend  # only needed for PointRend models
# Model selection
pretrained_model_path = "COCO-InstanceSegmentation/mask_rcnn_X_101_32x8d_FPN_3x.yaml"  # alternative: "COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml"
# Hyperparameter search ranges
lr_range = [0.0001, 0.001]
batch_size_range = [2, 16]
# Number of configurations to generate and test
num_experiments = 5

for num in range(1, num_experiments+1):
    learning_rate = random.uniform(*lr_range)
    batch_size = random.randint(*batch_size_range)
    cfg = get_cfg()
    #point_rend.add_pointrend_config(cfg) # uncomment if changing the model to PointRend
    cfg.merge_from_file(model_zoo.get_config_file(pretrained_model_path))
    #cfg.merge_from_file("detectron2_repo/projects/PointRend/configs/InstanceSegmentation/pointrend_rcnn_R_50_FPN_3x_coco.yaml") # uncomment if pointrend
    cfg.DATASETS.TRAIN = ("pond_train",)
    cfg.DATASETS.TEST = ()
    cfg.DATALOADER.NUM_WORKERS = 2  # data-loading processes; independent of the batch size
    cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url(pretrained_model_path)
    #cfg.MODEL.WEIGHTS = "detectron2://PointRend/InstanceSegmentation/pointrend_rcnn_R_50_FPN_3x_coco/164955410/model_final_edd263.pkl" # uncomment if pointrend
    cfg.SOLVER.IMS_PER_BATCH = batch_size  # the sampled batch size (images per iteration)
    cfg.SOLVER.BASE_LR = learning_rate
    cfg.SOLVER.MAX_ITER = 16000    
    cfg.SOLVER.STEPS = []       
    cfg.MODEL.ROI_HEADS.BATCH_SIZE_PER_IMAGE = 128
    cfg.MODEL.ROI_HEADS.NUM_CLASSES = 1      

    LazyConfig.save(cfg, os.path.join(parameter_folder, f"config_{num}"))


The configurations for the models are saved as pkl files (```config.pkl```, ```config_1.pkl``` and so on) in the [parameter_folder](../../output/parameters). We are now ready to train the neural network in the [next step](../2_train_network/train_network.ipynb).
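
A saved pkl file can be read back with the standard `pickle` module. This is a minimal sketch, not necessarily how the next notebook loads the files: `load_pickled_config` is a hypothetical helper, the demo pickles a plain dictionary as a stand-in for a real config, and unpickling an actual Detectron2 config assumes that `detectron2` is importable in the same environment.

```python
import os
import pickle
import tempfile


def load_pickled_config(path):
    """Load a config object from a pickle file.

    Only unpickle files you created yourself: pickle can execute arbitrary code.
    """
    with open(path, "rb") as f:
        return pickle.load(f)


# Demo with a stand-in dictionary; in the notebook the path would be
# something like os.path.join(parameter_folder, "config_1.pkl").
with tempfile.TemporaryDirectory() as root:
    path = os.path.join(root, "config_1.pkl")
    with open(path, "wb") as f:
        pickle.dump({"SOLVER": {"BASE_LR": 0.0005}}, f)
    print(load_pickled_config(path))
```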