# **DeepLandforms**

Author: giacomo.nodjoumi@hyranet.info - g.nodjoumi@jacobs-university.de

## DeepLandforms Training

With this notebook, users can train instance segmentation models on custom datasets of georeferenced images.
The models are based on state-of-the-art general purpose architectures, available [here](https://github.com/facebookresearch/detectron2).
Although the above repository supports several types of networks, such as object detection, image segmentation and instance segmentation, this notebook and the complementary **DeepLandforms-Segmentation** notebook are specific to instance segmentation architectures for georeferenced images.

## Usage

* Prepare the dataset in COCO label format, using the provided **LabelMe** container or another tool.
* Set or link the dataset path in the **DeepLandforms** *.env* file.
* Run `docker-compose up`.
* Edit the following parameters in the *configs* section:
------------------------------------------------------------------

| **Parameter** | **Function** | **Common Values** |
| ---- | ---- | ---- |
| **cfg.merge_from_file(model_zoo.get_config_file(""))** | Model architecture | Mask R-CNN in this work |
| **cfg.TEST.EVAL_PERIOD** | Number of iterations between evaluations | Depends on SOLVER.MAX_ITER, usually 1/10 of it, e.g. every 1000 iterations for a 10000-iteration run |
| **cfg.DATALOADER.NUM_WORKERS** | Number of workers for the dataloader | Usually corresponds to the number of CPU cores |
| **cfg.MODEL.WEIGHTS** | Initial weights, e.g. from model_zoo.get_checkpoint_url("") | Optional, but starting from a pretrained model from the model zoo is advised. It MUST have the same architecture as the one passed to get_config_file. See the default values for an example. |
| **cfg.SOLVER.IMS_PER_BATCH** | Number of images per batch; depends on the performance of the GPU, especially VRAM | Up to 8 for 8 GB VRAM |
| **cfg.SOLVER.BASE_LR** | Learning rate | 0.0002 is a good starting point |
| **cfg.SOLVER.MAX_ITER** | Number of training iterations | Increase for low mAP, decrease to prevent overfitting |
| **cfg.MODEL.ROI_HEADS.BATCH_SIZE_PER_IMAGE** | Number of proposals sampled from the RPN output to compute the classification and regression losses during training | Multiple of 2, commonly 64 |

------------------------------------------------------------------
Then execute the notebook and monitor the training in the **TensorBoard** container.
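
As an illustration of the COCO label format mentioned in the first usage step, the following is a minimal sketch of a one-image dataset and a small consistency check. The file name, image size, class name `crater` and the helper `check_coco` are hypothetical and are not part of DeepLandforms; only the top-level keys (`images`, `categories`, `annotations`) and the annotation fields follow the COCO convention.

```python
import json

# Hypothetical one-image dataset: file name, size and class name are illustrative only.
coco = {
    "images": [
        {"id": 1, "file_name": "tile_0001.tiff", "width": 512, "height": 512}
    ],
    "categories": [
        {"id": 1, "name": "crater"}
    ],
    "annotations": [
        {
            "id": 1,
            "image_id": 1,
            "category_id": 1,
            # Polygon as a flat [x1, y1, x2, y2, ...] list in pixel coordinates.
            "segmentation": [[100.0, 100.0, 200.0, 100.0, 200.0, 180.0, 100.0, 180.0]],
            # Bounding box as [x, y, width, height].
            "bbox": [100.0, 100.0, 100.0, 80.0],
            "area": 8000.0,
            "iscrowd": 0,
        }
    ],
}


def check_coco(data):
    """Return a list of problems found in a COCO-style dict (empty if none)."""
    problems = []
    image_ids = {img["id"] for img in data["images"]}
    category_ids = {cat["id"] for cat in data["categories"]}
    for ann in data["annotations"]:
        if ann["image_id"] not in image_ids:
            problems.append(f"annotation {ann['id']}: unknown image_id")
        if ann["category_id"] not in category_ids:
            problems.append(f"annotation {ann['id']}: unknown category_id")
        for poly in ann["segmentation"]:
            # A polygon needs at least 3 points, stored as x, y pairs.
            if len(poly) < 6 or len(poly) % 2 != 0:
                problems.append(f"annotation {ann['id']}: invalid polygon")
    return problems


print(check_coco(coco))
print(json.dumps(coco["categories"]))
```

A check like this can be run on an exported label file before starting the containers, so that broken references are caught before training.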

## Funding
*This study is within the Europlanet 2024 RI and EXPLORE project, and it has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 871149 and No 101004214.*

------------------------------------------------------------------

In [None]:
import cv2
import detectron2
from detectron2 import model_zoo
from detectron2.config import get_cfg
from detectron2.utils.visualizer import Visualizer
%matplotlib inline
from matplotlib import pyplot as plt
import os
import random
import torch
from utils.detectron_utils import Trainer
from utils.train_utils import categories_gen, classes_distribution, dataframes_gen, dataMover, datasetReg
from detectron2.evaluation import COCOEvaluator

In [None]:
#image_path = '../data'

In [None]:
dataset, meta, classes , train_dir, image_path = datasetReg()

In [None]:
train_df_dis, valid_df_dis, test_df_dis, train, valid, test = dataframes_gen(classes, dataset)

**CONFIGS - edit before running**

In [None]:
cfg = get_cfg()
model_config='mask_rcnn_R_50_FPN_3x.yaml'
cfg.merge_from_file(model_zoo.get_config_file("COCO-InstanceSegmentation/"+model_config))
#cfg.merge_from_file(model_zoo.get_config_file("COCO-InstanceSegmentation/mask_rcnn_R_101_FPN_3x.yaml"))
cfg.DATASETS.TRAIN = ('train_data',)
cfg.DATASETS.TEST = ('valid_data',)
cfg.TEST.EVAL_PERIOD = 250
cfg.DATALOADER.NUM_WORKERS = 6
cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url("COCO-InstanceSegmentation/"+model_config)  # Let training initialize from model zoo
#cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url("COCO-InstanceSegmentation/mask_rcnn_R_101_FPN_3x.yaml")  # Let training initialize from model zoo
cfg.SOLVER.IMS_PER_BATCH = 24
cfg.SOLVER.BASE_LR = 0.001
#cfg.SOLVER.STEPS=(1000,)
cfg.SOLVER.MAX_ITER = 5000 
cfg.MODEL.ROI_HEADS.BATCH_SIZE_PER_IMAGE =  128
cfg.MODEL.ROI_HEADS.NUM_CLASSES = len(classes)  
cfg.OUTPUT_DIR = train_dir
cfg.INPUT.MIN_SIZE_TEST = 500
cfg.INPUT.MAX_SIZE_TEST = 1000
cfg.SOLVER.AMP.ENABLED=True

**End of configs**
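
Detectron2 counts training in iterations (batches), not epochs, so it can help to translate `SOLVER.MAX_ITER` and `SOLVER.IMS_PER_BATCH` into approximate epochs and to derive an evaluation period from them. The sketch below is plain Python with hypothetical helper names, and the training-set size of 1200 images is an assumed value for illustration, not a property of any DeepLandforms dataset.

```python
import math


def iterations_per_epoch(num_train_images, ims_per_batch):
    """Iterations needed to see every training image once."""
    return math.ceil(num_train_images / ims_per_batch)


def approx_epochs(max_iter, num_train_images, ims_per_batch):
    """Approximate number of passes over the training set for a given MAX_ITER."""
    return max_iter * ims_per_batch / num_train_images


def eval_period_for(max_iter, evaluations=10):
    """Evaluation period that yields roughly `evaluations` evaluations per run."""
    return max(1, max_iter // evaluations)


# Assumed dataset size (hypothetical); the other two values match the configs cell.
num_train_images = 1200
ims_per_batch = 24
max_iter = 5000

print(iterations_per_epoch(num_train_images, ims_per_batch))
print(approx_epochs(max_iter, num_train_images, ims_per_batch))
print(eval_period_for(max_iter))
```

Under these assumptions one epoch is 50 iterations, so 5000 iterations correspond to about 100 epochs, and evaluating ten times per run would mean an evaluation period of 500 iterations.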

In [None]:
dataMover(image_path, train, valid, test)

In [None]:
label = classes
plt.figure(figsize = (10,5), facecolor='white',dpi=300)
plt.suptitle('Class-labels distributions', fontsize=15)
ax1 = plt.subplot(131)
train_df_dis.groupby(['Class']).count().plot(kind='pie', figsize=(10,10), autopct=lambda p:f'{p:.2f}%, \n{p*len(train_df_dis)/100:.0f} labels',startangle=90, subplots=True, ax =ax1, fontsize=5, legend=False)
# loc is an argument of plt.title, not of str.format
plt.title('Train Dataset\n{} Labels'.format(len(train_df_dis)), loc='center')
ax2 = plt.subplot(132)
valid_df_dis.groupby(['Class']).count().plot(kind='pie', figsize=(10,10),autopct=lambda p:f'{p:.2f}%, \n{p*len(valid_df_dis)/100:.0f} labels',startangle=90, subplots=True, ax =ax2, fontsize=5,legend=False)
plt.title('Valid Dataset\n{} Labels'.format(len(valid_df_dis)), loc='center')
ax3 = plt.subplot(133)
test_df_dis.groupby(['Class']).count().plot(kind='pie', figsize=(10,10),autopct=lambda p:f'{p:.2f}%, \n{p*len(test_df_dis)/100:.0f} labels',startangle=90, subplots=True, ax =ax3, fontsize=5,legend=False)
plt.title('Test Dataset\n{} Labels'.format(len(test_df_dis)), loc='center')

In [None]:
for d in random.sample(train, 1):
    img_path = d["file_name"]
    print(img_path)
    img = cv2.imread(img_path)
    # OpenCV loads images as BGR; reverse the channel axis to pass RGB to the Visualizer
    visualizer = Visualizer(img[:, :, ::-1], metadata=meta, scale=2)
    out = visualizer.draw_dataset_dict(d)
    fig = plt.figure(figsize=(10,10))
    plt.imshow(out.get_image()[:, :, :])

In [None]:
# RUN
os.makedirs(cfg.OUTPUT_DIR, exist_ok=True)
trainer = Trainer(cfg)

In [None]:
trainer.resume_or_load(resume=False)
trainer.train()