Object detection and classification

We apply two different models for the detection of cloud types from the Understanding Clouds from Satellite Images Kaggle competition:

  • We modify the architecture from the object detection example in a PyTorch tutorial by adding an evaluation function for the test data and by using a better data augmentation layer.

  • We use a model from the segmentation models library (preferred approach).

Quick start

Set environment variables in .env:

source .env

Set up the environment:

pip install -r requirements.txt
python install

Generate a dummy test.csv that is required by the dataloader.

python exec/

Train one of the three models (exec/, exec/, exec/):

python exec/ \
        --size_tr_val 20 \
        --size_val 8 \
        --batch_size 2 \
        --print_freq 2 \
        --num_epochs 3 \
        --seed 1

Predictions rely on the dummy test.csv file, which has the same structure as train.csv. It exists so that the same dataloader can be used for both training and prediction. To make a prediction use exec/, exec/ or exec/

python exec/ \
        --nrows 10 \
        --load_epoch 2

Faster RCNN Architecture

The examples use a Faster RCNN with a pretrained ResNet50 backbone. The model detects objects, masks and bounding boxes. The training data provided in the Kaggle competition contains only masks of four different object types and no bounding boxes, so we use an algorithm that detects non-connected regions in the masks and assigns a bounding box to each of them (this seemed easier than modifying the loss function of the model).
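The idea of turning a mask into one bounding box per non-connected region can be sketched as follows. This is a dependency-free illustration, not the code used in this repository: the function name mask_to_boxes, the 4-connectivity rule and the exclusive max corners are all assumptions made for the example.

```python
from collections import deque


def mask_to_boxes(mask):
    """Return one [x_min, y_min, x_max, y_max] box per connected region.

    `mask` is a list of rows holding 0/1 values. Two foreground pixels belong
    to the same region when they touch horizontally or vertically
    (4-connectivity, an assumption of this sketch).
    """
    height = len(mask)
    width = len(mask[0]) if height else 0
    seen = [[False] * width for _ in range(height)]
    boxes = []

    for y0 in range(height):
        for x0 in range(width):
            if not mask[y0][x0] or seen[y0][x0]:
                continue
            # Breadth-first flood fill starting at an unvisited foreground pixel.
            queue = deque([(x0, y0)])
            seen[y0][x0] = True
            x_min = x_max = x0
            y_min = y_max = y0
            while queue:
                x, y = queue.popleft()
                x_min, x_max = min(x_min, x), max(x_max, x)
                y_min, y_max = min(y_min, y), max(y_max, y)
                for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                    inside = 0 <= nx < width and 0 <= ny < height
                    if inside and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            # Exclusive max corners, so a one-pixel region still has a
            # positive width and height (a choice made for this sketch).
            boxes.append([x_min, y_min, x_max + 1, y_max + 1])
    return boxes


example_mask = [
    [1, 1, 0, 0, 0],
    [1, 1, 0, 0, 1],
    [0, 0, 0, 0, 1],
    [0, 0, 1, 0, 0],
]
example_boxes = mask_to_boxes(example_mask)  # three separate regions
```

On real image-sized masks, scipy.ndimage.label together with scipy.ndimage.find_objects would likely be a faster way to get the same regions.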

The original model has several weaknesses:

  • The loss function depends on the bounding box loss which is irrelevant for the current task (WIP).
  • We use only random horizontal and vertical image flips to augment the data.
  • The default evaluate() function cannot run on a GPU, and since the losses are not calculated in eval mode we do not know if the model is overfitting.

1 Add evaluate() function for the test data

The default forward() method of the classes we use returns different output depending on whether the model is in train or eval mode. In eval mode the losses are not calculated. In the current implementation we have derived new classes from:

  • torchvision.models.detection.rpn.RegionProposalNetwork
  • torchvision.models.detection.roi_heads.RoIHeads
  • torchvision.models.detection.generalized_rcnn.GeneralizedRCNN

The derived classes have a modified forward() method with an additional argument return_loss=False that makes it possible to return the losses in eval mode. See /clouds/ for the new class definitions.
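The control flow behind the return_loss argument can be illustrated with a small framework-free toy. Everything here is hypothetical (the ToyDetector class, its placeholder prediction and loss, and the choice to return a (predictions, losses) tuple); the actual classes in this repository derive from the torchvision classes listed above and may return something different.

```python
class ToyDetector:
    """Framework-free stand-in for a detection model (hypothetical)."""

    def __init__(self):
        self.training = True  # mirrors the `training` flag of torch.nn.Module

    def train(self):
        self.training = True
        return self

    def eval(self):
        self.training = False
        return self

    def _predict(self, images):
        # Placeholder "prediction": the mean pixel value of each image.
        return [sum(img) / len(img) for img in images]

    def _losses(self, predictions, targets):
        if targets is None:
            raise ValueError("targets are required to compute the losses")
        # Placeholder loss: mean squared error against the targets.
        errors = [(p - t) ** 2 for p, t in zip(predictions, targets)]
        return {"loss_toy": sum(errors) / len(errors)}

    def forward(self, images, targets=None, return_loss=False):
        predictions = self._predict(images)
        if self.training:
            # Train mode: only the loss dictionary is returned.
            return self._losses(predictions, targets)
        if return_loss:
            # Eval mode with return_loss=True: predictions and losses
            # (the tuple layout is an assumption of this sketch).
            return predictions, self._losses(predictions, targets)
        # Default eval mode: predictions only, no losses.
        return predictions


images = [[0.0, 2.0], [4.0, 4.0]]
targets = [1.0, 2.0]

model = ToyDetector()
train_output = model.train().forward(images, targets)
eval_output = model.eval().forward(images)
eval_with_loss = model.eval().forward(images, targets, return_loss=True)
```

With this pattern the validation loss can be tracked next to the training loss after every epoch, which is what is needed to spot overfitting.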

This function is used in both and

2 Better data augmentation

We use the albumentations library for data augmentation. The library takes care of all transformations of the masks and bounding boxes. The Dataset defined in /clouds/ was modified to take into account data transformations from this library.

Bounding box formats:

  • coco: [x_min, y_min, width, height]
  • pascal_voc: [x_min, y_min, x_max, y_max]
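The two formats describe the same rectangle and differ only in the last two values. A minimal sketch of the conversion, with hypothetical helper names that are not part of this repository or of any library:

```python
def coco_to_pascal_voc(box):
    """Convert [x_min, y_min, width, height] to [x_min, y_min, x_max, y_max]."""
    x_min, y_min, width, height = box
    return [x_min, y_min, x_min + width, y_min + height]


def pascal_voc_to_coco(box):
    """Convert [x_min, y_min, x_max, y_max] to [x_min, y_min, width, height]."""
    x_min, y_min, x_max, y_max = box
    return [x_min, y_min, x_max - x_min, y_max - y_min]


coco_box = [10, 20, 30, 40]
voc_box = coco_to_pascal_voc(coco_box)
```

Mixing up the two formats does not raise an error on its own, since both are lists of four numbers, so it is worth checking which format a transform or model expects.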

The data augmentation is used only in

Segmentation model Architecture (preferred)

The example makes use of the segmentation models library. In this case we work only with different U-shaped convolutional neural networks, which previously served only as the backbone of the RCNN. There is no region proposal network and there are no RoI heads for bounding boxes and masks.


