This is a baseline model developed for the SARAS-ESAD 2020 challenge. To download the dataset and participate in the challenge, please register on the SARAS-ESAD website.
The code is adapted from the RetinaNet implementation in pytorch.1.x.
- Data preparation instructions for SARAS-ESAD 2020 challenge
- Dataloader for SARAS-ESAD dataset
- PyTorch 1.x implementation
- Feature pyramid network (FPN) architecture with different ResNet backbones
- Three types of loss functions on top of FPN, i.e. OHEM loss, focal loss, and YOLO loss
Here, we implement basic data-handling tools for the SARAS-ESAD dataset along with an FPN training process. We provide pure PyTorch code to train an FPN with the focal loss or the OHEM/multi-box loss.
We hope this will help more teams get up to speed quickly and leave time for more innovative solutions. We want to eliminate the pain of building the data handling and training process from scratch. Our final aim is to bring this repository to the level of realtime-action-detection.
At the moment we support the latest PyTorch on Ubuntu with the Anaconda distribution of Python. Tested on a single machine with 2/4/8 GPUs.
You can find out about the architecture and loss functions in the parent repository, i.e. the RetinaNet implementation in pytorch.1.x.
ResNet is used as the backbone network (a) to build the pyramid features (b). The classification (c) and regression (d) subnets are each made of 4 convolutional layers, followed by a final convolutional layer that predicts the class scores and bounding box coordinates, respectively.
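As a rough illustration, a minimal PyTorch sketch of such a subnet head could look like the following. The channel count, anchor count, and module names are illustrative assumptions, not the repository's exact configuration:

```python
import torch.nn as nn

class SubNetHead(nn.Module):
    """RetinaNet-style subnet head: four 3x3 conv+ReLU layers followed by a
    prediction layer (a hypothetical sketch, not the repo's exact module)."""
    def __init__(self, in_channels=256, num_anchors=9, outputs_per_anchor=4):
        super().__init__()
        layers = []
        for _ in range(4):  # four intermediate convolutional layers
            layers += [nn.Conv2d(in_channels, in_channels, 3, padding=1),
                       nn.ReLU(inplace=True)]
        self.tower = nn.Sequential(*layers)
        # final conv predicts outputs_per_anchor values per anchor per location:
        # 4 box offsets for the regression subnet, or num_classes scores for
        # the classification subnet
        self.pred = nn.Conv2d(in_channels, num_anchors * outputs_per_anchor,
                              3, padding=1)

    def forward(self, x):
        return self.pred(self.tower(x))
```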
Similar to the original paper, we freeze the batch normalisation layers of the ResNet-based backbone networks. A few initial layers are also frozen; see the `fbn` flag in the training arguments.
- OHEM with multi-box loss function: We use the multi-box loss function with online hard example mining (OHEM), similar to SSD. A huge thanks to Max DeGroot and Ellis Brown for their PyTorch implementation of SSD and its loss function.
- Focal loss: As in the original paper, we use the sigmoid focal loss; see RetinaNet. We use a pure PyTorch implementation of it; a generic sketch is shown after this list.
- YOLO loss: The multi-part loss function from YOLO is also implemented here.
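For reference, here is a generic sketch of the sigmoid focal loss, following the RetinaNet paper; it is not necessarily the exact implementation used in this repository:

```python
import torch
import torch.nn.functional as F

def sigmoid_focal_loss(logits, targets, alpha=0.25, gamma=2.0):
    """Sigmoid focal loss from the RetinaNet paper (generic sketch).
    logits, targets: float tensors of shape [N, num_classes], targets in {0, 1}."""
    ce = F.binary_cross_entropy_with_logits(logits, targets, reduction='none')
    p = torch.sigmoid(logits)
    p_t = p * targets + (1 - p) * (1 - targets)            # prob of the true class
    alpha_t = alpha * targets + (1 - alpha) * (1 - targets)
    loss = alpha_t * (1 - p_t) ** gamma * ce
    # RetinaNet normalises by the number of positive anchors; summing here
    # leaves that choice to the caller
    return loss.sum()
```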
You will need the following to run this code successfully:
- Anaconda Python
- Latest PyTorch
- Visualisation
  - If you want to visualise training progress, set the `tensorboard` flag to true while training (see the command after this list for viewing the logs)
  - TensorboardX
  - TensorFlow (for TensorBoard)
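Once training is running with the `tensorboard` flag enabled, you can inspect the logged curves with the standard TensorBoard CLI, pointing it at the directory where the logs are written (assumed here to be your `save_root`):

```
tensorboard --logdir=<save_root>
```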
- Please visit the SARAS-ESAD website to download the dataset for surgeon action detection.
- Extract all the sets (train and val) from the zip files and put them under a single directory. Provide the path of that directory as `data_root` in the training script. The data preprocessing and feeding pipeline is in the `detectionDatasets.py` file.
- Rename the data directory to `esad`.
Your directory will look like this:

- esad
  - train
    - set1
      - file.txt
      - file.jpg
      - ..
  - val
    - obj
      - file.txt
      - file.jpg
      - ..
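For orientation, here is a minimal, hypothetical helper that walks such a directory to pair each image with its annotation file; the repository's actual pipeline lives in `detectionDatasets.py` and may differ:

```python
import glob
import os

def collect_samples(data_root, subset='train'):
    """Pair every .jpg under esad/<subset> with its .txt annotation
    (hypothetical helper; see detectionDatasets.py for the real pipeline)."""
    samples = []
    pattern = os.path.join(data_root, 'esad', subset, '*', '*.jpg')
    for img_path in sorted(glob.glob(pattern)):
        txt_path = os.path.splitext(img_path)[0] + '.txt'
        if os.path.isfile(txt_path):  # keep only images with annotations
            samples.append((img_path, txt_path))
    return samples
```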
- Now that your dataset is ready, it is time to download the ImageNet pretrained weights for the ResNet backbone models.
- Weights are initialised with ImageNet pretrained models; specify the path of the pre-saved models via `model_dir` in `train.py`. Download them from the torchvision models. After you have downloaded the weights, rename them appropriately under `model_dir`, e.g. `resnet50`, `resnet101`, etc. This is a requirement of the training process; a sketch of one way to do this is shown below.
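Here is a minimal sketch of one way to fetch and store these weights with torchvision. The exact filenames expected under `model_dir` are an assumption, so adjust them to whatever `train.py` looks for:

```python
import os
import torch
import torchvision

model_dir = '/path/to/model_dir'  # hypothetical location; match train.py's model_dir
os.makedirs(model_dir, exist_ok=True)
for name in ['resnet18', 'resnet34', 'resnet50', 'resnet101']:
    # download the ImageNet-pretrained weights via torchvision
    model = getattr(torchvision.models, name)(pretrained=True)
    # save under a per-backbone filename (naming convention assumed)
    torch.save(model.state_dict(), os.path.join(model_dir, name + '.pth'))
```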
Once you have pre-processed the dataset, you are ready to train your networks. The following arguments must be set correctly:

- `data_root`: the base path up to the `esad` directory, e.g. `/home/gurkirt/`
- `save_root`: the base path where you want to store the checkpoints, training logs, tensorboard logs, etc.
- `model_dir`: the path where the ResNet backbone model weights are stored
To train, run the following command:

```
python train.py --loss_type=mbox --data_root=/home/gurkirt/ --tensorboard=true
```
It will use all the visible GPUs. You can prepend CUDA_VISIBLE_DEVICES=<gpuids-comma-separated> to the above command to mask certain GPUs. We used a 2-GPU machine to run our experiments.
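For example, to restrict training to the first two GPUs:

```
CUDA_VISIBLE_DEVICES=0,1 python train.py --loss_type=mbox --data_root=/home/gurkirt/ --tensorboard=true
```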
Please check the arguments in `train.py` to adjust the training process to your liking. The model is evaluated and saved after every `1000` iterations. mAP@0.25 is computed after every `500` iterations as well as at the end. You can change these intervals via the `train.py` arguments.
You can evaluate a model and save the results in a text file using `evaluate.py`. It takes the same arguments as `train.py`. By default it evaluates using the model stored at `max_iters`, but you can point it to any other snapshot/checkpoint.
```
python evaluate.py --loss_type=focal
```
This will dump a log file with the results (mAP) on the validation set, as well as a submission file.
Here are the results on the `esad` dataset.
Results of the baseline models with different loss functions and input image sizes, with the backbone network fixed to ResNet50. AP_10, AP_30, AP_50, and AP_mean are computed on the validation set, while Test-AP_mean is computed on the test set in the same way as AP_mean.
Loss | min dim | AP_10 | AP_30 | AP_50 | AP_MEAN | Test-AP_MEAN |
---|---|---|---|---|---|---|
Focal | 200 | 33.8 | 17.7 | 6.6 | 19.4 | 15.7 |
Focal | 400 | 35.9 | 19.4 | 8.0 | 21.1 | 16.1 |
Focal | 600 | 29.2 | 17.6 | 8.7 | 18.5 | 14.0 |
Focal | 800 | 31.9 | 20.1 | 8.7 | 20.2 | 12.4 |
OHEM | 200 | 35.1 | 18.7 | 6.3 | 20.0 | 11.3 |
OHEM | 400 | 33.9 | 19.2 | 7.4 | 20.2 | 13.6 |
OHEM | 600 | 37.6 | 23.4 | 11.2 | 24.1 | 12.5 |
OHEM | 800 | 36.8 | 24.3 | 12.2 | 24.4 | 12.3 |
Results of the baseline models with different loss functions and backbone networks, where the input image size is fixed to 400. AP_10, AP_30, AP_50, and AP_mean are computed on the validation set, while Test-AP_mean is computed on the test set in the same way as AP_mean.
Loss | Network | AP_10 | AP_30 | AP_50 | AP_MEAN | Test-AP_MEAN |
---|---|---|---|---|---|---|
Focal | ResNet18 | 35.1 | 18.9 | 8.1 | 20.7 | 15.3 |
OHEM | ResNet18 | 36.0 | 20.7 | 7.7 | 21.5 | 13.8 |
Focal | ResNet34 | 34.6 | 18.9 | 6.4 | 19.9 | 14.3 |
OHEM | ResNet34 | 36.7 | 20.4 | 7.1 | 21.4 | 13.8 |
Focal | ResNet50 | 35.9 | 19.4 | 8.0 | 21.1 | 16.1 |
OHEM | ResNet50 | 33.9 | 19.2 | 7.4 | 20.2 | 13.6 |
Focal | ResNet101 | 32.5 | 17.2 | 6.1 | 18.6 | 14.0 |
OHEM | ResNet101 | 36.6 | 20.1 | 7.4 | 21.3 | 12.3 |
Outputs from the latest model (800 OHEM) are uploaded in the sample folder. See the flag at line 114 of `evaluate.py` to select the validation or testing set (the latter will be available on 10th June).
- Input image size (`height x width`) is `600x1067` or `800x1422`.
- Batch size is set to `16`, with a learning rate of `0.01`.
- Weights for the initial layers are frozen; see the `freezeupto` flag in `train.py`.
- The maximum number of iterations is set to 6000.
- SGD is used for optimisation; a rough sketch of the schedule is given after this list.
- The initial learning rate is set to `0.01`.
- The learning rate is dropped by a factor of 10 after 5000 iterations.
- Different training settings might result in better/same/worse performance.
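For clarity, the optimisation schedule above corresponds roughly to the following PyTorch sketch; `train.py` may differ in details such as momentum or weight decay, and the stand-in model is purely illustrative:

```python
import torch
import torch.nn as nn

model = nn.Linear(10, 2)  # stand-in for the detection network
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
# drop the learning rate by a factor of 10 after 5000 iterations
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer,
                                                 milestones=[5000], gamma=0.1)

for iteration in range(6000):  # max number of iterations = 6000
    optimizer.zero_grad()
    loss = model(torch.randn(16, 10)).sum()  # placeholder forward pass and loss
    loss.backward()
    optimizer.step()
    scheduler.step()
```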