This repo is the implementation of the following paper:
OS2D: One-Stage One-Shot Object Detection by Matching Anchor Features
Anton Osokin, Denis Sumin, Vasily Lomakin
In proceedings of the European Conference on Computer Vision (ECCV), 2020
If you use our ideas, code or data, please cite our paper (available on arXiv).
Citation in bibtex
```
@inproceedings{osokin20os2d,
    title = {{OS2D}: One-Stage One-Shot Object Detection by Matching Anchor Features},
    author = {Anton Osokin and Denis Sumin and Vasily Lomakin},
    booktitle = {proceedings of the European Conference on Computer Vision (ECCV)},
    year = {2020} }
```
This software is released under the MIT license, which means that you can use the code in any way you want.
- python >= 3.7
- pytorch >= 1.4, torchvision >=0.5
- NVIDIA GPU, tested with V100 and GTX 1080 Ti
- Installed CUDA, tested with v10.0
See INSTALL.md for the package installation.
See our demo-notebook for an illustration of our method.
See our demo-API-notebook for an illustration of deploying the method in a Docker Container.
- Grozi-3.2k dataset with our annotation (0.5GB): download from Google Drive or with the magic command and unpack to $OS2D_ROOT/data
```bash
cd $OS2D_ROOT
./os2d/utils/wget_gdrive.sh data/grozi.zip 1Fx9lvmjthe3aOqjvKc6MJpMuLF22I1Hp
unzip data/grozi.zip -d data
```
- Extra test sets of retail products (0.1GB): download from Google Drive or with the magic command and unpack to $OS2D_ROOT/data
```bash
cd $OS2D_ROOT
./os2d/utils/wget_gdrive.sh data/retail_test_sets.zip 1Vp8sm9zBOdshYvND9EPuYIu0O9Yo346J
unzip data/retail_test_sets.zip -d data
```
- INSTRE datasets (2.3GB) are re-hosted at the Center for Machine Perception in Prague (thanks to Ahmet Iscen!):
```bash
cd $OS2D_ROOT
wget ftp://ftp.irisa.fr/local/texmex/corpus/instre/gnd_instre.mat -P data/instre # 200KB
wget ftp://ftp.irisa.fr/local/texmex/corpus/instre/instre.tar.gz -P data/instre # 2.3GB
tar -xzf data/instre/instre.tar.gz -C data/instre
```
- If you want to add your own dataset, create an instance of the `DatasetOneShotDetection` class and then pass it into the functions creating dataloaders, `build_train_dataloader_from_config` or `build_eval_dataloaders_from_cfg`, from `os2d/data/dataloader.py`. See `os2d/data/dataset.py` for docs and examples.
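As a rough sketch of what such a dataset holds (the class and attribute names below are illustrative only, not the actual `DatasetOneShotDetection` API — see `os2d/data/dataset.py` for the real interface), a one-shot detection dataset pairs input images with per-class query images and box annotations:

```python
from dataclasses import dataclass, field

@dataclass
class OneShotSample:
    # Illustrative container, not the real os2d class.
    image_path: str
    boxes: list   # (x1, y1, x2, y2) boxes in image pixels
    labels: list  # class id of each box; class images serve as queries

@dataclass
class ToyOneShotDataset:
    name: str
    class_images: dict = field(default_factory=dict)  # class id -> query image path
    samples: list = field(default_factory=list)

    def get_class_ids(self):
        return sorted(self.class_images)

ds = ToyOneShotDataset(name="my-dataset")
ds.class_images[0] = "classes/0.jpg"
ds.samples.append(OneShotSample("images/0001.jpg", [(10, 20, 110, 220)], [0]))
```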
We release three pretrained models:
Name | mAP on "grozi-val-new-cl" | link |
---|---|---|
OS2D V2-train | 90.65 | Google Drive |
OS2D V1-train | 88.71 | Google Drive |
OS2D V2-init | 86.07 | Google Drive |
The results (mAP on "grozi-val-new-cl") can be computed with the commands given below.
You can download the released models with the magic commands:
```bash
cd $OS2D_ROOT
./os2d/utils/wget_gdrive.sh models/os2d_v2-train.pth 1l_aanrxHj14d_QkCpein8wFmainNAzo8
./os2d/utils/wget_gdrive.sh models/os2d_v1-train.pth 1ByDRHMt1x5Ghvy7YTYmQjmus9bQkvJ8g
./os2d/utils/wget_gdrive.sh models/os2d_v2-init.pth 1sr9UX45kiEcmBeKHdlX7rZTSA4Mgt0A7
```
- OS2D V2-train (best model)
For a fast evaluation on the validation set, one can use a single image scale with this script (will give 85.58 mAP on the validation set "grozi-val-new-cl"):
```bash
cd $OS2D_ROOT
python main.py --config-file experiments/config_training.yml model.use_inverse_geom_model True model.use_simplified_affine_model False model.backbone_arch ResNet50 train.do_training False eval.dataset_names "[\"grozi-val-new-cl\"]" eval.dataset_scales "[1280.0]" init.model models/os2d_v2-train.pth eval.scales_of_image_pyramid "[1.0]"
```
Multiscale evaluation gives better results. The scripts below use the default setting with 7 scales: 0.5, 0.625, 0.8, 1, 1.2, 1.4, 1.6. Note that this evaluation can be slower because of the multiple scales and the large number of classes in the dataset.
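To get a feel for what the default pyramid implies, one can multiply the evaluation scale by each pyramid factor (assuming, as an illustration, that the pyramid factors simply scale the 1280-pixel evaluation size):

```python
# Illustration: effective image sizes implied by the default
# 7-scale pyramid on top of the 1280-pixel evaluation scale.
base_size = 1280.0
pyramid = [0.5, 0.625, 0.8, 1.0, 1.2, 1.4, 1.6]
sizes = [round(base_size * s) for s in pyramid]
print(sizes)  # [640, 800, 1024, 1280, 1536, 1792, 2048]
```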
To evaluate on the validation set with multiple scales, run:
```bash
cd $OS2D_ROOT
python main.py --config-file experiments/config_training.yml model.use_inverse_geom_model True model.use_simplified_affine_model False model.backbone_arch ResNet50 train.do_training False eval.dataset_names "[\"grozi-val-new-cl\"]" eval.dataset_scales "[1280.0]" init.model models/os2d_v2-train.pth
```
- OS2D V1-train
To evaluate on the validation set run:
```bash
cd $OS2D_ROOT
python main.py --config-file experiments/config_training.yml model.use_inverse_geom_model False model.use_simplified_affine_model True model.backbone_arch ResNet101 train.do_training False eval.dataset_names "[\"grozi-val-new-cl\"]" eval.dataset_scales "[1280.0]" init.model models/os2d_v1-train.pth
```
- OS2D V2-init
To evaluate on the validation set run:
```bash
cd $OS2D_ROOT
python main.py --config-file experiments/config_training.yml model.use_inverse_geom_model True model.use_simplified_affine_model False model.backbone_arch ResNet50 train.do_training False eval.dataset_names "[\"grozi-val-new-cl\"]" eval.dataset_scales "[1280.0]" init.model models/os2d_v2-init.pth
```
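For reference, the three evaluation commands above differ only in a handful of config overrides. They can be collected in one place (values copied from the commands above; the helper function is just an illustration, not part of the repo):

```python
# Overrides distinguishing the three released models
# (values taken from the evaluation commands above).
MODEL_OVERRIDES = {
    "os2d_v2-train": {"model.use_inverse_geom_model": "True",
                      "model.use_simplified_affine_model": "False",
                      "model.backbone_arch": "ResNet50"},
    "os2d_v1-train": {"model.use_inverse_geom_model": "False",
                      "model.use_simplified_affine_model": "True",
                      "model.backbone_arch": "ResNet101"},
    "os2d_v2-init":  {"model.use_inverse_geom_model": "True",
                      "model.use_simplified_affine_model": "False",
                      "model.backbone_arch": "ResNet50"},
}

def eval_cli_args(name):
    """Flatten the overrides into 'key value' pairs for main.py."""
    args = []
    for key, value in MODEL_OVERRIDES[name].items():
        args += [key, value]
    return args + ["init.model", f"models/{name}.pth"]
```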
In this project, we do not train models from scratch but start from some pretrained models. For instructions on how to get them, see models/README.md.
Our V2-train model on the Grozi-3.2k dataset was trained using this command:
```bash
cd $OS2D_ROOT
python main.py --config-file experiments/config_training.yml model.use_inverse_geom_model True model.use_simplified_affine_model False train.objective.loc_weight 0.0 train.model.freeze_bn_transform True model.backbone_arch ResNet50 init.model models/imagenet-caffe-resnet50-features-ac468af-renamed.pth init.transform models/weakalign_resnet101_affine_tps.pth.tar train.mining.do_mining True output.path output/os2d_v2-train
```
Due to the hard patch mining, this process is quite slow. Without it, training is faster but produces slightly worse results:
```bash
cd $OS2D_ROOT
python main.py --config-file experiments/config_training.yml model.use_inverse_geom_model True model.use_simplified_affine_model False train.objective.loc_weight 0.0 train.model.freeze_bn_transform True model.backbone_arch ResNet50 init.model models/imagenet-caffe-resnet50-features-ac468af-renamed.pth init.transform models/weakalign_resnet101_affine_tps.pth.tar train.mining.do_mining False output.path output/os2d_v2-train-nomining
```
For the V1-train model, we used this command:
```bash
cd $OS2D_ROOT
python main.py --config-file experiments/config_training.yml model.use_inverse_geom_model False model.use_simplified_affine_model True train.objective.loc_weight 0.2 train.model.freeze_bn_transform False model.backbone_arch ResNet101 init.model models/gl18-tl-resnet101-gem-w-a4d43db-converted.pth train.mining.do_mining False output.path output/os2d_v1-train
```
Note that these runs need a lot of RAM due to caching of the whole training set. If this does not work for you, you can set the parameter `train.cache_images False`, which will load images on the fly, but this can be slow. Also note that the first several iterations of training can be slow because of the "warm-up", i.e., computing the grids of anchors in `Os2dBoxCoder`. These computations are cached, so everything eventually runs faster.
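The warm-up cost comes from computing an anchor grid once per feature-map size and then reusing it. The idea can be sketched with standard-library memoization (an illustration only, not the actual `Os2dBoxCoder` code):

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def anchor_grid(fm_h, fm_w, stride):
    # Anchor-cell centers for an fm_h x fm_w feature map;
    # computed on the first request, then served from the cache.
    return tuple(((x + 0.5) * stride, (y + 0.5) * stride)
                 for y in range(fm_h) for x in range(fm_w))

g1 = anchor_grid(2, 2, 16)  # slow path: computes the grid
g2 = anchor_grid(2, 2, 16)  # fast path: cache hit, same object
```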
For the rest of the training scripts see below.
All the experiments of this project were run with our job helper. For each experiment, one programs the experiment structure (in Python) and calls several technical functions provided by the launcher. See, e.g., this file for an example.
The launch happens as follows:
```bash
# add OS2D_ROOT to the python path - can be done, e.g., as follows
export PYTHONPATH=$OS2D_ROOT:$PYTHONPATH
# call the experiment script
python ./experiments/launcher_exp1.py LIST_OF_LAUNCHER_FLAGS
```
Extra parameters in `LIST_OF_LAUNCHER_FLAGS` are parsed by the launcher and contain some useful options about the launch:
- `--no-launch` allows preparing all the scripts of the experiment without the actual launch.
- `--slurm` prepares SLURM jobs and launches them (if there is no `--no-launch`) with sbatch.
- `--stdout-file` and `--stderr-file` set the files where stdout and stderr are saved, respectively (relative to the log_path defined in the experiment description).
- For many SLURM-related parameters, see the launcher.
Our experiments can be found here:
- Experiments with OS2D
- Experiments with the detector-retrieval baseline
- Experiments with the CoAE baseline
- Experiments on the ImageNet dataset
We have added two baselines in this repo:
- Class-agnostic detector + image retrieval system: see README for details.
- Co-Attention and Co-Excitation, CoAE (original code, paper): see README for details.
We would like to personally thank Ignacio Rocco, Relja Arandjelović, Andrei Bursuc, Irina Saparina and Ekaterina Glazkova for amazing discussions and insightful comments without which this project would not be possible.
This research was partly supported by Samsung Research, Samsung Electronics, by the Russian Science Foundation grant 19-71-00082 and through computational resources of HPC facilities at NRU HSE.
This software was largely inspired by a number of great repos: weakalign, cnnimageretrieval-pytorch, torchcv, maskrcnn-benchmark. Special thanks go to the amazing PyTorch.