Computing object skeletons in natural images is challenging, owing to large variations in object appearance and scale, and the complexity of handling background clutter. Many recent methods frame object skeleton detection as a binary pixel classification problem, which is similar in spirit to learning-based edge detection, as well as to semantic segmentation methods. In the present article, we depart from this strategy by training a CNN to predict a two-dimensional vector field, which maps each scene point to a candidate skeleton pixel, in the spirit of flux-based skeletonization algorithms. This “image context flux” representation has two major advantages over previous approaches. First, it explicitly encodes the relative position of skeletal pixels to semantically meaningful entities, such as the image points in their spatial context, and hence also the implied object boundaries. Second, since the skeleton detection context is a region-based vector field, it is better able to cope with object parts of large width. We evaluate the proposed method on three benchmark datasets for skeleton detection and two for symmetry detection, achieving consistently superior performance over state-of-the-art methods.
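To make the "image context flux" idea concrete, here is a minimal NumPy sketch of how such a vector field could be built from a binary skeleton mask. It is an illustration only, not the code in this repository: the function name `context_flux`, the `radius` parameter, the unit-length normalization, and the choice to zero out skeleton and far-away pixels are all assumptions made for this example. It uses a brute-force nearest-pixel search, so it is only suitable for small masks.

```python
import numpy as np

def context_flux(skeleton, radius=3):
    """Hypothetical helper: build a (2, H, W) field of unit vectors that
    point from each nearby pixel to its nearest skeleton pixel.

    Channel 0 holds the row component, channel 1 the column component.
    Skeleton pixels and pixels farther than `radius` get a zero vector.
    """
    skeleton = np.asarray(skeleton, dtype=bool)
    h, w = skeleton.shape
    flux = np.zeros((2, h, w), dtype=np.float32)

    sk = np.argwhere(skeleton)  # (K, 2) array of (row, col) skeleton coordinates
    if sk.size == 0:
        return flux             # no skeleton, so the field is zero everywhere

    rows, cols = np.mgrid[0:h, 0:w]
    pts = np.stack([rows.ravel(), cols.ravel()], axis=1)  # (H*W, 2)

    # Brute-force offsets from every pixel to every skeleton pixel.
    diff = sk[None, :, :] - pts[:, None, :]               # (H*W, K, 2)
    dist = np.sqrt((diff ** 2).sum(axis=2))               # (H*W, K)

    nearest = dist.argmin(axis=1)
    idx = np.arange(len(pts))
    d = dist[idx, nearest]                                # distance to nearest skeleton pixel
    vec = diff[idx, nearest].astype(np.float32)           # offset to nearest skeleton pixel

    # Assumed context region: off-skeleton pixels within `radius`.
    context = (d > 0) & (d <= radius)
    vec[context] = vec[context] / d[context, None]        # normalize to unit length
    vec[~context] = 0.0

    flux[0] = vec[:, 0].reshape(h, w)
    flux[1] = vec[:, 1].reshape(h, w)
    return flux
```

For a mask with a single horizontal skeleton line, pixels above the line get vectors pointing down and pixels below get vectors pointing up, which is the kind of region-based encoding the abstract describes.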
The code and trained models of:
DeepFlux for Skeletons in the Wild, CVPR 2019 [Paper]
Please cite the related work in your publications if it helps your research:
title={DeepFlux for Skeletons in the Wild},
author={Wang, Yukang and Xu, Yongchao and Tsogkas, Stavros and Bai, Xiang and Dickinson, Sven and Siddiqi, Kaleem},
booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},
Caffe and VGG-16 pretrained model [VGG_ILSVRC_16_layers.caffemodel]
Datasets: [SK-LARGE], [SYM-PASCAL]
OpenCV 3.4.3 (C++ or Python, optional)
cp Makefile.config.example Makefile.config
# adjust Makefile.config (for example, enable python layer)
make all -j16
# make sure to add $CAFFE_ROOT/python to your PYTHONPATH.
make pycaffe
Please refer to the Caffe installation instructions for the other dependencies.
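If setting PYTHONPATH in the shell is inconvenient, the same effect can be had from inside a Python script. The sketch below is a hypothetical helper (the name `add_pycaffe_to_path` is not part of this repository) and assumes that `CAFFE_ROOT` points at the directory where Caffe was built with `make pycaffe`.

```python
import os
import sys

def add_pycaffe_to_path(caffe_root=None):
    """Hypothetical helper: put $CAFFE_ROOT/python at the front of sys.path.

    `caffe_root` falls back to the CAFFE_ROOT environment variable.
    Returns the directory that was added.
    """
    root = caffe_root or os.environ.get("CAFFE_ROOT")
    if not root:
        raise RuntimeError("Set CAFFE_ROOT or pass caffe_root explicitly")
    pycaffe_dir = os.path.join(root, "python")
    # Move the directory to the front so this build is found first.
    if pycaffe_dir in sys.path:
        sys.path.remove(pycaffe_dir)
    sys.path.insert(0, pycaffe_dir)
    return pycaffe_dir
```

After calling the helper, `import caffe` should resolve to the pycaffe build under that directory, provided the build succeeded.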
# download the datasets and the pretrained model, then
mkdir data && mv [your_dataset_folder] data/
mkdir models && mv [your_pretrained_model] models/
# data augmentation
cd data/[your_dataset_folder]
matlab -nodisplay -r "run augmentation.m; exit"
# an example on SK-LARGE dataset
cd examples/DeepFlux/
python --gpu [your_gpu_id] --dataset sklarge --initmodel ../../models/VGG_ILSVRC_16_layers.caffemodel
To train an end-to-end version of DeepFlux, add the --e2e flag.
# an example on SK-LARGE dataset
cd evaluation/
# inference with C++
./ ../../data/SK-LARGE/images/test ../../data/SK-LARGE/groundTruth/test ../../models/sklarge_iter_40000.caffemodel
# inference with Python
./ ../../data/SK-LARGE/images/test ../../data/SK-LARGE/groundTruth/test ../../models/sklarge_iter_40000.caffemodel
# inference with Python (end-to-end version)
./ ../../data/SK-LARGE/images/test ../../data/SK-LARGE/groundTruth/test ../../models/sklarge_iter_40000.caffemodel
Results on SK-LARGE:
Backbone | F-measure | Comment & Link |
VGG-16 | 0.732 | CVPR submission [Google drive] |
VGG-16 | 0.735 | different_lr [Google drive] |
VGG-16 | 0.737 | end-to-end [Google drive] |
Results on SYM-PASCAL:
Backbone | F-measure | Comment & Link |
VGG-16 | 0.502 | CVPR submission [Google drive] |
VGG-16 | 0.558 | different_lr [Google drive] |
VGG-16 | 0.569 | end-to-end [Google drive] |
*different_lr means different learning rates for backbone and additional layers
*lambda=0.4, k1=3, k2=4 for all models with post-processing
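For readers unfamiliar with the metric in the tables above, the F-measure is the harmonic mean of precision and recall. The sketch below shows that computation, plus a helper that picks the best score along a precision-recall curve. Reporting the maximum over thresholds is an assumption about the evaluation protocol made for this example, the function names are hypothetical, and the numbers in the usage lines are made up rather than taken from this repository.

```python
import numpy as np

def f_measure(precision, recall, eps=1e-12):
    """Harmonic mean of precision and recall, elementwise; 0 when both are 0."""
    precision = np.asarray(precision, dtype=np.float64)
    recall = np.asarray(recall, dtype=np.float64)
    return 2.0 * precision * recall / np.maximum(precision + recall, eps)

def best_f_measure(precisions, recalls):
    """Return (best F-measure, index of the threshold that achieves it).

    Assumes precisions[i] and recalls[i] were measured at the same threshold.
    """
    scores = f_measure(precisions, recalls)
    best = int(np.argmax(scores))
    return float(scores[best]), best

# Illustrative values only.
score, index = best_f_measure([0.9, 0.7, 0.5], [0.3, 0.7, 0.9])
```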