We provide the config files for training on different backbones:
- ResNet-50, -101, -152
ResNet (CVPR'2016)

```bibtex
@inproceedings{he2016deep,
  title={Deep residual learning for image recognition},
  author={He, Kaiming and Zhang, Xiangyu and Ren, Shaoqing and Sun, Jian},
  booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
  pages={770--778},
  year={2016}
}
```
- EfficientNet
EfficientNet (ICML'2019)

```bibtex
@inproceedings{tan2019efficientnet,
  title={{EfficientNet}: Rethinking model scaling for convolutional neural networks},
  author={Tan, Mingxing and Le, Quoc},
  booktitle={International Conference on Machine Learning},
  pages={6105--6114},
  year={2019},
  organization={PMLR}
}
```
- HRNet-W32
HRNet (CVPR'2019)

```bibtex
@inproceedings{sun2019deep,
  title={Deep high-resolution representation learning for human pose estimation},
  author={Sun, Ke and Xiao, Bin and Liu, Dong and Wang, Jingdong},
  booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
  pages={5693--5703},
  year={2019}
}
```
- ResNeXt

ResNeXt (CVPR'2017)

```bibtex
@inproceedings{xie2017aggregated,
  title={Aggregated residual transformations for deep neural networks},
  author={Xie, Saining and Girshick, Ross and Doll{\'a}r, Piotr and Tu, Zhuowen and He, Kaiming},
  booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
  pages={1492--1500},
  year={2017}
}
```
- ViT
ViT (ICLR'2021)

```bibtex
@inproceedings{dosovitskiy2021an,
  title={An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale},
  author={Alexey Dosovitskiy and Lucas Beyer and Alexander Kolesnikov and Dirk Weissenborn and Xiaohua Zhai and Thomas Unterthiner and Mostafa Dehghani and Matthias Minderer and Georg Heigold and Sylvain Gelly and Jakob Uszkoreit and Neil Houlsby},
  booktitle={International Conference on Learning Representations},
  year={2021},
  url={https://openreview.net/forum?id=YicbFdNTTy}
}
```
- Swin
Swin (ICCV'2021)

```bibtex
@inproceedings{liu2021Swin,
  title={Swin Transformer: Hierarchical Vision Transformer using Shifted Windows},
  author={Liu, Ze and Lin, Yutong and Cao, Yue and Hu, Han and Wei, Yixuan and Zhang, Zheng and Lin, Stephen and Guo, Baining},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
  year={2021}
}
```
- Twins-PCPVT, -SVT

Twins (NeurIPS'2021)

```bibtex
@inproceedings{chu2021Twins,
  title={Twins: Revisiting the Design of Spatial Attention in Vision Transformers},
  author={Xiangxiang Chu and Zhi Tian and Yuqing Wang and Bo Zhang and Haibing Ren and Xiaolin Wei and Huaxia Xia and Chunhua Shen},
  booktitle={Advances in Neural Information Processing Systems},
  year={2021}
}
```
We evaluate HMR on 3DPW. Values are MPJPE / PA-MPJPE in millimeters (lower is better).

| Backbones | Config | 3DPW (MPJPE / PA-MPJPE) |
| --- | --- | --- |
| ResNet-50 | resnet50_hmr_pw3d.py | 112.46 / 64.55 |
| ResNet-101 | resnet101_hmr_pw3d.py | 112.67 / 63.36 |
| ResNet-152 | resnet152_hmr_pw3d.py | 107.13 / 62.13 |
| ResNeXt-101 | resnext101_hmr_pw3d.py | 114.43 / 64.95 |
| EfficientNet-B5 | efficientnet_hmr_pw3d.py | 112.34 / 67.53 |
| HRNet-W32 | hrnet_hmr_pw3d.py | 118.15 / 65.16 |
| ViT | vit_hmr_pw3d.py | 111.46 / 62.81 |
| Swin | swin_hmr_pw3d.py | 110.42 / 62.78 |
| Twins-PCPVT | twins_pcpvt_hmr_pw3d.py | 100.75 / 59.13 |
| Twins-SVT | twins_svt_hmr_pw3d.py | 105.42 / 60.11 |
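For reference, MPJPE is the mean Euclidean distance between predicted and ground-truth 3D joints, while PA-MPJPE first rigidly aligns the prediction to the ground truth with a similarity (Procrustes) transform, removing global scale, rotation, and translation. A minimal NumPy sketch of the two metrics (illustrative only, not the evaluation code used for this table; function names are our own):

```python
import numpy as np

def mpjpe(pred, gt):
    """Mean per-joint position error: average L2 distance over joints."""
    return np.linalg.norm(pred - gt, axis=-1).mean()

def pa_mpjpe(pred, gt):
    """Procrustes-aligned MPJPE: find the similarity transform (scale,
    rotation, translation) that best maps pred onto gt, then score."""
    mu_p, mu_g = pred.mean(axis=0), gt.mean(axis=0)
    P, G = pred - mu_p, gt - mu_g               # centered point sets (J x 3)
    U, S, Vt = np.linalg.svd(P.T @ G)           # SVD of the cross-covariance
    d = np.ones(3)
    d[-1] = np.sign(np.linalg.det(Vt.T @ U.T))  # guard against reflections
    R = Vt.T @ np.diag(d) @ U.T                 # optimal rotation
    scale = (S * d).sum() / (P ** 2).sum()      # optimal isotropic scale
    aligned = scale * P @ R.T + mu_g
    return mpjpe(aligned, gt)
```

Because PA-MPJPE discards the global similarity transform, it is always at most the MPJPE for the same prediction, which matches the ordering of the two columns above.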