# Backbones

## Introduction

We provide config files for training with different backbones:

1. ResNet-50, -101, -152 (ResNet, CVPR'2016)

   ```bibtex
   @inproceedings{he2016deep,
     title={Deep residual learning for image recognition},
     author={He, Kaiming and Zhang, Xiangyu and Ren, Shaoqing and Sun, Jian},
     booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
     pages={770--778},
     year={2016}
   }
   ```

2. EfficientNet (ICML'2019)

   ```bibtex
   @inproceedings{tan2019efficientnet,
     title={Efficientnet: Rethinking model scaling for convolutional neural networks},
     author={Tan, Mingxing and Le, Quoc},
     booktitle={International Conference on Machine Learning},
     pages={6105--6114},
     year={2019},
     organization={PMLR}
   }
   ```

3. HRNet-W32 (HRNet, CVPR'2019)

   ```bibtex
   @inproceedings{sun2019deep,
     title={Deep high-resolution representation learning for human pose estimation},
     author={Sun, Ke and Xiao, Bin and Liu, Dong and Wang, Jingdong},
     booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
     pages={5693--5703},
     year={2019}
   }
   ```

4. ResNeXt (CVPR'2017)

   ```bibtex
   @inproceedings{xie2017aggregated,
     title={Aggregated residual transformations for deep neural networks},
     author={Xie, Saining and Girshick, Ross and Doll{\'a}r, Piotr and Tu, Zhuowen and He, Kaiming},
     booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
     pages={1492--1500},
     year={2017}
   }
   ```

5. ViT (ICLR'2021)

   ```bibtex
   @inproceedings{dosovitskiy2021an,
     title={An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale},
     author={Alexey Dosovitskiy and Lucas Beyer and Alexander Kolesnikov and Dirk Weissenborn and Xiaohua Zhai and Thomas Unterthiner and Mostafa Dehghani and Matthias Minderer and Georg Heigold and Sylvain Gelly and Jakob Uszkoreit and Neil Houlsby},
     booktitle={International Conference on Learning Representations},
     year={2021},
     url={https://openreview.net/forum?id=YicbFdNTTy}
   }
   ```

6. Swin (ICCV'2021)

   ```bibtex
   @inproceedings{liu2021Swin,
     title={Swin Transformer: Hierarchical Vision Transformer using Shifted Windows},
     author={Liu, Ze and Lin, Yutong and Cao, Yue and Hu, Han and Wei, Yixuan and Zhang, Zheng and Lin, Stephen and Guo, Baining},
     booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
     year={2021}
   }
   ```

7. Twins-PCPVT, -SVT (Twins, NeurIPS'2021)

   ```bibtex
   @inproceedings{chu2021Twins,
     title={Twins: Revisiting the Design of Spatial Attention in Vision Transformers},
     author={Xiangxiang Chu and Zhi Tian and Yuqing Wang and Bo Zhang and Haibing Ren and Xiaolin Wei and Huaxia Xia and Chunhua Shen},
     booktitle={Advances in Neural Information Processing Systems},
     year={2021}
   }
   ```

## Results and Models

We evaluate HMR with each backbone on the 3DPW test set. Values are MPJPE / PA-MPJPE (in mm; lower is better).

| Backbone | Config | 3DPW (MPJPE / PA-MPJPE) |
| :------- | :----- | :---------------------: |
| ResNet-50 | `resnet50_hmr_pw3d.py` | 112.46 / 64.55 |
| ResNet-101 | `resnet101_hmr_pw3d.py` | 112.67 / 63.36 |
| ResNet-152 | `resnet152_hmr_pw3d.py` | 107.13 / 62.13 |
| ResNeXt-101 | `resnext101_hmr_pw3d.py` | 114.43 / 64.95 |
| EfficientNet-B5 | `efficientnet_hmr_pw3d.py` | 112.34 / 67.53 |
| HRNet-W32 | `hrnet_hmr_pw3d.py` | 118.15 / 65.16 |
| ViT | `vit_hmr_pw3d.py` | 111.46 / 62.81 |
| Swin | `swin_hmr_pw3d.py` | 110.42 / 62.78 |
| Twins-PCPVT | `twins_pcpvt_hmr_pw3d.py` | 100.75 / 59.13 |
| Twins-SVT | `twins_svt_hmr_pw3d.py` | 105.42 / 60.11 |
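For reference, the two metrics in the table can be sketched as follows. MPJPE is the mean Euclidean distance between predicted and ground-truth 3D joints; PA-MPJPE computes the same error after rigidly aligning the prediction to the ground truth with a similarity transform (Procrustes analysis). This is a minimal numpy sketch of the standard definitions, not the exact evaluation code used to produce the numbers above:

```python
import numpy as np

def mpjpe(pred, gt):
    """Mean per-joint position error: average Euclidean distance per joint.

    pred, gt: arrays of shape (num_joints, 3), in the same units (e.g. mm).
    """
    return np.linalg.norm(pred - gt, axis=-1).mean()

def pa_mpjpe(pred, gt):
    """Procrustes-aligned MPJPE: align pred to gt with the optimal
    similarity transform (rotation + uniform scale + translation), then
    compute MPJPE on the aligned joints."""
    # Center both joint sets at the origin.
    mu_p, mu_g = pred.mean(axis=0), gt.mean(axis=0)
    p, g = pred - mu_p, gt - mu_g
    # Optimal rotation from the SVD of the cross-covariance matrix.
    U, S, Vt = np.linalg.svd(p.T @ g)
    R = Vt.T @ U.T
    # Guard against a reflection (det = -1) in the recovered rotation.
    if np.linalg.det(R) < 0:
        Vt[-1] *= -1
        S[-1] *= -1
        R = Vt.T @ U.T
    # Optimal uniform scale, then apply the full similarity transform.
    scale = S.sum() / (p ** 2).sum()
    aligned = scale * p @ R.T + mu_g
    return mpjpe(aligned, gt)
```

Because PA-MPJPE discards global rotation, scale, and translation errors, it is always at most the MPJPE for the same prediction, which matches the table (the second number in each cell is the smaller one).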