We propose a human pose estimation framework that refines predictions at both the feature level and the semantic level. At the feature level, we align auxiliary features with the features of the current frame to reduce the loss caused by differing feature distributions, and then use an attention mechanism to fuse the auxiliary features with the current features. At the semantic level, we use the difference information between adjacent heatmaps as auxiliary features to refine the current heatmaps.
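The two refinement ideas above can be illustrated with a minimal NumPy sketch. This is not the implementation in this repository. The function names, the tensor shapes, the use of scaled dot-product similarity as the attention score, the residual addition, and the sign convention of the heatmap difference are all assumptions chosen for illustration, and the auxiliary features are assumed to be already aligned and to come from neighbouring frames.

```python
import numpy as np


def softmax(x, axis=0):
    # Numerically stable softmax along the given axis.
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def fuse_with_attention(current, auxiliary):
    """Fuse auxiliary features into the current frame's features (hypothetical helper).

    current:   (C, H, W) feature map of the current frame
    auxiliary: (T, C, H, W) already-aligned feature maps of T auxiliary frames
    """
    channels = current.shape[0]
    # Per-pixel scaled dot-product similarity between each auxiliary
    # frame and the current frame, shape (T, H, W). Assumed scoring rule.
    scores = (auxiliary * current[None]).sum(axis=1) / np.sqrt(channels)
    # Attention weights sum to 1 over the T auxiliary frames at each pixel.
    weights = softmax(scores, axis=0)
    # Weighted sum of auxiliary features, shape (C, H, W).
    aggregated = (weights[:, None] * auxiliary).sum(axis=0)
    # Residual addition keeps the current features intact (assumed design).
    return current + aggregated, weights


def heatmap_differences(current_hm, adjacent_hms):
    """Difference between the current heatmaps and each adjacent frame's heatmaps.

    current_hm:   (K, H, W) heatmaps for K joints
    adjacent_hms: (T, K, H, W) heatmaps of T adjacent frames
    Returns (T, K, H, W); the sign convention (current minus adjacent) is assumed.
    """
    return current_hm[None] - adjacent_hms


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    current = rng.standard_normal((8, 4, 4))
    auxiliary = rng.standard_normal((2, 8, 4, 4))
    fused, weights = fuse_with_attention(current, auxiliary)
    print(fused.shape, weights.shape)
```

In this sketch the attention weights are computed per pixel, so an auxiliary frame can contribute strongly in one region and weakly in another. The difference maps returned by `heatmap_differences` would then be fed to a refinement module as additional input; that module is omitted here.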
The code was developed with Python 3.7, PyTorch 1.11, and CUDA 11.3 on Ubuntu 22.04. Our experiments were run on a single NVIDIA RTX 3090 GPU.
Pretrained models
Related file
Please refer to Data preparation
We thank the authors of the following baselines, on which our code is built: