
Official implementation of Faster VoxelPose: Real-time 3D Human Pose Estimation by Orthographic Projection





Faster VoxelPose: Real-time 3D Human Pose Estimation by Orthographic Projection
(ECCV 2022)



This is the official implementation of our paper:

Faster VoxelPose: Real-time 3D Human Pose Estimation by Orthographic Projection,
Hang Ye, Wentao Zhu, Chunyu Wang, Rujie Wu, and Yizhou Wang
ECCV 2022

The overall framework of Faster-VoxelPose is presented below.


This project was developed with Python 3.8, PyTorch 1.12.0 and CUDA 11.3 on Ubuntu 16.04. This exact CUDA version is not required.

pip install torch==1.12.0+cu113 torchvision==0.13.0+cu113 --extra-index-url
pip install -r requirements.txt
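
After installing, it can help to confirm which versions actually ended up in the environment. The snippet below is a minimal sketch and not part of the repository: it uses only the standard library, the helper names `strip_local` and `check_versions` are hypothetical, and the expected versions are simply the ones from the install command above.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions taken from the install command above; adjust for a different build.
EXPECTED = {"torch": "1.12.0", "torchvision": "0.13.0"}


def strip_local(ver):
    """Drop a local build suffix such as '+cu113' from a version string."""
    return ver.split("+", 1)[0]


def check_versions(expected):
    """Return {package: status}; status is 'ok', 'missing' or 'found <version>'."""
    report = {}
    for name, wanted in expected.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            report[name] = "missing"
            continue
        if strip_local(installed) == wanted:
            report[name] = "ok"
        else:
            report[name] = f"found {installed}"
    return report


if __name__ == "__main__":
    for name, status in check_versions(EXPECTED).items():
        print(f"{name}: {status}")
```

A "found ..." status is not necessarily a problem, since other versions may also work; it only tells you that the environment differs from the one the project was developed with.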

Data Preparation

Download Dataset

Following VoxelPose, we use the CMU Panoptic, Shelf and Campus datasets in our experiments.

  1. We provide the scripts to download the datasets automatically:
bash scripts/
bash scripts/
bash scripts/
  2. Due to incomplete annotations in the Shelf/Campus datasets, we synthesize extra data to provide training supervision for our 3D pose estimator on these two datasets. The pose sequences come from the Panoptic dataset. Download the pose sequence file (Google drive) and put it under the data/ directory.

  3. Download the pretrained backbone model (ResNet-50 pretrained on the COCO dataset and finetuned jointly on the Panoptic and MPII datasets) for 2D heatmap estimation and place it under the backbone/ directory.

Note: For the Shelf/Campus datasets, we directly test our model using 2D pose predictions from Mask R-CNN pre-trained on the COCO dataset. The annotations are already included in the data/Campus and data/Shelf directories.

Preprocess Data

To generate 2D heatmap predictions, the RGB images must be resized in a pre-processing step. Run the following command to preprocess a dataset. Supported values for [DATASET_NAME] are Panoptic, Shelf and Campus.

python --dataset [DATASET_NAME]

After downloading and pre-processing the data, your directory tree should look like this:

|-- data
    |-- Panoptic
        |-- 160224_haggling1
        |   |-- hdImgs
        |   |-- hdvideos
        |   |-- hdPose3d_stage1_coco19
        |   |-- calibration_160224_haggling1.json
        |-- 160226_haggling1  
        |-- ...
    |-- Shelf
    |   |-- Camera0
    |   |-- ...
    |   |-- Camera4
    |   |-- actorsGT.mat
    |   |-- calibration_shelf.json
    |   |-- pred_shelf_maskrcnn_hrnet_coco.pkl
    |-- Campus
    |   |-- Camera0
    |   |-- Camera1
    |   |-- Camera2
    |   |-- actorsGT.mat
    |   |-- calibration_campus.json
    |   |-- pred_campus_maskrcnn_hrnet_coco.pkl
    |-- panoptic_training_pose.pkl
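
To catch a misplaced file before training, the layout above can be checked with a short script. This is a sketch only and not part of the repository: the function name `find_missing` is hypothetical, the expected entries are copied from the tree above, and each Panoptic sequence is only required to contain the listed subdirectories and some `calibration_*.json` file.

```python
from pathlib import Path

# Entries copied from the directory tree above (paths relative to data/).
EXPECTED_ENTRIES = [
    "Shelf/actorsGT.mat",
    "Shelf/calibration_shelf.json",
    "Shelf/pred_shelf_maskrcnn_hrnet_coco.pkl",
    "Campus/actorsGT.mat",
    "Campus/calibration_campus.json",
    "Campus/pred_campus_maskrcnn_hrnet_coco.pkl",
    "panoptic_training_pose.pkl",
]
PANOPTIC_SUBDIRS = ["hdImgs", "hdvideos", "hdPose3d_stage1_coco19"]


def find_missing(data_root):
    """Return a list of expected paths that are absent under data_root."""
    root = Path(data_root)
    missing = [e for e in EXPECTED_ENTRIES if not (root / e).exists()]

    panoptic = root / "Panoptic"
    sequences = []
    if panoptic.is_dir():
        sequences = sorted(p for p in panoptic.iterdir() if p.is_dir())
    if not sequences:
        missing.append("Panoptic/<sequence>")

    for seq in sequences:
        for sub in PANOPTIC_SUBDIRS:
            if not (seq / sub).is_dir():
                missing.append(f"Panoptic/{seq.name}/{sub}")
        # Accept any calibration file name of the form calibration_*.json.
        if not any(seq.glob("calibration_*.json")):
            missing.append(f"Panoptic/{seq.name}/calibration_*.json")
    return missing


if __name__ == "__main__":
    for path in find_missing("data"):
        print("missing:", path)
```

An empty result means every entry shown in the tree was found; it does not verify file contents.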

Train & Eval


Every experiment is defined by a config file. Specify the path of the config file (e.g. configs/panoptic/jln64.yaml) and run the following command to start training the model. Note that only single-GPU training is currently supported.

python run/ --cfg [CONFIG_FILE]

Training on your own data

To train a Faster-VoxelPose model on your own data, follow the steps below:

  1. Implement the code to process your own dataset under the lib/dataset/ directory. You can refer to lib/dataset/ and rewrite the _get_db and _get_cam functions to take RGB images and camera params as input.

  2. Modify the config file based on configs/shelf/jln64.yaml. Remember to alter the TEST_HEATMAP_SRC attribute to image if no 2D predictions are given.

  3. Start training the model and visualize the evaluation results.
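
As a rough illustration of step 1, the sketch below shows one possible shape for a custom dataset class. Only the method names `_get_db` and `_get_cam` come from this README. Everything else is an assumption made for the example and should be adapted to the existing dataset classes in the repository: the class name `CustomDataset`, the `calibration.json` file name and its schema, the one-folder-per-camera layout, and the structure of the returned entries.

```python
import json
from pathlib import Path


class CustomDataset:
    """Hypothetical stand-in for a class under lib/dataset/ (not the repo's base class)."""

    def __init__(self, root, image_ext=".jpg"):
        self.root = Path(root)
        self.image_ext = image_ext
        self.cameras = self._get_cam()
        self.db = self._get_db()

    def _get_cam(self):
        # Assumed file name and schema:
        # {"<camera folder>": {"K": [...], "R": [...], "T": [...]}, ...}
        with open(self.root / "calibration.json") as f:
            return json.load(f)

    def _get_db(self):
        # Map file name -> path for every camera folder named in the calibration.
        per_cam = {
            cam: {p.name: p for p in (self.root / cam).glob(f"*{self.image_ext}")}
            for cam in self.cameras
        }
        if not per_cam:
            return []
        # Keep only frames that were captured by all cameras.
        common = set.intersection(*(set(files) for files in per_cam.values()))
        return [
            {
                "frame": name,
                "images": {cam: str(per_cam[cam][name]) for cam in per_cam},
            }
            for name in sorted(common)
        ]

    def __len__(self):
        return len(self.db)
```

Usage would look like `dataset = CustomDataset("data/MyScene")`, where `data/MyScene` is a made-up path. The real classes will also need to return whatever fields the training and evaluation code expects, which this sketch does not attempt to reproduce.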


To evaluate the model, specify the path of the config file. By default, the model_best.pth.tar checkpoint under the corresponding working directory is selected for evaluation, and the results are printed to the screen.

python run/ --cfg [CONFIG_FILE]

Model Zoo

You can download our pre-trained checkpoints from Google Drive.

Dataset   MPJPE (mm)  AP25   AP50   AP100  AP150  Model weight  Config
Panoptic  18.41       86.66  98.08  99.26  99.53  Google drive  cfg

Dataset  PCP3D  Model weight  Config
Shelf    97.6   Google drive  cfg
Campus   96.9   Google drive  cfg
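
For readers unfamiliar with the metrics in these tables, the sketch below shows how MPJPE and the per-limb PCP criterion are commonly defined. It is an illustration under stated assumptions, not the repository's evaluation code: the function names are hypothetical, the coordinates are made up, the value `alpha=0.5` is the usual PCP setting rather than something stated here, and the real evaluation also has to match predicted poses to ground-truth poses first.

```python
import math


def mpjpe(pred, gt):
    """Mean Euclidean distance between matching joints of two poses."""
    if len(pred) != len(gt) or not gt:
        raise ValueError("poses must be non-empty and have the same number of joints")
    return sum(math.dist(p, g) for p, g in zip(pred, gt)) / len(gt)


def limb_is_correct(pred_start, pred_end, gt_start, gt_end, alpha=0.5):
    """PCP criterion for one limb: mean endpoint error <= alpha * limb length."""
    mean_err = (math.dist(pred_start, gt_start) + math.dist(pred_end, gt_end)) / 2.0
    return mean_err <= alpha * math.dist(gt_start, gt_end)


# Toy example in millimetres: a single limb with two joints.
gt = [(0.0, 0.0, 0.0), (0.0, 0.0, 300.0)]
pred = [(3.0, 4.0, 0.0), (0.0, 0.0, 310.0)]

print(mpjpe(pred, gt))  # 7.5
print(limb_is_correct(pred[0], pred[1], gt[0], gt[1]))  # True
```

Under the usual convention, AP25 to AP150 count a predicted pose as correct when its MPJPE to the matched ground-truth pose is below the given threshold in millimetres, and PCP3D is the percentage of limbs that pass the criterion above.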

Important Note: Our implementation differs slightly from the one proposed in the original paper. After extensive experiments on the speed-performance tradeoff, we removed the offset branch in HDN and retrained the models. We will revise the paper and upload the final version to arXiv.


We also provide a demo showing how to visualize results on your own sequences. Please refer to the ipynb file.


If you use our code or models in your research, please cite with:

    author={Ye, Hang and Zhu, Wentao and Wang, Chunyu and Wu, Rujie and Wang, Yizhou},
    title={Faster VoxelPose: Real-time 3D Human Pose Estimation by Orthographic Projection},
    booktitle = {European Conference on Computer Vision (ECCV)},
    year = {2022}


This repo is built on the excellent work VoxelPose. We thank the authors for releasing their code.

