This is a reimplementation of @dragonbook's PyTorch implementation of V2V-PoseNet: Voxel-to-Voxel Prediction Network for Accurate 3D Hand and Human Pose Estimation from a Single Depth Map, which is itself largely based on the original author's Torch7 implementation.
This repository provides:
- V2V-PoseNet core modules (model, voxelization, etc.)
- A model trained on the MSRA hand pose dataset, with a mean error of roughly 11 mm
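The pipeline behind these modules follows the paper: points from the depth map are re-centered on a reference point and discretized into a cubic occupancy grid (88 voxels per side in the paper) before going through the 3D CNN. Below is a minimal sketch of that voxelization step, assuming millimetre units and an illustrative 200 mm cube; the function and its parameters are illustrative, not the repo's actual API.

```python
import numpy as np

def voxelize(points, center, cubic_size=200.0, grid_size=88):
    """Map 3D points (shape [N, 3], in mm) around a reference center
    into a binary occupancy grid of shape [grid_size]^3.
    Illustrative sketch only; not the repo's actual voxelization API."""
    voxel_len = cubic_size / grid_size
    # Shift points so the cube is centered on the reference point
    coords = np.floor((points - center + cubic_size / 2) / voxel_len).astype(int)
    # Keep only points that fall inside the cube
    valid = np.all((coords >= 0) & (coords < grid_size), axis=1)
    grid = np.zeros((grid_size,) * 3, dtype=np.float32)
    grid[tuple(coords[valid].T)] = 1.0
    return grid
```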
Tested on a Windows 11 machine with an AMD Ryzen 9 5950X CPU and an Nvidia RTX 3090 GPU, running:
- Python 3.9.9
- numpy 1.22.0
- open3d 0.14.1.0
- torch 1.10.1+cu113 (PyTorch)
If you wish to convert the ITOP dataset for use with the model, you will also need:
- h5py 3.6.0
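If you already have an environment set up, you can confirm it matches by printing the installed versions (h5py only matters for the ITOP conversion):

```python
import h5py
import numpy
import open3d
import torch

print("numpy ", numpy.__version__)
print("open3d", open3d.__version__)
print("torch ", torch.__version__)
print("h5py  ", h5py.__version__)
```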
- Clone the repo
- Open a Python terminal in the root directory of the repo
- Run the following to install the dependencies:
python3 install_requirements.py
- Install PyTorch (with CUDA). On Windows 10/11 with a modern Nvidia GPU, the install command is:
pip3 install torch==1.10.1+cu113 torchvision==0.11.2+cu113 torchaudio===0.10.1+cu113 -f https://download.pytorch.org/whl/cu113/torch_stable.html
For other setups, use the configuration tool at pytorch.org to generate the appropriate install command.
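After installing, you can verify that PyTorch sees your GPU:

```python
import torch

print(torch.cuda.is_available())  # should print True
print(torch.version.cuda)         # should print 11.3 for the cu113 build
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))  # e.g. an RTX 3090
```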
The dataset and the centers can be found at:
- MSRA hand dataset: download and extract to
/datasets/cvpr15_MSRAHandGestureDB
- Estimated centers: download and extract to
/datasets/msra_center
The dataset is described in the paper Cascaded Hand Pose Regression (Xiao Sun, Yichen Wei, Shuang Liang, Xiaoou Tang, and Jian Sun, CVPR 2015).
The estimated centers are provided by the original author's implementation.
For simplicity, the centers are currently included in the repo.
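As a quick sanity check that everything extracted to the right place: the P0 to P8 subject folders are part of the MSRA dataset's layout, and the rest of this snippet is purely illustrative.

```python
import os

root = "datasets"
# The MSRA dataset ships one folder per subject, P0 through P8
for subject in (f"P{i}" for i in range(9)):
    path = os.path.join(root, "cvpr15_MSRAHandGestureDB", subject)
    print(path, "->", "ok" if os.path.isdir(path) else "MISSING")
# The estimated centers extracted alongside the dataset
print(sorted(os.listdir(os.path.join(root, "msra_center")))[:5])
```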
- Open a Python terminal in the root directory of the repo
- Run the following
python3 experiments/msra-subject3/main.py
- Let it run; after 15 epochs it will output to
/output/TIMESTAMP/
The output contains:
- /checkpoint/: the checkpoint files from each epoch
- model.pt: the exported model
- fit_res.txt & test_res.txt: used by the visualizer (see below)
A pre-trained model is included in /output/cvpr15_MSRAHandGestureDB/model.pt
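The included model.pt can be loaded with torch.load. Whether it holds a full pickled module or a state_dict depends on how the training script exported it, so this sketch only inspects the loaded object rather than assuming a structure:

```python
import torch

# Load on CPU; the file may be a pickled module or a state_dict,
# depending on how the training script saved it.
obj = torch.load("output/cvpr15_MSRAHandGestureDB/model.pt", map_location="cpu")
print(type(obj))
if isinstance(obj, torch.nn.Module):
    obj.eval()  # switch to inference mode
```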
The dataset and the centers can be found at:
- ITOP dataset: download and extract to
/datasets/ITOP
- Estimated centers: download and extract to
/datasets/ITOP_side_center
The dataset is described in the paper Towards Viewpoint Invariant 3D Human Pose Estimation (Albert Haque, Boya Peng, Zelun Luo, Alexandre Alahi, Serena Yeung, and Li Fei-Fei, ECCV 2016).
The estimated centers are provided by the original author's implementation.
For simplicity, the centers are currently included in the repo.
Your final /datasets/ folder should look like:
/datasets/ITOP
/datasets/ITOP/ITOP_side_test_labels.h5
/datasets/ITOP/ITOP_side_test_point_cloud.h5
/datasets/ITOP/ITOP_side_train_labels.h5
/datasets/ITOP/ITOP_side_train_point_cloud.h5
/datasets/ITOP_side_center
/datasets/ITOP_side_center/center_test.txt
/datasets/ITOP_side_center/center_train.txt
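With the files in place you can sanity-check them using h5py. This snippet only lists the top-level datasets and their shapes, so it assumes nothing about the label layout:

```python
import h5py

with h5py.File("datasets/ITOP/ITOP_side_test_labels.h5", "r") as f:
    for name, item in f.items():
        # Groups have no shape/dtype, so read them defensively
        print(name, getattr(item, "shape", None), getattr(item, "dtype", ""))
```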
As the dataset is very large, it is preprocessed into smaller per-frame files.
A helper script is provided; run it with:
python3 datasets/itop_side_preprocess.py
This will generate a ~10GB directory at /datasets/ITOP_side_processed
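To confirm the preprocessing completed, a quick count of the generated files (the directory name is from above; the check itself is just an illustration):

```python
import os

root = "datasets/ITOP_side_processed"
# Walk the output directory and count every generated per-frame file
total = sum(len(files) for _, _, files in os.walk(root))
print(f"{total} files under {root}")
```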
- Open a Python terminal in the root directory of the repo
- Run the following
python3 experiments/itop_side/main.py
- Let it run; after 15 epochs it will output to
/output/TIMESTAMP/
The output contains:
- /checkpoint/: the checkpoint files from each epoch
- model.pt: the exported model
- fit_res.txt & test_res.txt: used by the visualizer
A pre-trained model is included in /output/ITOP_side/model.pt
To see how well a model is trained, run the following:
python3 output/OUTPUT_FOLDER/accuracy_graph.py
For the two pre-trained models, the commands are as follows:
- MSRA
python3 output/cvpr15_MSRAHandGestureDB/accuracy_graph.py
- ITOP side
python3 output/ITOP_side/accuracy_graph.py
This will display two graphs which can be used to assess the accuracy of the model.
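The headline metric behind these graphs is the mean per-joint Euclidean error: the distance between predicted and ground-truth joint positions, averaged over joints and frames. A self-contained sketch of that computation with stand-in arrays (MSRA labels 21 hand joints); the real values come from the result files, not random data:

```python
import numpy as np

# Stand-in arrays: [num_frames, num_joints, 3] coordinates in mm
pred = np.random.rand(100, 21, 3) * 100
gt = np.random.rand(100, 21, 3) * 100

# Per-joint Euclidean error, averaged over joints and frames
errors = np.linalg.norm(pred - gt, axis=-1)
print(f"mean error: {errors.mean():.2f} mm")
```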
Some demonstration code of the ITOP side model is provided, and can be run using the following from the root directory of the repo:
python3 example/itop.py
This code pulls test data from the ITOP side dataset and runs the model on it.
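Since open3d is among the dependencies, output like this can also be inspected interactively. Below is a minimal hedged sketch that renders a point cloud with joint predictions overlaid as spheres; the arrays are placeholders, not the demo's actual variables (ITOP labels 15 body joints):

```python
import numpy as np
import open3d as o3d

# Placeholder data: a depth-derived point cloud and 15 predicted joints
points = np.random.rand(5000, 3)
joints = np.random.rand(15, 3)

pcd = o3d.geometry.PointCloud()
pcd.points = o3d.utility.Vector3dVector(points)

# Mark each predicted joint with a small red sphere
spheres = []
for j in joints:
    s = o3d.geometry.TriangleMesh.create_sphere(radius=0.02)
    s.translate(j)
    s.paint_uniform_color([1.0, 0.0, 0.0])
    spheres.append(s)

o3d.visualization.draw_geometries([pcd, *spheres])
```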
Moon, Gyeongsik, Ju Yong Chang, and Kyoung Mu Lee. "V2V-PoseNet: Voxel-to-Voxel Prediction Network for Accurate 3D Hand and Human Pose Estimation from a Single Depth Map." CVPR 2018. [arXiv]
@InProceedings{Moon_2018_CVPR_V2V-PoseNet,
author = {Moon, Gyeongsik and Chang, Ju Yong and Lee, Kyoung Mu},
title = {V2V-PoseNet: Voxel-to-Voxel Prediction Network for Accurate 3D Hand and Human Pose Estimation from a Single Depth Map},
booktitle = {The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
year = {2018}
}