
Multi-view Hand Reconstruction with a Point-Embedded Transformer

Lixin Yang · Licheng Zhong · Pengxiang Zhu · Xinyu Zhan · Junxiao Kong · Jian Xu · Cewu Lu



Paper PDF

POEM is a generalizable multi-view hand mesh reconstruction (HMR) model designed for practical use in real-world hand motion capture scenarios. It embeds a static basis point within the multi-view stereo space to serve as the medium for fusing features across different views. To infer an accurate 3D hand mesh from multi-view images, POEM introduces a point-embedded transformer decoder. By employing a combination of five large-scale multi-view datasets and sufficient data augmentation, POEM demonstrates superior generalization ability in real-world applications.

🕹️ Instructions

 

🏃 Training and Evaluation

Available models

We provide four models with different configurations for training and evaluation, and we have evaluated them on multiple datasets. In the commands below:

  • set ${MODEL} to one of [small, medium, medium_MANO, large].
  • set ${DATASET} to one of [HO3D, DexYCB, Arctic, Interhand, Oakink, Freihand].
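For convenience, the placeholders can be exported as shell variables so the commands below can be copied verbatim; the values here are only an example:

$ export MODEL=medium
$ export DATASET=DexYCB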

Download the pretrained checkpoints at 🔗ckpt_release and move the contents to ./checkpoints.
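A minimal sketch of placing the files, assuming the release was downloaded to ~/Downloads/ckpt_release (the download location and exact filenames are assumptions; the evaluation command below expects ./checkpoints/${MODEL}.pth.tar):

$ mkdir -p checkpoints
$ mv ~/Downloads/ckpt_release/*.pth.tar checkpoints/
$ ls checkpoints/    # expect files such as small.pth.tar, medium.pth.tar, ...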

Command line arguments

  • -g, --gpu_id, visible GPUs for training, e.g. -g 0,1,2,3. Evaluation only supports a single GPU.
  • -w, --workers, number of data-loading workers (num_workers), e.g. -w 4.
  • -p, --dist_master_port, port for distributed training, e.g. -p 60011; set a different -p for each concurrent training process.
  • -b, --batch_size, e.g. -b 32. The default is specified in the config file but is overridden when -b is provided.
  • --cfg, config file for this experiment, e.g. --cfg config/release/train_${MODEL}.yaml.
  • --exp_id, name of the experiment, e.g. --exp_id ${EXP_ID}. When --exp_id is provided, the code requires that no uncommitted changes remain in the git repo. If it is not provided, it defaults to 'default' for training and 'eval_{cfg}' for evaluation. All results are saved in exp/${EXP_ID}_{timestamp}.
  • --reload, specify the path to the checkpoint (.pth.tar) to be loaded.
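Putting these flags together, a training launch might look like the following; the experiment id poem_medium and the batch size are illustrative, and any subset of the flags above can be combined in the same way:

$ python scripts/train_ddp_wds.py --cfg config/release/train_medium.yaml \
                                  -g 0,1,2,3 -w 4 -b 32 -p 60011 \
                                  --exp_id poem_medium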

Evaluation

Set ${PATH_TO_CKPT} to ./checkpoints/${MODEL}.pth.tar, then run the following command. Note that the script essentially modifies the config file in place to match the chosen settings. view_min and view_max specify the range of views fed into the model. Use the --draw option to render the results; note that it is incompatible with computing the AUC metric.

$ python scripts/eval_single.py --cfg config/release/eval_single.yaml \
                                -g ${gpu_id} \
                                --reload ${PATH_TO_CKPT} \
                                --dataset ${DATASET} \
                                --view_min ${MIN_VIEW} \
                                --view_max ${MAX_VIEW} \
                                --model ${MODEL}

The evaluation results will be saved at exp/${EXP_ID}_{timestamp}/evaluations.
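For example, to evaluate the medium model on DexYCB (the view range here is illustrative; choose values consistent with the dataset's camera setup):

$ python scripts/eval_single.py --cfg config/release/eval_single.yaml \
                                -g 0 \
                                --reload ./checkpoints/medium.pth.tar \
                                --dataset DexYCB \
                                --view_min 3 \
                                --view_max 8 \
                                --model medium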

Training

We use a mixture of multiple datasets packed by webdataset for training. Execute the following command to train a specific model on the provided dataset.

$ python scripts/train_ddp_wds.py --cfg config/release/train_${MODEL}.yaml -g 0,1,2,3 -w 4

Tensorboard

$ cd exp/${EXP_ID}_{timestamp}/runs/
$ tensorboard --logdir .

Checkpoint

All checkpoints produced during training are saved in exp/${EXP_ID}_{timestamp}/checkpoints/, where checkpoints/checkpoint records the most recent checkpoint.
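A previously saved checkpoint can presumably be loaded for continued training or fine-tuning via --reload. A minimal sketch, where ${SAVED_CKPT} is a placeholder for the actual checkpoint filename under checkpoints/:

$ python scripts/train_ddp_wds.py --cfg config/release/train_${MODEL}.yaml -g 0,1,2,3 -w 4 \
                                  --reload exp/${EXP_ID}_{timestamp}/checkpoints/${SAVED_CKPT}.pth.tar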

 

License

This code and model are available for non-commercial scientific research purposes as defined in the LICENSE file. By downloading and using the code and model you agree to the terms in the LICENSE.

Citation

@misc{yang2024multiviewhandreconstructionpointembedded,
      title={Multi-view Hand Reconstruction with a Point-Embedded Transformer}, 
      author={Lixin Yang and Licheng Zhong and Pengxiang Zhu and Xinyu Zhan and Junxiao Kong and Jian Xu and Cewu Lu},
      year={2024},
      eprint={2408.10581},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2408.10581}, 
}

For more questions, please contact Lixin Yang: siriusyang@sjtu.edu.cn
