
A Dual-Masked Auto-Encoder for Robust Motion Capture with Spatial-Temporal Skeletal Token Completion

This repository contains the source code for our ACM Multimedia 2022 paper on multi-view, multi-person 3D pose estimation. The preprint is available on arXiv (arXiv:2207.07381). The project webpage is available here, and the dataset presented in the paper is available here; please refer to it for more details.

Dependencies

The code is tested on Windows with

pytorch                   1.10.2
torchvision               0.11.3
CUDA                      11.3.1

We suggest maintaining the project in a virtual environment, using an easy-to-use package/environment manager such as conda.

conda create -n dmaeMocap python=3.6
conda activate dmaeMocap
# install pytorch
conda install pytorch torchvision torchaudio cudatoolkit=11.3 -c pytorch
# install the rest of the dependencies
pip install -r requirements.txt

Data preparation

Follow the instructions below to prepare the necessary data:

  • Shelf: Download the bundle from here and unzip it to /data/shelf.
    • The bundle consists of the pretrained model, multi-view RGB images, 2D pose detection results, camera matrices, and 3D ground truth. Except for the model, these are credited to 4D Association Graph.
    • The RGB images are decoded by ffmpeg at low quality. If you need high-quality images, please refer to the Shelf webpage.
    • The 2D pose detector is OpenPose.
    • We convert the data from txt to numpy format; the conversion code is at util/gizmo/shelf_makeup.py.

Alternatively, generate the 2D poses on your own; we provide instructions at util/gizmo/data_makeup.

Data should be organized as follows:

ROOT/
    └── data/
        └── shelf/
            └── sequences/
                └── img_0/
                └── .../
                └── img_4/
            └── camera_params.npy
            └── checkpoint-best.pth
            └── shelf_eval_2d_detection_dict.npy
    └── ...
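Before running inference, it can help to sanity-check the layout above. The file and directory names below come from the tree; the helper itself is our illustrative sketch, not part of the repository.

```python
from pathlib import Path

# Entries expected directly under data/shelf/, per the tree above.
EXPECTED = [
    "sequences",
    "camera_params.npy",
    "checkpoint-best.pth",
    "shelf_eval_2d_detection_dict.npy",
]

def check_shelf_layout(root="data/shelf"):
    """Return the list of expected entries missing under `root`."""
    root = Path(root)
    return [name for name in EXPECTED if not (root / name).exists()]
```

An empty return value means the Shelf bundle is unpacked where the scripts expect it.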

Inference

We provide the following script to reconstruct and complete 3D skeletons from multi-view RGB video sequences.

python inference.py

The triangulation configuration can be found and modified in util/config.py. Setting self.snapshot_flag = True (Line 18) visualizes the reconstruction results; the default is self.snapshot_flag = False.
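For reference, reconstructing a joint from multiple calibrated views is typically done with linear (DLT) triangulation. The sketch below is a generic implementation under that assumption, not the repository's exact code.

```python
import numpy as np

def triangulate_point(projs, points_2d):
    """Linear (DLT) triangulation of one joint from multiple views.

    projs:     list of 3x4 camera projection matrices
    points_2d: list of (x, y) image coordinates, one per view
    Returns the 3D point in world coordinates.
    """
    rows = []
    for P, (x, y) in zip(projs, points_2d):
        # Each view contributes two linear constraints on the homogeneous 3D point.
        rows.append(x * P[2] - P[0])
        rows.append(y * P[2] - P[1])
    A = np.stack(rows)
    # The solution is the right singular vector of A with the smallest singular value.
    _, _, vt = np.linalg.svd(A)
    X = vt[-1]
    return X[:3] / X[3]
```

With noiseless detections from two or more views, this recovers the joint exactly; with real 2D detections it gives the least-squares solution.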

You can run python inference.py --no-dmae to disable D-MAE motion completion, and add --snapshot to enable snapshots.
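The two flags above can be mirrored with a minimal argparse setup. The parser below is an illustrative sketch; the repository's actual argument handling may differ.

```python
import argparse

def build_parser():
    # Sketch of the two command-line flags described above.
    p = argparse.ArgumentParser(description="D-MAE inference (sketch)")
    p.add_argument("--no-dmae", dest="no_dmae", action="store_true",
                   help="disable D-MAE motion completion")
    p.add_argument("--snapshot", action="store_true",
                   help="visualize and save reconstruction snapshots")
    return p
```

For example, `build_parser().parse_args(["--no-dmae"])` yields a namespace with `no_dmae=True` and `snapshot=False`.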

Evaluate

python evaluate.py

Like the inference script, the evaluation script is configured via util/config.py. By default, the inference results and the ground truth are visualized in the data/shelf/output/eval_snapshot directory. The metrics are printed to the console and also saved to data/shelf/output/eval.log. To evaluate the framework without D-MAE, append --no-dmae to the command, i.e. python evaluate.py --no-dmae.

Overall, the output data is organized as follows:

ROOT/
    └── data/
        └── shelf/
            └── output/
                └── eval_snapshot/
                └── npy/
                └── eval.log
            └── ...
    └── ...

Train the D-MAE

This short guide focuses on human pose estimation (HPE) reconstruction and completion with the pretrained model. If you want to reproduce the pretrained model's results, please refer to training/README.md.
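As background for training, the dual masking idea pairs temporal masking (dropping whole frames) with spatial masking (dropping individual joint tokens), and the auto-encoder learns to complete the masked skeletal tokens. The sketch below illustrates that masking scheme; the ratios and sampling strategy are our assumptions, not the paper's exact recipe.

```python
import numpy as np

def dual_mask(num_frames, num_joints, t_ratio=0.5, s_ratio=0.3, seed=0):
    """Sketch of a dual (temporal + spatial) token mask, MAE-style.

    Returns a boolean (num_frames, num_joints) array; True marks masked tokens.
    """
    rng = np.random.default_rng(seed)
    mask = np.zeros((num_frames, num_joints), dtype=bool)
    # Temporal masking: drop whole frames.
    t_drop = rng.choice(num_frames, int(num_frames * t_ratio), replace=False)
    mask[t_drop, :] = True
    # Spatial masking: drop random joints within the remaining frames.
    for f in np.setdiff1d(np.arange(num_frames), t_drop):
        j_drop = rng.choice(num_joints, int(num_joints * s_ratio), replace=False)
        mask[f, j_drop] = True
    return mask
```

During training, the encoder would see only the unmasked tokens and the decoder would be supervised to reconstruct the masked ones.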

Bibtex

If you use our code/models in your research, please cite our paper:

@inproceedings{jiang2022dmae,
  title={A Dual-Masked Auto-Encoder for Robust Motion Capture with Spatial-Temporal Skeletal Token Completion},
  author={Jiang, Junkun and Chen, Jie and Guo, Yike},
  booktitle={Proceedings of the 30th ACM International Conference on Multimedia},
  year={2022}
}

Acknowledgement

Many thanks to the following open-source repositories for their help in developing D-MAE.
