This repository is the official implementation of the paper "Unsupervised 3D Human Pose Estimation via Conditional Multi-view Ancestral Sampling", which has been accepted to FG 2026.
- [April 2026]: Paper accepted to FG 2026!
- [Coming Soon]: We are currently cleaning up the code.
To set up the environment, we recommend using Conda. Run the following commands to create a dedicated environment and install the required dependencies:
# Create a new conda environment
conda create -n cmas python=3.9
conda activate cmas
# Install dependencies
pip install torch torchvision torchaudio
pip install -r requirements.txt
Note: Please ensure you have the appropriate CUDA version installed for PyTorch.
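To confirm that the installed PyTorch build can see your GPU, a short check like the one below may help. It is a minimal sketch and not part of this repository: the helper name `describe_torch_cuda` is hypothetical, and it relies only on the standard `torch.cuda.is_available()` and `torch.version.cuda` attributes.

```python
import importlib.util


def describe_torch_cuda():
    """Report whether PyTorch is importable and whether it can see a CUDA device.

    Hypothetical helper for illustration; it is not shipped with this repository.
    """
    if importlib.util.find_spec("torch") is None:
        # PyTorch is not installed in the active environment.
        return {
            "torch_installed": False,
            "torch_version": None,
            "cuda_build": None,
            "cuda_available": False,
        }

    import torch

    return {
        "torch_installed": True,
        "torch_version": torch.__version__,
        # CUDA version PyTorch was built against (None for CPU-only builds).
        "cuda_build": torch.version.cuda,
        # True only if a usable CUDA device and driver are present.
        "cuda_available": bool(torch.cuda.is_available()),
    }


if __name__ == "__main__":
    for key, value in describe_torch_cuda().items():
        print(f"{key}: {value}")
```

If `cuda_build` is `None` or `cuda_available` is `False`, the installed wheel is probably CPU-only or does not match the local driver, and reinstalling PyTorch with the matching CUDA build is likely needed.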
To evaluate the model, download the 3DYoga90 dataset.
- 3DYoga90 Dataset: https://github.com/seonokkim/3DYoga90
The 3DYoga90 dataset provides 3D ground truth (GT) poses and video data. To run our implementation, you need to extract 2D poses from the video data using AlphaPose and save the resulting pose data as .npy files.
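The conversion from AlphaPose output to .npy can be sketched as follows. This is an illustrative example under several assumptions, not the exact preprocessing used for the paper: it assumes AlphaPose wrote its default JSON result file (a list of detections, each with `image_id`, a flat `keypoints` list of x, y, confidence triplets, and `score`), that each video contains a single subject, and that an array of shape (frames, joints, 3) is acceptable. The function name `alphapose_json_to_npy` and the file names are hypothetical; check the array layout expected by the data loader before using it.

```python
import json
from pathlib import Path

import numpy as np


def alphapose_json_to_npy(json_path, npy_path):
    """Convert an AlphaPose result file into a (frames, joints, 3) array.

    Hypothetical helper: the output layout is an assumption, not the
    format confirmed by this repository.
    """
    detections = json.loads(Path(json_path).read_text())

    # Keep only the highest-scoring person per frame (assumes one subject).
    best = {}
    for det in detections:
        frame = det["image_id"]
        if frame not in best or det["score"] > best[frame]["score"]:
            best[frame] = det

    # Sort frames by the numeric part of the name, e.g. "12.jpg" -> 12.
    def frame_index(name):
        digits = "".join(ch for ch in Path(name).stem if ch.isdigit())
        return int(digits) if digits else 0

    frames = sorted(best, key=frame_index)

    # Each flat keypoint list becomes a (joints, 3) block of x, y, confidence.
    poses = np.array(
        [
            np.asarray(best[f]["keypoints"], dtype=np.float32).reshape(-1, 3)
            for f in frames
        ]
    )
    np.save(npy_path, poses)
    return poses


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    result = alphapose_json_to_npy("alphapose-results.json", "pose_2d.npy")
    print(result.shape)
```

Frames in which AlphaPose detected no person are simply absent from the output, so the array may contain fewer frames than the source video.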
We provide a pretrained diffusion model finetuned for the Yoga dataset.
After downloading, extract the archive into the save/yoga_diffusion_model folder.
To perform 3D Human Pose Estimation, place one of the 2D pose files (.npy) obtained from the 3DYoga90 dataset into the dataset/nba/motions directory, then run the following command:
python -m sample.mas \
--model_path save/yoga_diffusion_model/checkpoint_200000.pth \
--num_samples 1 \
--seed 75 \
--overwrite \
--output_dir results \
--use_data \
--show_input_motions \
--num_views 7 \
--input_iterations 100
For more details on available flags and advanced configurations, please refer to the documentation of the original MAS repository.
We built our codebase upon the MAS (Multi-view Ancestral Sampling) repository developed by Roy Kapon et al. We sincerely thank the original authors.
If you find our work useful for your research, please consider citing our FG 2026 paper: