Xiang Xu, Hanbyul Joo, Greg Mori, Manolis Savva.
The D3D-HOI video dataset contains a total of 256 videos from 8 categories. Every frame is annotated with the object rotation, translation, size, and part motion in 3D. Below is the per-category breakdown.
Download D3D-HOI dataset
Download original videos
The structure of the D3D-HOI video dataset is as follows:
<dataset_folder>/
<class_name>/
<video_name>/
3d_info.txt
jointstate.txt
<frames>/
<image_name>.jpg
<gt_mask>/
<object_mask_name>.npy
<smplv2d>/
<smpl_2d_coord>.npy
<smplmesh>/
<smpl_3d_coord>.obj
<joints3d>/
<joints_3d_coord>.npy
The file 3d_info.txt contains the object's global rotation, global translation, and real-world dimensions (in cm) at rest state. It also provides the ground-truth CAD model ID, the interacting part ID, and the estimated camera focal lengths.
The file jointstate.txt stores the ground-truth per-frame part motion. Values are in degrees for revolute joints and in cm for prismatic joints.
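Because the unit of a part-motion value depends on the joint type, downstream code usually needs to normalize it. The sketch below converts a list of per-frame values to radians or meters. It does not parse jointstate.txt, whose exact layout is not described here. The function name `joint_states_to_si` and the `"revolute"` / `"prismatic"` strings are illustrative assumptions, not part of the released code.

```python
import math
from typing import List, Sequence


def joint_states_to_si(values: Sequence[float], motion_type: str) -> List[float]:
    """Convert per-frame part-motion values to SI units.

    Hypothetical helper: degrees -> radians for revolute joints,
    centimeters -> meters for prismatic joints.
    """
    if motion_type == "revolute":
        return [math.radians(v) for v in values]
    if motion_type == "prismatic":
        return [v / 100.0 for v in values]
    raise ValueError(f"unknown motion type: {motion_type!r}")


if __name__ == "__main__":
    # Made-up values, for illustration only.
    print(joint_states_to_si([0.0, 45.0, 90.0], "revolute"))
    print(joint_states_to_si([0.0, 12.5, 25.0], "prismatic"))
```

Keeping the conversion separate from file parsing means it stays valid whatever the on-disk layout turns out to be.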
Original images are in the frames folder. Per-frame object masks are in the gt_mask folder. Estimated 2D SMPL vertex coordinates are in the smplv2d folder. Estimated 3D SMPL vertices (after orthographic projection) are in the smplmesh folder. Estimated 3D SMPL joints are in the joints3d folder. We use a pretrained model from EFT to estimate the SMPL parameters.
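A quick way to sanity-check a download is to walk the folder tree and count the per-frame files in each video. The following is a minimal sketch using only the standard library. It assumes the subfolders are literally named frames, gt_mask, smplv2d, smplmesh, and joints3d as in the structure above. The helper names `iter_videos` and `summarize_video` are hypothetical.

```python
from pathlib import Path
from typing import Dict, Iterator


def iter_videos(dataset_root: Path) -> Iterator[Path]:
    """Yield every <class_name>/<video_name> directory under the root."""
    for class_dir in sorted(p for p in dataset_root.iterdir() if p.is_dir()):
        for video_dir in sorted(p for p in class_dir.iterdir() if p.is_dir()):
            yield video_dir


def summarize_video(video_dir: Path) -> Dict[str, object]:
    """Count the annotation files of one video without loading them."""

    def count(subfolder: str, suffix: str) -> int:
        folder = video_dir / subfolder
        return len(list(folder.glob(f"*{suffix}"))) if folder.is_dir() else 0

    return {
        "class": video_dir.parent.name,
        "video": video_dir.name,
        "has_3d_info": (video_dir / "3d_info.txt").is_file(),
        "has_jointstate": (video_dir / "jointstate.txt").is_file(),
        "frames": count("frames", ".jpg"),
        "gt_mask": count("gt_mask", ".npy"),
        "smplv2d": count("smplv2d", ".npy"),
        "smplmesh": count("smplmesh", ".obj"),
        "joints3d": count("joints3d", ".npy"),
    }


if __name__ == "__main__":
    import sys

    # Usage: python summarize.py <dataset_folder>
    for video in iter_videos(Path(sys.argv[1])):
        print(summarize_video(video))
```

A video whose counts differ between frames and the other folders is worth inspecting before running the optimization code.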
Due to legal issues, we cannot directly redistribute the post-processed data. We provide the 24 CAD IDs together with the motion parameters used in our paper here. Please refer to the preprocess folder for instructions on running the CAD processing code for PartNet-Mobility models.
After post-processing, the structure of the CAD model folder should be as follows:
<dataset_folder>/
<class_name>/
<SAPIEN_id>/
motion.json
motion.gif
<final>/
<part_mesh>.obj
The file motion.json contains the origin and direction of the ground-truth rotation or translation axis (in canonical space). It also provides the motion range and the motion type (revolute or prismatic). The commonly interacted object vertices are stored in the contact list.
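To illustrate how an axis origin, an axis direction, and a motion type together define a part motion, here is a minimal sketch that moves a single 3D point. It uses Rodrigues' rotation formula for the revolute case and a translation along the unit axis for the prismatic case. The function `move_point` and its parameter names are hypothetical. This is not the code used in the paper, and it does not assume any particular key names inside motion.json.

```python
import math
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]


def _normalize(v: Sequence[float]) -> Vec3:
    norm = math.sqrt(sum(c * c for c in v))
    if norm == 0.0:
        raise ValueError("axis direction must be non-zero")
    return (v[0] / norm, v[1] / norm, v[2] / norm)


def move_point(point: Sequence[float], origin: Sequence[float],
               direction: Sequence[float], amount: float,
               motion_type: str) -> Vec3:
    """Apply a part motion to one point.

    amount is in radians for "revolute" and in the mesh's length
    unit for "prismatic" (assumed conventions for this sketch).
    """
    d = _normalize(direction)
    if motion_type == "prismatic":
        return (point[0] + amount * d[0],
                point[1] + amount * d[1],
                point[2] + amount * d[2])
    if motion_type != "revolute":
        raise ValueError(f"unknown motion type: {motion_type!r}")

    # Express the point relative to the axis origin.
    p = (point[0] - origin[0], point[1] - origin[1], point[2] - origin[2])
    cos_a, sin_a = math.cos(amount), math.sin(amount)
    dot = d[0] * p[0] + d[1] * p[1] + d[2] * p[2]
    cross = (d[1] * p[2] - d[2] * p[1],
             d[2] * p[0] - d[0] * p[2],
             d[0] * p[1] - d[1] * p[0])
    # Rodrigues: p*cos + (d x p)*sin + d*(d.p)*(1 - cos)
    rotated = tuple(p[i] * cos_a + cross[i] * sin_a + d[i] * dot * (1.0 - cos_a)
                    for i in range(3))
    return (rotated[0] + origin[0], rotated[1] + origin[1], rotated[2] + origin[2])
```

For example, rotating the point (1, 0, 0) by 90 degrees about the z-axis through the origin gives approximately (0, 1, 0). Applying the same function to every vertex of a part mesh would articulate that part.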
The file motion.gif provides a visualization of all possible motions of the model.
Canonical object part meshes are stored in the final folder.
- Python (tested on 3.8)
- PyTorch (tested on 1.7)
- PyTorch3D (tested on 0.3)
- EFT
- PartNet-Mobility Dataset
- Mesh-Fusion
We recommend using a conda environment:
conda create -n d3dhoi python=3.8
conda activate d3dhoi
pip install -r requirements.txt
Install the torch version that corresponds to your version of CUDA, e.g., for CUDA 11.0, use:
conda install -c pytorch pytorch=1.7.0 torchvision cudatoolkit=11.0
To install PyTorch3D, use:
conda install -c conda-forge -c fvcore fvcore
conda install pytorch3d=0.3.0 -c pytorch3d
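After installation, it can help to confirm which versions actually ended up in the environment. The sketch below queries package metadata with the standard library only, so it runs even if an import would fail. The distribution names `torch` and `pytorch3d` are assumptions about how the packages register themselves. A result of `None` means no metadata was found under that name.

```python
from importlib import metadata
from typing import Dict, Optional


def installed_version(distribution: str) -> Optional[str]:
    """Return the installed version string, or None if not found."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


def report(tested: Dict[str, str]) -> Dict[str, Optional[str]]:
    """Map each distribution name to the version found in this environment."""
    return {name: installed_version(name) for name in tested}


if __name__ == "__main__":
    # Versions this README was tested with (see the requirements list).
    tested = {"torch": "1.7", "pytorch3d": "0.3"}
    for name, found in report(tested).items():
        print(f"{name}: tested on {tested[name]}, found {found}")
```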
Compile the mesh-fusion libraries for running the preprocess code.
Refer to the optimization folder for instructions on running the optimization and evaluation code. You can download our optimized results. The zip file also provides scripts for reproducing the results for each category.
More optimization results are available here.
You can also use the code in the visualization folder to explore the annotated dataset.
Our code is released under CC-BY-NC 4.0. See the LICENSE file. However, our code depends on other libraries, including SMPL, which each have their own respective licenses that must also be followed.
We use mesh-fusion to process the PartNet-Mobility dataset, and EFT to estimate the SMPL parameters.
If you find this code helpful, please consider citing:
@article{xu2021d3dhoi,
title={D3D-HOI: Dynamic 3D Human-Object Interactions from Videos},
author={Xiang Xu and Hanbyul Joo and Greg Mori and Manolis Savva},
journal={arXiv preprint arXiv:2108.08420},
year={2021}
}