# Single-view Textured Human Reconstruction with Image-Conditioned Diffusion

## Evaluation Benchmark

We created an evaluation benchmark using the CustomHumans dataset. Please apply for access to the dataset directly; the necessary files are included in the download link.

Note that we trained our models on the 526 human scans provided in the THuman2.0 dataset and tested on 60 scans from the CustomHumans dataset, using the default hyperparameters and commands suggested in run.sh. The evaluation scripts can be found here and here. You will need to install two additional packages for evaluation:

```bash
pip install torchmetrics[image] mediapipe
```

### Single-view human 3D reconstruction benchmark

| Methods | P-to-S (cm) ↓ | S-to-P (cm) ↓ | NC ↑ | f-Score ↑ |
|---|---|---|---|---|
| PIFu [Saito2019] | 2.209 | 2.582 | 0.805 | 34.881 |
| PIFuHD [Saito2020] | 2.107 | 2.228 | 0.804 | 39.076 |
| PaMIR [Zheng2021] | 2.181 | 2.507 | 0.813 | 35.847 |
| FOF [Feng2022] | 2.079 | 2.644 | 0.808 | 36.013 |
| 2K2K [Han2023] | 2.488 | 3.292 | 0.796 | 30.186 |
| ICON\* [Xiu2022] | 2.256 | 2.795 | 0.791 | 30.437 |
| ECON\* [Xiu2023] | 2.483 | 2.680 | 0.797 | 30.894 |
| SiTH\* (Ours) | 1.871 | 2.045 | 0.826 | 37.029 |

\* indicates methods trained on the same THuman2.0 dataset.

### Back-view hallucination benchmark

| Methods | SSIM ↑ | LPIPS ↓ | KID (×10⁻³) ↓ | Joints Err. (pixel) ↓ |
|---|---|---|---|---|
| Pix2PixHD [Wang2018] | 0.816 | 0.141 | 86.2 | 53.1 |
| DreamPose [Karras2023] | 0.844 | 0.132 | 86.7 | 76.7 |
| Zero-1-to-3 [Liu2023] | 0.862 | 0.119 | 30.0 | 73.4 |
| ControlNet [Zhang2023] | 0.851 | 0.202 | 39.0 | 35.7 |
| SiTH (Ours) | 0.950 | 0.063 | 3.2 | 21.5 |

## How to evaluate your method?

### 1. Single-view human 3D reconstruction benchmark

The input to your method is `images_60` and the ground truth is `gt_meshes_60`. Please note that you cannot use the ground-truth SMPL-X meshes as inputs; your method should predict the SMPL-X meshes from the input images only. To avoid scale and depth ambiguity, we use ICP to align the predicted meshes to the ground-truth meshes. You can run the evaluation script with the following command:

```bash
python tools/evaluate.py -i /path/to/your/results -g /path/to/ground-truth
```
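
For reference, the sketch below shows one way the P-to-S and S-to-P distances could be computed with `trimesh` after ICP alignment. It is a minimal sketch, not the official `tools/evaluate.py`: the file paths, sample count, and helper name are illustrative, and the NC and f-Score metrics are omitted.

```python
# Minimal sketch of the 3D distance metrics (assumes trimesh is installed).
import trimesh

def chamfer_p2s_s2p(pred_mesh, gt_mesh, n_samples=10000):
    """Return mean (P-to-S, S-to-P) distances in the mesh units (here: cm)."""
    pred_pts = pred_mesh.sample(n_samples)   # points on the predicted surface
    gt_pts = gt_mesh.sample(n_samples)       # points on the ground-truth surface

    # Rigidly align the predicted samples to the ground truth with ICP to
    # remove the scale/depth ambiguity mentioned above.
    matrix, aligned_pred_pts, _ = trimesh.registration.icp(
        pred_pts, gt_pts, max_iterations=50)

    # P-to-S: aligned predicted points -> ground-truth surface.
    _, p2s, _ = trimesh.proximity.closest_point(gt_mesh, aligned_pred_pts)

    # S-to-P: ground-truth points -> aligned predicted surface.
    pred_aligned = pred_mesh.copy()
    pred_aligned.apply_transform(matrix)
    _, s2p, _ = trimesh.proximity.closest_point(pred_aligned, gt_pts)
    return p2s.mean(), s2p.mean()

# Hypothetical file names for a single subject.
pred = trimesh.load('results/0000.obj')
gt = trimesh.load('gt_meshes_60/0000.obj')
print(chamfer_p2s_s2p(pred, gt))
```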

### 2. Back-view hallucination benchmark

The input to your method is `images_60` and the ground truth is `gt_back_60`. Note that to account for the stochastic nature of generative models, we generate 16 samples per input image.

```bash
python tools/evaluate_image.py -i /path/to/your/results -g /path/to/ground-truth --mask
```
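
The table metrics map directly onto classes from the `torchmetrics` package installed above. The sketch below is only an illustration of those classes, not the official `tools/evaluate_image.py`: it uses random tensors as stand-ins for your 16 generated back views and the ground truth, and omits the joint error, which is computed with `mediapipe`.

```python
# Minimal sketch of SSIM / LPIPS / KID with torchmetrics.
# Tensors are assumed to be (N, 3, H, W) floats in [0, 1].
import torch
from torchmetrics.image import StructuralSimilarityIndexMeasure
from torchmetrics.image.lpip import LearnedPerceptualImagePatchSimilarity
from torchmetrics.image.kid import KernelInceptionDistance

pred = torch.rand(16, 3, 512, 512)   # e.g. 16 hallucinated back views for one input
gt = torch.rand(16, 3, 512, 512)     # matching ground-truth back view(s)

ssim = StructuralSimilarityIndexMeasure(data_range=1.0)
lpips = LearnedPerceptualImagePatchSimilarity(net_type='vgg', normalize=True)
kid = KernelInceptionDistance(subset_size=8, normalize=True)

print('SSIM :', ssim(pred, gt).item())
print('LPIPS:', lpips(pred, gt).item())

kid.update(gt, real=True)
kid.update(pred, real=False)
kid_mean, kid_std = kid.compute()
print('KID (x10^-3):', kid_mean.item() * 1e3)
```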

## How to properly run SiTH with your own data?

1. Please note that SiTH assumes an orthographic camera, as do the other methods in the baseline table. If you want to compute 3D metrics, you need to render your images with an orthographic camera.
2. The image resolution for 3D reconstruction and SMPL-X fitting is 1024x1024; only the diffusion hallucination runs at 512x512. By default, you should prepare images at 1024x1024 resolution, and the config file will handle the rest (see the sketch after this list).
3. SiTH is a SMPL-centric pipeline and does not handle special cases such as subjects holding objects or children.
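
If your own photos are not already square, one simple way to obtain the expected 1024x1024 input is to resize and pad them. This is a minimal sketch assuming Pillow; the white background, file names, and helper name are illustrative, and SiTH's own preprocessing (e.g. background segmentation) may impose additional requirements.

```python
# Minimal sketch: pad an arbitrary photo onto a 1024x1024 canvas.
from PIL import Image

def prepare_input(path, size=1024, background=(255, 255, 255)):
    img = Image.open(path).convert('RGB')
    # Scale the longer side to `size` while keeping the aspect ratio.
    scale = size / max(img.size)
    img = img.resize((round(img.width * scale), round(img.height * scale)))
    # Paste centered on a square canvas so the subject is not distorted.
    canvas = Image.new('RGB', (size, size), background)
    canvas.paste(img, ((size - img.width) // 2, (size - img.height) // 2))
    return canvas

prepare_input('my_photo.jpg').save('input_1024.png')
```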