Error in computing PCK_abs #7

Closed

fabienbaradel opened this issue Aug 19, 2021 · 4 comments

@fabienbaradel commented Aug 19, 2021

Hi,
Congratulations on your work, and thanks for releasing your code.
I think there is a bug in your evaluation pipeline, or I am missing something.
When computing PCK_abs, you first perform a rigid alignment of the predicted pose to the ground-truth pose, and then you set the predicted root location to the ground-truth root location.
You should not do that; instead, you should directly compute the Euclidean distance for each joint between pred_pose and gt_pose, both expressed in the camera coordinate system.
Am I getting something wrong?
Best,
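
For context, the direct camera-coordinate metric described above can be sketched as follows. The function name is illustrative, and the 150 mm threshold is the usual MuPoTS setting rather than something stated in this thread:

```python
import numpy as np

def pck_abs(pred, gt, thresh=150.0):
    """PCK_abs sketch: fraction of joints whose camera-coordinate
    Euclidean error is below the threshold (150 mm assumed).
    pred, gt: (J, 3) poses in camera coordinates, in millimetres."""
    dists = np.linalg.norm(pred - gt, axis=-1)   # per-joint 3D error, shape (J,)
    return float((dists < thresh).mean())
```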

@3dpose (Owner) commented Aug 22, 2021

Thank you for your interest in our work.

As you can see at line 86 of eval_mupots_pck_abs.py, we use the ratio of the predicted root depth to the ground-truth root depth to infer the root's camera coordinates. The predicted root and the ground-truth root should be in this same proportion, which is why we multiply by the ratio.

In our method, we predict the person-centric 3D pose and the depth of the root joint without using camera parameters, unlike other approaches. Thus, when doing camera-coordinate evaluation, we need to shift the person-centric 3D pose by the predicted root joint in camera coordinates, which is implemented at line 86 of eval_mupots_pck_abs.py. It is a little bit confusing because we have:

```python
predP = predP + gt_p3d[k][:,14:15] * ratio
```

where gt_p3d[k][:,14:15] comes from the GT, so you may think we use the GT to replace our predicted joint. This is not the case. Without camera parameters, we only predict the Z coordinate of the root joint, not X and Y. So we take the ratio of the predicted Z to the GT Z and multiply gt_p3d[k][:,14:15] by it, which scales the X and Y coordinates according to the error in our predicted Z.

In short, to have a fair comparison, we multiply by this ratio so that the error of our predicted root joint in camera coordinates is included. Hope this clarifies your question.
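
For concreteness, here is a minimal sketch of this root-shift step in NumPy. The function and argument names and the (3, J) pose layout with root index 14 are assumptions for illustration; only the last line mirrors the logic at line 86 of eval_mupots_pck_abs.py:

```python
import numpy as np

def shift_to_camera_coords(pred_rel, gt_pose, pred_root_z, root_idx=14):
    """Shift a person-centric 3D pose into camera coordinates.

    pred_rel:    (3, J) root-relative predicted pose
    gt_pose:     (3, J) ground-truth pose in camera coordinates
    pred_root_z: predicted depth (Z) of the root joint
    """
    gt_root = gt_pose[:, root_idx:root_idx + 1]  # (3, 1), like gt_p3d[k][:,14:15]
    ratio = pred_root_z / gt_root[2, 0]          # predicted Z relative to GT Z
    # Scaling the GT root by this ratio propagates the predicted depth error
    # to X and Y, so the root error is reflected in the final pose.
    return pred_rel + gt_root * ratio
```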

@fabienbaradel (Author)

Thanks for the explanations.
I now better understand the camera-coordinate evaluation part. It makes sense.

However, you did not answer my question regarding the rigid alignment that you apply to the predicted pose. The standard MuPoTS evaluation normalizes the predicted pose by the ground-truth bone lengths, but you commented out this line in your code. Instead, you perform a Procrustes alignment, which boosts performance by a large margin. Competing methods do not apply this Procrustes alignment; they only normalize by the bone lengths.
Could you clarify this part? If possible, could you point me to references where other papers apply the same post-processing?
Thanks a lot.

@3dpose (Owner) commented Aug 29, 2021

Procrustes analysis (PA) was used to adapt to the dataset and keypoint-definition differences in our case. As explained in the paper, our method requires videos (image sequences) as input for training, so we have to train our models on Human3.6M (a video dataset) and cannot use MuPoTS's training set, MuCo-3DHP (an image dataset). Other approaches do not have this issue, as they take individual images rather than videos as input, so MuCo-3DHP can be used for training. Due to the gap between the two training datasets in terms of camera settings and keypoint definitions, a model trained on Human3.6M tends to display certain biases (rotation and differences in keypoint definition) when tested on MuPoTS, so PA was used to mitigate the dataset gap. Given this strong negative impact of training on one dataset and evaluating on another, using PA as a post-processing step to mitigate the dataset gap is a natural choice.
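
For reference, a rigid Procrustes alignment of the kind discussed here can be sketched as below. This is the generic textbook formulation (rotation plus translation, no scaling), written for illustration rather than copied from the repo:

```python
import numpy as np

def procrustes_align(pred, gt):
    """Rigidly align pred (J, 3) to gt (J, 3): find the rotation R and
    translation minimizing ||pred @ R - gt||_F (orthogonal Procrustes)."""
    mu_p, mu_g = pred.mean(axis=0), gt.mean(axis=0)
    P, G = pred - mu_p, gt - mu_g              # center both poses
    U, _, Vt = np.linalg.svd(P.T @ G)          # SVD of the 3x3 covariance
    R = U @ Vt                                 # optimal rotation
    if np.linalg.det(R) < 0:                   # fix an improper reflection
        U[:, -1] *= -1
        R = U @ Vt
    return P @ R + mu_g                        # aligned prediction
```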

In fact, the performance of our method does not depend on using PA. To better demonstrate this, we've updated the code and models in the repo to use bone-length normalization with dataset adaptation. In particular, as mentioned in our paper (pages 7 and 15), a joint adaptation network [47] can be used for dataset gaps such as keypoint-definition differences. We've added an adaptation network to mitigate the gap between Human3.6M and MuCo-3DHP. With the dataset adaptation (see example result here), we comment out the PA line and use bone-length normalization in both the PCK and PCK_abs evaluations, achieving 0.89 PCK and 0.48 PCK_abs, which are the same as the results using PA as post-processing (differences only in the third decimal digit and beyond).
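
By contrast with PA, bone-length normalization rescales each predicted bone to the ground-truth length, walking outward from the root. A minimal sketch, with a placeholder skeleton (the real MuPoTS joint ordering differs):

```python
import numpy as np

# Placeholder skeleton as (child, parent) pairs, parents listed before
# their children; the actual MuPoTS joint indices are different.
BONES = [(1, 0), (2, 1), (3, 2), (4, 0), (5, 4)]

def normalize_bone_lengths(pred, gt, bones=BONES):
    """Rescale each predicted bone to the GT bone length. pred, gt: (J, 3)."""
    out = pred.copy()
    for child, parent in bones:
        direction = pred[child] - pred[parent]           # predicted bone vector
        gt_len = np.linalg.norm(gt[child] - gt[parent])  # target bone length
        pred_len = np.linalg.norm(direction) + 1e-8      # avoid divide-by-zero
        out[child] = out[parent] + direction * (gt_len / pred_len)
    return out
```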

@3dpose (Owner) commented Oct 28, 2021

@nicolasugrinovic thanks for your interest in our work. Since your question is not closely related to this issue, we'll reply to it in a different thread (#12).

3dpose closed this as completed Oct 28, 2021