
Zero recall value while evaluating on LMO dataset #93

Closed
supriya-gdptl opened this issue Dec 4, 2022 · 6 comments

Comments

@supriya-gdptl

Hello @wangg12

I tried to evaluate the GDR-Net model on the LMO dataset using the pretrained models you shared on OneDrive.
I used the following command to run the evaluation:

python core/gdrn_modeling/main_gdrn.py \
    --config-file configs/gdrn/lmo/a6_cPnP_AugAAETrunc_BG0.5_lmo_real_pbr0.1_40e.py \
    --num-gpus 1 \
    --eval-only \
    --opts MODEL.WEIGHTS=output/gdrn/lmo/a6_cPnP_AugAAETrunc_BG0.5_lmo_real_pbr0.1_40e/gdrn_lmo_real_pbr.pth

However, all the recall values are zero. Please see the screenshot below.
Could you please help?

Thank you,
Supriya
[Screenshot: evaluation results with all recall values at zero]

@wangg12
Member

wangg12 commented Dec 4, 2022

Maybe you should check your full run log to see where the problem is.

@supriya-gdptl
Author

supriya-gdptl commented Dec 4, 2022

The features from the backbone (line 121 in GDRN.py) are a tensor of zeros. Because of this, all subsequent steps also output zero tensors.

features = 
tensor([[[[0., 0., 0.,  ..., 0., 0., 0.],
          [0., 0., 0.,  ..., 0., 0., 0.],
          [0., 0., 0.,  ..., 0., 0., 0.],
          ...,
          [0., 0., 0.,  ..., 0., 0., 0.],
          [0., 0., 0.,  ..., 0., 0., 0.],
          [0., 0., 0.,  ..., 0., 0., 0.]]]], device='cuda:0')

The log says all weights (backbone, pnp_net and rot_head) are loaded correctly from the checkpoint, yet the backbone still outputs a zero tensor.
Have you encountered this error before? Do you know what might be causing it?
The log is below:

20221204_124725|fvcore.common.checkpoint@152: [Checkpointer] Loading from D:/research/data/gdrnet_data/gdrn/lmo/a6_cPnP_AugAAETrunc_BG0.5_lmo_real_blender_160e/model_final_wo_optim.pth ...
20221204_124728|d2.checkpoint.c2_model_loading@324: Following weights matched with model:
| Names in Model                        | Names in Checkpoint                                                                       | Shapes                         |
|:--------------------------------------|:------------------------------------------------------------------------------------------|:-------------------------------|
| backbone.bn1.*                        | backbone.bn1.{bias,num_batches_tracked,running_mean,running_var,weight}                   | (64,) () (64,) (64,) (64,)     |
| backbone.conv1.weight                 | backbone.conv1.weight                                                                     | (64, 3, 7, 7)                  |
| backbone.layer1.0.bn1.*               | backbone.layer1.0.bn1.{bias,num_batches_tracked,running_mean,running_var,weight}          | (64,) () (64,) (64,) (64,)     |
| backbone.layer1.0.bn2.*               | backbone.layer1.0.bn2.{bias,num_batches_tracked,running_mean,running_var,weight}          | (64,) () (64,) (64,) (64,)     |
| backbone.layer1.0.conv1.weight        | backbone.layer1.0.conv1.weight                                                            | (64, 64, 3, 3)                 |
| backbone.layer1.0.conv2.weight        | backbone.layer1.0.conv2.weight                                                            | (64, 64, 3, 3)                 |
| backbone.layer1.1.bn1.*               | backbone.layer1.1.bn1.{bias,num_batches_tracked,running_mean,running_var,weight}          | (64,) () (64,) (64,) (64,)     |
| backbone.layer1.1.bn2.*               | backbone.layer1.1.bn2.{bias,num_batches_tracked,running_mean,running_var,weight}          | (64,) () (64,) (64,) (64,)     |
| backbone.layer1.1.conv1.weight        | backbone.layer1.1.conv1.weight                                                            | (64, 64, 3, 3)                 |
....
| pnp_net.fc1.*                         | pnp_net.fc1.{bias,weight}                                                                 | (1024,) (1024,8192)            |
| pnp_net.fc2.*                         | pnp_net.fc2.{bias,weight}                                                                 | (256,) (256,1024)              |
| pnp_net.fc_r.*                        | pnp_net.fc_r.{bias,weight}                                                                | (6,) (6,256)                   |
| pnp_net.fc_t.*                        | pnp_net.fc_t.{bias,weight}                                                                | (3,) (3,256)                   |
.....
| rot_head_net.features.0.weight        | rot_head_net.features.0.weight                                                            | (512, 256, 3, 3)               |
| rot_head_net.features.1.*             | rot_head_net.features.1.{bias,num_batches_tracked,running_mean,running_var,weight}        | (256,) () (256,) (256,) (256,) |
| rot_head_net.features.10.weight       | rot_head_net.features.10.weight                                                           | (256, 256, 3, 3)               |

Thank you,
Supriya
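Since the log reports every weight as matched while the backbone output is all zeros, a useful next check is to inspect the model's weights directly rather than trusting the loading log. Below is a minimal sketch of a hypothetical helper (`find_all_zero_entries` is not part of GDR-Net or detectron2) that lists state-dict entries containing only zeros. It assumes the tensors have been moved to CPU so NumPy can convert them.

```python
import numpy as np


def find_all_zero_entries(state_dict, skip_suffixes=("num_batches_tracked",)):
    """Return the names of state-dict entries whose values are all zero.

    Hypothetical helper. `state_dict` is any mapping of name -> array-like;
    CPU torch tensors work because np.asarray can convert them. CUDA tensors
    must be moved to CPU first.
    """
    zero_names = []
    for name, value in state_dict.items():
        if name.endswith(skip_suffixes):
            continue  # BatchNorm step counters can legitimately be 0
        arr = np.asarray(value)
        if arr.size > 0 and not np.any(arr):
            zero_names.append(name)
    return zero_names
```

For example, `find_all_zero_entries({k: v.cpu() for k, v in model.state_dict().items()})`, where `model` stands for the loaded GDR-Net model. A few zero entries can be legitimate, but an entire block such as every `backbone.*` weight being zero would suggest the weights are overwritten after loading.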

@supriya-gdptl
Author

Hi @wangg12,

Could you please tell me which version of detectron2 you used?
The detectron2 website link you shared in README.md is for detectron2 version 0.6 (circled in red in the image below).

[Screenshot: detectron2 version, circled in red]
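When comparing environments, reporting the exact installed versions is more reliable than pointing to a documentation page. A minimal standard-library sketch follows; `environment_report` is a hypothetical helper, and the distribution names ("torch", "detectron2", "fvcore", ...) are assumed to match what pip registered. detectron2 also provides `python -m detectron2.utils.collect_env`, which prints a more complete report.

```python
import platform
from importlib import metadata


def environment_report(packages=("torch", "torchvision", "detectron2", "fvcore")):
    """Collect OS/Python info and installed versions of the given distributions.

    Hypothetical helper: a package that is not installed is reported as None.
    A source build of detectron2 only appears here if it was installed via pip
    (e.g. `pip install -e .`), which registers its metadata.
    """
    report = {"os": platform.system(), "python": platform.python_version()}
    for name in packages:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = None
    return report


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```

Including the `os` field is deliberate here, since Windows versus Ubuntu turned out to be relevant in this thread.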

@wangg12
Member

wangg12 commented Dec 5, 2022

Yes, but I installed it from source. It seems you are running on Windows; could you run the code on Ubuntu?

@supriya-gdptl
Author

Thank you for the suggestion @wangg12.

I figured out the issue.
The backbone features were zero because the backbone weights themselves were zero.
The checkpoint was loaded correctly, but for some unknown reason line 550 in gdrn_evaluator.py was resetting the weights to zero.

I resolved this by loading the checkpoint again after line 550.
I got the following result.

Could you please tell me what each metric in the first column stands for, i.e. what ad_2, rete_2, re_2, te_2, proj_2, re, and te mean?

[Screenshot: evaluation results after reloading the checkpoint]

Thank you,
Supriya
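One way to pinpoint a step that silently overwrites weights, as line 550 in gdrn_evaluator.py did here, is to compare the model's current weights with the checkpoint before and after that step. The sketch below uses a hypothetical helper, `compare_state_dicts`; it assumes both state dicts are on CPU, and that the checkpoint file stores its state dict under a `"model"` key, as fvcore-style checkpoints usually do (e.g. `torch.load(path, map_location="cpu")["model"]`).

```python
import numpy as np


def compare_state_dicts(model_sd, ckpt_sd, atol=0.0):
    """Compare a model's current weights against a checkpoint's weights.

    Hypothetical helper. Returns (missing, mismatched):
      missing    -- names in the checkpoint that the model does not have
      mismatched -- names whose shapes or values differ beyond `atol`
    """
    missing = sorted(set(ckpt_sd) - set(model_sd))
    mismatched = []
    for name in sorted(set(ckpt_sd) & set(model_sd)):
        a = np.asarray(model_sd[name])
        b = np.asarray(ckpt_sd[name])
        if a.shape != b.shape or not np.allclose(a, b, rtol=0.0, atol=atol):
            mismatched.append(name)
    return missing, mismatched
```

If the result is `([], [])` right after loading but lists `backbone.*` entries a few lines later, the culprit lies between those two calls.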

@wangg12
Member

wangg12 commented Dec 5, 2022

You can find what those metrics mean here: https://github.com/THU-DA-6D-Pose-Group/GDR-Net/blob/main/core/gdrn_modeling/gdrn_custom_evaluator.py#L772
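As a rough guide to reading such a table, here is a NumPy sketch of the standard 6D-pose error measures these names most likely refer to. The mapping is an assumption based on common conventions and has not been checked against the linked evaluator: `re`/`te` as mean rotation error (degrees) and translation error; `re_2`/`te_2` as the fraction of samples with rotation error below 2° or translation error below 2 cm; `rete_2` requiring both; `ad_2` as ADD(-S) recall at 2% of the object diameter; and `proj_2` as mean 2D projection error below 2 pixels. All function names are hypothetical.

```python
import numpy as np


def rotation_error_deg(R_est, R_gt):
    """Geodesic angle between two 3x3 rotation matrices, in degrees."""
    cos = (np.trace(np.asarray(R_est).T @ np.asarray(R_gt)) - 1.0) / 2.0
    return float(np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))))


def translation_error(t_est, t_gt):
    """Euclidean distance between translations (same unit as the inputs)."""
    return float(np.linalg.norm(np.asarray(t_est) - np.asarray(t_gt)))


def add_error(R_est, t_est, R_gt, t_gt, pts):
    """ADD: mean distance between corresponding model points under both poses."""
    p_est = pts @ R_est.T + t_est
    p_gt = pts @ R_gt.T + t_gt
    return float(np.linalg.norm(p_est - p_gt, axis=1).mean())


def adds_error(R_est, t_est, R_gt, t_gt, pts):
    """ADD-S (symmetric objects): mean distance to the closest estimated point."""
    p_est = pts @ R_est.T + t_est
    p_gt = pts @ R_gt.T + t_gt
    d = np.linalg.norm(p_gt[:, None, :] - p_est[None, :, :], axis=2)
    return float(d.min(axis=1).mean())


def proj_error(R_est, t_est, R_gt, t_gt, pts, K):
    """Mean 2D distance in pixels between projected model points (intrinsics K)."""
    def project(R, t):
        uv = (pts @ R.T + t) @ K.T
        return uv[:, :2] / uv[:, 2:3]
    return float(np.linalg.norm(project(R_est, t_est) - project(R_gt, t_gt), axis=1).mean())


def recall(errors, threshold):
    """Fraction of samples whose error is strictly below the threshold."""
    return float((np.asarray(errors, dtype=float) < threshold).mean())
```

Under these assumptions, `ad_2` would be `recall(add_errors, 0.02 * diameter)` (using `adds_error` for symmetric objects), and `rete_2` would count samples where both the rotation error is below 2° and the translation error is below 2 cm. Units follow the inputs, so translations need converting to centimetres before applying a cm threshold.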

wangg12 closed this as completed Dec 5, 2022