Error running demo.py: a parameter group that doesn't match the size of optimizer's group #1

Closed
nicolasugrinovic opened this issue Jun 16, 2020 · 4 comments

nicolasugrinovic commented Jun 16, 2020

Hi, kudos for the great work! Thank you for sharing the code.

When trying to run demo.py with the command

python3 tools/demo.py --config=configs/smpl/tune.py --image_folder=demo_images/ --output_folder=results/ --ckpt data/checkpoint.pt

I get the following error:

  File "/miniconda3/envs/multiperson/lib/python3.7/site-packages/mmcv/runner/runner.py", line 313, in resume
    self.optimizer.load_state_dict(checkpoint['optimizer'])
  File "/miniconda3/envs/multiperson/lib/python3.7/site-packages/torch/optim/optimizer.py", line 115, in load_state_dict
    raise ValueError("loaded state dict contains a parameter group "
ValueError: loaded state dict contains a parameter group that doesn't match the size of optimizer's group

It seems to be triggered by a mismatch between the checkpoint and the model architecture. I also get the following message:

unexpected key in source state_dict: fc.weight, fc.bias
missing keys in source state_dict: layer3.0.bn2.num_batches_tracked, layer1.2.bn1.num_batches_tracked, layer2.0.bn2.num_batches_tracked, layer2.1.bn1.num_batches_tracked, layer3.0.bn3.num_batches_tracked, layer3.2.bn2.num_batches_tracked, layer1.2.bn2.num_batches_tracked, layer4.2.bn3.num_batches_tracked, layer4.2.bn2.num_batches_tracked, layer3.3.bn1.num_batches_tracked, layer2.1.bn2.num_batches_tracked, layer3.2.bn1.num_batches_tracked, layer2.3.bn2.num_batches_tracked, layer3.0.downsample.1.num_batches_tracked, layer3.3.bn2.num_batches_tracked, layer3.4.bn3.num_batches_tracked, layer1.0.bn2.num_batches_tracked, layer3.5.bn1.num_batches_tracked, layer1.1.bn3.num_batches_tracked, layer3.5.bn3.num_batches_tracked, layer4.0.downsample.1.num_batches_tracked, layer4.1.bn3.num_batches_tracked, layer2.3.bn1.num_batches_tracked, layer3.4.bn1.num_batches_tracked, layer2.2.bn2.num_batches_tracked, layer4.0.bn1.num_batches_tracked, layer2.1.bn3.num_batches_tracked, layer1.1.bn1.num_batches_tracked, layer3.0.bn1.num_batches_tracked, layer2.2.bn1.num_batches_tracked, layer1.0.downsample.1.num_batches_tracked, layer2.0.bn3.num_batches_tracked, layer1.1.bn2.num_batches_tracked, layer4.2.bn1.num_batches_tracked, layer3.2.bn3.num_batches_tracked, layer4.1.bn1.num_batches_tracked, layer2.0.downsample.1.num_batches_tracked, layer1.2.bn3.num_batches_tracked, layer3.1.bn2.num_batches_tracked, layer4.0.bn2.num_batches_tracked, layer3.3.bn3.num_batches_tracked, layer2.0.bn1.num_batches_tracked, layer2.3.bn3.num_batches_tracked, layer4.0.bn3.num_batches_tracked, bn1.num_batches_tracked, layer3.4.bn2.num_batches_tracked, layer4.1.bn2.num_batches_tracked, layer3.1.bn3.num_batches_tracked, layer1.0.bn1.num_batches_tracked, layer2.2.bn3.num_batches_tracked, layer3.1.bn1.num_batches_tracked, layer1.0.bn3.num_batches_tracked, layer3.5.bn2.num_batches_tracked
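
Both warnings come from comparing the parameter/buffer names stored in the checkpoint with the names the model expects. Below is a minimal sketch of that comparison; it is not mmcv's actual implementation, and the key sets are made up to mirror the log. A plausible reading of the log: the `fc.*` keys belong to a classification head that the backbone does not have, and `num_batches_tracked` is a BatchNorm buffer that, as far as I know, was introduced in PyTorch 0.4.1, so checkpoints saved with older versions lack it.

```python
def diff_state_dict_keys(model_keys, checkpoint_keys):
    """Compare key names the way the warnings above report them (a sketch).

    Returns (unexpected, missing):
      unexpected: keys present in the checkpoint but not in the model
      missing:    keys the model expects but the checkpoint lacks
    """
    model_keys, checkpoint_keys = set(model_keys), set(checkpoint_keys)
    unexpected = sorted(checkpoint_keys - model_keys)
    missing = sorted(model_keys - checkpoint_keys)
    return unexpected, missing


# Hypothetical key sets: a backbone without a classifier head, and an
# older checkpoint that has an `fc` head but no num_batches_tracked buffer
model_keys = ["conv1.weight", "bn1.weight", "bn1.bias", "bn1.running_mean",
              "bn1.running_var", "bn1.num_batches_tracked"]
checkpoint_keys = ["conv1.weight", "bn1.weight", "bn1.bias", "bn1.running_mean",
                   "bn1.running_var", "fc.weight", "fc.bias"]

unexpected, missing = diff_state_dict_keys(model_keys, checkpoint_keys)
print("unexpected key in source state_dict:", ", ".join(unexpected))
print("missing keys in source state_dict:", ", ".join(missing))
```

Under these assumptions the warnings concern weight loading only; they are a separate issue from the optimizer `ValueError`.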

nkolot (Collaborator) commented Jun 16, 2020

Can you show me the full error traceback?
Also, did you install our version of mmcv? We have made some modifications to the original version used in mmdetection.

nicolasugrinovic (Author) commented

Thanks for the quick reply!

I installed mmcv exactly as described in your README.
Here is the full traceback:

unexpected key in source state_dict: fc.weight, fc.bias

missing keys in source state_dict: layer4.0.bn2.num_batches_tracked, layer3.4.bn1.num_batches_tracked, layer4.0.downsample.1.num_batches_tracked, layer1.0.downsample.1.num_batches_tracked, layer3.2.bn1.num_batches_tracked, layer1.2.bn3.num_batches_tracked, layer2.3.bn1.num_batches_tracked, layer2.0.bn2.num_batches_tracked, layer3.5.bn3.num_batches_tracked, layer3.0.downsample.1.num_batches_tracked, layer2.2.bn1.num_batches_tracked, layer1.0.bn1.num_batches_tracked, layer4.1.bn3.num_batches_tracked, layer1.1.bn2.num_batches_tracked, layer1.1.bn3.num_batches_tracked, layer3.0.bn1.num_batches_tracked, layer2.1.bn2.num_batches_tracked, layer2.1.bn3.num_batches_tracked, layer4.1.bn2.num_batches_tracked, layer2.0.bn1.num_batches_tracked, layer3.3.bn3.num_batches_tracked, layer4.0.bn3.num_batches_tracked, layer3.4.bn3.num_batches_tracked, layer2.2.bn2.num_batches_tracked, layer2.0.downsample.1.num_batches_tracked, layer1.2.bn2.num_batches_tracked, layer2.3.bn2.num_batches_tracked, layer4.1.bn1.num_batches_tracked, layer3.3.bn2.num_batches_tracked, layer3.3.bn1.num_batches_tracked, layer3.1.bn3.num_batches_tracked, layer3.0.bn3.num_batches_tracked, layer3.1.bn2.num_batches_tracked, layer3.4.bn2.num_batches_tracked, layer2.1.bn1.num_batches_tracked, layer4.2.bn1.num_batches_tracked, layer1.2.bn1.num_batches_tracked, layer3.2.bn2.num_batches_tracked, layer1.1.bn1.num_batches_tracked, layer4.2.bn3.num_batches_tracked, layer1.0.bn3.num_batches_tracked, layer3.1.bn1.num_batches_tracked, layer3.0.bn2.num_batches_tracked, layer1.0.bn2.num_batches_tracked, layer2.3.bn3.num_batches_tracked, layer2.0.bn3.num_batches_tracked, layer3.5.bn2.num_batches_tracked, layer4.0.bn1.num_batches_tracked, layer2.2.bn3.num_batches_tracked, layer3.5.bn1.num_batches_tracked, layer3.2.bn3.num_batches_tracked, bn1.num_batches_tracked, layer4.2.bn2.num_batches_tracked

2020-06-16 13:45:31,001 - INFO - load checkpoint from data/checkpoint.pt
2020-06-16 13:45:31,247 - WARNING - missing keys in source state_dict: smpl_head.smpl.v_template, smpl_head.loss.smpl.J_regressor_extra, smpl_head.loss.smpl.parents, smpl_head.smpl.posedirs, smpl_head.smpl.vertex_joint_selector.extra_joints_idxs, smpl_head.smpl.lbs_weights, smpl_head.smpl.J_regressor_extra, smpl_head.loss.smpl.vertex_joint_selector.extra_joints_idxs, smpl_head.loss.smpl.J_regressor, smpl_head.smpl.parents, smpl_head.smpl.J_regressor, smpl_head.loss.smpl.shapedirs, smpl_head.loss.smpl.posedirs, smpl_head.smpl.shapedirs, smpl_head.loss.smpl.faces_tensor, smpl_head.loss.smpl.v_template, smpl_head.loss.smpl.lbs_weights, smpl_head.smpl.faces_tensor

Traceback (most recent call last):
  File "tools/demo.py", line 190, in <module>
    main()
  File "tools/demo.py", line 149, in main
    runner.resume(cfg.resume_from)
  File "/home/nugrinovic/miniconda3/envs/multiperson/lib/python3.7/site-packages/mmcv/runner/runner.py", line 313, in resume
    self.optimizer.load_state_dict(checkpoint['optimizer'])
  File "/home/nugrinovic/miniconda3/envs/multiperson/lib/python3.7/site-packages/torch/optim/optimizer.py", line 115, in load_state_dict
    raise ValueError("loaded state dict contains a parameter group "
ValueError: loaded state dict contains a parameter group that doesn't match the size of optimizer's group

nkolot (Collaborator) commented Jun 16, 2020

Judging from your traceback, line 313 of mmcv/runner/runner.py is different from the one in our repo, where it reads:

self.mode = 'val'

In our version, the optimizer state is actually loaded at line 350:

self.optimizer.load_state_dict(checkpoint['optimizer'])

I suspect that a different version of mmcv got installed at some point. We had encountered this issue in the past and explicitly put the optimizer state loading under a try/except block.
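
A minimal sketch of that kind of guard, written in plain PyTorch rather than copied from the modified mmcv. The `resume` helper and the checkpoint layout `{"state_dict": ..., "optimizer": ...}` are assumptions for illustration. The usage reproduces the error: the checkpoint's optimizer tracked one parameter, while the current optimizer's group has two.

```python
import torch
import torch.nn as nn


def resume(model, optimizer, checkpoint):
    """Hypothetical helper: restore weights, then try the optimizer state.

    Returns True if the optimizer state was restored, False if it was skipped.
    """
    # Assumed checkpoint layout: {"state_dict": ..., "optimizer": ...}
    model.load_state_dict(checkpoint["state_dict"])
    try:
        optimizer.load_state_dict(checkpoint["optimizer"])
    except ValueError as exc:
        # PyTorch raises ValueError when the saved parameter groups do not
        # line up with the current optimizer's groups
        print(f"Skipping optimizer state: {exc}")
        return False
    return True


model = nn.Linear(4, 2)
# Checkpoint saved while only the weight was being optimized (1 parameter)
old_opt = torch.optim.SGD([model.weight], lr=0.1)
ckpt = {"state_dict": model.state_dict(), "optimizer": old_opt.state_dict()}

# Current optimizer covers weight and bias (2 parameters in one group)
new_opt = torch.optim.SGD(model.parameters(), lr=0.1)
restored = resume(model, new_opt, ckpt)
```

Here `restored` is `False`, and the printed message is the same "doesn't match the size of optimizer's group" error from the traceback. Skipping the optimizer state discards things like momentum buffers. For a demo that only runs inference, that is presumably harmless, but the same guard would silently reset optimizer state when resuming training.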

A quick solution would be to run

rm -rf /home/nugrinovic/miniconda3/envs/multiperson/lib/python3.7/site-packages/mmcv*

and then reinstall our version of mmcv. You might have to reinstall mmdetection if you do that.
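
Before deleting anything, it may help to confirm which copy of mmcv Python actually imports. Below is a small standard-library sketch. The `describe_install` helper is hypothetical, and `importlib.metadata` requires Python 3.8+, so it would not run unmodified in the Python 3.7 environment above.

```python
import importlib.util
from importlib import metadata  # Python 3.8+


def describe_install(dist_name, module_name=None):
    """Return (installed distribution version, file that the import would load).

    Either value is None if it cannot be found.
    """
    module_name = module_name or dist_name
    # find_spec returns None for a top-level module that is not installed
    spec = importlib.util.find_spec(module_name)
    location = spec.origin if spec is not None else None
    try:
        version = metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        version = None
    return version, location


if __name__ == "__main__":
    print(describe_install("mmcv"))
```

If the reported path or version differs from the copy you meant to install, a stray mmcv is likely shadowing it.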

nicolasugrinovic (Author) commented

That was exactly it! Somehow, I had another version installed.
Now it is working, thanks!
