Can't load data parallel model #3

Closed
NoamRosenberg opened this issue Jul 16, 2019 · 2 comments

Comments

@NoamRosenberg
Owner

NoamRosenberg commented Jul 16, 2019

When I try to load a DataParallel model, I get an error that I can't read, because the terminal gets flooded with the model's layer names, like this. I thought this might have something to do with DataParallel because of the "module." prefix, but neither removing "module." from the state_dict key names nor loading with model.module.load_state_dict works.

...
6.0.conv_p.1.running_var", "module.aspp_16.0.conv_p.1.num_batches_tracked", "module.aspp_16.0.concate_conv.0.weight", "module.aspp_16.0.concate_conv.1.weight", "module.aspp_16.0.concate_conv.1.bias", "module.aspp_16.0.concate_conv.1.running_mean", "module.aspp_16.0.concate_conv.1.running_var", "module.aspp_16.0.concate_conv.1.num_batches_tracked", "module.aspp_32.0.conv11.0.weight", "module.aspp_32.0.conv11.1.weight", "module.aspp_32.0.conv11.1.bias", "module.aspp_32.0.conv11.1.running_mean", "module.aspp_32.0.conv11.1.running_var", "module.aspp_32.0.conv11.1.num_batches_tracked", "module.aspp_32.0.conv33.0.weight", "module.aspp_32.0.conv33.1.weight", "module.aspp_32.0.conv33.1.bias", "module.aspp_32.0.conv33.1.running_mean", "module.aspp_32.0.conv33.1.running_var", "module.aspp_32.0.conv33.1.num_batches_tracked", "module.aspp_32.0.conv_p.0.weight", "module.aspp_32.0.conv_p.1.weight", "module.aspp_32.0.conv_p.1.bias", "module.aspp_32.0.conv_p.1.running_mean", "module.aspp_32.0.conv_p.1.running_var", "module.aspp_32.0.conv_p.1.num_batches_tracked", "module.aspp_32.0.concate_conv.0.weight", "module.aspp_32.0.concate_conv.1.weight", "module.aspp_32.0.concate_conv.1.bias", "module.aspp_32.0.concate_conv.1.running_mean", "module.aspp_32.0.concate_conv.1.running_var", "module.aspp_32.0.concate_conv.1.num_batches_tracked", "module.final_conv.weight", "module.final_conv.bias".
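The flood of names comes from the error message listing every key that doesn't match. Below is a minimal sketch, assuming the checkpoint was saved from an nn.DataParallel wrapper (so every key starts with "module."). It strips that prefix and prints a short summary of mismatched keys instead of the full list. The helpers strip_prefix and summarize_mismatch are hypothetical, and a plain dict with placeholder values stands in for a real state_dict.

```python
from collections import OrderedDict
from typing import Iterable, Mapping


def strip_prefix(state_dict: Mapping, prefix: str = "module.") -> OrderedDict:
    """Return a copy of state_dict with `prefix` removed from keys that start with it.

    Hypothetical helper: nn.DataParallel stores the wrapped model in its
    `module` attribute, so its keys look like "module.final_conv.weight".
    """
    return OrderedDict(
        (key[len(prefix):] if key.startswith(prefix) else key, value)
        for key, value in state_dict.items()
    )


def summarize_mismatch(expected: Iterable[str], loaded: Iterable[str], limit: int = 3) -> str:
    """Describe missing/unexpected keys in two lines instead of dumping them all."""
    expected_set, loaded_set = set(expected), set(loaded)
    missing = sorted(expected_set - loaded_set)
    unexpected = sorted(loaded_set - expected_set)
    return (
        f"{len(missing)} missing, e.g. {missing[:limit]}\n"
        f"{len(unexpected)} unexpected, e.g. {unexpected[:limit]}"
    )


# Key names taken from the log above; the values are placeholders, not tensors.
checkpoint = OrderedDict([
    ("module.final_conv.weight", 0),
    ("module.final_conv.bias", 1),
])
model_keys = ["final_conv.weight", "final_conv.bias"]

print(summarize_mismatch(model_keys, checkpoint.keys()))  # every key mismatches
cleaned = strip_prefix(checkpoint)
print(summarize_mismatch(model_keys, cleaned.keys()))     # nothing left to report
```

With a real model, the cleaned dict would be passed to model.load_state_dict(...). In recent PyTorch releases, load_state_dict(..., strict=False) also returns the missing_keys and unexpected_keys lists, which can be summarized the same way.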

NoamRosenberg changed the title from "Can't load data parallel model" to "Can't load model" on Jul 16, 2019
@NoamRosenberg
Owner Author

NoamRosenberg commented Jul 16, 2019

I had written CUDA_AVAILABLE_DEVICES instead of CUDA_VISIBLE_DEVICES. Later I stripped the "module." wrapper prefix from the keys and loaded the weights as I would for a non-DataParallel model. For now this only works on a single GPU; I still can't load parallel models on multiple GPUs.
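CUDA only reads CUDA_VISIBLE_DEVICES. A variable named CUDA_AVAILABLE_DEVICES is silently ignored, so every GPU stays visible. Below is a minimal sketch of a hypothetical pre-flight check that flags this typo and derives the device_ids one might pass to nn.DataParallel. It relies on CUDA renumbering the visible devices from 0, so with CUDA_VISIBLE_DEVICES=2,3 the process sees cuda:0 and cuda:1. The parsing is simplified and does not validate the entries.

```python
import os
import warnings
from typing import List, Mapping, Optional


def visible_device_ids(env: Optional[Mapping[str, str]] = None) -> Optional[List[int]]:
    """Return the process-local GPU indices implied by CUDA_VISIBLE_DEVICES.

    Hypothetical helper. Returns None when the variable is unset, meaning all
    GPUs are visible and DataParallel can fall back to its default device list.
    """
    env = os.environ if env is None else env
    if "CUDA_AVAILABLE_DEVICES" in env and "CUDA_VISIBLE_DEVICES" not in env:
        warnings.warn(
            "CUDA_AVAILABLE_DEVICES is ignored by CUDA; did you mean CUDA_VISIBLE_DEVICES?"
        )
    visible = env.get("CUDA_VISIBLE_DEVICES")
    if visible is None:
        return None
    entries = [item for item in visible.split(",") if item.strip()]
    # Visible devices are renumbered 0..n-1 inside the process.
    return list(range(len(entries)))


print(visible_device_ids({"CUDA_VISIBLE_DEVICES": "2,3"}))
print(visible_device_ids({"CUDA_AVAILABLE_DEVICES": "2,3"}))  # warns, returns None
```

The variable has to be in the environment before CUDA is initialized. Setting it on the command line (for example `CUDA_VISIBLE_DEVICES=0,1 python your_script.py`) avoids any ordering problems inside the script.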

NoamRosenberg changed the title from "Can't load model" back to "Can't load data parallel model" on Jul 16, 2019
@NoamRosenberg
Owner Author

Okay, for some reason, in this version of PyTorch we can load a DataParallel model's state_dict directly, without explicitly going through .module.

I've committed and pushed a fix.
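Both loading paths discussed in this thread can be sketched with a tiny stand-in nn.Sequential model instead of the project's network (an assumption). A state_dict saved from an nn.DataParallel wrapper keeps the "module." prefix, so it loads directly into another DataParallel wrapper. To load it into the bare model instead, strip the prefix first. nn.DataParallel can also be constructed on a CPU-only machine, where it simply forwards to the wrapped module, so this sketch runs without GPUs.

```python
import io

import torch
import torch.nn as nn


def make_model() -> nn.Module:
    # Tiny stand-in for the real network (an assumption, not the project's model).
    return nn.Sequential(nn.Conv2d(3, 4, kernel_size=1), nn.BatchNorm2d(4))


# Save a checkpoint the way a DataParallel training run would.
trained = nn.DataParallel(make_model())
buffer = io.BytesIO()
torch.save(trained.state_dict(), buffer)  # keys look like "module.0.weight"

# Path 1: wrap a fresh model first, then load; the "module." keys match the wrapper.
buffer.seek(0)
parallel = nn.DataParallel(make_model())
parallel.load_state_dict(torch.load(buffer, map_location="cpu"))

# Path 2: load into the bare model after stripping the "module." prefix.
buffer.seek(0)
state = torch.load(buffer, map_location="cpu")
bare = make_model()
bare.load_state_dict(
    {k[len("module."):] if k.startswith("module.") else k: v for k, v in state.items()}
)

# Both copies now hold the trained weights (compared on CPU in case a GPU was used).
reference = trained.module[0].weight.detach().cpu()
print(torch.equal(parallel.module[0].weight.detach().cpu(), reference))
print(torch.equal(bare[0].weight.detach().cpu(), reference))
```

On a GPU machine, DataParallel expects the wrapped module's parameters to be on device_ids[0] (typically cuda:0) when forward runs. Move the model there, for example with `.to("cuda:0")`, before calling it.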
