About multi-GPU training #1
Do you have any suggestions? I would appreciate it if you could reply. Thanks.
Does this error occur if you use a CPU or a single GPU?
I got results similar to yours. I'm not sure whether it's a code problem or a hyperparameter problem; I tried to email the author of the paper but didn't get a reply. The remaining problem I ran into is that I don't know how to calculate mAP, because some threshold parameters, such as class_threshold and peak_threshold in PRM, are not in the 0-1 range. I hope this is helpful for your future experiments. If you have good ideas or find any bugs, please contact me.
Also, the PRM part of the code was provided by the author.
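Regarding the mAP question above: average precision only needs a ranking of the predicted instances by confidence, so neither the scores nor thresholds such as class_threshold and peak_threshold have to lie in the 0-1 range. Below is a minimal sketch of one common AP definition, assuming each prediction has already been matched to the ground truth as a true or false positive (the helper name and the example numbers are made up for illustration):

# One common definition of average precision from a ranked list of predicted
# instances; only the ordering by score matters, so scores (and thresholds such
# as class_threshold / peak_threshold) do not need to lie in [0, 1].
def average_precision(predictions, num_gt):
    # predictions: list of (score, is_true_positive) pairs, one per predicted instance
    predictions = sorted(predictions, key=lambda p: p[0], reverse=True)
    tp, ap = 0, 0.0
    for rank, (_, is_tp) in enumerate(predictions, start=1):
        if is_tp:
            tp += 1
            ap += tp / rank            # precision at this recall point
    return ap / num_gt if num_gt else 0.0

# Example: scores on an arbitrary (unbounded) scale, two ground-truth instances
print(average_precision([(7.3, True), (2.1, False), (0.4, True)], num_gt=2))   # ~0.83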
OK, thank you for the suggestions! I will keep trying!
Hi, Liu. Thanks for sharing your work. I have run into a problem when training simple-IAM with multiple GPUs: nn.DataParallel works while training the PRM classification network, but training fails when it reaches the IAM stage. Here are my modifications to your code:
self.optimizer_filling = nn.DataParallel(self.optimizer_filling, device_ids=self.Device_ids)
self.optimizer_prm = nn.DataParallel(self.optimizer_prm, device_ids=self.Device_ids)
self.prm_module = nn.DataParallel(peak_response_mapping(self.basebone, **config['model']), device_ids=self.Device_id)
self.filling_module = nn.DataParallel(instance_extent_filling(config), device_ids=self.Device_ids)
self.filling_module.module.load_state_dict(checkpoint['state_dict'], False)
self.prm_module.module.load_state_dict(checkpoint['state_dict'], False)
RuntimeError: Caught RuntimeError in replica 1 on device 1.
Original Traceback (most recent call last):
File "/home/user/anaconda3/envs/CenterMask/lib/python3.7/site-packages/torch/nn/parallel/parallel_apply.py", line 60, in _worker
output = module(*input, **kwargs)
File "/home/user/anaconda3/envs/CenterMask/lib/python3.7/site-packages/torch/nn/modules/module.py", line 550, in call
result = self.forward(*input, **kwargs)
File "/media/ExtHDD/zzp/simple-IAM-master/iam/modules/instance_extent_filling.py", line 105, in forward
self.channel_num, self.kernel, self.kernel)
RuntimeError: shape '[2, 112, 112, 16, 3, 3]' is invalid for input of size 1806336
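For reference, 1,806,336 = 112 x 112 x 16 x 3 x 3, exactly half of the 2 x 112 x 112 x 16 x 3 x 3 = 3,612,672 elements the view expects: nn.DataParallel gives each replica only its slice of the batch (one of the two images), while the reshape in instance_extent_filling.forward still uses the full batch size of 2, presumably taken from the config. A minimal sketch of the failure and of a batch-agnostic reshape; the variable names and the exact view call are assumptions for illustration, not the repository's actual code:

# Sketch of the reshape failure under nn.DataParallel and a batch-agnostic fix
# (assumed names/shapes, not the repository's actual code).
import torch

channel_num = 16
kernel = 3
h = w = 112
config_batch_size = 2                      # full batch size from the config

# Each replica only receives its slice of the batch (here: 1 of the 2 images),
# so the flattened tensor holds 1 * 112 * 112 * 16 * 3 * 3 = 1,806,336 elements.
weight = torch.randn(1, h * w * channel_num * kernel * kernel)

try:
    # Reshaping with the configured (full) batch size fails on the replica:
    weight.view(config_batch_size, h, w, channel_num, kernel, kernel)
except RuntimeError as err:
    print(err)   # shape '[2, 112, 112, 16, 3, 3]' is invalid for input of size 1806336

# Using -1 (or weight.size(0)) lets the leading dimension follow the data that
# is actually present on this replica:
fixed = weight.view(-1, h, w, channel_num, kernel, kernel)
print(fixed.shape)                         # torch.Size([1, 112, 112, 16, 3, 3])

If the reshape in the repository really does hard-code the configured batch size, replacing that dimension with -1 (or the tensor's own size(0)) should let each replica work on its share of the batch.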
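One more note on the modification quoted above: nn.DataParallel is designed to wrap nn.Module instances, so wrapping self.optimizer_prm and self.optimizer_filling in it is probably not what you want. The usual pattern is to wrap only the models and then build the optimizers from the wrapped modules' parameters. A minimal sketch of that pattern, with a toy model and an arbitrary optimizer and learning rate purely for illustration (requires at least two GPUs):

# Sketch of the usual nn.DataParallel pattern; ToyModel, the optimizer choice
# and the learning rate are stand-ins, not the repository's actual code.
import torch
import torch.nn as nn

class ToyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(8, 2)

    def forward(self, x):
        return self.fc(x)

device_ids = [0, 1]

# Wrap the *module* in DataParallel and keep it on the first device ...
model = nn.DataParallel(ToyModel().cuda(device_ids[0]), device_ids=device_ids)

# ... then build the optimizer from the wrapped module's parameters.
# Optimizers are not nn.Modules, so wrapping them in DataParallel is not meaningful.
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

inputs = torch.randn(4, 8).cuda(device_ids[0])   # the batch is split across the GPUs
loss = model(inputs).sum()
loss.backward()
optimizer.step()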