train my own data error #42

Open
zhepherd opened this issue May 5, 2020 · 2 comments

zhepherd commented May 5, 2020

2020-05-05 00:42:16,806 - mmdet - INFO - Start running, host: huangzhipeng@k8s-deploy-rod9ow-1567512049745-7b4474f8b7-b8k8k, work_dir: /nfs/project/huangzhipeng/tools/opensorce/SOLO/work_dirs/decoupled_solo_release_r50_fpn_8gpu_3x
2020-05-05 00:42:16,808 - mmdet - INFO - workflow: [('train', 1)], max: 36 epochs
/tmp-data/huangzhipeng/anaconda3/envs/solo/lib/python3.7/site-packages/torch/nn/functional.py:2506: UserWarning: Default upsampling behavior when mode=bilinear is changed to align_corners=False since 0.4.0. Please specify align_corners=True if the old behavior is desired. See the documentation of nn.Upsample for details.
"See the documentation of nn.Upsample for details.".format(mode))
/tmp-data/huangzhipeng/anaconda3/envs/solo/lib/python3.7/site-packages/torch/nn/functional.py:2506: UserWarning: Default upsampling behavior when mode=bilinear is changed to align_corners=False since 0.4.0. Please specify align_corners=True if the old behavior is desired. See the documentation of nn.Upsample for details.
"See the documentation of nn.Upsample for details.".format(mode))
/tmp-data/huangzhipeng/anaconda3/envs/solo/lib/python3.7/site-packages/torch/nn/functional.py:2506: UserWarning: Default upsampling behavior when mode=bilinear is changed to align_corners=False since 0.4.0. Please specify align_corners=True if the old behavior is desired. See the documentation of nn.Upsample for details.
"See the documentation of nn.Upsample for details.".format(mode))
/tmp-data/huangzhipeng/anaconda3/envs/solo/lib/python3.7/site-packages/torch/nn/functional.py:2506: UserWarning: Default upsampling behavior when mode=bilinear is changed to align_corners=False since 0.4.0. Please specify align_corners=True if the old behavior is desired. See the documentation of nn.Upsample for details.
"See the documentation of nn.Upsample for details.".format(mode))
Traceback (most recent call last):
  File "./tools/train.py", line 125, in <module>
    main()
  File "./tools/train.py", line 121, in main
    timestamp=timestamp)
  File "/nfs/project/huangzhipeng/tools/opensorce/SOLO/mmdet/apis/train.py", line 103, in train_detector
    timestamp=timestamp)
  File "/nfs/project/huangzhipeng/tools/opensorce/SOLO/mmdet/apis/train.py", line 250, in dist_train
    runner.run(data_loaders, cfg.workflow, cfg.total_epochs)
  File "/tmp-data/huangzhipeng/anaconda3/envs/solo/lib/python3.7/site-packages/mmcv/runner/runner.py", line 364, in run
    epoch_runner(data_loaders[i], **kwargs)
  File "/tmp-data/huangzhipeng/anaconda3/envs/solo/lib/python3.7/site-packages/mmcv/runner/runner.py", line 268, in train
    self.model, data_batch, train_mode=True, **kwargs)
  File "/nfs/project/huangzhipeng/tools/opensorce/SOLO/mmdet/apis/train.py", line 79, in batch_processor
    loss, log_vars = parse_losses(losses)
  File "/nfs/project/huangzhipeng/tools/opensorce/SOLO/mmdet/apis/train.py", line 56, in parse_losses
    dist.all_reduce(loss_value.div_(dist.get_world_size()))
  File "/tmp-data/huangzhipeng/anaconda3/envs/solo/lib/python3.7/site-packages/torch/distributed/distributed_c10d.py", line 902, in all_reduce
    work = _default_pg.allreduce([tensor], opts)
RuntimeError: Socket Timeout

zhepherd commented May 5, 2020

image

@lucasjinreal

@zhepherd How did you change the config to match your classes?
