I thought I'd document a fix for non-distributed training in case other people are trying to use this repo as well.
Even if distributed training isn't selected, you'll get a

```
RuntimeError: Default process group has not been initialized, please make sure to call init_process_group.
```

during the forward call.
This happens in the decoder head because of the SyncBN layer. Find the config file under `configs/_base_/` corresponding to your model and change the `norm_cfg` to use `BN` instead of `SyncBN`. For example, I'm using SeMask-FPN, so in `configs/_base_/models/semfpn_semask_swin.py` I change:
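(The exact lines were omitted above; this is a sketch assuming the stock mmsegmentation-style `norm_cfg`, so verify it against your copy of the config.)

```python
# Default: synchronized BN, which requires torch.distributed to be initialized.
norm_cfg = dict(type='SyncBN', requires_grad=True)
```

to

```python
# Plain BN works without a process group (single-GPU / non-distributed runs).
norm_cfg = dict(type='BN', requires_grad=True)
```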
The two layer types seem to be fully weight-compatible, so checkpoints trained with SyncBN load into a BN model without issue. Just a warning: you'll need to hand-edit this setting whenever you switch between regular and distributed training.
You can quickly run a single-GPU process with the `tools/dist_train.sh` script by setting `GPUS=1`; you won't have to hardcode anything in that case (see the sketch below). Is there some disadvantage to doing that that made you run the regular training instead?
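For reference, assuming the repo keeps the stock mmsegmentation `dist_train.sh` interface (config path first, GPU count second; the config path here is a placeholder), the single-GPU launch looks like:

```bash
# Launches train.py through torch.distributed with one process,
# so init_process_group is called and SyncBN works unchanged.
./tools/dist_train.sh path/to/your_config.py 1
```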
This is a great point, and I don't think there is any issue with just using the dist_train.sh script instead of running train.py directly. Thanks for the great work btw, I've used SeMask for a few projects and the pre-trained weights are really helpful!