Currently, mala rescales the learning rate by the number of workers. This follows the linear scaling rule (Goyal et al.) only indirectly, and only when the mini-batch size is itself computed from the number of workers. That is not always the case, since the mini-batch size can be fixed independently. I think we should scale the LR by the mini-batch size instead, so that the linear scaling rule is followed in both cases.
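To make the proposal concrete, here is a minimal sketch of scaling by mini-batch size rather than by worker count. The function names, the reference batch size, and the example values are all hypothetical and are not taken from the mala code base; the sketch only assumes the linear scaling rule in its usual form, where the learning rate is multiplied by the same factor as the mini-batch size.

```python
def effective_batch_size(per_worker_batch_size: int, num_workers: int) -> int:
    """Global mini-batch size when each worker processes its own batch.

    Hypothetical helper: only applies when the batch size is defined per worker.
    """
    return per_worker_batch_size * num_workers


def scale_learning_rate(base_lr: float, base_batch_size: int, batch_size: int) -> float:
    """Linear scaling rule: multiply the LR by the batch-size ratio.

    base_lr is assumed to have been tuned for base_batch_size.
    """
    if base_batch_size <= 0 or batch_size <= 0:
        raise ValueError("batch sizes must be positive")
    return base_lr * batch_size / base_batch_size


if __name__ == "__main__":
    base_lr, base_batch = 0.1, 256  # illustrative reference values
    num_workers = 4

    # Case 1: batch size grows with the number of workers.
    grown = effective_batch_size(base_batch, num_workers)
    print(scale_learning_rate(base_lr, base_batch, grown))

    # Case 2: global batch size is fixed separately, so the LR stays unchanged,
    # whereas scaling by num_workers would still multiply it by 4.
    print(scale_learning_rate(base_lr, base_batch, base_batch))
```

In the first case the result matches scaling by the number of workers; in the second the two approaches diverge, which is the situation described above.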