Make learning rate rescale based on mini-batch size rather than # workers #422

kyledmiller · 2023-02-28T17:34:32Z

Currently, mala rescales the learning rate by the number of workers. This follows the linear scaling rule (Goyal et al.) only indirectly, and only when the mini-batch size is itself derived from the number of workers. That isn't always the case, since the mini-batch size can be fixed separately. I think we should scale the LR by the mini-batch size instead, so that we actually follow the linear scaling rule.
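
To make the difference concrete, here is a minimal sketch of the two scaling strategies. It is not mala's actual code: the function names, the `reference_batch_size` parameter, and the example values are all hypothetical, and it assumes "mini-batch size" means the global batch size seen per optimizer step.

```python
def scale_lr_by_workers(base_lr, num_workers):
    """Scale the LR by worker count (the behavior described above)."""
    if num_workers <= 0:
        raise ValueError("num_workers must be positive")
    return base_lr * num_workers


def scale_lr_by_batch_size(base_lr, reference_batch_size, global_batch_size):
    """Linear scaling rule: the LR grows in proportion to the mini-batch size.

    reference_batch_size is the (assumed) batch size base_lr was tuned for.
    """
    if reference_batch_size <= 0 or global_batch_size <= 0:
        raise ValueError("batch sizes must be positive")
    return base_lr * global_batch_size / reference_batch_size


# Hypothetical setup: base_lr tuned for a mini-batch of 256, run on 4 workers.
base_lr = 0.1
num_workers = 4

# Case 1: the global mini-batch grows with the workers (256 per worker).
# Both strategies give the same LR.
lr_workers = scale_lr_by_workers(base_lr, num_workers)
lr_batch = scale_lr_by_batch_size(base_lr, 256, 256 * num_workers)

# Case 2: the global mini-batch is fixed at 256 regardless of worker count.
# Worker-based scaling still multiplies the LR by 4; batch-based scaling
# leaves it unchanged.
lr_workers_fixed = scale_lr_by_workers(base_lr, num_workers)
lr_batch_fixed = scale_lr_by_batch_size(base_lr, 256, 256)
```

In the second case the two strategies disagree by a factor equal to the number of workers, which is the mismatch this issue is about.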

RandomDefaultUser added this to the v1.2.0 - Seizing the means of production milestone Apr 21, 2023

RandomDefaultUser removed this from the v1.2.0 - Seizing the means of production milestone Jun 7, 2023
