
Support weight_decay (L2 actually) #5587

Merged 3 commits into master from feat-SGD_weight_decay on Jul 23, 2021

Conversation

@wyg1997 (Contributor) commented on Jul 23, 2021:

SGD now supports the weight_decay parameter, and the documentation has been updated:

[screenshot: updated SGD docstring showing the new weight_decay parameter]

However, the underlying SGD computation is not yet aligned with torch; this still needs to be aligned at the kernel level in a follow-up.
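A minimal usage sketch of the new parameter, based on the signature in the diff further down; the exact module path (flow.optim.SGD) is assumed here:

```python
import oneflow as flow
import oneflow.nn as nn

model = nn.Linear(4, 2)

# weight_decay adds an L2 penalty term (weight_decay * w) to each
# parameter's gradient, mirroring PyTorch's SGD keyword argument.
optimizer = flow.optim.SGD(
    model.parameters(), lr=1e-3, momentum=0.9, weight_decay=1e-4
)

x = flow.randn(8, 4)
loss = model(x).sum()
loss.backward()
optimizer.step()
optimizer.zero_grad()
```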

@wyg1997 wyg1997 requested a review from BBuf July 23, 2021 12:09
@wyg1997 wyg1997 requested review from oneflow-ci-bot and removed request for oneflow-ci-bot July 23, 2021 12:18
@oneflow-ci-bot oneflow-ci-bot requested review from oneflow-ci-bot and removed request for oneflow-ci-bot July 23, 2021 12:51
@github-actions (Contributor) commented:

Speed stats:
GPU Name: GeForce GTX 1080 

PyTorch resnet50 time: 136.3ms (= 6814.0ms / 50, input_shape=[16, 3, 224, 224], backward is enabled)
OneFlow resnet50 time: 123.9ms (= 6192.5ms / 50, input_shape=[16, 3, 224, 224], backward is enabled)
Relative speed: 1.10 (= 136.3ms / 123.9ms)

PyTorch resnet50 time: 83.7ms (= 4183.6ms / 50, input_shape=[8, 3, 224, 224], backward is enabled)
OneFlow resnet50 time: 72.3ms (= 3613.2ms / 50, input_shape=[8, 3, 224, 224], backward is enabled)
Relative speed: 1.16 (= 83.7ms / 72.3ms)

PyTorch resnet50 time: 58.3ms (= 2913.6ms / 50, input_shape=[4, 3, 224, 224], backward is enabled)
OneFlow resnet50 time: 52.0ms (= 2600.7ms / 50, input_shape=[4, 3, 224, 224], backward is enabled)
Relative speed: 1.12 (= 58.3ms / 52.0ms)

PyTorch resnet50 time: 47.4ms (= 2371.2ms / 50, input_shape=[2, 3, 224, 224], backward is enabled)
OneFlow resnet50 time: 50.5ms (= 2523.7ms / 50, input_shape=[2, 3, 224, 224], backward is enabled)
Relative speed: 0.94 (= 47.4ms / 50.5ms)

PyTorch resnet50 time: 43.0ms (= 2148.8ms / 50, input_shape=[1, 3, 224, 224], backward is enabled)
OneFlow resnet50 time: 50.6ms (= 2527.7ms / 50, input_shape=[1, 3, 224, 224], backward is enabled)
Relative speed: 0.85 (= 43.0ms / 50.6ms)

@oneflow-ci-bot oneflow-ci-bot merged commit 5c7bab4 into master Jul 23, 2021
@oneflow-ci-bot oneflow-ci-bot deleted the feat-SGD_weight_decay branch July 23, 2021 13:52
@@ -60,16 +62,19 @@ def __init__(
parameters: Union[Iterator[Parameter], List[Dict]],
lr: float = 1e-3,
momentum: float = 0.0,
weight_decay: float = 0.0, # SGD's weight_decay actually does L2 Normalize
Contributor:
Is this L2 semantically consistent with adam's L2 or with adamW's? @leaves-zwx @strint @wyg1997 (That is: does it match the WeightDecay parameter in Optimizer, or the L2 semantics in VariableConf?) If it is the latter, then the SGD in the Lazy nn.Graph Optimizer will have to handle setting L2 on the Variable.

Note: the difference between the two is when the WeightDecay term is computed.

Contributor:
The weight_decay here (SGD) is L2 regularization: it is computed before the momentum update. If there were an SGDW (PyTorch does not implement one), its weight_decay would instead be applied after the momentum update (original-style weight_decay). If the momentum beta is 0 (SGD without momentum), the two are identical.
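To make the timing difference concrete, here is a minimal NumPy-style sketch of the two update rules described above (illustrative only, not OneFlow's actual kernel code):

```python
import numpy as np

def sgd_l2_step(w, grad, buf, lr=0.1, beta=0.9, wd=1e-4):
    # L2 regularization (PyTorch-style SGD): the decay term is folded
    # into the gradient BEFORE the momentum update.
    grad = grad + wd * w
    buf = beta * buf + grad
    return w - lr * buf, buf

def sgdw_step(w, grad, buf, lr=0.1, beta=0.9, wd=1e-4):
    # Decoupled weight decay (SGDW): momentum sees only the raw
    # gradient; the decay is applied to the weights AFTER the update.
    buf = beta * buf + grad
    return w - lr * buf - lr * wd * w, buf

# With beta == 0 both reduce to: w - lr * grad - lr * wd * w
```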

Contributor:
Then as I understand it, the WeightDecay in the PyTorch-aligned SGD Optimizer is actually not the WeightDecay in our Lazy OptimizerConf, but the L2 parameter of the Regularizer in the Lazy VariableOpConf? @leaves-zwx @strint If so, nn.Graph's SGD will have to handle writing this WeightDecay parameter into the corresponding VariableOpConf.

Contributor:
It feels like nn.Graph's add_optimizer will need to check the optimizer's type.

Contributor:
Yes. With weight_decay added to SGD here, we can build on this optimizer to do the Variable L2 PR next. add_optimizer will perceive the type indirectly, in that different optimizer types produce different data.
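A hedged sketch of the kind of type-aware dispatch being discussed; every helper name below (set_variable_l2, set_optimizer_weight_decay) is hypothetical and not OneFlow's actual nn.Graph API:

```python
# Hypothetical illustration of add_optimizer branching on optimizer type
# to decide where the decay term lands in the lazy graph configuration.
def add_optimizer(graph_conf, optimizer):
    for group in optimizer.param_groups:
        wd = group.get("weight_decay", 0.0)
        if isinstance(optimizer, SGD):
            # PyTorch-aligned SGD semantics: write wd as the L2
            # regularizer on each VariableOpConf.
            for param in group["params"]:
                graph_conf.set_variable_l2(param, wd)
        else:
            # Decoupled-decay optimizers keep OptimizerConf WeightDecay.
            graph_conf.set_optimizer_weight_decay(optimizer, wd)
```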
