
Paddle's gradient clipping seemingly cannot be triggered; asking for help. #775

Closed
lcy-seso opened this issue Dec 8, 2016 · 4 comments
lcy-seso (Contributor) commented Dec 8, 2016

Paddle implements element-wise hard clipping of the gradient. I enable it by calling this in my config:

default_gradient_clipping_threshold(10)

Printing the parsed configuration confirms that the parameters' gradient clipping threshold has been set successfully.
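For reference, what I expect element-wise hard clipping to do is clamp every entry of the gradient into [-threshold, threshold]. A minimal NumPy sketch of that expected semantics (my own illustration, not Paddle's implementation):

import numpy as np

def hard_clip(grad, threshold=10.0):
    # Element-wise hard clipping: clamp each gradient entry
    # into [-threshold, threshold].
    return np.clip(grad, -threshold, threshold)

print(hard_clip(np.array([-25.0, 3.0, 12.5])))  # [-10.   3.  10.]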

In FirstOrderOptimizer.cpp, OptimizerWithGradientClipping::update implements the actual gradient clipping, but as far as I can tell this function is never called.

The training algorithm settings are as follows:

Settings(algorithm="sgd",
             learning_method="adam",
             learning_rate=5e-4,
             learning_rate_decay_a=0,
             learning_rate_decay_b=0,
             ada_rou=0.95,
             ada_epsilon=1e-6,
             batch_size=4,
             num_batches_per_send_parameter=1,
             num_batches_per_get_parameter=1, )

Right now this is only local single-GPU training, and do_average_in_cpu is not enabled.
In the code: the trainer's init calls createParameterUpdater, which creates a new SgdLocalUpdater and then resets it into an AverageOptimizer; since no window is configured for average SGD, this optimizer does nothing.
But only OptimizerWithRegularizer ever creates an OptimizerWithGradientClipping, so I still cannot see how gradient clipping could be triggered (a sketch of my reading follows).
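To make the suspected logic problem concrete, here is a mock Python sketch of the wrapper chain as I read it. The class names mirror the C++ ones, but the code is only my illustration of the construction paths, not Paddle's API:

class SgdOptimizer:
    def update(self, grad):
        return grad  # plain SGD sees the raw gradient

class OptimizerWithGradientClipping:
    # Only ever created inside OptimizerWithRegularizer.
    def __init__(self, inner, threshold):
        self.inner, self.threshold = inner, threshold
    def update(self, grad):
        clipped = max(-self.threshold, min(self.threshold, grad))
        return self.inner.update(clipped)

class AverageOptimizer:
    # What SgdLocalUpdater wraps the optimizer into; with no
    # averaging window configured it just forwards the gradient.
    def __init__(self, inner):
        self.inner = inner
    def update(self, grad):
        return self.inner.update(grad)

# SgdLocalUpdater path: clipping never enters the chain.
local = AverageOptimizer(SgdOptimizer())
print(local.update(100.0))  # 100.0, unclipped

# Chain that would clip, if OptimizerWithRegularizer were involved:
wrapped = AverageOptimizer(OptimizerWithGradientClipping(SgdOptimizer(), 10.0))
print(wrapped.update(100.0))  # 10.0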

Any pointers would be appreciated, thanks!

lcy-seso (Contributor, Author) commented Dec 8, 2016

It now looks like only the kSgdSparseCpuTraining branch can trigger gradient clipping?

lcy-seso (Contributor, Author) commented Dec 8, 2016

Swapping SgdLocalUpdater for SgdThreadUpdater in the code does fix the gradient clipping problem. I would still like a more thorough fix, though; I suspect the logic of this code is somewhat off.

qingqing01 (Contributor) commented

  • What SgdLocalUpdater supports: its constructor supports AverageOptimizer.
  • What SgdThreadUpdater supports: its init function -> sgdOptimizerCreate, and sgdOptimizerCreate supports both OptimizerWithRegularizer and AverageOptimizer (sketched below).

So can SgdThreadUpdater fully replace SgdLocalUpdater? @emailweixu @reyoung
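A self-contained mock of the two creation paths above (the names mirror the C++ classes, but this is only an illustration, not Paddle's API):

class SgdOptimizer: ...

class AverageOptimizer:
    def __init__(self, inner): self.inner = inner

class OptimizerWithGradientClipping:
    def __init__(self, inner, threshold): self.inner, self.threshold = inner, threshold

def create_for_sgd_local_updater():
    # Constructor path: only AverageOptimizer is layered on, so the
    # clipping wrapper can never appear in the chain.
    return AverageOptimizer(SgdOptimizer())

def create_for_sgd_thread_updater(clip_threshold=None):
    # init -> sgdOptimizerCreate path: OptimizerWithRegularizer (the
    # only creator of OptimizerWithGradientClipping) and averaging
    # can both be applied.
    opt = SgdOptimizer()
    if clip_threshold is not None:
        opt = OptimizerWithGradientClipping(opt, clip_threshold)
    return AverageOptimizer(opt)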

emailweixu (Collaborator) commented

One major difference is that SgdThreadUpdater cannot be used with ConcurrentRemoteParameterUpdater, which is the current default. However, ConcurrentRemoteParameterUpdater hasn't shown a clear advantage over RemoteParameterUpdater yet, so it should be OK to default to RemoteParameterUpdater (--use_old_updater=true).

And I agree that we should default to SgdThreadUpdater. There are a few cases it does not support; as long as we give a clear message about them, it should be fine.
