
Paddle's gradient clipping seemingly cannot be triggered; asking for help. #775

Closed
lcy-seso opened this issue Dec 8, 2016 · 4 comments
lcy-seso (Contributor) commented Dec 8, 2016

Paddle implements element-wise hard clipping of the gradient. I enable it by calling this in my config:

default_gradient_clipping_threshold(10)

Printing the parsed configuration confirms that the parameters' gradient clipping threshold has been set successfully.
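For reference, what I expect element-wise hard clipping to do is clamp every entry of the gradient into [-threshold, threshold]. A minimal NumPy sketch of that expected semantics (my own illustration, not Paddle's implementation):

import numpy as np

def hard_clip(grad, threshold=10.0):
    # Element-wise hard clipping: clamp each gradient entry
    # into [-threshold, threshold].
    return np.clip(grad, -threshold, threshold)

print(hard_clip(np.array([-25.0, 3.0, 12.5])))  # [-10.   3.  10.]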

In FirstOrderOptimizer.cpp, OptimizerWithGradientClipping::update implements the actual gradient clipping, but as far as I can tell this function is never called.

The training algorithm settings are as follows:

Settings(algorithm="sgd",
             learning_method="adam",
             learning_rate=5e-4,
             learning_rate_decay_a=0,
             learning_rate_decay_b=0,
             ada_rou=0.95,
             ada_epsilon=1e-6,
             batch_size=4,
             num_batches_per_send_parameter=1,
             num_batches_per_get_parameter=1, )

Right now this is only local single-GPU training, and do_average_in_cpu is not enabled.
In the code: the trainer's init calls createParameterUpdater, which creates a new SgdLocalUpdater and then resets it into an AverageOptimizer; since no window is configured for average SGD, this optimizer does nothing.
But only OptimizerWithRegularizer ever creates an OptimizerWithGradientClipping, so I still cannot see how gradient clipping could be triggered (a sketch of my reading follows).
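To make the suspected logic problem concrete, here is a mock Python sketch of the wrapper chain as I read it. The class names mirror the C++ ones, but the code is only my illustration of the construction paths, not Paddle's API:

class SgdOptimizer:
    def update(self, grad):
        return grad  # plain SGD sees the raw gradient

class OptimizerWithGradientClipping:
    # Only ever created inside OptimizerWithRegularizer.
    def __init__(self, inner, threshold):
        self.inner, self.threshold = inner, threshold
    def update(self, grad):
        clipped = max(-self.threshold, min(self.threshold, grad))
        return self.inner.update(clipped)

class AverageOptimizer:
    # What SgdLocalUpdater wraps the optimizer into; with no
    # averaging window configured it just forwards the gradient.
    def __init__(self, inner):
        self.inner = inner
    def update(self, grad):
        return self.inner.update(grad)

# SgdLocalUpdater path: clipping never enters the chain.
local = AverageOptimizer(SgdOptimizer())
print(local.update(100.0))  # 100.0, unclipped

# Chain that would clip, if OptimizerWithRegularizer were involved:
wrapped = AverageOptimizer(OptimizerWithGradientClipping(SgdOptimizer(), 10.0))
print(wrapped.update(100.0))  # 10.0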

Any pointers would be appreciated, thanks!

lcy-seso (Contributor, Author) commented Dec 8, 2016

It now looks like only the kSgdSparseCpuTraining branch can trigger gradient clipping?

lcy-seso (Contributor, Author) commented Dec 8, 2016

Swapping SgdLocalUpdater for SgdThreadUpdater in the code does fix the gradient clipping problem. I would still like a more thorough fix, though; I suspect the logic of this code is somewhat off.

qingqing01 (Contributor) commented

  • What SgdLocalUpdater supports: its constructor supports AverageOptimizer.
  • What SgdThreadUpdater supports: its init function -> sgdOptimizerCreate, and sgdOptimizerCreate supports both OptimizerWithRegularizer and AverageOptimizer (sketched below).

So can SgdThreadUpdater fully replace SgdLocalUpdater? @emailweixu @reyoung
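A self-contained mock of the two creation paths above (the names mirror the C++ classes, but this is only an illustration, not Paddle's API):

class SgdOptimizer: ...

class AverageOptimizer:
    def __init__(self, inner): self.inner = inner

class OptimizerWithGradientClipping:
    def __init__(self, inner, threshold): self.inner, self.threshold = inner, threshold

def create_for_sgd_local_updater():
    # Constructor path: only AverageOptimizer is layered on, so the
    # clipping wrapper can never appear in the chain.
    return AverageOptimizer(SgdOptimizer())

def create_for_sgd_thread_updater(clip_threshold=None):
    # init -> sgdOptimizerCreate path: OptimizerWithRegularizer (the
    # only creator of OptimizerWithGradientClipping) and averaging
    # can both be applied.
    opt = SgdOptimizer()
    if clip_threshold is not None:
        opt = OptimizerWithGradientClipping(opt, clip_threshold)
    return AverageOptimizer(opt)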

emailweixu (Collaborator) commented

One major difference is that SgdThreadUpdater cannot be used with ConcurrentRemoteParameterUpdater, which is the current default. However, ConcurrentRemoteParameterUpdater hasn't shown a clear advantage over RemoteParameterUpdater yet, so it should be OK to default to RemoteParameterUpdater (--use_old_updater=true).

And I agree that we should default to SgdThreadUpdater. There are a few cases it does not support; as long as we give a clear message about them, it should be fine.
