The hyper parameters of paddle.optimizer do not work in the v2 API. #2042

Closed
qingqing01 opened this issue May 7, 2017 · 6 comments

qingqing01 commented May 7, 2017

The hyper parameters in paddle.optimizer do not work in the v2 API. For example, configure the Momentum optimizer in the sentiment demo as follows:

    optimizer = paddle.optimizer.Momentum(
        learning_rate=2e-3,
        momentum=0.9,
        gradient_clipping_threshold=25.0,
        regularization=paddle.optimizer.L2Regularization(rate=8e-4),
        model_average=paddle.optimizer.ModelAverage(average_window=0.5))

Then print the proto-string of the config before this line in python/paddle/v2/trainer.py; it shows that the proto-string of the parameters does not contain the hyper parameters, such as the L2 regularization and momentum. The momentum is 0 if you print it before this line in paddle/parameter/FirstOrderOptimizer.h. The proto-string of the parameters is as follows:

parameters {
  name: "___embedding_layer_0__.w0"
  size: 658816
  initial_mean: 0.0
  initial_std: 0.0139387206988
  dims: 5147
  dims: 128
  initial_strategy: 0
  initial_smart: true
}
parameters {
  name: "___sequence_conv_pool_0___conv_fc.w0"
  size: 49152
  initial_mean: 0.0
  initial_std: 0.051031036308
  dims: 384
  dims: 128
  initial_strategy: 0
  initial_smart: true
}
....

But the correct proto-string of the parameters should contain decay_rate and momentum, as follows:

parameters {
  name: "___embedding_0__.w0"
  size: 3840000
  momentum: 0.9
  initial_mean: 0.0
  initial_std: 0.0057735026919
  decay_rate: 0.0008
  dims: 30000
  dims: 128
  initial_strategy: 0
  initial_smart: true
  gradient_clipping_threshold: 25.0
}
parameters {
  name: "___fc_layer_0__.w0"
  size: 65536
  momentum: 0.9
  initial_mean: 0.0
  initial_std: 0.0883883476483
  decay_rate: 0.0008
  dims: 128
  dims: 512
  initial_strategy: 0
  initial_smart: true
  gradient_clipping_threshold: 25.0
}
...
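
For a quick sanity check, the dumped proto text itself can be scanned to see which parameters are missing the expected fields. This is plain text matching over the printed output, not a Paddle API, and the field list below covers just the three fields discussed in this issue:

    import re

    # Rough check over the dumped proto text (`dump` holds whatever the print
    # statement in trainer.py produced). For every `parameters { ... }` block,
    # report which of the expected hyper parameter fields are missing.
    EXPECTED = ('momentum', 'decay_rate', 'gradient_clipping_threshold')

    def missing_hyper_params(dump, expected=EXPECTED):
        missing = {}
        for match in re.finditer(r'parameters \{(.*?)\n\}', dump, re.S):
            body = match.group(1)
            name = re.search(r'name: "([^"]*)"', body).group(1)
            absent = [field for field in expected if field + ':' not in body]
            if absent:
                missing[name] = absent
        return missing
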
@qingqing01 qingqing01 added the Bug label May 8, 2017
@reyoung reyoung self-assigned this May 8, 2017
@reyoung reyoung added this to 未规划 in Scrum Board May 8, 2017
@luotao1 luotao1 added this to 已有BUG in V2 API Enhancement May 9, 2017
@lcy-seso lcy-seso added this to Top priorities in Defects board May 10, 2017
@lcy-seso lcy-seso moved this from Not in schedule to Next Week in Defects board May 10, 2017
@lcy-seso lcy-seso moved this from Next Week to Current Week ToDo in Defects board May 10, 2017
@lcy-seso lcy-seso moved this from Current Week ToDo to Not in schedule in Defects board May 10, 2017

reyoung commented May 10, 2017

This bug is hard to fix because we split the model configuration into two parts: the topology configuration and the optimizer settings. When we configure and parse the topology, the optimizer information has not been set yet. However, weight_decay currently belongs to the topology.

Here is a step-by-step solution to this issue.

  1. Disable weight_decay in the optimizer settings. If users want a global weight_decay, they can maintain a ParamAttr themselves (a rough sketch of this workaround follows below).

  2. Move weight_decay out of the Parameter configuration, or set the global weight_decay in the topology configuration (e.g. ModelConfig) itself.

The second step is a bit hard to implement and may require changes to the C++ core of Paddle.
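
As a rough illustration of step 1, a per-parameter attribute can be attached to each layer. The keyword names of paddle.attr.Param below (such as l2_rate, momentum, and gradient_clipping_threshold) are assumptions based on the old ParameterAttribute and may differ from the actual v2 API:

    import paddle.v2 as paddle

    # Sketch of the step-1 workaround: attach the hyper parameters to each
    # parameter explicitly instead of relying on the global optimizer settings.
    # The keyword names are assumptions, not verified against the v2 API.
    para_attr = paddle.attr.Param(
        momentum=0.9,
        l2_rate=8e-4,                      # per-parameter weight decay
        gradient_clipping_threshold=25.0)

    data = paddle.layer.data(
        name='input', type=paddle.data_type.dense_vector(128))
    fc = paddle.layer.fc(
        input=data, size=512, act=paddle.activation.Relu(),
        param_attr=para_attr)
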

@reyoung reyoung moved this from Not in schedule to Next Week in Defects board May 10, 2017

qingqing01 commented May 10, 2017

It is not only a problem with weight_decay; momentum and gradient_clipping_threshold are affected as well.


reyoung commented May 10, 2017

The solution might be as follows.

The global weight_decay, momentum, and gradient_clipping_threshold should be saved into proto::OptimizationConfig, which is stored inside TrainerConfig.proto, just like learning_rate in OptimizationConfig.

Then every optimizer can read the global weight_decay.
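
To make the idea concrete, here is a minimal sketch of the fallback lookup an optimizer would do under this proposal. Plain dicts stand in for the real proto messages, and the field names are assumptions:

    # Illustrative only: an optimizer resolves each hyper parameter from the
    # per-parameter config first and falls back to the global value kept in
    # OptimizationConfig.
    def resolve(param_conf, opt_conf, field, default=0.0):
        if field in param_conf:
            return param_conf[field]
        return opt_conf.get(field, default)

    opt_conf = {'momentum': 0.9, 'decay_rate': 8e-4,
                'gradient_clipping_threshold': 25.0}
    param_conf = {'name': '___fc_layer_0__.w0'}  # no per-parameter overrides

    momentum = resolve(param_conf, opt_conf, 'momentum')      # -> 0.9
    decay_rate = resolve(param_conf, opt_conf, 'decay_rate')  # -> 0.0008
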

@lcy-seso lcy-seso moved this from Next Week to Doing in Defects board May 22, 2017
@lcy-seso lcy-seso removed this from Doing in Defects board May 22, 2017
@lcy-seso lcy-seso moved this from BUG to 已完成 in V2 API Enhancement Jun 7, 2017

lcy-seso commented Jun 7, 2017

This problem has been fixed by PR #2288.
Thank you for the issue.

@lcy-seso lcy-seso closed this as completed Jun 7, 2017
@lcy-seso lcy-seso reopened this Jun 22, 2017
@lcy-seso

This problem is not solved yet, so I am reopening it.

@qingqing01

Closing, since the v2 API has fixed this issue.

heavengate pushed a commit to heavengate/Paddle that referenced this issue Aug 16, 2021