Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

optimizer must be defined before network topology, otherwise BUG occurs. #2621

Closed
lcy-seso opened this issue Jun 27, 2017 · 2 comments
Closed
Assignees
Labels

Comments

@lcy-seso
Copy link
Contributor

lcy-seso commented Jun 27, 2017

  • PaddlePaddle simplifies the way it parses the network configuration by introducing some global variables in this PR Fix V2 API #2288.

  • optimizer is used to set the default values for some parameters, including L2 regularization, gradient_clipping_threshold and so on. These settings are stored in some global variables, so before parsing the network topology, these global variables should be correctly initialized.

  • This means the definition of optimizer must be called before the definition of network topology to enable the global parameter settings.

  • The following codes are wrong because gradient_clipping_threshold and L2 regularization will not work.

    cost = seq2seq_net(source_dict_dim, target_dict_dim)
    parameters = paddle.parameters.create(cost)
    
    optimizer = paddle.optimizer.RMSProp(
        learning_rate=1e-3,
        gradient_clipping_threshold=10.0,
        regularization=paddle.optimizer.L2Regularization(rate=8e-4))
  • This will be right:

    optimizer = paddle.optimizer.RMSProp(
        learning_rate=1e-3,
        gradient_clipping_threshold=10.0,
        regularization=paddle.optimizer.L2Regularization(rate=8e-4))
        
    cost = seq2seq_net(source_dict_dim, target_dict_dim)
    parameters = paddle.parameters.create(cost)
  • This is very dangerous for users because no error is reported.

@lcy-seso
Copy link
Contributor Author

lcy-seso commented Jun 27, 2017

Currently, it requires the calling order must be like this:

  1. define the optimizer
  2. define the network (v2.layer.parse_network is called )
  3. create the parameters (in parameter.create, v2.layer.parse_network is called the second time, can this be avoided? It seems that the function parameter.create is not necessary.)

the order 3 follows 2 is guaranteed by program syntax, but the order 1 must before 2 is left to the users.

@emailweixu I am not quite sure what is the best way to fix this. Checking the calling order (a flag indicates whether optimizer is called and topology tests this flag) may be not a good enough way, or we have to do so, and then all the demos in PaddleBook and PaddleModels must be modified accordingly.

Do you have some suggestions?

@lcy-seso
Copy link
Contributor Author

lcy-seso commented Jun 27, 2017

I think the problem is we want to make optimizer and parameter two independent concepts, but currently, topology and also parameter depend on optimizer.

I hope to get some suggestions that should PaddlePaddle guarantee such a dependence or leave it to the users (to check optimizer must be defined before topology)?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants