optimize optimizer learning rate #8873

Closed
jacquesqiao opened this issue Mar 8, 2018 · 2 comments · Fixed by #8874

jacquesqiao commented Mar 8, 2018

Background

Profile script: dzhwinter/benchmark#84

From issue #8818 we can see that in the parameter optimization stage there are many elementwise_mul ops, and they take a lot of time.
[Profiler timeline showing the many elementwise_mul ops in the optimization stage]

These elementwise_mul ops are used to compute the learning_rate for each parameter, because every parameter may have a different learning_rate. The computation is

param_lr = global_lr * lr_for_param

global_lr is a global Variable; lr_for_param is a per-parameter float value whose default is 1.0. The code above adds one elementwise_mul op per parameter to the main program, as the sketch below illustrates.
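
To make the pattern concrete, here is a minimal, self-contained Python sketch of how one elementwise_mul op ends up in the program for every parameter. Program, append_op, and create_param_lr are simplified stand-ins for illustration, not the actual Paddle fluid API.

```python
class Program:
    """Toy stand-in for a fluid Program: just an ordered op list."""
    def __init__(self):
        self.ops = []

    def append_op(self, op_type, inputs, output):
        self.ops.append((op_type, inputs, output))


def create_param_lr(program, global_lr_var, param_name, lr_for_param):
    """Emit `param_lr = global_lr * lr_for_param` into the program."""
    out = param_name + "@LR"
    # One elementwise_mul per parameter, even when lr_for_param == 1.0.
    program.append_op("elementwise_mul",
                      inputs=(global_lr_var, lr_for_param),
                      output=out)
    return out


prog = Program()
for name in ["fc_0.w", "fc_0.b", "fc_1.w"]:
    create_param_lr(prog, "learning_rate", name, 1.0)

print(len(prog.ops))  # 3 elementwise_mul ops, all redundantly scaling by 1.0
```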

The improvement

Most of the time the value of lr_for_param is 1.0, and in that case there is no need to add these elementwise_mul ops.

The logic after optimization should be:

if lr_for_param == 1.0:
    param_lr = global_lr
else:
    param_lr = global_lr * lr_for_param
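
Continuing the toy sketch above (again with hypothetical names, not the real optimizer code), the guarded version only emits the multiply when the per-parameter scale differs from 1.0:

```python
def create_param_lr(program, global_lr_var, param_name, lr_for_param):
    """Skip the elementwise_mul when the per-parameter scale is 1.0."""
    if lr_for_param == 1.0:
        # Reuse the global learning-rate variable directly; no op is added.
        return global_lr_var
    out = param_name + "@LR"
    program.append_op("elementwise_mul",
                      inputs=(global_lr_var, lr_for_param),
                      output=out)
    return out
```

With the default scale of 1.0 this removes the redundant multiplies entirely, which matches the timing improvement reported below.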

A complete solution would be constant folding: we should add a constant-folding transpiler that recognizes all constant values and computes them during the compile stage. This would remove many ops from the executed program. A rough sketch of the idea follows.
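
As a generic illustration of constant folding (this is not Paddle's transpiler, just a minimal pass over a flat op list like the toy Program above), ops whose inputs are all compile-time constants are evaluated once and replaced by their result:

```python
import operator

# Compile-time evaluators for the foldable op types in this sketch.
KERNELS = {"elementwise_mul": operator.mul, "elementwise_add": operator.add}


def fold_constants(ops, constants):
    """ops: list of (op_type, (in_a, in_b), out). constants: name -> value."""
    remaining = []
    for op_type, (a, b), out in ops:
        if a in constants and b in constants:
            # Both inputs are known at compile time: evaluate now and
            # record the folded output as a new constant.
            constants[out] = KERNELS[op_type](constants[a], constants[b])
        else:
            remaining.append((op_type, (a, b), out))
    return remaining, constants


ops = [("elementwise_mul", ("global_lr", "lr_scale"), "param_lr")]
remaining, consts = fold_constants(ops, {"global_lr": 0.01, "lr_scale": 2.0})
print(remaining)           # [] -- the mul ran at compile time
print(consts["param_lr"])  # 0.02
```

Applied to the learning-rate pattern above, the fold turns global_lr * lr_for_param into a single precomputed scalar, so no elementwise_mul runs at execution time.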

Optimization result

Timeline after optimization:
[Profiler timeline after the optimization]

| calc_step_num | ave_step_time (before) | ave_step_time (after) | after/before |
|--------------:|-----------------------:|----------------------:|-------------:|
| 3  | 1.12088267008 | 1.03341897329  | 0.9219689097488165 |
| 38 | 1.05036788238 | 0.987895676964 | 0.9405234999432334 |
| 78 | 1.06520705345 | 0.953312274737 | 0.894954902569792  |
wangkuiyi commented

Is it that, because both global_lr and lr_for_param are constants, we could remove those elementwise_mul ops by computing global_lr * lr_for_param at compile time? If so, do we need to add a compilation optimization stage? Where should it be? In a transpiler? @jacquesqiao


jacquesqiao commented Mar 9, 2018

@wangkuiyi Yes, a better solution would be constant folding. We should add a constant-folding transpiler that recognizes all constant values and computes them during the compile stage; this will reduce the number of ops run when executing the program.

I will add this transpiler soon.
