
Change customize_loss_grad to use_default_grad_scale. #10223

Merged: 9 commits merged into PaddlePaddle:develop on May 2, 2018

Conversation

pkuyym (Contributor) commented Apr 26, 2018

Resolves #10219

reyoung (Collaborator) previously approved these changes Apr 26, 2018

Cool

panyx0718 (Contributor) previously approved these changes Apr 26, 2018

Also update the transformer model?

@@ -46,6 +46,10 @@ def __init__(self,
improve performance in some cases, defalut False.
share_vars_from(ParallelExecutor, default None): If provied,
it will share variables from the specified ParallelExecutor.
use_default_grad_scale(bool, default True): If set True, a default
scale value equal to `1./device_count` would be multiplied to
the gradients. Otherwise, a customized scale value should be
Contributor commented on the diff:

to gradients of each device? and then aggregated?

pkuyym (Contributor, Author) replied:

Thanks, followed the comment.

use_default_grad_scale(bool, default True): If set True, a default
scale value equal to `1./device_count` would be multiplied to
the gradients. Otherwise, a customized scale value should be
feeded to the network.
Contributor commented on the diff:

feeded -> fed?

pkuyym (Contributor, Author) replied:

Done.

pkuyym (Contributor, Author) left a comment:

Thanks for your comments; I will update the transformer model after this PR is merged.
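For context, here is a minimal sketch of how the renamed flag is meant to be used. The small network below and the exact keyword set of ParallelExecutor are assumptions based on the Fluid API of this era, not code taken from this PR:

import paddle.fluid as fluid

# Build a tiny regression network whose gradients the ParallelExecutor
# will aggregate across devices (illustrative only).
x = fluid.layers.data(name='x', shape=[13], dtype='float32')
y = fluid.layers.data(name='y', shape=[1], dtype='float32')
pred = fluid.layers.fc(input=x, size=1)
loss = fluid.layers.mean(fluid.layers.square_error_cost(input=pred, label=y))
fluid.optimizer.SGD(learning_rate=0.01).minimize(loss)

place = fluid.CUDAPlace(0)
fluid.Executor(place).run(fluid.default_startup_program())

# use_default_grad_scale=True (the default): each device's gradients are
# multiplied by 1./device_count before aggregation.
pe = fluid.ParallelExecutor(use_cuda=True,
                            loss_name=loss.name,
                            use_default_grad_scale=True)

# Set use_default_grad_scale=False when the network itself feeds a
# customized scale value, e.g. dividing the loss by the per-device batch
# size instead of relying on the built-in 1./device_count factor.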



reyoung dismissed stale reviews from panyx0718 and themself via c0ac0cd on April 28, 2018 06:07
pkuyym merged commit 9a8be9d into PaddlePaddle:develop on May 2, 2018