fix ft job converge #5132

Merged
merged 2 commits into from
Oct 27, 2017

@typhoonzero typhoonzero commented Oct 26, 2017

Fix a bug where fault-tolerant jobs did not converge: the previous approach overwrote the optimizer's learning_rate settings. I also added a TODO, since we should still find a way to update optimization settings per parameter.

See https://github.com/PaddlePaddle/Paddle/blob/develop/python/paddle/v2/topology.py#L55 for some of the global settings hacking.
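For readers unfamiliar with this class of bug, here is a minimal sketch of how an overwritten learning rate can stop training from converging. It is not the Paddle code: `build_config`, `sgd`, and both learning-rate values are hypothetical and chosen only for illustration.

```python
def sgd(lr, steps=50, w0=5.0):
    """Minimize f(w) = w**2 with plain gradient descent and return the final w."""
    w = w0
    for _ in range(steps):
        grad = 2.0 * w  # derivative of w**2
        w -= lr * grad
    return w


def build_config(user_config, defaults, overwrite):
    """Merge optimizer settings (hypothetical helper).

    overwrite=True mimics the bug: defaults clobber the user's values.
    overwrite=False keeps the user's values and only fills in missing keys.
    """
    if overwrite:
        return {**user_config, **defaults}
    return {**defaults, **user_config}


user_config = {"learning_rate": 0.1}  # assumed user setting
defaults = {"learning_rate": 1.5}     # assumed default, too large for this problem

buggy = build_config(user_config, defaults, overwrite=True)
fixed = build_config(user_config, defaults, overwrite=False)

w_buggy = sgd(buggy["learning_rate"])  # each step maps w to -2w, so |w| grows
w_fixed = sgd(fixed["learning_rate"])  # each step maps w to 0.8w, so w shrinks toward 0
```

With the user's setting preserved, `w_fixed` ends close to the minimum at 0, while `w_buggy` grows without bound.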

helinwang commented Oct 27, 2017

CI failed with:

NewRemoteParameterUpdater.cpp:115: Missing username in TODO; it should look like "// TODO(my_username): Stuff." [readability/todo] [2]

I pushed a commit to fix it.
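As a rough illustration of what the lint message asks for, the check can be approximated in a few lines. This is a simplified sketch, not the linter's actual rule: the two patterns and the function name `todo_missing_username` are assumptions.

```python
import re

# Assumed simplification: a well-formed TODO looks like "// TODO(username): text".
TODO_WITH_USER = re.compile(r"//\s*TODO\([^)\s]+\):\s+\S")
# Any C++ line comment that starts a TODO, with or without a username.
ANY_TODO = re.compile(r"//\s*TODO\b")


def todo_missing_username(line):
    """Return True if the line has a TODO comment without a "(username):" part."""
    return bool(ANY_TODO.search(line)) and not TODO_WITH_USER.search(line)
```

For example, `todo_missing_username("// TODO: Stuff.")` is true, while `todo_missing_username("// TODO(my_username): Stuff.")` is false.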

@helinwang helinwang left a comment

LGTM++!

@typhoonzero typhoonzero commented

Thanks, @helinwang!

@typhoonzero typhoonzero merged commit 2000caf into PaddlePaddle:develop Oct 27, 2017
@typhoonzero typhoonzero deleted the fix_ft_job_converge branch December 22, 2017 05:44