-
Notifications
You must be signed in to change notification settings - Fork 526
[SYSTEMDS-2550] Parameter Server Validation and FedParamServ Statistics #1154
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
I can understand why this is done with the logging. But i think this is a slight misuse of the logging in the system, since ideally we (or at least me) want no difference in execution (or as close to) when we set different logging levels. This violates it since the validation in the parameter server can take many seconds of execution time. I would suggest adding a parameter some other way to enable / disable the validation. While it is still possible that we output the validation scores through LOG.info when that other "unrelated to logging level" setting is set. But maybe I'm the only one with this opinion? @mboehm7 ? furthermore designwise, you might want to consider doing this validation in parallel with the distribution of model parameters. since we know that the controller is doing nothing while the workers are training, you could do the validation on a separate thread at no cost to execution time. |
Baunsgaard
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
two logic errors i spotted.
|
|
||
| protected synchronized void updateGlobalModel(int workerID, ListObject gradients) { | ||
| try { | ||
| if (LOG.isDebugEnabled()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
everything is surrounded by a is debug enabled witch is a lower level than info.
| if(LOG.isInfoEnabled()) { | ||
| // This if works similarly to the one for BSP, but divides the sync couter through the number of workers, | ||
| // creating "Pseudo Epochs" | ||
| if (LOG.isInfoEnabled() && _validationPossible && |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
outerif checks the same as first part of inner if.
|
yes, I agree with @Baunsgaard regarding logging as it might hide bugs if there is additional state updated. Just guard the actual info logging (and string concatenation). Regarding parallelization please put a TODO in there and focus on the main parts of the validation first. |
…mmunciation overhead
|
Thank you @Baunsgaard and @mboehm7! I have addressed the issues you raised. |
|
LGTM - thanks for the extension @tobiasrieger. I only made minor changes during the merge: (1) reverted the cumulative time measurement in Timing and changed the setup measurement accordingly, (2) added the data partitioning timing (time was measured but not collected), (3) reverted the changed test logging level to INFO, and (4) some minor formatting changes to avoid huge indentation. |
This PR contains a lot of valuable features for the parameter server
Validation
There is now a second createPS function in the ParamservBuiltinCPInstruction. If all the additional arguments are specified the parameter server is able to validate after each epoch. It will do so if LOG.info is enabled.
This feature is implemented for the federated parameter server ONLY, but is easily implemented for the other cases too.
Federated Parameter Server Statistics
The federated parameter server now uses the same statistics as the regular one, where possible. Also a number of new ones were introduced:
Other Changes