
DL4J Spark training: Batch norm variance may (rarely) become negative during training #6750

Closed
AlexDBlack opened this issue Nov 22, 2018 · 1 comment

@AlexDBlack (Member) commented Nov 22, 2018

Batch norm mean/variance is usually updated (locally) by simply modifying the parameters during the forward pass when training == true.

This is fine for single-node training, but it doesn't work at all for distributed training.
Thus, we push the parameter changes into the updates vector instead, and all is well.
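A minimal sketch of the two code paths (hypothetical method names and plain Java doubles, not the actual DL4J internals): the single-node path overwrites the running variance in place, while the distributed path expresses the same change as a delta to be pushed into the updates vector and applied after aggregation.

```java
// Sketch only - illustrates the local update vs. the delta pushed into the updates vector.
public class BnVarianceUpdateSketch {

    // Local (single-node) path: assign the new running variance directly during the forward pass.
    static double updateLocally(double runningVar, double batchVar, double momentum) {
        return momentum * runningVar + (1 - momentum) * batchVar;
    }

    // Distributed path: record the change as a delta; the aggregated deltas are applied elsewhere.
    static double deltaForUpdatesVector(double runningVar, double batchVar, double momentum) {
        double newVar = momentum * runningVar + (1 - momentum) * batchVar;
        return newVar - runningVar; // may be negative when batchVar < runningVar
    }
}
```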

However, when the variance estimate becomes small relative to the threshold, we can end up with varEstimate - threshold < 0, which can obviously break things.
In principle, we can also have update stacking in this situation: i.e., N nodes change the variance estimate simultaneously, giving varEstimate - n*threshold < 0.
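For concreteness, a toy calculation with made-up numbers (not taken from the issue) showing how deltas computed by N workers against the same starting variance can, once summed, push the shared estimate below zero:

```java
// Hypothetical numbers: each worker computes the same negative delta against the SAME
// starting variance, so the aggregated result undershoots zero.
public class StackingSketch {
    public static void main(String[] args) {
        double varEstimate = 0.010;      // small running variance
        double perNodeDelta = -0.004;    // each node wants to shrink it by this much
        int n = 4;                       // number of workers

        double aggregated = varEstimate + n * perNodeDelta;
        System.out.println(aggregated);  // prints a negative value (about -0.006)
    }
}
```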

One solution: #6749

Another (quicker) solution: reparameterize the variance "parameter" as something else, like log variance (which can be positive or negative).
The choice of reparameterization obviously matters - a badly chosen parameterization may impact convergence (as we'll need to take very large - or very small - steps to get to where we want to be).
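A minimal sketch of the log-variance idea (hypothetical helpers, not anything that exists in DL4J): the stored/aggregated quantity is log(variance), so any real-valued sum of deltas still maps back to a strictly positive variance when exponentiated.

```java
// Sketch only - store log(variance) as the quantity that gets updated/aggregated.
public class LogVarianceSketch {
    static double toStored(double variance) {
        return Math.log(variance);           // the value actually stored and updated
    }

    static double toVariance(double logVar) {
        return Math.exp(logVar);             // always > 0, regardless of logVar
    }

    public static void main(String[] args) {
        double logVar = toStored(0.010);
        logVar += 4 * -0.5;                  // even a large aggregated decrement...
        System.out.println(toVariance(logVar)); // ...maps back to a positive variance
    }
}
```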

@lock (bot) commented Dec 23, 2018

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.