DL4J Spark training: Batch norm variance may (rarely) become negative during training #6750
Batch norm mean/variance estimates are usually updated locally, by directly modifying the parameter values during the forward pass when training == true.
This is fine for single-node training, but doesn't work at all for distributed training.
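A minimal sketch of the local update, assuming an exponential-moving-average rule with a hypothetical `decay` coefficient (the exact rule DL4J uses is not given here). The function returns the values a layer would write back into its own parameter array. Nothing in this update is sent to other workers, so two workers that see different minibatches end up with different running statistics.

```python
def update_running_stats(mean, var, batch, decay=0.9):
    """EMA update of running mean/variance from one minibatch (hypothetical sketch)."""
    n = len(batch)
    batch_mean = sum(batch) / n
    batch_var = sum((x - batch_mean) ** 2 for x in batch) / n
    # Blend the old running estimate with the current minibatch statistics.
    new_mean = decay * mean + (1 - decay) * batch_mean
    new_var = decay * var + (1 - decay) * batch_var
    return new_mean, new_var

# Two workers start from identical parameters but see different minibatches.
worker_a = update_running_stats(0.0, 1.0, [1.0, 2.0, 3.0])
worker_b = update_running_stats(0.0, 1.0, [10.0, 20.0, 30.0])
print(worker_a, worker_b)
```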
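However, when the variance estimate becomes small relative to the threshold, we can have a situation where a single threshold-encoded update (of magnitude equal to the threshold) is larger than the variance itself, so applying it makes the variance negative.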
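The failure mode in the title can be reproduced with a toy model. This sketch assumes a simplified threshold-encoding scheme (the helper `threshold_step` is hypothetical): each update is added to a local residual, and only once the residual reaches ±threshold is a fixed step of ±threshold applied to the parameter. The exact running-variance update (an exponential moving average with a hypothetical `decay`) can never go below zero, but the quantized version overshoots once the variance is smaller than the threshold. Powers of two are used so the arithmetic is exact.

```python
def threshold_step(residual, update, threshold):
    """Add `update` to the local residual; emit +/-threshold once it is large enough."""
    residual += update
    if residual >= threshold:
        sent = threshold
    elif residual <= -threshold:
        sent = -threshold
    else:
        sent = 0.0
    return sent, residual - sent

def simulate(steps, decay=0.5, threshold=2.0 ** -10):
    # Variance starts small relative to the threshold.
    var_exact = var_encoded = 1.5 * threshold
    residual = 0.0
    batch_var = 0.0  # minibatches with (near-)zero variance pull the estimate down
    history = []
    for _ in range(steps):
        # Exact EMA: a convex combination of non-negative values stays non-negative.
        var_exact = decay * var_exact + (1 - decay) * batch_var
        # Same EMA step expressed as a delta, then quantized before being applied.
        delta = (1 - decay) * (batch_var - var_encoded)
        sent, residual = threshold_step(residual, delta, threshold)
        var_encoded += sent
        history.append(var_encoded)
    return var_exact, var_encoded, history

exact, encoded, history = simulate(4)
print(exact > 0, encoded < 0)  # True True
```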
One solution: #6749
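Another (quicker) solution: reparameterize the variance "parameter" as an unconstrained quantity such as the log variance. Log variance can be any real number, positive or negative, and the variance recovered from it via exp() is always positive.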
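A minimal sketch of the log-variance idea, reusing the same simplified (hypothetical) threshold-encoding model: the stored parameter is log(variance), the EMA target is still computed in variance space, and the change is expressed as a step in log space before being quantized. Whatever steps are applied, exp() never returns a negative number (in floating point it can underflow to 0.0 for extremely negative inputs, which a real implementation would need to guard against). The initial variance must be strictly positive because log(0) is undefined.

```python
import math

def threshold_step(residual, update, threshold):
    """Simplified threshold encoding: accumulate, then emit +/-threshold or 0."""
    residual += update
    if residual >= threshold:
        sent = threshold
    elif residual <= -threshold:
        sent = -threshold
    else:
        sent = 0.0
    return sent, residual - sent

def simulate_log_var(steps, decay=0.5, threshold=2.0 ** -10, batch_var=0.0):
    # The stored "parameter" is log(variance), starting from a strictly positive variance.
    log_var = math.log(1.5 * threshold)
    residual = 0.0
    for _ in range(steps):
        var = math.exp(log_var)
        # Target of the EMA update, still computed in variance space.
        target = decay * var + (1 - decay) * batch_var
        # Express the change as a step in log space, then quantize it.
        delta = math.log(target) - log_var
        sent, residual = threshold_step(residual, delta, threshold)
        log_var += sent
    # exp() maps any real log-variance back to a non-negative variance.
    return math.exp(log_var)

print(simulate_log_var(4) > 0)  # True
```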