
Fix batchnorm layer numerics by replacing `powx()` #5136




commented Dec 29, 2016

I had problems training ResNet on ImageNet. Using the solver setting `debug_info: true`, I saw that NaNs were produced by the BatchNorm layer.
The problem was that the square inside the variance term was computed via the `caffe_gpu_powx` function. This is slow on the one hand, but also numerically unstable, especially for negative numbers. I therefore replaced it with a `caffe_gpu_mul` call.
In the same way, I added a `caffe_gpu_sqrt` function to replace `caffe_gpu_powx(..., ..., 0.5, ...)`, which could lead to the same problems.
Additionally, I added `Dtype()` casts where appropriate to make the types explicit.
Note: I didn't change the CPU version as it seems to be working. Nevertheless, using `caffe_sqr` instead of `caffe_powx(..., 2)` could speed up the layer; likewise, a dedicated sqrt function could be better than `caffe_powx(..., 0.5)` there too.
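As a concrete illustration of the CPU-side suggestion, here is a plain-C sketch (assumed shapes, not the actual layer code) of the variance term `var = E[x²] − (E[x])²` with the square taken by multiplication:

```c
/* Variance over one channel, computed as E[x^2] - (E[x])^2 with the
 * square done by a plain multiply (caffe_sqr-style) rather than a
 * generic powx(x, 2). Plain-C sketch, not the actual Caffe layer. */
float batch_variance(const int n, const float* x) {
    float mean = 0.0f, mean_sq = 0.0f;
    for (int i = 0; i < n; ++i) {
        mean    += x[i];
        mean_sq += x[i] * x[i];  /* square via multiply */
    }
    mean    /= (float)n;
    mean_sq /= (float)n;
    return mean_sq - mean * mean;
}
```

Note that roundoff can make this difference slightly negative, which is exactly why the subsequent sqrt (or `powx(..., 0.5)`) step is the other sensitive spot in the layer.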

@shelhamer shelhamer changed the title Fix batchnorm layer Fix batchnorm layer numerics by replacing `powx()` Jan 18, 2017

@shelhamer shelhamer added the focus label Jan 18, 2017

jeffdonahue added a commit that referenced this pull request Apr 13, 2017



commented Apr 13, 2017

Thanks for this fix @pfollmann! Merged in c560658 with the corresponding change for CPU batch norm. I left off the explicit casts and comments as I don't think these are needed for this stabilization.
