Add float16 support to batch norm operator #9176
Conversation
@@ -118,15 +122,16 @@ class BatchNormKernel<platform::CUDADeviceContext, T>

    // alloc memory
    y->mutable_data<T>(ctx.GetPlace());
Why doesn't this line use BatchNormParamType?
Per the cuDNN API, when we run batch norm in fp16 mode, the data types of the input x and the output y will be fp16, but the other input parameters, including mean and variance, will still be float32. So here T == float16, and BatchNormParamType<T> == float, as defined in cudnn_helper.h.
LGTM!
fix #9175
This PR only adds and verifies the float16 kernel for the inference mode of the cuDNN batch norm operator, which is needed to run vgg/resnet inference.
OpTest.np_dtype_to_fluid_dtype changes the dtype of a numpy array from float16 to uint16 so that it can correctly bind with Paddle's float16 type in tensor_py.h.
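The dtype conversion above can be sketched as a bit-preserving reinterpretation: the numpy buffer keeps the same raw fp16 bits but is relabeled as uint16 so the binding layer treats it as Paddle's float16. The helper name below is hypothetical, not the actual OpTest method:

```python
import numpy as np

def np_float16_to_uint16_view(arr):
    """Reinterpret a float16 numpy array as uint16 without copying or
    changing its bits, mimicking what np_dtype_to_fluid_dtype does for
    fp16 inputs. Other dtypes pass through unchanged."""
    if arr.dtype == np.float16:
        # view() relabels the dtype; the underlying buffer is untouched.
        return arr.view(np.uint16)
    return arr

x = np.array([1.0, 2.0], dtype=np.float16)
y = np_float16_to_uint16_view(x)
print(y.dtype)              # uint16 label for binding
print(y.view(np.float16))   # viewing back recovers the original values
```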