Add float16 support to batch norm operator #9176
Conversation
@@ -118,15 +122,16 @@ class BatchNormKernel<platform::CUDADeviceContext, T>

    // alloc memory
    y->mutable_data<T>(ctx.GetPlace());
Why doesn't this line use BatchNormParamType?
Per the cuDNN API, when we run batch norm in fp16 mode, the data types of the input x and the output y will be fp16, but the other input parameters, including mean and variance, will still be float32. So here T == float16, and BatchNormParamType<T> == float, as defined in cudnn_helper.h.
LGTM!
fix #9175
This PR only adds and verifies the float16 kernel for the inference mode of the cuDNN batch norm operator, which is needed to run vgg/resnet inference.
OpTest.np_dtype_to_fluid_dtype changes the dtype of a numpy array from float16 to uint16 so that it can correctly bind with Paddle's float16 type in tensor_py.h.
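The dtype conversion above can be sketched as a bit-preserving reinterpretation: the numpy buffer keeps the same raw fp16 bits but is relabeled as uint16 so the binding layer treats it as Paddle's float16. The helper name below is hypothetical, not the actual OpTest method:

```python
import numpy as np

def np_float16_to_uint16_view(arr):
    """Reinterpret a float16 numpy array as uint16 without copying or
    changing its bits, mimicking what np_dtype_to_fluid_dtype does for
    fp16 inputs. Other dtypes pass through unchanged."""
    if arr.dtype == np.float16:
        # view() relabels the dtype; the underlying buffer is untouched.
        return arr.view(np.uint16)
    return arr

x = np.array([1.0, 2.0], dtype=np.float16)
y = np_float16_to_uint16_view(x)
print(y.dtype)              # uint16 label for binding
print(y.view(np.float16))   # viewing back recovers the original values
```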