Implement normalization methods (BatchNorm/LayerNorm/BatchRenorm) as functions in a common header file #5685

Closed
zhouxiao-coder opened this issue Nov 16, 2017 · 4 comments

zhouxiao-coder (Contributor) commented Nov 16, 2017

Current Status

In the new Paddle core, we currently have a batch_norm_op, which is essentially an implementation of the fused spatial batch normalization method. Like most other operators, it performs most of its computations directly inside the BatchNorm kernels.

Suggestion

I suggest we abstract away some of the normalization calculations and implement them as functions/functors in a common header file like normalization.h, so that existing code can be reused more easily (a rough sketch follows the reasons below).

Reasons

  1. Normalization methods are evolving rapidly; new variants of BatchNorm appear from time to time, e.g. layer normalization and batch renormalization, and they often share similar structures.
  2. Normalization layers are used repeatedly in deep architectures, so they have to be efficient. We may need to write many fused kernels to support this.
  3. RNN units also benefit from some normalization methods, so we should support them in C++ code.
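
To make the proposal concrete, here is a minimal, framework-agnostic sketch of what such a shared header could contain. All names and signatures (`ComputeMoments`, `Normalize`, the row-major `(rows x cols)` layout) are illustrative assumptions, not Paddle's existing API:

```cpp
// Sketch of a common "normalization.h": shared moment computation and the
// shared normalize step that BatchNorm, LayerNorm, and BatchRenorm all use.
#include <cmath>
#include <cstddef>
#include <vector>

struct Moments {
  std::vector<float> mean;
  std::vector<float> var;
};

// Compute mean/variance over the batch dimension (per column) for BatchNorm,
// or over the feature dimension (per row) for LayerNorm.
inline Moments ComputeMoments(const std::vector<float>& x, size_t rows,
                              size_t cols, bool per_column) {
  size_t stat_size = per_column ? cols : rows;
  size_t reduce_size = per_column ? rows : cols;
  Moments m{std::vector<float>(stat_size, 0.f),
            std::vector<float>(stat_size, 0.f)};
  for (size_t i = 0; i < rows; ++i)
    for (size_t j = 0; j < cols; ++j)
      m.mean[per_column ? j : i] += x[i * cols + j] / reduce_size;
  for (size_t i = 0; i < rows; ++i)
    for (size_t j = 0; j < cols; ++j) {
      size_t k = per_column ? j : i;
      float d = x[i * cols + j] - m.mean[k];
      m.var[k] += d * d / reduce_size;
    }
  return m;
}

// Shared step: y = (x - mean) / sqrt(var + eps) * gamma + beta.
// BatchNorm broadcasts the statistics over rows, LayerNorm over columns;
// gamma/beta are indexed per channel/feature (per column) in both cases.
inline void Normalize(const std::vector<float>& x, const Moments& m,
                      const std::vector<float>& gamma,
                      const std::vector<float>& beta, size_t rows, size_t cols,
                      bool per_column, float eps, std::vector<float>* y) {
  y->resize(rows * cols);
  for (size_t i = 0; i < rows; ++i)
    for (size_t j = 0; j < cols; ++j) {
      size_t k = per_column ? j : i;
      float x_hat = (x[i * cols + j] - m.mean[k]) / std::sqrt(m.var[k] + eps);
      (*y)[i * cols + j] = x_hat * gamma[j] + beta[j];
    }
}
```

With helpers like these, a fused batch_norm_op and a layer_norm_op would differ mainly in the reduction axis they pass in, and BatchRenorm could add its r/d correction on top of the same primitives.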
zhouxiao-coder self-assigned this Nov 16, 2017
lcy-seso (Contributor) commented Nov 16, 2017

Layer normalization just transposes the input of batch norm (the moving averages of mean and std are no longer needed). Is it possible to implement layer normalization simply by wrapping batch norm?
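
For a 2-D input of shape (batch, features), the equivalence described here can be illustrated with a small toy example (plain C++, not Paddle code); the per-feature statistics of the transposed input are exactly LayerNorm's per-sample statistics:

```cpp
// Toy illustration: LayerNorm statistics == BatchNorm statistics of x^T.
#include <cstdio>
#include <vector>

// Mean over the first dimension (the "batch" axis) for each column,
// i.e. what BatchNorm reduces over for an input of shape (rows, cols).
std::vector<float> ColumnMeans(const std::vector<std::vector<float>>& x) {
  std::vector<float> mean(x[0].size(), 0.f);
  for (const auto& row : x)
    for (size_t j = 0; j < row.size(); ++j) mean[j] += row[j] / x.size();
  return mean;
}

std::vector<std::vector<float>> Transpose(
    const std::vector<std::vector<float>>& x) {
  std::vector<std::vector<float>> t(x[0].size(), std::vector<float>(x.size()));
  for (size_t i = 0; i < x.size(); ++i)
    for (size_t j = 0; j < x[i].size(); ++j) t[j][i] = x[i][j];
  return t;
}

int main() {
  // x has shape (batch=2, features=3).
  std::vector<std::vector<float>> x = {{1, 2, 3}, {4, 5, 6}};
  // BatchNorm-style per-feature means: 2.5 3.5 4.5
  for (float m : ColumnMeans(x)) std::printf("%.1f ", m);
  std::printf("\n");
  // LayerNorm-style per-sample means = BatchNorm means of the transpose: 2.0 5.0
  for (float m : ColumnMeans(Transpose(x))) std::printf("%.1f ", m);
  std::printf("\n");
}
```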

lcy-seso (Contributor) commented

Another potentially useful normalization method I am interested in is weight normalization. As I understand it, most user-defined normalizations can be implemented by combining primitive arithmetic operators.

Implementing a particular normalization method as an independent operator helps improve computation and memory efficiency (in most cases, many intermediate results can be simplified by manually checking the formulas). But I have not checked all the potential normalization methods.

zhouxiao-coder (Contributor, Author) commented Nov 16, 2017

@lcy-seso

is it possible to implement layer normalization simply by wrapping batch norm?

Yes, we could, and that's what TensorFlow does in its repo: it wraps a non-fused BatchNorm implementation. There is a subtlety with LayerNorm: the estimated mean and variance have size equal to the batch size rather than the channel size, but TensorFlow sidesteps this by relying on broadcasting.

However, I don't think this is optimal. As you have already pointed out, non-fused BatchNorm is significantly slower, which makes a big difference if we want to use it in deep models.

Since one big advantage of LayerNorm is that it is directly usable in RNN units, it also makes sense to reuse code between a standalone LayerNorm layer and an "LSTMUnitsWithLayerNorm" operator.
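
As a rough illustration of that reuse, a single row-wise helper could serve both a standalone layer_norm op and an LSTM unit with layer normalization. The names below (`LayerNormalize`, the gate-preactivation call) are hypothetical, not existing Paddle symbols:

```cpp
// Hypothetical shared helper: normalize each row (one sample) of a
// (rows x cols) row-major matrix in place, with per-feature gamma/beta.
#include <cmath>
#include <vector>

inline void LayerNormalize(std::vector<float>* x, size_t rows, size_t cols,
                           const std::vector<float>& gamma,
                           const std::vector<float>& beta, float eps = 1e-5f) {
  for (size_t i = 0; i < rows; ++i) {
    float mean = 0.f, var = 0.f;
    for (size_t j = 0; j < cols; ++j) mean += (*x)[i * cols + j] / cols;
    for (size_t j = 0; j < cols; ++j) {
      float d = (*x)[i * cols + j] - mean;
      var += d * d / cols;
    }
    float inv_std = 1.f / std::sqrt(var + eps);
    for (size_t j = 0; j < cols; ++j)
      (*x)[i * cols + j] =
          ((*x)[i * cols + j] - mean) * inv_std * gamma[j] + beta[j];
  }
}

// Inside a hypothetical LSTM step, the same helper could normalize the four
// pre-activation gate blocks before the nonlinearities, e.g.:
//   LayerNormalize(&gate_preactivations, batch, 4 * hidden, gamma, beta);
```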

zhouxiao-coder (Contributor, Author) commented

Another potentially useful normalization method I am interested in is weight normalization.

I also looked at weight normalization. It seems simple enough to be implemented efficiently with basic operators, so I didn't mention it in the title. If it turns out to be necessary, we can also add it to the common header.
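
For reference, weight normalization (Salimans & Kingma, 2016) reparameterizes a weight matrix as w = g * v / ||v|| per output row, which indeed decomposes into square, reduce-sum, sqrt, and elementwise scaling. The sketch below is illustrative only, not an existing Paddle operator:

```cpp
// Weight normalization expressed with basic operations only.
// v: (out x in) row-major weight, g: per-output-row scale.
// Returns the reparameterized weight w with the same shape as v.
#include <cmath>
#include <vector>

inline std::vector<float> WeightNorm(const std::vector<float>& v,
                                     const std::vector<float>& g, size_t out,
                                     size_t in, float eps = 1e-12f) {
  std::vector<float> w(v.size());
  for (size_t i = 0; i < out; ++i) {
    float norm_sq = 0.f;  // reduce_sum(square(v_i)), a primitive reduction
    for (size_t j = 0; j < in; ++j) norm_sq += v[i * in + j] * v[i * in + j];
    float scale = g[i] / std::sqrt(norm_sq + eps);  // elementwise scale
    for (size_t j = 0; j < in; ++j) w[i * in + j] = v[i * in + j] * scale;
  }
  return w;
}
```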
