
batch normalization layer #2

Open · wenhe-jia opened this issue Jul 10, 2017 · 9 comments
wenhe-jia commented Jul 10, 2017

I noticed that in the model json files there are no "moving_mean" and "moving_variance" entries in the BatchNorm layers. Can you explain why? Thanks.

wenhe-jia changed the title from "batch normalization" to "batch normalization layer" on Jul 10, 2017
cypw (Owner) commented Jul 10, 2017

@LeonJWH

MXNet does not store information about "moving_mean" and "moving_variance" in the json file.
( see: http://data.dmlc.ml/mxnet/models/imagenet/resnet )

Please ask this question in the MXNet repo for more information. Thanks!
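
(For reference, a minimal sketch of the point above, assuming MXNet is installed and a released checkpoint has been downloaded; the file name is a placeholder. The symbol json only lists the names of the BN auxiliary states, while their values live in the .params checkpoint:)

```python
import mxnet as mx

# The *-symbol.json file only describes the computation graph; the values of
# "moving_mean" / "moving_var" are auxiliary states saved in the *.params
# checkpoint, not in the json. (The file name below is a placeholder.)
sym = mx.sym.load('model-symbol.json')

# The symbol still lists the *names* of the auxiliary states it expects:
print(sym.list_auxiliary_states())
```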

wenhe-jia (Author)

Have you merged "moving_mean" and "moving_variance" params into "gamma" and "beta"?

cypw (Owner) commented Jul 10, 2017

@LeonJWH

No, we didn't merge them into any other params. ( see: forward code )

You can get the raw values via _, _, aux_params = mx.model.load_checkpoint(prefix, epoch) ( see: score.py ), where aux_params is a dict that contains the "moving_mean" and "moving_var" values for each BN layer.
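
(A minimal sketch of the suggestion above, assuming the checkpoint files have been downloaded; "model_prefix" and the epoch number are placeholders:)

```python
import mxnet as mx

# "model_prefix" and the epoch number are placeholders for whichever
# checkpoint files (model_prefix-symbol.json / model_prefix-0000.params)
# were downloaded.
sym, arg_params, aux_params = mx.model.load_checkpoint('model_prefix', 0)

# aux_params maps each BN layer's auxiliary state name to its NDArray value.
for name, value in sorted(aux_params.items()):
    if name.endswith('moving_mean') or name.endswith('moving_var'):
        print(name, value.shape)
```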

wenhe-jia (Author)

OK, I'll check it out soon. Thanks!

terrychenism

@cypw How did you refine the batch normalization layers after training? Do you plan to release this code?

cypw (Owner) commented Jul 11, 2017

@terrychenism

We refined the batch normalization layers as suggested by [1].

In [1], the authors refine the BN layers by computing the average (not the moving average) over a sufficiently large training batch after the training procedure ( see: ResNet ). This requires some extra coding.

To make things easier, our implementation freezes all layers except the BN layers and refines the BN params for one epoch, then uses the refined moving statistics as the final result. I am not sure which strategy is better, but our approach does not require extra coding. (A hedged sketch of this idea follows the reference below.)

--------
[1] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep Residual Learning for Image Recognition. CVPR 2016.
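
(A minimal MXNet sketch of the "refine only the BN layers for one epoch" idea, not the repo's actual script; "model_prefix", the epoch numbers, "train.rec", and the learning rate are placeholders/assumptions:)

```python
import mxnet as mx

# Every weight except the BatchNorm gamma/beta is frozen via
# fixed_param_names, so one extra epoch mainly re-estimates the BN
# statistics over the full training set.
sym, arg_params, aux_params = mx.model.load_checkpoint('model_prefix', 100)

fixed = [name for name in arg_params
         if not (name.endswith('_gamma') or name.endswith('_beta'))]

# Placeholder training iterator over the original training data.
train_iter = mx.io.ImageRecordIter(
    path_imgrec='train.rec', data_shape=(3, 224, 224), batch_size=128,
    rand_crop=True, rand_mirror=True)

mod = mx.mod.Module(symbol=sym, context=mx.gpu(0), fixed_param_names=fixed)
mod.fit(train_iter,
        arg_params=arg_params, aux_params=aux_params,
        begin_epoch=100, num_epoch=101,
        optimizer='sgd', optimizer_params={'learning_rate': 1e-3})
mod.save_checkpoint('model_prefix-bn-refined', 101)
```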

terrychenism

Great idea! Does top-1 accuracy improve after this procedure? By about 1%?

cypw (Owner) commented Jul 11, 2017

@terrychenism

=_=!! Nope, it only improved top-5 accuracy by about 0.03%, and it had a slightly negative effect on top-1 accuracy. ( Actually, the original top-1 accuracy is a little higher than the released accuracy. )

terrychenism

OK, thanks! I will try this step on ResNeXt.
