
batch normalization layer #2

Open · wenhe-jia opened this issue Jul 10, 2017 · 9 comments
wenhe-jia commented Jul 10, 2017

I noticed that in the model json files there are no "moving_mean" and "moving_variance" entries in the BatchNorm layers. Can you explain why? Thanks.

wenhe-jia changed the title from "batch normalization" to "batch normalization layer" on Jul 10, 2017
cypw (Owner) commented Jul 10, 2017

@LeonJWH

MXNet does not store information about "moving_mean" and "moving_variance" in the json file.
( see: http://data.dmlc.ml/mxnet/models/imagenet/resnet )

Please ask this question in the MXNet repo for more information. Thanks!
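
(For reference, a minimal sketch of the point above, assuming MXNet is installed and a released checkpoint has been downloaded; the file name is a placeholder. The symbol json only lists the names of the BN auxiliary states, while their values live in the .params checkpoint:)

```python
import mxnet as mx

# The *-symbol.json file only describes the computation graph; the values of
# "moving_mean" / "moving_var" are auxiliary states saved in the *.params
# checkpoint, not in the json. (The file name below is a placeholder.)
sym = mx.sym.load('model-symbol.json')

# The symbol still lists the *names* of the auxiliary states it expects:
print(sym.list_auxiliary_states())
```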

wenhe-jia (Author)

Have you merged "moving_mean" and "moving_variance" params into "gamma" and "beta"?

cypw (Owner) commented Jul 10, 2017

@LeonJWH

No, we didn't merge them into any other params. ( see: forward code )

You can get the raw values via _, _, aux_params = mx.model.load_checkpoint(prefix, epoch) ( see: score.py ), where aux_params is a dict that contains the "moving_mean" and "moving_var" values for each BN layer.
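
(A minimal sketch of the suggestion above, assuming the checkpoint files have been downloaded; "model_prefix" and the epoch number are placeholders:)

```python
import mxnet as mx

# "model_prefix" and the epoch number are placeholders for whichever
# checkpoint files (model_prefix-symbol.json / model_prefix-0000.params)
# were downloaded.
sym, arg_params, aux_params = mx.model.load_checkpoint('model_prefix', 0)

# aux_params maps each BN layer's auxiliary state name to its NDArray value.
for name, value in sorted(aux_params.items()):
    if name.endswith('moving_mean') or name.endswith('moving_var'):
        print(name, value.shape)
```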

wenhe-jia (Author)

OK, I'll check it out soon. Thanks!

terrychenism

@cypw How did you refine the batch normalization layers after training? Do you plan to release this code?

cypw (Owner) commented Jul 11, 2017

@terrychenism

We refined the batch normalization layers as suggested by [1].

In [1], the authors refine the BN layers by computing the average (not the moving average) over a sufficiently large training batch after the training procedure ( see: ResNet ). This requires some extra coding.

To make things easier, our implementation freezes all layers except the BN layers and refines the BN params for one epoch, then uses the refined moving statistics as the final result. I am not sure which strategy is better, but our approach does not require extra coding. (A hedged sketch of this idea follows the reference below.)

--------
[1] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep Residual Learning for Image Recognition. CVPR 2016.
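
(A minimal MXNet sketch of the "refine only the BN layers for one epoch" idea, not the repo's actual script; "model_prefix", the epoch numbers, "train.rec", and the learning rate are placeholders/assumptions:)

```python
import mxnet as mx

# Every weight except the BatchNorm gamma/beta is frozen via
# fixed_param_names, so one extra epoch mainly re-estimates the BN
# statistics over the full training set.
sym, arg_params, aux_params = mx.model.load_checkpoint('model_prefix', 100)

fixed = [name for name in arg_params
         if not (name.endswith('_gamma') or name.endswith('_beta'))]

# Placeholder training iterator over the original training data.
train_iter = mx.io.ImageRecordIter(
    path_imgrec='train.rec', data_shape=(3, 224, 224), batch_size=128,
    rand_crop=True, rand_mirror=True)

mod = mx.mod.Module(symbol=sym, context=mx.gpu(0), fixed_param_names=fixed)
mod.fit(train_iter,
        arg_params=arg_params, aux_params=aux_params,
        begin_epoch=100, num_epoch=101,
        optimizer='sgd', optimizer_params={'learning_rate': 1e-3})
mod.save_checkpoint('model_prefix-bn-refined', 101)
```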

terrychenism

Great idea! Does top-1 accuracy improve after this procedure? By about 1%?

cypw (Owner) commented Jul 11, 2017

@terrychenism

=_=!! Nope, it only improved top-5 accuracy by about 0.03%, and it had a slightly negative effect on top-1 accuracy. ( Actually, the original top-1 accuracy is a little higher than the released accuracy. )

terrychenism

OK, thanks! I will try this step on ResNeXt.
