This is a repository for Caffe development by @SanghyukChun.
Currently working on implementing batch normalization based on PR#1965:
- common layer header and proto file
- bn layer cpp and cu
- BN models based on AlexNet and GoogLeNet in the model zoo
In PR#1965, there are two unresolved problems:
- Shuffling is implemented only for encoded data.
- Mean/variance statistics for inference are not implemented, so test data cannot be classified with batch size 1.
In this dev repository, I resolve these problems as follows:
- Instead of using the shuffling code from PR#1965, I use the shuffle param in ImageDataLayer. Shuffling is still not implemented for every data layer, but I decided to use ImageDataLayer because its shuffling is uniform and it is not much slower than DataLayer.
- I apply a moving-average strategy to the batch statistics during the training phase and use the accumulated mean/variance for inference.
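For reference, enabling uniform shuffling via ImageDataLayer only requires the `shuffle` field of `image_data_param`; the `source` path and `batch_size` below are placeholders:

```protobuf
layer {
  name: "data"
  type: "ImageData"
  top: "data"
  top: "label"
  image_data_param {
    source: "train.txt"   # placeholder list file: one "path label" pair per line
    batch_size: 32        # placeholder batch size
    shuffle: true         # shuffle the image list once per epoch
  }
}
```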
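The moving-average idea can be sketched as follows. This is a minimal standalone illustration, not Caffe's actual BN layer code; the `momentum` value, struct, and member names are all illustrative assumptions:

```cpp
#include <cmath>
#include <vector>

// Sketch: keep exponential moving averages of per-batch mean/variance
// during training, then normalize with them at inference time.
struct BNStats {
  double mean = 0.0;
  double var = 1.0;
  double momentum = 0.99;  // decay factor (illustrative choice)

  // Called once per training batch with that batch's activations.
  void update(const std::vector<double>& batch) {
    double m = 0.0;
    for (double x : batch) m += x;
    m /= batch.size();
    double v = 0.0;
    for (double x : batch) v += (x - m) * (x - m);
    v /= batch.size();
    // Move the running statistics toward the batch statistics.
    mean = momentum * mean + (1.0 - momentum) * m;
    var = momentum * var + (1.0 - momentum) * v;
  }

  // Inference-time normalization: works even with batch size 1,
  // since it depends only on the accumulated statistics.
  double normalize(double x, double eps = 1e-5) const {
    return (x - mean) / std::sqrt(var + eps);
  }
};
```

Because inference reads only the running `mean`/`var`, classification no longer depends on the test batch size.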
Some approaches I am now considering:
- Shuffling via random skip, as mentioned in a comment in PR#1965. This would allow every type of data layer to be used for shuffling. However, my experiments suggest shuffling is not a critical issue for BN networks.
- Implementing a completely independent batch normalization inference module, like lim6060's. However, my experiments suggest the inference rule is not a critical issue for BN networks.
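The random-skip approach mentioned above can use the existing `rand_skip` field of `data_param`, which skips up to that many inputs at the start of each run so different runs begin at different offsets; the `source` path and `batch_size` here are placeholders:

```protobuf
layer {
  name: "data"
  type: "Data"
  top: "data"
  top: "label"
  data_param {
    source: "train_lmdb"  # placeholder database path
    batch_size: 32        # placeholder batch size
    rand_skip: 10000      # skip a random number of inputs (up to this many)
  }
}
```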
Caffe is released under the BSD 2-Clause license. The BVLC reference models are released for unrestricted use.
Please cite Caffe in your publications if it helps your research:
    @article{jia2014caffe,
      Author = {Jia, Yangqing and Shelhamer, Evan and Donahue, Jeff and Karayev, Sergey and Long, Jonathan and Girshick, Ross and Guadarrama, Sergio and Darrell, Trevor},
      Journal = {arXiv preprint arXiv:1408.5093},
      Title = {Caffe: Convolutional Architecture for Fast Feature Embedding},
      Year = {2014}
    }