
WIP: Batch normalization #513

Closed
wants to merge 23 commits into master from the batch_normalization branch

Conversation

vdumoulin
Contributor

Fixes #509.

@vdumoulin mentioned this pull request on Mar 19, 2015
>>> fprop = function(cg.inputs, cg.outputs[0])
>>> bn_fprop = function(cg_bn.inputs, cg_bn.outputs[0])
>>> linear.initialize()
>>> print fprop(numpy.ones((3, 2), dtype=theano.config.floatX))
Member

Docstrings are Python 3, so use print()
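For example, with that suggestion applied, the doctest line above would read:

>>> print(fprop(numpy.ones((3, 2), dtype=theano.config.floatX)))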

@vdumoulin force-pushed the batch_normalization branch 3 times, most recently from e476508 to 0b73ae6 on March 19, 2015 16:53
from theano.gof import graph
from theano.sandbox.rng_mrg import MRG_RandomStreams
from theano.scan_module.scan_op import Scan
from toolz import unique

from blocks import config
-from blocks.roles import add_role, has_roles, AUXILIARY, PARAMETER, DROPOUT
+from blocks.roles import add_role, has_roles, AUXILIARY, PARAMETER, DROPOUT, BN
Contributor

BN is a little cryptic...

Contributor

How about BATCH_NORMALIZED?

Contributor Author

Sounds good.

@vdumoulin
Contributor Author

One thing I'd like your opinion on: is it all right to expect the user to provide the gamma and beta parameters?

I chose to go that route because it offers more flexibility: the user can choose how to initialize these parameters, whether they're to be learned, etc.

Plus, I could add a normalize_batch flag (defaulting to True) which, when set to False, just multiplies by gamma and adds beta. That would allow the user to reuse apply_batch_normalization at test time with the population statistics rather than the batch statistics.
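As a concrete sketch of those semantics in plain NumPy (the function name and signature below are made up for illustration, not the apply_batch_normalization implementation on this branch):

import numpy

def batch_normalize(x, gamma, beta, axis=0, epsilon=1e-5, normalize_batch=True):
    """Hypothetical illustration of the proposed flag.

    normalize_batch=True: standardize with the minibatch statistics.
    normalize_batch=False: only scale by gamma and shift by beta, so the
    caller can standardize with population statistics beforehand.
    """
    if normalize_batch:
        mean = x.mean(axis=axis, keepdims=True)
        var = x.var(axis=axis, keepdims=True)
        x = (x - mean) / numpy.sqrt(var + epsilon)
    return gamma * x + beta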

@dwf
Contributor

dwf commented Mar 23, 2015

Leaving this open in a tab so I have a look at it first thing in the morning.

@dwf
Contributor

dwf commented Mar 23, 2015

Apparently my tab strategy did not pan out. Looking now.

@dwf force-pushed the master branch 2 times, most recently from 461d845 to 1bb236a on March 23, 2015 23:27
epsilon = numpy.cast[theano.config.floatX](epsilon)

# Broadcast gamma and beta properly
axis = axis if isinstance(axis, (list, tuple)) else (axis,)
Contributor

Since it is a list of axes, I would name it axes throughout the code.

Contributor

+1. Also I think you can just use blocks.utils.pack here.
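For reference, a stand-in sketch of what that replacement might look like; the definition below only mimics the assumed behaviour of blocks.utils.pack and is not the real import:

def pack(arg):
    # Stand-in for the assumed behaviour of blocks.utils.pack: leave
    # lists/tuples alone and wrap anything else in a list.
    return arg if isinstance(arg, (list, tuple)) else [arg]

axes = pack(1)        # [1]
axes = pack((0, 1))   # (0, 1) with this stand-in definition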

@vdumoulin
Contributor Author

@dwf @fvisin Your comments have been addressed, thanks!

@vdumoulin
Contributor Author

I'm getting a syntax check error because I'm redefining the axis argument within a list comprehension.

I really want to keep that name for the argument, as it's consistent with theano and numpy, so I'll change axis to ax within the code.

@bartvm
Member

bartvm commented Mar 24, 2015

I guess that a (very) loose interpretation of PEP8 would suggest you use axis_ instead of ax.

@vdumoulin
Contributor Author

Close enough, I'll do that.
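For concreteness, a made-up illustration of the kind of rename being agreed on here (this is a stand-in, not the actual diff):

def broadcast_pattern(axis, ndim):
    # Naming the comprehension variable `axis` as well would shadow the
    # argument and trip the syntax checker, hence `axis_`:
    return [axis_ in axis for axis_ in range(ndim)]

print(broadcast_pattern(axis=(1, 2), ndim=3))  # [False, True, True]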

@vdumoulin
Contributor Author

I think I could also write a short tutorial in the documentation to show how to use batch normalization with a deep network on MNIST.

I have a working example where an MLP with 4 sigmoid layers followed by a softmax layer fails to learn by itself but has no problem training when using batch normalization.
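For reference, that architecture could be sketched with blocks bricks roughly as follows; the hidden-layer sizes and initialization are made up, and the batch normalization step added by this PR is only indicated in a comment:

from theano import tensor
from blocks.bricks import MLP, Logistic, Softmax
from blocks.initialization import IsotropicGaussian, Constant

x = tensor.matrix('features')

# Four sigmoid (Logistic) layers followed by a softmax output layer.
mlp = MLP(activations=[Logistic()] * 4 + [Softmax()],
          dims=[784, 100, 100, 100, 100, 10],
          weights_init=IsotropicGaussian(0.01),
          biases_init=Constant(0))
mlp.initialize()
y_hat = mlp.apply(x)
# The batch-normalized variant would then be obtained by transforming the
# computation graph of y_hat with the apply_batch_normalization function
# introduced on this branch (exact call omitted here).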

@dwf
Contributor

dwf commented Mar 24, 2015 via email

@vdumoulin
Contributor Author

I think we should stay consistent with theano and numpy when appropriate. I'd change axes to axis in VariableClipping.

@dwf
Contributor

dwf commented Mar 24, 2015 via email

@vdumoulin
Contributor Author

It seems like it does:

numpy.mean(numpy.random.uniform(size=(2, 2, 2)), axis=(1, 2))

although it raises an error if axis is a list instead of a tuple.

The analogous theano construct works on both lists and tuples.

@rizar
Contributor

rizar commented Sep 30, 2015

I am confused: it seems like @cooijmanstim's pull request to @vdumoulin's branch was merged, but somehow the code here is not updated. @vdumoulin, do you know why?

@rizar
Contributor

rizar commented Oct 26, 2015

Closed in favour of #851.

@rizar closed this Oct 26, 2015
@vdumoulin deleted the batch_normalization branch January 23, 2016 15:07