
WIP: Batch normalization #513

Closed
wants to merge 23 commits into master from the batch_normalization branch

Conversation

vdumoulin
Contributor

Fixes #509.

@vdumoulin mentioned this pull request on Mar 19, 2015
>>> fprop = function(cg.inputs, cg.outputs[0])
>>> bn_fprop = function(cg_bn.inputs, cg_bn.outputs[0])
>>> linear.initialize()
>>> print fprop(numpy.ones((3, 2), dtype=theano.config.floatX))
Member

Docstrings are Python 3, so use print()
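For example, with that suggestion applied, the doctest line above would read:

>>> print(fprop(numpy.ones((3, 2), dtype=theano.config.floatX)))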

@vdumoulin force-pushed the batch_normalization branch 3 times, most recently from e476508 to 0b73ae6 on March 19, 2015 16:53
from theano.gof import graph
from theano.sandbox.rng_mrg import MRG_RandomStreams
from theano.scan_module.scan_op import Scan
from toolz import unique

from blocks import config
-from blocks.roles import add_role, has_roles, AUXILIARY, PARAMETER, DROPOUT
+from blocks.roles import add_role, has_roles, AUXILIARY, PARAMETER, DROPOUT, BN
Contributor

BN is a little cryptic...

Contributor

How about BATCH_NORMALIZED?

Contributor Author

Sounds good.

@vdumoulin
Contributor Author

One thing I'd like your opinion on: is it all right to expect the user to provide the gamma and beta parameters?

I chose to go that route because it offers more flexibility: the user can choose how to initialize these parameters, whether they're to be learned, etc.

Plus, I could add a normalize_batch flag (defaulting to True) which, when set to False, just multiplies by gamma and adds beta. That would allow the user to reuse apply_batch_normalization at test time with the population statistics rather than the batch statistics.
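As a concrete sketch of those semantics in plain NumPy (the function name and signature below are made up for illustration, not the apply_batch_normalization implementation on this branch):

import numpy

def batch_normalize(x, gamma, beta, axis=0, epsilon=1e-5, normalize_batch=True):
    """Hypothetical illustration of the proposed flag.

    normalize_batch=True: standardize with the minibatch statistics.
    normalize_batch=False: only scale by gamma and shift by beta, so the
    caller can standardize with population statistics beforehand.
    """
    if normalize_batch:
        mean = x.mean(axis=axis, keepdims=True)
        var = x.var(axis=axis, keepdims=True)
        x = (x - mean) / numpy.sqrt(var + epsilon)
    return gamma * x + beta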

@dwf
Contributor

dwf commented Mar 23, 2015

Leaving this open in a tab so I have a look at it first thing in the morning.

@dwf
Contributor

dwf commented Mar 23, 2015

Apparently my tab strategy did not pan out. Looking now.

@dwf force-pushed the master branch 2 times, most recently from 461d845 to 1bb236a on March 23, 2015 23:27
epsilon = numpy.cast[theano.config.floatX](epsilon)

# Broadcast gamma and beta properly
axis = axis if isinstance(axis, (list, tuple)) else (axis,)
Contributor

Since it is a list of axes, I would name it axes throughout the code.

Contributor

+1. Also I think you can just use blocks.utils.pack here.
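For reference, a stand-in sketch of what that replacement might look like; the definition below only mimics the assumed behaviour of blocks.utils.pack and is not the real import:

def pack(arg):
    # Stand-in for the assumed behaviour of blocks.utils.pack: leave
    # lists/tuples alone and wrap anything else in a list.
    return arg if isinstance(arg, (list, tuple)) else [arg]

axes = pack(1)        # [1]
axes = pack((0, 1))   # (0, 1) with this stand-in definition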

@vdumoulin
Contributor Author

@dwf @fvisin Your comments have been addressed, thanks!

@vdumoulin
Contributor Author

I'm getting a syntax check error because I'm redefining the axis argument within a list comprehension.

I really want to keep that name for the argument, as it's consistent with theano and numpy, so I'll change axis to ax within the code.

@bartvm
Member

bartvm commented Mar 24, 2015

I guess that a (very) loose interpretation of PEP8 would suggest you use axis_ instead of ax.

@vdumoulin
Contributor Author

Close enough, I'll do that.
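For concreteness, a made-up illustration of the kind of rename being agreed on here (this is a stand-in, not the actual diff):

def broadcast_pattern(axis, ndim):
    # Naming the comprehension variable `axis` as well would shadow the
    # argument and trip the syntax checker, hence `axis_`:
    return [axis_ in axis for axis_ in range(ndim)]

print(broadcast_pattern(axis=(1, 2), ndim=3))  # [False, True, True]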

@vdumoulin
Contributor Author

I think I could also write a short tutorial in the documentation to show how to use batch normalization with a deep network on MNIST.

I have a working example where an MLP with 4 sigmoid layers followed by a softmax layer fails to learn by itself but has no problem training when using batch normalization.
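For reference, that architecture could be sketched with blocks bricks roughly as follows; the hidden-layer sizes and initialization are made up, and the batch normalization step added by this PR is only indicated in a comment:

from theano import tensor
from blocks.bricks import MLP, Logistic, Softmax
from blocks.initialization import IsotropicGaussian, Constant

x = tensor.matrix('features')

# Four sigmoid (Logistic) layers followed by a softmax output layer.
mlp = MLP(activations=[Logistic()] * 4 + [Softmax()],
          dims=[784, 100, 100, 100, 100, 10],
          weights_init=IsotropicGaussian(0.01),
          biases_init=Constant(0))
mlp.initialize()
y_hat = mlp.apply(x)
# The batch-normalized variant would then be obtained by transforming the
# computation graph of y_hat with the apply_batch_normalization function
# introduced on this branch (exact call omitted here).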

@dwf
Contributor

dwf commented Mar 24, 2015 via email

@vdumoulin
Contributor Author

I think we should stay consistent with theano and numpy when appropriate. I'd change axes to axis in VariableClipping.

@dwf
Contributor

dwf commented Mar 24, 2015 via email

@vdumoulin
Contributor Author

It seems like it does:

numpy.mean(numpy.random.uniform(size=(2, 2, 2)), axis=(1, 2))

although it raises an error if axis is a list instead of a tuple.

The analogous theano construct works on both lists and tuples.

@rizar
Contributor

rizar commented Sep 30, 2015

I am confused: it seems like @cooijmanstim's pull request to @vdumoulin's branch was merged, but somehow the code here is not updated. @vdumoulin, do you know why?

@rizar
Contributor

rizar commented Oct 26, 2015

Closed in favour of #851.

@rizar closed this Oct 26, 2015
@vdumoulin deleted the batch_normalization branch January 23, 2016 15:07