WIP: Batch normalization #513
Conversation
>>> fprop = function(cg.inputs, cg.outputs[0])
>>> bn_fprop = function(cg_bn.inputs, cg_bn.outputs[0])
>>> linear.initialize()
>>> print fprop(numpy.ones((3, 2), dtype=theano.config.floatX))
Docstrings are Python 3, so use print()
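That is, the doctest line above would become:

>>> print(fprop(numpy.ones((3, 2), dtype=theano.config.floatX)))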
Force-pushed from e476508 to 0b73ae6.
from theano.gof import graph
from theano.sandbox.rng_mrg import MRG_RandomStreams
from theano.scan_module.scan_op import Scan
from toolz import unique

from blocks import config
-from blocks.roles import add_role, has_roles, AUXILIARY, PARAMETER, DROPOUT
+from blocks.roles import add_role, has_roles, AUXILIARY, PARAMETER, DROPOUT, BN
BN is a little cryptic...
How about BATCH_NORMALIZED?
Sounds good.
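A minimal sketch of what the renamed role could look like, following the VariableRole-subclass-plus-instance convention used elsewhere in blocks.roles (the class name and comment wording here are assumptions, not this PR's code):

from blocks.roles import VariableRole, add_role

class BatchNormalizedRole(VariableRole):
    pass

# Hypothetical role object, mirroring how PARAMETER etc. are defined
BATCH_NORMALIZED = BatchNormalizedRole()

Variables touched by batch normalization could then be tagged with add_role(variable, BATCH_NORMALIZED).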
One of the things about which I'd like to know your opinion: is it alright to expect that the user provides the […]? I chose to go that route because it offers more flexibility: the user can choose how to initialize these parameters, whether they're to be learned, etc. Plus I could add some […]
Leaving this open in a tab so I can have a look at it first thing in the morning.
Force-pushed from 4e8bab6 to 2b02163.
Apparently my tab strategy did not pan out. Looking now.
Force-pushed from 461d845 to 1bb236a.
epsilon = numpy.cast[theano.config.floatX](epsilon)

# Broadcast gamma and beta properly
axis = axis if isinstance(axis, (list, tuple)) else (axis,)
Since it is a list of axes, I would name it axes throughout the code.
+1. Also, I think you can just use blocks.utils.pack here.
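A sketch of how that could replace the isinstance check, assuming pack's usual behaviour of returning sequences as lists and wrapping anything else in a singleton list:

from blocks.utils import pack

# A single axis and a sequence of axes both come out as a list,
# so this normalizes the argument in one step
axes = tuple(pack(axis))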
I'm getting a syntax check error because I'm redefining the […]. I really want to keep that name for the argument, as it's consistent with […]
I guess that a (very) loose interpretation of PEP8 would suggest you use […]
Close enough, I'll do that.
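For context, a minimal sketch of the forward computation that the epsilon cast and axes handling above feed into; the function name and the gamma/beta handling are assumptions for illustration, not this PR's exact code:

import numpy
import theano
from theano import tensor

# Minimal batch-normalization sketch: standardize x over the given
# axes, then scale and shift with the learned gamma and beta
def batch_normalize(x, gamma, beta, axes=(0,), epsilon=1e-4):
    epsilon = numpy.cast[theano.config.floatX](epsilon)
    mean = x.mean(axis=axes, keepdims=True)
    var = x.var(axis=axes, keepdims=True)
    return gamma * (x - mean) / tensor.sqrt(var + epsilon) + beta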
Force-pushed from 4a796d7 to 2f0b766.
I think I could also write a short tutorial in the documentation to show how to use batch normalization with a deep network on MNIST. I have a working example where an MLP with 4 sigmoid layers followed by a softmax layer fails to learn by itself but has no problem training when using batch normalization.
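A sketch of the kind of model described, built with blocks bricks; the hidden-layer sizes and initialization choices here are assumptions for illustration:

from blocks.bricks import MLP, Logistic, Softmax
from blocks.initialization import IsotropicGaussian, Constant

# Four sigmoid (Logistic) layers followed by a softmax output layer;
# 784 inputs and 10 classes match MNIST, hidden sizes are assumed
mlp = MLP(activations=[Logistic()] * 4 + [Softmax()],
          dims=[784, 100, 100, 100, 100, 10],
          weights_init=IsotropicGaussian(0.01),
          biases_init=Constant(0))
mlp.initialize()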
Hmm, VariableClipping uses axes; I did that since it accepts multiple axes. Should I change it for consistency?
I think we should stay consistent with theano and numpy when appropriate. I'd change […]
Does Theano accept multiple axes for its axis arguments though? I'm pretty sure numpy doesn't.
It seems like it does: numpy.mean(numpy.random.uniform(size=(2, 2, 2)), axis=(1, 2)), although it raises an error if […]. The analogous Theano construct works on both lists and tuples.
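A quick check of both claims (a small sketch; exact behaviour may depend on the numpy and Theano versions in use):

import numpy
from theano import tensor

x = numpy.random.uniform(size=(2, 2, 2))
print(numpy.mean(x, axis=(1, 2)))  # tuple of axes: works in numpy

# The Theano analogue accepts both tuples and lists of axes
t = tensor.tensor3('t')
m_tuple = tensor.mean(t, axis=(1, 2))
m_list = tensor.mean(t, axis=[1, 2])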
Force-pushed from 1795d5e to 7fb1a80.
I am confused: it seems like @cooijmanstim's pull request to @vdumoulin's branch was merged, but somehow the code here is not updated. @vdumoulin, do you know why?
Closed in favour of #851.
Fixes #509.