Regularization #285
Conversation
That's a good start! However, I think Apart from weight and output penalization, is there anything else we should put into this module? We had that discussion before -- there are regularization methods in different |
Personally I prefer something verbose like How about breaking it down something like this:
not sure if the last is needed, seems like |
I did something along these lines trying to implement batchwise dropout.. unfortunately I had to get pretty invasive, and it ended up not being that much faster anyway. Probably should look into it again at some point though, in principle it should speed things up quite a bit. |
Doesn't sound too bad, but that might burn somebody else assuming that
Looks good, except that
Thinking about it, that proposal predated #182. It would actually be a bad idea to provide
Batchwise dropout could just become a different layer, couldn't it? This doesn't need a helper function of the kind I suggested. (And the helper function shouldn't make any difference in performance, just provide a different way to obtain a particular graph.) By the way, back in 2012, batchwise dropout wasn't giving me as good results on MNIST as itemwise dropout (i.e., batchwise dropout was not on par with the 2012 dropout paper). |
I'm not sure about using I also somewhat prefer names like Merging |
I think maybe we're talking about different things - I was trying to implement this paper from Feb 2015. It uses submatrices to avoid calculating dropped-out activations, and at least the way I was doing it required modifying the |
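For readers unfamiliar with the distinction discussed here: itemwise dropout samples an independent mask for every example, while batchwise dropout shares one mask across the whole minibatch, which is what makes the submatrix trick from that paper possible. Below is a rough numpy sketch of the two masking schemes, purely for illustration (not the submatrix optimization itself); the shapes and drop probability are placeholders.

    import numpy as np

    rng = np.random.RandomState(0)
    activations = rng.randn(64, 100)   # (batch size, units) -- placeholder shapes
    p = 0.5                            # drop probability

    # itemwise dropout: an independent mask per example and per unit
    itemwise_mask = rng.binomial(1, 1 - p, size=activations.shape)
    itemwise_out = activations * itemwise_mask / (1 - p)

    # batchwise dropout: one mask per unit, shared by every example in the
    # minibatch, so dropped units correspond to whole columns of the activations
    # (and whole rows/columns of the adjacent weight matrices)
    batchwise_mask = rng.binomial(1, 1 - p, size=(1, activations.shape[1]))
    batchwise_out = activations * batchwise_mask / (1 - p)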
That's true. I mostly stuck to the term because it's called "weight regularization". So
That's true... but

    hidden, out = get_output([l_hidden, l_out])
    loss = something(out, target) + alpha * penalize_all_params(outlayer, l2) + beta * l1(hidden)

still reads clearly. But I see your point!
In #182, I came up with
Ahaa, sorry, I had heard about batchwise dropout, but had never read the paper. Now everything is clear :) |
I know there are similar things elsewhere, but to me How do you feel about |
It's okay, it just doesn't tell what exactly we're penalizing. I like Maybe somebody else has a good idea... we have What about:

    def apply_penalty(things, penalty, **kwargs):
        return sum(penalty(thing, **kwargs) for thing in things)

The use cases would be:

    loss = apply_penalty(get_output(layer), l1)  # or simply l1(get_output(network))
    loss = apply_penalty(layer.get_params(regularizable=True), l2)
    loss = apply_penalty((layer.get_params(regularizable=True) for layer in layers), l2)
    loss = apply_penalty(get_all_params(layer, regularizable=True), l2)

Would that be too verbose? It's very clear what each line does, and we don't need to discuss any names. |
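To make the proposal concrete, here is a minimal self-contained sketch of how apply_penalty would compose into a full loss expression, assuming l2 is a penalty function mapping a tensor to a scalar expression as discussed in this PR. The network layout, variable names, objective and the 1e-4 coefficient are placeholders for illustration, not part of the proposal:

    import theano.tensor as T
    import lasagne
    from lasagne.layers import InputLayer, DenseLayer, get_output, get_all_params
    from lasagne.regularization import l2

    def apply_penalty(things, penalty, **kwargs):
        # the helper as proposed above: sum the penalty over whatever is passed in
        return sum(penalty(thing, **kwargs) for thing in things)

    # toy network
    l_in = InputLayer((None, 784))
    l_hidden = DenseLayer(l_in, num_units=100)
    l_out = DenseLayer(l_hidden, num_units=10,
                       nonlinearity=lasagne.nonlinearities.softmax)

    target = T.matrix('target')
    prediction = get_output(l_out)
    data_loss = T.mean(T.nnet.categorical_crossentropy(prediction, target))

    # weight decay on every regularizable parameter of the network
    loss = data_loss + 1e-4 * apply_penalty(
        get_all_params(l_out, regularizable=True), l2)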
I don't have much to add to this discussion at the moment... you guys seem to have all the bases covered :) I guess it would be nice to have a shortcut for this one:
This is by far going to be the most common use case, i.e. "add regularization terms for all regularizable parameters in a network". If we can avoid 'exposing' the tagging mechanism here, that would be nice. Something like:
but maybe with a different name, I don't know :) Just something that encapsulates the |
Yes, that's exactly the trouble. Without a shortcut, it's completely clear what it does. And if we provide a shortcut, we need a name that a) clearly says whether it's going to affect a single layer or also the layers feeding into it and b) what exactly it's applying the penalty function to (namely, the regularizable parameters). If you come up with a good name, I'm all up for a shortcut! I think names are pretty important. Bad names make any code a lot harder to work with (see |
couple other quibbles with this: we never use Are there any plans for Lasagne to have a network container class like nolearn's |
we could have a shortcut |
Oh yes, just like
But that doesn't make clear it's going to regularize the full network. People might inadvertently use it as |
you're right, that's no good. ok, I think the best so far is They don't specify And even if the user doesn't realize it, is there a common case where that would cause a bad outcome? Presumably |
Not really, see #2 for reasons. For all purposes, a network in Lasagne can be represented as a list (not a set, it must be ordered) of its output layers. This gives a fixed order of input layers and a fixed order of parameters.
My issue is not so much about whether the functionality is clear from the documentation, but whether it's clear from just reading the code. Compare:

    loss = apply_penalty(get_all_params(layer, regularizable=True), l2)  # undoubtedly clear, albeit verbose
    loss = penalize_net(layer, l2)              # assume it might do weight decay of the network because of 'l2' and 'net'
    loss = penalize_network_params(layer, l2)   # assume it excludes biases because of 'l2', but not sure
    loss = penalize_network_weights(layer, l2)  # assume it excludes biases because of 'l2' and 'weights'

If we go for I think we had that before, but what about:

    loss = regularize_network_params(layer, l2)  # hints both at 'params' and 'regularizable'

"regularize" is a bit generic, but on the other hand, "regularizable" is a bit generic as well and "penalty" could be any kind of regularization, not just weight decay. My favourites are In any case, I'd like this to be expressed in terms of |
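A sketch of how such a shortcut could be expressed in terms of the apply_penalty helper from the sketch above; the name regularize_network_params is the candidate floated in this comment, not a settled API:

    from lasagne.layers import get_all_params

    def regularize_network_params(layer, penalty, **kwargs):
        # shortcut: apply `penalty` to every parameter tagged 'regularizable'
        # in the network ending at `layer`
        return apply_penalty(get_all_params(layer, regularizable=True),
                             penalty, **kwargs)

At the call site this keeps the common case a one-liner, e.g. loss = data_loss + alpha * regularize_network_params(l_out, l2).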
I think this argument is being given too much weight. It's nice to have clear and descriptive names, but it's a trade-off and there is a point where it just becomes too cumbersome. I think
I think pragmatism trumps clarity here. We need this shortcut; the fact that we can't come up with a descriptive name for it is not a good enough reason not to have it. |
I was referring to #285 (comment): "I got burned a while ago by assuming
I'm not opposed to the shortcut, but I think names matter a lot to the quality of a framework, and we should choose them wisely. We've covered the pros and cons quite well now, we just need to decide.
What do you think of our suggestions then? All of them allow this. |
True, but we can't guard ourselves against every possible assumption that any user might make. We shouldn't take this too far. They should also read the docs ;)
Sure, I didn't want to take it further,
Great. Let's go for this (plus |
reworked, will do docs later. |
    if len(params) == 0:
        return 0
    else:
        return sum(penalty(p, **kwargs) for p in params)
Can you remove the len check? sum returns zero for an empty sequence anyway, and using len will require params to have a length -- it's more flexible if we allow generators as well.
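A quick plain-Python illustration of the point: sum() over an empty iterable already returns zero, and dropping the len() check also lets generators through.

    def square(p):
        return p ** 2  # stand-in for an l1/l2 penalty

    assert sum(square(p) for p in []) == 0          # empty sequence: sum() gives 0
    assert sum(square(p) for p in iter([])) == 0    # generators have no len(), but still work
    assert sum(square(p) for p in (1, 2, 3)) == 14  # normal case unchanged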
fixed this
Thanks, looks quite good! What about changing |
    try:
        return sum(penalty(x, **kwargs) for x in tensor_or_tensors)
    except (TypeError, ValueError):
        return penalty(tensor_or_tensors, **kwargs)
What case is the ValueError for? Isn't the TypeError enough to distinguish between iterable and non-iterable tensor_or_tensors?
Theano raises ValueError when attempting iteration of a vector.
Thanks! 👍
I think all the comments have been addressed. If there aren't any more, I'll squash this to a single commit. |
looks good to me, go for it :) The only thing worth mentioning is the use of We have a bunch of |
If you're done can you remove the [WIP] tag? bit uncomfortable merging a PR like that :) |
Done. Not sure why coveralls is still showing yellow, it seems like the check passed |
Yeah, I've seen it on other PRs as well, I think there's something wrong with it today. |
Hmm, I just noticed one thing that's missing: the rst file to make the documentation show up. It's a bit unfortunate because I was hoping to refer to the docs in the API change announcement on the mailing list. I don't know how many people are using the old regularization module though, not too many I imagine. So it probably isn't too big of an issue. I'll merge this when I have some time to write the announcement as well. |
Merging. |