
Regularization #14

Closed
benanne opened this issue Sep 12, 2014 · 7 comments
benanne commented Sep 12, 2014

We should implement some commonly used regularizers (L1, L2, sparsity penalties on the activations as in sparse autoencoders, ...).

How should we do this? The nntools.regularization module I included in the initial commit was an afterthought and should be treated as more of a placeholder.

In #11 @f0k already mentioned that it's probably a good idea to make the regularization module operate on Theano expressions, not Layer instances, so that it can be used in isolation.

Any ideas? We should also take into account that some regularizers operate on model parameters (e.g. L1, L2) and others operate on activations (autoencoder sparsity penalty) and are data-dependent.

f0k commented Sep 12, 2014

I think the regularizers themselves should just accept expressions and return expressions. This way the L1 regularization can be applied both to model parameters and to activations, for example. We might add convenience methods such as:

def regularize_weights(layer, regularizer):
    # apply the regularizer to this layer's weight parameters only
    return regularizer(layer.get_non_bias_params())

def regularize_all_weights(layer, regularizer):
    # apply the regularizer to the weights of this layer and all layers below it
    return regularizer(nntools.layers.get_all_non_bias_params(layer))

def regularize_output(layer, regularizer, inputs=None):
    # apply the regularizer to the layer's output expression (activations)
    return regularizer(layer.get_output(inputs))

So one can write: cost = something + alpha * regularize_all_weights(outlayer, L2) + beta * regularize_output(hiddenlayer, L1).
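
For concreteness, expression-level penalties along these lines could be as simple as the following sketch (these L1/L2 implementations are illustrative, not a settled API): each takes a Theano expression or a list of expressions and returns a scalar expression.

import theano.tensor as T

def L2(expressions):
    # sum of squared values over one expression or a list of expressions
    if not isinstance(expressions, (list, tuple)):
        expressions = [expressions]
    return sum(T.sum(x ** 2) for x in expressions)

def L1(expressions):
    # sum of absolute values over one expression or a list of expressions
    if not isinstance(expressions, (list, tuple)):
        expressions = [expressions]
    return sum(T.sum(abs(x)) for x in expressions)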

Again, there should be a way to use networks with multiple output layers, I guess... we could swap the argument order so we can define regularize_all_weights(regularizer, *layers), or we could have all of the *_all_* methods accept lists in place of a single layer; that would give us more flexibility.

/edit: Some regularizers will take additional parameters (such as a sparsity target), so the convenience functions probably should be:

def regularize_weights(layer, regularizer, *args, **kwargs):
    # forward extra arguments (e.g. a sparsity target) to the regularizer
    return regularizer(layer.get_non_bias_params(), *args, **kwargs)
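
As an illustration of a regularizer that takes extra parameters, a KL-divergence sparsity penalty on activations (as in sparse autoencoders) could look roughly like this; the name, signature and default target below are just placeholders:

import theano.tensor as T

def sparsity_penalty(activations, target=0.05, eps=1e-6):
    # mean activation of each unit, averaged over the batch axis
    rho_hat = T.mean(activations, axis=0)
    # KL(target || rho_hat), summed over all units
    return T.sum(target * T.log((target + eps) / (rho_hat + eps)) +
                 (1 - target) * T.log((1 - target + eps) / (1 - rho_hat + eps)))

One would then write something like cost = something + beta * regularize_output(hiddenlayer, sparsity_penalty, target=0.1).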

craffel commented Jan 24, 2015

I think the regularizers themselves should just accept expressions and return expressions.

Agreed here. I also suggested in #110 that rather than layers having just get_params and get_bias_params, we also have functions which group parameters conveniently, like get_weight_params and the get_init_params that @skaae added for recurrent layers. I think we should avoid get_all_non_bias_params.

ebattenberg commented

I think the regularizers themselves should just accept expressions and return expressions.

What types of expressions? Expressions that augment (are added to) the original cost function? These would then be differentiable. I ask because in #84 I'm toying with the idea of hard weight norm constraints which can't be written as part of the cost function. They can really only be written as updates on values that violate the hard constraint.
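
For context, such a hard max-norm constraint is typically enforced as a post-update step that rescales any weight column whose norm exceeds the limit. A rough sketch (illustrative only, not the implementation being discussed in #84):

import theano.tensor as T

def apply_max_norm(updates, param, max_norm=3.0, eps=1e-7):
    # rescale columns of the updated weight matrix so their L2 norms stay <= max_norm
    w = updates[param]
    norms = T.sqrt(T.sum(T.sqr(w), axis=0, keepdims=True))
    desired = T.clip(norms, 0, max_norm)
    updates[param] = w * (desired / (norms + eps))
    return updates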

benanne commented Feb 8, 2015

There are many different ways to regularize things, hooking into various parts of the code. I think what's meant is that, in general, we want any tools we write to help with regularization to operate on Theano data types only, so that they are maximally reusable (and can be used without the rest of the library).

The current regularization code takes a layer instance and then calls get_all_params on it, so it is limited in its applicability and has a dependency on the Layer class. Such dependencies should be avoided if there is no significant benefit to them.

cancan101 commented

See also #86.

benanne added this to the First release milestone May 19, 2015
benanne commented Jun 2, 2015

Now that our test coverage and documentation are in a pretty good state, fixing up the regularization module is one of the last few things we need to do before we're ready for release. Some good ideas for a new API have been discussed in this thread, although the discussion predates the get_params() API change and the introduction of tags. The main takeaway is that functions in this module should operate on Theano expressions, not stacks of layers - possibly with one or more convenience functions that take a layer or list of layers as input, although we'll have to see whether that's actually useful (i.e. whether it makes things significantly shorter and/or more readable).
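
To make that concrete, a convenience wrapper built on the new get_params() tag API could look roughly like the following sketch (the function name and the use of a regularizable tag are assumptions, not a settled design); the penalty itself stays a plain function on Theano expressions:

def regularize_layer_params(layer, penalty):
    # collect only the parameters tagged as regularizable (typically weights, not biases)
    params = layer.get_params(regularizable=True)
    return sum(penalty(p) for p in params)

# e.g. cost = cost + alpha * regularize_layer_params(outlayer, lambda w: (w ** 2).sum())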

We just need to turn this idea into a PR now :) Any takers?

ebenolson commented

I made a start at it, see #285

f0k mentioned this issue Jun 3, 2015
benanne closed this as completed Jun 21, 2015