
Regularization #14

Closed
benanne opened this issue Sep 12, 2014 · 7 comments
benanne commented Sep 12, 2014

We should implement some commonly used regularizers (L1, L2, sparsity penalties on the activations as in sparse autoencoders, ...).

How should we do this? The nntools.regularization module I included in the initial commit was an afterthought and should be treated as more of a placeholder.

In #11 @f0k already mentioned that it's probably a good idea to make the regularization module operate on Theano expressions, not Layer instances, so that it can be used in isolation.

Any ideas? We should also take into account that some regularizers operate on model parameters (e.g. L1, L2) and others operate on activations (autoencoder sparsity penalty) and are data-dependent.

f0k commented Sep 12, 2014

I think the regularizers themselves should just accept expressions and return expressions. This way the L1 regularization can be applied both to model parameters and to activations, for example. We might add convenience methods such as:

def regularize_weights(layer, regularizer):
    # apply the regularizer to this layer's weight parameters only
    return regularizer(layer.get_non_bias_params())

def regularize_all_weights(layer, regularizer):
    # apply the regularizer to the weights of this layer and all layers below it
    return regularizer(nntools.layers.get_all_non_bias_params(layer))

def regularize_output(layer, regularizer, inputs=None):
    # apply the regularizer to the layer's output expression (activations)
    return regularizer(layer.get_output(inputs))

So one can write: cost = something + alpha * regularize_all_weights(outlayer, L2) + beta * regularize_output(hiddenlayer, L1).
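
For concreteness, expression-level penalties along these lines could be as simple as the following sketch (these L1/L2 implementations are illustrative, not a settled API): each takes a Theano expression or a list of expressions and returns a scalar expression.

import theano.tensor as T

def L2(expressions):
    # sum of squared values over one expression or a list of expressions
    if not isinstance(expressions, (list, tuple)):
        expressions = [expressions]
    return sum(T.sum(x ** 2) for x in expressions)

def L1(expressions):
    # sum of absolute values over one expression or a list of expressions
    if not isinstance(expressions, (list, tuple)):
        expressions = [expressions]
    return sum(T.sum(abs(x)) for x in expressions)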

Again, there should be a way to use networks with multiple output layers, I guess... we could swap the argument order so we can define regularize_all_weights(regularizer, *layers), or we could have all of the *_all_* methods accept lists in place of a single layer; that would give us more flexibility.

/edit: Some regularizers will take additional parameters (such as a sparsity target), so the convenience functions probably should be:

def regularize_weights(layer, regularizer, *args, **kwargs):
    # forward extra arguments (e.g. a sparsity target) to the regularizer
    return regularizer(layer.get_non_bias_params(), *args, **kwargs)
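
As an illustration of a regularizer that takes extra parameters, a KL-divergence sparsity penalty on activations (as in sparse autoencoders) could look roughly like this; the name, signature and default target below are just placeholders:

import theano.tensor as T

def sparsity_penalty(activations, target=0.05, eps=1e-6):
    # mean activation of each unit, averaged over the batch axis
    rho_hat = T.mean(activations, axis=0)
    # KL(target || rho_hat), summed over all units
    return T.sum(target * T.log((target + eps) / (rho_hat + eps)) +
                 (1 - target) * T.log((1 - target + eps) / (1 - rho_hat + eps)))

One would then write something like cost = something + beta * regularize_output(hiddenlayer, sparsity_penalty, target=0.1).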

craffel commented Jan 24, 2015

I think the regularizers themselves should just accept expressions and return expressions.

Agreed here. I also suggested in #110 that rather than layers having just get_params and get_bias_params, we also have functions which group parameters conveniently, like get_weight_params and the get_init_params that @skaae added for recurrent layers. I think we should avoid get_all_non_bias_params.

ebattenberg commented

I think the regularizers themselves should just accept expressions and return expressions.

What types of expressions? Expressions that augment (are added to) the original cost function? These would then be differentiable. I ask because in #84 I'm toying with the idea of hard weight norm constraints which can't be written as part of the cost function. They can really only be written as updates on values that violate the hard constraint.
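
For context, such a hard max-norm constraint is typically enforced as a post-update step that rescales any weight column whose norm exceeds the limit. A rough sketch (illustrative only, not the implementation being discussed in #84):

import theano.tensor as T

def apply_max_norm(updates, param, max_norm=3.0, eps=1e-7):
    # rescale columns of the updated weight matrix so their L2 norms stay <= max_norm
    w = updates[param]
    norms = T.sqrt(T.sum(T.sqr(w), axis=0, keepdims=True))
    desired = T.clip(norms, 0, max_norm)
    updates[param] = w * (desired / (norms + eps))
    return updates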

benanne commented Feb 8, 2015

There are many different ways to regularize things, hooking into various parts of the code. I think what's meant is that, in general, we want any tools we write to help with regularization to operate on Theano data types only, so that they are maximally reusable (and can be used without the rest of the library).

The current regularization code takes a layer instance and then calls get_all_params on it, so it is limited in its applicability and has a dependency on the Layer class. Such dependencies should be avoided if there is no significant benefit to them.

cancan101 commented

See also #86.

benanne added this to the First release milestone May 19, 2015
benanne commented Jun 2, 2015

Now that our test coverage and documentation are in a pretty good state, fixing up the regularization module is one of the last few things we need to do before we're ready for release. Some good ideas for a new API have been discussed in this thread, although the discussion predates the get_params() API change and the introduction of tags. The main takeaway is that functions in this module should operate on Theano expressions, not stacks of layers - possibly with one or more convenience functions that take a layer or list of layers as input, although we'll have to see whether that's actually useful (i.e. whether it makes things significantly shorter and/or more readable).
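
To make that concrete, a convenience wrapper built on the new get_params() tag API could look roughly like the following sketch (the function name and the use of a regularizable tag are assumptions, not a settled design); the penalty itself stays a plain function on Theano expressions:

def regularize_layer_params(layer, penalty):
    # collect only the parameters tagged as regularizable (typically weights, not biases)
    params = layer.get_params(regularizable=True)
    return sum(penalty(p) for p in params)

# e.g. cost = cost + alpha * regularize_layer_params(outlayer, lambda w: (w ** 2).sum())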

We just need to turn this idea into a PR now :) Any takers?

ebenolson commented

I made a start at it, see #285

f0k mentioned this issue Jun 3, 2015
benanne closed this as completed Jun 21, 2015