
Params for updates vs. params for regularization #110

Closed
craffel opened this issue Jan 23, 2015 · 2 comments

Comments

@craffel
Member

craffel commented Jan 23, 2015

I just realized that if someone were to use regularization on the recurrent nets, they might get an unintended side effect: get_params (optionally) returns the initial state vectors for the recurrent layer. This is because people sometimes want to optimize the initial state vectors, so we allow them to be returned so they can be passed as updates to the optimization function (SGD etc.). The current L2 regularizer optionally calls either get_all_params or get_all_non_bias_params, both of which could return the initial state param, but I don't think anyone would want to regularize it. The same may be true of other layers: get_params can return parameters that don't make sense to regularize.

I think that instead of having the regularizers operate on a layer and then call get_all_params or get_all_non_bias_params, we should require the user to supply the params they want to regularize. Each layer would then have separate methods for get_weight_params, get_bias_params, and, where appropriate, things like get_init_params (for recurrence). get_params would just combine the output of these individual methods. This keeps the convenience of getting all of the params for updates, while letting the user control exactly what they want to regularize. Any thoughts?
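A minimal sketch of what the proposed split might look like. The method names (get_weight_params, get_bias_params, get_init_params) follow the proposal above, but this mock is hypothetical: it uses plain Python floats in place of Theano shared variables so it stands alone, and the l2 helper is an illustration, not the library's regularizer.

```python
class RecurrentLayerMock:
    """Hypothetical layer exposing separate getters per parameter kind."""

    def __init__(self, W, b, h0):
        self.W, self.b, self.h0 = W, b, h0

    def get_weight_params(self):
        return [self.W]

    def get_bias_params(self):
        return [self.b]

    def get_init_params(self):
        # Initial state vectors: often optimized, rarely regularized.
        return [self.h0]

    def get_params(self):
        # Everything, for building updates (SGD etc.).
        return (self.get_weight_params()
                + self.get_bias_params()
                + self.get_init_params())


def l2(params):
    """Regularizer that takes an explicit param list instead of a layer."""
    return sum(p ** 2 for p in params)


layer = RecurrentLayerMock(W=3.0, b=1.0, h0=0.5)
params_for_updates = layer.get_params()      # W, b, and h0 all get updated
penalty = l2(layer.get_weight_params())      # only the weights are penalized
```

With this shape, the regularizer never has to guess which of a layer's parameters are sensible to penalize; the caller decides by picking the getter.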

@benanne
Member

benanne commented Jan 23, 2015

Agreed. This is also more in line with our goal of transparency: "Functions and methods should return Theano expressions and standard Python / numpy data types where possible." This will reduce cognitive overhead (what methods take Theano expressions? What methods take layers?) and increase interoperability with other libraries and custom Theano code.

As I've mentioned before in #86, what's currently there should probably be thrown away. This is already being discussed in #14 so maybe we should move there.

@craffel
Member Author

craffel commented Jan 24, 2015

> This is already being discussed in #14 so maybe we should move there.

OK, closing.
