I just realized that using regularization on the recurrent nets could have an unintended side effect: get_params (optionally) returns the initial state vectors for the recurrent layer. This is because people sometimes want to optimize the initial state vectors, so we allow them to be returned and passed as updates to the optimization function (SGD, etc.). The current L2 regularizer optionally calls either get_all_params or get_all_non_bias_params, both of which could return the initial state param, but I don't think anyone would want to regularize it. The same may be true of other layers: get_params can return parameters which don't make sense to regularize.
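To make the problem concrete, here is a minimal mock (the class and values are illustrative, not actual Lasagne code) showing how a regularizer that blindly sums over whatever get_params returns ends up penalizing the learnable initial state:

```python
# Hypothetical stand-in for a recurrent layer whose get_params()
# also returns the learnable initial state vector.
class MockRecurrentLayer:
    def __init__(self):
        self.W = [0.5, -0.5]        # recurrent weights
        self.b = [0.1]              # bias
        self.hid_init = [2.0, 2.0]  # learnable initial state

    def get_params(self):
        # Returned so hid_init can be optimized alongside W and b.
        return [self.W, self.b, self.hid_init]


def l2_penalty(params):
    # A regularizer that sums squares over everything it is handed.
    return sum(x * x for p in params for x in p)


layer = MockRecurrentLayer()
# hid_init contributes 8.0 of the 8.51 total, even though
# regularizing an initial state rarely makes sense.
print(l2_penalty(layer.get_params()))  # → 8.51
```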
I think that instead of making the regularizers operate on a layer and call get_all_params or get_all_non_bias_params, we should require the user to supply the params they want to regularize. Each layer would then have separate methods like get_weight_params, get_bias_params, and, where appropriate, get_init_params (for recurrence); get_params would just combine the output of all of these individual functions. This keeps the convenience of getting all of the params for updates, while letting the user control exactly what they want to regularize. Any thoughts?
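A quick sketch of what this split could look like (method names follow the suggestion above; the class itself is hypothetical, not existing Lasagne code):

```python
# Sketch of a layer exposing per-kind parameter getters, with
# get_params() as the convenience union for the update step.
class RecurrentLayer:
    def __init__(self, W, b, hid_init):
        self.W, self.b, self.hid_init = W, b, hid_init

    def get_weight_params(self):
        return [self.W]

    def get_bias_params(self):
        return [self.b]

    def get_init_params(self):
        # Initial-state vectors, for recurrent layers only.
        return [self.hid_init]

    def get_params(self):
        # Everything, e.g. for passing to SGD updates.
        return (self.get_weight_params()
                + self.get_bias_params()
                + self.get_init_params())


def l2(params):
    # The regularizer no longer inspects a layer; the caller
    # decides exactly which parameters to penalize.
    return sum(x * x for p in params for x in p)


layer = RecurrentLayer(W=[0.5, -0.5], b=[0.1], hid_init=[2.0, 2.0])
penalty = l2(layer.get_weight_params())  # weights only: 0.5
updates_on = layer.get_params()          # all three params, incl. hid_init
```

Passing get_weight_params() to the regularizer and get_params() to the optimizer cleanly separates the two concerns.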
Agreed. This is also more in line with our goal of transparency: "Functions and methods should return Theano expressions and standard Python / numpy data types where possible." This will reduce cognitive overhead (what methods take Theano expressions? What methods take layers?) and increase interoperability with other libraries and custom Theano code.
As I've mentioned before in #86, what's currently there should probably be thrown away. This is already being discussed in #14 so maybe we should move there.