What's the most effective and elegant way to set lr_mult and decay_mult for each trainable layer? #669
Comments
I noticed that there is a similar question on lasagne-users, and one answer suggests tagging the parameters. But I don't know how to set tags when building the model. Can someone give me an example? Thanks.
Which is not by coincidence, since support questions should be posted on the mailing list, not on the issue tracker. The issue tracker is to be reserved for bug reports and feature discussions (i.e., things that require a change in Lasagne's codebase).
```python
layer = InputLayer((None, 10))
layer = DenseLayer(layer, 100)
layer.W.tag.grad_scale = 10
```

Then the loop would become:

```python
params = lasagne.layers.get_all_params(l_out)
grads = theano.grad(loss, params)
for idx, param in enumerate(params):
    grad_scale = getattr(param.tag, 'grad_scale', 1)
    if grad_scale != 1:
        grads[idx] *= grad_scale
updates = lasagne.updates.nesterov_momentum(grads, params, ...)
```
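The tag-and-default pattern above can be exercised without Theano. A minimal plain-Python sketch (the `Tag`/`Param` classes are invented stand-ins for Theano shared variables, used only for illustration):

```python
# Mimic the per-parameter scaling loop with plain objects, to show how
# getattr's default argument handles parameters that carry no grad_scale tag.
class Tag:
    pass

class Param:
    def __init__(self, grad, scale=None):
        self.tag = Tag()
        if scale is not None:
            self.tag.grad_scale = scale  # only tagged params get the attribute
        self.grad = grad

params = [Param(grad=2.0, scale=10), Param(grad=3.0)]  # second param untagged
grads = [p.grad for p in params]
for idx, p in enumerate(params):
    scale = getattr(p.tag, 'grad_scale', 1)  # defaults to 1 when no tag is set
    if scale != 1:
        grads[idx] *= scale
print(grads)  # → [20.0, 3.0]
```

Only the tagged parameter's gradient is rescaled; every other parameter passes through unchanged, which is why the loop is safe to apply to the full parameter list.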
Thank you very much, f0k. You always help me in a timely manner.
In this way, we can only set the lr_mult. What about decay_mult?
For different L2 decay per layer, you can use regularize_layer_params_weighted. The first argument is a dictionary mapping layers to regularization strength (see the example at the top of the linked page). The result is a loss you would add to your loss function.
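The arithmetic behind per-layer weight decay is just a weighted sum of squared-norm penalties, which is what `regularize_layer_params_weighted` computes over layer parameters. A NumPy sketch with hypothetical weight shapes and multipliers (the names `W1`, `W2` and all values are made up for illustration):

```python
import numpy as np

# Two hypothetical weight matrices standing in for two layers' W parameters.
W1 = np.ones((3, 4))          # sum of squares = 12
W2 = np.full((4, 2), 2.0)     # sum of squares = 32

# Per-layer decay multipliers, analogous to Caffe's decay_mult,
# applied on top of a single base weight decay.
layer_weights = {'layer1': (W1, 10.0), 'layer2': (W2, 1.0)}
base_decay = 0.0005

penalty = sum(mult * base_decay * np.sum(W ** 2)
              for W, mult in layer_weights.values())
print(penalty)  # 10*0.0005*12 + 1*0.0005*32 = 0.06 + 0.016 = 0.076
```

Adding this scalar penalty to the training loss reproduces per-layer `decay_mult` behaviour: each layer's L2 term is scaled by its own multiplier before entering the objective.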
@f0k Oh, yes! That's it. Thank you very much! |
There are a lot of useful CNN models defined in Caffe's prototxt files. When one wants to define the same model using Lasagne, one must account for the `lr_mult` and `decay_mult` hyper-parameters of the Caffe model. For example, in Caffe a convolutional layer carries two `param` blocks. If the base learning rate is `0.1` and the weight decay is `0.0005`, the first `param` block applies to `W`: with `lr_mult: 10` the learning rate of `W` is `10 * 0.1 = 1`, and with `decay_mult: 1` the weight decay of `W` is `1 * 0.0005 = 0.0005`. The second `param` block does the same for the bias `b`. So, my question is: how to effectively define the model (maybe the loss) in Lasagne?
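The Caffe prototxt snippet referenced above did not survive extraction. A typical pair of `param` blocks looks roughly like this; the `W` multipliers (10 and 1) follow from the arithmetic in the text, while the bias values shown are hypothetical:

```
layer {
  name: "conv1"
  type: "Convolution"
  param { lr_mult: 10 decay_mult: 1 }  # multipliers for W (from the text)
  param { lr_mult: 20 decay_mult: 0 }  # multipliers for b (hypothetical)
  ...
}
```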