
What's the most effective and elegant way to set lr_mult and decay_mult for each trainable layer? #669

Closed
kli-casia opened this issue Apr 26, 2016 · 6 comments

Comments


kli-casia commented Apr 26, 2016

There are a lot of useful CNN models defined in Caffe's prototxt files. When one wants to define the same model using Lasagne, one must account for the lr_mult and decay_mult hyper-parameters of the Caffe model.

For example, in Caffe, a Convolutional layer is defined as follows.

layer {
  name: "conv1"
  type: "Convolution"
  bottom: "data"
  top: "conv1"
  param {
    lr_mult: 10
    decay_mult: 1
  }
  param {
    lr_mult: 20
    decay_mult: 0
  }
  convolution_param {
    num_output: 96
    kernel_size: 11
    stride: 4
    weight_filler {
      type: "gaussian"
      std: 0.01
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}

Suppose the base learning rate is 0.1 and the weight decay is 0.0005.

The first

param {
    lr_mult: 10
    decay_mult: 1
  }

is for W: the effective learning rate of W is 10 * 0.1 = 1, and the effective weight decay of W is 1 * 0.0005 = 0.0005.

The following

param {
    lr_mult: 20
    decay_mult: 0
  }

applies to the bias b in the same way: its effective learning rate is 20 * 0.1 = 2 and its effective weight decay is 0 * 0.0005 = 0.

So, my question is: what is the most effective way to define the same model (and perhaps the loss) in Lasagne?
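
(For reference, the layer itself maps to something like the following in Lasagne, assuming an input layer l_data defined elsewhere; it is the per-parameter lr_mult/decay_mult that has no direct counterpart in the layer definition.)

from lasagne.layers import Conv2DLayer
from lasagne.init import Normal, Constant

# rough Lasagne counterpart of the "conv1" layer above (the ReLU that
# usually follows in Caffe is the default nonlinearity here)
l_conv1 = Conv2DLayer(l_data, num_filters=96, filter_size=11, stride=4,
                      W=Normal(std=0.01), b=Constant(0.))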


kli-casia commented Apr 26, 2016

I noticed that there is a similar question on lasagne-users
https://groups.google.com/forum/#!msg/lasagne-users/2z-6RrgiHkE/lHghzLDgCgAJ

One answer is

params = lasagne.layers.get_all_params(l_out, trainable=True)
grads = theano.grad(loss, params)
for idx, param in enumerate(params):
    grad_scale = ... # obtain multiplier for that parameter in some way
    if grad_scale != 1:
        grads[idx] *= grad_scale
updates = lasagne.updates.nesterov_momentum(grads, params, ...)

You can use whichever way you like for the "obtain multiplier" step -- maintain a dictionary of param -> multiplier, or set param.tag.grad_scale to some value when you build the model (every Theano variable has a tag attribute that can be used freely, e.g., layer.W.tag.grad_scale = .5).

I don't know how to set tags when building the model. Can someone give me an example? Thanks.


f0k commented Apr 26, 2016

I noticed that there is a similar question on lasagne-users

Which is not by coincidence, since support questions should be posted on the mailing list, not on the issue tracker. The issue tracker is to be reserved for bug reports and feature discussions (i.e., things that require a change in Lasagne's codebase).

I don't know how to set tags when building the model. Can someone give me an example?

from lasagne.layers import InputLayer, DenseLayer

layer = InputLayer((None, 10))
layer = DenseLayer(layer, 100)
layer.W.tag.grad_scale = 10

Then the loop would become:

params = lasagne.layers.get_all_params(l_out, trainable=True)
grads = theano.grad(loss, params)
for idx, param in enumerate(params):
    grad_scale = getattr(param.tag, 'grad_scale', 1)
    if grad_scale != 1:
        grads[idx] *= grad_scale
updates = lasagne.updates.nesterov_momentum(grads, params, ...)
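
Putting it all together, a self-contained sketch of the whole flow might look like this (the small dense model, the squared-error loss and the multiplier values are just placeholders):

import theano
import theano.tensor as T
import lasagne

# build a small model and tag parameters with learning-rate multipliers,
# mirroring Caffe's lr_mult for the weights and the bias
l_in = lasagne.layers.InputLayer((None, 10))
l_hid = lasagne.layers.DenseLayer(l_in, 100)
l_hid.W.tag.grad_scale = 10
l_hid.b.tag.grad_scale = 20
l_out = lasagne.layers.DenseLayer(l_hid, 1, nonlinearity=None)

# placeholder squared-error training loss
target = T.matrix('target')
prediction = lasagne.layers.get_output(l_out)
loss = lasagne.objectives.squared_error(prediction, target).mean()

# scale each gradient by the multiplier stored on its parameter's tag
params = lasagne.layers.get_all_params(l_out, trainable=True)
grads = theano.grad(loss, params)
grads = [g * getattr(p.tag, 'grad_scale', 1) for g, p in zip(grads, params)]
updates = lasagne.updates.nesterov_momentum(grads, params, learning_rate=0.1)

train_fn = theano.function([l_in.input_var, target], loss, updates=updates)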

f0k closed this as completed Apr 26, 2016

kli-casia commented Apr 26, 2016

Thank you very much, f0k; you always help me in a timely manner.
I will post support questions on the mailing list from now on.

@jiqiujia

In this way, we can only set the lr_mult. How about decay_mult?

@f0k
Copy link
Member

f0k commented Feb 22, 2017

How about decay_mult?

For a different L2 decay per layer, you can use regularize_layer_params_weighted. Its first argument is a dictionary mapping layers to regularization strengths (see the example at the top of the linked documentation page). The result is a penalty term you add to your loss function.
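
A minimal sketch, assuming two layers l_hid and l_out built elsewhere (each coefficient plays the role of Caffe's decay_mult * weight_decay; by default only parameters tagged as regularizable are included, so biases are left out, matching decay_mult: 0 for b):

from lasagne.regularization import regularize_layer_params_weighted, l2

# per-layer L2 strengths, standing in for decay_mult * weight_decay
layer_decay = {l_hid: 0.0005, l_out: 0.001}
loss = loss + regularize_layer_params_weighted(layer_decay, l2)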

@jiqiujia

@f0k Oh, yes! That's it. Thank you very much!
