Fixing harsh upgrade_proto for `"BatchNorm"` layer #5184

Merged
merged 1 commit into from Jan 20, 2017

Conversation

4 participants
Contributor

shaibagon commented Jan 15, 2017

This PR attempts to fix issues #5171 and #5120, caused by PR #4704:
PR #4704 completely removes all param arguments of "BatchNorm" layers, resetting them to param {lr_mult: 0}. This "upgrade" is too harsh: it also discards a "name" argument that might have been set by the user.

This PR makes the "BatchNorm" upgrade in upgrade_proto.cpp more conservative: it leaves "name" in param and only sets lr_mult and decay_mult to zero.
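The intended behavior can be sketched as a small simulation (illustrative only, not the actual C++ code in upgrade_proto.cpp; param specs are modeled here as plain Python dicts):

```python
# Illustrative sketch of the conservative "BatchNorm" param upgrade:
# force lr_mult and decay_mult to 0 (so the solver never updates the
# statistics blobs), but preserve any user-set "name" so parameter
# sharing still works. Not the real protobuf-based implementation.

def upgrade_batchnorm_params(params):
    """Simulate the fixed upgrade for one BatchNorm layer's param specs.

    `params` is a list of dicts mimicking ParamSpec messages, e.g.
    [{"lr_mult": 1, "decay_mult": 1, "name": "bn_m"}, ...].
    """
    upgraded = []
    for p in params:
        new_p = {"lr_mult": 0, "decay_mult": 0}
        if "name" in p:  # keep sharing names explicitly set by the user
            new_p["name"] = p["name"]
        upgraded.append(new_p)
    return upgraded
```

The bn0 through bn4 layers in the example below exercise exactly these cases: params with or without wrong multipliers get their multipliers zeroed, names are preserved, and a layer with no params stays without params.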

Example of such an upgrade:
Input prototxt:

layer {
  type: "BatchNorm"
  name: "bn0"
  bottom: "data"
  top: "bn0"
  # old style params
  param: { lr_mult: 0 }
  param: { lr_mult: 0 }
  param: { lr_mult: 0 }
}
layer {
  type: "BatchNorm"
  name: "bn1"
  bottom: "bn0"
  top: "bn1"
  # wrong params
  param: { lr_mult: 1 decay_mult: 1}
  param: { lr_mult: 1 decay_mult: 0}
  param: { lr_mult: 1 decay_mult: 1}
}
layer {
  type: "BatchNorm"
  name: "bn2"
  bottom: "bn1"
  top: "bn2"
  # no params at all
}
layer {
  type: "BatchNorm"
  name: "bn3"
  bottom: "bn2"
  top: "bn3"
  # wrong with "name"
  param: { lr_mult: 1 decay_mult: 1 name: "bn_m"}
  param: { lr_mult: 1 decay_mult: 1 name: "bn_s"}
  param: { lr_mult: 1 decay_mult: 1 name: "bn_b"}
}
layer {
  type: "BatchNorm"
  name: "bn4"
  bottom: "bn3"
  top: "bn4"
  # only "name"
  param: { name: "bn_m"}
  param: { name: "bn_s"}
  param: { name: "bn_b"}
}

"Upgraded" prorotxt:

layer {
  name: "bn0"
  type: "BatchNorm"
  bottom: "data"
  top: "bn0"
  param {
    lr_mult: 0
    decay_mult: 0
  }
  param {
    lr_mult: 0
    decay_mult: 0
  }
  param {
    lr_mult: 0
    decay_mult: 0
  }
}
layer {
  name: "bn1"
  type: "BatchNorm"
  bottom: "bn0"
  top: "bn1"
  param {
    lr_mult: 0
    decay_mult: 0
  }
  param {
    lr_mult: 0
    decay_mult: 0
  }
  param {
    lr_mult: 0
    decay_mult: 0
  }
}
layer {
  name: "bn2"
  type: "BatchNorm"
  bottom: "bn1"
  top: "bn2"
}
layer {
  name: "bn3"
  type: "BatchNorm"
  bottom: "bn2"
  top: "bn3"
  param {
    name: "bn_m"
    lr_mult: 0
    decay_mult: 0
  }
  param {
    name: "bn_s"
    lr_mult: 0
    decay_mult: 0
  }
  param {
    name: "bn_b"
    lr_mult: 0
    decay_mult: 0
  }
}
layer {
  name: "bn4"
  type: "BatchNorm"
  bottom: "bn3"
  top: "bn4"
  param {
    name: "bn_m"
    lr_mult: 0
    decay_mult: 0
  }
  param {
    name: "bn_s"
    lr_mult: 0
    decay_mult: 0
  }
  param {
    name: "bn_b"
    lr_mult: 0
    decay_mult: 0
  }
}

As you can see, lr_mult and decay_mult are set to zero, leaving name intact where it was explicitly set by the user.

@shaibagon shaibagon fixing upgrade_proto for BatchNorm layer: be more conservative leave "name" in param, only set lr_mult and decay_mult to zero
a19357a
Contributor

shaibagon commented Jan 16, 2017

@shelhamer would you please have a look at this issue/proposed fix?

Thanks.

shelhamer self-assigned this Jan 17, 2017

Owner

shelhamer commented Jan 20, 2017 edited

Switching to zeroing the lr_mult and decay_mult like this is fine. I was so focused on avoiding incorrect statistics gradients that I made sharing impossible. Thanks for the fix!

@shelhamer shelhamer merged commit bc0d680 into BVLC:master Jan 20, 2017

1 check passed

continuous-integration/travis-ci/pr The Travis CI build passed
Contributor

shaibagon commented Jan 20, 2017

@shelhamer Thanks for merging this PR!

Contributor

antran89 commented Feb 2, 2017

@shaibagon Thanks Shai for the fix. I'm not sure about the internal structure, so just a quick question: does the upgraded proto of the BN layer have the same interface as before this upgrade?

Contributor

shaibagon commented Feb 2, 2017

@antran89 There is no interface change. The actions upgrade_proto takes when it encounters a "BatchNorm" layer's param are just more "gentle" now.

shaibagon deleted the shaibagon:fix_batch_norm_param_upgrade branch Apr 18, 2017

Jiangfeng-Xiong commented May 9, 2017 edited

@shaibagon @shelhamer What happens if we share parameters in a batch norm layer? Since mean and variance are computed from the input, during training a Siamese network sees two inputs, so there would be two means and two variances based on the different inputs. Which of them will be used as the batch norm parameters, or do we just average them?
Thanks

Contributor

shaibagon commented May 9, 2017

@Jiangfeng-Xiong You obviously cannot have two means and two variances in the same layer; it makes no sense.
The idea behind a Siamese network is that you actually train a single net, which is why you share the weights between the two copies. Thus, the batch norm parameters are averaged between the two copies, as are all the weights in the net.
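The point about sharing can be illustrated with a toy sketch (plain Python/NumPy, not Caffe code): a shared parameter blob is a single piece of storage referenced by both branches, so there is never a second, diverging copy, and the contributions from the two inputs are combined into one update.

```python
import numpy as np

# Toy illustration of weight sharing in a Siamese net: both "branches"
# hold references to the SAME array, so any update is seen by both.
shared_w = np.array([1.0, 2.0])
branch_a = shared_w  # same object, not a copy
branch_b = shared_w

# Gradients computed from the two different inputs are combined into a
# single update of the single shared blob (here: averaging them, with a
# made-up learning rate of 1.0 for simplicity).
grad_a = np.array([0.1, 0.1])
grad_b = np.array([0.3, 0.1])
shared_w -= 0.5 * (grad_a + grad_b)
```

After the update both branches still see identical parameters, because they are literally the same array.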
