Hello, thanks for sharing the code!
However, I have some questions about it. It seems that the noise is applied only to the BN layers rather than the conv layers in the code. According to the description in the paper, the perturbations to the weight and bias of a neuron may cancel each other out because of the BN layers. So if the network contains BN layers, the ANP algorithm only needs to perturb the neurons in the BN layers; otherwise, it perturbs the neurons in the conv layers. Is that right? Could you please supplement the experimental code for a network that does not contain BN layers?
The ANP algorithm only needs to perturb the neurons in the BN layers; otherwise, it will perturb the neurons in the Conv layers. Is that right?
You are correct. Let me explain in more detail:
During inference, a BatchNorm layer is a linear function, so it can be absorbed into the preceding Conv layer (see "How to absorb batch norm layer weights into Convolution layer weights?"). This means a complete layer actually consists of a Conv layer and a BatchNorm layer, and so do the neurons in that layer. Taking ResNet as an example, we can use the scaling factors in the BatchNorm layer to control the output of the neurons in that layer (similar methods can be found in [1]). As a result, perturbing BN layers is, in my opinion, more natural than perturbing Conv layers.
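To illustrate the folding argument above, here is a minimal NumPy sketch (not the repository's actual code; all names are hypothetical). At inference time, BN is a per-channel affine map, so it folds into the conv weights as W' = W * gamma / sqrt(var + eps), b' = beta + (b - mean) * gamma / sqrt(var + eps). Scaling gamma therefore rescales the entire folded filter of that neuron, which is why perturbing BN parameters effectively perturbs whole Conv-layer neurons:

```python
import numpy as np

rng = np.random.default_rng(0)
out_ch, in_ch, k = 4, 3, 3
W = rng.standard_normal((out_ch, in_ch, k, k))  # conv weights
b = rng.standard_normal(out_ch)                 # conv bias

# Frozen BN parameters and running statistics (one per output channel)
gamma = rng.standard_normal(out_ch)
beta = rng.standard_normal(out_ch)
mean = rng.standard_normal(out_ch)
var = rng.random(out_ch) + 0.1
eps = 1e-5

def conv_out(x, W, b):
    # "Convolution" evaluated on a single k x k patch: one value per channel.
    return np.einsum("oikl,ikl->o", W, x) + b

def fold(W, b, gamma, beta, mean, var):
    # Absorb the BN affine map into the conv weights and bias.
    scale = gamma / np.sqrt(var + eps)
    return W * scale[:, None, None, None], beta + (b - mean) * scale

x = rng.standard_normal((in_ch, k, k))

# Conv -> BN equals the folded conv on the same input.
W_folded, b_folded = fold(W, b, gamma, beta, mean, var)
y_conv_bn = gamma * (conv_out(x, W, b) - mean) / np.sqrt(var + eps) + beta
y_folded = conv_out(x, W_folded, b_folded)
assert np.allclose(y_conv_bn, y_folded)

# Perturbing the BN scale by (1 + delta) rescales the folded filter of that
# neuron, i.e. it perturbs the whole Conv neuron at once.
delta = 0.3
W_pert, b_pert = fold(W, b, (1 + delta) * gamma, beta, mean, var)
y_bn_pert = (1 + delta) * gamma * (conv_out(x, W, b) - mean) / np.sqrt(var + eps) + beta
y_fold_pert = conv_out(x, W_pert, b_pert)
assert np.allclose(y_bn_pert, y_fold_pert)
```

This is why, in networks with BN, it suffices to place the perturbation on the BN scale (and shift) rather than on the raw conv weights and biases, where opposing perturbations could cancel after normalization.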
Could you please supplement the experimental code for the network that does not contain BN layers?
Sorry, I can't. Modern DNNs rely heavily on normalization layers (BatchNorm, LayerNorm, and others), and they consistently fail to train well without them on commonly-used datasets such as CIFAR-10.