Hi @gaohuang and @ShichenLiu,

Thank you for the great work. I have the following questions from running your code and reading your paper:
Condensation criterion: In the paper, you use the L1 norm of the weights within a group to find the column indices of small-valued weights to prune. I noticed that these indices are also applied to the other groups in the code: `self.mask[i::self.groups, d, :, :].fill_(0)`.
Have you tested learned group convolutions with larger kernels (e.g. 3x3)? If so, how is the efficiency?
In the code, why do you shuffle the weights for the group-lasso loss?
Why do you drop 50% of the input channels with `CondensingLinear(child, 0.5)` when converting models?
Thanks,
Hai
Regarding the condensation criterion: `i::self.groups` means starting at index `i` and stepping by `self.groups`, not slicing from `i` to `self.groups`. Learned group convolution implicitly employs a channel shuffle layer (the permute layer in Fig. 1 of our paper), so the pruned indices along the first dimension are not contiguous. The weight shuffle in the group-lasso loss exists for the same reason.
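The strided-slice semantics above can be illustrated with a toy mask (NumPy stand-in for the PyTorch tensor; the shapes and values here are hypothetical, not taken from the repo):

```python
import numpy as np

# Toy weight mask: 8 output filters, 6 input channels (spatial dims omitted).
groups = 4                    # plays the role of self.groups
mask = np.ones((8, 6), dtype=int)

# Prune input channel d for group i. The slice i::groups selects
# rows i, i+groups, i+2*groups, ... -- NOT rows i through groups.
i, d = 1, 2
mask[i::groups, d] = 0        # zeroes rows 1 and 5, not rows 1..4

print(np.where(mask[:, d] == 0)[0])   # -> [1 5]
```

Because the channel shuffle permutes filters across groups, the rows belonging to one group are exactly this strided, non-contiguous set.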
Larger kernels are more complicated; for example, the bias might need to be considered as part of a filter's importance when pruning. As far as I can tell, naively pruning weights by their absolute sum may decrease efficiency.
If you examine Fig. 8 in our paper carefully, you will find that most of the weights in the classifier layer are extremely small. This is also the case on ImageNet, where there are far more classifier parameters (about 2M). By pruning them, we save a large number of parameters without any loss in accuracy.
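A minimal sketch of that idea: keep only the input channels of a linear layer whose column-wise L1 norm is largest. The function name `condense_linear` and the shapes below are illustrative, not the repo's actual `CondensingLinear` implementation:

```python
import numpy as np

def condense_linear(weight, keep_ratio=0.5):
    """Keep the keep_ratio fraction of input channels (columns) with the
    largest L1 norm; a simplified stand-in for CondensingLinear."""
    in_features = weight.shape[1]
    n_keep = int(in_features * keep_ratio)
    importance = np.abs(weight).sum(axis=0)          # L1 norm per input channel
    keep_idx = np.sort(np.argsort(importance)[-n_keep:])
    return weight[:, keep_idx], keep_idx

rng = np.random.default_rng(0)
W = rng.normal(size=(10, 8))
W[:, [1, 4]] *= 1e-3            # two near-zero input channels, as in Fig. 8
W_small, kept = condense_linear(W, 0.5)
print(W_small.shape)            # (10, 4); channels 1 and 4 are dropped
```

Since most classifier weights are nearly zero, dropping half the input channels this way barely perturbs the layer's output.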