Some concerns about the approach to pruning the network #6

Closed
haithanhp opened this issue Dec 11, 2017 · 1 comment

Comments

@haithanhp

Hi @gaohuang and @ShichenLiu,

Thank you for the great work. I have the following concerns after running your code and reading your paper:

  • Condensation criterion: In the paper, you use the L1-norm of the weights within the same group to find the column indices whose weights are small enough to prune. I saw that these indices were also applied to the other groups in the code: self.mask[i::self.groups, d, :, :].fill(0).
  • Have you tested learned group convolutions on larger kernel filters (3x3)? If so, how does it affect the efficiency?
  • In the code, why do you shuffle the weights for the group-lasso loss?
  • Why do you drop 50% of the input channels with CondensingLinear(child, 0.5) when converting the models?

Thanks,
Hai

@ShichenLiu
Owner

Hi @HaiPhan1991,

Regarding the condensation criterion: i::self.groups means starting from index i and stepping by self.groups, not slicing from i to self.groups. Learned group convolution implicitly employs a channel shuffle layer (the permute layer in Fig. 1 of our paper), so the first dimension to be pruned is not contiguous. Shuffling the weights in the group-lasso loss is done for the same reason.
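To make the slicing concrete, here is a minimal PyTorch sketch (my own illustration, not the repository code; the tensor sizes are arbitrary) showing that self.mask[i::self.groups, d, :, :] touches a strided, non-contiguous set of output filters:

    import torch

    out_channels, in_channels, groups = 12, 8, 4
    # Binary mask over a 1x1 conv weight of shape (out_channels, in_channels, 1, 1)
    mask = torch.ones(out_channels, in_channels, 1, 1)

    i, d = 1, 3  # drop input channel d for (shuffled) group i
    mask[i::groups, d, :, :].fill_(0)

    # The slice i::groups selects filters 1, 5, 9 -- strided, not contiguous
    print(torch.arange(out_channels)[i::groups])   # tensor([1, 5, 9])
    print(mask[:, d, 0, 0])                        # zeros only at those positions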

Larger kernels are more complicated; for example, the bias might need to be considered as part of a filter's importance when deciding what to prune. In my view, naively pruning weights by their absolute sum may lead to a decrease in efficiency.
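For reference, a naive absolute-sum criterion for 3x3 filters might look like the following (a hypothetical sketch of the kind of criterion discussed above, not code from this repository; the layer sizes are made up):

    import torch
    import torch.nn as nn

    conv = nn.Conv2d(in_channels=8, out_channels=12, kernel_size=3, bias=True)

    # L1 importance of each input channel, summed over output filters and the 3x3 extent
    channel_importance = conv.weight.detach().abs().sum(dim=(0, 2, 3))   # shape (8,)

    # A per-filter score that also folds in the bias magnitude
    filter_importance = (conv.weight.detach().abs().sum(dim=(1, 2, 3))
                         + conv.bias.detach().abs())                     # shape (12,)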

If you examine Fig. 8 in our paper carefully, you will find that most of the weights in the classifier layer are extremely small. This is also the case on ImageNet, where there are many more classifier parameters (about 2M). By pruning them, we can save a huge number of parameters without any loss in accuracy.
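As a rough sketch of what condensing the classifier could look like (my reading of the idea, not the repository's CondensingLinear; the layer sizes are illustrative), one can rank input features by the L1 norm of their weight columns and keep only the top half:

    import torch
    import torch.nn as nn

    def condense_linear(linear: nn.Linear, keep_ratio: float = 0.5):
        """Return a smaller Linear plus the indices of the kept input features."""
        importance = linear.weight.detach().abs().sum(dim=0)      # column-wise L1 norm
        num_keep = int(linear.in_features * keep_ratio)
        kept = importance.topk(num_keep).indices.sort().values

        small = nn.Linear(num_keep, linear.out_features, bias=linear.bias is not None)
        with torch.no_grad():
            small.weight.copy_(linear.weight[:, kept])
            if linear.bias is not None:
                small.bias.copy_(linear.bias)
        return small, kept

    fc = nn.Linear(in_features=2048, out_features=1000)
    small_fc, kept = condense_linear(fc, keep_ratio=0.5)
    x = torch.randn(2, 2048)
    y = small_fc(x[:, kept])    # select the kept features before applying the layer
    print(y.shape)              # torch.Size([2, 1000])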

Best,
Shichen

undol26 pushed a commit to undol26/CondenseNet that referenced this issue Oct 20, 2021
* [ShichenLiu#6] Add argument option
* [ShichenLiu#6] Delete LTDN word and change using args option
* [ShichenLiu#6] Save args settings to file