Question on dropping function #25

Closed
lizhenstat opened this issue Aug 20, 2019 · 3 comments

lizhenstat commented Aug 20, 2019

Hi, I have a question about the dropping function in layers.py.
I don't understand why the learned group convolution still needs the following shuffling operation:

        weight = weight.view(d_out, self.groups, self.in_channels)
        weight = weight.transpose(0, 1).contiguous()
        weight = weight.view(self.out_channels, self.in_channels)

https://github.com/ShichenLiu/CondenseNet/blob/master/layers.py#L78
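
For concreteness, the index mapping done by this view/transpose/view can be traced with a small sketch (made-up sizes; `d_out = out_channels // groups`, as the views require):

```python
import torch

# Made-up sizes, only to trace where each kernel row ends up.
out_channels, in_channels, groups = 12, 8, 4
d_out = out_channels // groups

idx = torch.arange(out_channels)                 # original kernel row indices
idx = idx.view(d_out, groups).t().contiguous().view(-1)
print(idx)
# tensor([ 0,  4,  8,  1,  5,  9,  2,  6, 10,  3,  7, 11])
# i.e. the kernel originally at row j*groups + g moves to row g*d_out + j,
# the same permutation the snippet above applies to the weight rows.
```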

I notice a shuffle operation is mentioned in the first paragraph of Section 4.1:
"we permute the output channels of the first 1x1_conv learned group convolution layer,
such that the features generated by each of its groups are evenly used by all the groups of
the subsequent 3x3 group convolutional layer"
However, that operation shuffles feature maps, not convolutional kernels.

Could you explain this a bit?
Thanks in advance.

ShichenLiu (Owner) commented:

Hi,

During testing, we explicitly shuffle the feature channels. During training, however, we implicitly shuffle the feature channels by choosing to drop certain kernels. The two are mathematically equivalent.
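
For example, a quick numerical check of that equivalence (illustrative shapes and permutation; this is a sketch, not the exact code in this repo):

```python
import torch
import torch.nn.functional as F

# Illustrative shapes for a 1x1 convolution.
batch, in_channels, out_channels, groups = 2, 8, 12, 4
d_out = out_channels // groups

x = torch.randn(batch, in_channels, 5, 5)
weight = torch.randn(out_channels, in_channels, 1, 1)

# (a) Explicit shuffle: convolve, then permute the output feature channels.
y = F.conv2d(x, weight)
perm = torch.arange(out_channels).view(d_out, groups).t().reshape(-1)
y_shuffled_features = y[:, perm]

# (b) Implicit shuffle: permute the kernel rows first (the same
# view/transpose/view pattern as in layers.py), then convolve.
w = weight.view(d_out, groups, in_channels, 1, 1)
w = w.transpose(0, 1).contiguous()
w = w.view(out_channels, in_channels, 1, 1)
y_shuffled_kernels = F.conv2d(x, w)

print(torch.allclose(y_shuffled_features, y_shuffled_kernels))  # True
```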


lizhenstat commented Aug 28, 2019

@ShichenLiu Thanks for your reply. I still have the following two questions:
(1)
I understand the equivalence between shuffling feature maps and shuffling kernels. However, I still don't understand why the shuffle operation is necessary here. (I understand why ShuffleNet shuffles the output feature maps, since it needs features coming from different groups to increase the variety of each group's inputs.) For the learned group convolution here, each kernel already learns which input feature maps are "important" to it during training, so why do we still need this operation?

(2)
Did you update the mask in the following way because the weight has been shuffled while the corresponding mask has not? I am not sure whether I understand it correctly (I checked the mask at different stages):

self._mask[i::self.groups, d, :, :].fill_(0)
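
To check my reading of that slice, here is a toy example (made-up sizes) showing which output-channel rows `[i::self.groups]` selects:

```python
import torch

# Made-up sizes, only to show which rows the strided slice touches.
out_channels, in_channels, groups = 12, 8, 4
mask = torch.ones(out_channels, in_channels, 1, 1)

i, d = 1, 3                        # example group index and input channel
mask[i::groups, d, :, :].fill_(0)  # zeros rows i, i+groups, i+2*groups, ...

print(mask[:, d, 0, 0])
# tensor([1., 0., 1., 1., 1., 0., 1., 1., 1., 0., 1., 1.])
```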

Thanks a lot

lizhenstat (Author) commented:

@ShichenLiu
