Thinning FC layers #73
Hi Vinu, I was thinking about how to answer you, so it took me some time to write back to you.
Thinning in PyTorch is not trivial mainly because the structure (layers, connectivity and layer sizes) are all described in code (compare this to Caffe's protobuf format which is straight-forward to manipulate with code). So in the example above, when an object of type This was the second part of "thinning" - the part that "executes" thinning instructions. The first part is more challenging, and it is creating the thinning recipe. The challenge here is to understand the dependencies between the layers. For example, imagine a Batch Normalization layer that follows a Conv2d layer. If we reduce the number of output channels (and remove weight filters), we also need to understand that there's a BN layer following and that it also requires changes in its configuration (number of inputs) and in its parameters (mu and sigma tensors). And these dependencies can get quite complex (we also need to transform tensors that the Optimizer may have; and the gradients - see [3] below). Phew! Let's catch our breath ;-). That's a lot of stuff going on and this was, believe it or not, a short and perhaps unsatisfying explanation of everything involved in "thinning". I've been planning on properly documenting this because the code is hard to follow. Now to your questions ;-): This was long but short. |
Thank you, Neta, for your above description and answers. I am getting used to your style of designing data structures to solve model compression problems. Thanks to your in-code documentation and the description above, I understand your approach to thinning.

I have a few follow-up questions on certain low-level aspects of your design. I roughly see why you might have made these decisions, but clarification from you will, I hope, benefit our community.

[1] Necessity of sub-recipes.
Here are my understanding/notes on your choice of "modules" and on why you might have considered having two sub-recipes in a ThinningRecipe.
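For reference, my mental model of the recipe with its two sub-recipes is roughly the following. Treat the contents as an assumption on my part; the layer names and the exact directive format here are made up for illustration:

```python
from collections import namedtuple

# One sub-recipe for module (configuration) changes, one for parameter-tensor changes.
ThinningRecipe = namedtuple('ThinningRecipe', ['modules', 'parameters'])

keep_idx = list(range(34))  # placeholder: indices of the channels we keep

recipe = ThinningRecipe(
    modules={
        # module directives: attributes of the layer objects to reconfigure
        'features.conv1': {'out_channels': 34},
        'features.bn1':   {'num_features': 34},
        'features.conv2': {'in_channels': 34},
    },
    parameters={
        # parameter directives: (dimension to thin, indices to keep)
        'features.conv1.weight': [(0, keep_idx)],
        'features.bn1.weight':   [(0, keep_idx)],
        'features.bn1.bias':     [(0, keep_idx)],
        'features.conv2.weight': [(1, keep_idx)],
    },
)
```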
[2] The case where a Linear follows a Conv2d.
This is a slightly simplified view of what you do when executing param_directives for the case of a Linear following a Conv2d. Could you please explain the logic that follows:
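For context, this is the simplified view I have in mind (my own sketch, not a quote from the code; `thin_linear_after_conv`, `fm_height` and `fm_width` are names I invented). When the preceding conv loses output channels, each removed channel takes a whole block of H*W columns of the Linear's weight with it:

```python
import torch
import torch.nn as nn

def thin_linear_after_conv(fc: nn.Linear, keep_channels, fm_height, fm_width):
    """Drop the Linear's input columns that belonged to removed conv channels.
    Assumes a plain flatten, so each kept channel keeps a contiguous block of
    fm_height * fm_width columns."""
    block = fm_height * fm_width
    cols = torch.cat([torch.arange(c * block, (c + 1) * block) for c in keep_channels])
    fc.in_features = len(cols)
    fc.weight = nn.Parameter(fc.weight.data[:, cols].clone())
    return fc
```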
[3] Handling of BN layer separately.
If the Conv2d had 64 output channels and 64 became 34 due to pruning, we need to adjust the configuration (number of inputs) from 64 to 34 in the following BN. My understanding is that this is the only change required, and that its parameters (the mu and sigma tensors) will remain the same, since the values removed are zeros, so the mean and the variance are unchanged.

[4] Handling gradients.
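On [4], my own (hedged) picture of what "handling gradients" would entail: the same index selection applied to a parameter also has to be applied to its .grad and to any per-parameter optimizer state of the same shape (e.g. SGD momentum buffers), otherwise their shapes stop matching. Something along these lines (`thin_param_and_grad` is a hypothetical helper):

```python
import torch

def thin_param_and_grad(param, optimizer, dim, keep_idx):
    """Shrink a parameter, its gradient, and same-shaped optimizer state along dim."""
    keep_idx = torch.as_tensor(keep_idx, dtype=torch.long)
    old_shape = param.data.shape
    param.data = param.data.index_select(dim, keep_idx)
    if param.grad is not None:
        param.grad = param.grad.index_select(dim, keep_idx)
    # e.g. SGD momentum buffers mirror the parameter's (old) shape
    for key, t in optimizer.state.get(param, {}).items():
        if torch.is_tensor(t) and t.shape == old_shape:
            optimizer.state[param][key] = t.index_select(dim, keep_idx)
```

Is this roughly the idea?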
Hi Vinu,
Sure Neta, no problem.
[1] I was able to easily add support for thinning FC layers.
[2] My changes seem to work fine for small networks; I tested with SummaryGraph and a forward pass.
[3] I notice that the accuracy remains the same too; I still have not seen a misclassification.
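In case it's useful, the forward-pass check amounts to something like the following (the helper names here are just placeholders, not the exact code): compare logits before and after thinning on a random input.

```python
import copy
import torch

def check_thinning_equivalence(model, thin_fn, input_shape=(1, 3, 224, 224), atol=1e-5):
    """Return True if the thinned model's outputs match the original's."""
    model.eval()
    original = copy.deepcopy(model)
    thin_fn(model)  # whatever performs the thinning, applied in place
    x = torch.randn(input_shape)
    with torch.no_grad():
        return torch.allclose(original(x), model(x), atol=atol)
```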
Hi Vinu, Sorry for taking such a long time - I was away at a conference and on vacation. Do you want to share your new code for thinning FC layers? Cheers,
Sure Neta, I will do it this week.
Hi, Closing this due to staleness.
The thinning methods support only removing channels or filters of a CONV layer
[1]
How about thinning FC layers? Even if you are not going to support it, can you describe what one should take care of when implementing, say, remove_rows() or remove_columns() corresponding to neuron pruning?
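For concreteness, here is one possible shape such helpers could take for an nn.Linear (the function names come from the question above, not from the library; the cross-layer bookkeeping is the part the recipe machinery would still need to handle):

```python
import torch
import torch.nn as nn

def remove_rows(fc: nn.Linear, keep_rows):
    """Drop output neurons: rows of the weight matrix and the matching bias entries."""
    keep_rows = torch.as_tensor(keep_rows, dtype=torch.long)
    fc.out_features = len(keep_rows)
    fc.weight = nn.Parameter(fc.weight.data[keep_rows].clone())
    if fc.bias is not None:
        fc.bias = nn.Parameter(fc.bias.data[keep_rows].clone())
    # Whatever consumes this layer's output must drop the same columns/channels.

def remove_columns(fc: nn.Linear, keep_cols):
    """Drop inputs: columns of the weight matrix (the bias is untouched)."""
    keep_cols = torch.as_tensor(keep_cols, dtype=torch.long)
    fc.in_features = len(keep_cols)
    fc.weight = nn.Parameter(fc.weight.data[:, keep_cols].clone())
```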
[2]
It seems hard to simply extend the thinning_recipe approach, as it seems too tied to removing CONV structures. Any suggestions?
[3]
Also, if we are thinning pruned PyTorch models, what could be the reason for an accuracy drop?
Because we are strictly removing only zero structures, the math should be about the same and produce the same classifications?
You seem to be taking into consideration a possible performance drop by preparing to thin even the gradient tensors.