The usage of gumbel softmax in DS-Net #10

LinyeLi60 · 2021-11-17T16:10:33Z

Thank you for your very nice work,I want to know that the effect of gumble softmax，because I think the network can be trained without gumble softmax.
Is the gumbel softmax just aimed to increase the randomness of channel choice?

LinyeLi60 · 2021-11-18T01:23:04Z

To optimize the nondifferentiable slimming head of dynamic gate

I don't understand why slimming head is nondifferentiable, because I think the output of slimming head is not in the computation graph of latter network layers.

changlin31 · 2021-11-19T05:13:39Z

Hi @sixzerotech

The output of slimming head is used as a sub-network routing signal for subsequent layers.

First, let we assume that the SGS training loss is not introduced. To optimize the gate by AutoGrad end-to-end, we need to include its output in the computation graph. This is achieved by masking the output of subsequent layers by the output of the gate. However, this hard (0 or 1) mask is not differentiable. We follow previous works that use tricks such as semihash, gumbel-softmax to tuckle this.

Second, as we already introduced SGS loss, the end-to-end target loss with gumbel-softmax is not a necessity. However, as SGS loss will only encourage the network to choose the first or last gate (gate target is [1, 0, 0, 0] or [0, 0, 0, 1]), it is better to combine it with end-to-end target loss with gumbel-softmax.

LinyeLi60 · 2021-11-22T04:53:11Z

Thank you very much for your reply!

LinyeLi60 · 2022-02-22T11:10:00Z

hi, changlin, I have another question about the num_choice. If I set the num_choice to 14 and train the gate, the gate tends to choose the smallest sub-network even with gumbel softmax.
Below is my log
`02/20 10:45:57 AM Distributing BatchNorm running means and vars
02/20 10:46:49 AM blocks.3.first_block.gate: tensor([271., 11., 24., 8., 13., 12., 5., 9., 8., 15., 11., 9.,
12., 16.], device='cuda:0')
02/20 10:46:49 AM Test: [ 48/48 (100%)] Loss: 0.7052 (1.2775) Acc@1: 83.254715 (70.4500) Acc@5: 94.221695 (89.2820) GateAcc: 53.7736(53.3780) Flops: 201547424 (185565811) Time: 0.918s, 923.72/s (1.067s, 794.82/s) DataTime: 0.113 (0.126)

02/20 10:47:40 AM blocks.3.first_block.gate: tensor([257., 12., 12., 7., 12., 10., 9., 18., 11., 13., 9., 18.,
21., 15.], device='cuda:0')
02/20 10:47:40 AM Test(EMA): [ 48/48 (100%)] Loss: 0.7114 (1.2773) Acc@1: 83.254715 (70.4200) Acc@5: 94.575470 (89.2200) GateAcc: 51.2972(53.1000) Flops: 220756448 (185856099) Time: 0.904s, 938.03/s (1.041s, 814.28/s) DataTime: 0.117 (0.121)

02/20 10:47:40 AM Current checkpoints:
('./output/train-dynamic-slimmable-slimmable_mbnet_v1_bn_uniform/20220217-223124-slimmable_mbnet_v1_bn_uniform/checkpoint-9.pth.tar', 70.53800006835938)
('./output/train-dynamic-slimmable-slimmable_mbnet_v1_bn_uniform/20220217-223124-slimmable_mbnet_v1_bn_uniform/checkpoint-4.pth.tar', 70.49399994140624)
('./output/train-dynamic-slimmable-slimmable_mbnet_v1_bn_uniform/20220217-223124-slimmable_mbnet_v1_bn_uniform/checkpoint-10.pth.tar', 70.4880000439453)
('./output/train-dynamic-slimmable-slimmable_mbnet_v1_bn_uniform/20220217-223124-slimmable_mbnet_v1_bn_uniform/checkpoint-18.pth.tar', 70.4880000439453)
('./output/train-dynamic-slimmable-slimmable_mbnet_v1_bn_uniform/20220217-223124-slimmable_mbnet_v1_bn_uniform/checkpoint-32.pth.tar', 70.48800001953126)
('./output/train-dynamic-slimmable-slimmable_mbnet_v1_bn_uniform/20220217-223124-slimmable_mbnet_v1_bn_uniform/checkpoint-13.pth.tar', 70.48600001953125)
('./output/train-dynamic-slimmable-slimmable_mbnet_v1_bn_uniform/20220217-223124-slimmable_mbnet_v1_bn_uniform/checkpoint-25.pth.tar', 70.46999999267578)
('./output/train-dynamic-slimmable-slimmable_mbnet_v1_bn_uniform/20220217-223124-slimmable_mbnet_v1_bn_uniform/checkpoint-8.pth.tar', 70.46800004638672)
('./output/train-dynamic-slimmable-slimmable_mbnet_v1_bn_uniform/20220217-223124-slimmable_mbnet_v1_bn_uniform/checkpoint-22.pth.tar', 70.46600004394531)
('./output/train-dynamic-slimmable-slimmable_mbnet_v1_bn_uniform/20220217-223124-slimmable_mbnet_v1_bn_uniform/checkpoint-39.pth.tar', 70.41999996582031)`

changlin31 · 2022-02-22T11:20:01Z

Hi, @sixzerotech

This is an expected behavior as the gate is very difficult to tune. I suggest you limit the routing space to larger sub-networks (e.g., choice 4-8) if you want to select larger ones. Or, you could try disabling the complexity loss and lower the weight of SGS loss.

LinyeLi60 · 2022-02-22T11:25:34Z

Hi, @sixzerotech

This is an expected behavior as the gate is very difficult to tune. I suggest you limit the routing space to larger sub-networks (e.g., choice 4-8) if you want to select larger ones. Or, you could try disabling the complexity loss and lower the weight of SGS loss.

Yes, I agree with you that the gate is really difficult to tune after my countless experiments. Thank you for your quick reply, looking forward to your future work to solve this thorny problem.

changlin31 · 2022-02-22T11:30:42Z

Thank you for your understanding. I'm closing this for now.

changlin31 added the discussion Not a issue with the code label Nov 19, 2021

LinyeLi60 closed this as completed Nov 22, 2021

LinyeLi60 reopened this Feb 22, 2022

changlin31 closed this as completed Feb 22, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

The usage of gumbel softmax in DS-Net #10

The usage of gumbel softmax in DS-Net #10

LinyeLi60 commented Nov 17, 2021

LinyeLi60 commented Nov 18, 2021 •

edited

changlin31 commented Nov 19, 2021

LinyeLi60 commented Nov 22, 2021

LinyeLi60 commented Feb 22, 2022

changlin31 commented Feb 22, 2022

LinyeLi60 commented Feb 22, 2022

changlin31 commented Feb 22, 2022

The usage of gumbel softmax in DS-Net #10

The usage of gumbel softmax in DS-Net #10

Comments

LinyeLi60 commented Nov 17, 2021

LinyeLi60 commented Nov 18, 2021 • edited

changlin31 commented Nov 19, 2021

LinyeLi60 commented Nov 22, 2021

LinyeLi60 commented Feb 22, 2022

changlin31 commented Feb 22, 2022

LinyeLi60 commented Feb 22, 2022

changlin31 commented Feb 22, 2022

LinyeLi60 commented Nov 18, 2021 •

edited