
Forward and backward for FBNet #13

Open
GuntherZhong opened this issue Mar 18, 2019 · 5 comments

@GuntherZhong

Hi, JunrQ:

Thanks for your work, it is really helpful~
I have a question: I found that in your FBNet source code, you generate batch_size models for the batch_size samples in each batch, yet the total loss is summed and loss.backward() is called once. So how is this backward() applied, to a single model or to the batch_size models? Besides, I wonder why you use this method for FBNet, while in the SNAS code a single model is generated, loss.backward() is called, and then two .step() functions are applied.

@GuntherZhong changed the title from "Forward and backword for FBNet" to "Forward and backward for FBNet" on Mar 19, 2019

JunrQ commented Mar 19, 2019

Hi @GuntherZhong, I don't really understand what you mean by

"generate batch_size models for batch_size samples per batch"

And if I understand you right, the "two .step()" refers to:

NAS/snas/snas/snas.py, lines 318 to 319 (commit f5b0f25):

self.w_opt.step()
self.t_opt.step()

The reason is that the model parameters and the architecture parameters are trained together in SNAS. Actually, I'm not sure whether this search procedure is right.

I ran experiments on CIFAR-10, training them either jointly or alternately, and both converge well. But CIFAR-10 may be too easy.
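
For reference, a minimal sketch of the two training schemes (joint vs. alternating updates of the model weights and the architecture parameters), assuming PyTorch; the w_opt / t_opt split mirrors the snippet above, while ToyCell, alpha, and everything else here is made up for the example and is not the repository code.

import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy cell with ordinary weights plus architecture parameters `alpha` (assumed names).
class ToyCell(nn.Module):
    def __init__(self, dim=16, num_ops=2):
        super().__init__()
        self.ops = nn.ModuleList(nn.Linear(dim, dim) for _ in range(num_ops))
        self.alpha = nn.Parameter(torch.zeros(num_ops))  # architecture parameters

    def forward(self, x):
        w = torch.softmax(self.alpha, dim=-1)
        return sum(w[i] * op(x) for i, op in enumerate(self.ops))

model = ToyCell()
w_opt = torch.optim.SGD([p for n, p in model.named_parameters() if n != "alpha"], lr=0.025)
t_opt = torch.optim.Adam([model.alpha], lr=3e-4)

for step in range(4):
    x, y = torch.randn(8, 16), torch.randn(8, 16)
    loss = F.mse_loss(model(x), y)
    w_opt.zero_grad(); t_opt.zero_grad()
    loss.backward()
    w_opt.step()  # update the model weights
    t_opt.step()  # update the architecture parameters in the same step (joint / single-level)
    # Alternating training would instead call only one of the two .step() calls per batch,
    # e.g. step the weights on even batches and the architecture parameters on odd ones.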

@GuntherZhong

@JunrQ Thanks for your reply.

Sorry, I may not have described it clearly. In your source code, in the FBNet.forward() function, theta.repeat(batch_size, 1) is called and then weight = nn.functional.gumbel_softmax(t, temperature) is applied. Since the Gumbel-Softmax weights are different for each sample in the batch, in my opinion this can be seen as generating batch_size models from the theta parameters, one per sample. This implementation is quite different from the one in the SNAS code.
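
For concreteness, a minimal sketch of that per-sample weighting, assuming PyTorch; only theta.repeat(batch_size, 1) and gumbel_softmax come from the description above, the other names and shapes are made up for the example.

import torch
import torch.nn.functional as F

batch_size, num_ops, temperature = 4, 3, 5.0
theta = torch.zeros(1, num_ops, requires_grad=True)            # architecture logits for one layer

t = theta.repeat(batch_size, 1)                                 # [batch_size, num_ops]
weight = F.gumbel_softmax(t, tau=temperature)                   # a different sample per row

# Each row of `weight` mixes the candidate blocks for one input sample,
# so the batch is effectively processed by batch_size sampled models.
block_outputs = torch.randn(num_ops, batch_size, 8)             # stand-in candidate block outputs
mixed = (weight.t().unsqueeze(-1) * block_outputs).sum(dim=0)   # [batch_size, 8]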

And yes, those two .step() calls are exactly what I meant :-). I also found that the single-level optimization in the original SNAS paper is not described very clearly. So it is quite a pity that, although both the SNAS paper and the FBNet paper use Gumbel-Softmax, we don't know how to implement it correctly :-(


JunrQ commented Mar 19, 2019

@GuntherZhong, thank you for your good question.

According to the backpropagation algorithm, when loss.backward() is called, every weight generated by gumbel_softmax gets a gradient of shape [batch, ...], which means every repeated copy of theta has a gradient of the same shape. In the backward pass of repeat, these gradients are summed along the batch_size axis.

Since the loss is a mean over the batch, I think the gradient of theta is the mean over the batch_size generated models.

I don't know which is better: using the mean over different generated models, or using the same model for all samples (like DARTS).
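
A small sketch that checks this gradient flow, assuming PyTorch; the numbers and loss are made up for the example and this is not the FBNet code itself.

import torch

batch_size, num_ops = 4, 3
theta = torch.randn(1, num_ops, requires_grad=True)

t = theta.repeat(batch_size, 1)                  # one copy of theta per sample
weight = torch.softmax(t, dim=-1)                # gumbel_softmax behaves the same w.r.t. repeat
per_sample_loss = (weight * torch.randn(batch_size, num_ops)).sum(dim=-1)
loss = per_sample_loss.mean()                    # loss is a mean over the batch
loss.backward()

# The per-copy gradients (shape [batch_size, num_ops]) are summed by repeat's backward,
# and the 1/batch_size factor from .mean() turns that sum into an average over the
# batch_size sampled weights.
print(theta.grad.shape)                          # torch.Size([1, 3])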

@GuntherZhong

Thanks for your answer, and sorry for replying so late. It is quite an efficient way to solve the expectation minimization problem. However, I wonder where I can find a reference or paper that explains the algorithm you use :-) Thank you ~

@Jihao-Li

@GuntherZhong I think the Gumbel-Softmax coefficients should be multiplied on the block (one coefficient per candidate block), not on each feature map. The DARTS code is implemented the way I describe. However, I haven't run the experiment yet. If you get results, we can discuss it in detail.
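
If I read this correctly, a minimal sketch of that block-level weighting (one Gumbel-Softmax sample shared by the whole batch, multiplied on each candidate block's output), assuming PyTorch; all names and shapes here are made up for the example.

import torch
import torch.nn.functional as F

batch_size, num_ops, feat = 4, 3, 8
theta = torch.zeros(num_ops, requires_grad=True)
block_outputs = torch.randn(num_ops, batch_size, feat)   # stand-in candidate block outputs

w = F.gumbel_softmax(theta, tau=5.0)                     # [num_ops], one sample for the whole batch
mixed = (w.view(-1, 1, 1) * block_outputs).sum(dim=0)    # [batch_size, feat]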
