Forward and backward for FBNet #13
Hi @GuntherZhong, I don't really understand the meaning of
And if I understand you right, the
The reason is that the model parameters and architecture parameters are trained together in SNAS. Actually, I'm not sure whether this search procedure is right. I ran experiments on CIFAR-10, training them together and alternately, and both converge well. But CIFAR-10 may be too easy.
@JunrQ Thanks for your reply. Sorry that I may not have described it clearly. I found that in your source code, in the FBNet.forward() function, theta.repeat(batch_size, 1) is called, and after that weight = nn.functional.gumbel_softmax(t, temperature) is called. Since the weights for the samples in this batch are different, in my opinion it can be seen as batch_size models being generated from the theta parameters for batch_size samples. This implementation is quite different from the one in the SNAS code. And you got exactly the point I meant about the two .step() calls :-). I also found that the single-level optimization in the original SNAS paper is not described very clearly. So it is quite a pity that although both the SNAS paper and the FBNet paper use Gumbel softmax, we don't know how to implement it correctly :-(
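For readers following along, here is a minimal NumPy sketch of the two sampling schemes being contrasted. This is not the repo's actual code: the theta values, shapes, and helper names are made up for illustration. It shows that repeating theta and then applying Gumbel softmax gives each sample in the batch its own soft architecture weights, whereas an SNAS-style scheme draws one sample and shares it across the batch.

```python
import numpy as np

rng = np.random.default_rng(0)

def gumbel_softmax(logits, temperature=1.0):
    """Relaxed one-hot sample: softmax((logits + Gumbel noise) / temperature).

    Each row gets independent Gumbel noise, so identical rows of logits
    still yield different soft samples.
    """
    g = -np.log(-np.log(rng.uniform(size=logits.shape)))  # Gumbel(0, 1) noise
    z = (logits + g) / temperature
    z = z - z.max(axis=-1, keepdims=True)                 # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

batch_size, num_ops = 4, 3
theta = np.array([0.5, 0.0, -0.5])                        # hypothetical arch logits

# Per-sample scheme (as described above for the FBNet repo): repeat theta,
# one independent sample per row, i.e. effectively batch_size sub-models
# in a single forward pass.
per_sample = gumbel_softmax(np.tile(theta, (batch_size, 1)))

# SNAS-style scheme: draw once, then share the same weights across the batch.
shared = np.tile(gumbel_softmax(theta[None, :]), (batch_size, 1))
```

With independent noise per row, `per_sample` rows almost surely differ even though the logits are identical, which is exactly the "batch_size models" effect discussed above.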
@GuntherZhong, thank you for your good question. According to the BP algorithm, when we do
For the fact that the loss is a mean, I think the grad of
I don't know which is better: using the mean over different generated models, or the same model for the whole batch (like DARTS).
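The averaging argument can be checked on a toy problem. This is only an illustrative sketch, not the repo's code: the loss, shapes, and values are made up. If the batch loss is the mean of per-sample losses, each using its own Gumbel-softmax weights, then the gradient w.r.t. theta produced by a single backward pass is exactly the mean of the per-sample gradients, which we verify here against a finite-difference estimate.

```python
import numpy as np

rng = np.random.default_rng(1)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def per_sample_grad(theta, g, x):
    # Analytic gradient of L = softmax(theta + g) . x w.r.t. theta,
    # via the standard softmax Jacobian: dL/dtheta_k = w_k * (x_k - w.x).
    w = softmax(theta + g)
    return w * (x - w @ x)

theta = np.array([0.3, -0.1, 0.2])                    # hypothetical arch logits
gumbels = -np.log(-np.log(rng.uniform(size=(4, 3))))  # one noise draw per sample
xs = rng.normal(size=(4, 3))                          # toy per-sample inputs

# Gradient of the mean loss equals the mean of the per-sample gradients.
mean_grad = np.mean(
    [per_sample_grad(theta, g, x) for g, x in zip(gumbels, xs)], axis=0)

def mean_loss(th):
    return np.mean([softmax(th + g) @ x for g, x in zip(gumbels, xs)])

# Central finite-difference check of the same gradient.
eps = 1e-6
fd = np.array([(mean_loss(theta + eps * e) - mean_loss(theta - eps * e)) / (2 * eps)
               for e in np.eye(3)])
```

The two gradient estimates agree, which is the sense in which one `backward()` call on a mean loss already averages over the batch of sampled sub-models.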
Thanks for your answer, and sorry to reply so late. It is quite an efficient way to solve the expectation-minimization problem. However, I wonder where I can find some references or papers that explain the algorithm you use :-) Thank you ~
@GuntherZhong I think the Gumbel softmax coefficients should be multiplied on the
Hi, JunrQ:
Thanks for your work, it is really quite helpful~
I have a question: I found that in your FBNet source code you generate batch_size models for batch_size samples per batch; however, the total loss is summed and loss.backward() is called. So how is this backward() applied: to a single model or to batch_size models? Besides, I wonder why you use this method for FBNet, while in the SNAS code a single model is generated, loss.backward() is called, and then two .step() functions are applied.