I think it would be better to train the network weights and the architecture weights separately: to be exact, freeze the gradients of α and β when updating w, and freeze the gradient of w when updating α and β.
By the definition of:
![微信图片_20190719180918](https://user-images.githubusercontent.com/29011527/61528260-e2c24980-aa50-11e9-935d-18c6d570e98d.png)
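For illustration, here is a minimal sketch of the alternating scheme suggested above (not the repo's actual code). It assumes hypothetical `weight_parameters()` and `arch_parameters()` helpers that return only w and only α, β respectively:

```python
def train_step(model, w_optimizer, arch_optimizer, train_batch, val_batch, criterion):
    # Phase 1: update network weights w, with alpha/beta frozen so they receive no grad.
    for p in model.arch_parameters():      # hypothetical helper: returns alpha, beta
        p.requires_grad_(False)
    for p in model.weight_parameters():    # hypothetical helper: returns w
        p.requires_grad_(True)
    x, y = train_batch
    w_optimizer.zero_grad()
    criterion(model(x), y).backward()
    w_optimizer.step()

    # Phase 2: update architecture parameters alpha/beta, with w frozen.
    for p in model.weight_parameters():
        p.requires_grad_(False)
    for p in model.arch_parameters():
        p.requires_grad_(True)
    x, y = val_batch
    arch_optimizer.zero_grad()
    criterion(model(x), y).backward()
    arch_optimizer.step()

    # Re-enable grads on w so the next iteration starts from a clean state.
    for p in model.weight_parameters():
        p.requires_grad_(True)
```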
I believe the code already works this way. The model's optimizer only contains the weight parameters, and the architecture optimizer contains only alpha and beta. Please correct me if that isn't right.
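As a rough sketch of that setup (again using hypothetical `weight_parameters()` / `arch_parameters()` helpers, not the repo's actual code), the two optimizers are built over disjoint parameter groups, so each `step()` only moves its own parameters:

```python
import torch

def build_optimizers(model):
    # w_optimizer only holds the network weights w.
    w_optimizer = torch.optim.SGD(model.weight_parameters(),
                                  lr=0.025, momentum=0.9, weight_decay=3e-4)
    # arch_optimizer only holds the architecture parameters alpha and beta.
    arch_optimizer = torch.optim.Adam(model.arch_parameters(),
                                      lr=3e-4, betas=(0.5, 0.999), weight_decay=1e-3)
    # w_optimizer.step() never touches alpha/beta, and arch_optimizer.step() never touches w.
    return w_optimizer, arch_optimizer
```

One caveat: even with disjoint optimizers, a shared `backward()` call still accumulates gradients into the other group's `.grad` buffers, so each phase should either freeze the other group (as sketched above) or call that group's `zero_grad()` before its next `step()`, otherwise stale gradients from the wrong phase leak into the update.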