I think it would be better to train the network weights and the architecture weights separately: to be exact, freeze the gradients of α and β when updating w, and freeze the gradient of w when updating α and β.
By the definition of:
![微信图片_20190719180918](https://user-images.githubusercontent.com/29011527/61528260-e2c24980-aa50-11e9-935d-18c6d570e98d.png)
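For illustration, here is a minimal sketch of the alternating scheme suggested above (not the repo's actual code). It assumes hypothetical `weight_parameters()` and `arch_parameters()` helpers that return only w and only α, β respectively:

```python
def train_step(model, w_optimizer, arch_optimizer, train_batch, val_batch, criterion):
    # Phase 1: update network weights w, with alpha/beta frozen so they receive no grad.
    for p in model.arch_parameters():      # hypothetical helper: returns alpha, beta
        p.requires_grad_(False)
    for p in model.weight_parameters():    # hypothetical helper: returns w
        p.requires_grad_(True)
    x, y = train_batch
    w_optimizer.zero_grad()
    criterion(model(x), y).backward()
    w_optimizer.step()

    # Phase 2: update architecture parameters alpha/beta, with w frozen.
    for p in model.weight_parameters():
        p.requires_grad_(False)
    for p in model.arch_parameters():
        p.requires_grad_(True)
    x, y = val_batch
    arch_optimizer.zero_grad()
    criterion(model(x), y).backward()
    arch_optimizer.step()

    # Re-enable grads on w so the next iteration starts from a clean state.
    for p in model.weight_parameters():
        p.requires_grad_(True)
```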
I believe the code already works this way. The model's optimizer only contains the weight parameters, and the architecture optimizer contains only alpha and beta. Please correct me if that isn't right.
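As a rough sketch of that setup (again using hypothetical `weight_parameters()` / `arch_parameters()` helpers, not the repo's actual code), the two optimizers are built over disjoint parameter groups, so each `step()` only moves its own parameters:

```python
import torch

def build_optimizers(model):
    # w_optimizer only holds the network weights w.
    w_optimizer = torch.optim.SGD(model.weight_parameters(),
                                  lr=0.025, momentum=0.9, weight_decay=3e-4)
    # arch_optimizer only holds the architecture parameters alpha and beta.
    arch_optimizer = torch.optim.Adam(model.arch_parameters(),
                                      lr=3e-4, betas=(0.5, 0.999), weight_decay=1e-3)
    # w_optimizer.step() never touches alpha/beta, and arch_optimizer.step() never touches w.
    return w_optimizer, arch_optimizer
```

One caveat: even with disjoint optimizers, a shared `backward()` call still accumulates gradients into the other group's `.grad` buffers, so each phase should either freeze the other group (as sketched above) or call that group's `zero_grad()` before its next `step()`, otherwise stale gradients from the wrong phase leak into the update.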