None gradients for 'y' layers #27
Comments
@haoming-codes Yes, and this means that the network layers using y are not updated during training.
Hello, for the regressor model in the conditional generation experiments, on the contrary, the output dimensions of […]
Clement
The part of the network transforming y in the last transformer layer (y_y, e_y, x_y) is also not training. But I see what you mean about 'y' still being useful, since it at least incorporates the time information into the other variables in the network. Thanks for clarifying! Best,
It looks like the gradients of y_mlp_out and of all components involving y in the last transformer layer are None, so this part of the model is not training. The components for the other inputs (X and E) seem to be working normally.
To reproduce the behavior, replace the 'trainer' line with this code:
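The exact snippet is not shown; the following is a minimal sketch in the same spirit, assuming a standard PyTorch Lightning setup. `model` and `datamodule` are placeholders rather than the issue's actual objects, and calling `training_step` by hand is only for illustration:

```python
# Instead of trainer.fit(model, datamodule), run a single manual
# training step so the gradients can be inspected afterwards.
model.train()
batch = next(iter(datamodule.train_dataloader()))
loss = model.training_step(batch, 0)
loss.backward()

# Parameters whose .grad is still None were never reached by autograd,
# i.e. nothing they compute contributes to the loss.
for name, param in model.named_parameters():
    if param.grad is None:
        print("None gradient:", name)
```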
The problem appears to be that the 'y' output is not used when computing the loss. I am not sure how to apply a cross-entropy loss to X and E alone while still back-propagating through the layers that produce 'y'.
It's also not clear why the None gradients only appear in the last layers.
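A toy illustration of both points, using hypothetical layer names rather than the repo's actual modules: a head whose output never reaches the loss ends up with a None gradient, while the same kind of y transformation in an earlier layer still receives gradients, because its output feeds the later computation of X.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Earlier layer: its y branch feeds the next layer, so it lies on a path
# to the loss even though the loss itself never looks at y.
y_branch = nn.Linear(8, 8)
# Final layer: the X head is consumed by the loss; the y head is a dead end.
x_head = nn.Linear(8, 4)
y_head = nn.Linear(8, 2)

y = torch.randn(3, 8)
h = y_branch(y)        # y information mixed into the hidden state
x_out = x_head(h)      # used by the loss
y_out = y_head(h)      # computed but never consumed

loss = nn.functional.cross_entropy(x_out, torch.tensor([0, 1, 2]))
loss.backward()

print(y_branch.weight.grad is None)  # False: reaches the loss via x_out
print(x_head.weight.grad is None)    # False
print(y_head.weight.grad is None)    # True: no path from y_out to the loss
```

If this reading is right, only the final layer's y outputs are dead ends; earlier y components still train through the X and E paths, which matches the comments above. Making the final y weights receive gradients would require y_out to enter the loss through some auxiliary term, which is a modeling choice rather than a one-line fix.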