After adding self-implemented Layer-Normalization, the backward time of gradient_penalty became large #10
Comments
Can you confirm that the value is wrong? Maybe you can refer to the …
@caogang The result seems reasonable, but the time seems a bit too long. I'll test using …
Oh, your problem is the larger time cost after plugging in your Layer_Norm module? My intuition is that it may be because of the …
@caogang what is the other way to achieve similar behavior here as …
Yeah, maybe using expand is the best method in the current branch. Just wait for the broadcasting feature to be merged, or you can contribute to the above PR. :)
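For context: before broadcasting was merged into PyTorch, per-sample statistics had to be expanded back to the input's shape explicitly, which is presumably the expand usage referred to above. A minimal sketch of the two forms, assuming a 2-D (batch, features) input x (names and values here are illustrative, not from the repository):

```python
import torch

x = torch.randn(64, 128)          # illustrative (batch, features) input
mean = x.mean(1, keepdim=True)    # per-sample mean, shape (64, 1)
std = x.std(1, keepdim=True)      # per-sample std, shape (64, 1)

# Expand-based form (required before broadcasting was available):
y_expand = (x - mean.expand_as(x)) / (std.expand_as(x) + 1e-5)

# Broadcasting form (possible once the broadcasting feature is merged):
y_broadcast = (x - mean) / (std + 1e-5)
```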
My implementation of layer-normalization is:
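(The snippet itself is not shown above. As a rough illustration only, not the author's actual code, a self-implemented layer-normalization module for a 2-D (batch, features) input typically looks like the sketch below; the class name, eps value, and learnable gamma/beta are assumptions.)

```python
import torch
import torch.nn as nn

class LayerNorm(nn.Module):
    """Normalize each sample over its feature dimension, with learnable scale and shift."""
    def __init__(self, num_features, eps=1e-5):
        super().__init__()
        self.eps = eps
        self.gamma = nn.Parameter(torch.ones(num_features))
        self.beta = nn.Parameter(torch.zeros(num_features))

    def forward(self, x):
        # x: (batch, num_features)
        mean = x.mean(1, keepdim=True)
        std = x.std(1, keepdim=True)
        return self.gamma * (x - mean) / (std + self.eps) + self.beta
```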
After plugging this in before the ReLU, the backward pass of gradient_penalty became much slower: 0.1149s compared to the former 0.0075s. I compiled the source code from the master branch (commit deb0aef30cdaa78f9840bfa4a919ad206e8e73a7) and also modified the ReLU source code before compiling, according to your instruction.
I am wondering whether this is because my implementation of layer-normalization contains something not suitable for double backward?
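For reference, an illustrative sketch (not the repository's exact code): the WGAN-GP gradient penalty differentiates through first-order gradients with create_graph=True, so every operation in the critic, including a custom layer-normalization, is replayed in a second (double) backward. Function names and the penalty weight below are assumptions.

```python
import torch
from torch import autograd

def gradient_penalty(critic, real, fake, lambda_gp=10.0):
    # Random interpolation points between real and fake samples.
    alpha = torch.rand(real.size(0), *([1] * (real.dim() - 1)), device=real.device)
    interpolates = (alpha * real + (1 - alpha) * fake).requires_grad_(True)

    critic_out = critic(interpolates)

    # First backward: d(critic output) / d(interpolates), with create_graph=True
    # so that this gradient can itself be backpropagated through (double backward).
    grads = autograd.grad(
        outputs=critic_out,
        inputs=interpolates,
        grad_outputs=torch.ones_like(critic_out),
        create_graph=True,
        retain_graph=True,
    )[0]

    grads = grads.view(grads.size(0), -1)
    # Penalize deviation of the per-sample gradient norm from 1.
    return lambda_gp * ((grads.norm(2, dim=1) - 1) ** 2).mean()
```

Because the second backward runs through every op inside the critic, an op that is expensive or not well supported for double backward, as the question above suspects for the custom layer-normalization, shows up directly in the gradient_penalty backward time.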