Fused LSTM grad-grad #3256
Conversation
a6bd4fd to fee7d2d
jenkins, test this please.
```python
    return cuda.fusion.tanh(x * half) * half + half
```

```python
@cuda.fuse()
```
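The quoted line computes the logistic sigmoid through tanh, using the identity σ(x) = tanh(x/2)/2 + 1/2, so the fused kernel body only needs a tanh primitive. A quick NumPy check of the identity (NumPy stands in here so the sketch runs without a GPU):

```python
import numpy as np

def sigmoid_via_tanh(x, half=0.5):
    # Mirrors the fused kernel body: tanh(x * half) * half + half
    return np.tanh(x * half) * half + half

x = np.linspace(-5.0, 5.0, 11)
reference = 1.0 / (1.0 + np.exp(-x))  # standard logistic sigmoid
assert np.allclose(sigmoid_via_tanh(x), reference)
```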
How about using `input_num`? `lstm_grad_grad` will accept writable arguments.
```python
@cuda.fuse(input_num=13)
def lstm_grad_grad(
        c_prev, a, i, f, o, c, gc, gh, ggc_prev, gga, ggi, ggf, ggo,
        gc_prev, ga, gi, gf, go, gc_next, ggc, ggh):
```
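The idea behind `input_num=13` is that the first 13 parameters are read-only inputs and the trailing ones are preallocated output arrays the kernel writes in place. A minimal NumPy sketch of that calling convention (hypothetical `axpy_kernel`, with NumPy standing in for the fused CuPy kernel):

```python
import numpy as np

# Stand-in for a fused kernel: the first three arguments are read-only
# inputs, and `out` is a preallocated array written in place -- the same
# split that input_num=3 would declare for cupy.fuse.
def axpy_kernel(a, x, y, out):
    out[...] = a * x + y

a = np.float32(2.0)
x = np.arange(4, dtype=np.float32)
y = np.ones(4, dtype=np.float32)
out = np.empty(4, dtype=np.float32)
axpy_kernel(a, x, y, out)   # writes into `out`, no new allocation
assert np.allclose(out, 2 * x + y)
```

Reusing caller-provided output buffers this way is what avoids the extra array copies mentioned below.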
How should I fix it?
I wrote a sample. That kernel will reduce array copies.
Sorry, this is my mistake.
LGTM!
LSTM grad-grad calls too many kernels. I used `cupy.fuse` to combine them. Merge #3206 first.
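To illustrate what the fusion buys: each standalone elementwise operation on GPU arrays launches its own kernel and materializes a temporary, while `cupy.fuse` compiles a whole Python function into a single elementwise kernel. A hedged sketch with a made-up gradient-style expression, using NumPy so it runs without a GPU:

```python
import numpy as np

def unfused(f, gc, c_prev):
    t = f * gc         # on GPU: one kernel launch + a temporary array
    return t * c_prev  # on GPU: a second kernel launch

def fused(f, gc, c_prev):
    # Under @cupy.fuse() this whole body would compile into one kernel,
    # skipping the intermediate array entirely.
    return f * gc * c_prev

f = np.full(4, 0.5, dtype=np.float32)
gc = np.arange(4, dtype=np.float32)
c_prev = np.ones(4, dtype=np.float32)
assert np.allclose(unfused(f, gc, c_prev), fused(f, gc, c_prev))
```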