Matmul code and speed improvements #1483

Merged · 3 commits merged into master on Oct 25, 2018

Conversation

@pmichel31415 (Collaborator) commented Oct 24, 2018

This builds on #1355 and rewrites the matmul and affine transform nodes to reduce code duplication (all of the matmul/matmul+transpose logic now lives in matrix-multiply.h only).

This should also enable cuBLAS batched matmul in the backward pass, which will hopefully make things like dot-product attention faster.
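
For readers unfamiliar with the term, here is a minimal sketch of what "batched matmul" means at the cuBLAS level: many independent products dispatched in one call instead of a loop of per-matrix GEMMs. This is illustrative only, not the PR's actual code; the function name, memory layout, and stride assumptions are mine.

```cpp
#include <cublas_v2.h>

// Sketch: C[i] = A[i] * B[i] for i = 0..batch-1, with all matrices stored
// contiguously in column-major order (each A[i] is m x k, each B[i] is k x n).
// One strided-batched GEMM call replaces a loop of cublasSgemm calls.
void batched_matmul(cublasHandle_t handle,
                    const float* A, const float* B, float* C,
                    int m, int n, int k, int batch) {
  const float alpha = 1.f, beta = 0.f;
  cublasSgemmStridedBatched(
      handle, CUBLAS_OP_N, CUBLAS_OP_N,
      m, n, k,
      &alpha,
      A, /*lda=*/m, /*strideA=*/(long long)m * k,
      B, /*ldb=*/k, /*strideB=*/(long long)k * n,
      &beta,
      C, /*ldc=*/m, /*strideC=*/(long long)m * n,
      batch);
}
```

Launching one kernel for the whole batch rather than one per example is the main reason attention-style workloads benefit; the speedups reported below are consistent with removing a per-example GEMM loop in the backward pass.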

TODO:

  • Confirm the speed gain
  • Figure out why there is an "assert false" in matrix-multiply.h (cf. "Added batched matrix multiplies on CUDA" #1355)
  • Add matmul + transpose as an operation (add the option to transpose in the matmul node) <- maybe for a future PR, but this should help further with attention (both memory- and speed-wise); see the sketch after this list
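
To make the last TODO item concrete, here is a hedged sketch of why a fused matmul + transpose helps: BLAS can apply the transpose inside GEMM via an op flag, so the transposed operand never has to be materialized. The wrapper below is illustrative only, not the proposed DyNet API.

```cpp
#include <cublas_v2.h>

// Sketch: C = A^T * B without ever materializing A^T. A is stored as a
// k x m column-major matrix; CUBLAS_OP_T tells GEMM to read it transposed,
// so no extra buffer or copy kernel is needed for the transpose.
void matmul_transpose_lhs(cublasHandle_t handle,
                          const float* A, const float* B, float* C,
                          int m, int n, int k) {
  const float alpha = 1.f, beta = 0.f;
  cublasSgemm(handle, CUBLAS_OP_T, CUBLAS_OP_N,
              m, n, k,
              &alpha,
              A, /*lda=*/k,  // leading dimension of A as stored (k x m)
              B, /*ldb=*/k,  // B is k x n
              &beta,
              C, /*ldc=*/m); // C is m x n
}
```

In dot-product attention the score matrix involves a product with one transposed operand, so exposing a transpose flag on the matmul node would save both the intermediate transpose buffer and the extra kernel launch.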

@FilippoC (Contributor) commented Oct 24, 2018

FYI, I just tried this with my implementation of "A Decomposable Attention Model for Natural Language Inference" (I removed the line `DYNET_ARG_CHECK(false, "MatrixMultiplyTranspAcc");`). My implementation uses a static graph and batches of size 64.

  • Before: ~188 seconds per epoch
  • Now: ~105 seconds per epoch

This is really nice, because a colleague's PyTorch implementation of the exact same network takes ~108 seconds per epoch.

Having a matmul + transpose operation would be awesome!

@pmichel31415 (Collaborator, Author)

Good to hear! I've also seen a speedup in my implementation of the Transformer (not as big as yours, but it still went from 23 to 17 minutes per epoch).

I think this one should be merged now, and matmul+transpose can be done in another PR.

@pmichel31415 pmichel31415 changed the title [WIP] Matmul code (and speed?) improvements Matmul code and speed improvements Oct 24, 2018
@neubig neubig merged commit 946200c into master Oct 25, 2018