
Tensorcore fullyconnected support2 #7447

Closed

Conversation

DickJC123
Contributor

Consider this an alternative approach to getting TensorCore working with FullyConnected. It is far simpler than my first PR for this functionality. If anything, it is my proof that one can invoke TensorCore algorithms by manipulating the cuBLAS handle alongside the existing dot function's use of Hgemm and SgemmEx. This PR also shows the kind of per-instance handle manipulation that is necessary, since blindly enabling TensorCore on the handle globally has the unfortunate side effect of introducing fp16 casts on the inputs of fp32-I/O gemms. Bottom line: I wouldn't expect you to accept this PR without a discussion.
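
For context, here is a minimal sketch of the kind of per-instance handle manipulation described above, assuming the CUDA 9 cuBLAS math-mode API (`cublasSetMathMode` / `cublasGetMathMode`). The wrapper name and exact call pattern are illustrative only, not the code in this PR:

```cpp
// Sketch: temporarily enable TensorCore math on a cuBLAS handle around a
// single fp16 gemm, then restore the previous mode so later fp32-I/O gemms
// on the same handle are not silently down-converted. Requires CUDA 9+.
#include <cublas_v2.h>
#include <cuda_fp16.h>

cublasStatus_t HgemmWithTensorCores(cublasHandle_t handle,
                                    cublasOperation_t transa,
                                    cublasOperation_t transb,
                                    int m, int n, int k,
                                    const __half* alpha,
                                    const __half* A, int lda,
                                    const __half* B, int ldb,
                                    const __half* beta,
                                    __half* C, int ldc) {
  // Save the handle's current math mode so the change stays local to this call.
  cublasMath_t saved_mode = CUBLAS_DEFAULT_MATH;
  cublasGetMathMode(handle, &saved_mode);
  cublasSetMathMode(handle, CUBLAS_TENSOR_OP_MATH);

  cublasStatus_t status = cublasHgemm(handle, transa, transb, m, n, k,
                                      alpha, A, lda, B, ldb, beta, C, ldc);

  // Restore the previous mode; leaving CUBLAS_TENSOR_OP_MATH set globally
  // would permit fp16 down-conversion inside fp32-I/O gemms issued later.
  cublasSetMathMode(handle, saved_mode);
  return status;
}
```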

I have begun studying the new linear algebra code with the idea of producing an enable-TensorCore PR for that approach. I notice the new LA code doesn't support fp16-I/O gemms yet, and the solution there will not fit the mold of the existing function templates. Also, what is the plan for switching MXNet's use of dot() over to the new functions?

@piiswrong
Contributor

We are going to gradually switch from dot to linalg_gemm.
Why wouldn't it work for fp16? I think you can just specialize the template for fp16, right?
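
As an aside, a minimal sketch of what "specialize the template for fp16" could look like for a per-dtype gemm wrapper. The `GemmImpl` helper and its signature are hypothetical, not MXNet's actual linalg_gemm interface; the sketch also illustrates the earlier point that the fp16-I/O path doesn't quite fit the existing mold, since it typically goes through `cublasGemmEx` (fp16 storage, fp32 accumulation) rather than a direct `cublasSgemm`-shaped call:

```cpp
// Hypothetical per-dtype gemm wrapper (column-major, no transpose), shown only
// to illustrate template specialization for fp16; not MXNet's linalg_gemm.
#include <cublas_v2.h>
#include <cuda_fp16.h>

template <typename DType>
cublasStatus_t GemmImpl(cublasHandle_t h, int m, int n, int k,
                        const DType* A, const DType* B, DType* C);

// fp32 specialization: a plain cublasSgemm call with float alpha/beta.
template <>
cublasStatus_t GemmImpl<float>(cublasHandle_t h, int m, int n, int k,
                               const float* A, const float* B, float* C) {
  const float alpha = 1.0f, beta = 0.0f;
  return cublasSgemm(h, CUBLAS_OP_N, CUBLAS_OP_N, m, n, k,
                     &alpha, A, m, B, k, &beta, C, m);
}

// fp16-I/O specialization: storage is fp16 but accumulation stays fp32, so the
// call goes through cublasGemmEx with explicit data and compute types, a
// different call shape than the fp32 specialization above. Whether TensorCore
// kernels are actually used is still governed by the handle's math mode.
template <>
cublasStatus_t GemmImpl<__half>(cublasHandle_t h, int m, int n, int k,
                                const __half* A, const __half* B, __half* C) {
  const float alpha = 1.0f, beta = 0.0f;  // float because computeType is fp32
  return cublasGemmEx(h, CUBLAS_OP_N, CUBLAS_OP_N, m, n, k,
                      &alpha, A, CUDA_R_16F, m, B, CUDA_R_16F, k,
                      &beta, C, CUDA_R_16F, m,
                      CUDA_R_32F, CUBLAS_GEMM_DEFAULT);
}
```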

@DickJC123
Contributor Author

DickJC123 commented Aug 15, 2017 via email

@piiswrong
Contributor

should this be closed?

@DickJC123
Contributor Author

This work was superseded by a later PR.

@DickJC123 DickJC123 closed this Aug 18, 2017