Consider this an alternative approach to getting TensorCore working with FullyConnected. It is far simpler than my first PR for this functionality. If anything, this is my proof that one can invoke TensorCore algos by manipulating the cuBLAS handle, in combination with the existing dot function's use of Hgemm and SgemmEx. This PR also shows the kind of per-instance handle manipulation that is necessary: blindly setting the handle's math mode globally to enable TensorCore has the unfortunate side effect of introducing fp16 casts on the inputs of fp32-I/O gemms. Bottom line, I wouldn't expect you to accept this PR without a discussion.
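For concreteness, the per-call pattern I mean is roughly the following (a minimal sketch assuming CUDA 9+ cuBLAS; the helper name is illustrative and error checking is omitted):

```cpp
#include <cublas_v2.h>
#include <cuda_fp16.h>

// Sketch of the per-instance handle manipulation described above: enable
// TensorCore math only for the duration of one fp16 gemm, then restore the
// handle's previous mode so other users of the handle (e.g. fp32-I/O gemms)
// are unaffected.
void HgemmWithTensorCore(cublasHandle_t handle,
                         cublasOperation_t transa, cublasOperation_t transb,
                         int m, int n, int k,
                         const __half* alpha,
                         const __half* A, int lda,
                         const __half* B, int ldb,
                         const __half* beta,
                         __half* C, int ldc) {
  cublasMath_t saved_mode;
  cublasGetMathMode(handle, &saved_mode);            // remember current mode
  cublasSetMathMode(handle, CUBLAS_TENSOR_OP_MATH);  // opt in to TensorCore algos
  cublasHgemm(handle, transa, transb, m, n, k,
              alpha, A, lda, B, ldb, beta, C, ldc);
  cublasSetMathMode(handle, saved_mode);             // restore for other callers
}
```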
I have begun studying the new linear algebra code with the idea of producing an enable-TensorCore PR for that approach. I notice the new LA code doesn't support fp16-I/O gemms yet, and the solution there will not fit the mold of the existing function templates (see the sketch below). Also, what is the plan for switching MXNet's use of dot() over to the new functions?
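To illustrate what I mean by "not fitting the mold" (the `linalg_gemm` name below is just a stand-in, not the actual new LA signature): an fp16-I/O gemm via SgemmEx accumulates in fp32 and takes float alpha/beta, so a wrapper templated on a single I/O type can't pass its scalars straight through:

```cpp
#include <cublas_v2.h>
#include <cuda_fp16.h>

// Hypothetical single-DType wrapper in the style of the existing templates.
template <typename DType>
void linalg_gemm(cublasHandle_t handle, int m, int n, int k,
                 DType alpha, const DType* A, int lda,
                 const DType* B, int ldb,
                 DType beta, DType* C, int ldc);

// The fp32 specialization maps directly onto cublasSgemm.
template <>
void linalg_gemm<float>(cublasHandle_t handle, int m, int n, int k,
                        float alpha, const float* A, int lda,
                        const float* B, int ldb,
                        float beta, float* C, int ldc) {
  cublasSgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N, m, n, k,
              &alpha, A, lda, B, ldb, &beta, C, ldc);
}

// An fp16 specialization via SgemmEx can't keep __half scalars: cuBLAS
// requires fp32 alpha/beta even though A/B/C are fp16, so the scalars must
// be widened (and the mixed types no longer match a single DType).
template <>
void linalg_gemm<__half>(cublasHandle_t handle, int m, int n, int k,
                         __half alpha, const __half* A, int lda,
                         const __half* B, int ldb,
                         __half beta, __half* C, int ldc) {
  float alpha_f = __half2float(alpha);  // forced widening of the scalars
  float beta_f  = __half2float(beta);
  cublasSgemmEx(handle, CUBLAS_OP_N, CUBLAS_OP_N, m, n, k,
                &alpha_f, A, CUDA_R_16F, lda, B, CUDA_R_16F, ldb,
                &beta_f, C, CUDA_R_16F, ldc);
}
```

So the fp16 path will need either widened scalar parameters or a separate mixed-precision entry point, rather than another instantiation of the same template.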