
[Bug][Feature] Added more missing FP16 specializations #4140

Merged: 8 commits into dmlc:master on Jun 27, 2022

Conversation

@ndickson-nvidia (Contributor) commented Jun 17, 2022

Description

  • Continuation of adding more missing FP16 specializations, following PR [Bug][Feature] Added cublasGemm<__half> specialization (#3988) #4029, which addressed the specific case from issue Add FP16 support for GatherMM kernel #3988
  • Added missing `__half` specializations of `DLDataTypeTraits`, `IndexSelect`, `Full`, `Scatter_`, `CSRGetData`, `CSRMM`, and `CSRSum` (a minimal sketch of this kind of specialization follows this list)
  • Fixed a casting issue in `_LinearSearchKernel` that was preventing it from supporting `__half`
  • Added more specific error messages for the unimplemented FP16 specializations of `Xgeam`, `CSRGEMM`, and `CSRGEAM`, which would require functions that cuBLAS does not currently provide
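
For context, a minimal sketch of the kind of `__half` specialization described in the second bullet, assuming a trait template shaped like DLPack's `DLDataType` (the template and field names here are illustrative, not DGL's exact definitions):

```cpp
#include <cstdint>

#include <cuda_fp16.h>  // defines __half

// Hypothetical trait template mirroring the DLDataTypeTraits pattern;
// fields follow DLPack's DLDataType convention.
template <typename T>
struct DLDataTypeTraits;

// The kind of specialization this PR adds: map __half to the DLPack
// 16-bit float dtype.
template <>
struct DLDataTypeTraits<__half> {
  static constexpr uint8_t code = 2U;    // kDLFloat in DLPack's type enum
  static constexpr uint8_t bits = 16U;   // binary16 width
  static constexpr uint16_t lanes = 1U;  // scalar, not a vector type
};
```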

Checklist

  • The PR title starts with [$CATEGORY] (such as [NN], [Model], [Doc], [Feature])
  • Changes are complete (i.e. I finished coding on this PR)
  • All changes have test coverage
  • Code is well-documented
  • To the best of my knowledge, examples are either not affected by this change,
    or have been fixed to be compatible with this change
  • Related issue is referred to in this PR

@dgl-bot (Collaborator) commented Jun 17, 2022

To trigger regression tests:

  • @dgl-bot run [instance-type] [which tests] [compare-with-branch];
    For example: @dgl-bot run g4dn.4xlarge all dmlc/master or @dgl-bot run c5.9xlarge kernel,api dmlc/master

@dgl-bot (Collaborator) commented Jun 17, 2022

Commit ID: 1751ca3

Build ID: 1

Status: ❌ CI test failed in Stage [Lint Check].

Report path: link

Full logs path: link

@yaox12 yaox12 requested a review from nv-dlasalle June 20, 2022 06:25
@mufeili mufeili requested a review from isratnisa June 20, 2022 06:28
@dgl-bot (Collaborator) commented Jun 20, 2022

Commit ID: 9f63277538d3f54bc2f79004984c07efd7dafac1

Build ID: 2

Status: ❌ CI test failed in Stage [GPU Build].

Report path: link

Full logs path: link

@dgl-bot (Collaborator) commented Jun 20, 2022

Commit ID: d0bc60a19bae0d11db5214fe3527abcf1e9094b3

Build ID: 3

Status: ❌ CI test failed in Stage [Lint Check].

Report path: link

Full logs path: link

@dgl-bot (Collaborator) commented Jun 20, 2022

Commit ID: ad8972c

Build ID: 4

Status: ❌ CI test failed in Stage [Lint Check].

Report path: link

Full logs path: link

@dgl-bot (Collaborator) commented Jun 20, 2022

Commit ID: 29f464e

Build ID: 5

Status: ✅ CI test succeeded

Report path: link

Full logs path: link

Resolved review threads (outdated):
  • src/array/cuda/spmm.cuh
  • src/array/cuda/cusparse_dispatcher.cuh (two threads)

The next thread concerns this code comment:
#ifdef USE_FP16
// The initialization constructor for __half is apparently a device-
// only function in some setups, but the current function isn't run
// on the device.
A collaborator commented:

I am not clear which 'current' function we are talking about here.

@ndickson-nvidia (Contributor, Author) replied:

"the current function" here refers to the function containing this comment, the function that is currently being run (on the host) as this comment is passed. If I referred to it by name instead, i.e. IndexSelect, it might sound like the comment is referring to the other function named IndexSelect, above. Maybe it would be clear if I included both, though. 🤔 I'll try something when I add the "TODO"s.

@ndickson-nvidia (Contributor, Author) replied:

I updated the comment. Hopefully it's a bit clearer now. Thanks!
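
To make the thread concrete: on some toolchains the `__half(float)` conversion constructor is compiled as a device-only function, so a host-side function like `IndexSelect` cannot call it. One host-safe workaround is to assemble the binary16 bit pattern directly; this is a hypothetical sketch for illustration, not necessarily the approach taken in the PR:

```cpp
#include <cstdint>
#include <cstring>

#include <cuda_fp16.h>

// Hypothetical helper: produce a __half on the host without invoking the
// (possibly device-only) conversion constructor, by copying the raw
// binary16 bit pattern into the object.
inline __half HalfFromBits(uint16_t bits) {
  __half h;                           // default-construct, then overwrite bits
  std::memcpy(&h, &bits, sizeof(h));  // __half is exactly 16 bits wide
  return h;
}

// Example: 0x3C00 is 1.0 in IEEE 754 binary16.
static const __half kHalfOne = HalfFromBits(0x3C00);
```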

@dgl-bot (Collaborator) commented Jun 23, 2022

Commit ID: 10c371f85996c031b10a7153b517e3cb149ae956

Build ID: 6

Status: ❌ CI test failed in Stage [Torch CPU (Win64) Unit test].

Report path: link

Full logs path: link

@dgl-bot (Collaborator) commented Jun 23, 2022

Commit ID: c425866

Build ID: 7

Status: ✅ CI test succeeded

Report path: link

Full logs path: link

@nv-dlasalle (Collaborator) left a review:

Functionally it looks good--just the way some of the errors are reported needs updating.

Review threads:
  • src/array/cuda/cusparse_dispatcher.cuh (two threads, outdated, resolved)
  • src/array/cuda/spmm.cuh (resolved)
…IndexSelect`, `Full`, `Scatter_`, `CSRGetData`, `CSRMM`, `CSRSum`, `IndexSelectCPUFromGPU`

* Fixed casting issue in `_LinearSearchKernel` that was preventing it from supporting `__half`
* Added `#if`'d out specializations of `CSRGEMM`, `CSRGEAM`, and `Xgeam`, which would require functions that aren't currently provided by cuBLAS
* Added a clearer comment explaining why the cast to long long is necessary
* Also changed the existing `Xgeam` fallback for unsupported data types from `LOG(INFO)` to `LOG(FATAL)` (a sketch of this pattern follows below)
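
A sketch of the `LOG(FATAL)` pattern mentioned in the last commit above. cuBLAS only provides `geam` for float, double, and the complex types, so there is no FP16 routine to dispatch to; the generic fallback should therefore abort rather than log at INFO level. The signature below is simplified and illustrative, not DGL's exact code:

```cpp
#include <dmlc/logging.h>  // DGL's LOG macros come from dmlc-core

// Illustrative generic fallback for Xgeam: previously an unsupported data
// type only emitted LOG(INFO) and execution continued; LOG(FATAL) aborts,
// so the unsupported-dtype case cannot be silently ignored.
template <typename DType>
void Xgeam(const DType* A, const DType* B, DType* C, int m, int n) {
  LOG(FATAL) << "Xgeam does not support this data type: cuBLAS provides "
                "geam only for float, double, and complex types.";
}
```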
@dgl-bot (Collaborator) commented Jun 27, 2022

Commit ID: 83d41e9

Build ID: 9

Status: ✅ CI test succeeded

Report path: link

Full logs path: link

@nv-dlasalle nv-dlasalle merged commit a5d8460 into dmlc:master Jun 27, 2022