Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bugfix] Wrap cub with CUB_NS_PREFIX and remove dependency on Thrust to linking issues with Torch 1.8 #2758

Merged
merged 4 commits into from
Mar 22, 2021

Conversation

nv-dlasalle
Copy link
Collaborator

Description

This wraps all cub usages with CUB_NS_PREFIX, and removes the one use of thrust. This prevents symbol collision when linking against PyTorch 1.8.

Checklist

Please feel free to remove inapplicable items for your PR.

  • The PR title starts with [$CATEGORY] (such as [NN], [Model], [Doc], [Feature]])
  • Changes are complete (i.e. I finished coding on this PR)
  • All changes have test coverage
  • Code is well-documented
  • To the my best knowledge, examples are either not affected by this change,
    or have been fixed to be compatible with this change

Changes

This adds /src/array/cuda/dgl_cub.cuh, which should be included in place of cub/cub.cuh, and it wraps it in the appropriate namespace.

This also changes the implementation of NonZero to use cub::DeviceSelect::If instead of thrust.

@nv-dlasalle nv-dlasalle changed the title [Bugfix] Wrap cub with CUB_NS_PREFIX and remove dependency on Thrust [WIP] Wrap cub with CUB_NS_PREFIX and remove dependency on Thrust to linking issues with Torch 1.8 Mar 17, 2021
@nv-dlasalle nv-dlasalle changed the title [WIP] Wrap cub with CUB_NS_PREFIX and remove dependency on Thrust to linking issues with Torch 1.8 [Bugfix] Wrap cub with CUB_NS_PREFIX and remove dependency on Thrust to linking issues with Torch 1.8 Mar 17, 2021
@BarclayII
Copy link
Collaborator

BarclayII commented Mar 18, 2021

So should we avoid using thrust altogether in DGL later on?

Also a side question: if I write a PyTorch CUDA extension that uses thrust (or linking/dlopening two libraries both using thrust), would it also have symbol duplication?

EDIT: the answer is yes: pytorch/pytorch#52663

@nv-dlasalle
Copy link
Collaborator Author

@BarclayII I think until NVIDIA/thrust#1401 is resolved, it will be simplest to avoid using it in DGL.

@yzh119
Copy link
Member

yzh119 commented Mar 22, 2021

@BarclayII I think this PR fix the low-accuracy issue of dgl + cu112.

Copy link
Member

@jermainewang jermainewang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks for the fix.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants