Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug] Fix that pin_prefetcher is not actually enabled #4169

Merged
merged 1 commit into from
Jun 27, 2022

Conversation

yaox12
Copy link
Collaborator

@yaox12 yaox12 commented Jun 26, 2022

Description

Fix that pin_prefetcher is not actually enabled.

Changes

  1. Set pin_memory=self.pin_prefetcher to enable pin memory for returned indices. Note: subgraph is still not pinned.
  2. Correctly enable pin memory for column fetching.

@yaox12 yaox12 requested a review from BarclayII June 26, 2022 03:59
@yaox12
Copy link
Collaborator Author

yaox12 commented Jun 26, 2022

I think pin_prefetcher now has the same semantic as Pytorch dataloader's pin_memory except that sampled subgraphs are not pinned yet. pin_prefetcher is valid only when sampling is on CPU and destination device is GPU and not use_uva, so pinning CUDA tensors won't occur.

# (BarclayII) I hoped that pin_prefetcher can be merged into PyTorch's native
# pin_memory argument. But our neighbor samplers and subgraph samplers
# return indices, which could be CUDA tensors (e.g. during UVA sampling)
# hence cannot be pinned. PyTorch's native pin memory thread does not ignore
# CUDA tensors when pinning and will crash. To enable pin memory for prefetching
# features and disable pin memory for sampler's return value, I had to use
# a different argument. Of course I could change the meaning of pin_memory
# to pinning prefetched features and disable pin memory for sampler's returns
# no matter what, but I doubt if it's reasonable.

And since the previous pin_prefetcher didn't take effect correctly (as fixed in this PR), I think merging pin_prefetcher into PyTorch's native pin_memory would not affect existing code and examples.

@dgl-bot
Copy link
Collaborator

dgl-bot commented Jun 27, 2022

To trigger regression tests:

  • @dgl-bot run [instance-type] [which tests] [compare-with-branch];
    For example: @dgl-bot run g4dn.4xlarge all dmlc/master or @dgl-bot run c5.9xlarge kernel,api dmlc/master

@dgl-bot

This comment was marked as outdated.

@dgl-bot
Copy link
Collaborator

dgl-bot commented Jun 27, 2022

Commit ID: 62c39e3

Build ID: 2

Status: ✅ CI test succeeded

Report path: link

Full logs path: link

@yaox12 yaox12 merged commit b8f905f into dmlc:master Jun 27, 2022
@yaox12 yaox12 deleted the fix_pin_prefetcher branch June 27, 2022 08:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants