[Bugfix] Accessing data from the indexes stored in same device #4242
Conversation
To trigger regression tests:
@@ -133,6 +133,7 @@ def run(proc_id, n_gpus, args, devices, data):
         # blocks.
         tic_step = time.time()
         for step, (input_nodes, pos_graph, neg_graph, blocks) in enumerate(dataloader):
+            input_nodes = input_nodes.to(nfeat.device)
When features are on the CPU, `input_nodes` will first be copied to the GPU in the dataloader and then copied back to the CPU for indexing. This is not caused by this PR and can be eliminated by unifying `--graph_device` and `--data_device`, just like in other examples. Perhaps we should take a note and fix it when we refactor this example in the future.
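The round-trip described above can be sketched with a minimal pure-Python stand-in for a tensor. `MiniTensor` is a hypothetical toy class, not part of PyTorch or DGL; it only models the rule that indexing requires the index to live on the same device as the data, which is what recent PyTorch enforces and what this PR's fix addresses:

```python
class MiniTensor:
    """Toy stand-in for a tensor; tracks device and cross-device copies."""

    def __init__(self, data, device="cpu"):
        self.data = list(data)
        self.device = device
        self.copies = 0  # number of cross-device transfers so far

    def to(self, device):
        if device == self.device:
            return self  # same-device move is a no-op, as in torch
        moved = MiniTensor(self.data, device)
        moved.copies = self.copies + 1
        return moved

    def __getitem__(self, idx):
        # Models recent PyTorch: indexing fails unless the index tensor
        # sits on the same device as the data being indexed.
        assert idx.device == self.device, "index must be on the same device"
        return MiniTensor((self.data[i] for i in idx.data), self.device)


nfeat = MiniTensor([10, 20, 30, 40], device="cpu")  # features on CPU
input_nodes = MiniTensor([1, 3], device="cuda:0")   # produced by the GPU dataloader

# The fix in this PR: move the index to wherever the features live.
# When features are on CPU, this is the extra GPU->CPU copy noted above.
input_nodes = input_nodes.to(nfeat.device)
batch_inputs = nfeat[input_nodes]
```

With unified `--graph_device`/`--data_device` settings, `input_nodes` would already be on `nfeat.device` and the `.to()` call would be a no-op.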
I generally agree. It does sound odd that `input_nodes` is copied back and forth, and it should be fixed when we refactor. However, here `input_nodes` seems to always belong to the same device as the dataloader (`--gpu`), which is configured/controlled separately from `nfeat` (`--data-device`) and `g` (`--graph-device`)...
With the new sampling pipeline, we can specify `prefetch_node_features` for the sampler and no longer need `batch_inputs = nfeat[input_nodes].to(device)`.
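The idea behind sampler-side prefetching can be sketched in pure Python. This is an illustrative mock, not DGL's actual API: `plain_loader` and `prefetch_loader` are hypothetical names, and `nfeat` is modeled as a plain dict. The point is only that when the loader gathers features at sampling time, the training loop never indexes `nfeat` itself, so the device-of-the-index question disappears from the loop:

```python
def plain_loader(batches):
    # Old pipeline: yields node IDs; the training loop must gather
    # features itself with something like nfeat[input_nodes].to(device).
    for input_nodes, blocks in batches:
        yield input_nodes, blocks


def prefetch_loader(batches, nfeat):
    # New pipeline (conceptually): features are gathered during
    # sampling, so the loop receives them directly.
    for input_nodes, blocks in batches:
        batch_inputs = [nfeat[i] for i in input_nodes]
        yield batch_inputs, blocks


nfeat = {0: "f0", 1: "f1", 2: "f2", 3: "f3"}
batches = [([1, 3], "blocks-a"), ([0, 2], "blocks-b")]

gathered = [inputs for inputs, _ in prefetch_loader(batches, nfeat)]
```

In DGL itself this gathering is requested declaratively on the sampler rather than written by hand, which also lets the library overlap the feature copies with computation.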
I see... Thanks Xin for noting this. I will keep it in mind :)
             pos_graph = pos_graph.to(device)
             neg_graph = neg_graph.to(device)
-            blocks = [block.int().to(device) for block in blocks]
+            blocks = [block.int() for block in blocks]
Since `pos_graph`, `neg_graph`, and `blocks` all reside on the same device as the dataloader (L105), moving them to `device` is redundant.
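Why dropping the `.to(device)` call is safe can be shown with a small pure-Python mock. `Block` here is a hypothetical toy class mirroring one relevant behavior: moving an object to the device it already lives on returns the object unchanged (as `torch.Tensor.to` does), so the removed call was a no-op:

```python
class Block:
    """Toy message-flow-graph stand-in; counts real device transfers."""

    def __init__(self, device):
        self.device = device
        self.moves = 0  # actual cross-device transfers performed

    def to(self, device):
        if device == self.device:
            return self  # no-op: same object, no copy
        moved = Block(device)
        moved.moves = self.moves + 1
        return moved

    def int(self):
        return self  # dtype-cast stub; irrelevant to device placement


device = "cuda:0"
blocks = [Block(device), Block(device)]  # dataloader output already on `device`

before = [b.int().to(device) for b in blocks]  # old code
after = [b.int() for b in blocks]              # new code
```

Both list comprehensions yield the very same objects with zero transfers, so the simplified version changes behavior only by removing dead code.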
…4242)

* First update to fix two examples
* Update to fix RGCN/graphsage example and dataloader
* Update
Description
To address #4234, this PR fixes the example cases (rgcn and graphsage) that crash due to a recent PyTorch update.
Checklist
Please feel free to remove inapplicable items for your PR.
or have been fixed to be compatible with this change
Changes
Move the index to the same device as the data.