-
Notifications
You must be signed in to change notification settings - Fork 3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Performance] In HeteroNodeView, build arange on target device, instead of on CPU and copying it #2266
Conversation
How was the speed up? Just curious; not knowing the number is fine. |
The overall speed up is pretty small, but in the region immediately following its in invocation in the graphsage example, we see a reduction from 1.8ms to 547us (> 3x), as a result of not having to wait for the H2D copy: But as more synchronization points are removed, the impact could be larger. In the above images, synchronizing to check if the COO is sorted is what limits the benefit. |
d574a9f
to
959a1be
Compare
959a1be
to
e48d87b
Compare
return nd.cast_to_signed(nd.from_dlpack(zerocopy_to_dlpack(data))) | ||
else: | ||
return nd.from_dlpack(zerocopy_to_dlpack(data)) | ||
return nd.from_dlpack(zerocopy_to_dlpack(data)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why do we need this modification?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the effect is the same. Do you prefer keeping the old code? It doesn't look like a big deal to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a potential bug(feature?) in tensorflow dlpack, the device of int32 tensor is not accurate. I think this is fixed in the latest tf, but we need to keep it for backward compatibility
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I reverted the change the else clause in 2abb841.
With respect as to why this method needed to be modified, in the unit tests for int64_t I ran into the issue that the pointer in the tensor was silently converted to a CPU pointer rather than a GPU pointer. The code was already handling the in32_t case, so I expanded it to also handle the int64_t case. This was with TensorFlow 2.3.1.
This wasn't an issue in the unit tests before this PR, as the tensor was generated on the CPU, and then explicitly copied to the GPU.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm good after Da's comment is resolved.
…ad of on CPU and copying it (dmlc#2266) * Build arange on target device * Utilize arange device in viewpy:HeteroNodeView.__call__ * Work around uint64 error in TF to_dlpack * Restore else clause Co-authored-by: Jinjing Zhou <VoVAllen@users.noreply.github.com> Co-authored-by: Da Zheng <zhengda1936@gmail.com>
…ad of on CPU and copying it (dmlc#2266) * Build arange on target device * Utilize arange device in viewpy:HeteroNodeView.__call__ * Work around uint64 error in TF to_dlpack * Restore else clause Co-authored-by: Jinjing Zhou <VoVAllen@users.noreply.github.com> Co-authored-by: Da Zheng <zhengda1936@gmail.com>
Description
Create the arange on the target device directly using the backend, rather than creating it on the CPU, and copying it the target device.
Checklist
Please feel free to remove inapplicable items for your PR.
test_update_routines()
,test_update_0deg()
, etc. )or have been fixed to be compatible with this change
Changes
The
arange
method in the backends was modified to accept a device context, and the HeteroNodeView was modified to make use of this, to avoid generating the range on the CPU, and then synchronously copying it up to the GPU.