[GraphBolt] Update ItemSampler
#7408
To trigger regression tests:
It is good that runtime performance is unchanged. However, to determine whether the old or the new implementation is more performant, we need to track CPU utilization. Since ItemSampler and the rest of the sampling pipeline run concurrently, runtime alone is not enough to determine that.
Sure, could you please give me some guidance on how to do that?
The simplest way to monitor |
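As a rough illustration of the CPU-utilization point raised in this thread (not necessarily the tool the reviewer had in mind), the sketch below compares wall-clock time with process CPU time using only the Python standard library. The helper `measure` and the two toy workloads are hypothetical names introduced for this example. `time.process_time()` covers only the current process, so CPU time spent in separate worker processes would need to be measured some other way.

```python
import time


def measure(fn, *args, **kwargs):
    """Run fn once and return (result, wall_seconds, cpu_seconds)."""
    wall_start = time.perf_counter()
    # process_time() sums user + system CPU time of the current process only;
    # it excludes time spent sleeping and time spent in child processes.
    cpu_start = time.process_time()
    result = fn(*args, **kwargs)
    cpu = time.process_time() - cpu_start
    wall = time.perf_counter() - wall_start
    return result, wall, cpu


def busy_loop(n):
    """CPU-bound toy workload."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def sleepy(seconds):
    """Toy workload that waits without using the CPU."""
    time.sleep(seconds)
    return seconds


if __name__ == "__main__":
    for name, fn, arg in [("busy", busy_loop, 2_000_000), ("sleepy", sleepy, 0.2)]:
        _, wall, cpu = measure(fn, arg)
        print(f"{name}: wall={wall:.3f}s cpu={cpu:.3f}s cpu/wall={cpu / wall:.0%}")
```

Two implementations with the same wall-clock time can differ noticeably in the `cpu` figure, which is the quantity that matters when the sampler shares cores with the rest of the pipeline.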
Code looks clean and good to me.
One more thing: please benchmark on a larger dataset with a larger batch size, such as ogbn-papers100M, and on heterogeneous and link prediction datasets (to measure the performance of indexing on a tuple of tensors). Let's make sure it is efficient on the most common datasets.
@mfbalin Thank you for your valuable suggestions, but considering the scope of this PR, I'd like to defer them to another PR. @Rhett-Ying If everything looks good to you, please approve so I can move on.
@Skeleton003 According to the doc, |
Description
ItemSampler: support correct stochastic sharding across distributed groups.
ItemSet.__getitem__(): the case when index is an iterable of int.
Benchmark: https://docs.google.com/document/d/1Pzk2PJoFtTZSu17wTXVK4mqvfrMLAj2xK6fcGC1pwEg/edit?usp=sharing
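To make the idea of stochastic sharding across ranks concrete, here is a minimal pure-Python sketch. It is an illustration under stated assumptions, not the implementation in this PR: the function `shard_indices`, its parameters, and the strided assignment scheme are all hypothetical. The assumption is that every rank shuffles with the same seed, so all ranks agree on one permutation and then take disjoint slices of it.

```python
import random


def shard_indices(num_items, batch_size, rank, world_size, seed, epoch):
    """Return the index batches assigned to `rank` for one epoch (hypothetical helper)."""
    # Every rank seeds its generator identically, so all ranks compute
    # the same permutation without communicating.
    rng = random.Random(seed + epoch)
    perm = list(range(num_items))
    rng.shuffle(perm)
    # Strided slicing gives each rank a disjoint subset of the shared
    # permutation; together the subsets cover every item exactly once.
    mine = perm[rank::world_size]
    # Split this rank's shard into batches; the last one may be smaller.
    return [mine[i:i + batch_size] for i in range(0, len(mine), batch_size)]


if __name__ == "__main__":
    for rank in range(3):
        print(rank, shard_indices(10, 2, rank, world_size=3, seed=0, epoch=0))
```

Each batch is a plain list of integer indices, which is the kind of "iterable of int" that the second item above says `ItemSet.__getitem__()` accepts.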
Checklist
Please feel free to remove inapplicable items for your PR.
Changes