Conversation
Shower thought: if you want, I think this PR would be a good place to update the CHANGELOG. Or you can do so once SNC is fully complete. Not blocking, just something for the future :)
svij-sc
left a comment
Thanks for the changes.
Just one blocking comment re: use of .clamp_; the rest are nits, but I would also appreciate them being addressed.
It's probably also worth supporting more of the API here:
https://pytorch-geometric.readthedocs.io/en/latest/generated/torch_geometric.transforms.RandomNodeSplit.html#torch_geometric.transforms.RandomNodeSplit
i.e.
num_val (int or float, optional) – The number of validation samples. If float, it represents the ratio of samples to include in the validation set. (default: 500)
I'll leave this as a follow-up and add a task for it, since the same should then be done for the HashedNodeAnchorLinkSplitter, and this PR is already large enough as is.
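For the follow-up, here is a rough sketch of how an int-or-float argument like `num_val` could be resolved to a sample count. The helper name `resolve_num_samples` and its validation rules are hypothetical, not part of this PR or of PyG; the only behavior taken from the linked docs is that a float is treated as a ratio of the samples.

```python
from typing import Union


def resolve_num_samples(num: Union[int, float], num_nodes: int) -> int:
    """Hypothetical helper: turn an int-or-float argument into a sample count.

    int   -> used directly as an absolute number of samples.
    float -> interpreted as a ratio of num_nodes (assumed to lie in [0, 1]).
    """
    if isinstance(num, bool):
        # bool is a subclass of int in Python; reject it explicitly.
        raise TypeError("num must be an int or a float, not a bool")
    if isinstance(num, float):
        if not 0.0 <= num <= 1.0:
            raise ValueError(f"ratio must be in [0, 1], got {num}")
        return round(num_nodes * num)
    if not 0 <= num <= num_nodes:
        raise ValueError(f"count must be in [0, {num_nodes}], got {num}")
    return num
```

With this, `resolve_num_samples(500, 2000)` and `resolve_num_samples(0.25, 2000)` both resolve to the same count, so callers can pass whichever form is more convenient.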
Scope of work done
This implementation adds value over the GLT splitting implementation by making fewer assumptions about the node ids passed to the splitter. GLT takes the total number of nodes N as an argument and splits the ids from 0 to N, so every node has to be split; this splitter works on whatever node ids it is given. That is useful when we only want to provide a subset of nodes to the splitter, for example with an imbalanced dataset. Additionally, if a node id is included multiple times, it is hashed to the same split each time.
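To illustrate the idea described above, here is a minimal, standard-library-only sketch of hash-based splitting. It is not the code in this PR: the function names, the `seed` parameter, the default ratios, and the choice of SHA-256 are all assumptions made for illustration. It only shows why hashing lets the splitter accept an arbitrary subset of ids and why a repeated id always lands in the same split.

```python
import hashlib
from typing import Iterable, List, Tuple


def hash_to_unit_interval(node_id: int, seed: int = 0) -> float:
    """Map a node id to a deterministic value in [0, 1)."""
    digest = hashlib.sha256(f"{seed}:{node_id}".encode("utf-8")).digest()
    # Keep the top 53 bits so the quotient is exactly representable and < 1.0.
    return (int.from_bytes(digest[:8], "big") >> 11) / 2**53


def hashed_node_split(
    node_ids: Iterable[int],
    train_ratio: float = 0.8,
    val_ratio: float = 0.1,
    seed: int = 0,
) -> Tuple[List[int], List[int], List[int]]:
    """Assign each provided id to train/val/test based only on its hash.

    No total node count is needed, and ids need not be contiguous.
    """
    train, val, test = [], [], []
    for node_id in node_ids:
        u = hash_to_unit_interval(node_id, seed)
        if u < train_ratio:
            train.append(node_id)
        elif u < train_ratio + val_ratio:
            val.append(node_id)
        else:
            test.append(node_id)
    return train, val, test


# A non-contiguous subset with repeated ids: duplicates share a split.
train_ids, val_ids, test_ids = hashed_node_split([3, 17, 3, 42, 17])
```

Because the assignment depends only on the id and the seed, the realized split sizes only approximate the requested ratios, which is the usual trade-off of hashing against an exact random permutation.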
Where is the documentation for this feature?: N/A
Did you add automated tests or write a test plan?
Updated Changelog.md? NO
Ready for code review?: NO