Skip to content

Add HashedNodeSplitter#273

Merged
mkolodner-sc merged 22 commits intomainfrom
mkolodner-sc/add_hashed_node_splitter
Aug 19, 2025
Merged

Add HashedNodeSplitter#273
mkolodner-sc merged 22 commits intomainfrom
mkolodner-sc/add_hashed_node_splitter

Conversation

@mkolodner-sc
Copy link
Copy Markdown
Collaborator

@mkolodner-sc mkolodner-sc commented Aug 14, 2025

Scope of work done

  • Adds the NodeAnchorSplitter/HashedNodeAnchorSplitter classes which aims to split node ids instead of edge indices, which the HashedNodeAnchorLinkSplitter is responsible for.
  • Re-uses constructs of the HashedNodeLinkSplitter where possible
  • Adds relevant tests

This implementation provides additional value over the GLT splitting implementation by making less rigorous assumptions about the node ids that are provided to the splitter, such as not requiring every node to be split upon (by requiring as an argument the total number of nodes N and splitting the ids from 0 to N). This may be useful in cases where we may only want to provide a subset of nodes to the splitter when we have an imbalanced dataset. Additionally, if we want to include a node id multiple times, it ensures that it will be hashed to the same split each time.

Where is the documentation for this feature?: N/A

Did you add automated tests or write a test plan?

Updated Changelog.md? NO

Ready for code review?: NO

Comment thread .github/workflows/on-pr-comment.yml
Copy link
Copy Markdown
Collaborator

@kmontemayor2-sc kmontemayor2-sc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Matt!

Comment thread .github/workflows/on-pr-comment.yml
Comment thread python/gigl/utils/data_splitters.py Outdated
Comment thread python/gigl/utils/data_splitters.py Outdated
Comment thread python/gigl/utils/data_splitters.py Outdated
Comment thread python/gigl/utils/data_splitters.py Outdated
Comment thread python/tests/unit/utils/data_splitters_test.py
Comment thread python/gigl/utils/data_splitters.py Outdated
Comment thread python/gigl/utils/data_splitters.py Outdated
Comment thread python/gigl/utils/data_splitters.py Outdated
@kmontemayor2-sc
Copy link
Copy Markdown
Collaborator

shower thought: if you want, I think this PR would be appropriate for updating CHANGELOG. Or you can do so once SNC is fully complete.

Not blocking just something for the future :)

Copy link
Copy Markdown
Collaborator

@yliu2-sc yliu2-sc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Comment thread python/gigl/utils/data_splitters.py
Comment thread python/gigl/utils/data_splitters.py Outdated
Comment thread python/gigl/utils/data_splitters.py Outdated
Comment thread python/tests/unit/utils/data_splitters_test.py Outdated
Comment thread python/tests/unit/utils/data_splitters_test.py
Comment thread CHANGELOG.md
Comment thread python/tests/unit/utils/data_splitters_test.py
Comment thread python/tests/unit/utils/data_splitters_test.py Outdated
Comment thread python/tests/unit/utils/data_splitters_test.py
Comment thread python/gigl/utils/data_splitters.py
Comment thread python/gigl/utils/data_splitters.py Outdated
Comment thread python/gigl/utils/data_splitters.py Outdated
Comment thread python/gigl/utils/data_splitters.py Outdated
Comment thread python/gigl/utils/data_splitters.py
Comment thread python/gigl/utils/data_splitters.py Outdated
Comment thread python/gigl/utils/data_splitters.py Outdated
Comment thread CHANGELOG.md
Comment thread python/gigl/utils/data_splitters.py Outdated
Comment thread python/gigl/utils/data_splitters.py
Comment thread python/tests/unit/utils/data_splitters_test.py
Copy link
Copy Markdown
Collaborator

@svij-sc svij-sc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the changes.
Just one blocking comment re: use of .clamp_ - rest are nits, but would also appreciate addressing.

Comment thread python/gigl/utils/data_splitters.py
Comment thread python/gigl/utils/data_splitters.py Outdated
Comment thread python/tests/unit/utils/data_splitters_test.py
Comment thread python/tests/unit/utils/data_splitters_test.py Outdated
Copy link
Copy Markdown
Collaborator

@svij-sc svij-sc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Its probably also worth supporting more of the api here:
https://pytorch-geometric.readthedocs.io/en/latest/generated/torch_geometric.transforms.RandomNodeSplit.html#torch_geometric.transforms.RandomNodeSplit

i.e.

num_val (int or float, optional) – The number of validation samples. If float, it represents the ratio of samples to include in the validation set. (default: 500)****

Comment thread python/tests/unit/utils/data_splitters_test.py
Comment thread python/gigl/utils/data_splitters.py Outdated
Comment thread python/tests/unit/utils/data_splitters_test.py
Comment thread python/gigl/utils/data_splitters.py Outdated
@mkolodner-sc
Copy link
Copy Markdown
Collaborator Author

Its probably also worth supporting more of the api here: https://pytorch-geometric.readthedocs.io/en/latest/generated/torch_geometric.transforms.RandomNodeSplit.html#torch_geometric.transforms.RandomNodeSplit

i.e.

num_val (int or float, optional) – The number of validation samples. If float, it represents the ratio of samples to include in the validation set. (default: 500)****

I'll leave this as a follow-up and add a task for this, since this also should be done for the HashedNodeAnchorLinkSplitter in that case and this PR is already large enough as is.

@mkolodner-sc mkolodner-sc marked this pull request as ready for review August 19, 2025 17:46
@mkolodner-sc mkolodner-sc requested a review from nshah-sc as a code owner August 19, 2025 17:46
@mkolodner-sc mkolodner-sc added this pull request to the merge queue Aug 19, 2025
Merged via the queue into main with commit 21701da Aug 19, 2025
5 checks passed
@mkolodner-sc mkolodner-sc deleted the mkolodner-sc/add_hashed_node_splitter branch August 19, 2025 19:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants