
Enabled Weighted Sampling #635

Merged: mkolodner-sc merged 20 commits into main from mkolodner-sc/enable_weighted_sampling on May 19, 2026

@mkolodner-sc (Collaborator) commented May 12, 2026

Summary

Adds native weighted edge sampling to GiGL's distributed training pipeline via GLT's CPUWeightedSampler. When enabled, neighbors are sampled proportionally to edge weights rather than uniformly.
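To make "proportionally to edge weights rather than uniformly" concrete, here is a minimal pure-Python sketch. It is not GLT's CPUWeightedSampler: the function names are hypothetical, and the scheme (successive weighted draws without replacement) is an assumption chosen for illustration. It does show the property the new tests rely on: an edge with weight 0 can never be selected.

```python
import random
from collections.abc import Sequence


def sample_neighbors_weighted(
    neighbors: Sequence[int],
    weights: Sequence[float],
    num_samples: int,
    rng: random.Random,
) -> list[int]:
    """Sample up to `num_samples` distinct neighbors, each draw proportional to edge weight.

    Hypothetical illustration: successive weighted draws without replacement.
    Edges with weight 0 are filtered out up front, so they can never be chosen.
    """
    if len(neighbors) != len(weights):
        raise ValueError("neighbors and weights must have the same length")
    if any(w < 0 for w in weights):
        raise ValueError("edge weights must be non-negative")

    # Keep only edges that can actually be sampled (positive weight).
    pool = [(node, w) for node, w in zip(neighbors, weights) if w > 0]
    sampled: list[int] = []
    while pool and len(sampled) < num_samples:
        # Pick one remaining edge with probability w_i / sum(remaining weights).
        idx = rng.choices(range(len(pool)), weights=[w for _, w in pool], k=1)[0]
        node, _ = pool.pop(idx)
        sampled.append(node)
    return sampled


def sample_neighbors_uniform(
    neighbors: Sequence[int], num_samples: int, rng: random.Random
) -> list[int]:
    """Uniform baseline: every neighbor is equally likely, regardless of weight."""
    return rng.sample(list(neighbors), min(num_samples, len(neighbors)))
```

For example, with neighbors `[10, 11, 12, 13]` and weights `[1.0, 0.0, 3.0, 0.0]`, the weighted sampler can only ever return 10 and 12, and 12 is drawn first about three times as often as 10; the uniform baseline may return any of the four.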

New API

  • DistPartitioner.register_edge_weights(edge_weights) — registers per-edge sampling weights (a 1D tensor for homogeneous graphs, or a dict[EdgeType, Tensor] for heterogeneous graphs); call it before partition_edge_index_and_edge_features(). Weights are partitioned alongside edge features in the same pass (co-partitioned, mirroring the pattern used for node features and labels).
  • load_torch_tensors_from_tf_record(weight_edge_feat_name=...) — accepts the name of an existing edge feature column to extract as sampling weights during TFRecord loading. The column is sliced out of the feature tensor and stored in LoadedGraphTensors.edge_weights; it is never duplicated in memory.
  • build_dataset(weight_edge_feat_name=...) — threads weight_edge_feat_name through to TFRecord loading and then calls register_edge_weights() with the extracted weights.
  • DistNeighborLoader(with_weight=True) / DistABLPLoader(with_weight=True) — enables weighted sampling. Defaults to False; must be set explicitly.
  • BaseDistLoader.validate_with_weight() — shared validation: raises ValueError if with_weight=True but no weights are registered in the dataset; raises NotImplementedError if used with PPRSamplerOptions (weight-proportional PPR residual propagation is deferred to a future PR).
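The validation rules listed for BaseDistLoader.validate_with_weight() can be sketched as a standalone function. This is a hypothetical flattening: the real method presumably inspects the dataset and sampler options directly, so the three boolean parameters and the order of the two checks are assumptions made for illustration.

```python
def validate_with_weight(
    with_weight: bool,
    has_edge_weights: bool,
    uses_ppr_sampler: bool,
) -> None:
    """Hypothetical flattened version of the checks described for
    BaseDistLoader.validate_with_weight(); the real signature may differ."""
    if not with_weight:
        # Uniform sampling (the default) needs no registered weights.
        return
    if not has_edge_weights:
        # Weighted sampling was requested, but the dataset has nothing to sample by.
        raise ValueError(
            "with_weight=True requires edge weights to be registered in the dataset."
        )
    if uses_ppr_sampler:
        # Weight-proportional PPR residual propagation is deferred to a future PR.
        raise NotImplementedError(
            "with_weight=True is not yet supported with PPRSamplerOptions."
        )
```

In the PR, the `has_edge_weights` input corresponds to DistDataset.has_edge_weights locally, or to the value fetched over RPC via RemoteDistDataset.fetch_edge_weights_registered() in Graph Store mode.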

Implementation notes

  • LoadedGraphTensors.edge_weights — new field carrying extracted weights from TFRecord loading through to register_edge_weights().
  • GraphPartitionData.weights (field already existed) carries the partitioned weight tensor to DistDataset._initialize_graph(), which forwards it to GLT's init_graph(edge_weights=...).
  • DistDataset.has_edge_weights property reflects whether weights were registered at construction time.
  • SamplingConfig.with_weight is now threaded through from the loader rather than hardcoded to False.
  • Graph Store mode: DistServer.get_edge_weights_registered() and RemoteDistDataset.fetch_edge_weights_registered() propagate has_edge_weights across the RPC boundary so compute nodes can validate with_weight against the remote dataset.
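The co-partitioning idea (edge features and weights routed in the same pass, so GraphPartitionData.weights stays aligned with the partition's edge ids) can be sketched in pure Python. `PartitionSketch` and `co_partition_edges` are hypothetical stand-ins, not GiGL's GraphPartitionData or DistPartitioner internals, and a precomputed edge-to-partition assignment is assumed as input.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PartitionSketch:
    """Hypothetical per-partition container; not GiGL's GraphPartitionData."""
    edge_ids: list[int]
    features: Optional[list[list[float]]]
    weights: Optional[list[float]]


def co_partition_edges(
    edge_to_partition: list[int],
    num_partitions: int,
    features: Optional[list[list[float]]] = None,
    weights: Optional[list[float]] = None,
) -> list[PartitionSketch]:
    """Route edge ids, features, and weights with one shared edge -> partition assignment."""
    num_edges = len(edge_to_partition)
    for name, values in (("features", features), ("weights", weights)):
        if values is not None and len(values) != num_edges:
            raise ValueError(f"{name} must have exactly one entry per edge")

    parts = [
        PartitionSketch(
            edge_ids=[],
            features=[] if features is not None else None,
            weights=[] if weights is not None else None,
        )
        for _ in range(num_partitions)
    ]
    # Single pass: an edge's id, features, and weight always land in the same partition,
    # in the same order, so the per-partition lists stay index-aligned.
    for edge_id, pid in enumerate(edge_to_partition):
        part = parts[pid]
        part.edge_ids.append(edge_id)
        if features is not None:
            part.features.append(features[edge_id])
        if weights is not None:
            part.weights.append(weights[edge_id])
    return parts
```

Because features and weights are both optional, the same routine covers the four partitioner cases the tests enumerate (features only, weights only, neither, both).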

Tests

  • tests/unit/distributed/distributed_weighted_sampling_test.py (8 new tests):
    • Correctness (homogeneous + heterogeneous): weight=0 edges to "bad" nodes are never traversed in sampled subgraphs — verified by encoding node class in features and asserting no bad node appears after weighted sampling.
    • Partitioner edge cases: features only, weights only, neither, both (with consistency check that GraphPartitionData.edge_ids == FeaturePartitionData.ids), and heterogeneous partial weights (one edge type weighted, another not).
  • tests/unit/common/data/dataloaders_test.py (1 new test): test_load_edge_weights_from_tf_record — verifies that load_torch_tensors_from_tf_record correctly extracts a named column into edge_weights, removes it from edge_features, and returns the right shapes and values.
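The behavior checked by test_load_edge_weights_from_tf_record (one named column pulled out as the weights and dropped from the remaining features) can be illustrated with a small pure-Python helper. `split_weight_column` is hypothetical, and it assumes the column is located through a list of feature names; the real loader works on tensors and, per the description above, slices rather than copies the column.

```python
def split_weight_column(
    edge_features: list[list[float]],
    feature_names: list[str],
    weight_edge_feat_name: str,
) -> tuple[list[list[float]], list[float], list[str]]:
    """Hypothetical helper: extract one named feature column as per-edge weights.

    Returns (remaining_features, weights, remaining_feature_names).
    """
    if weight_edge_feat_name not in feature_names:
        raise ValueError(f"unknown edge feature: {weight_edge_feat_name!r}")
    col = feature_names.index(weight_edge_feat_name)
    # One weight per edge, taken from the named column.
    weights = [row[col] for row in edge_features]
    # Drop that column so the weights are not also kept as a model input feature.
    remaining = [row[:col] + row[col + 1:] for row in edge_features]
    remaining_names = feature_names[:col] + feature_names[col + 1:]
    return remaining, weights, remaining_names
```

For two edges with feature columns `["a", "w", "b"]`, extracting `"w"` yields one weight per edge and leaves two feature columns.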

@mkolodner-sc changed the title from "[WIP] Enabled Weighted Sampling" to "Enabled Weighted Sampling" on May 13, 2026
@kmontemayor2-sc (Collaborator) left a comment


Thanks Matt! The robots and I did a first pass. It's possible they're imagining some of these issues, but I figured I'd flag them :)

Comment thread gigl/common/data/load_torch_tensors.py Outdated
Comment thread gigl/distributed/dist_ablp_neighborloader.py Outdated
Comment thread gigl/distributed/dist_dataset.py Outdated
Comment thread gigl/distributed/dist_partitioner.py Outdated
Comment thread gigl/distributed/dist_partitioner.py Outdated
Comment thread gigl/distributed/dist_partitioner.py
Comment thread gigl/common/data/load_torch_tensors.py
Comment thread gigl/distributed/dist_dataset.py
Comment thread gigl/common/data/load_torch_tensors.py Outdated
Comment thread gigl/common/data/load_torch_tensors.py Outdated
@mkolodner-sc (Collaborator, Author) commented:

/unit_test

@github-actions (bot, Contributor) commented May 14, 2026

GiGL Automation

@ 23:03:34UTC : 🔄 Python Unit Test started.

@ 24:11:45UTC : ❌ Workflow failed.
Please check the logs for more details.

@github-actions (bot, Contributor) commented May 14, 2026

GiGL Automation

@ 23:03:35UTC : 🔄 C++ Unit Test started.

@ 23:05:34UTC : ✅ Workflow completed successfully.

@github-actions (bot, Contributor) commented May 14, 2026

GiGL Automation

@ 23:03:35UTC : 🔄 Scala Unit Test started.

@ 23:13:48UTC : ✅ Workflow completed successfully.

@mkolodner-sc (Collaborator, Author) commented:

/unit_test

@github-actions (bot, Contributor) commented May 15, 2026

GiGL Automation

@ 06:09:08UTC : 🔄 Scala Unit Test started.

@ 06:19:33UTC : ✅ Workflow completed successfully.

@github-actions (bot, Contributor) commented May 15, 2026

GiGL Automation

@ 06:09:09UTC : 🔄 C++ Unit Test started.

@ 06:13:19UTC : ✅ Workflow completed successfully.

@github-actions (bot, Contributor) commented May 15, 2026

GiGL Automation

@ 06:09:09UTC : 🔄 Python Unit Test started.

@ 07:13:24UTC : ✅ Workflow completed successfully.

@kmontemayor2-sc (Collaborator) left a comment


Thanks Matt!

Comment thread gigl/common/data/load_torch_tensors.py Outdated
Comment thread gigl/distributed/base_dist_loader.py Outdated
Comment thread gigl/distributed/dist_partitioner.py Outdated
Comment thread gigl/distributed/dist_partitioner.py Outdated
Comment thread gigl/distributed/dist_dataset.py Outdated
Comment thread gigl/distributed/dist_dataset.py Outdated
@mkolodner-sc (Collaborator, Author) commented:

/unit_test

@github-actions (bot, Contributor) commented May 18, 2026

GiGL Automation

@ 22:16:50UTC : 🔄 Python Unit Test started.

@ 23:36:03UTC : ❌ Workflow failed.
Please check the logs for more details.

@github-actions (bot, Contributor) commented May 18, 2026

GiGL Automation

@ 22:16:51UTC : 🔄 Scala Unit Test started.

@ 22:26:16UTC : ✅ Workflow completed successfully.

@github-actions (bot, Contributor) commented May 18, 2026

GiGL Automation

@ 22:16:51UTC : 🔄 C++ Unit Test started.

@ 22:18:35UTC : ✅ Workflow completed successfully.

@mkolodner-sc marked this pull request as ready for review May 18, 2026 23:54
@mkolodner-sc added this pull request to the merge queue May 18, 2026
@github-merge-queue (bot) removed this pull request from the merge queue due to failed status checks May 18, 2026
@mkolodner-sc added this pull request to the merge queue May 19, 2026
@github-merge-queue (bot) removed this pull request from the merge queue due to failed status checks May 19, 2026
@mkolodner-sc added this pull request to the merge queue May 19, 2026
@github-merge-queue (bot) removed this pull request from the merge queue due to failed status checks May 19, 2026
@mkolodner-sc added this pull request to the merge queue May 19, 2026
Merged via the queue into main with commit 99c26a1 May 19, 2026
7 checks passed
@mkolodner-sc deleted the mkolodner-sc/enable_weighted_sampling branch May 19, 2026 04:16

Labels

None yet

Projects

None yet


3 participants