Re: This thread https://github.com/Snapchat/GiGL/pull/432/files#r2687675025
We currently sample equally from each storage node for each compute node. So if we have the below setup with 2 storage nodes and 4 compute nodes:
Storage node 0:
[0, 1, 2, 3]
Storage node 1:
[4, 5, 6, 7]
Compute node 0 samples:
[[0], [4]]
Compute node 1 samples:
[[1], [5]]
Compute node 2 samples:
[[2], [6]]
Compute node 3 samples:
[[3], [7]]
This may not be efficient, since every compute node opens a connection to every storage node. It may be better to have some setup like:
Compute node 0 samples:
[[0, 1], []]
Compute node 1 samples:
[[2, 3], []]
Compute node 2 samples:
[[], [4, 5]]
Compute node 3 samples:
[[], [6, 7]]
To reduce overall network chatter across the cluster.
Fortunately, since the input_nodes are entirely user controlled, users should be able to tune this themselves, and we can add a flag to RemoteDistDataset.get_node_ids to control how we shard.
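A minimal sketch of the two sharding strategies, to make the trade-off concrete. Note `shard_node_ids`, its `mode` flag, and its return layout are all hypothetical (not GiGL APIs); this assumes compute nodes divide evenly into storage partitions and ignores remainders:

```python
from typing import Literal


def shard_node_ids(
    storage_partitions: list[list[int]],
    num_compute_nodes: int,
    mode: Literal["even", "locality"] = "even",
) -> list[list[list[int]]]:
    """Assign node ids to compute nodes.

    `storage_partitions[s]` holds the ids living on storage node s.
    Returns `out[c][s]`: the ids compute node c samples from storage node s.
    """
    num_storage = len(storage_partitions)
    out: list[list[list[int]]] = [
        [[] for _ in range(num_storage)] for _ in range(num_compute_nodes)
    ]
    if mode == "even":
        # Current behavior: each compute node takes an equal slice of every
        # storage partition, so all C x S (compute, storage) pairs connect.
        for s, ids in enumerate(storage_partitions):
            chunk = len(ids) // num_compute_nodes
            for c in range(num_compute_nodes):
                out[c][s] = ids[c * chunk : (c + 1) * chunk]
    else:
        # Proposed behavior: dedicate a group of compute nodes to each
        # storage node, so each compute node talks to only one storage node.
        per_storage = num_compute_nodes // num_storage
        for s, ids in enumerate(storage_partitions):
            chunk = len(ids) // per_storage
            for i in range(per_storage):
                c = s * per_storage + i
                out[c][s] = ids[i * chunk : (i + 1) * chunk]
    return out
```

With the 2-storage / 4-compute example above, "even" reproduces the current layout (8 connections total), while "locality" reproduces the proposed one (4 connections total).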