Ensure node ids are always selected into the same split #104
kmontemayor2-sc merged 14 commits into main from
/unit_test
GiGL Automation @ 02:38:55 UTC: 🔄 @ 03:06:03 UTC: ❌ Workflow failed.

/unit_test

/integration_test

/e2e_test
GiGL Automation @ 03:53:41 UTC: 🔄 @ 04:27:15 UTC: ✅ Workflow completed successfully.
GiGL Automation @ 03:53:44 UTC: 🔄 @ 04:34:46 UTC: ✅ Workflow completed successfully.
GiGL Automation @ 03:53:48 UTC: 🔄 @ 05:23:37 UTC: ✅ Workflow completed successfully.

/unit_test
GiGL Automation @ 23:20:25 UTC: 🔄 @ 23:43:39 UTC: ❌ Workflow failed.

/unit_test
GiGL Automation @ 17:03:47 UTC: 🔄 @ 17:25:55 UTC: ❌ Workflow failed.

/unit_test
GiGL Automation @ 20:41:18 UTC: 🔄 @ 21:06:43 UTC: ❌ Workflow failed.

svij-sc left a comment:
Thanks for the iterations

/unit_test
GiGL Automation @ 22:10:49 UTC: 🔄

Changes:
- `val_num` and `test_num` are now floats, since we can't easily guarantee counts anymore.

The issue with our previous splitting logic is that, even though a given node id always has the same hash on different machines, we sorted the hashes, so the node's "position" in the sorted order can differ between machines, and it may be selected into different splits.
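To make the failure mode concrete, here is a minimal sketch of position-based splitting on a single rank. It is not the actual GiGL code: `sort_based_split` is a hypothetical name, and it assumes the old logic took integer counts (`val_num`, `test_num`), sent the largest hashes to test, the next ones to val, and the rest to train. The identity hash mirrors the example in this description.

```python
from typing import Callable, Dict, List


def sort_based_split(
    node_ids: List[int],
    val_num: int,
    test_num: int,
    hash_fn: Callable[[int], int] = lambda node_id: node_id,  # identity hash, for illustration
) -> Dict[int, str]:
    """Hypothetical sketch of the previous, position-based split on one rank.

    Nodes are sorted by hash; the `test_num` largest go to "test", the next
    `val_num` go to "val", and everything else goes to "train".
    """
    ordered = sorted(node_ids, key=hash_fn)
    n = len(ordered)
    splits: Dict[int, str] = {}
    for position, node in enumerate(ordered):
        # The cut-off depends on `n` and on the *other* nodes' hashes on this
        # rank, so the same node can land in different splits on different ranks.
        if position >= n - test_num:
            splits[node] = "test"
        elif position >= n - test_num - val_num:
            splits[node] = "val"
        else:
            splits[node] = "train"
    return splits
```

The key point is that a node's split is a function of its rank in the local sorted order, not of its own hash alone.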
For instance, let's assume the hash function is the identity, and:
rank_0_nodes: [0, 1, 2, 3]
rank_1_nodes: [3, 4, 5, 6]

On rank 0, `3` would be selected into test, since its hash value is the largest after sorting, while on rank 1 it would be in train, since its hash value is the smallest there.

Now what we do is find the globally largest and smallest hash values, and then normalize the hash values on each machine using those global bounds.
We then select nodes into splits based on the normalized values. Since the hashes are consistent across machines and every machine normalizes them with the same global bounds, the same node id will now always be selected into the same split.
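A minimal sketch of the normalized approach, under stated assumptions: `global_hash_bounds` and `normalized_hash_split` are hypothetical names, all ranks are simulated in one process (in a real job the global min/max would come from a cross-machine reduction, e.g. `torch.distributed.all_reduce` with `ReduceOp.MIN` / `ReduceOp.MAX`, though we don't know how GiGL actually gathers them), and `val_num` / `test_num` are treated as fractions of the normalized hash range.

```python
from typing import Callable, Dict, List, Tuple


def global_hash_bounds(per_rank_hashes: List[List[float]]) -> Tuple[float, float]:
    """Global min/max over every rank's hashes (hypothetical helper).

    Stand-in for a cross-machine MIN/MAX reduction; empty ranks are skipped.
    """
    global_min = min(min(hashes) for hashes in per_rank_hashes if hashes)
    global_max = max(max(hashes) for hashes in per_rank_hashes if hashes)
    return global_min, global_max


def normalized_hash_split(
    node_ids: List[int],
    val_num: float,
    test_num: float,
    global_min: float,
    global_max: float,
    hash_fn: Callable[[int], float] = lambda node_id: node_id,  # identity hash, for illustration
) -> Dict[int, str]:
    """Hypothetical sketch: assign splits from globally normalized hashes.

    The score depends only on the node id and the global bounds, so every
    rank holding a node computes the same score, and hence the same split.
    """
    span = global_max - global_min
    splits: Dict[int, str] = {}
    for node in node_ids:
        # Scale the hash into [0, 1] with the *global* bounds, not this rank's.
        score = (hash_fn(node) - global_min) / span if span > 0 else 0.0
        if score >= 1.0 - test_num:
            splits[node] = "test"
        elif score >= 1.0 - test_num - val_num:
            splits[node] = "val"
        else:
            splits[node] = "train"
    return splits
```

Because the fractions cut the hash range rather than the node list, the number of nodes per split only approximates `val_num` and `test_num`, which matches why they became floats. The hash must also be deterministic across processes: Python's built-in `hash()` on strings is salted per process unless `PYTHONHASHSEED` is fixed, so string ids would need something like `hashlib` instead.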