Sort the server list before iterate to keep consistent between workers #2956

Merged: 1 commit merged into alibaba:main from ht/fix-servers-order on Jun 30, 2023

sighingnow
Collaborator

What do these changes do?

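In some cases the server list is ordered differently in different processes, so each worker computes a different partition dispatch plan. This change sorts the server list before iterating over it, so that every worker derives the same plan. In the logs below, worker 1 assigns partitions 0 and 1 to server 0, while worker 2 assigns them to server 1: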

worker 1:

2023-06-30 04:55:27.508318003 INFO  (src/bin/gaia_executor.rs:167) [main] partition_lists before dedup = {1: [0, 1, 2, 3], 0: [0, 1, 2, 3]}
2023-06-30 04:55:27.508339604 INFO  (src/bin/gaia_executor.rs:177) [main] partition_lists = {1: [2, 3], 0: [0, 1]}
2023-06-30 04:55:27.508355604 INFO  (src/bin/gaia_executor.rs:190) [main] partition_server_index_map = {0: 0, 2: 1, 3: 1, 1: 0}
2023-06-30 04:55:27.508361404 INFO  (src/bin/gaia_executor.rs:105) [main] server_index: 0, partition_server_index_map: {0: 0, 2: 1, 3: 1, 1: 0}
2023-06-30 04:55:27.508449905 INFO  (/work/interactive_engine/executor/engine/pegasus/server/src/rpc.rs:314) [main] starting RPC job server on 0.0.0.0:8257 ...
2023-06-30 04:55:27.508457305 INFO  (src/bin/gaia_executor.rs:198) [main] RPC server of server[0] start on 0.0.0.0:8257

worker 2:

2023-06-30 04:55:27.508489205 INFO  (src/bin/gaia_executor.rs:167) [main] partition_lists before dedup = {0: [0, 1, 2, 3], 1: [0, 1, 2, 3]}
2023-06-30 04:55:27.508507605 INFO  (src/bin/gaia_executor.rs:177) [main] partition_lists = {0: [2, 3], 1: [0, 1]}
2023-06-30 04:55:27.508523506 INFO  (src/bin/gaia_executor.rs:190) [main] partition_server_index_map = {2: 0, 0: 1, 3: 0, 1: 1}
2023-06-30 04:55:27.508529206 INFO  (src/bin/gaia_executor.rs:105) [main] server_index: 1, partition_server_index_map: {2: 0, 0: 1, 3: 0, 1: 1}
2023-06-30 04:55:27.508591606 INFO  (/work/interactive_engine/executor/engine/pegasus/server/src/rpc.rs:314) [main] starting RPC job server on 0.0.0.0:8259 ...
2023-06-30 04:55:27.508598706 INFO  (src/bin/gaia_executor.rs:198) [main] RPC server of server[1] start on 0.0.0.0:8259
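
The effect of sorting can be illustrated with a minimal sketch. This is not the actual code in `gaia_executor.rs`: the function names `dedup_partitions` and `build_partition_server_index_map` are hypothetical, and the "each server keeps an equal share of the unclaimed partitions" rule is an assumption chosen only to reproduce the shape of the logs above. The point it shows is that a `HashMap` has no guaranteed iteration order, so the server indices are collected and sorted before the loop that claims partitions.

```rust
use std::collections::{BTreeSet, HashMap};

/// Hypothetical dedup step: visit servers in ascending index order and let
/// each one keep an equal share of the partitions nobody has claimed yet.
fn dedup_partitions(candidates: &HashMap<u32, Vec<u32>>) -> HashMap<u32, Vec<u32>> {
    let mut result = HashMap::new();
    if candidates.is_empty() {
        return result;
    }

    // HashMap iteration order is unspecified and can differ between
    // processes, so sort the server indices before iterating.
    let mut servers: Vec<u32> = candidates.keys().copied().collect();
    servers.sort_unstable();

    // Every distinct partition id that appears in any candidate list.
    let all: BTreeSet<u32> = candidates.values().flatten().copied().collect();
    // Ceiling division: the maximum number of partitions one server keeps.
    let share = (all.len() + servers.len() - 1) / servers.len();

    let mut claimed: BTreeSet<u32> = BTreeSet::new();
    for server in servers {
        let mut mine: Vec<u32> = candidates[&server]
            .iter()
            .copied()
            .filter(|p| !claimed.contains(p))
            .collect();
        mine.sort_unstable();
        mine.dedup();
        mine.truncate(share);
        claimed.extend(mine.iter().copied());
        result.insert(server, mine);
    }
    result
}

/// Invert the per-server lists into a partition -> server index lookup.
fn build_partition_server_index_map(lists: &HashMap<u32, Vec<u32>>) -> HashMap<u32, u32> {
    let mut map = HashMap::new();
    for (server, partitions) in lists {
        for partition in partitions {
            map.insert(*partition, *server);
        }
    }
    map
}
```

With the candidate lists from the logs (`{0: [0, 1, 2, 3], 1: [0, 1, 2, 3]}`), this sketch yields `{0: [0, 1], 1: [2, 3]}` no matter in which order the entries were inserted, so both workers would agree on the resulting `partition_server_index_map`.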

Related issue number

Fixes #2675

Signed-off-by: Tao He <linzhu.ht@alibaba-inc.com>
@sighingnow sighingnow marked this pull request as ready for review June 30, 2023 13:22
@sighingnow sighingnow merged commit 21fd644 into alibaba:main Jun 30, 2023
18 checks passed
@sighingnow sighingnow deleted the ht/fix-servers-order branch June 30, 2023 13:50
Successfully merging this pull request may close these issues.

[BUG] (Nondeterministic) incorrect query result on subgraphs