
Nondeterministic behavior of addprocs() with SSHManager #8

Open
JonnyKong opened this issue Oct 27, 2023 · 1 comment

Comments

JonnyKong commented Oct 27, 2023

Given an array of node addresses as input, `addprocs()` returns an array of the launched workers' pids. However, the returned pids do not necessarily follow the order of the input addresses.

For example, after `(p1, p2) = addprocs([machine1, machine2])`, `p1` may be running on `machine1` and `p2` on `machine2`, or the other way around, and which one you get can change from one call to the next.

The cause of this nondeterminism is that `launch(manager::SSHManager, ...)` launches the workers in parallel. As each worker starts, its entry is pushed onto `launched` in whatever order the tasks happen to finish; no synchronization or ordering is enforced:

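```julia
# Excerpt from launch(manager::SSHManager, ...): one task per machine.
@sync for (i, (machine, cnt)) in enumerate(manager.machines)
    let machine=machine, cnt=cnt
        # Each task pushes onto the shared `launched` as soon as its own
        # workers start, so entries arrive in completion order.
        @async try
            launch_on_machine(manager, $machine, $cnt, params, launched, launch_ntfy)
        catch e
            print(stderr, "exception launching on machine $(machine) : $(e)\n")
        end
    end
end
```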

While this is not strictly a bug, the nondeterministic ordering is counter-intuitive and error-prone: code that assumes the i-th returned pid runs on the i-th input machine can silently misbehave.
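The effect can be reproduced without SSH. Below is a minimal, self-contained sketch (not the actual Distributed.jl code) in which a hypothetical `fake_launch` function stands in for `launch_on_machine` and simply sleeps for a random delay. It contrasts pushing results as tasks finish, which mirrors how `launched` is filled, with writing each result into a preallocated slot indexed by the input position. The second pattern keeps the launches concurrent but makes the output order match the input.

```julia
# `fake_launch` is a hypothetical stand-in for launch_on_machine: it sleeps
# for a random delay, like SSH connections of varying latency.
function fake_launch(machine::String)
    sleep(rand() * 0.05)
    return "worker@" * machine
end

machines = ["machine1", "machine2", "machine3"]

# Pattern used by the current code: each task pushes its result when it
# finishes, so the vector ends up in completion order.
unordered = String[]
@sync for m in machines
    @async push!(unordered, fake_launch(m))
end

# Order-preserving alternative: preallocate one slot per input and let
# task i write only to slot i. The launches still run concurrently.
ordered = Vector{String}(undef, length(machines))
@sync for (i, m) in enumerate(machines)
    @async begin
        ordered[i] = fake_launch(m)
    end
end

println(unordered)  # completion order; may differ from run to run
println(ordered)    # always ["worker@machine1", "worker@machine2", "worker@machine3"]
```

Applying the same idea to `SSHManager` would presumably also require the later pid assignment to follow slot order rather than arrival order; that part is an assumption and is not shown here.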

JonnyKong reopened this Oct 27, 2023

JonnyKong (Author) commented:

Reopening this issue (it was closed by mistake).
