
ClusterManager should use dispatch instead of function pointers #8168

Closed

eschnett opened this issue Aug 28, 2014 · 11 comments

@eschnett
Contributor

The LocalManager cluster manager is currently defined like this:

immutable LocalManager <: ClusterManager
    launch::Function
    manage::Function
    LocalManager() = new(launch_local_workers, manage_local_worker)
end
launch_local_workers = ...
manage_local_worker = ...

This uses function pointers instead of multiple dispatch. There does not seem to be a need for this flexibility -- cluster manager objects do not modify these pointers once they are created.

This should probably read like this instead, with the launch and manage functions dispatching on the manager type:

immutable LocalManager <: ClusterManager
end
cm_launch_workers(manager::LocalManager, ...) = ...
cm_manage_worker(manager::LocalManager, ...) = ...

Similarly for SSHManager.
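For illustration, a minimal standalone sketch of the dispatch-based design (the signatures are invented here, not the actual Base API, and println stands in for the real launch/manage logic):

abstract ClusterManager        # already in Base; repeated here so the sketch runs on its own

immutable LocalManager <: ClusterManager end

immutable SSHManager <: ClusterManager
    machines::Vector{ASCIIString}
end

# One generic function per operation; dispatch on the manager type picks the behavior.
function launch(manager::LocalManager, np::Integer)
    println("start $np worker processes on the local machine")
end

function launch(manager::SSHManager, np::Integer)
    for m in manager.machines
        println("ssh to $m and start a worker")
    end
end

manage(manager::LocalManager, id::Integer, op::Symbol) = println("local worker $id: $op")
manage(manager::SSHManager, id::Integer, op::Symbol) = println("remote worker $id: $op")

launch(SSHManager(["node1", "node2"]), 2)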

@JeffBezanson
Member

I don't see a significant need to change this.

@simonster
Member

I think the interface here should be revisited in any case. At the moment, the cluster manager's launch function launches the workers and then Julia connects to them after all of the workers are launched. While simple in principle, this is problematic if one calls addprocs with a large n and some workers get started faster than others. The launch function needs to wait for all workers to be launched before it returns, and by then it may be more than 60 seconds since the first workers launched, so they may be dead. It might be better if the cluster manager were responsible for calling create_worker as soon as each worker is launched.
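Roughly something like this, perhaps (a sketch only; launch_one and create_worker here are stand-ins for whatever the per-worker launch and connection setup end up being):

launch_one(manager) = (sleep(rand()); "host:port")               # stub: pretend to launch one worker
create_worker(manager, config) = println("connected to $config") # stub: pretend to connect

# Launch each worker in its own task and connect as soon as it is up, instead of
# waiting for the whole batch, so no worker sits idle past the 60 second window.
function launch_and_connect(manager, np::Integer)
    @sync for i in 1:np
        @async create_worker(manager, launch_one(manager))
    end
end

launch_and_connect(nothing, 4)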

@floswald

+1 for revisiting this. Maybe it's feasible to integrate something I've been asking for on the ClusterManagers.jl repo, i.e. that there should be room for a batch version of the cluster manager (or at least some guidance). On most large systems it's infeasible to run a job interactively, so I think it would be desirable to have some kind of batch manager. That means you first get a list of compute node names from a scheduler, then launch a master julia which takes that list and launches workers on all nodes with addprocs (example linked in the issue above).
Related: Is it necessary to have the ClusterManagers.jl package at all? Most of what I use is in multi.jl anyway. (ping @ViralBShah, JuliaParallel/ClusterManagers.jl#14)

@ViralBShah
Member

That is essentially what ClusterManagers does. If you allocate the nodes on your own outside of julia and then pass those to julia at startup, ClusterManagers shouldn't be required. This is fine for batch usage, but for interactive usage, connecting the head node to the allocated cluster perhaps requires some more setup.

Cc: @amitmurthy

@floswald

Yes. I guess I'm talking about something that handles this part:

if you allocate the nodes on your own outside of julia and then pass those to julia at startup

I guess you are referring to julia --machinefile? Here is my experience with that on my cluster:

[eisuc151@blue34 cluster]$ 2node
qsub: waiting for job 3975313.blue30 to start
qsub: job 3975313.blue30 ready

Running PBS prologue script on red0183

End of prologue. Now executing user's job
------------------------------------------------------------------------------

[eisuc151@red0183 ~]$ cat $PBS_NODEFILE
red0183
red0183
red0183
red0183
red0183
red0183
red0183
red0183
red0183
red0183
red0183
red0183
red0360
red0360
red0360
red0360
red0360
red0360
red0360
red0360
red0360
red0360
red0360
red0360
[eisuc151@red0183 ~]$ julia --machinefile $PBS_NODEFILE
Warning: Permanently added 'red0183,10.11.8.183' (RSA) to the list of known hosts.
ssh_exchange_identification: Connection closed by remote host
ssh_exchange_identification: Connection closed by remote host
ssh_exchange_identification: Connection closed by remote host
Warning: Permanently added 'red0360,10.11.9.106' (RSA) to the list of known hosts.
ssh_exchange_identification: Connection closed by remote host
Master process (id 1) could not connect within 60.0 seconds.
exiting.
Master process (id 1) could not connect within 60.0 seconds.
exiting.
Master process (id 1) could not connect within 60.0 seconds.
exiting.
...
ERROR: interrupt
 in exec at ./pcre.jl:128
 in match at ./regex.jl:119
 in parse_connection_info at multi.jl:1090
 in read_worker_host_port at multi.jl:1037
 in read_cb_response at multi.jl:1015
 in start_cluster_workers at multi.jl:1027
 in addprocs_internal at multi.jl:1234
 in addprocs at multi.jl:1244
 in process_options at ./client.jl:240
 in _start at ./client.jl:354
 in _start_3B_1720 at /home/eisuc151/git/julia0.3/usr/bin/../lib/julia/sys.so

If instead I launch one julia master, read and parse ENV["PBS_NODEFILE"], and then add all the workers one by one, things work just fine. Not sure how, but it seems related to what @simonster said above.
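For reference, the workaround looks roughly like this (assuming passwordless ssh to the compute nodes):

# Read the scheduler's node file and add the workers one entry at a time, so only
# one ssh connection is being set up at any given moment.
nodes = map(chomp, readlines(ENV["PBS_NODEFILE"]))
for host in nodes
    addprocs([host])
end
println("workers: ", nworkers())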

@amitmurthy
Contributor

The above problem could probably be solved by tweaking MaxStartups in sshd_config.

From the man page:

MaxStartups

Specifies the maximum number of concurrent unauthenticated connections 
to the SSH daemon.  Additional connections will be dropped until authentication
succeeds or the LoginGraceTime expires for a connection. 
The default is 10:30:100.

Alternatively, random early drop can be enabled by specifying the three colon 
separated values “start:rate:full” (e.g. "10:30:60").  sshd(8) will refuse connection
attempts with a probability of “rate/100” (30%) if there are currently “start” (10) 
unauthenticated connections.  The probability increases linearly and all connection 
attempts are refused if the number of unauthenticated connections reaches “full” (60).

So, on my machine, once there are 10 concurrent unauthenticated connections (i.e., ssh setup is in progress), further connections will be refused with a probability of 30%.

You could try tweaking the above parameter, or alternatively, limiting each unique hostname in $PBS_NODEFILE to 10.
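For example, something like this in /etc/ssh/sshd_config on the worker nodes (the values are only illustrative, and changing them needs admin access):

# Start dropping new unauthenticated connections at 30 with 30% probability,
# refuse all of them at 100.
MaxStartups 30:30:100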

As an example, you could look at ec2_addprocs in https://github.com/amitmurthy/AWS.jl/blob/master/src/ec2_simple.jl which basically limits concurrent unauthenticated connections to 10.

@amitmurthy
Contributor

I am wondering if this should be the default behavior of addprocs itself, i.e., it internally loops and sets up workers on each unique host 10 at a time. A keyword arg could provide a means to tweak this number.

#7616 should be enhanced to support this arg too.
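Roughly along these lines (the function name, the keyword name, and the default of 10 are just placeholders):

# Add workers in batches so that no more than max_parallel ssh connections are in the
# unauthenticated state at the same time (mirrors sshd's default MaxStartups of 10).
function addprocs_throttled(machines::Vector; max_parallel::Int=10)
    pids = Int[]
    for i in 1:max_parallel:length(machines)
        append!(pids, addprocs(machines[i:min(i + max_parallel - 1, end)]))
    end
    return pids
end

# e.g. addprocs_throttled(map(chomp, readlines(ENV["PBS_NODEFILE"])), max_parallel=10)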

@floswald

floswald commented Sep 1, 2014

Gotcha. That must be the reason why my manual approach of looping over the machine file and calling addprocs one by one works, while this here doesn't. I'm pretty sure I can't change sshd_config as a user on this machine, so changing the addprocs default would probably be helpful.


@JeffBezanson
Member

I'm glad this has spurred a useful discussion. Would somebody like to open a new issue for MaxStartups?

However, this is pretty far from the original topic of this issue. @amitmurthy, how do you feel about the proposed change?

@amitmurthy
Contributor

Will submit a PR addressing @floswald's issue in the next couple of days.

In principle, I agree with revisiting the ClusterManager interface. Besides the multiple-dispatch change, separating "launch" and "create_worker" into two concurrent tasks would also address @simonster's issue of a delayed "create_worker" causing some workers to exit prematurely.

@JeffBezanson
Member

Fixed by #8306.
