SGE issue: could not connect #24

dkoslicki · 2015-10-08T05:05:39Z

I tried using the following example:

using ClusterManagers
ClusterManagers.addprocs_sge(4)

And got the following output:

job id is 2559006, waiting for job to start ............................................................
    From worker 0:  Master process (id 1) could not connect within 60.0 seconds.
    From worker 0:  exiting.

I noticed (using qstat -U) that the jobs were started:

-bash-4.1$ qstat -U koslickd
job-ID  prior   name       user         state submit/start at     queue                          slots ja-task-ID 
-----------------------------------------------------------------------------------------------------------------
2559006 0.50500 julia-3457 koslickd     t     10/07/2015 21:58:44 all.q@chrom15.cgrb.oregonstate     1 4
2559006 0.50500 julia-3457 koslickd     t     10/07/2015 21:58:44 all.q@chrom16.cgrb.oregonstate     1 3
2559006 0.50500 julia-3457 koslickd     t     10/07/2015 21:58:44 all.q@chrom22.cgrb.oregonstate     1 2
2559006 0.50500 julia-3457 koslickd     r     10/07/2015 21:58:44 math@math0.cgrb.oregonstate.lo     1 1

But shortly thereafter, the jobs seemed to disappear:

-bash-4.1$ qstat -U koslickd
job-ID  prior   name       user         state submit/start at     queue                          slots ja-task-ID 
-----------------------------------------------------------------------------------------------------------------
2559006 0.50500 julia-3457 koslickd     r     10/07/2015 21:58:44 math@math0.cgrb.oregonstate.lo     1 1

Julia then seemed to freeze indefinitely and had to be killed manually.

Any ideas as to what went wrong?

Thanks,

~David

gcamilo · 2015-10-09T15:28:33Z

I think the problem is that the nodes are not being launched with the --worker flag, though it's curious that I ran into some problems that you did not. Try adding --worker after $execflags on line 29 of qsub.jl.

dkoslicki · 2015-10-09T15:43:54Z

I think I may have found the problem: I'm using Julia version 0.3.3, so Pkg.add("ClusterManagers") was giving me version 0.0.2. I guess I'll have to ask the sysadmin to update Julia to v0.4.0.

gcamilo · 2015-10-09T15:48:46Z

I installed Julia in my user directory, since going through the sysadmin could take months over here.

dkoslicki · 2015-10-09T15:51:01Z

Yeah, that will probably be the best route, thanks!

jiahao · 2016-04-12T17:26:27Z

Closing as probably fixed by upgrading to Julia 0.4

jiahao closed this as completed Apr 12, 2016

lstagner mentioned this issue Sep 27, 2016

PBS Updates #47

Merged

oschub mentioned this issue Sep 4, 2017

Sometimes addprocs_sge works, sometimes it doesn't #78

Closed
