addprocs_slurm() fails since Julia 0.5 #48
Comments
Do you have more than one version of Julia installed? |
This is most probably due to the workers running 0.4 and the master running 0.5. |
I have just the official 0.5 precompiled binary of Julia on all systems.
Edit: I had 0.4.6 and 0.5.0 installed previously, then removed 0.4.6 and tried again with the same result. |
I just tested |
Have you done a |
I did |
Unfortunately I don't have access to a SLURM setup to try this out. Can you post any output from the dead worker? I think it is written as Ref: ClusterManagers.jl/src/slurm.jl Line 49 in 00b1139
|
I found that |
Here the
I experience no difference between |
Can you checkout branch |
Here are the results:
|
I just pushed another update to |
Seems your hunch was right, it is working now :)
So there is something wrong with the |
Maybe it has something to do with the local environment on your cluster. What is the locale on the worker nodes? A non-English language? @vtjnash: do you have any ideas as to what could be the problem here? The cookie is being passed as a required arg with The comparison which is failing on @axsk's system is here - https://github.com/JuliaLang/julia/blob/0faf8ce200103839577171d28a4d1545fa827336/base/multi.jl#L1402-L1406 - basically, the cookie read from the command line is compared to the one read from the socket. @axsk: are you open to building Julia from source for the workers? I can provide a patch with appropriate debug statements to track down this issue. |
I'm open to building Julia with the debug statements, but probably not today anymore, since I now have enough work actually running the code I needed on the cluster :) |
Here I am with another update: I reinstalled all the packages (now on Julia 0.5.2), hence the |
Too old to reproduce. Please check the new release. |
When trying to start new procs on the Slurm cluster via
addprocs(SlurmManager(n))
I get the following error message (this worked with 0.4):
Pressing Ctrl+C when nothing happens after the error message crashes Julia :/