Slurm refactor and other fixes #25
Conversation
Ping @jhamman for review too. |
This all looks fine to me. It would also be good if 1 or 2 folks could give it a spin on a slurm cluster. |
I'll have a look tomorrow.
|
Me too. Thanks.
John
|
I had some problems getting this working, though I think they were independent of these changes. Using dask-scheduler and dask-worker on the command line I saw intermittent failures. Once I installed distributed from the master branch on GitHub things became a little saner. Then I installed dask_jobqueue and ran a simple series of commands:
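(The command listing from the original comment did not survive here; the snippet below is only a guess at the kind of minimal smoke test being described, with the worker count and toy computation purely illustrative.)
```python
# Illustrative reconstruction, not the original commands.
from dask_jobqueue import SLURMCluster
from distributed import Client

cluster = SLURMCluster()        # submits dask workers to Slurm via sbatch
cluster.start_workers(2)        # start_workers(n) per the API of that era; 2 is arbitrary

client = Client(cluster)
futures = client.map(lambda x: x ** 2, range(10))
print(client.gather(futures))   # [0, 1, 4, 9, ...]
```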
Works fine. I'll report back if I have any problems with more realistic pipelines. By the way, there are a few places where pbs needs to be replaced with slurm in slurm.py |
Any final comments here? This looks good to me. |
Perhaps change the default resources? As far as I could tell it inherits the defaults from a dask class, and quite a lot of resources are requested. It might be kinder to people who are learning the interface to set more modest ones.
|
I guess changing the README to include memory and thread specification would solve this best, though.
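Something along these lines in the README would make the request explicit (keyword names follow the dask-jobqueue interface of the time, and the values are only suggestions):
```python
from dask_jobqueue import SLURMCluster

# Explicit, modest worker settings so a first-time user does not
# unknowingly request a large allocation.
cluster = SLURMCluster(processes=2, threads=2, memory='4GB')
cluster.start_workers(1)
```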
John
|
So I've followed @leej3's suggestion to lower the default threads and processes. That made me realize there was currently no way to specify your own CPU or memory limit to give to the SBATCH Slurm command. So I've added some parameters to match the PBS "resource_spec" keyword. Now we are able to independently specify the CPUs/memory requested from Slurm, and the processes/threads/memory options given to the workers. |
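Roughly, the split looks like this (the Slurm-side keyword names, e.g. job_cpu and job_mem, are my assumption about the parameters added here rather than confirmed signatures):
```python
from dask_jobqueue import SLURMCluster

cluster = SLURMCluster(
    # worker-side options consumed by dask itself
    processes=4,       # dask worker processes per job
    threads=2,         # threads per worker process
    memory='8GB',      # memory the dask workers believe they have
    # job-side options passed through to Slurm (assumed keyword names)
    job_cpu=8,         # e.g. "#SBATCH --cpus-per-task=8"
    job_mem='16GB',    # e.g. "#SBATCH --mem=16GB"
)
```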
It looks like a couple people are waiting for this to go in. Is there anything stopping progress here or is it just that no one feels comfortable merging? |
@mrocklin - I had missed the notification that this was all done. At this point I'd prefer to encourage fairly rapid iterations as we work out the API so I'll try not to let things sit idle like this again. Thanks @guillaumeeb. |
Closes #24, closes #22, closes #15, closes #21, closes #19.
There are a lot of modifications here, sorry... I was working on #22 when I found the #24 problem. I've also corrected some docstring errors.
Moreover, I've extracted the environment variable handling from SLURMCluster and put it directly in JobQueueCluster. I don't know whether we should keep it at all, but it may be useful.
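Very roughly, that move looks like the sketch below; the class and attribute names are illustrative, not the actual diff.
```python
# Illustrative sketch only: names are made up to show where the
# environment-variable handling now lives.
class JobQueueCluster(object):
    def __init__(self, env_extra=None):
        # extra "export VAR=value" lines, now shared by all schedulers
        self.env_extra = env_extra or []

    def _env_lines(self):
        return '\n'.join(self.env_extra)


class SLURMCluster(JobQueueCluster):
    # only the #SBATCH header generation stays Slurm-specific;
    # the environment handling is inherited from JobQueueCluster
    pass
```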
I've no way to test the SLURM part, so if somebody is willing to do it: @leej3 or @bw4sz?