Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add shebang option to support more shells #189

Merged
merged 19 commits into from Nov 9, 2018

Conversation

@danpf
Copy link
Contributor

danpf commented Nov 5, 2018

Previously the shell(shebang) was hardcoded to be '#!/bin/bash'.
This fixes that, because some of the queueing schedulers had their own
shebangs hardcoded as well, so you actually ended up having two shebang
lines. In order to accomodate more shells, we can
add an option to specify shell via shebangs. If you think this
implementation is fine, I can/will add this to the documentation as
well.

Previously the shell(shebang) was hardcoded to be '#!/bin/bash'.
This fixes that, because some of the queueing schedulers had their own
shebangs hardcoded as well, so you actually ended up having two shebang
lines.  In order to accomodate more shells, we can
add an option to specify shell via shebangs.  If you think this
implementation is fine, I can/will add this to the documentation as
well.
@@ -206,6 +207,8 @@ def __init__(self,
env_extra = dask.config.get('jobqueue.%s.env-extra' % self.scheduler_name)
if log_directory is None:
log_directory = dask.config.get('jobqueue.%s.log-directory' % self.scheduler_name)
if shebang is None:

This comment has been minimized.

Copy link
@willirath

willirath Nov 5, 2018

Collaborator

I'd reconsider adding a kwarg for this. In my understanding, kwargs are for things you change frequently. This looks like an issue that you have to sort out once for your deployment and then never touch it again. Just adding it to the config should cover all cases.

This comment has been minimized.

Copy link
@guillaumeeb

guillaumeeb Nov 5, 2018

Member

I'm not sure, and I have no strong feeling on this. Currently it looks like we have kind of a match between kwargs and config file. I'm not sure which way we should go.

This comment has been minimized.

Copy link
@danpf

danpf Nov 5, 2018

Author Contributor

I did it this way only because that's how everything else in the yaml's are set. I don't really mind either way. maybe let me know @guillaumeeb ?

This comment has been minimized.

Copy link
@guillaumeeb

guillaumeeb Nov 5, 2018

Member

Yeah, that is what I thought. If so, leave it as is, as the everything else is done. Maybe we'll need to check whether all the kwargs are justified or not in the future.

@@ -34,6 +35,7 @@ jobqueue:
local-directory: null # Location of fast local storage like /scratch or $TMPDIR

# PBS resource manager options
shebang: "#!/bin/bash"

This comment has been minimized.

Copy link
@willirath

willirath Nov 5, 2018

Collaborator

Shouldn't we go for #!/usr/bin/env bash ?

This comment has been minimized.

Copy link
@guillaumeeb

guillaumeeb Nov 5, 2018

Member

Yep, looks like it, good catch!

@@ -48,6 +48,7 @@ def test_job_script():
with SLURMCluster(walltime='00:02:00', processes=4, cores=8, memory='28GB') as cluster:

job_script = cluster.job_script()
assert job_script.startswith("#!/bin/bash")

This comment has been minimized.

Copy link
@willirath

willirath Nov 5, 2018

Collaborator

This should also go into the tests for the other schedulers.

This comment has been minimized.

Copy link
@guillaumeeb

guillaumeeb Nov 5, 2018

Member

I believe this should only go in https://github.com/dask/dask-jobqueue/blob/master/dask_jobqueue/tests/test_jobqueue_core.py as this is not specific to any scheduler.

@willirath

This comment has been minimized.

Copy link
Collaborator

willirath commented Nov 5, 2018

Looks good!

Off Topic:

Looking at this PR, adding all these identical lines to the configs (e.g., https://github.com/dask/dask-jobqueue/pull/189/files#diff-06f4938c65f61f791782337bf4187054R15) feels wrong.

Copy link
Member

guillaumeeb left a comment

Nice catch @danpf, and many thanks for the PR!

It looks good, just address the points highlighted by @willirath. Just not sure about whether it should go into kwargs or not...

@@ -48,6 +48,7 @@ def test_job_script():
with SLURMCluster(walltime='00:02:00', processes=4, cores=8, memory='28GB') as cluster:

job_script = cluster.job_script()
assert job_script.startswith("#!/bin/bash")

This comment has been minimized.

Copy link
@guillaumeeb

guillaumeeb Nov 5, 2018

Member

I believe this should only go in https://github.com/dask/dask-jobqueue/blob/master/dask_jobqueue/tests/test_jobqueue_core.py as this is not specific to any scheduler.

@@ -34,6 +35,7 @@ jobqueue:
local-directory: null # Location of fast local storage like /scratch or $TMPDIR

# PBS resource manager options
shebang: "#!/bin/bash"

This comment has been minimized.

Copy link
@guillaumeeb

guillaumeeb Nov 5, 2018

Member

Yep, looks like it, good catch!

@@ -206,6 +207,8 @@ def __init__(self,
env_extra = dask.config.get('jobqueue.%s.env-extra' % self.scheduler_name)
if log_directory is None:
log_directory = dask.config.get('jobqueue.%s.log-directory' % self.scheduler_name)
if shebang is None:

This comment has been minimized.

Copy link
@guillaumeeb

guillaumeeb Nov 5, 2018

Member

I'm not sure, and I have no strong feeling on this. Currently it looks like we have kind of a match between kwargs and config file. I'm not sure which way we should go.

@guillaumeeb

This comment has been minimized.

Copy link
Member

guillaumeeb commented Nov 5, 2018

Looking at this PR, adding all these identical lines to the configs (e.g., https://github.com/dask/dask-jobqueue/pull/189/files#diff-06f4938c65f61f791782337bf4187054R15) feels wrong.

@willirath you're right, this is not ideal. We did that when implementing multiple config in #68. Feel free to open an issue to discuss other possibilities for that point.

@guillaumeeb

This comment has been minimized.

Copy link
Member

guillaumeeb commented Nov 5, 2018

And thanks @willirath for the review here, this is really appreciated.

danpf added 3 commits Nov 5, 2018
Previously shell was default "/bin/bash" now it respects the environment.

Also shebang tests were only in slurm, now there is one for each
scheduler that we support.
Copy link
Member

guillaumeeb left a comment

I think you can simplify tests a bit with pytest parametrize feature.

@@ -42,6 +42,58 @@ def test_command_template():
assert ' --preload mymodule' in cluster._command_template


def test_shebang_settings():

This comment has been minimized.

Copy link
@guillaumeeb

guillaumeeb Nov 5, 2018

Member

You should use parametrized test as the one below for all this!

This comment has been minimized.

Copy link
@danpf

danpf Nov 5, 2018

Author Contributor

really cool!!!!

@@ -66,7 +66,7 @@ def test_job_script():
with SLURMCluster(walltime='00:02:00', processes=4, cores=8, memory='28GB',
env_extra=['export LANG="en_US.utf8"',
'export LANGUAGE="en_US.utf8"',
'export LC_ALL="en_US.utf8"']
'export LC_ALL="en_US.utf8"'],

This comment has been minimized.

Copy link
@guillaumeeb

guillaumeeb Nov 5, 2018

Member

A comma is remaining here.

@guillaumeeb

This comment has been minimized.

Copy link
Member

guillaumeeb commented Nov 5, 2018

LGTM, @willirath do you see something else?

I updated the documentation to reflect shebangs being required.  because
otherwise a batch script could be submitted without an interpreter.  I
also added a test to cover this.
@danpf

This comment has been minimized.

Copy link
Contributor Author

danpf commented Nov 5, 2018

sorry to bother you more, please review the updates to the documentation I added as well.
Also added some functionality to make sure there was always a shebang + test for that.

danpf added 4 commits Nov 6, 2018
Copy link
Collaborator

willirath left a comment

I'm not sure the additional requirement mock is needed to solve the problem at hand. In Xarray's tests, they use a context manager to directly change the dask config:
https://github.com/pydata/xarray/blob/4a5d88dba5fd50d48ab00ed2ebaee058287ab0bf/xarray/tests/test_dask.py#L838

So you could just use

    with pytest.raises(ValueError, match="batch script interpreter"):
        with (dask.config.set(shebang=None)):
            Cluster(cores=2, memory='4GB')
@willirath

This comment has been minimized.

Copy link
Collaborator

willirath commented Nov 6, 2018

Apart from the question about the added dependency on mock, this looks very good!

danpf added 2 commits Nov 6, 2018
assert job_script.startswith(default_shebang)

with pytest.raises(ValueError, match="batch script interpreter"):
with (dask.config.set(shebang=None)):

This comment has been minimized.

Copy link
@willirath

willirath Nov 6, 2018

Collaborator

Sorry. My example code was wrong. dask.config.set('jobqueue.%s.shebang' % Cluster.scheduler_name, None) should work.

This comment has been minimized.

Copy link
@danpf

danpf Nov 6, 2018

Author Contributor

thanks, my initial idea was way worse.

danpf added 2 commits Nov 6, 2018
Copy link
Collaborator

willirath left a comment

Looks good!

@willirath

This comment has been minimized.

Copy link
Collaborator

willirath commented Nov 6, 2018

LGTM, @willirath do you see something else?

@guillaumeeb Good to go from py point of view!

Copy link
Member

guillaumeeb left a comment

We're almost there, but I feel there are two things to address:

  • I don't think we want to raise an exception if no shebang is define (but I may be wrong)
  • We shouldn't put the shebang kwarg or config value in all the docs, as the default is perfectly fine for most cases.

Sorry to be a bit pernickety...

docs/source/configuration-setup.rst Outdated Show resolved Hide resolved
@@ -216,6 +221,10 @@ def __init__(self,
if memory is None:
raise ValueError("You must specify how much memory to use per job like ``memory='24 GB'``")

if shebang is None:
raise ValueError("You must specify a batch script interpreter to use"
" like ``#!/usr/bin/env bash``")

This comment has been minimized.

Copy link
@guillaumeeb

guillaumeeb Nov 6, 2018

Member

Is this really a problem if there is no shebang?
What concerns me here is that this exception might break existing jobqueue configurations, where no shebang is declared. From my point of view it shouldn't.

This comment has been minimized.

Copy link
@danpf

danpf Nov 6, 2018

Author Contributor

We could maybe add a deprecation warning? and then a warning to say that if shebang is None everywhere, automatically set it to #!/usr/bin/env bash to preserve the old configuration

You are right that this breaks existing configs in its current state. But you cannot submit a batch script to most schedulers without a shebang:

sbatch: error: This does not look like a batch script.  The first
sbatch: error: line must start with #! followed by the path to an interpreter.
sbatch: error: For instance: #!/bin/sh

This comment has been minimized.

Copy link
@guillaumeeb

guillaumeeb Nov 6, 2018

Member

Wait, I may be wrong about this. I need to try this PR to see if it breaks my config. Maybe it doesn't, default value should be taken from the module default jobqueue.yaml file.

This comment has been minimized.

Copy link
@danpf

danpf Nov 7, 2018

Author Contributor

looking at the code I think you're right. It should default to bash. I'll wait for you to try it though...

This comment has been minimized.

Copy link
@guillaumeeb

guillaumeeb Nov 7, 2018

Member

Sorry did not find time today, and this will not be possible until friday. You could verify it though, just create a jobqueue.yaml config file, and do not add a shebang line.
I believe the only way to reach that code is to create a local config file, and set shebang to null in it. I'm thus in favor of removing this code, as it seems very unlikely a user run in this case without knowing it.

@@ -103,6 +105,7 @@ ARM Stratus
name: dask-worker
cores: 36
memory: 270GB
shebang: '#!/usr/bin/env bash'

This comment has been minimized.

Copy link
@guillaumeeb

guillaumeeb Nov 6, 2018

Member

You should remove the shebang from "real life" configurations, as keeping the default shebang is perfectly fine apparently for those. This include ARM, but also above Cheyenne and Nersc.

This comment has been minimized.

Copy link
@danpf

danpf Nov 6, 2018

Author Contributor

Are these what the DOE are using as their jobqueue.yaml ? in that case they 'should' set the shebang k-v in this updated version. Currently the config is set per-scheduler and the shebang is pulled from the scheduler's k-v config. This is solved if we use the deprecation method I suggested. Although I think it's best practice for the documentation to show these as set so people understand.

This comment has been minimized.

Copy link
@guillaumeeb

guillaumeeb Nov 6, 2018

Member

I think they shouldn't if they keep the default value, see my response to the other comment. I will try this tomorrow. Default shebang value should be inherited from default config if not specified.

This comment has been minimized.

Copy link
@guillaumeeb

guillaumeeb Nov 7, 2018

Member

And so based on the above discussion, I think we really should remove the majority of examples containing shebang in the doc, only keeping a few specific. Maybe add the config you are using in your cluster in the example section?

This comment has been minimized.

Copy link
@danpf

danpf Nov 8, 2018

Author Contributor

I deleted a bunch, I only left it in places where the docs describe the extra args as well!!

@@ -23,6 +24,7 @@ PBS Deployments
cluster = PBSCluster(processes=18,
threads=4,
shebang='#!/usr/bin/env bash',

This comment has been minimized.

Copy link
@guillaumeeb

guillaumeeb Nov 6, 2018

Member

Still the same, shebang is optional, maybe put it once, but not everywhere.

@guillaumeeb

This comment has been minimized.

Copy link
Member

guillaumeeb commented Nov 8, 2018

You've got flake8 checks failing here.

@willirath

This comment has been minimized.

Copy link
Collaborator

willirath commented Nov 9, 2018

Looks like this is all fine now?

@guillaumeeb

This comment has been minimized.

Copy link
Member

guillaumeeb commented Nov 9, 2018

Yes ! Thanks to both of you here, merging.

@guillaumeeb guillaumeeb merged commit ea1b4ee into dask:master Nov 9, 2018
1 check passed
1 check passed
continuous-integration/travis-ci/pr The Travis CI build passed
Details
@danpf danpf deleted the danpf:danpf/fix_shebang branch Nov 9, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
3 participants
You can’t perform that action at this time.