Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add shebang option to support more shells #189

Merged
merged 19 commits into from Nov 9, 2018
Merged

Add shebang option to support more shells #189

merged 19 commits into from Nov 9, 2018

Conversation

@danpf
Copy link
Contributor

@danpf danpf commented Nov 5, 2018

Previously the shell(shebang) was hardcoded to be '#!/bin/bash'.
This fixes that, because some of the queueing schedulers had their own
shebangs hardcoded as well, so you actually ended up having two shebang
lines. In order to accomodate more shells, we can
add an option to specify shell via shebangs. If you think this
implementation is fine, I can/will add this to the documentation as
well.

Previously the shell(shebang) was hardcoded to be '#!/bin/bash'.
This fixes that, because some of the queueing schedulers had their own
shebangs hardcoded as well, so you actually ended up having two shebang
lines.  In order to accomodate more shells, we can
add an option to specify shell via shebangs.  If you think this
implementation is fine, I can/will add this to the documentation as
well.
@@ -206,6 +207,8 @@ def __init__(self,
env_extra = dask.config.get('jobqueue.%s.env-extra' % self.scheduler_name)
if log_directory is None:
log_directory = dask.config.get('jobqueue.%s.log-directory' % self.scheduler_name)
if shebang is None:
Copy link
Collaborator

@willirath willirath Nov 5, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd reconsider adding a kwarg for this. In my understanding, kwargs are for things you change frequently. This looks like an issue that you have to sort out once for your deployment and then never touch it again. Just adding it to the config should cover all cases.

Copy link
Member

@guillaumeeb guillaumeeb Nov 5, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure, and I have no strong feeling on this. Currently it looks like we have kind of a match between kwargs and config file. I'm not sure which way we should go.

Copy link
Contributor Author

@danpf danpf Nov 5, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did it this way only because that's how everything else in the yaml's are set. I don't really mind either way. maybe let me know @guillaumeeb ?

Copy link
Member

@guillaumeeb guillaumeeb Nov 5, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that is what I thought. If so, leave it as is, as the everything else is done. Maybe we'll need to check whether all the kwargs are justified or not in the future.

@@ -34,6 +35,7 @@ jobqueue:
local-directory: null # Location of fast local storage like /scratch or $TMPDIR

# PBS resource manager options
shebang: "#!/bin/bash"
Copy link
Collaborator

@willirath willirath Nov 5, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't we go for #!/usr/bin/env bash ?

Copy link
Member

@guillaumeeb guillaumeeb Nov 5, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, looks like it, good catch!

@@ -48,6 +48,7 @@ def test_job_script():
with SLURMCluster(walltime='00:02:00', processes=4, cores=8, memory='28GB') as cluster:

job_script = cluster.job_script()
assert job_script.startswith("#!/bin/bash")
Copy link
Collaborator

@willirath willirath Nov 5, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should also go into the tests for the other schedulers.

Copy link
Member

@guillaumeeb guillaumeeb Nov 5, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this should only go in https://github.com/dask/dask-jobqueue/blob/master/dask_jobqueue/tests/test_jobqueue_core.py as this is not specific to any scheduler.

@willirath
Copy link
Collaborator

@willirath willirath commented Nov 5, 2018

Looks good!

Off Topic:

Looking at this PR, adding all these identical lines to the configs (e.g., https://github.com/dask/dask-jobqueue/pull/189/files#diff-06f4938c65f61f791782337bf4187054R15) feels wrong.

Copy link
Member

@guillaumeeb guillaumeeb left a comment

Nice catch @danpf, and many thanks for the PR!

It looks good, just address the points highlighted by @willirath. Just not sure about whether it should go into kwargs or not...

@@ -48,6 +48,7 @@ def test_job_script():
with SLURMCluster(walltime='00:02:00', processes=4, cores=8, memory='28GB') as cluster:

job_script = cluster.job_script()
assert job_script.startswith("#!/bin/bash")
Copy link
Member

@guillaumeeb guillaumeeb Nov 5, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this should only go in https://github.com/dask/dask-jobqueue/blob/master/dask_jobqueue/tests/test_jobqueue_core.py as this is not specific to any scheduler.

@@ -34,6 +35,7 @@ jobqueue:
local-directory: null # Location of fast local storage like /scratch or $TMPDIR

# PBS resource manager options
shebang: "#!/bin/bash"
Copy link
Member

@guillaumeeb guillaumeeb Nov 5, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, looks like it, good catch!

@@ -206,6 +207,8 @@ def __init__(self,
env_extra = dask.config.get('jobqueue.%s.env-extra' % self.scheduler_name)
if log_directory is None:
log_directory = dask.config.get('jobqueue.%s.log-directory' % self.scheduler_name)
if shebang is None:
Copy link
Member

@guillaumeeb guillaumeeb Nov 5, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure, and I have no strong feeling on this. Currently it looks like we have kind of a match between kwargs and config file. I'm not sure which way we should go.

@guillaumeeb
Copy link
Member

@guillaumeeb guillaumeeb commented Nov 5, 2018

Looking at this PR, adding all these identical lines to the configs (e.g., https://github.com/dask/dask-jobqueue/pull/189/files#diff-06f4938c65f61f791782337bf4187054R15) feels wrong.

@willirath you're right, this is not ideal. We did that when implementing multiple config in #68. Feel free to open an issue to discuss other possibilities for that point.

@guillaumeeb
Copy link
Member

@guillaumeeb guillaumeeb commented Nov 5, 2018

And thanks @willirath for the review here, this is really appreciated.

danpf added 3 commits Nov 5, 2018
Previously shell was default "/bin/bash" now it respects the environment.

Also shebang tests were only in slurm, now there is one for each
scheduler that we support.
Copy link
Member

@guillaumeeb guillaumeeb left a comment

I think you can simplify tests a bit with pytest parametrize feature.

@@ -42,6 +42,58 @@ def test_command_template():
assert ' --preload mymodule' in cluster._command_template


def test_shebang_settings():
Copy link
Member

@guillaumeeb guillaumeeb Nov 5, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should use parametrized test as the one below for all this!

Copy link
Contributor Author

@danpf danpf Nov 5, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

really cool!!!!

@@ -66,7 +66,7 @@ def test_job_script():
with SLURMCluster(walltime='00:02:00', processes=4, cores=8, memory='28GB',
env_extra=['export LANG="en_US.utf8"',
'export LANGUAGE="en_US.utf8"',
'export LC_ALL="en_US.utf8"']
'export LC_ALL="en_US.utf8"'],
Copy link
Member

@guillaumeeb guillaumeeb Nov 5, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A comma is remaining here.

@guillaumeeb
Copy link
Member

@guillaumeeb guillaumeeb commented Nov 5, 2018

LGTM, @willirath do you see something else?

I updated the documentation to reflect shebangs being required.  because
otherwise a batch script could be submitted without an interpreter.  I
also added a test to cover this.
@danpf
Copy link
Contributor Author

@danpf danpf commented Nov 5, 2018

sorry to bother you more, please review the updates to the documentation I added as well.
Also added some functionality to make sure there was always a shebang + test for that.

Copy link
Collaborator

@willirath willirath left a comment

I'm not sure the additional requirement mock is needed to solve the problem at hand. In Xarray's tests, they use a context manager to directly change the dask config:
https://github.com/pydata/xarray/blob/4a5d88dba5fd50d48ab00ed2ebaee058287ab0bf/xarray/tests/test_dask.py#L838

So you could just use

    with pytest.raises(ValueError, match="batch script interpreter"):
        with (dask.config.set(shebang=None)):
            Cluster(cores=2, memory='4GB')

@willirath
Copy link
Collaborator

@willirath willirath commented Nov 6, 2018

Apart from the question about the added dependency on mock, this looks very good!

assert job_script.startswith(default_shebang)

with pytest.raises(ValueError, match="batch script interpreter"):
with (dask.config.set(shebang=None)):
Copy link
Collaborator

@willirath willirath Nov 6, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry. My example code was wrong. dask.config.set('jobqueue.%s.shebang' % Cluster.scheduler_name, None) should work.

Copy link
Contributor Author

@danpf danpf Nov 6, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks, my initial idea was way worse.

Copy link
Collaborator

@willirath willirath left a comment

Looks good!

@willirath
Copy link
Collaborator

@willirath willirath commented Nov 6, 2018

LGTM, @willirath do you see something else?

@guillaumeeb Good to go from py point of view!

Copy link
Member

@guillaumeeb guillaumeeb left a comment

We're almost there, but I feel there are two things to address:

  • I don't think we want to raise an exception if no shebang is define (but I may be wrong)
  • We shouldn't put the shebang kwarg or config value in all the docs, as the default is perfectly fine for most cases.

Sorry to be a bit pernickety...

docs/source/configuration-setup.rst Outdated Show resolved Hide resolved
@@ -216,6 +221,10 @@ def __init__(self,
if memory is None:
raise ValueError("You must specify how much memory to use per job like ``memory='24 GB'``")

if shebang is None:
raise ValueError("You must specify a batch script interpreter to use"
" like ``#!/usr/bin/env bash``")
Copy link
Member

@guillaumeeb guillaumeeb Nov 6, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this really a problem if there is no shebang?
What concerns me here is that this exception might break existing jobqueue configurations, where no shebang is declared. From my point of view it shouldn't.

Copy link
Contributor Author

@danpf danpf Nov 6, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could maybe add a deprecation warning? and then a warning to say that if shebang is None everywhere, automatically set it to #!/usr/bin/env bash to preserve the old configuration

You are right that this breaks existing configs in its current state. But you cannot submit a batch script to most schedulers without a shebang:

sbatch: error: This does not look like a batch script.  The first
sbatch: error: line must start with #! followed by the path to an interpreter.
sbatch: error: For instance: #!/bin/sh

Copy link
Member

@guillaumeeb guillaumeeb Nov 6, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wait, I may be wrong about this. I need to try this PR to see if it breaks my config. Maybe it doesn't, default value should be taken from the module default jobqueue.yaml file.

Copy link
Contributor Author

@danpf danpf Nov 7, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looking at the code I think you're right. It should default to bash. I'll wait for you to try it though...

Copy link
Member

@guillaumeeb guillaumeeb Nov 7, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry did not find time today, and this will not be possible until friday. You could verify it though, just create a jobqueue.yaml config file, and do not add a shebang line.
I believe the only way to reach that code is to create a local config file, and set shebang to null in it. I'm thus in favor of removing this code, as it seems very unlikely a user run in this case without knowing it.

@@ -103,6 +105,7 @@ ARM Stratus
name: dask-worker
cores: 36
memory: 270GB
shebang: '#!/usr/bin/env bash'
Copy link
Member

@guillaumeeb guillaumeeb Nov 6, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should remove the shebang from "real life" configurations, as keeping the default shebang is perfectly fine apparently for those. This include ARM, but also above Cheyenne and Nersc.

Copy link
Contributor Author

@danpf danpf Nov 6, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are these what the DOE are using as their jobqueue.yaml ? in that case they 'should' set the shebang k-v in this updated version. Currently the config is set per-scheduler and the shebang is pulled from the scheduler's k-v config. This is solved if we use the deprecation method I suggested. Although I think it's best practice for the documentation to show these as set so people understand.

Copy link
Member

@guillaumeeb guillaumeeb Nov 6, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think they shouldn't if they keep the default value, see my response to the other comment. I will try this tomorrow. Default shebang value should be inherited from default config if not specified.

Copy link
Member

@guillaumeeb guillaumeeb Nov 7, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And so based on the above discussion, I think we really should remove the majority of examples containing shebang in the doc, only keeping a few specific. Maybe add the config you are using in your cluster in the example section?

Copy link
Contributor Author

@danpf danpf Nov 8, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I deleted a bunch, I only left it in places where the docs describe the extra args as well!!

@@ -23,6 +24,7 @@ PBS Deployments
cluster = PBSCluster(processes=18,
threads=4,
shebang='#!/usr/bin/env bash',
Copy link
Member

@guillaumeeb guillaumeeb Nov 6, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still the same, shebang is optional, maybe put it once, but not everywhere.

@guillaumeeb
Copy link
Member

@guillaumeeb guillaumeeb commented Nov 8, 2018

You've got flake8 checks failing here.

@willirath
Copy link
Collaborator

@willirath willirath commented Nov 9, 2018

Looks like this is all fine now?

@guillaumeeb
Copy link
Member

@guillaumeeb guillaumeeb commented Nov 9, 2018

Yes ! Thanks to both of you here, merging.

@guillaumeeb guillaumeeb merged commit ea1b4ee into dask:master Nov 9, 2018
1 check passed
@danpf danpf deleted the danpf/fix_shebang branch Nov 9, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

None yet

3 participants