Skip to content

Add shebang option to support more shells#189

Merged
guillaumeeb merged 19 commits into
dask:masterfrom
danpf:danpf/fix_shebang
Nov 9, 2018
Merged

Add shebang option to support more shells#189
guillaumeeb merged 19 commits into
dask:masterfrom
danpf:danpf/fix_shebang

Conversation

@danpf

@danpf danpf commented Nov 5, 2018

Copy link
Copy Markdown
Contributor

Previously the shell(shebang) was hardcoded to be '#!/bin/bash'.
This fixes that, because some of the queueing schedulers had their own
shebangs hardcoded as well, so you actually ended up having two shebang
lines. In order to accomodate more shells, we can
add an option to specify shell via shebangs. If you think this
implementation is fine, I can/will add this to the documentation as
well.

Previously the shell(shebang) was hardcoded to be '#!/bin/bash'.
This fixes that, because some of the queueing schedulers had their own
shebangs hardcoded as well, so you actually ended up having two shebang
lines.  In order to accomodate more shells, we can
add an option to specify shell via shebangs.  If you think this
implementation is fine, I can/will add this to the documentation as
well.
Comment thread dask_jobqueue/core.py
env_extra = dask.config.get('jobqueue.%s.env-extra' % self.scheduler_name)
if log_directory is None:
log_directory = dask.config.get('jobqueue.%s.log-directory' % self.scheduler_name)
if shebang is None:

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd reconsider adding a kwarg for this. In my understanding, kwargs are for things you change frequently. This looks like an issue that you have to sort out once for your deployment and then never touch it again. Just adding it to the config should cover all cases.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure, and I have no strong feeling on this. Currently it looks like we have kind of a match between kwargs and config file. I'm not sure which way we should go.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did it this way only because that's how everything else in the yaml's are set. I don't really mind either way. maybe let me know @guillaumeeb ?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that is what I thought. If so, leave it as is, as the everything else is done. Maybe we'll need to check whether all the kwargs are justified or not in the future.

Comment thread dask_jobqueue/jobqueue.yaml Outdated
local-directory: null # Location of fast local storage like /scratch or $TMPDIR

# PBS resource manager options
shebang: "#!/bin/bash"

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't we go for #!/usr/bin/env bash ?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, looks like it, good catch!

Comment thread dask_jobqueue/tests/test_slurm.py Outdated
with SLURMCluster(walltime='00:02:00', processes=4, cores=8, memory='28GB') as cluster:

job_script = cluster.job_script()
assert job_script.startswith("#!/bin/bash")

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should also go into the tests for the other schedulers.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this should only go in https://github.com/dask/dask-jobqueue/blob/master/dask_jobqueue/tests/test_jobqueue_core.py as this is not specific to any scheduler.

@willirath

Copy link
Copy Markdown
Collaborator

Looks good!

Off Topic:

Looking at this PR, adding all these identical lines to the configs (e.g., https://github.com/dask/dask-jobqueue/pull/189/files#diff-06f4938c65f61f791782337bf4187054R15) feels wrong.

@guillaumeeb guillaumeeb left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice catch @danpf, and many thanks for the PR!

It looks good, just address the points highlighted by @willirath. Just not sure about whether it should go into kwargs or not...

Comment thread dask_jobqueue/tests/test_slurm.py Outdated
with SLURMCluster(walltime='00:02:00', processes=4, cores=8, memory='28GB') as cluster:

job_script = cluster.job_script()
assert job_script.startswith("#!/bin/bash")

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this should only go in https://github.com/dask/dask-jobqueue/blob/master/dask_jobqueue/tests/test_jobqueue_core.py as this is not specific to any scheduler.

Comment thread dask_jobqueue/jobqueue.yaml Outdated
local-directory: null # Location of fast local storage like /scratch or $TMPDIR

# PBS resource manager options
shebang: "#!/bin/bash"

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, looks like it, good catch!

Comment thread dask_jobqueue/core.py
env_extra = dask.config.get('jobqueue.%s.env-extra' % self.scheduler_name)
if log_directory is None:
log_directory = dask.config.get('jobqueue.%s.log-directory' % self.scheduler_name)
if shebang is None:

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure, and I have no strong feeling on this. Currently it looks like we have kind of a match between kwargs and config file. I'm not sure which way we should go.

@guillaumeeb

Copy link
Copy Markdown
Member

Looking at this PR, adding all these identical lines to the configs (e.g., https://github.com/dask/dask-jobqueue/pull/189/files#diff-06f4938c65f61f791782337bf4187054R15) feels wrong.

@willirath you're right, this is not ideal. We did that when implementing multiple config in #68. Feel free to open an issue to discuss other possibilities for that point.

@guillaumeeb

Copy link
Copy Markdown
Member

And thanks @willirath for the review here, this is really appreciated.

danpf added 3 commits November 5, 2018 07:19
Previously shell was default "/bin/bash" now it respects the environment.

Also shebang tests were only in slurm, now there is one for each
scheduler that we support.

@guillaumeeb guillaumeeb left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you can simplify tests a bit with pytest parametrize feature.

assert ' --preload mymodule' in cluster._command_template


def test_shebang_settings():

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should use parametrized test as the one below for all this!

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

really cool!!!!

Comment thread dask_jobqueue/tests/test_slurm.py Outdated
env_extra=['export LANG="en_US.utf8"',
'export LANGUAGE="en_US.utf8"',
'export LC_ALL="en_US.utf8"']
'export LC_ALL="en_US.utf8"'],

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A comma is remaining here.

@guillaumeeb

Copy link
Copy Markdown
Member

LGTM, @willirath do you see something else?

I updated the documentation to reflect shebangs being required.  because
otherwise a batch script could be submitted without an interpreter.  I
also added a test to cover this.
@danpf

danpf commented Nov 5, 2018

Copy link
Copy Markdown
Contributor Author

sorry to bother you more, please review the updates to the documentation I added as well.
Also added some functionality to make sure there was always a shebang + test for that.

@willirath willirath left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure the additional requirement mock is needed to solve the problem at hand. In Xarray's tests, they use a context manager to directly change the dask config:
https://github.com/pydata/xarray/blob/4a5d88dba5fd50d48ab00ed2ebaee058287ab0bf/xarray/tests/test_dask.py#L838

So you could just use

    with pytest.raises(ValueError, match="batch script interpreter"):
        with (dask.config.set(shebang=None)):
            Cluster(cores=2, memory='4GB')

@willirath

Copy link
Copy Markdown
Collaborator

Apart from the question about the added dependency on mock, this looks very good!

assert job_script.startswith(default_shebang)

with pytest.raises(ValueError, match="batch script interpreter"):
with (dask.config.set(shebang=None)):

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry. My example code was wrong. dask.config.set('jobqueue.%s.shebang' % Cluster.scheduler_name, None) should work.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks, my initial idea was way worse.

@willirath willirath left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good!

@willirath

Copy link
Copy Markdown
Collaborator

LGTM, @willirath do you see something else?

@guillaumeeb Good to go from py point of view!

@guillaumeeb guillaumeeb left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're almost there, but I feel there are two things to address:

  • I don't think we want to raise an exception if no shebang is define (but I may be wrong)
  • We shouldn't put the shebang kwarg or config value in all the docs, as the default is perfectly fine for most cases.

Sorry to be a bit pernickety...

Comment thread docs/source/configuration-setup.rst Outdated
Comment thread dask_jobqueue/core.py Outdated

if shebang is None:
raise ValueError("You must specify a batch script interpreter to use"
" like ``#!/usr/bin/env bash``")

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this really a problem if there is no shebang?
What concerns me here is that this exception might break existing jobqueue configurations, where no shebang is declared. From my point of view it shouldn't.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could maybe add a deprecation warning? and then a warning to say that if shebang is None everywhere, automatically set it to #!/usr/bin/env bash to preserve the old configuration

You are right that this breaks existing configs in its current state. But you cannot submit a batch script to most schedulers without a shebang:

sbatch: error: This does not look like a batch script.  The first
sbatch: error: line must start with #! followed by the path to an interpreter.
sbatch: error: For instance: #!/bin/sh

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wait, I may be wrong about this. I need to try this PR to see if it breaks my config. Maybe it doesn't, default value should be taken from the module default jobqueue.yaml file.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looking at the code I think you're right. It should default to bash. I'll wait for you to try it though...

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry did not find time today, and this will not be possible until friday. You could verify it though, just create a jobqueue.yaml config file, and do not add a shebang line.
I believe the only way to reach that code is to create a local config file, and set shebang to null in it. I'm thus in favor of removing this code, as it seems very unlikely a user run in this case without knowing it.

Comment thread docs/source/configurations.rst Outdated
name: dask-worker
cores: 36
memory: 270GB
shebang: '#!/usr/bin/env bash'

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should remove the shebang from "real life" configurations, as keeping the default shebang is perfectly fine apparently for those. This include ARM, but also above Cheyenne and Nersc.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are these what the DOE are using as their jobqueue.yaml ? in that case they 'should' set the shebang k-v in this updated version. Currently the config is set per-scheduler and the shebang is pulled from the scheduler's k-v config. This is solved if we use the deprecation method I suggested. Although I think it's best practice for the documentation to show these as set so people understand.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think they shouldn't if they keep the default value, see my response to the other comment. I will try this tomorrow. Default shebang value should be inherited from default config if not specified.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And so based on the above discussion, I think we really should remove the majority of examples containing shebang in the doc, only keeping a few specific. Maybe add the config you are using in your cluster in the example section?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I deleted a bunch, I only left it in places where the docs describe the extra args as well!!

Comment thread docs/source/examples.rst Outdated

cluster = PBSCluster(processes=18,
threads=4,
shebang='#!/usr/bin/env bash',

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still the same, shebang is optional, maybe put it once, but not everywhere.

@guillaumeeb

Copy link
Copy Markdown
Member

You've got flake8 checks failing here.

@willirath

Copy link
Copy Markdown
Collaborator

Looks like this is all fine now?

@guillaumeeb

Copy link
Copy Markdown
Member

Yes ! Thanks to both of you here, merging.

@guillaumeeb guillaumeeb merged commit ea1b4ee into dask:master Nov 9, 2018
@danpf danpf deleted the danpf/fix_shebang branch November 9, 2018 22:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants