ENH: pass values through shlex_quote, fixes #3625 #3626
datalad/interface/run_procedure.py
Outdated
script=procedure_file if on_windows else shlex_quote(procedure_file),
ds='' if not ds else (ds.path if on_windows else shlex_quote(ds.path)),
args=shlex_quote(u' '.join(u'"{}"'.format(a) for a in args) if args else '') if not on_windows
else u' '.join(u'"{}"'.format(a) for a in args) if args else '')
`shlex_quote()` would need to happen on each `a` inside the format call AFAICS:
u' '.join(u'"{}"'.format(a if on_windows else shlex_quote(a)) for a in args) if args else '')
oh, thx!
I think the entire `u'"{}"'.format` can go, and `a` or the quoted `a` can be passed directly.
You explained it pretty well yourself :] As @mih suggests, you should map shlex_quote over each argument.
That's good.
I suppose every `shlex_quote()` call should get the "not on windows" condition, so it seems like we should add a helper (probably to `datalad.utils`) that handles this. Also, while I personally don't like our "BF, ENH" subject convention, it is pretty consistently used in this code base, so please prepend those :]
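To illustrate the point about mapping `shlex_quote` over each argument, here is a minimal sketch. It uses the standard library's `shlex.quote` (which `six.moves.shlex_quote` resolves to on Python 3) and made-up argument values; it is not code from this PR. Quoting the joined string collapses everything into one shell word, while quoting each argument keeps them separate.

```python
import shlex

args = ["first arg", "second"]  # made-up values for illustration

# Quoting the already-joined string yields ONE shell word.
joined_then_quoted = shlex.quote(" ".join(args))

# Quoting each argument and then joining keeps the arguments separate.
quoted_then_joined = " ".join(shlex.quote(a) for a in args)

# shlex.split() splits a string following POSIX shell word-splitting rules,
# so it shows how many arguments the called program would receive.
print(shlex.split(joined_then_quoted))
print(shlex.split(quoted_then_joined))
```

The first variant is consistent with the symptom described in the PR description ("multiple arguments are now returned as a single string").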
Codecov Report
@@ Coverage Diff @@
## 0.11.x #3626 +/- ##
==========================================
- Coverage 81.44% 81.31% -0.13%
==========================================
Files 256 256
Lines 33823 33851 +28
==========================================
- Hits 27547 27527 -20
- Misses 6276 6324 +48
Thanks for working on this.
I left a few comments. The only things that I think really need to be addressed are
- the `str` issue
- there is a failing test on the sym link test run: https://travis-ci.org/datalad/datalad/jobs/577417054#L3410
  That looks to be because the test uses a custom call format and puts quotes around the values.
This should also get a regression test, but I can add that on top if you'd like.
return {'type': u'python_script',
'template': u'python "{script}" "{ds}" {args}',
'template': u'%s {script} {ds} {args}' % ex,
These changes are OK as is, but just a note that essentially the same change is happening three places. This is a good indication that the common bits could have been pulled out into a variable and then the change could have been made to just one spot.
datalad/interface/run_procedure.py
Outdated
script=procedure_file if on_windows else shlex_quote(procedure_file),
ds='' if not ds else (ds.path if on_windows else shlex_quote(ds.path)),
args=(u' '.join(shlex_quote(a) for a in args) if args else '') if not on_windows
else u' '.join(str(a) for a in args) if args else '')
The changes to all these arguments could be simplified by adding a shlex_quote helper that returns the value as is on windows. Even if you prefer not to add a general helper right now, you could add a local one. Something like (untested)
diff --git a/datalad/interface/run_procedure.py b/datalad/interface/run_procedure.py
index 9d2f8e4a3..048f8690f 100644
--- a/datalad/interface/run_procedure.py
+++ b/datalad/interface/run_procedure.py
@@ -448,11 +448,11 @@ def __call__(
raise ValueError("No idea how to execute procedure %s. "
"Missing 'execute' permissions?" % procedure_file)
+ maybe_shlex_quote = lambda x: x if on_windows else shlex_quote(x)
cmd = ex['template'].format(
- script=procedure_file if on_windows else shlex_quote(procedure_file),
- ds='' if not ds else (ds.path if on_windows else shlex_quote(ds.path)),
- args=(u' '.join(shlex_quote(a) for a in args) if args else '') if not on_windows
- else u' '.join(str(a) for a in args) if args else '')
+ script=maybe_shlex_quote(procedure_file),
+ ds=maybe_shlex_quote(ds.path) if ds else '',
+ args=u' '.join(maybe_shlex_quote(a) for a in args))
lgr.debug('Attempt to run procedure {} as: {}'.format(
name,
cmd))
In addition to moving the condition to one place, this takes advantage of `' '.join([]) => ''` to drop an unnecessary check for the empty `args` case.
And one comment that isn't stylistic/subjective: the `str()` call in `u' '.join(str(a) for a in args)` can cause unicode issues on python 2. `six.text_type` is one way you can use unicode/str on py2/py3, but you actually don't need it here because you can just do away with the list comprehension and give `args` directly to `join()`. (Taking a quick glance, there would still be upstream issues with unicode in py2, but the code here wouldn't be contributing to the issue.)
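As a rough, self-contained sketch of the helper idea discussed in this comment (not the code that was merged): the version below takes `on_windows` as a parameter purely so that both branches can be exercised anywhere, whereas the discussion assumes a module-level `on_windows` flag. `format_args` is a hypothetical name introduced only for this illustration.

```python
import shlex


def maybe_shlex_quote(value, on_windows=False):
    """Return `value` unchanged on Windows, shell-quoted elsewhere.

    `on_windows` is a parameter here only to keep the sketch testable;
    it is an assumption of this example, not the project's signature.
    """
    return value if on_windows else shlex.quote(value)


def format_args(args, on_windows=False):
    # Hypothetical helper: str.join() on an empty sequence returns '',
    # so no explicit "if args else ''" check is needed.
    return u" ".join(maybe_shlex_quote(a, on_windows) for a in args)


print(repr(format_args([])))
print(format_args(["a b", "c"]))
print(format_args(["a b", "c"], on_windows=True))
```

Keeping the platform condition inside one function means every call site shrinks to a single call, which is what the suggested diff does for `script`, `ds`, and `args`.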
Oops, sorry, I should've made it clear that I was talking about the commit message subjects. |
FTR: The test might be fixed like this:
diff --git a/datalad/interface/tests/test_run_procedure.py b/datalad/interface/tests/test_run_procedure.py
index d9a76919e..4a17b7765 100644
--- a/datalad/interface/tests/test_run_procedure.py
+++ b/datalad/interface/tests/test_run_procedure.py
@@ -238,7 +238,7 @@ def test_configs(path):
# for run:
ds.config.add(
'datalad.procedures.datalad_test_proc.call-format',
- 'python "{script}" "{ds}" {{mysub}} {args}',
+ 'python {script} {ds} {{mysub}} {args}',
where='dataset'
)
ds.config.add(
@@ -258,7 +258,7 @@ def test_configs(path):
# config on dataset level:
ds.config.add(
'datalad.procedures.datalad_test_proc.call-format',
- 'python "{script}" "{ds}" local {args}',
+ 'python {script} {ds} local {args}',
where='local'
)
ds.unlock("fromproc.txt")
I could not test it though, because the current PR is against 0.11, the rebase wasn't straightforward, and with all the extensions my dev environment cannot easily handle this. Sorry.
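One plausible way to see why the quoted placeholders in the custom call format stop working once the values themselves go through `shlex_quote`: the value ends up quoted twice. The sketch below uses the standard library's `shlex.quote` and a made-up dataset path; the templates are shortened versions of the ones in the diff above, and the exact failure in the CI run may have looked different.

```python
import shlex

ds_path = "/tmp/my dataset"  # made-up path containing a space

old_template = 'python "{script}" "{ds}"'  # placeholders wrapped in quotes
new_template = "python {script} {ds}"      # bare placeholders

# Values are quoted before being substituted into the template.
values = {"script": shlex.quote("proc.py"), "ds": shlex.quote(ds_path)}

# shlex.split() shows what a POSIX shell would hand to the program.
print(shlex.split(old_template.format(**values)))
print(shlex.split(new_template.format(**values)))
```

With the old template the single quotes added by `shlex.quote` sit inside double quotes, so they become part of the argument; with the bare placeholders the path arrives unchanged.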
@mih this makes it work for me, I've pushed those changes. Thanks! |
datalad/utils.py
Outdated
lgr.log(5, "Done importing datalad.utils")
lgr.log(5, "Done importing datalad.utils")
I personally prefer line endings all the way to the last line, but we are not in agreement on this within the project yet.
done.
datalad/interface/run_procedure.py
Outdated
@@ -15,7 +15,9 @@
from glob import iglob
from argparse import REMAINDER
from six.moves import shlex_quote
This is no longer needed with maybe_shlex_quote
datalad/interface/run_procedure.py
Outdated
@@ -33,6 +35,7 @@
from datalad.distribution.dataset import datasetmethod
from datalad.support.exceptions import InsufficientArgumentsError
from datalad.support.exceptions import NoDatasetArgumentFound
from datalad.utils import on_windows, maybe_shlex_quote
It is not used consistently in the code base (yet), but we now do (for any import with more than a single item)
    from datalad.utils import (
        on_windows,
        maybe_shlex_quote,
    )
i.e. one per line, each line (incl. the last one) has a trailing comma -- this minimizes the diffs and makes them quicker to grasp.
That being said, `on_windows` is no longer needed, see below.
Ah yes, I missed this. I removed all superfluous imports now.
datalad/interface/run_procedure.py
Outdated
'state': state}
elif script_file.endswith('.py'):
ex = sys.executable if on_windows else shlex_quote(sys.executable)
Can also be `maybe_shlex_quote(sys.executable)` now
good point, thx.
Tests did not run (apt issue). Restarted. |
@@ -258,7 +258,7 @@ def test_configs(path):
# config on dataset level:
ds.config.add(
'datalad.procedures.datalad_test_proc.call-format',
'python "{script}" "{ds}" local {args}',
'python {script} {ds} local {args}',
Here and above we should also use `sys.executable`, not `python`.
oh right. Thanks!
Hi @kyleam, I have tried to write a regression test, but that ended up being a test with args that contain spaces -- which is a test that I think cannot fail currently (although there currently is no test that explicitly gives args with spaces, so I pushed it anyway). I wasn't able to come up with any test that can trigger a failure in the case of quoted placeholders... If you have an idea how to write a test for this, you can let me know or add one yourself, if you like. |
This can fail if the command has non-ascii characters, as it will in the next commit. Note that this will conflict with changes on master, which is a good thing because a similar fix needs to be applied there.
Unlike the spaces check in the last commit, this would fail before the recent shlex_quote() changes because the arguments contain quotes. This very likely needs to be skipped on windows, where our quoting is a no-op, but I'm leaving it as is in this commit to see how the AppVeyor run fails.
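A small sketch of why arguments that contain quotes distinguish the two approaches, again using the standard library's `shlex.quote` and a made-up argument (this is not the test that was pushed): surrounding the value with double quotes loses the inner quotes, while `shlex.quote` round-trips the value.

```python
import shlex

arg = 'say "hi" to $USER'  # made-up argument with quotes and a dollar sign

wrapped = '"{}"'.format(arg)  # previous approach: surround with double quotes
quoted = shlex.quote(arg)     # approach taken in this PR

# shlex.split() mimics POSIX shell word splitting (without variable expansion).
print(shlex.split(wrapped))
print(shlex.split(quoted))
```

In a real shell the double-quoted variant would additionally expand `$USER`, which `shlex.split()` does not model.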
Thanks!
Right, it passes without the changes from this PR (i.e. on top of 0.11.x).
Sounds good.
I've pushed a test. It should fail on AppVeyor, but I'd like to see that before marking it as a known windows failure. Feel free to mark it once the test run is over. (edit: already failed and already marked) |
This test fails on AppVeyor [0]. run_procedure() uses a no-op shlex_quote() on Windows [1], so this is expected. [0]: https://ci.appveyor.com/project/mih/datalad/builds/27022010/job/jq3lr977ht5vrmpj#L1109 [1]: datalad#3624 (comment)
test_spaces() fails on AppVeyor. While run_procedure's previous approach of simply surrounding the argument with quotes is incorrect, it would have worked on windows for the simple case of a file name with spaces (I think, though of course we didn't have this test). Perhaps we should make the windows branch of maybe_shlex_quote() smarter so that it can handle simple cases like this, but how we handle shlex_quote on windows needs more work and thought in general. For the purposes of this PR I think we should switch that test's "skip if indirect" check to a full "skip on windows". |
run_procedure() can't use shlex_quote() on Windows [0], so an argument with spaces will not be handled correctly [1]. We should probably teach the windows branch of maybe_shlex_quote() to handle simple cases like a name with spaces. [0]: datalad#3624 (comment) [1]: https://ci.appveyor.com/project/mih/datalad/builds/27022362/job/xoum5f43c135bfgd#L1125
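One possible direction for a smarter Windows branch, sketched under assumptions and not something this PR implements: the standard library's `subprocess.list2cmdline()` applies the MS C runtime quoting rules, which covers the simple "name with spaces" case. It does not protect `cmd.exe` metacharacters such as `&` or `%`, so this would only be a partial answer. The `on_windows` parameter is again an assumption made to keep the example runnable on any platform.

```python
import shlex
import subprocess


def maybe_shlex_quote(value, on_windows=False):
    """Hypothetical variant of the helper with a non-trivial Windows branch."""
    if on_windows:
        # list2cmdline() wraps values containing whitespace in double quotes
        # and escapes embedded double quotes (MS C runtime rules only).
        return subprocess.list2cmdline([value])
    return shlex.quote(value)


print(maybe_shlex_quote("my file.txt", on_windows=True))
print(maybe_shlex_quote("plain", on_windows=True))
print(maybe_shlex_quote("my file.txt"))
```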
0.11.7 (Sep 02, 2019) -- python2-we-still-love-you-but-...

Primarily bugfixes with some optimizations and refactorings.

Fixes

- [addurls][]
  - now provides better handling when the URL file isn't in the expected format. ([#3579][])
  - always considered a relative file for the URL file argument as relative to the current working directory, which goes against the convention used by other commands of taking relative paths as relative to the dataset argument. ([#3582][])
- [run-procedure][]
  - hard coded "python" when formatting the command for non-executable procedures ending with ".py". `sys.executable` is now used. ([#3624][])
  - failed if arguments needed more complicated quoting than simply surrounding the value with double quotes. This has been resolved for systems that support `shlex.quote`, but note that on Windows values are left unquoted. ([#3626][])
- [siblings][] now displays an informative error message if a local path is given to `--url` but `--name` isn't specified. ([#3555][])
- [sshrun][], the command DataLad uses for `GIT_SSH_COMMAND`, didn't support all the parameters that Git expects it to. ([#3616][])
- Fixed a number of Unicode py2-compatibility issues. ([#3597][])

Enhancements and new features

- The [annotate-paths][] helper now caches subdatasets it has seen to avoid unnecessary calls. ([#3570][])
- A repeated configuration query has been dropped from the handling of `--proc-pre` and `--proc-post`. ([#3576][])
- Calls to `git annex find` now use `--in=.` instead of the alias `--in=here` to take advantage of an optimization that git-annex (as of the current release, 7.20190730) applies only to the former. ([#3574][])
- [addurls][] now suggests close matches when the URL or file format contains an unknown field. ([#3594][])
- Shared logic used in the setup.py files of Datalad and its extensions has been moved to modules in the _datalad_build_support/ directory. ([#3600][])
- Get ready for upcoming git-annex dropping support for direct mode ([#3631][])

* tag '0.11.7': (87 commits)
  DOC: Added an entry to changelogn on merged 3631
  ENH: finalizing changelog for 0.11.7
  TST: Update tests for a git-annex without direct mode
  TST: utils: Add decorator that skips when direct mode is unsupported
  ENH: annexrepo: Refuse to initialize in direct mode if unsupported
  ENH: annexrepo: Add check_direct_mode_support method
  BF+TST: Avoid leaking patched git-annex version
  TST+RF: test_annexrepo: Split up a test
  CHANGELOG.md: Second batch for 0.11.7
  TST: run_procedure: Mark test_spaces() as known Windows failure
  TST: run_procedure: Mark test_quoting as known windows failure
  TST: run_procedure: Test more arguments that need quoting
  BF(py2): run_procedure: Avoid encoding error in log message
  TST: add run_procedure test with spaces in file name
  TST/RF: non-hardcoded Python executable
  RF: newline at end of file
  RF: helper instead of conditional
  RF: remove superfluous imports
  BF/TST: remove quoting
  ENH: replace conditionals with helper function
  ...
0.11.7 (Sep 02, 2019) -- python2-we-still-love-you-but-...

Primarily bugfixes with some optimizations and refactorings.

Fixes

- [addurls][]
  - now provides better handling when the URL file isn't in the expected format. ([#3579][])
  - always considered a relative file for the URL file argument as relative to the current working directory, which goes against the convention used by other commands of taking relative paths as relative to the dataset argument. ([#3582][])
- [run-procedure][]
  - hard coded "python" when formatting the command for non-executable procedures ending with ".py". `sys.executable` is now used. ([#3624][])
  - failed if arguments needed more complicated quoting than simply surrounding the value with double quotes. This has been resolved for systems that support `shlex.quote`, but note that on Windows values are left unquoted. ([#3626][])
- [siblings][] now displays an informative error message if a local path is given to `--url` but `--name` isn't specified. ([#3555][])
- [sshrun][], the command DataLad uses for `GIT_SSH_COMMAND`, didn't support all the parameters that Git expects it to. ([#3616][])
- Fixed a number of Unicode py2-compatibility issues. ([#3597][])
- [download-url][] now will create leading directories of the output path if they do not exist ([#3646][])

Enhancements and new features

- The [annotate-paths][] helper now caches subdatasets it has seen to avoid unnecessary calls. ([#3570][])
- A repeated configuration query has been dropped from the handling of `--proc-pre` and `--proc-post`. ([#3576][])
- Calls to `git annex find` now use `--in=.` instead of the alias `--in=here` to take advantage of an optimization that git-annex (as of the current release, 7.20190730) applies only to the former. ([#3574][])
- [addurls][] now suggests close matches when the URL or file format contains an unknown field. ([#3594][])
- Shared logic used in the setup.py files of Datalad and its extensions has been moved to modules in the _datalad_build_support/ directory. ([#3600][])
- Get ready for upcoming git-annex dropping support for direct mode ([#3631][])

* tag '0.11.7':
  Changelog entry for download-url paths handling
  ENH: downloaders: Ensure directories for target exist
In this PR I have
`_guess_exec` in `run_procedure.py` and `__call__`, (if we are not on Windows)
However, locally, this crashes two tests:
because multiple arguments are now returned as a single string. I lack the judgment to decide whether this is indeed problematic, and at which point I can do adjustments.
Can someone give me a pointer?
(I put the commits on top of those in #3624 to prevent merge conflicts, let me know if that's okay)