New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
BF: fix execution of patool commands #3176
Conversation
BK since would stall if gzip or pigz is available, and not p7zip
…rue is probably not what you want
patool invokes with shell=True and behavior when cmd is provided as list and shell is True is not what you would expect - only the first list member is provided to the shell and the rest as additional arguments to the shell, not as a part of the command. That leads to incorrect execution of the line
Codecov Report
@@ Coverage Diff @@
## 0.11.x #3176 +/- ##
===========================================
- Coverage 90.83% 58.01% -32.83%
===========================================
Files 249 92 -157
Lines 32783 14652 -18131
===========================================
- Hits 29780 8500 -21280
- Misses 3003 6152 +3149
Continue to review full report at Codecov.
|
grr... Not sure which way to proceed here. So
I thought that may be we could provide our custom gunzip functionality which would respect that filename and thus would mitigate discrepancy, but
so I do not see an easy way to figure out that filename in the header. But we can't easily switch to "just use original filename stripped of .gz" since
|
This test passes whenever p7zip-full is installed
7d992d7
to
f355e1b
Compare
ok -- for now just decided to blow in no |
… files DataLad core has a number of improvements in handling of .gz files in datalad/datalad#3176 but overall proper solution requires RFing of how all the .gz should be handled including datalad-archives special remote, so they do not rely on any filename in the header. That is for now postponed. Installing p7zip-full should mitigate the issue for now
f355e1b
to
5e4d233
Compare
Proper solution is a bit too involved, so postponed, and decided just to blow with an informative message for now
5e4d233
to
06ab803
Compare
oops -- forgot about our bright future, changed base to be 0.11.x |
Thanks for summarizing the results of all your digging.
Sounds sensible. |
with @kyleam 's blessing will merge for now |
# | ||
if isinstance(cmd, (list, tuple)) and kwargs.get('shell'): | ||
# patool (as far as I see it) takes care about quoting args | ||
cmd = ' '.join(cmd) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also we might want to consider moving this into the runner itself. I didn't spot any place where we use shell and pass the list, so it shouldn't change any other behavior but might prevent us from hitting it again
## 0.11.4 (Mar 18, 2019) -- get-ready Largely a bug fix release with a few enhancements ### Important - 0.11.x series will be the last one with support for direct mode of [git-annex][] which is used on crippled (no symlinks and no locking) filesystems. v7 repositories should be used instead. ### Fixes - Extraction of .gz files is broken without p7zip installed. We now abort with an informative error in this situation. ([#3176][]) - Committing failed in some cases because we didn't ensure that the path passed to `git read-tree --index-output=...` resided on the same filesystem as the repository. ([#3181][]) - Some pointless warnings during metadata aggregation have been eliminated. ([#3186][]) - With Python 3 the LORIS token authenticator did not properly decode a response ([#3205][]). - With Python 3 downloaders unnecessarily decoded the response when getting the status, leading to an encoding error. ([#3210][]) - In some cases, our internal command Runner did not adjust the environment's `PWD` to match the current working directory specified with the `cwd` parameter. ([#3215][]) - The specification of the pyliblzma dependency was broken. ([#3220][]) - [search] displayed an uninformative blank log message in some cases. ([#3222][]) - The logic for finding the location of the aggregate metadata DB anchored the search path incorrectly, leading to a spurious warning. ([#3241][]) - Some progress bars were still displayed when stdout and stderr were not attached to a tty. ([#3281][]) - Check for stdin/out/err to not be closed before checking for `.isatty`. ([#3268][]) ### Enhancements and new features - Creating a new repository now aborts if any of the files in the directory are tracked by a repository in a parent directory. ([#3211][]) - [run] learned to replace the `{tmpdir}` placeholder in commands with a temporary directory. ([#3223][]) - [duecredit][] support has been added for citing DataLad itself as well as datasets that an analysis uses. ([#3184][]) - The `eval_results` interface helper unintentionally modified one of its arguments. ([#3249][]) - A few DataLad constants have been added, changed, or renamed ([#3250][]): - `HANDLE_META_DIR` is now `DATALAD_DOTDIR`. The old name should be considered deprecated. - `METADATA_DIR` now refers to `DATALAD_DOTDIR/metadata` rather than `DATALAD_DOTDIR/meta` (which is still available as `OLDMETADATA_DIR`). - The new `DATASET_METADATA_FILE` refers to `METADATA_DIR/dataset.json`. - The new `DATASET_CONFIG_FILE` refers to `DATALAD_DOTDIR/config`. - `METADATA_FILENAME` has been renamed to `OLDMETADATA_FILENAME`. * tag '0.11.4': (82 commits) Updated CHANGELOG.md for having merged check for not being closed [DATALAD RUNCMD] CHANGELOG: Re-linkify 0.11.4 entries CHANGELOG.md: Update 0.11.4 entries CHANGELOG.md: Adjust the format of a link BF: split lines on spaces and commas befoe doing sed capture of #issue RF: adjust tools/link_issues_CHANGELOG to have MD links as [#issue][] BF+DOC: Fix all links to mark them valid markdown BF: Fix markup bug that prevents sphinx-build from succeeding DOC: extend coumentation of is_interactive BF: guard .istty with try/except ENH: ui: Drop progress bars if not attached to tty ENH: ui: Add another name for SilentConsoleLog BF: check for stdin/out/err to not be closed before checking for .isatty RF: HANDLE_META_DIR -> DATALAD_DOTDIR (but keeping an alias for compatibility) RF: define/use consts DATASET_CONFIG_FILE, DATASET_METADATA_FILE, METADATA_DIR RF: METADATA_DIR/FILE -> OLDMETADATA_DIR/FILE BF: do not cause a side-effect on kwargs in @eval_results BF: join with ds.path when trying to see if any other metadata file is available RF: Move duecredit into its own section so it is not a part of install_requires [DATALAD RUNCMD] CHANGELOG: Re-linkify 0.11.4 entries ...
## 0.11.4 (Mar 18, 2019) -- get-ready Largely a bug fix release with a few enhancements ### Important - 0.11.x series will be the last one with support for direct mode of [git-annex][] which is used on crippled (no symlinks and no locking) filesystems. v7 repositories should be used instead. ### Fixes - Extraction of .gz files is broken without p7zip installed. We now abort with an informative error in this situation. ([datalad#3176][]) - Committing failed in some cases because we didn't ensure that the path passed to `git read-tree --index-output=...` resided on the same filesystem as the repository. ([datalad#3181][]) - Some pointless warnings during metadata aggregation have been eliminated. ([datalad#3186][]) - With Python 3 the LORIS token authenticator did not properly decode a response ([datalad#3205][]). - With Python 3 downloaders unnecessarily decoded the response when getting the status, leading to an encoding error. ([datalad#3210][]) - In some cases, our internal command Runner did not adjust the environment's `PWD` to match the current working directory specified with the `cwd` parameter. ([datalad#3215][]) - The specification of the pyliblzma dependency was broken. ([datalad#3220][]) - [search] displayed an uninformative blank log message in some cases. ([datalad#3222][]) - The logic for finding the location of the aggregate metadata DB anchored the search path incorrectly, leading to a spurious warning. ([datalad#3241][]) - Some progress bars were still displayed when stdout and stderr were not attached to a tty. ([datalad#3281][]) - Check for stdin/out/err to not be closed before checking for `.isatty`. ([datalad#3268][]) ### Enhancements and new features - Creating a new repository now aborts if any of the files in the directory are tracked by a repository in a parent directory. ([datalad#3211][]) - [run] learned to replace the `{tmpdir}` placeholder in commands with a temporary directory. ([datalad#3223][]) - [duecredit][] support has been added for citing DataLad itself as well as datasets that an analysis uses. ([datalad#3184][]) - The `eval_results` interface helper unintentionally modified one of its arguments. ([datalad#3249][]) - A few DataLad constants have been added, changed, or renamed ([datalad#3250][]): - `HANDLE_META_DIR` is now `DATALAD_DOTDIR`. The old name should be considered deprecated. - `METADATA_DIR` now refers to `DATALAD_DOTDIR/metadata` rather than `DATALAD_DOTDIR/meta` (which is still available as `OLDMETADATA_DIR`). - The new `DATASET_METADATA_FILE` refers to `METADATA_DIR/dataset.json`. - The new `DATASET_CONFIG_FILE` refers to `DATALAD_DOTDIR/config`. - `METADATA_FILENAME` has been renamed to `OLDMETADATA_FILENAME`. * tag '0.11.4': Updated CHANGELOG.md for having merged check for not being closed [DATALAD RUNCMD] CHANGELOG: Re-linkify 0.11.4 entries CHANGELOG.md: Update 0.11.4 entries CHANGELOG.md: Adjust the format of a link BF: split lines on spaces and commas befoe doing sed capture of #issue RF: adjust tools/link_issues_CHANGELOG to have MD links as [#issue][] BF+DOC: Fix all links to mark them valid markdown BF: Fix markup bug that prevents sphinx-build from succeeding DOC: extend coumentation of is_interactive BF: guard .istty with try/except BF: check for stdin/out/err to not be closed before checking for .isatty [DATALAD RUNCMD] CHANGELOG: Re-linkify 0.11.4 entries CHANGELOG: Add entries for 0.11.4 ENH: version boost and initial changes to CHANGELOG
All of the checks of the test generator fail on a system without p7zip. Before dataladgh-4041 (in particular, before beb5d04), the subset of checks that existed at that time was skipped with (see dataladgh-3176): SKIP: cmd:7z is missing. (Not) Funny enough but ATM we need p7zip installation to handle .gz files extraction 'correctly' However, with the change to the archive file name in 05e9353 (TST: Expand test coverage to additional relevant compression formats, 2020-01-17), the MissingExternalDependency exception that led to the skip for gzip formats is no longer triggered. And even if it were, the other file formats tested would fail. Mark these tests as known failures when p7zip isn't available.
It is an incomplete fix, which would still cause test failure in the crawler if no p7zip is installed in crawler due to #3175 , but wanted to see if anything breaks here before proceeding further