ENH: download_url: Use trailing separator to signal directory target #3854
f38d72c (BF: download_url: Update for new path resolution logic, 2019-06-03) didn't properly adjust path handling for downstream code that feeds the paths into AnnexRepo methods. When the dataset argument is not a Dataset instance, we give these methods paths that are relative to the current directory, but these methods expect paths that are either relative to the dataset or absolute. Pass AnnexRepo methods paths that are relative to the dataset.

Fixes datalad#3847.
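A minimal sketch of the translation described above, assuming POSIX-style paths that contain no `..` components. The helper `to_dataset_relative` and its arguments are hypothetical and are not DataLad's API. A path given relative to the current directory (or as an absolute path) is re-expressed relative to the dataset root before it would be handed to an AnnexRepo-style method.

```python
from pathlib import PurePosixPath


def to_dataset_relative(path, cwd, ds_root):
    """Hypothetical helper: re-express `path` relative to `ds_root`.

    `path` is interpreted relative to `cwd` unless it is already absolute.
    Assumes `path` contains no '..' components.
    """
    p = PurePosixPath(path)
    if not p.is_absolute():
        # A relative path is anchored at the current working directory,
        # not at the dataset root.
        p = PurePosixPath(cwd) / p
    # relative_to() raises ValueError if `p` lies outside the dataset.
    return p.relative_to(ds_root)


# Running from a subdirectory of the dataset:
print(to_dataset_relative("fname", "/data/ds/sub", "/data/ds"))  # sub/fname
```

Anchoring relative paths at the current directory first, and only then making them dataset-relative, is what keeps the two conventions from being mixed up.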
Use the resolve_path() helper instead of custom logic to resolve paths against the dataset. Centralizing this logic avoids inconsistent behavior and lets us take advantage of the non-trivial handling in resolve_path(). In particular, we avoid normpath(), which is problematic for the reasons given in resolve_path()'s docstring and comments and in datalad#3643 (comment). Here's the pathlib documentation that resolve_path() references:

    Spurious slashes and single dots are collapsed, but double dots ('..') are not, since this would change the meaning of a path in the face of symbolic links: [...] (a naïve approach would make PurePosixPath('foo/../bar') equivalent to PurePosixPath('bar'), which is wrong if foo is a symbolic link to another directory)
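The pathlib behavior quoted above can be checked with the standard library alone. This sketch contrasts normpath() (called through `posixpath` so the result does not depend on the host OS), which collapses `foo/../bar` to `bar`, with `PurePosixPath`, which collapses spurious slashes and single dots but leaves `..` intact.

```python
import posixpath
from pathlib import PurePosixPath

# normpath() removes '..' lexically, which is wrong if 'foo' is a symlink
# to some other directory.
print(posixpath.normpath("foo/../bar"))  # bar

# pathlib keeps '..' because it cannot know whether 'foo' is a symlink...
print(PurePosixPath("foo/../bar"))  # foo/../bar

# ...but it does collapse doubled slashes and single dots.
print(PurePosixPath("foo//./bar"))  # foo/bar
```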
On both master and 0.11.x, there is no attempt to identify the dataset from the --path argument. For example, when run from outside the dataset at </path/to/ds/>,

    $ datalad download-url --path /path/to/ds/fname https://www.datalad.org/img/logo/studyforrest.png

downloads the file to </path/to/ds/fname> but does not perform any of the dataset-dependent functionality (e.g., saving). Judging from 98153ec (ENH: download_url: Optionally add file to dataset, 2018-05-17), that functionality was never supported, and the --path description was thoughtlessly copied from the existing --dataset description.
We've now dropped Python 2 support, so follow the suggestion of the deleted comment.
As of a570fcb (ENH: downloaders: Ensure directories for target exist, 2019-09-02), download() creates any missing leading directories for _non-directory_ targets. A directory target is supported, but it must already exist. Move the "make directories if needed" logic earlier so that directory targets are handled as well.
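A minimal sketch of creating the needed directories up front so that both kinds of target work. The function `ensure_target_dirs` and its `is_dir` flag are hypothetical and are not DataLad's downloader API. For a directory target the directory itself is created; for a file target only its leading directories are.

```python
import os
import tempfile


def ensure_target_dirs(target, is_dir):
    """Hypothetical helper: create whatever directories `target` needs.

    If `is_dir` is true, `target` names a directory that the downloaded
    file will be placed into, so create it. Otherwise `target` names the
    file itself, so only its parent directories are created.
    """
    dirpath = target if is_dir else os.path.dirname(target)
    if dirpath:
        # exist_ok=True makes this a no-op for directories that already exist.
        os.makedirs(dirpath, exist_ok=True)
    return dirpath


with tempfile.TemporaryDirectory() as tmp:
    # File target: only tmp/a/b is created; the file itself is not.
    ensure_target_dirs(os.path.join(tmp, "a", "b", "fname"), is_dir=False)
    # Directory target: tmp/c/d is created even though it didn't exist.
    ensure_target_dirs(os.path.join(tmp, "c", "d"), is_dir=True)
    print(sorted(os.listdir(tmp)))  # ['a', 'c']
```

Doing this before any target-specific handling means the later code can assume the directory exists in both cases.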
I've pushed an update with more tests and tweaked handling of the "path without slash points to existing directory" case. I'll take this out of draft mode, but label it with "do not merge" because it sits on top of gh-3850.
If the --path argument points to an existing directory, download_url() writes the downloaded content to files within that directory. The only way we know that the user wants a directory is that one exists. As a consequence, if there's a typo in the directory name (as described in datalad#3484), download_url() cannot tell that a directory was intended and falls back to the non-directory treatment. When combined with --archive, this can lead to a large number of files in a location the user didn't intend, typically the top-level directory of the repository.

To improve this situation, require the user to add a trailing separator to indicate that they want the directory treatment. If the user has a typo in the directory name, at least the content goes into a misnamed subdirectory. And because download_url() now knows what the user wanted regardless of whether the directory exists, it can create the directory when it doesn't exist, which download() already supports underneath.

Closes datalad#3848.
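A minimal sketch of the rule described above: the trailing separator, not the existence of a directory, selects the directory treatment. The functions `is_directory_target` and `resolve_download_target` are hypothetical, and `filename` stands in for whatever name would be derived from the URL. Both `os.sep` and `os.altsep` are checked because on Windows `/` is accepted alongside `\`. How the PR treats a path without a trailing separator that names an existing directory is not modeled here.

```python
import os


def is_directory_target(path):
    """Hypothetical check: a trailing separator marks a directory target."""
    seps = (os.sep,) + ((os.altsep,) if os.altsep else ())
    return path.endswith(seps)


def resolve_download_target(path, filename):
    """Hypothetical: return the file path the content would be written to.

    A directory target need not exist yet; it could be created beforehand
    (for example with os.makedirs(path, exist_ok=True)).
    """
    if is_directory_target(path):
        return os.path.join(path, filename)
    # Without a trailing separator, the path is taken to name the file itself.
    return path


print(resolve_download_target("data/", "logo.png"))  # data/logo.png
# A typo still lands in a (misnamed) subdirectory instead of the top level:
print(resolve_download_target("dta/", "logo.png"))  # dta/logo.png
```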
    @@             Coverage Diff              @@
    ##           master    #3854        +/-   ##
    ============================================
    + Coverage   46.56%   80.73%    +34.16%
    ============================================
      Files         270      273         +3
      Lines       36006    36058        +52
    ============================================
    + Hits        16767    29112     +12345
    + Misses      19239     6946     -12293