Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

download-url: Fix archive path handling when in subdirectory #3850

Merged
merged 2 commits into from Nov 25, 2019

Conversation

kyleam
Copy link
Collaborator

@kyleam kyleam commented Oct 30, 2019

This patch fixes gh-3847.

#!/bin/sh

set -eu

cd $(mktemp -d --tmpdir dl-XXXXXXX)
datalad create
mkdir out
url=http://swcarpentry.github.io/python-novice-inflammation/data/python-novice-inflammation-data.zip

datalad -C out download-url --archive $url
tree --charset=asci out | head
[...]
action summary:
  add (ok: 1)
  download_url (ok: 1)
  save (ok: 1)
out
`-- data
    |-- inflammation-01.csv -> ../../.git/annex/objects/Kg/60/MD5E-s5365--d5a838342924a0806eeeea0e4adaba0a.csv/MD5E-s5365--d5a838342924a0806eeeea0e4adaba0a.csv
    |-- inflammation-02.csv -> ../../.git/annex/objects/zp/qF/MD5E-s5314--0551caa3e8a0ef3a53e99a4a01c27a65.csv/MD5E-s5314--0551caa3e8a0ef3a53e99a4a01c27a65.csv
    |-- inflammation-03.csv -> ../../.git/annex/objects/2Q/Fj/MD5E-s5127--671c1563d3bd76538dab97537c370623.csv/MD5E-s5127--671c1563d3bd76538dab97537c370623.csv
    |-- inflammation-04.csv -> ../../.git/annex/objects/FZ/Pf/MD5E-s5367--fc8d2c99778fb8f6d243de481c33e2e9.csv/MD5E-s5367--fc8d2c99778fb8f6d243de481c33e2e9.csv
    |-- inflammation-05.csv -> ../../.git/annex/objects/fv/Gg/MD5E-s5345--13bd5318271c66ace4de33a10dca9b72.csv/MD5E-s5345--13bd5318271c66ace4de33a10dca9b72.csv
    |-- inflammation-06.csv -> ../../.git/annex/objects/Q5/FF/MD5E-s5330--934b26dfc803cabd100b9267f44e08fc.csv/MD5E-s5330--934b26dfc803cabd100b9267f44e08fc.csv
    |-- inflammation-07.csv -> ../../.git/annex/objects/2f/m3/MD5E-s5342--e13ba83ccf44f33c97fa0e240962cc78.csv/MD5E-s5342--e13ba83ccf44f33c97fa0e240962cc78.csv
    |-- inflammation-08.csv -> ../../.git/annex/objects/2Q/Fj/MD5E-s5127--671c1563d3bd76538dab97537c370623.csv/MD5E-s5127--671c1563d3bd76538dab97537c370623.csv

@codecov
Copy link

codecov bot commented Oct 31, 2019

Codecov Report

Merging #3850 into master will decrease coverage by 0.01%.
The diff coverage is 50%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #3850      +/-   ##
==========================================
- Coverage   80.76%   80.74%   -0.02%     
==========================================
  Files         273      273              
  Lines       36008    36029      +21     
==========================================
+ Hits        29081    29093      +12     
- Misses       6927     6936       +9
Impacted Files Coverage Δ
datalad/interface/tests/test_download_url.py 100% <100%> (ø) ⬆️
datalad/interface/download_url.py 38.27% <14.28%> (-3.16%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update afc3270...99844ed. Read the comment docs.

@kyleam kyleam marked this pull request as ready for review October 31, 2019 00:00
f38d72c (BF: download_url: Update for new path resolution logic,
2019-06-03) didn't properly adjust path handling for downstream code
that feeds the paths into AnnexRepo methods.  We give these methods
paths that are relative to the current directory when a dataset is not
an instance, but these methods still expect paths to be either
relative to the dataset or full paths.

Pass AnnexRepo methods paths that are relative to the dataset.

Fixes datalad#3847.
Use the resolve_path() helper rather than custom logic to resolve
paths against the dataset.  Using centralized logic helps avoid
inconsistent behavior and allows us to take advantage of the
non-trivial logic in resolve_path().  In particular, we avoid the use
of normpath(), which is problematic for the reason mentioned in
resolve_path's docstring and comments.  Here's the pathlib
documentation that resolve_path() references:

    Spurious slashes and single dots are collapsed, but double dots ('..')
    are not, since this would change the meaning of a path in the face of
    symbolic links:
    [...]
    (a naïve approach would make PurePosixPath('foo/../bar')
    equivalent to PurePosixPath('bar'), which is wrong if foo is a
    symbolic link to another directory) , which is problematic for the
    reasons mentioned in

Re: datalad#3643 (comment)
@kyleam
Copy link
Collaborator Author

kyleam commented Nov 1, 2019

The codecov report doesn't match what I see locally. Travis is quiet, so I've rebased this on top of the current master and pushed hoping that it sorts itself out.

@kyleam
Copy link
Collaborator Author

kyleam commented Nov 1, 2019

The codecov report doesn't match what I see locally. Travis is quiet, so I've rebased this on top of the current master and pushed hoping that it sorts itself out.

Nope, codecov still isn't happy. Dunno.

Here's what I see locally:

% coverage run -m nose -vs -x datalad/interface/tests/test_download_url.py
% coverage report -m --include='*/download_url.py'
Name                                Stmts   Miss  Cover   Missing
-----------------------------------------------------------------
datalad/interface/download_url.py      81      2    98%   195-196

where the two missing lines are

except AnnexBatchCommandError as exc:
lgr.warning("Registering %s with %s failed: %s",
path, url, exc_str(exc))

@yarikoptic
Copy link
Member

yeah, coverage doesn't make sense - test is covered but the code is not. non-believer me even ran locally -- code is executed. so let's hope it is just a temporary glitch in coverage

@yarikoptic yarikoptic merged commit 302c0cf into datalad:master Nov 25, 2019
@kyleam kyleam deleted the download-url-archive-path branch December 3, 2019 15:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

-C SUBDIR download-url --archive does not extract archive
2 participants