Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: download_url: Pass add_archive_content a path relative to dataset #3058

Merged
merged 2 commits into from Dec 8, 2018

Conversation

@kyleam
Copy link
Contributor

@kyleam kyleam commented Dec 4, 2018

Otherwise the archive's absolute path is shown in the commit message.

Copy link
Member

@yarikoptic yarikoptic left a comment

shouldn't this be fixed within add_archive_content itself? there should be never a reason to record full path in the commit msg

@kyleam kyleam force-pushed the download-url-archive-relpath branch from 9e407a3 to c7ea897 Dec 5, 2018
@codecov
Copy link

@codecov codecov bot commented Dec 5, 2018

Codecov Report

Merging #3058 into master will increase coverage by <.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #3058      +/-   ##
==========================================
+ Coverage   90.23%   90.24%   +<.01%     
==========================================
  Files         248      248              
  Lines       32530    32527       -3     
==========================================
- Hits        29355    29353       -2     
+ Misses       3175     3174       -1
Impacted Files Coverage Δ
datalad/interface/tests/test_download_url.py 100% <100%> (ø) ⬆️
datalad/interface/download_url.py 97.33% <100%> (ø) ⬆️
datalad/distribution/tests/test_install.py 99.59% <0%> (+0.19%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 30b6da8...c7ea897. Read the comment docs.

@codecov
Copy link

@codecov codecov bot commented Dec 5, 2018

Codecov Report

Merging #3058 into master will increase coverage by <.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #3058      +/-   ##
==========================================
+ Coverage   90.23%   90.24%   +<.01%     
==========================================
  Files         248      248              
  Lines       32539    32555      +16     
==========================================
+ Hits        29362    29378      +16     
  Misses       3177     3177
Impacted Files Coverage Δ
datalad/interface/tests/test_download_url.py 100% <100%> (ø) ⬆️
datalad/interface/add_archive_content.py 90.38% <100%> (+0.04%) ⬆️
...atalad/interface/tests/test_add_archive_content.py 100% <100%> (ø) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update c66a9b2...c31f49b. Read the comment docs.

@kyleam
Copy link
Contributor Author

@kyleam kyleam commented Dec 5, 2018

shouldn't this be fixed within add_archive_content itself?

Fair enough. I am a bit hesitant to touch add_archive_content because I'm not very familiar with that code and haven't studied it enough to understand the details. But I think I found a one-word change that should do.

@yarikoptic
Copy link
Member

@yarikoptic yarikoptic commented Dec 6, 2018

Looks good, but leads to an ERROR the run with a space in the path (restarted errorred due to connectivity another one):

======================================================================
ERROR: datalad.interface.tests.test_add_archive_content.test_add_archive_content_absolute_path
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/travis/virtualenv/python3.5.6/lib/python3.5/site-packages/nose/case.py", line 198, in runTest
    self.test(*self.arg)
  File "/home/travis/virtualenv/python3.5.6/lib/python3.5/site-packages/datalad/tests/utils.py", line 423, in newfunc
    return t(*(arg + (d,)), **kw)
  File "/home/travis/virtualenv/python3.5.6/lib/python3.5/site-packages/datalad/interface/tests/test_add_archive_content.py", line 345, in test_add_archive_content_absolute_path
    add_archive_content(abs_tar_gz, annex=repo)
  File "/home/travis/virtualenv/python3.5.6/lib/python3.5/site-packages/datalad/interface/add_archive_content.py", line 248, in __call__
    key = annex.get_file_key(archive_rpath)
  File "/home/travis/virtualenv/python3.5.6/lib/python3.5/site-packages/datalad/support/gitrepo.py", line 280, in newfunc
    files_new = [normalize(self.path, files)]
  File "/home/travis/virtualenv/python3.5.6/lib/python3.5/site-packages/datalad/support/gitrepo.py", line 203, in _normalize_path
    % path, filename=path)
datalad.support.exceptions.FileNotInRepositoryError: FileNotInRepositoryError: 
Path outside repository: /home/travis/build/datalad/var/tmp/sym link/datalad_temp_tree_test_add_archive_content_absolute_path_2jcpany/1.tar.gz
[Errno None] : Path outside repository: /home/travis/build/datalad/var/tmp/sym link/datalad_temp_tree_test_add_archive_content_absolute_path_2jcpany/1.tar.gz: '/home/travis/build/datalad/var/tmp/sym link/datalad_temp_tree_test_add_archive_content_absolute_path_2jcpany/1.tar.gz'

@kyleam
Copy link
Contributor Author

@kyleam kyleam commented Dec 6, 2018

Looks good, but leads to an ERROR the run with a space in the path (restarted errorred due to connectivity another one):

Yeah, looks like add_archive_content has an existing bug that the new test exposes.

@kyleam kyleam force-pushed the download-url-archive-relpath branch from c7ea897 to dd5faf3 Dec 6, 2018
@kyleam
Copy link
Contributor Author

@kyleam kyleam commented Dec 6, 2018

OK, I've tweaked add_archive_content's relpath handling so that the TMPDIR="/var/tmp/sym link" run should pass. I've also removed a harmless but unintentional check in test_add_delete_after_and_drop that was added in the previous iteration.

@yarikoptic
Copy link
Member

@yarikoptic yarikoptic commented Dec 6, 2018

Only Windows is happy now -- I thought I would not live to see that (let's pretend that it is fully tested ;))

@kyleam
Copy link
Contributor Author

@kyleam kyleam commented Dec 7, 2018

Only Windows is happy now

That's what I get for testing locally under python 3 only.

kyleam added 2 commits Dec 7, 2018
… link"

When passed the archive as an absolute path, add_archive_content takes
the relative path against AnnexRepo.path, but this fails in the
TMPDIR="/var/tmp/sym link" test case [0].  Use get_dataset_root() so
that the leading path of the archive isn't resolved.

[0]: https://travis-ci.org/datalad/datalad/jobs/463863227
Don't leak machine-specific details into the commit message.
@kyleam kyleam force-pushed the download-url-archive-relpath branch from dd5faf3 to c31f49b Dec 7, 2018
@kyleam
Copy link
Contributor Author

@kyleam kyleam commented Dec 7, 2018

Only Windows is happy now

That's what I get for testing locally under python 3 only.

The issue was relpath default argument change: py2 relpath(..., start=".") vs py3 relpath(..., start=None).

Update pushed:

% git range-diff kyleam/download-url-archive-relpath...download-url-archive-relpath
1:  e23788e61 ! 1:  352088975 BF: add_archive_content: Fix relative path under TMPDIR="/var/tmp/sym link"
    @@ -25,7 +25,11 @@
      
              # _rpath below should depict paths relative to the top of the annex
     -        archive_rpath = relpath(archive_path, annex_path)
    -+        archive_rpath = relpath(archive_path, get_dataset_root(archive_path))
    ++        archive_rpath = relpath(
    ++            archive_path,
    ++            # Use `get_dataset_root` to avoid resolving the leading path. If no
    ++            # repo is found, downstream code will raise FileNotInRepositoryError.
    ++            get_dataset_root(archive_path) or ".")
      
              if archive in annex.untracked_files:
                  raise RuntimeError(
2:  dd5faf3bd = 2:  c31f49b52 ENH: add_archive_content: Report relative path in commit subject

@yarikoptic
Copy link
Member

@yarikoptic yarikoptic commented Dec 8, 2018

THANKS!

@yarikoptic yarikoptic merged commit 857cca6 into datalad:master Dec 8, 2018
5 checks passed
@kyleam kyleam deleted the download-url-archive-relpath branch Dec 8, 2018
yarikoptic added a commit that referenced this issue Feb 8, 2019
 A variety of bugfixes and enhancements

 ### Major refactoring and deprecations

 - All extracted metadata is now placed under git-annex by default.
   Previously files smaller than 20 kb were stored in git. ([#3109])
 - TODO: get_runner #3104 and pending #3131

 ### Fixes

 - Improved handling of long commands:
   - The code that inspected `SC_ARG_MAX` didn't check that the
     reported value was a sensible, positive number. ([#3025])
   - More commands that invoke `git` and `git-annex` with file
     arguments learned to split up the command calls when it is likely
     that the command would fail due to exceeding the maximum supported
     length. ([#3138])
 - The `setup_yoda_dataset` procedure created a malformed
   .gitattributes line. ([#3057])
 - [download-url] unnecessarily tried to infer the dataset when
   `--no-save` was given. ([#3029])
 - [rerun] aborted too late and with a confusing message when a ref
   specified via `--onto` didn't exist. ([#3019])
 - [run]:
   - `run` didn't preserve the current directory prefix ("./") on
      inputs and outputs, which is problematic if the caller relies on
      this representation when formatting the command. ([#3037])
   - Fixed a number of unicode py2-compatibility issues. ([#3035]) ([#3046])
   - To proceed with a failed command, the user was confusingly
     instructed to use `save` instead of `add` even though `run` uses
     `add` underneath. ([#3080])
 - Fixed a case where the helper class for checking external modules
   incorrectly reported a module as unknown. ([#3051])
 - [add-archive-content] mishandled the archive path when the leading
   path contained a symlink. ([#3058])
 - Following denied access, the credential code failed to consider a
   scenario, leading to a type error rather than an appropriate error
   message. ([#3091])
 - Some tests failed when executed from a `git worktree` checkout of the
   source repository. ([#3129])
 - During metadata extraction, batched annex processes weren't properly
   terminated, leading to issues on Windows. ([#3137])
 - [add] incorrectly handled an "invalid repository" exception when
   trying to add a submodule. ([#3141])
 - Pass `GIT_SSH_VARIANT=ssh` to git processes to be able to specify
   alternative ports in SSH urls

 ### Enhancements and new features

 - [search] learned to suggest closely matching keys if there are no
   hits. ([#3089])
 - [create-sibling] gained a `--group` option so that the caller can
   specify the file system group for the repository. ([#3098])
 - Interface classes can now override the default renderer for
   summarizing results. ([#3061])
 - [run]:
   - `--input` and `--output` can now be shortened to `-i` and `-o`.
     ([#3066])
   - Placeholders such as "{inputs}" are now expanded in the command
     that is shown in the commit message subject. ([#3065])
   - `interface.run.run_command` gained an `extra_inputs` argument so
     that wrappers like [datalad-container] can specify additional inputs
     that aren't considered when formatting the command string. ([#3038])
   - "--" can now be used to separate options for `run` and those for
     the command in ambiguous cases. ([#3119])
 - The utilities `create_tree` and `ok_file_has_content` now support
   ".gz" files. ([#3049])
 - The Singularity container for 0.11.1 now uses [nd_freeze] to make
   its builds reproducible.
 - A [publications] page has been added to the documentation. ([#3099])
 - `GitRepo.set_gitattributes` now accepts a `mode` argument that
   controls whether the .gitattributes file is appended to (default) or
   overwritten. ([#3115])
 - `datalad --help` now avoids using `man` so that the list of
   subcommands is shown.  ([#3124])

* tag '0.11.2': (124 commits)
  Changelog entry for GIT_SSH_VARIANT change
  ENH: Declare our GIT_SSH_COMMAND as GIT_SSH_VARIANT=ssh
  BF: sshconnector: Don't use ssh's port flag as scp's
  RF: sshconnector: Simplify shlex quote import
  CHANGELOG(0.11.2): Fix some typos
  [DATALAD RUNCMD] CHANGELOG: Linkify 0.11.2 entries
  CHANGELOG: Do first pass for 0.11.2
  CHANGELOG: Add missing link target for download-url
  Start cooking the 0.11.2 release
  RF: appveyor - move test_install tests to be ran the last
  RF: text_type instead of str
  ENH(TST): provide my timing for the slow test
  BF(TST): adjust the test for the fact that AnnexRepo.add does not blow on nonexisting files
  ENH(TST): two tests which test quick or thorough for add failing with too long list of files
  BF: get stderr if present, otherwise just use str(e)
  BF: append out/err only if not empty/None
  Centrlize handling running commands with long files list in _run_command_files_split
  RF: remove minor duplication of -- handling, place all files handling closer to the call
  RF: move unrelated to try/except handling outside
  ENH+DOC: Report actual process handle, not just PID
  ...
@yarikoptic yarikoptic added this to the Release 0.11.2 milestone Feb 10, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

None yet

2 participants