ENH: support of .gz in create_tree and ok_file_has_content + minor RF #3049

yarikoptic merged 4 commits into datalad:master from enh-gz on Dec 8, 2018



@yarikoptic yarikoptic commented Dec 3, 2018

This PR would be needed to facilitate testing of crawling portals with pure .gz files, e.g. #1967

additional ref: #1967
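
For illustration, here is a minimal sketch of what a ".gz"-aware content check could look like. The helper name `check_file_content` and its `decompress` parameter are hypothetical and only assumed for this example; this is not DataLad's actual `ok_file_has_content`.

```python
import gzip
import os
import tempfile


def check_file_content(path, expected, decompress=None):
    # Hypothetical helper: compare the content of `path` with `expected`.
    # When `decompress` is None, gzip handling is guessed from a ".gz" suffix.
    if decompress is None:
        decompress = path.endswith(".gz")
    opener = gzip.open if decompress else open
    # Read the bytes and leave the `with` block before comparing.
    with opener(path, "rb") as f:
        content = f.read()
    # Match the type of `expected` so str is compared with str.
    if isinstance(expected, str):
        content = content.decode("utf-8")
    assert content == expected, "%r != %r" % (content, expected)


# Usage: write a small gzipped file and verify its decompressed content.
tmpdir = tempfile.mkdtemp()
gz_path = os.path.join(tmpdir, "file.txt.gz")
with gzip.open(gz_path, "wb") as f:
    f.write(b"some content\n")
check_file_content(gz_path, "some content\n")
```

Guessing from the suffix keeps existing callers unchanged, while the explicit `decompress` argument lets a test force either behavior.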

kyleam approved these changes Dec 3, 2018

@kyleam kyleam left a comment


Member Author

@yarikoptic yarikoptic commented Dec 4, 2018

Oy, seemingly benign changes seem to have also broken the 2nd run on AppVeyor:

2nd run on appveyor fails
ERROR: datalad.interface.tests.test_run_procedure.test_configs
Traceback (most recent call last):
  File "C:\Miniconda35\envs\test-environment\lib\site-packages\datalad\tests\", line 435, in newfunc
    return t(*(arg + (d,)), **kw)
  File "C:\Miniconda35\envs\test-environment\lib\site-packages\datalad\interface\tests\", line 223, in test_configs
    ok_file_has_content(op.join(ds.path, 'fromproc.txt'), 'some_arg\n')
  File "C:\Miniconda35\envs\test-environment\lib\site-packages\datalad\tests\", line 418, in ok_file_has_content
    assert_equal(content, file_content, **kwargs)
  File "C:\Miniconda35\envs\test-environment\lib\unittest\", line 820, in assertEqual
    assertion_func(first, second, msg=msg)
  File "C:\Miniconda35\envs\test-environment\lib\unittest\", line 1193, in assertMultiLineEqual, standardMsg))
  File "C:\Miniconda35\envs\test-environment\lib\unittest\", line 665, in fail
    raise self.failureException(msg)
AssertionError: 'some_arg\n' != 'some_arg\r\n'
- some_arg
+ some_arg
?         +
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "C:\Miniconda35\envs\test-environment\lib\site-packages\nose\", line 198, in runTest
  File "C:\Miniconda35\envs\test-environment\lib\site-packages\datalad\tests\", line 877, in newfunc
    return func(*args, **kwargs)
  File "C:\Miniconda35\envs\test-environment\lib\site-packages\datalad\tests\", line 438, in newfunc
  File "C:\Miniconda35\envs\test-environment\lib\site-packages\datalad\", line 455, in rmtemp
    rmtree(f, *args, **kwargs)
  File "C:\Miniconda35\envs\test-environment\lib\site-packages\datalad\", line 368, in rmtree
    _rmtree(path, *args, **kwargs)
  File "C:\Miniconda35\envs\test-environment\lib\site-packages\datalad\", line 1766, in wrapped
    return f(*args, **kwargs)
  File "C:\Miniconda35\envs\test-environment\lib\site-packages\datalad\", line 1806, in _rmtree
    return shutil.rmtree(*args, **kwargs)
  File "C:\Miniconda35\envs\test-environment\lib\", line 488, in rmtree
    return _rmtree_unsafe(path, onerror)
  File "C:\Miniconda35\envs\test-environment\lib\", line 387, in _rmtree_unsafe
    onerror(os.rmdir, path, sys.exc_info())
  File "C:\Miniconda35\envs\test-environment\lib\", line 385, in _rmtree_unsafe
PermissionError: [WinError 32] The process cannot access the file because it is being used by another process: 'C:\\Users\\appveyor\\AppData\\Local\\Temp\\1\\datalad_temp_tree_79dbipoy'
Ran 22 tests in 72.728s

I guess it is that RF of open (093f8a0), since the commit after it was already failing, though it shouldn't have had such an effect... heh heh

Member Author

@yarikoptic yarikoptic commented Dec 4, 2018

I am really not sure how my changes brought that rogue \r into the output file and made the test fail on AppVeyor... weird! Note, though, that the test had this

# FIXME: For some reason fails to commit correctly if on windows and in direct
# mode. However, direct mode on linux works
@skip_if(cond=on_windows and cfg.obtain("datalad.repo.version") < 6)

preamble, and it does seem to run and pass on master... aha - I added

    if isinstance(content, text_type):
        file_content = assure_unicode(file_content)

to match the types for the comparison - maybe that is the "culprit" that made this issue appear... worth troubleshooting

On Windows, if any of the subsequent tests fail, the harness would fail
to remove the temporary directory since it would still be "busy". There is
no need to stay within the open context
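
The pattern described in this commit message can be sketched as follows: only the read happens while the file is open, and the comparison runs after the handle is closed. The helper name `read_file` and the file content are made up for the example; the actual change in the PR may differ.

```python
import os
import shutil
import tempfile


def read_file(path):
    # Keep the `with` block as small as possible: only the read happens
    # while the file is open.
    with open(path, "rb") as f:
        return f.read()


tmpdir = tempfile.mkdtemp()
path = os.path.join(tmpdir, "fromproc.txt")
with open(path, "wb") as f:
    f.write(b"some_arg\r\n")

try:
    # The handle is already closed when this comparison fails.
    assert read_file(path) == b"some_arg\n"
    comparison_failed = False
except AssertionError:
    comparison_failed = True
finally:
    # No open handle remains, so the temporary directory can be removed.
    shutil.rmtree(tmpdir)
```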

@codecov codecov bot commented Dec 4, 2018

Codecov Report

Merging #3049 into master will increase coverage by <.01%.
The diff coverage is 95.23%.


@@            Coverage Diff             @@
##           master    #3049      +/-   ##
+ Coverage   90.23%   90.24%   +<.01%     
  Files         246      246              
  Lines       32511    32521      +10     
+ Hits        29337    29347      +10     
  Misses       3174     3174
| Impacted Files | Coverage Δ | |
|---|---|---|
| datalad/ | 86.62% <100%> (+0.09%) | ⬆️ |
| datalad/tests/ | 96.33% <100%> (+0.03%) | ⬆️ |
| datalad/tests/ | 89.36% <92.85%> (-0.03%) | ⬇️ |
| datalad/support/tests/ | 91.83% <0%> (-0.22%) | ⬇️ |

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 5f69cfe...d110255. Read the comment docs.

Member Author

@yarikoptic yarikoptic commented Dec 4, 2018

Jesus Christ
I didn't know... I will just work around it: if os.linesep is not \n, I will replace it with \n
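
A minimal sketch of that workaround, assuming a hypothetical helper `normalize_linesep` (the name and the `linesep` parameter are not from the PR). One plausible source of the stray \r, not confirmed in this thread, is Python's default newline translation: a file written in text mode on Windows gets "\n" turned into "\r\n", and reading it back in binary mode exposes the "\r".

```python
import os


def normalize_linesep(text, linesep=os.linesep):
    # Hypothetical helper: map the platform line separator to "\n".
    # `linesep` is a parameter only so the behavior can be exercised
    # on any platform, not just Windows.
    if linesep != "\n":
        text = text.replace(linesep, "\n")
    return text


# Simulate Windows content regardless of where this runs.
print(repr(normalize_linesep("some_arg\r\n", linesep="\r\n")))
```

On platforms where os.linesep is already "\n", the helper returns the text unchanged, so the comparison in the test stays the same everywhere.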

@yarikoptic yarikoptic merged commit de68577 into datalad:master Dec 8, 2018
4 of 7 checks passed
@yarikoptic yarikoptic deleted the enh-gz branch Dec 11, 2018
yarikoptic added a commit that referenced this issue Feb 8, 2019
 A variety of bugfixes and enhancements

 ### Major refactoring and deprecations

 - All extracted metadata is now placed under git-annex by default.
   Previously files smaller than 20 kb were stored in git. ([#3109])
 - TODO: get_runner #3104 and pending #3131

 ### Fixes

 - Improved handling of long commands:
   - The code that inspected `SC_ARG_MAX` didn't check that the
     reported value was a sensible, positive number. ([#3025])
   - More commands that invoke `git` and `git-annex` with file
     arguments learned to split up the command calls when it is likely
     that the command would fail due to exceeding the maximum supported
     length. ([#3138])
 - The `setup_yoda_dataset` procedure created a malformed
   .gitattributes line. ([#3057])
 - [download-url] unnecessarily tried to infer the dataset when
   `--no-save` was given. ([#3029])
 - [rerun] aborted too late and with a confusing message when a ref
   specified via `--onto` didn't exist. ([#3019])
 - [run]:
   - `run` didn't preserve the current directory prefix ("./") on
      inputs and outputs, which is problematic if the caller relies on
      this representation when formatting the command. ([#3037])
   - Fixed a number of unicode py2-compatibility issues. ([#3035]) ([#3046])
   - To proceed with a failed command, the user was confusingly
     instructed to use `save` instead of `add` even though `run` uses
     `add` underneath. ([#3080])
 - Fixed a case where the helper class for checking external modules
   incorrectly reported a module as unknown. ([#3051])
 - [add-archive-content] mishandled the archive path when the leading
   path contained a symlink. ([#3058])
 - Following denied access, the credential code failed to consider a
   scenario, leading to a type error rather than an appropriate error
   message. ([#3091])
 - Some tests failed when executed from a `git worktree` checkout of the
   source repository. ([#3129])
 - During metadata extraction, batched annex processes weren't properly
   terminated, leading to issues on Windows. ([#3137])
 - [add] incorrectly handled an "invalid repository" exception when
   trying to add a submodule. ([#3141])
 - Pass `GIT_SSH_VARIANT=ssh` to git processes to be able to specify
   alternative ports in SSH urls

 ### Enhancements and new features

 - [search] learned to suggest closely matching keys if there are no
   hits. ([#3089])
 - [create-sibling] gained a `--group` option so that the caller can
   specify the file system group for the repository. ([#3098])
 - Interface classes can now override the default renderer for
   summarizing results. ([#3061])
 - [run]:
   - `--input` and `--output` can now be shortened to `-i` and `-o`.
   - Placeholders such as "{inputs}" are now expanded in the command
     that is shown in the commit message subject. ([#3065])
   - `` gained an `extra_inputs` argument so
     that wrappers like [datalad-container] can specify additional inputs
     that aren't considered when formatting the command string. ([#3038])
   - "--" can now be used to separate options for `run` and those for
     the command in ambiguous cases. ([#3119])
 - The utilities `create_tree` and `ok_file_has_content` now support
   ".gz" files. ([#3049])
 - The Singularity container for 0.11.1 now uses [nd_freeze] to make
   its builds reproducible.
 - A [publications] page has been added to the documentation. ([#3099])
 - `GitRepo.set_gitattributes` now accepts a `mode` argument that
   controls whether the .gitattributes file is appended to (default) or
   overwritten. ([#3115])
 - `datalad --help` now avoids using `man` so that the list of
   subcommands is shown.  ([#3124])

* tag '0.11.2': (124 commits)
  Changelog entry for GIT_SSH_VARIANT change
  BF: sshconnector: Don't use ssh's port flag as scp's
  RF: sshconnector: Simplify shlex quote import
  CHANGELOG(0.11.2): Fix some typos
  [DATALAD RUNCMD] CHANGELOG: Linkify 0.11.2 entries
  CHANGELOG: Do first pass for 0.11.2
  CHANGELOG: Add missing link target for download-url
  Start cooking the 0.11.2 release
  RF: appveyor - move test_install tests to be ran the last
  RF: text_type instead of str
  ENH(TST): provide my timing for the slow test
  BF(TST): adjust the test for the fact that AnnexRepo.add does not blow on nonexisting files
  ENH(TST): two tests which test quick or thorough for add failing with too long list of files
  BF: get stderr if present, otherwise just use str(e)
  BF: append out/err only if not empty/None
  Centrlize handling running commands with long files list in _run_command_files_split
  RF: remove minor duplication of -- handling, place all files handling closer to the call
  RF: move unrelated to try/except handling outside
  ENH+DOC: Report actual process handle, not just PID