ENH: clean - remove annex transfer directories #3374
Found a repository with a bunch of logs on which files failed to download:

    (git)smaug:/mnt/btrfs/datasets/datalad/crawl/openneuro/ds000201[master]git
    $> cat ./.git/annex/transfer/failed/download/d46236ce-1f9e-4216-9676-35f10fd6c553/MD5E-s131899529--67071615cc9e1041d0700547dd676ced.nii.gz
    1543974478.691019986s sub-9100/ses-1/func/sub-9100_ses-1_task-sleepiness_bold.nii.gz

I think it should be safe to remove them. A con of this approach is that it removes the top-level directories and so does not provide a list of specific files, which could also be taken as a pro since there could be many ;)
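To make the trade-off in the description concrete, here is a minimal sketch of removing the whole transfer directory while still being able to report how many log files it held. It is not DataLad's implementation: the function names are hypothetical, and the only thing taken from the description is the `.git/annex/transfer` location.

```python
import shutil
from pathlib import Path

# Location taken from the path shown in the description; the constant name
# and both helpers below are hypothetical, for illustration only.
ANNEX_TRANSFER_DIR = Path(".git") / "annex" / "transfer"


def list_transfer_logs(repo):
    """Return every file under the annex transfer directory of `repo`, sorted."""
    top = Path(repo) / ANNEX_TRANSFER_DIR
    if not top.is_dir():
        return []
    return sorted(p for p in top.rglob("*") if p.is_file())


def remove_transfer_dir(repo):
    """Remove the whole transfer directory and return how many files it held."""
    logs = list_transfer_logs(repo)
    top = Path(repo) / ANNEX_TRANSFER_DIR
    if top.is_dir():
        # Remove the top-level directory in one go rather than file by file.
        shutil.rmtree(top)
    return len(logs)
```

Counting before removal keeps the report short (one number instead of a long file list), which matches the "could be many" remark.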
(ANNEX_TEMP_DIR, "annex-tmp",
"temporary annex", ("file", "files")),
"temporary annex", FILES_PLURAL),
(ANNEX_TRANSFER_DIR, "annex-transfer",
Shouldn't you add "annex-transfer" to the `what` option (or perhaps just classify this as "annex-tmp")?
ho ho -- thank you @kyleam for the thorough review! This gotcha shows again that duplication is evil, will RF a bit to avoid it! ;)
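One way to avoid the duplication discussed above is to derive the accepted `what` values from the same table that drives the cleaning. The sketch below is an assumption about how that could look, not the refactoring that was actually done: the string paths stand in for `ANNEX_TEMP_DIR` and `ANNEX_TRANSFER_DIR`, the "annex transfer" description is invented for illustration, and `select_targets` and `describe` are hypothetical helpers.

```python
FILES_PLURAL = ("file", "files")

# One row per cleanable location: (directory, flag, description, (singular, plural)).
# The directory strings and the second row's description are assumptions; only
# the flags and "temporary annex" appear in the diff.
CLEAN_TARGETS = [
    (".git/annex/tmp", "annex-tmp", "temporary annex", FILES_PLURAL),
    (".git/annex/transfer", "annex-transfer", "annex transfer", FILES_PLURAL),
]

# Derived from the table, so a new row is automatically a valid choice.
WHAT_CHOICES = [flag for _, flag, _, _ in CLEAN_TARGETS]


def select_targets(what=None):
    """Return the table rows to clean; `what=None` selects every row."""
    if what is None:
        return list(CLEAN_TARGETS)
    unknown = set(what) - set(WHAT_CHOICES)
    if unknown:
        raise ValueError(f"unknown value(s) for what: {sorted(unknown)}")
    return [row for row in CLEAN_TARGETS if row[1] in what]


def describe(count, description, forms):
    """Build a summary message, picking the singular or plural noun form."""
    singular, plural = forms
    noun = singular if count == 1 else plural
    return f"Removed {count} {description} {noun}"
```

With this layout, forgetting to list "annex-transfer" among the command-line choices is no longer possible, because the choices are never written out separately.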
Codecov Report
@@            Coverage Diff             @@
##           0.11.x    #3374      +/-   ##
==========================================
+ Coverage   90.84%   91.02%   +0.17%
==========================================
  Files         252      255       +3
  Lines       33127    33442     +315
==========================================
+ Hits        30095    30440     +345
+ Misses       3032     3002      -30
Ok, refactoring could wait ;-) ready to be merged
* origin/0.11.x: (3256 commits)
  MNT: Avoid invalid escape sequences in strings
  BF: export_to_figshare: Don't test identity of string literal
  BF(TST): do not assume user naiveness - treat any url-like looking path as a path
  BF: Check for /, \ or # in the username@hostname part while detecting SSHRI
  CHANGELOG.md: Add entry for dataladgh-3374
  BF: revert back (remove) check for path being PathRI
  BF: list annex-transfer also in cmdline opt choice for "what"
  RF: convert imports into "tupled lists" of what to import
  BF: stop relying on setup.py runtime environment using platform.dist
  CHANGELOG.md: Update for dataladgh-3407
  RF: Set precedence to datalad.ui.color, NO_COLOR, datalad.ui.ui.is_interactive
  TEST: Test ANSI colors tools directly
  ENH: ls: Replace custom logic with color_word()
  RF: Move datalad.ui.color to common_cfg
  DOC: Add Zenodo
  NF: Add check for whether color is enabled
  RF: disable wrapt workaround
  CHANGELOG.md: Update for dataladgh-3396
  DOC: log_progress: Reword cross-reference from 96b45b4
  ENH: adjust code comments
  ...
0.11.5 (May 23, 2019) -- stability is not overrated

Should be faster and less buggy, with a few enhancements.

Fixes

- [create-sibling][] ([#3318][])
  - Siblings are no longer configured with a post-update hook unless a web interface is requested with `--ui`.
  - `git submodule update --init` is no longer called from the post-update hook.
  - If `--inherit` is given for a dataset without a superdataset, a warning is now given instead of raising an error.
- The internal command runner failed on Python 2 when its `env` argument had unicode values. ([#3332][])
- The safeguard that prevents creating a dataset in a subdirectory that already contains tracked files for another repository failed on Git versions before 2.14. For older Git versions, we now warn the caller that the safeguard is not active. ([#3347][])
- A regression introduced in v0.11.1 prevented [save][] from committing changes under a subdirectory when the subdirectory was specified as a path argument. ([#3106][])
- A workaround introduced in v0.11.1 made it possible for [save][] to do a partial commit with an annex file that has gone below the `annex.largefiles` threshold. The logic of this workaround was faulty, leading to files being displayed as typechanged in the index following the commit. ([#3365][])
- The `resolve_path()` helper confused paths that had a semicolon for SSH RIs. ([#3425][])
- The detection of SSH RIs has been improved. ([#3425][])

Enhancements and new features

- The internal command runner was too aggressive in its decision to sleep. ([#3322][])
- The "INFO" label in log messages now retains the default text color for the terminal rather than using white, which only worked well for terminals with dark backgrounds. ([#3334][])
- A short flag `-R` is now available for the `--recursion-limit` flag, a flag shared by several subcommands. ([#3340][])
- The authentication logic for [create-sibling-github][] has been revamped and now supports 2FA. ([#3180][])
- New configuration option `datalad.ui.progressbar` can be used to configure the default backend for progress reporting ("none", for example, results in no progress bars being shown). ([#3396][])
- A new progress backend, available by setting `datalad.ui.progressbar` to "log", replaces progress bars with a log message upon completion of an action. ([#3396][])
- DataLad learned to consult the [NO_COLOR][] environment variable and the new `datalad.ui.color` configuration option when deciding to color output. The default value, "auto", retains the current behavior of coloring output if attached to a TTY. ([#3407][])
- [clean][] now removes annex transfer directories, which is useful for cleaning up failed downloads. ([#3374][])
- [clone][] no longer refuses to clone into a local path that looks like a URL, making its behavior consistent with `git clone`. ([#3425][])
- [wtf][]
  - Learned to fall back to the `distro` package if `platform.dist`, which has been removed in the yet-to-be-released Python 3.8, does not exist. ([#3439][])
  - Gained a `--section` option for limiting the output to specific sections and a `--decor` option, which currently knows how to format the output as GitHub's `<details>` section. ([#3440][])

* tag '0.11.5': (96 commits)
  [DATALAD RUNCMD] make update-changelog
  Version boost and finalize CHANGELOG.md record
  ENH: new Makefile rule linkissues-changelog to link issues, which now will also be prerequisite for update-changelog
  CHANGELOG.md: Add entries for recently merged PRs
  ENH: require "distro" for python >= 3.8
  ENH: compat with python 3.8 which removed .dist -- try distro
  CLN: wtf: Remove unused (and duplicated) import
  DOC: wtf: Avoid double period in -S's description
  ENH: -D|--decor html_details -- to make it ready for pasting to github issue without clutter
  BF: assure bytes while giving to pyperclip upon its demand (on Py2)
  RF: move always present path + type "section" into "location" section, retain order of sections from cmdline
  RF: switch from nargs="*" to action=append for wtf -S
  ENH: wtf -S to specify which sections to query/display (by default -- all)
  MNT: Avoid invalid escape sequences in strings
  BF: export_to_figshare: Don't test identity of string literal
  BF(TST): do not assume user naiveness - treat any url-like looking path as a path
  BF: Check for /, \ or # in the username@hostname part while detecting SSHRI
  CHANGELOG.md: Add entry for gh-3374
  BF: revert back (remove) check for path being PathRI
  BF: list annex-transfer also in cmdline opt choice for "what"
  ...
0.11.5 (May 23, 2019) -- stability is not overrated

* tag '0.11.5':
  BF: make test for url download more reliable in cases where connection fails
  RF: remove stale commented out duecredit in setup.py. It has now its own section