
ENH: clean - remove annex transfer directories #3374

Merged: 2 commits merged into datalad:0.11.x on May 16, 2019.

yarikoptic (Member):

Found a repository with a bunch of logs on which files failed to download:

```
(git)smaug:/mnt/btrfs/datasets/datalad/crawl/openneuro/ds000201[master]git
$> cat ./.git/annex/transfer/failed/download/d46236ce-1f9e-4216-9676-35f10fd6c553/MD5E-s131899529--67071615cc9e1041d0700547dd676ced.nii.gz
1543974478.691019986s
sub-9100/ses-1/func/sub-9100_ses-1_task-sleepiness_bold.nii.gz
```

I think it should be safe to remove them. A con of this approach is that it removes the top-level directories instead of listing the specific files, which could also be taken as a pro since there could be many ;)
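If someone does want the list of specific files before the directories are removed, the log in the session above suggests a simple way to recover it. The sketch below is a hypothetical helper, not part of DataLad or git-annex. It assumes, based only on that one example, that logs live under `.git/annex/transfer/failed/download/<remote-uuid>/<key>` and hold a timestamp on the first line and the worktree path on the second.

```python
from pathlib import Path
from typing import Iterator, Tuple


def list_failed_downloads(repo: Path) -> Iterator[Tuple[str, str]]:
    """Yield (annex key, recorded path) for each failed-download log.

    Hypothetical helper. The layout and file format are inferred from a
    single observed log: line 1 is a timestamp, line 2 is the path of
    the file that failed to download.
    """
    failed = repo / ".git" / "annex" / "transfer" / "failed" / "download"
    if not failed.is_dir():
        # Nothing recorded (or not an annex repository): yield nothing.
        return
    # One subdirectory per remote UUID, one log file per annex key.
    for log in sorted(failed.glob("*/*")):
        if not log.is_file():
            continue
        lines = log.read_text().splitlines()
        # Tolerate logs that lack the path line.
        path = lines[1] if len(lines) > 1 else ""
        yield log.name, path


if __name__ == "__main__":
    for key, path in list_failed_downloads(Path(".")):
        print(f"{path}\t{key}")
```

Run from the top of a dataset, this prints one tab-separated line per recorded failure. It only reads the logs; removing the directories is left to `clean`.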

Review thread on datalad/interface/clean.py (resolved):

```diff
 (ANNEX_TEMP_DIR, "annex-tmp",
-"temporary annex", ("file", "files")),
+"temporary annex", FILES_PLURAL),
 (ANNEX_TRANSFER_DIR, "annex-transfer",
```
kyleam (Contributor):

Shouldn't you add "annex-transfer" to the choices of the `what` option (or perhaps just classify this as "annex-tmp")?

yarikoptic (Member, Author):

ho ho -- thank you @kyleam for the thorough review! This gotcha shows again that duplication is evil, will RF a bit to avoid it! ;)
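The duplication here is between the table of cleanable locations (the diff above) and a separately maintained list of choices for `what`. A minimal sketch of one way to avoid it is to derive the choices from the table itself. The path values, `DIRS_PLURAL`, the "annex transfer" description, `make_parser`, and `clean` are hypothetical stand-ins, not DataLad's actual code; only the tuple shape follows the diff.

```python
import argparse
import os.path as op
import shutil

# Hypothetical stand-ins for the constants in the diff; the real values
# live in DataLad and may differ.
ANNEX_TEMP_DIR = op.join(".git", "annex", "tmp")
ANNEX_TRANSFER_DIR = op.join(".git", "annex", "transfer")
FILES_PLURAL = ("file", "files")
DIRS_PLURAL = ("directory", "directories")

# Single source of truth: (relative path, `what` label, description, nouns).
CLEANABLE = (
    (ANNEX_TEMP_DIR, "annex-tmp", "temporary annex", FILES_PLURAL),
    (ANNEX_TRANSFER_DIR, "annex-transfer", "annex transfer", DIRS_PLURAL),
)

# Choices are derived from the table, so adding a row cannot leave
# the command-line option out of sync.
WHAT_CHOICES = tuple(label for _, label, _, _ in CLEANABLE)


def make_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="clean-sketch")
    parser.add_argument("--what", action="append", choices=WHAT_CHOICES,
                        help="what to clean (default: everything)")
    return parser


def clean(repo_path, what=None):
    """Remove selected locations under repo_path; return removed labels.

    `what` of None (or empty) selects every row of CLEANABLE.
    """
    selected = set(what or WHAT_CHOICES)
    removed = []
    for relpath, label, _desc, _nouns in CLEANABLE:
        target = op.join(repo_path, relpath)
        if label in selected and op.isdir(target):
            # Whole directories go at once, matching the PR's approach of
            # removing top-level directories rather than listing files.
            shutil.rmtree(target)
            removed.append(label)
    return removed
```

With this layout, `make_parser().parse_args(["--what", "annex-transfer"])` accepts the new label without any separate edit to the option definition, and an unknown label is rejected by `argparse`.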

codecov bot commented May 16, 2019

Codecov Report

Merging #3374 into 0.11.x will increase coverage by 0.17%.
The diff coverage is 100%.


@@            Coverage Diff             @@
##           0.11.x    #3374      +/-   ##
==========================================
+ Coverage   90.84%   91.02%   +0.17%     
==========================================
  Files         252      255       +3     
  Lines       33127    33442     +315     
==========================================
+ Hits        30095    30440     +345     
+ Misses       3032     3002      -30

| Impacted Files | Coverage Δ | |
|---|---|---|
| `datalad/consts.py` | 100% <100%> (ø) | ⬆️ |
| `datalad/interface/clean.py` | 87.27% <100%> (ø) | ⬆️ |
| `datalad/ui/progressbars.py` | 83.1% <0%> (-3.04%) | ⬇️ |
| `datalad/ui/dialog.py` | 92.35% <0%> (-0.82%) | ⬇️ |
| `datalad/ui/__init__.py` | 97.67% <0%> (-0.06%) | ⬇️ |
| `datalad/interface/tests/test_save.py` | 100% <0%> (ø) | ⬆️ |
| `datalad/log.py` | 89.9% <0%> (ø) | ⬆️ |
| `datalad/ui/base.py` | 95.45% <0%> (ø) | ⬆️ |
| `datalad/downloaders/tests/test_credentials.py` | 100% <0%> (ø) | ⬆️ |

... and 14 more


Legend:
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 514545a...3ef68ec.

yarikoptic (Member, Author) left a comment:

Ok, refactoring could wait ;-) ready to be merged

kyleam merged commit 3ef68ec into datalad:0.11.x May 16, 2019
kyleam added a commit that referenced this pull request May 16, 2019
yarikoptic added a commit to yarikoptic/datalad that referenced this pull request May 21, 2019
* origin/0.11.x: (3256 commits)
  MNT: Avoid invalid escape sequences in strings
  BF: export_to_figshare: Don't test identity of string literal
  BF(TST): do not assume user naiveness - treat any url-like looking path as a path
  BF: Check for /, \ or # in the username@hostname part while detecting SSHRI
  CHANGELOG.md: Add entry for gh-3374
  BF: revert back (remove) check for path being PathRI
  BF: list annex-transfer also in cmdline opt choice for "what"
  RF: convert imports into "tupled lists" of what to import
  BF: stop relying on setup.py runtime environment using platform.dist
  CHANGELOG.md: Update for gh-3407
  RF: Set precedence to datalad.ui.color, NO_COLOR, datalad.ui.ui.is_interactive
  TEST: Test ANSI colors tools directly
  ENH: ls: Replace custom logic with color_word()
  RF: Move datalad.ui.color to common_cfg
  DOC: Add Zenodo
  NF: Add check for whether color is enabled
  RF: disable wrapt workaround
  CHANGELOG.md: Update for gh-3396
  DOC: log_progress: Reword cross-reference from 96b45b4
  ENH: adjust code comments
  ...
yarikoptic added a commit that referenced this pull request May 28, 2019
0.11.5 (May 23, 2019) -- stability is not overrated

Should be faster and less buggy, with a few enhancements.

Fixes

- [create-sibling][]  ([#3318][])
  - Siblings are no longer configured with a post-update hook unless a
    web interface is requested with `--ui`.
  - `git submodule update --init` is no longer called from the
    post-update hook.
  - If `--inherit` is given for a dataset without a superdataset, a
    warning is now given instead of raising an error.
- The internal command runner failed on Python 2 when its `env`
  argument had unicode values.  ([#3332][])
- The safeguard that prevents creating a dataset in a subdirectory
  that already contains tracked files for another repository failed on
  Git versions before 2.14.  For older Git versions, we now warn the
  caller that the safeguard is not active.  ([#3347][])
- A regression introduced in v0.11.1 prevented [save][] from committing
  changes under a subdirectory when the subdirectory was specified as
  a path argument.  ([#3106][])
- A workaround introduced in v0.11.1 made it possible for [save][] to
  do a partial commit with an annex file that has gone below the
  `annex.largefiles` threshold.  The logic of this workaround was
  faulty, leading to files being displayed as typechanged in the index
  following the commit.  ([#3365][])
- The resolve_path() helper mistook paths containing a colon for
  SSH RIs.  ([#3425][])
- The detection of SSH RIs has been improved.  ([#3425][])

Enhancements and new features

- The internal command runner was too aggressive in its decision to
  sleep.  ([#3322][])
- The "INFO" label in log messages now retains the default text color
  for the terminal rather than using white, which only worked well for
  terminals with dark backgrounds.  ([#3334][])
- A short flag `-R` is now available for the `--recursion-limit`
  option, which is shared by several subcommands.  ([#3340][])
- The authentication logic for [create-sibling-github][] has been
  revamped and now supports 2FA.  ([#3180][])
- New configuration option `datalad.ui.progressbar` can be used to
  configure the default backend for progress reporting ("none", for
  example, results in no progress bars being shown).  ([#3396][])
- A new progress backend, available by setting `datalad.ui.progressbar`
  to "log", replaces progress bars with a log message upon completion
  of an action.  ([#3396][])
- DataLad learned to consult the [NO_COLOR][] environment variable and
  the new `datalad.ui.color` configuration option when deciding to
  color output.  The default value, "auto", retains the current
  behavior of coloring output if attached to a TTY.  ([#3407][])
- [clean][] now removes annex transfer directories, which is useful
  for cleaning up failed downloads. ([#3374][])
- [clone][] no longer refuses to clone into a local path that looks
  like a URL, making its behavior consistent with `git clone`.
  ([#3425][])
- [wtf][]
  - Learned to fall back to the `distro` package if `platform.dist`,
    which has been removed in the yet-to-be-released Python 3.8, does
    not exist.  ([#3439][])
  - Gained a `--section` option for limiting the output to specific
    sections and a `--decor` option, which currently knows how to
    format the output as GitHub's `<details>` section.  ([#3440][])
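The color decision described in the [#3407][] entry above, combined with the precedence named in the commit "Set precedence to datalad.ui.color, NO_COLOR, datalad.ui.ui.is_interactive", can be sketched as a small pure function. This is a hypothetical illustration, not DataLad's code: the accepted values "on", "off", and "auto" are an assumption (only "auto" is named in the changelog), and the config value, environment, and TTY state are passed in explicitly so the logic is easy to test.

```python
import os
from typing import Mapping, Optional


def color_enabled(ui_color: str = "auto",
                  environ: Optional[Mapping[str, str]] = None,
                  is_interactive: bool = False) -> bool:
    """Decide whether to color output (hypothetical helper).

    Precedence: an explicit datalad.ui.color of "on" or "off" wins;
    with "auto", the presence of NO_COLOR disables color; otherwise
    color is used only when attached to an interactive terminal.
    """
    if environ is None:
        environ = os.environ
    if ui_color == "on":
        return True
    if ui_color == "off":
        return False
    # "auto": in this sketch, any NO_COLOR setting (even empty) disables color.
    if "NO_COLOR" in environ:
        return False
    return is_interactive
```

For example, `color_enabled("auto", {"NO_COLOR": "1"}, True)` returns `False`, while `color_enabled("on", {"NO_COLOR": "1"}, False)` returns `True` because the explicit configuration takes precedence.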

* tag '0.11.5': (96 commits)
  [DATALAD RUNCMD] make update-changelog
  Version boost and finalize CHANGELOG.md record
  ENH: new Makefile rule linkissues-changelog to link issues, which now will also be prerequisite for update-changelog
  CHANGELOG.md: Add entries for recently merged PRs
  ENH: require "distro" for python >= 3.8
  ENH: compat with python 3.8 which removed .dist -- try distro
  CLN: wtf: Remove unused (and duplicated) import
  DOC: wtf: Avoid double period in -S's description
  ENH: -D|--decor html_details -- to make it ready for pasting to github issue without clutter
  BF: assure bytes while giving to pyperclip upon its demand (on Py2)
  RF: move always present path + type "section" into "location" section, retain order of sections from cmdline
  RF: switch from nargs="*" to action=append for wtf -S
  ENH: wtf -S to specify which sections to query/display (by default -- all)
  MNT: Avoid invalid escape sequences in strings
  BF: export_to_figshare: Don't test identity of string literal
  BF(TST): do not assume user naiveness - treat any url-like looking path as a path
  BF: Check for /, \ or # in the username@hostname part while detecting SSHRI
  CHANGELOG.md: Add entry for gh-3374
  BF: revert back (remove) check for path being PathRI
  BF: list annex-transfer also in cmdline opt choice for "what"
  ...
yarikoptic added a commit that referenced this pull request May 28, 2019
* tag '0.11.5':
  BF: make test for url download more reliable in cases where connection fails
  RF: remove stale commented out duecredit in setup.py.  It has now its own section
yarikoptic added this to the 0.11.x milestone Jul 26, 2019