Reimplementation of drop/(uninstall)/remove #6111
Codecov Report
@@             Coverage Diff             @@
##           master    #6111       +/-   ##
===========================================
- Coverage   89.82%   54.07%   -35.75%
===========================================
  Files         317      318        +1
  Lines       42417    41835      -582
===========================================
- Hits        38100    22624    -15476
- Misses       4317    19211    +14894
21be7bb
to
1ef5cae
Compare
Good FS:
Crippled FS (hence by-passing the often inappropriate
3x slower, some test assumptions broken. |
Merge individual still-unique aspects into the new tests.
This type of subdataset removal is tested in `test_remove_subdataset_nomethod`, where it does not require network access.
RF all usage of deprecated commands and arguments.
Consolidated the tests and ported for cross-platform compatibility. Now approximately same speed and identical near-complete coverage: Good FS
Crippled FS:
Co-authored-by: Adina Wagner <adina.wagner@t-online.de>
Not a huge issue, since it's not destroying data, but I wonder whether the availability check for revisions is ideal. Don't immediately see a better solution, but someone else might.
Dropping a regular subdirectory w/
Similar: Dropping an already dropped subds doesn't return anything either. Shouldn't that be a Technically likely the same thing:
Doing this for this simple case is easy, for the general case not. How would you determine for any given input path whether it had absolutely no consequences -- keeping in mind:
The desire for this type of reporting originally led to Same here ...
How much time should we invest? An empty dir could be an uninstalled subdataset, or just an empty dir. A non-existing path could be an uninstalled subdataset, or not exist at all. Do we report notneeded for first-level uninstalled datasets? But why not for next-level? -- they also do not exist. So generating these inconsequential results always requires inspection, slows normal operations, and complicates the code, because we need to track numerous combinations of when a report qualifies as "having said something already". The current implementation reports notneeded whenever it is cheap, or when required for backward compatibility. But it tries hard to avoid runtime costs for beautification.
What are you proposing? That on drop, the superdataset is inspected for whether it can generate any candidate URL from which a repository can be cloned, which is then inspected for whether it has all the commits needed by the to-be-dropped subdataset?
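To make the bookkeeping cost mentioned above concrete, here is a minimal sketch of what "having said something already" tracking could look like. It is not taken from this PR: `with_notneeded` and `fake_drop` are hypothetical names, and the result dicts only mimic the `action`/`path`/`status` keys seen in the suggestion further down. Note that every result has to be matched against every input path.

```python
from typing import Callable, Dict, Iterable, Iterator, List

Result = Dict[str, str]


def with_notneeded(
    paths: List[str],
    drop_impl: Callable[[List[str]], Iterable[Result]],
) -> Iterator[Result]:
    """Pass through results of drop_impl, then report silent input paths."""
    reported = set()
    for res in drop_impl(paths):
        # Attribute each result to every input path that equals or contains it.
        # This is the per-result inspection that slows down normal operation.
        for p in paths:
            prefix = p.rstrip("/") + "/"
            if res["path"] == p or res["path"].startswith(prefix):
                reported.add(p)
        yield res
    # Only now do we know which input paths had no consequences at all.
    for p in paths:
        if p not in reported:
            yield dict(action="drop", path=p, status="notneeded")


def fake_drop(paths: List[str]) -> Iterator[Result]:
    # Hypothetical stand-in for the real work: only 'ds/sub' has content.
    for p in paths:
        if p == "ds/sub":
            yield dict(action="drop", path="ds/sub/file.dat", status="ok")


results = list(with_notneeded(["ds/sub", "ds/emptydir"], fake_drop))
```

Even this toy version cannot tell an empty directory from an uninstalled subdataset; it only knows that nothing was reported for the path.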
This feels good! I tried it out for a couple of different workflows/situations that I commonly used drop/uninstall/remove, and I have no complaints. :-) I just found the small typo in the docs for |
I might be wrong, but shouldn't such a case be discoverable right after determining (link doesn't work properly. Line 371 in that diff's Comment right above that Edit: Arguably, it's even |
# so we have paths constraints that prevent dropping the full dataset
# there is nothing to do here, but to drop keys, which we must not
# done
return
Something along these lines is what I mean. When do we end up here? AFAICS, when we have path(s) that are part of THIS dataset (as opposed to a subdataset) and we have `what='datasets'`. So, phrasing it like the comment ("constraining the drop of THIS dataset") is one way, but "trying to drop subdataset(s) where there are none" is another.
Maybe I just don't see how we can end up here in a way that this message would be misleading. I don't know.
return
for p in paths:
    yield dict(action="drop",
               path=p,
               status="impossible",
               message=f"what=datasets was given, but no dataset can be dropped from {p}",
               type="file")
return
I am afraid this is not correct. It is perfectly possible that the paths you are reporting on have been originally used to drop datasets underneath it. `impossible` would be wrong, because something might have been done already, and `file` is a guess.
> It is perfectly possible that the paths you are reporting on have been originally used to drop datasets underneath it.
Hm. I fail to produce that, but thanks for the hint. Trying to figure that out.
FTR: I'm testing my suggestion. As for this:
No - too expensive, obviously. As I said: No immediate idea. Just stumbled upon it. If you don't see a solution either - it's fine with me as is. I agree it's not worth making things more expensive.
Agree with the latter. Wouldn't phrase the former exactly like that. Via python interface I'd consider |
Conclusion from chat: Let's put the exploration and improvements possibly coming from @bpoldrack's comments in a separate PR. This one is already huge, and the suggested behavior was also not exhibited by the implementation that is being replaced here |
Sits on top of #6110 and will grow further. For now just a demo of the state for the curious.
Please help with naming! We need labels for
- `--what`
  - `filecontent`?
  - `git annex drop --auto`: `unwanted`?
  - `git annex drop --all`: `allkeys`?
  - `all`?
- `--reckless`
  - `availability`
  - `availability` (yes, same as above -- same recklessness, different subject only. One matters for file drop, the other for dataset uninstall), @adswa gave an example where the two should be selectable individually, though
  - `modification`
  - `git annex dead here` outcome: `undead`
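As a rough illustration of how the two label sets could interact, the sketch below validates a `what`/`reckless` pair and lists the safety checks that would still run. This is an assumption-laden toy, not the PR's code: `plan_drop`, `WHAT_LABELS`, `RECKLESS_LABELS`, and `SAFETY_CHECKS` are hypothetical names, the labels are the candidates floated in this thread, and the rule that each `reckless` label disables exactly one same-named check (with `kill` disabling all of them) is a simplification.

```python
from typing import Dict, List, Optional, Union

# Candidate labels collected from this thread; none of them is final.
WHAT_LABELS = ("filecontent", "unwanted", "allkeys", "datasets", "all")
RECKLESS_LABELS = ("modification", "availability", "undead", "kill")

# Assumed safety checks, one per non-kill recklessness label.
SAFETY_CHECKS = frozenset({"modification", "availability", "undead"})


def plan_drop(
    what: str = "filecontent",
    reckless: Optional[str] = None,
) -> Dict[str, Union[str, List[str]]]:
    """Return the validated mode and the checks that would still run."""
    if what not in WHAT_LABELS:
        raise ValueError(f"unknown what={what!r}")
    if reckless is not None and reckless not in RECKLESS_LABELS:
        raise ValueError(f"unknown reckless={reckless!r}")
    if reckless == "kill":
        skipped = SAFETY_CHECKS          # assumed: kill bypasses everything
    elif reckless is None:
        skipped = frozenset()            # default: run all checks
    else:
        skipped = frozenset({reckless})  # assumed: skip the same-named check
    return {"what": what, "checks": sorted(SAFETY_CHECKS - skipped)}


plan = plan_drop(what="all", reckless="availability")
```

A single `availability` label covers both subjects here (file content and dataset revisions); selecting them individually, as in the example from @adswa, would need two separate labels.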
Fixes uninstall must check whether a dataset has a non-local remote and all changes are pushed #1142
Fixes bring --kill back? #1282
Fixes remove --nocheck is not just "nocheck" but also "nodrop" #1823
Fixes -J support for datalad drop #1953
Fixes `remove` confuses #2229
Fixes --all (or just --git-annex-options?) for datalad drop #2328
Fixes `remove --if-dirty` default handling is misleading #2655
Fixes --no-drop for datalad remove #2673
Fixes Make uninstall uninstall top-level datasets #2967
Fixes remove() should announce `annex dead` #3887
Fixes Even remove cannot remove without using `-d` way to specify #4097
Fixes Just a suggestion: datalad remove (make default --if-dirty fail) #4115
Fixes remove: fails to remove in --recursive, demands --recursive option #4784
Fixes (Re)design of `remove` as `purge` #5842
Fixes Dataset.get_superdataset() calls subdatasets() with result renderer #6123
TODO:
- `drop` implementation
- `drop` implementation
- `--jobs` to `git annex drop` (-J support for datalad drop #1953)
- `uninstall`
- `drop --what all --reckless availability` we should even attempt to drop all keys. We could simply wipe out the repo, and be done (I feel like we discussed this before). already possible to bypass with `kill`
- `git annex dead here` be pushed to the default remote (if any), or all remotes? What if one of them fails? What if it needs a merge first? No! If, and only if, the existence of a local annex was ever communicated elsewhere (most would not), an error message must inform users what to do to communicate its pending death. But not automatically inside `drop`.
- `annex dead` #3887 Same no! An error message should inform about this, and instruct what to do (i.e. datalad push), but not do anything automatically inside `drop`
- `localsync()` before testing this, and then only report the corresponding branches. The adjusted ones are not meant to be pushed anyways.
- `kill`
- `uninstall` is a thin shim of a command around `drop`, old tests pass with minimal adjustments for increased safety
- `remove` as a thin wrapper around `drop` and `save`, and remove all old helper code.
- `remove(what=...)` to `remove(drop=...)`
- `AnnotatePaths` there (`publish` is built on it) Import AnnotatePaths datalad-deprecated#46
- `reckless=kill` should maybe be `kill-dataset`, it won't actually kill anything, when given a file path to drop, or actually affect file dropping too
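
The "thin wrapper around `drop` and `save`" idea from the TODO list can be sketched as below. This is only an illustration under stated assumptions: the `drop` and `save` arguments are hypothetical stand-in callables rather than the real DataLad commands, and stopping before `save` when a drop result reports `error` or `impossible` is a design choice of this sketch, not something confirmed by the PR.

```python
from typing import Callable, Dict, Iterable, Iterator

Result = Dict[str, str]


def remove(
    path: str,
    drop: Callable[[str], Iterable[Result]],
    save: Callable[[str], Iterable[Result]],
    message: str = "Removed content",
) -> Iterator[Result]:
    """Delegate to drop, and record the removal via save if drop succeeded."""
    for res in drop(path):
        yield res
        # Assumed policy: never save after a failed drop.
        if res["status"] in ("error", "impossible"):
            return
    yield from save(message)


def stub_drop(path: str) -> Iterator[Result]:
    # Hypothetical stand-in that always succeeds.
    yield dict(action="drop", path=path, status="ok")


def stub_save(message: str) -> Iterator[Result]:
    # Hypothetical stand-in that records the change in the dataset root.
    yield dict(action="save", path=".", status="ok", message=message)


results = list(remove("ds/sub", stub_drop, stub_save))
```

Because the wrapper holds no logic of its own beyond sequencing, all safety checks stay in one place (`drop`), which is the point of removing the old helper code.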