Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

UX: If a clean repo is dirty after a failed run, give clean-up hints #6112

Merged
merged 2 commits into from
Dec 14, 2021

Conversation

adswa
Copy link
Member

@adswa adswa commented Oct 25, 2021

This is an implementation of #2924.
In case where a run command failed in a formerly clean repo, and the repo is
left in a dirty state, log general hints to use 'git clean' and 'git reset'
at INFO level.
Closes #2924.

@adswa adswa changed the base branch from master to maint October 25, 2021 12:11
@adswa adswa added the semver-internal Changes only affect the internal API label Oct 25, 2021
@codecov
Copy link

codecov bot commented Oct 25, 2021

Codecov Report

Merging #6112 (a41f4da) into maint (ece1cbc) will decrease coverage by 2.42%.
The diff coverage is 77.89%.

❗ Current head a41f4da differs from pull request most recent head 9256143. Consider uploading reports for the commit 9256143 to get more accurate results
Impacted file tree graph

@@            Coverage Diff             @@
##            maint    #6112      +/-   ##
==========================================
- Coverage   90.23%   87.80%   -2.43%     
==========================================
  Files         312      312              
  Lines       42219    42274      +55     
==========================================
- Hits        38095    37120     -975     
- Misses       4124     5154    +1030     
Impacted Files Coverage Δ
datalad/core/local/create.py 97.05% <ø> (-0.74%) ⬇️
datalad/core/local/tests/test_create.py 92.51% <ø> (-7.49%) ⬇️
datalad/distributed/ora_remote.py 31.28% <0.00%> (ø)
datalad/distribution/tests/test_install.py 99.79% <ø> (-0.21%) ⬇️
datalad/distribution/update.py 98.15% <ø> (-0.01%) ⬇️
datalad/downloaders/s3.py 57.36% <0.00%> (-0.59%) ⬇️
datalad/interface/rerun.py 95.83% <ø> (-0.66%) ⬇️
datalad/interface/results.py 96.92% <ø> (-0.03%) ⬇️
datalad/metadata/extractors/image.py 19.35% <0.00%> (-77.42%) ⬇️
datalad/metadata/extractors/tests/test_image.py 24.00% <ø> (-76.00%) ⬇️
... and 148 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update ece1cbc...9256143. Read the comment docs.

@bpoldrack
Copy link
Member

I know this is more about the issue you are trying to address than the PR itself, but:

That particular hint kind scares me! How can we possibly know whether it's okay to run git reset --hard let alone git clean with -dfx? This is assuming a lot with respect to untracked files, .gitignore, etc.
I don't think we should do that.

Rationale for suggesting those particular commands, @yarikoptic?

This is an implementation of datalad#2924.
In case where a run command failed in a formerly clean repo, and the repo is
left in a dirty state, log general hints to use 'git clean' and 'git reset'
at INFO level
@adswa
Copy link
Member Author

adswa commented Oct 26, 2021

I agree to Ben's notion that this could be dangerous, and the -x probably definitely should go. There is value in such a hint, I think (unlocked files left behind are certainly confusing, and a non-Git user would rarely know about reset or clean), but I'd also be fine with not at all hinting towards these commands and closing the original issue as a 'wont-fix'.

@bpoldrack
Copy link
Member

For completeness, a reproducer of undesired behavior:

❱ datalad create test                 
[INFO   ] Creating a new annex repo at /tmp/test 
[INFO   ] scanning for unlocked files (this may take some time) 
create(ok): /tmp/test (dataset)
(datalad-dev) ben@tree in /tmp
❱ cd test
(datalad-dev) ben@tree in /tmp/test on git:master
❱ echo some > afile                   
(datalad-dev) ben@tree in /tmp/test on git:master
❱ datalad save                        
add(ok): afile (file)                                                                                                                                                                                                           
save(ok): . (dataset)                                                                                                                                                                                                           
action summary:                                                                                                                                                                                                                 
  add (ok: 1)
  save (ok: 1)
(datalad-dev) ben@tree in /tmp/test on git:master
❱ touch untracked                     
(datalad-dev) ben@tree in /tmp/test on git:master
❱ echo ".git*\nuntracked" > .gitignore    
(datalad-dev) ben@tree in /tmp/test on git:master
❱ datalad run "echo new > newfile; exit 1"
[INFO   ] == Command start (output follows) ===== 
[INFO   ] == Command exit (modification check follows) ===== 
[INFO   ] The command had a non-zero exit code. If this is expected, you can save the changes with 'datalad save -d . -r -F .git/COMMIT_EDITMSG' 
[INFO   ] The commands 'git reset --hard' and/or 'git clean -dfx' could be used to get to a clean dataset state again. 
CommandError: 'echo new > newfile; exit 1' failed with exitcode 1 under /tmp/test
(datalad-dev) ben@tree in /tmp/test on git:master
❱ git clean -dfx                                                                                                                                                                                                            1 !
Removing .gitignore        <-----------------
Removing newfile
Removing untracked       <-----------------

@bpoldrack
Copy link
Member

I think pointing to git reset is fine, although wording should include the idea that it's to be run directly after - suggesting git reset --hard assumes that there are no other modifications apart from the ones caused by the run execution. There were none before, otherwise we would have failed earlier w/o --explicit - so, that's kinda fine to suggest, but modifications can come from elsewhere. And I think users tend to c&p such suggestions without much thinking about it. If they were aware what that really does/implies/assumes, they wouldn't need a hint to begin with.

@adswa
Copy link
Member Author

adswa commented Oct 26, 2021

It could become a more general hint. I struggle to find a short message*, but maybe sth like "The command failure may have left unintentional clutter behind. The commands 'git reset' and/or 'git clean' could be used to reset the dataset into a clean state again. Consult their man pages for more information"

@bpoldrack
Copy link
Member

The commands 'git reset' and/or 'git clean' could be used to reset the dataset into a clean state again. Consult their man pages for more information

Yes, I'd like something like this much better, than suggesting any particular thing to run that may or may not have side-effects under some circumstances.

@yarikoptic
Copy link
Member

That particular hint kind scares me! How can we possibly know whether it's okay to run git reset --hard let alone git clean with -dfx? This is assuming a lot with respect to untracked files, .gitignore, etc. I don't think we should do that.

Rationale for suggesting those particular commands, @yarikoptic?

;-) because they are the commands (well, ok -- may be clean without -x?) to get to the original clean state which run (without --explicit) had to be in to proceed (we fail to proceed if dirty, right?). "Mixed" reset (just git reset) wouldn't reset changes to the files, thus wouldn't be sufficient. And yes, I do appreciate the potential "danger" from running such commands without knowing what they do.

@bpoldrack
Copy link
Member

And yes, I do appreciate the potential "danger" from running such commands without knowing what they do.

Good, so we can likely agree on a wording that is a little less suggestive of blindly c&p'ing what datalad said. That's why I like @adswa's wording to point to the commands, but rather than saying "run this", "check this".

For the same reason (A full command given with a statement like "run this to achieve that"), I'd also not suggest -f for clean. If anything, then -i, so that someone c&p'ing w/o awareness would at least be asked for confirmation.
Ultimately, the point is, that - while those commands may be just right - they could also be very wrong. The only ones who can judge that, are users who know what (else) was done, why, what unsaved results may still matter, how git works, etc. Those are the ones who don't need such a hint. Everyone else will have a tendency to blindly trust that statement we make. So it better be something that's true in any case, not just in the assumed "usual case".

@adswa
Copy link
Member Author

adswa commented Nov 1, 2021

I have reworded the message to be a bit more careful

@adswa
Copy link
Member Author

adswa commented Nov 8, 2021

any opinions on this? We can vote:

  • 👎 close this PR altogether
  • 👍 merge with the current message
  • 😕 close this PR but make a note in Harmonize "hints" provided to the user #6114 to include a similar warning in an overhaul of how we present hints
  • 👀 reword the current warning further (please include a suggestion)

@bpoldrack
Copy link
Member

"The commands 'git reset --hard' and/or 'git clean -dfi' could be used to get to a clean dataset state again. Consult their man pages for more information."

Generally fine with that, but if we are to suggest concrete command options for the "usual/general" case, make it either git clean -dffi (two f) or git clean -di, I think. That's because the second f makes git clean consider currently untracked subdatasets for removal (but still ask b/c -i). Hence, either make a suggestion that allows to clean up everything or a suggestion that is particularly careful about not removing too much (no f - AFAIK there's no point in a single f if i is given).

So, my vote is on git clean -dffi, since creation of (to-be-added) subdatasets within a run call doesn't seem particularly outlandish to me.

@mih
Copy link
Member

mih commented Nov 10, 2021

Waiting for @bpoldrack to put conclusion into #6114, such that compliance of this PR can be evaluated

@bpoldrack
Copy link
Member

FTR: Wrote down the idea in #6114, but I don't think it makes sense to wait for an actual implementation.

So, apart from my preference for minimal rewording, can be merged from my POV.

@adswa adswa force-pushed the ux-2924 branch 2 times, most recently from f3e2149 to 4a97867 Compare December 8, 2021 12:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
semver-internal Changes only affect the internal API
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Give hints on how to revert/cleanup whenever "run" fails
4 participants