NF: AnnexRepo.call_annex*() #5163
af19b78 to 9057546
Codecov Report
@@ Coverage Diff @@
## master #5163 +/- ##
==========================================
- Coverage 90.00% 89.72% -0.28%
==========================================
Files 301 301
Lines 42426 42413 -13
==========================================
- Hits 38184 38054 -130
- Misses 4242 4359 +117
There are failures in the
Yes, saw that. Thx! I wanted to be clever, but wasn't. Will revert to the prev. setup.
e8523a7 to 21fff19
This series looks really nice. I'm pretty wary about changing the behavior of things like
$ python -c "from datalad.support.annexrepo import AnnexRepo; ar = AnnexRepo('.'); from pprint import pprint; pprint(ar.fsck([]))"
[{'command': 'fsck',
'dead': [],
'error-messages': [' Only 1 of 3 trustworthy copies exist of one',
' Back it up with git-annex copy.'],
'file': 'one',
'input': ['one'],
'key': 'SHA256E-s4--2c8b08da5ce60398e1f19af0e5dccc744df274b826abe585eaba68c525434806',
'note': 'checksum...',
'success': False,
'untrusted': []}]
$ python -c "from datalad.support.annexrepo import AnnexRepo; ar = AnnexRepo('.'); from pprint import pprint; pprint(ar.fsck([]))"
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/home/kyle/src/python/datalad/datalad/support/annexrepo.py", line 2708, in fsck
git_options=git_options,
File "/home/kyle/src/python/datalad/datalad/support/annexrepo.py", line 1073, in _call_annex_records
raise e
File "/home/kyle/src/python/datalad/datalad/support/annexrepo.py", line 999, in _call_annex_records
**kwargs,
File "/home/kyle/src/python/datalad/datalad/support/annexrepo.py", line 971, in _call_annex
**kwargs)
File "/home/kyle/src/python/datalad/datalad/cmd.py", line 518, in run
**results,
datalad.support.exceptions.CommandError: CommandError: 'git annex fsck --json --json-error-messages -c annex.dotfiles=true -c annex.retry=3' failed with exitcode 1 under /tmp/po-OUCQPk6 [err: 'git-annex: fsck: 1 failed'] [info keys: stdout_json]
Based on the added callers aside from
Yes, I am uncertain about the best approach here, too. But if we take #5048 seriously and exercise it in full, all these methods must raise an exception. A few things that I know about this so far:
I have not settled on a good place to put such a kludge. I think most error handling that is still scattered around should move into
Yes, that matches my POV. Thx for your thoughts!
If we were starting fresh, I would probably share that view, though I think always raising could be a bit awkward. For example, it seems very likely that a caller of As for whether it's worth changing the behavior now, I don't know.
Hmm, I don't see a better alternative than attaching them to the exception, so the only improvement that comes to mind is tweaking
Yeah, it's of course easy for me to say that we should, but how to go about it does seem tricky...
How's the switch flipped in this case? Above you say core code shouldn't use it (and I agree, our code base shouldn't be emitting our own warnings), but if it's just a parameter on one of the I haven't come up with any workable ideas that wouldn't be essentially adding a parameter to each top-level method that calls I think a real alternative is to say the consistency isn't worth it at this point (in terms of coming up with the approach or making third-party code update their scripts), unless it's carried in separately with something like new core repo classes. Anyway, that is of course all just my two cents. Thanks for all of your work on this.
Trying to get back to this project. Pushed a rebase on present master and addressed two of @kyleam's more recent comments.
We discussed what the repo methods' error behavior should be (relative to the high-level API and considering established usage). However, we failed to document the consensus here or in the meeting notes. Does anyone remember?
I think the consensus was something along the lines of "most of these repo methods should raise exceptions and that they don't can be considered a bug; in the few methods where we think inspection of the output is the main goal (perhaps things like whereis and fsck [*]), specific methods can catch and retain the current behavior". [*] I don't think there was agreement about whether these shouldn't raise.
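The consensus above could look roughly like the following sketch. It is not DataLad code: `CommandError` is a simplified stand-in for `datalad.support.exceptions.CommandError`, and `call_annex_records()` and `fsck_records()` are hypothetical names fed with canned data. The only detail taken from this thread is that the parsed JSON records travel on the exception under `kwargs['stdout_json']`.

```python
class CommandError(RuntimeError):
    """Simplified stand-in for datalad.support.exceptions.CommandError."""

    def __init__(self, cmd, code, **kwargs):
        super().__init__(f"{cmd!r} failed with exitcode {code}")
        self.cmd = cmd
        self.code = code
        # Extra information, e.g. parsed JSON records under 'stdout_json'
        self.kwargs = kwargs


def call_annex_records(args, records, exitcode):
    """Hypothetical low-level helper: always raise on a non-zero exit.

    `records` and `exitcode` are canned here; real code would obtain
    them by actually running git-annex.
    """
    if exitcode != 0:
        raise CommandError("git annex " + " ".join(args), exitcode,
                           stdout_json=records)
    return records


def fsck_records(records, exitcode):
    """Inspection-style method: keep returning records even on failure."""
    try:
        return call_annex_records(["fsck", "--json"], records, exitcode)
    except CommandError as e:
        recovered = e.kwargs.get("stdout_json")
        if recovered is None:
            # Nothing to inspect, so this is a genuine failure: propagate it
            raise
        return recovered


failed = [{"command": "fsck", "file": "one", "success": False}]
print(fsck_records(failed, exitcode=1))
```

In this arrangement the raising behavior lives in one place, and only the few inspection-oriented methods opt out of it explicitly.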
f2e90bd to bd7ae00
test_openfmri_pipeline2() fails in datalad/datalad#5163 (NF: AnnexRepo.call_annex*()) because get() now raises an exception if the git-annex-get call fails. https://github.com/datalad/datalad/runs/1517140012
Hopefully all of the remaining travis failures are addressed by the last push, and the -crawler failure should be covered by datalad/datalad-crawler#87.
I love you too!
Re potential
Given this assessment, I concur with @kyleam that going with I would not touch As a next step it could make sense to formalize the progress reporting request (
And avoid custom calls and other workaround identified in dataladgh-4406.
There wasn't any benefit anymore and this way another normalize_paths usage can be avoided. The big question is whether the implemented approach is a good general paradigm for achieving the desired behavior of having low-level methods raise (see dataladgh-5048), while higher-level commands must abide by the `on_failure` instructions.
…nks obsolete The latter is now candidate for removal (last usage in _git_custom_command()) Thx @kyleam for the hint!
This should cover the remaining test failures on Travis: https://travis-ci.com/github/datalad/datalad/builds/207246916
From a more specific _call_annex_records()
There is no use case, so far, for a need of a different protocol vs. improving AnnexJsonProtocol.
Blind attempt, this is not tested on my machine, and seemingly untested in our CIs too.
I believe this is ready.
I believe this was addressed a few commits back (RF: Move generic error detection into _call_annex()). [ci skip]
Great, thanks for the updates. Reading through again (not as carefully), I only spotted two minor docstring/comments issues (push incoming).
RF: Replace last _run_annex_command_json() call
Blind attempt, this is not tested on my machine, and seemingly untested
in our CIs too.
Thanks. This codepath is only triggered with fake dates. (I should add a test case that patches DATALAD_FAKE__DATES.) Anyway, your blind substitutions look good and pass on my machine under DATALAD_FAKE__DATES=1.
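A test that patches `DATALAD_FAKE__DATES` might be sketched as follows. This only illustrates the patching technique with the standard library: `fake_dates_enabled()` and `selected_code_path()` are hypothetical helpers that read the environment directly, whereas DataLad itself resolves the setting through its own configuration handling.

```python
import os
from unittest.mock import patch

_TRUTHY = ("1", "true", "yes", "on")


def fake_dates_enabled():
    """Hypothetical check that reads the environment variable directly."""
    return os.environ.get("DATALAD_FAKE__DATES", "").lower() in _TRUTHY


def selected_code_path():
    """Batch commands are avoided when fake dates are enabled."""
    return "non-batch" if fake_dates_enabled() else "batch"


def test_both_code_paths():
    # patch.dict restores os.environ when each `with` block exits
    with patch.dict(os.environ, {"DATALAD_FAKE__DATES": "1"}):
        assert selected_code_path() == "non-batch"
    with patch.dict(os.environ):
        os.environ.pop("DATALAD_FAKE__DATES", None)
        assert selected_code_path() == "batch"


test_both_code_paths()
```

Patching the variable inside the test keeps both the batch and non-batch paths covered regardless of how the CI environment happens to be configured.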
The --key functionality added in 4591591 (NF: addurls: Support creating content-less tree from known checksums, 2020-11-25) has two code paths, batch and non-batch, because batch commands are avoided when fake dates are enabled [1]. However, as pointed out by @mih in dataladgh-5163, the non-batch code path is not covered in the CI runs. Extend test_addurls_from_key() to cover datasets created with fake_dates=True. [1] 4d7ada2 (RF: annexrepo: Disable batching when fake dates are enabled, 2018-04-13)
Alrighty. This took a long time, but I think the outcome is quite nice and pleasantly mirrors the
As of 46d78a8 (2020-11-18), AnnexRepo.get() uses _call_annex_records(), which unlike _run_annex_command_json(), raises a CommandError when git-annex exits with a non-zero status (dataladgh-5163). Update `datalad get` to catch the exception, extract the json records, and yield them as results. Thanks to @adswa for the test case. Fixes datalad#5260.
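The catch-and-yield pattern described in this commit message can be sketched as below. This is an illustration under assumptions, not the actual `datalad get` implementation: `CommandError` is a simplified stand-in, `annex_get()` is a hypothetical function that fakes a failure for the path `"b"`, and the result dictionaries carry only a few illustrative keys.

```python
class CommandError(RuntimeError):
    """Simplified stand-in for datalad.support.exceptions.CommandError."""

    def __init__(self, msg, **kwargs):
        super().__init__(msg)
        self.kwargs = kwargs


def annex_get(paths):
    """Hypothetical stand-in for a raising get(); path 'b' always fails."""
    records = [{"command": "get", "file": p, "success": p != "b"}
               for p in paths]
    if not all(r["success"] for r in records):
        raise CommandError("git-annex get failed", stdout_json=records)
    return records


def get_results(paths):
    """Yield one result per record, whether or not the call raised."""
    try:
        records = annex_get(paths)
    except CommandError as e:
        records = e.kwargs.get("stdout_json")
        if not records:
            # No per-file records to report: re-raise the original error
            raise
    for rec in records:
        yield {
            "action": "get",
            "path": rec["file"],
            "status": "ok" if rec["success"] else "error",
        }


for res in get_results(["a", "b"]):
    print(res)
```

Because failures arrive as ordinary results with an error status, the caller's generic result handling can decide whether to stop or continue.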
Analogous to GitRepo.call_git*() and a requirement for #5152
Issues touched upon:
run_annex* API in AnnexRepo #5161
jobs=auto #5167

TODO
drop command (and maybe helpers) to anticipate command error when calling AnnexRepo.drop(). ATM @mih thinks that calling AnnexRepo._call_annex_records(['drop']) is cleaner than going through the various hoops that AnnexRepo.drop() implies.
_annex_custom_command() -> AnnexRepo._annex_custom_command() is being removed datalad-crawler#84
_annex_custom_command()
_run_*annex_command*()
I can only see a single use outside -core:
_call_annex_records()
CommandError.kwargs['stdout_json']
_call_annex_records() into _call_annex()
Runner and make call_annex_records() a call_annex_records_() right away?