Skip to content

Change a key-value pair in drop result record #6625

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Apr 29, 2022
Merged

Conversation

mslw
Copy link
Contributor

@mslw mslw commented Apr 11, 2022

When used with what=(allkeys/datasets/all) and reckless != kill, drop created a result record in which a list of error messages coming from git-annex was placed in the field named error_messages. This led to the generic results renderer (which expects error_message and/or message instead) not showing the error message to the user, as described in #6577.

This PR rewrites the error messages into a string under res['error_message'] (if there are multiple messages, they are joined into one multiline string, following annexjson2result from datalad.interface.results). With that, results renderer behaves as expected.

Before (I'm mocking both note and error-messages produced by git-annex, although in the original issue the note was empty):

❱ datalad drop --what all -d bar --reckless availability
drop(error): . (key) [git-annex note here]

and after:

❱ datalad drop --what all -d bar --reckless availability
drop(error): . (key) [git-annex note here] [git-annex error message here]

Note. An alternative solution (taking #6577 literally) would be to equip the results renderer with the ability to understand a list of error messages, but I think the solution above is more in line with the rest of the code (error_messages is not used anywhere else, error-messages is, but usually to process git annex output, often converted into error_message as above), and good as an immediate UX fix.

Note 2. The added test is fairly specific - it tests directly for the thing I wanted to change (annex's error-messages being rewritten to result's error_message so that the renderer gets what it wants) but I'm not 100% sure if that's the thing to be tested in the long run.

Note 3. Should this PR be against master or maint?

TODO:

  • verify the old/new behaviour by temporarily patching the call to annex (_drop_allkeys / _call_annex_record_items) to always return an error
  • do the above in a formal test using patch context manager

Changelog

🐛 Bug Fixes

Replaced error_messages with error_message because the generic
results renderer (from datalad.interface.utils) uses error_message.

If there are multiple error messages, they will be joined into one
multi-line string (as done e.g. by annexjson2result from
datalad.interface.results).
@mslw mslw added the semver-patch Increment the patch version when merged label Apr 11, 2022
@@ -415,7 +415,9 @@ def _drop_dataset(ds, paths, what, reckless, recursive, recursion_limit, jobs):
res['message'] = message
error_messages = r.get('error-messages')
if error_messages:
res['error_messages'] = error_messages
res['error_message'] = '\n'.join(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Quick comment: I'd suggest +=, since both error_message and error_messages could be present. That may indicate there's a problem in the command implementation, but we'd probably not swallow information in that case.

Copy link
Contributor Author

@mslw mslw Apr 13, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure what you mean by += - to include both the original error_messages list and the error_message string in the res dictionary? Or to append the string to res['error_message]? The latter wouldn't matter, because res is instantiated with a dict(...) moments earlier.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I meant to append, but you are right - here it doesn't matter. However, r could have an error_message already that we ignore here. That may not currently being the case, but that can change whenever things change at a lower-level function.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looking at the code, indeed the error_message shouldn't end up being there. And indeed it might ;) if there is an assumption that it is not there:

Suggested change
res['error_message'] = '\n'.join(
assert 'error_message' not in res
res['error_message'] = '\n'.join(

to ensure that, so that if code changes, tests (if case is covered) would alert us that assumption is broken

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is esssentially saying that the following code:

d = dict(a=5)
# <imagine 7 lines of code with no reference to `d`>
d['b'] = 7

should become

d = dict(a=5)
# <imagine 7 lines of code with no reference to `d`>
assert 'b' not in d
d['b'] = 7

if this is to become a standard approach, we have to add a few thousand patch to the code base ASAP ;-)

The test expects that when calling drop with allkeys, potential error
messages reported by git-annex will be passed on to the drop's result
record (using 'error_message' key).
@codeclimate
Copy link

codeclimate bot commented Apr 13, 2022

Code Climate has analyzed commit cf2fd32 and detected 0 issues on this pull request.

View more on Code Climate.

@codecov
Copy link

codecov bot commented Apr 13, 2022

Codecov Report

Merging #6625 (cf2fd32) into master (c659a98) will increase coverage by 0.00%.
The diff coverage is 96.46%.

@@           Coverage Diff           @@
##           master    #6625   +/-   ##
=======================================
  Coverage   91.16%   91.17%           
=======================================
  Files         353      353           
  Lines       44438    44491   +53     
=======================================
+ Hits        40513    40564   +51     
- Misses       3925     3927    +2     
Impacted Files Coverage Δ
datalad/distribution/create_sibling.py 65.42% <ø> (ø)
datalad/support/sshconnector.py 87.19% <ø> (ø)
datalad/distribution/tests/test_create_sibling.py 77.80% <88.88%> (ø)
datalad/support/tests/test_annexrepo.py 97.87% <93.33%> (-0.05%) ⬇️
datalad/support/annexrepo.py 91.41% <96.87%> (+0.22%) ⬆️
datalad/support/gitrepo.py 90.81% <97.50%> (-0.16%) ⬇️
datalad/core/local/tests/test_save.py 97.97% <100.00%> (ø)
datalad/distributed/drop.py 97.68% <100.00%> (+0.46%) ⬆️
datalad/distributed/tests/test_drop.py 100.00% <100.00%> (ø)
datalad/tests/test_version.py 80.00% <100.00%> (ø)
... and 6 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 4aff4a2...cf2fd32. Read the comment docs.

@mslw mslw changed the base branch from master to maint April 13, 2022 16:16
@mslw mslw marked this pull request as ready for review April 13, 2022 16:17
@mslw mslw requested review from bpoldrack and mih April 13, 2022 16:18
# https://github.com/datalad/datalad/issues/6577
@with_tempfile
def test_drop_allkeys_result_contains_annex_error_messages(path):
# when calling drop with allkeys, expect git-annex error
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually: what=allkeys is irrelevant from my POV. This may be, where it currently happens, but the change should generally ensure that we get whatever comes from annex regardless.

Copy link
Member

@bpoldrack bpoldrack left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am kind of fine with this, but why not simply address what the issue suggests? Let generic_result_renderer also take error_messages into account? There may be more spots than just drop, where this could happen (now or in the future).

@bpoldrack
Copy link
Member

If you take my suggestion instead, then nothing changes in the result records - just in rendering. It can then go against maint, I think.

@@ -415,7 +415,9 @@ def _drop_dataset(ds, paths, what, reckless, recursive, recursion_limit, jobs):
res['message'] = message
error_messages = r.get('error-messages')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TL;DR: I think we would benefit from some AnnexRepo.annex_to_datalad_error @staticmethod helper or alike to be reused across all those uses.

uff... venting about my quality time to understand what we are talking about. We would benefit from typing the result record into some @dataclass and adding type checking... First I have spent time hunting for error-message (not error_message) to find none!

I did find "error-messages"
$> git grep -p "[\"']error-messages"   
CHANGELOG.md=only supported Python flavor.
CHANGELOG.md:  relay information from the "error-messages" field.  ([#3931][])
datalad/distributed/drop.py=def _drop_dataset(ds, paths, what, reckless, recursive, recursion_limit, jobs):
datalad/distributed/drop.py:            error_messages = r.get('error-messages')
datalad/distributed/tests/test_ora_http.py=def test_read_access(store_path, store_url, ds_path):
datalad/distributed/tests/test_ora_http.py:        assert_equal(r["error-messages"], [])
datalad/interface/results.py=def annexjson2result(d, ds, **kwargs):
datalad/interface/results.py:    if d.get('error-messages', None):
datalad/interface/results.py:        res['error_message'] = '\n'.join(m.strip() for m in d['error-messages'])
datalad/local/copy_file.py=def _place_filekey(finfo, str_src, dest, str_dest, dest_repo):
datalad/local/copy_file.py:                m for r in res for m in r.get('error-messages', [])),
datalad/runner/exception.py=def _format_json_error_messages(recs: List[Dict]):
datalad/runner/exception.py:            '\n'.join(r.get('error-messages', [])),
datalad/runner/tests/test_exception.py=def get_json_objects(object_count, message_count) -> List[Dict]:
datalad/runner/tests/test_exception.py:            "error-messages": [
datalad/support/annex_utils.py=def _fake_json_for_non_existing(paths, cmd):
datalad/support/annex_utils.py:             "error-messages": ["File unknown to git"]  # Note,
datalad/support/annexrepo.py=class AnnexRepo(GitRepo, RepoInterface):
datalad/support/annexrepo.py:                       for k in j if k != 'file' and k != 'error-messages'})
datalad/support/annexrepo.py:            if j.get('error-messages', None):
datalad/support/annexrepo.py:                rec['error_message'] = '\n'.join(m.strip() for m in j['error-messages'])
datalad/support/annexrepo.py:                    message='\n'.join(r['error-messages'])
datalad/support/annexrepo.py:                    if 'error-messages' in r else None,
datalad/support/annexrepo.py=class AnnexJsonProtocol(WitlessProtocol):
datalad/support/annexrepo.py:            for err_msg in action.pop('error-messages', []):
datalad/support/gitrepo.py=class GitRepo(CoreGitRepo):
datalad/support/gitrepo.py:                    message='\n'.join(r['error-messages'])
datalad/support/gitrepo.py:                    if 'error-messages' in r else None,
datalad/support/tests/test_annexrepo.py=def test_annex_drop(src, dst):
datalad/support/tests/test_annexrepo.py:    # CommandError has to pull the errors from the JSON record 'error-messages'
datalad/support/tests/test_annexrepo.py=def test_error_reporting(path):
datalad/support/tests/test_annexrepo.py:            'error-messages': ['File unknown to git'],
datalad/support/tests/test_annexrepo.py=def test_save_noperms(path):
datalad/support/tests/test_annexrepo.py:        assert_in('permission denied', res[0]['error-messages'][0])

confused, and re-reading spotted the difference in - vs _, so indeed there is error_message but really just few:

$> git grep -p "[\"']error_message[^s]"
datalad/interface/results.py=def get_status_dict(action=None, ds=None, path=None, type=None, logger=None,
datalad/interface/results.py:        d['error_message'] = error_message
datalad/interface/results.py=def annexjson2result(d, ds, **kwargs):
datalad/interface/results.py:        res['error_message'] = '\n'.join(m.strip() for m in d['error-messages'])
datalad/interface/utils.py=def generic_result_renderer(res):
datalad/interface/utils.py:                res['error_message'][0] % res['error_message'][1:]
datalad/interface/utils.py:                if isinstance(res['error_message'], tuple) else res[
datalad/interface/utils.py:                    'error_message']), ac.RED)
datalad/interface/utils.py:            if res.get('error_message', None) and res.get('status', None) != 'ok' else ''))
datalad/support/annexrepo.py=class AnnexRepo(GitRepo, RepoInterface):
datalad/support/annexrepo.py:                rec['error_message'] = '\n'.join(m.strip() for m in j['error-messages'])

and only then, looking at above duplications of conversions from error-messages to error_message that we are talking about duplicated handling of error-messages we receive from direct invocations of git-annex (now spread out through code base instead of handled uniformly within AnnexRepo.

Copy link
Contributor Author

@mslw mslw Apr 20, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for taking a closer look! Sorry if I didn't make it clearer about the spelling.

Let me just add that error_messages (underscore / plural) which started this issue, and which this PR changes into error_message, does not exist outside drop at all (below I'm looking at master, but usage in annexrepo has been since removed in maint), therefore I decided to change drop and not the renderer:

❱ git grep -p "[^_format_json_]error_messages"
datalad/distributed/drop.py=def _drop_dataset(ds, paths, what, reckless, recursive, recursion_limit, jobs):
datalad/distributed/drop.py:            error_messages = r.get('error-messages')
datalad/distributed/drop.py:            if error_messages:
datalad/distributed/drop.py:                res['error_messages'] = error_messages
datalad/support/annexrepo.py=class AnnexRepo(GitRepo, RepoInterface):
datalad/support/annexrepo.py:                       for k in j if k != 'file' and k != 'error_messages'})
datalad/support/annexrepo.py:            # change annex' `error_messages` into singular to match result

I think we would benefit from some AnnexRepo.annex_to_datalad_error @staticmethod helper or alike to be reused across all those uses

There is, in a way, annexjson2result in datalad/interface/results.py, which maybe could be reused?

Copy link
Member

@mih mih left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I approval for the current diff. It fixes a real problem, in a non-intrusive way. Many things could be done better, in many place and new issues should be filed. But here is a fix, it works, it is better than before. Nothing should block it for another two weeks.

@mih
Copy link
Member

mih commented Apr 28, 2022

Unless stopped, I will merge tomorrow morning CEST

@mih mih merged commit 3fc0f47 into datalad:maint Apr 29, 2022
@github-actions
Copy link

🚀 PR was released in 0.16.3 🚀

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
released semver-patch Increment the patch version when merged
Projects
None yet
Development

Successfully merging this pull request may close these issues.

generic_result_renderer should not ignore error_messages
4 participants