
Return value concept #1218

Closed
mih opened this issue Jan 25, 2017 · 18 comments
Labels
question Issue asks a question rather than reporting a problem

Comments


mih commented Jan 25, 2017

discuss and redo the return value concept. It seems that we are moving towards a JSON-like concept where we would need to return a list of dicts that contain info on an item and the status/result of an action.

mih added the 'question' label on Jan 25, 2017

mih commented Jan 26, 2017

@yarikoptic favors a system where the return values reflect the end result rather than the completed process. That means, for example, that an identical re-invocation of install yields the same return values as the initial one, although nothing was actually installed on the second run. This paradigm makes it easy to write simple scripts that act upon the return values and work regardless of the initial state of the filesystem.

The downside of such an approach is that it becomes harder to not act on things that aren't new -- a use case that seems just as legitimate (e.g. kick off some processing for any new file in a dataset). Of course this could be done via some form of diff(), but so could the first case. Moreover, it can be rather difficult to anticipate what the results would have been initially. In order to mimic the output of a would-be git-annex get run, we would need to (re-implement and) perform its own tests (e.g. not reporting on files in git, ...).

The status quo is that there is absolutely no consensus: install does one thing, while for other commands (e.g. add_sibling) we even verify that they do not report anything if nothing was done.

One way forward would be a switch to more complex return values that indicate the kind of processing applied, as well as the status. Based on such return values, a result renderer could make a (configurable) decision about what kind of return value is desired. We could distinguish between 'direct' results (i.e. ones that correspond to an explicit input argument), 'indirect' results (i.e. ones that happened to also become results, e.g. files that git-annex reports because we asked it to get everything in a directory), and 'none' results (stuff that is there but wasn't a result of the invoked command, e.g. previously existing content).

I am not very confident that such a solution is easy to implement in a consistent fashion, though.
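To make the idea concrete, a result renderer could filter on such a category field. A hypothetical sketch (the 'relation' key and the category names are illustrative only, not actual DataLad API):

```python
# Hypothetical sketch of filtering results by category; the 'relation' key
# and its values are illustrative, not actual DataLad API.

def filter_results(results, categories=('direct',)):
    """Keep only results whose category is in the requested set."""
    return [r for r in results if r.get('relation') in categories]

results = [
    {'path': '/ds/file1', 'relation': 'direct'},    # explicit input argument
    {'path': '/ds/file2', 'relation': 'indirect'},  # came along with the request
    {'path': '/ds/file3', 'relation': 'none'},      # pre-existing content
]

print(filter_results(results))                         # direct results only
print(filter_results(results, ('direct', 'indirect'))) # direct and indirect
```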

@bpoldrack commented:

Well, I think we can have it all as far as the CLI is concerned. If the Python API returns detailed JSON-like dicts, we can have consistent entries for each item, indicating whether the item was indirectly addressed, whether it was actually processed, whether the state after the command "fulfills the promise", and so on. At that level we would have all the information. Then we would have (consistent) options for the result renderers to determine what information will actually be spit out.
ATM I think the consistency depends on whether we find fitting general categories for these options/entries.

@yarikoptic commented:

Maybe in line with what is suggested above, maybe not -- what if, somewhat in line with git-annex, we return/generate dicts with consistent keys: object, success, note (optional additional information), performed_actions? E.g. for install:

{ 'object': Dataset('path1'),
  'success': True,
  'performed_actions': ['install']
},
{ 'object': Dataset('path2'),
  'success': True,
  'performed_actions': []
},

Here, in the 2nd case, that dataset already existed. Although I am not sure I would like this to be returned by default, so we might indeed have it as an option for a call (again, similar to git-annex's --json).
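With such dicts, a script could act only on datasets that were actually installed on this run. A sketch (keys as proposed above; plain strings stand in for Dataset instances):

```python
# Sketch using the keys proposed above ('object', 'success',
# 'performed_actions'). Plain strings stand in for Dataset instances.

results = [
    {'object': 'path1', 'success': True, 'performed_actions': ['install']},
    {'object': 'path2', 'success': True, 'performed_actions': []},  # pre-existing
]

# act only on items for which 'install' was actually performed
newly_installed = [r['object'] for r in results
                   if r['success'] and 'install' in r['performed_actions']]
print(newly_installed)  # ['path1']
```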

@bpoldrack commented:

Basically in line, yes.
I'd just not make it too complicated: the Python API always returns that beast, and the result renderer takes options (which may include outputting the entire thing via --json) to determine the actual output.
Top-level implementations therefore don't need to care about these options, the return value within Python is reliable, and we have one place to form the output wrt options for the CLI.

@yarikoptic commented:

Just a note on the existing ActivityStats, which is used to summarize activities done by the crawler ATM. It might be extended/used to present stats for other actions as well.


mih commented Feb 17, 2017

Started playing with this. Here is my current concept:

  • each result is a dict, each command returns a sequence of results
  • we need to communicate the results to humans and to machines equally well: compose intelligible messages and raise proper exceptions/error codes from the output
  • strictly stick to built-in types to avoid hassle whenever we need to serialize things later on

each result has the following fields:

  • action: label what was done, e.g. 'update', 'install'
  • path: absolute path the result is talking about
  • type: label such as 'dataset', 'file', 'sibling'
  • status: label to describe the state of the result wrt the desired result. Labels:
    • 'OK': exactly what was desired
    • 'incomplete': not everything, but something
    • 'skipped': action not attempted
    • 'error': action attempted and errored
  • message: any annotation of the status targeting human consumption
  • error: label identifying an exception to be raised.
    This one is tricky, as our exceptions cannot be constructed from a message and a class name alone. Moreover, we would need a new exception type (CompoundException) whenever the type of error is not homogeneous across results -- that would complicate things (needlessly?)

Based on this information we would have generic result renderers that could be selected on a case-by-case basis (for scripting, JSON, debugging, ...). The renderer would also take care of raising the proper error depending on a general "mode" setting (fail on incomplete results, or not).
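A minimal sketch of how such results and the error "mode" could interact (field names as listed above; the control flow is illustrative only, not actual DataLad code):

```python
# Illustrative sketch of the proposed result fields and a failure "mode"
# that decides when to raise. Not actual DataLad code.

def process_results(results, on_failure='stop'):
    """Yield results; raise if a failure occurs and the mode demands it."""
    for res in results:
        yield res
        if on_failure == 'stop' and res['status'] in ('incomplete', 'error'):
            raise RuntimeError(
                '{action} failed for {path}: {message}'.format(**res))

results = [
    {'action': 'install', 'path': '/tmp/ds', 'type': 'dataset',
     'status': 'OK', 'message': 'installed'},
    {'action': 'get', 'path': '/tmp/ds/big.dat', 'type': 'file',
     'status': 'error', 'message': 'no remote has the content'},
]

# in a lenient mode all results are passed through; in 'stop' mode the
# error result triggers an exception
print(len(list(process_results(results, on_failure='ignore'))))  # 2
```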

I am skeptical about having performed_actions = [] within a result dict as suggested above. I would prefer each action to be its own dict; otherwise we would need to implement result merging within each command that uses another one inside and needs to relay its results. The initial result order should reflect the serial order of execution. Sorting and merging for the purpose of reporting can be done by renderers.

@bpoldrack @yarikoptic Yes?


mih commented Feb 20, 2017

We need to distinguish:

  • skipped, no need: nothing to do
  • skipped, prerequisites not met: wanted, but cannot

@bpoldrack commented:

Maybe we additionally need a status_detail. This could hold the reason for skipping, as well as additional details on 'incomplete' if available.


mih commented Feb 20, 2017 via email

@bpoldrack commented:

Well, success evaluation can consider both fields, so this isn't really a con. But I agree -- it's different from details on 'incomplete', for example.


mih commented Feb 21, 2017

To conclude: status needs four values:

  • ok: action performed, desired state reached
  • notneeded: action not attempted, desired state reached already
  • impossible: action not attempted, prerequisites not met
  • error: action attempted, desired state not reached

The first two do not lead to an error message/code; the latter two do.
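The success/failure split over these four values can be sketched as:

```python
# Sketch: the four proposed status values and the success/failure split.
SUCCESS = ('ok', 'notneeded')       # no error message/code
FAILURE = ('impossible', 'error')   # lead to an error message/code

def is_success(status):
    return status in SUCCESS

print([is_success(s) for s in ('ok', 'notneeded', 'impossible', 'error')])
# [True, True, False, False]
```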


mih commented Feb 21, 2017

My plan is to introduce a new decorator that implements result inspection, filtering, and "error generation", to gradually introduce this feature for commands that have been RF'ed to support it.
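Such a decorator might look roughly like this (a purely illustrative sketch, not the actual implementation; it assumes commands are generators yielding result dicts with a 'status' key):

```python
# Hypothetical sketch of a result-inspecting decorator; not the actual
# DataLad implementation. Assumes commands are generators yielding result
# dicts that carry a 'status' key.
from functools import wraps

def eval_results(func):
    @wraps(func)
    def wrapper(*args, on_failure='stop', result_filter=None, **kwargs):
        results = []
        for res in func(*args, **kwargs):
            if result_filter and not result_filter(res):
                continue  # drop results the caller is not interested in
            results.append(res)
            if on_failure == 'stop' and res['status'] == 'error':
                raise RuntimeError(res.get('message', 'command failed'))
        return results
    return wrapper

@eval_results
def fake_command():
    yield {'status': 'ok', 'path': 'a'}
    yield {'status': 'notneeded', 'path': 'b'}

print(fake_command(result_filter=lambda r: r['status'] == 'ok'))
```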

@bpoldrack commented:

Sounds good to me.

@yarikoptic commented:

Sorry, I missed why incomplete was sacrificed?
It is largely an error, so maybe to ease analysis we could prepend all success ones with ok_ and all others with error_, or something like that?


mih commented Feb 21, 2017

"incomplete" is something you infer from the list of results. Any occurance of 'impossible', or 'error' implies incomplete.

@yarikoptic commented:

I request one file with get, which gets interrupted -- how do I infer incomplete? But maybe we don't really need that level of detail... Or do we?


mih commented Feb 21, 2017

You request one file, you did not get it (the status for that get action and this file is error) -> any(status in ('impossible', 'error') for status in statuses) -> True -> incomplete.
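As a sketch:

```python
# Sketch: inferring "incomplete" from the statuses of a result list.
def incomplete(statuses):
    return any(s in ('impossible', 'error') for s in statuses)

print(incomplete(['ok', 'error']))      # True
print(incomplete(['ok', 'notneeded']))  # False
```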


mih commented Mar 9, 2017

Closing this now. Much more info and code is in #1350 -- no way back.

mih closed this as completed Mar 9, 2017