
Return value concept #1218

Closed
mih opened this issue Jan 25, 2017 · 18 comments
Labels
question Issue asks a question rather than reporting a problem

Comments


mih commented Jan 25, 2017

discuss and redo the return value concept. It seems that we are moving towards a JSON-like concept where we would need to return a list of dicts that contain info on an item and the status/result of an action.

mih added the 'question' label on Jan 25, 2017

mih commented Jan 26, 2017

@yarikoptic favors a system where the return values reflect the end result rather than the completed process. That means, for example, that an identical re-invocation of install yields the same return values as the initial one, although nothing was actually installed on the second run. This paradigm makes it easy to write simple scripts that act upon the return values and work regardless of the initial state of the filesystem.

The downside of such an approach is that it becomes harder to not act on things that aren't new -- a use case that seems just as legitimate (e.g. kick off some processing for any new file in a dataset). Of course this could be done via some form of diff(), but so could the first case. Moreover, it can be rather difficult to anticipate what the results would have been initially. In order to mimic the output of a would-be git-annex get run, we would need to (re-implement and) perform its own tests (e.g. not reporting on files in git, ...).

The status quo is that there is absolutely no consensus: install does one thing, while for other commands (e.g. add_sibling) we even verify that they do not report anything if nothing was done.

One way forward would be a switch to more complex return values that indicate the kind of processing applied, as well as the status. Based on such return values, a result renderer could make a (configurable) decision about what kind of return value is desired. We could distinguish between 'direct' results (i.e. ones that correspond to an explicit input argument), 'indirect' results (i.e. ones that happened to also become results, e.g. files that git-annex reports because we asked it to get everything in a directory), and 'none' results (stuff that is there but wasn't a result of the invoked command, e.g. previously existing content).

I am not very confident that such a solution is easy to implement in a consistent fashion, though.
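To make the idea concrete, a result renderer could filter on such a category field. A hypothetical sketch (the 'relation' key and the category names are illustrative only, not actual DataLad API):

```python
# Hypothetical sketch of filtering results by category; the 'relation' key
# and its values are illustrative, not actual DataLad API.

def filter_results(results, categories=('direct',)):
    """Keep only results whose category is in the requested set."""
    return [r for r in results if r.get('relation') in categories]

results = [
    {'path': '/ds/file1', 'relation': 'direct'},    # explicit input argument
    {'path': '/ds/file2', 'relation': 'indirect'},  # came along with the request
    {'path': '/ds/file3', 'relation': 'none'},      # pre-existing content
]

print(filter_results(results))                         # direct results only
print(filter_results(results, ('direct', 'indirect'))) # direct and indirect
```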

@bpoldrack commented:

Well, I think we can have it all as far as the CLI is concerned. If the Python API returns detailed JSON-like dicts, we can have consistent entries for each item, indicating whether the item was indirectly addressed, whether it was actually processed, whether the state after the command "fulfills the promise", and so on. At that level we would have all the information. Then we would have (consistent) options for the result renderers to determine what information will actually be spit out.
ATM I think the consistency depends on whether we find fitting general categories for these options/entries.

@yarikoptic commented:

Maybe in line with what is suggested above, maybe not -- what if, somewhat in line with git-annex, we return/generate dicts with consistent keys: object, success, note (optional additional information), performed_actions? E.g. for install:

{ 'object': Dataset('path1'),
  'success': True,
  'performed_actions': ['install']
},
{ 'object': Dataset('path2'),
  'success': True,
  'performed_actions': []
},

Here, in the 2nd case, that dataset already existed. Although I am not sure I would like this to be returned by default, so we might indeed have it as an option for a call (again, similar to git-annex's --json).
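With such dicts, a script could act only on datasets that were actually installed on this run. A sketch (keys as proposed above; plain strings stand in for Dataset instances):

```python
# Sketch using the keys proposed above ('object', 'success',
# 'performed_actions'). Plain strings stand in for Dataset instances.

results = [
    {'object': 'path1', 'success': True, 'performed_actions': ['install']},
    {'object': 'path2', 'success': True, 'performed_actions': []},  # pre-existing
]

# act only on items for which 'install' was actually performed
newly_installed = [r['object'] for r in results
                   if r['success'] and 'install' in r['performed_actions']]
print(newly_installed)  # ['path1']
```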

@bpoldrack commented:

Basically in line, yes.
I'd just not make it too complicated: the Python API always returns that beast, and the result renderer takes options (which may include outputting the entire thing via --json) to determine the actual output.
Top-level implementations therefore don't need to care about these options, the return value within Python is reliable, and we have one place to form the output wrt options for the CLI.

@yarikoptic commented:

Just a note on the existing ActivityStats, which is used to summarize activities done by the crawler ATM. It might be extended/used to present stats for other actions as well.


mih commented Feb 17, 2017

Started playing with this. Here is my current concept:

  • each result is a dict, each command returns a sequence of results
  • we need to communicate the results to humans and to machines equally well: compose intelligible messages and raise proper exceptions/error codes from the output
  • strictly stick to built-in types to avoid hassle whenever we need to serialize things later on

each result has the following fields:

  • action: label what was done, e.g. 'update', 'install'
  • path: absolute path the result is talking about
  • type: label such as 'dataset', 'file', 'sibling'
  • status: label to describe the state of the result wrt the desired result. Labels:
    • 'OK': exactly what was desired
    • 'incomplete': not everything, but something
    • 'skipped': action not attempted
    • 'error': action attempted and errored
  • message: any annotation of the status targeting human consumption
  • error: label identifying an exception to be raised.
    This one is tricky, as our exceptions cannot be constructed from a message and a class name alone. Moreover, we would need a new exception type (CompoundException) whenever the type of error is not homogeneous across results -- that would complicate things (needlessly?)

Based on this information we would have generic result renderers that could be selected on a case-by-case basis (for scripting, JSON, debugging, ...). The renderer would also take care of raising the proper error depending on a general "mode" setting (fail on incomplete results, or not).
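A minimal sketch of how such results and the error "mode" could interact (field names as listed above; the control flow is illustrative only, not actual DataLad code):

```python
# Illustrative sketch of the proposed result fields and a failure "mode"
# that decides when to raise. Not actual DataLad code.

def process_results(results, on_failure='stop'):
    """Yield results; raise if a failure occurs and the mode demands it."""
    for res in results:
        yield res
        if on_failure == 'stop' and res['status'] in ('incomplete', 'error'):
            raise RuntimeError(
                '{action} failed for {path}: {message}'.format(**res))

results = [
    {'action': 'install', 'path': '/tmp/ds', 'type': 'dataset',
     'status': 'OK', 'message': 'installed'},
    {'action': 'get', 'path': '/tmp/ds/big.dat', 'type': 'file',
     'status': 'error', 'message': 'no remote has the content'},
]

# in a lenient mode all results are passed through; in 'stop' mode the
# error result triggers an exception
print(len(list(process_results(results, on_failure='ignore'))))  # 2
```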

I am skeptical about having performed_actions = [] within a result dict as suggested above. I would prefer each action to be its own dict; otherwise we would need to implement result merging within each command that uses another one inside and needs to relay its results. The initial result order should reflect the serial order of execution. Sorting and merging for the purpose of reporting can be done by renderers.

@bpoldrack @yarikoptic Yes?


mih commented Feb 20, 2017

We need to distinguish:

  • skipped, no need: nothing to do
  • skipped, prerequisites not met: wanted, but cannot

@bpoldrack commented:

Maybe we additionally need a status_detail. This could hold the reason for skipping, as well as additional details on 'incomplete' if available.


mih commented Feb 20, 2017 via email

@bpoldrack commented:

Well, success evaluation can consider both fields, so this isn't really a con. But I agree -- it's different from details on 'incomplete', for example.


mih commented Feb 21, 2017

To conclude: status needs four values:

  • ok: action performed, desired state reached
  • notneeded: action not attempted, desired state reached already
  • impossible: action not attempted, prerequisites not met
  • error: action attempted, desired state not reached

The first two do not lead to an error message/code; the latter two do.
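The success/failure split over these four values can be sketched as:

```python
# Sketch: the four proposed status values and the success/failure split.
SUCCESS = ('ok', 'notneeded')       # no error message/code
FAILURE = ('impossible', 'error')   # lead to an error message/code

def is_success(status):
    return status in SUCCESS

print([is_success(s) for s in ('ok', 'notneeded', 'impossible', 'error')])
# [True, True, False, False]
```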


mih commented Feb 21, 2017

My plan is to introduce a new decorator that implements result inspection, filtering, and "error generation", to gradually introduce this feature for commands that have been RF'ed to support it.
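Such a decorator might look roughly like this (a purely illustrative sketch, not the actual implementation; it assumes commands are generators yielding result dicts with a 'status' key):

```python
# Hypothetical sketch of a result-inspecting decorator; not the actual
# DataLad implementation. Assumes commands are generators yielding result
# dicts that carry a 'status' key.
from functools import wraps

def eval_results(func):
    @wraps(func)
    def wrapper(*args, on_failure='stop', result_filter=None, **kwargs):
        results = []
        for res in func(*args, **kwargs):
            if result_filter and not result_filter(res):
                continue  # drop results the caller is not interested in
            results.append(res)
            if on_failure == 'stop' and res['status'] == 'error':
                raise RuntimeError(res.get('message', 'command failed'))
        return results
    return wrapper

@eval_results
def fake_command():
    yield {'status': 'ok', 'path': 'a'}
    yield {'status': 'notneeded', 'path': 'b'}

print(fake_command(result_filter=lambda r: r['status'] == 'ok'))
```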

@bpoldrack commented:

Sounds good to me.

@yarikoptic commented:

Sorry, I missed why incomplete was sacrificed?
It is largely an error, so maybe to ease analysis we could prepend all success ones with ok_ and all others with error_, or something like that?


mih commented Feb 21, 2017

"incomplete" is something you infer from the list of results. Any occurance of 'impossible', or 'error' implies incomplete.

@yarikoptic commented:

I request one file with get, which gets interrupted -- how do I infer incomplete? But maybe we don't really need that level of detail... Or do we?


mih commented Feb 21, 2017

You request one file, you did not get it (the status for that get action and this file is error) -> any(status in ('impossible', 'error') for status in statuses) -> True -> incomplete.
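As a sketch:

```python
# Sketch: inferring "incomplete" from the statuses of a result list.
def incomplete(statuses):
    return any(s in ('impossible', 'error') for s in statuses)

print(incomplete(['ok', 'error']))      # True
print(incomplete(['ok', 'notneeded']))  # False
```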


mih commented Mar 9, 2017

Closing this now. Much more info and code is in #1350 -- no way back.

mih closed this as completed Mar 9, 2017