Fine-grained error handling #141

bcoca · 2018-08-10T20:24:08Z

Proposal: on_error

Author: Brian Coca <@bcoca> IRC: bcoca

Date: 2018-08-88

Status: New
Proposal type: core design
Targeted release: future release
Estimated time to implement: weeks

Motivation

People demand more error handling than just 'task status'.

Problems

No good way to handle connection errors
Handling undefined variables requires many '|default' uses
No way to handle syntax errors that occur at runtime
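
To make the second problem concrete, this is roughly what defensive templating looks like with current syntax. The variable names (app_port, app_user, app_home) are made up for illustration.

```yaml
# Current behaviour: every variable that might be undefined needs its own
# default filter, otherwise templating raises an 'undefined variable' error.
- name: Render a config line without tripping on undefined variables
  debug:
    msg: "port={{ app_port | default(8080) }} user={{ app_user | default('nobody') }} home={{ app_home | default('/tmp') }}"
```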

Solution proposal

A new on_error keyword with a set of subkeys, one per error type. The value assigned to each subkey determines how that type of error is handled.

The default behaviour stays as it currently is (TODO: document each case in a single spot).

    - command: /bin/{{ false }}
      on_error:
          task: ignore
          connection: fail_host
          undefined: ignore
          syntax: fail_play

keys

task: what to do when we encounter task failures, aka 'failed' result
connection: how to handle any connection errors
undefined: how to handle any 'undefined' var error from templating
syntax: how to handle any runtime syntax error

values

ignore: leave task status as is, but keep host in play
fail_host: leave task status as is, set host status as failed, which removes it from rest of play execution
fail_task: set task status to 'failed', which also sets the host as failed and removes from rest of play execution
fail_play: end play now, for all hosts/tasks
fail_playbook: end play now and stop all playbook execution (in case there are subsequent plays that would run)
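
As a sketch of how the keys and values above might combine on real tasks: the commands, paths and the 'region' variable below are placeholders, and on_error itself is only proposed syntax, not an existing Ansible keyword.

```yaml
# Proposed syntax only: on_error does not exist in Ansible today.
- name: Optional cache warm-up
  command: /usr/local/bin/warm-cache --region {{ region }}
  on_error:
    task: ignore               # a 'failed' result keeps the host in the play
    undefined: fail_task       # an undefined 'region' marks the task as failed
    connection: fail_host      # an unreachable host is dropped from the play

- name: Database schema migration
  command: /usr/local/bin/migrate
  on_error:
    task: fail_play            # a failed migration ends the play for all hosts
    connection: fail_playbook  # a connection error also stops subsequent plays
```

Any key left out would presumably fall back to the current default behaviour for that error type.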

Optional

fail_hard: fail the task and host and DO NOT run any rescue attempts

Additional

Once this is working, deprecate ignore_errors, as it really means 'ignore task failed status' and not 'all errors'. The new 'on_error: task: ignore' provides the same function.
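
For comparison, a minimal before/after sketch of that deprecation path. The task itself is a placeholder, and the second form only works once on_error is implemented.

```yaml
# Today: ignore_errors only covers a 'failed' task result.
- name: Check for an optional tool
  command: /usr/bin/which optional-tool
  ignore_errors: true

# Proposed replacement (hypothetical until on_error exists).
- name: Check for an optional tool
  command: /usr/bin/which optional-tool
  on_error:
    task: ignore
```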


alikins · 2018-08-15T18:22:15Z

I'd like to see a variation of this that lets you hook up an 'error_handler' for a block/task etc. ('notify_error', more or less). It would be especially useful if you could pass args to handlers, and if you could order notify_error by the order in which the notifies are listed in the block.

ansible/ansible@devel...alikins:task_events was as close as I got to that, i.e. not very far, but the goal was to add specific handler "callbacks"[1] for things like a "task_result_conn_fail" event. (The code in task_events at the moment only implements a generic "append_results" event callback that adds the task to the queue like normal.) In theory, something like a connection error event callback could do things like "retry" (requeue the task), or add an event to the queue that will get turned into an error 'notify'.

The worker could add any extra (serializable) context or data it has to the TaskEvent it creates. In the above case, that info could make its way to an 'error notify' handler, for example. Partial updates could also be implemented this way by having the worker emit 'update task events' that get handled.
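
A rough sketch of how a 'notify_error' hook might look from the playbook side. Everything here is an assumption for illustration: notify_error is not an existing keyword, the host group, handler names and commands are invented, and the ordering shown is only the behaviour wished for above.

```yaml
# Hypothetical: notify_error does not exist in Ansible.
- hosts: web
  tasks:
    - block:
        - name: Deploy release
          command: /usr/local/bin/deploy
      notify_error:            # assumed to fire in the order listed
        - roll back release
        - page on-call

  handlers:
    - name: roll back release
      command: /usr/local/bin/rollback

    - name: page on-call
      command: /usr/local/bin/page-oncall
```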

bcoca · 2018-08-15T18:47:16Z

@alikins as much as I think such a system can be useful, I think it requires a programmer to understand how to use it.

The proposal above was written knowing that it does increase complexity for the play author, but it tries to keep simplicity and easy auditing in mind so non-programmers can still 'read what happens'.

jwalzer · 2019-09-20T09:13:19Z

I would suggest one additional optional key, 'all', that serves as a shortcut covering all of the error types. This lets users who didn't previously care about specific error types stay just as unspecific and simply "ignore all" if it's an unimportant task.
This will help adoption of the feature.
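
A minimal sketch of that shortcut, assuming the key would be spelled 'all' and would sit alongside the other on_error subkeys. The command is a placeholder, and neither on_error nor 'all' exists today.

```yaml
# Hypothetical: 'all' applies one value to every error type.
- name: Unimportant housekeeping task
  command: /usr/local/bin/cleanup-tmp
  on_error:
    all: ignore
```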

JonRudolph · 2021-03-12T18:14:47Z

Bump
The "task: ignore" functionality is needed. We run playbooks to audit our environment. Check mode is lacking, so we've had to write entirely new playbooks/roles to audit systems after the initial config. Ideally, the recap would show that x number of tasks failed, but execution of tasks would not cease after that failure. "ignore_errors: true" kind of works, but the recap shows ignored instead of failed. Additionally, when using Tower, you cannot trigger a notification on ignored, only on success or failure. So we need it to fail without ceasing execution on that host. I realize this would be dangerous if you don't understand what the implications are, but the functionality is needed.

bcoca · 2021-03-12T19:49:54Z

@JonRudolph I had thought of an 'on_fail: summary' function or the like for those cases.
