
Fine grained error handling #141

Open
bcoca opened this issue Aug 10, 2018 · 5 comments

Comments

@bcoca
Member

bcoca commented Aug 10, 2018

Proposal: on_error

Author: Brian Coca <@bcoca>, IRC: bcoca

Date: 2018-08-88

  • Status: New
  • Proposal type: core design
  • Targeted release: future release
  • Estimated time to implement: weeks

Motivation

Users need finer-grained error handling than the current 'task status' alone provides.

Problems

  • No good way to handle connection errors
  • Handling undefined variables requires many '|default' uses
  • No way to handle syntax errors that occur at runtime

Solution proposal

A new on_error keyword with a set of subkeys, one per error type; the value assigned to each subkey indicates how that type of error is handled.

The default behaviour stays as it currently is (TODO: document each case in a single spot).

    - command: /bin/{{ false }}
      on_error:
          task: ignore
          connection: fail_host
          undefined: ignore
          syntax: fail_play

keys

  • task: what to do on task failures, i.e. a 'failed' result
  • connection: how to handle any connection errors
  • undefined: how to handle any 'undefined' variable error raised during templating
  • syntax: how to handle any runtime syntax error

values

  • ignore: leave task status as is, but keep host in play
  • fail_host: leave task status as is, set host status as failed, which removes the host from the rest of the play execution
  • fail_task: set task status to 'failed', which also sets the host as failed and removes it from the rest of the play execution
  • fail_play: end play now, for all hosts/tasks
  • fail_playbook: end play now and stop all playbook execution (in case there are subsequent plays that would run)

Optional

  • fail_hard: fail the task and host and DO NOT run any rescue attempts
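A sketch of how fail_hard might interact with a block's rescue section. This assumes on_error is accepted on a task inside a block; the proposal does not yet say where on_error may be placed, and none of this syntax exists in current Ansible releases.

```yaml
- block:
    - command: /bin/false
      on_error:
        # proposed value: fail the task and host, skip rescue
        task: fail_hard
  rescue:
    # under fail_hard, this rescue section would never run
    - debug:
        msg: "rescue attempted"
```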

Additional

Once this is working, deprecate ignore_errors, as it really means 'ignore task failed status' and not 'all errors'. The new on_error: task: ignore provides the same function.
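Under this proposal, the two tasks below would behave the same. The first uses the existing ignore_errors keyword; the second uses the proposed on_error syntax, which is hypothetical and not available in current Ansible releases.

```yaml
# today: ignores only the 'failed' task status
- command: /bin/false
  ignore_errors: true

# proposed equivalent
- command: /bin/false
  on_error:
    task: ignore
```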

@alikins

alikins commented Aug 15, 2018

I'd like to see a variation of that which lets you hook up an 'error_handler' for a block/task etc. ('notify_error', more or less). It would be especially useful if you could pass args to handlers, and if you could order notify_error by the order the notifies are listed in the block.

ansible/ansible@devel...alikins:task_events was as close as I got to that, i.e. not very far, but the goal was to add specific handler "callbacks"[1] for things like a "task_result_conn_fail" event. (The code in task_events at the moment only implements a generic "append_results" event callback that adds the task to the queue like normal.) In theory, something like a connection error event callback could do things like "retry" (requeue the task), or add an event to the queue that will get turned into an error 'notify'.

The worker could add any extra (serializable) context or data it has to the TaskEvent it creates. In the above case, that info could make its way to an 'error notify' handler, for example. Partial updates could also be implemented this way, by having the worker emit 'update task' events that get handled.

@bcoca
Member Author

bcoca commented Aug 15, 2018

@alikins as much as I think such a system can be useful, I think it requires a programmer to understand how to use it.

The above proposal is designed knowing that it does increase complexity for the play author, but it tries to keep simplicity and easy auditing in mind so non-programmers can still 'read what happens'.

@jwalzer

jwalzer commented Sep 20, 2019

I would suggest having one additional optional key, all, that serves as a shortcut covering all the error types. This lets users who previously didn't care about specific error types stay unspecific and simply "ignore all" if it's an unimportant task.
This will help adoption of the feature.
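A sketch of the suggested shortcut. The all key is hypothetical, and the thread does not settle how it would combine with the specific keys; here it is assumed that a specific key overrides all for its own error type.

```yaml
- command: /bin/false
  on_error:
    all: ignore            # hypothetical: applies to every error type
    connection: fail_host  # assumed to override 'all' for connection errors
```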

@JonRudolph

Bump
The "task: ignore" functionality is needed. We run playbooks to audit our environment. Check mode is lacking, so we've had to write entirely new playbooks/roles to audit systems after the initial config. Ideally, the recap would show that x number of tasks failed, but execution of tasks would not stop after that failure. "ignore_errors: true" kind of works, but the recap shows those tasks as ignored instead of failed. Additionally, when using Tower, you cannot trigger a notification on ignored, only on success or failure. So we need it to fail without stopping execution on that host. I realize this would be dangerous if you don't understand the implications, but the functionality is needed.

@bcoca
Copy link
Member Author

bcoca commented Mar 12, 2021

@JonRudolph I had thought of an 'on_fail: summary' function or the like for those cases.
