new custodian flag / notify_on_failure #3760

Closed
wants to merge 3 commits into from

Conversation

anovis
Contributor

@anovis anovis commented Apr 2, 2019

Allow users to specify a notify action that is triggered when an exception occurs while running a policy. Currently, if a policy fails, there is no way to still run any actions. This is based partly on the discussion in #3531, where I agree that since we don't know why the error arose, it is best not to let the user run arbitrary actions, but notify seems to be a reasonable response in most cases.

An example policy would look like this:

policies:
  - name: my-first-policy
    max-resources: 1
    notify-on-failure:
      type: 'notify'
      to: ['me']
      transport:
        type: sqs
        queue: https://sqs.us-east-1.amazonaws.com/12345678/test
        region: us-east-1
    resource: sns

@kapilt
Collaborator

kapilt commented Apr 2, 2019

Not really liking the syntax. There are already metrics for failures that can be used for this use case, i.e. set an alert on those if you want to be notified.

@anovis
Contributor Author

anovis commented Apr 2, 2019

Ah, so in that case you would have external Lambdas monitoring the CloudWatch metrics for errors and then sending notifications. The use case I was thinking of is when an exception occurs (e.g. the max resource limit is hit or a delete action doesn't have permission) and the policy also has a notify action. In this case the policy just errors out, even though you would still want the notification to be sent. With notify_on_failure, the event passed to notify would be the exception that was triggered instead of the original event.
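To make the proposed behavior concrete, here is a minimal sketch of the idea, not the code in this PR. It assumes a hypothetical `run_policy` callable and a hypothetical `notify` callable; neither is Custodian's real API. On failure, the exception is turned into the event handed to the notifier, and the original error is re-raised so the run still fails.

```python
import traceback
from typing import Any, Callable, Dict, List, Optional


def run_with_failure_notify(
    policy_name: str,
    run_policy: Callable[[], List[Dict[str, Any]]],
    notify: Optional[Callable[[Dict[str, Any]], None]] = None,
) -> List[Dict[str, Any]]:
    """Run a policy; on an exception, pass it to `notify` and re-raise.

    `run_policy` and `notify` are hypothetical stand-ins for illustration.
    """
    try:
        return run_policy()
    except Exception as exc:
        if notify is not None:
            # The event is built from the exception, not from the
            # original event that triggered the policy.
            failure_event = {
                "policy": policy_name,
                "error_type": type(exc).__name__,
                "error_message": str(exc),
                "traceback": traceback.format_exc(),
            }
            try:
                notify(failure_event)
            except Exception:
                # A broken notifier should not mask the original failure.
                pass
        raise


if __name__ == "__main__":
    sent: List[Dict[str, Any]] = []

    def over_limit() -> List[Dict[str, Any]]:
        raise RuntimeError("policy exceeded max-resources: 1")

    try:
        run_with_failure_notify("my-first-policy", over_limit, sent.append)
    except RuntimeError:
        pass
    print(sent[0]["error_type"], "-", sent[0]["error_message"])
```

Only the notifier runs in the failure path; no other action does, which matches the concern from #3531 about not running arbitrary actions after an unknown error.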

@kapilt
Collaborator

kapilt commented Apr 2, 2019

CloudWatch alarms handle metric alerts, i.e. you don't even need a Lambda. Re notify content, the reality is that you're always going to want to go look at the actual logs / resource outputs to make sense of an error.
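As a rough sketch of the alarm-based alternative, the helper below builds the arguments for a CloudWatch alarm on a failure metric. The default namespace `CloudMaid`, the metric name `PolicyException`, the dimension names, and the SNS topic ARN are all assumptions for illustration; check them against the metrics your deployment actually emits before relying on this.

```python
from typing import Any, Dict


def failure_alarm_params(
    policy_name: str,
    topic_arn: str,
    dimensions: Dict[str, str],
    namespace: str = "CloudMaid",           # assumed namespace, verify first
    metric_name: str = "PolicyException",   # assumed metric name, verify first
) -> Dict[str, Any]:
    """Build keyword arguments for CloudWatch's put_metric_alarm call."""
    return {
        "AlarmName": f"custodian-failure-{policy_name}",
        "Namespace": namespace,
        "MetricName": metric_name,
        "Dimensions": [
            {"Name": name, "Value": value} for name, value in dimensions.items()
        ],
        "Statistic": "Sum",
        "Period": 300,                # evaluate in 5-minute windows
        "EvaluationPeriods": 1,
        "Threshold": 1.0,             # alarm on one or more failures
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        "TreatMissingData": "notBreaching",  # no data means no failure
        "AlarmActions": [topic_arn],  # e.g. an SNS topic that sends email
    }


if __name__ == "__main__":
    params = failure_alarm_params(
        policy_name="my-first-policy",
        # Hypothetical topic ARN.
        topic_arn="arn:aws:sns:us-east-1:123456789012:custodian-alerts",
        # Assumed dimension name.
        dimensions={"Policy": "my-first-policy"},
    )
    print(params["AlarmName"], params["MetricName"])
    # To create the alarm (requires boto3 and AWS credentials):
    #   import boto3
    #   boto3.client("cloudwatch").put_metric_alarm(**params)
```

The alarm notifies through its own action, so no Lambda and no change to the policy schema is needed; the trade-off is that the alert carries no error detail, and you still go to the logs for that.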

@anovis
Contributor Author

anovis commented Apr 2, 2019

OK, thanks. Yeah, that makes sense. I'll go ahead and close this then. Appreciate the comments!

@anovis anovis closed this Apr 2, 2019

2 participants