-
Notifications
You must be signed in to change notification settings - Fork 333
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make PagerDuty output idempotent; adds support for responder requests #935
Conversation
errors.append(error) | ||
|
||
# Add a note to the incident | ||
note = self._add_incident_note(incident_id, publication, rule_context) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
where do we add this note
to?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This note is a small string of text that is added to the PagerDuty incident. It shows up on their UI
if responders and not isinstance(responders, list): | ||
responders = [responders] | ||
|
||
if responders: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
are responders optional? meaning should we return False if not responder
or is that okay?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Responders are optional; it's a new feature so I need to maintain reverse compatibility.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
great work on this and thanks a ton for the thorough documentation on the changes
to: @ryandeivert or @chunyong-lin
cc: @airbnb/streamalert-maintainers
Background
The
pagerduty-incident
code used to have a serious bug where, due to the way it would sequence multiple write methods non-atomically, it could result in a situation where a Pagerduty alert is sent but the output fails. This results in the alert retrying, creating another Pagerduty alert, then failing again... repeating ad infinitum.Changes
This PR introduces two concepts:
Reduced number of write calls
PagerDuty's code used to do the following:
POST /incidents
(Create an incident container)POST /events/enqueue
(Create an alert)GET /incidents?dedup_key=???
(Find the alert-incident)PUT /incidents/#/merge?incident_id=???
(Merge the alert-incident into the incident container)I found that this was redundant; the workflow is optimized now:
POST /events/enqueue
(Create an alert)GET /incidents?dedup_key=???
(Find the alert's incident)PUT /incidents
(Modify the alert's incident)Fewer API calls means the code is cleaner and easier to understand. Also it only has
POST
method, which reduces the number of non-idempotent write API calls.Idempotency
PagerDuty's
pagerduty-incident
output is now idempotent. ThePOST /events/enqueue
API callnow leverages a
dedup_key
that is equal to the Alert ID.The usage of a
dedup_key
ensures that any unique StreamAlert alert will only ever create a single PagerDuty alert. Subsequent retries will cause PagerDuty to return the previously created alert.Responder Requests
I added a new PagerDuty feature that allows the Alert to leverage Responder Requests. This is a neat feature that allows you to invite people other than the assignee to join in the PagerDuty.
It shows up like this on the PD UI.
All members of the response team get all notification (push notification/SMS/etc) related to the Alert except escalation notices. These stay with the assignee.
The awesome benefit here is that we can have PagerDuty alerts stick to a single escalation policy, but attach more watchers.
You can add responder requests to an alert using
context
:Testing
CI, Plus I tried
it on Stage A lot. I think
it bothered people