At the moment, our SLIs (see Datadog SLI Dashboard) are displayed on a monitor in the office, but that doesn't guarantee that the operators (Test Pilot Pair) act on problems that show up there, since the pair would need to keep watching the screen.
By tying those SLIs to PagerDuty, we can treat degradations in those numbers as triggers for action.
To better reflect the fact that an alert has been generated, we should also update how the colors are set on the dashboard, so that anyone can immediately see that there's an active incident.
As for which thresholds to use, let's start with a daily SLO of 95% (which allows up to 1h12m of downtime per day) and adjust from there.
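For reference, the 1h12m figure is just the error budget of a 95% target over a 24-hour window. A minimal sketch of that arithmetic, in case we want to try other targets (the function name `allowed_downtime` is made up for illustration, not something we have today):

```python
from datetime import timedelta


def allowed_downtime(slo_percent: float, window: timedelta = timedelta(days=1)) -> timedelta:
    """Return the error budget: how long the SLI may be bad within the window."""
    if not 0 <= slo_percent <= 100:
        raise ValueError("slo_percent must be between 0 and 100")
    # The budget is the fraction of the window that the SLO does not cover.
    return window * (1 - slo_percent / 100)


print(allowed_downtime(95))  # 1:12:00
```

Tightening the target shrinks the budget quickly, so this is worth re-running before we adjust the number.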
Acceptance criteria
When a daily SLI drops below its objective, a page is sent to PagerDuty.
Datadog colors reflect our objectives.
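To make the two criteria concrete, here's a rough sketch of how a daily SLI value could map to a dashboard color and a paging decision. This is plain Python for illustration only, not Datadog or PagerDuty configuration; the names `sli_status` and `SliStatus`, and the 1-point warning margin above the objective, are assumptions of mine and not something we've agreed on:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SliStatus:
    color: str        # what the dashboard would show
    should_page: bool  # whether a PagerDuty page would be triggered


def sli_status(sli_percent: float, slo_percent: float = 95.0,
               warning_margin: float = 1.0) -> SliStatus:
    """Classify a daily SLI against its objective.

    warning_margin is a hypothetical buffer above the SLO used to show
    a "getting close" color before the objective is actually breached.
    """
    if sli_percent < slo_percent:
        # Objective breached: show red and page the operators.
        return SliStatus(color="red", should_page=True)
    if sli_percent < slo_percent + warning_margin:
        # Still within the objective, but close to the edge.
        return SliStatus(color="yellow", should_page=False)
    return SliStatus(color="green", should_page=False)


print(sli_status(94.2))
print(sli_status(95.5))
print(sli_status(99.0))
```

The point is only that "red" and "a page was sent" should come from the same threshold, so the office monitor and PagerDuty never disagree.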
Thanks!