Multiple AlertConditions proposal #240

Simon-Boyer · 2023-12-07T15:09:27Z

This proposal introduces the capability to compose AlertConditions in an AlertPolicy using and/or logic. This should allow users to define complex conditions composition, which in turn will enable using better SRE practices like multi-window burn-rate alerting

The change is designed to be backward compatible with the current specification.

Simon-Boyer · 2023-12-07T15:16:27Z

Discussion question: while writing this proposal I was not sure if it was best to use and/or logic or if we would prefer a different type. For example something like this:

apiVersion: openslo/v2alpha1
kind: CompositeAlertCondition
metadata:
  name: composite-condition
  displayName: Composite Condition
spec:
  operation: and # can be and/or
  conditions: # inline or using refs
  - alertConditionRef: # Ref to AlertCondition
  - compositeAlertConditionRef: # Ref to another CompositeAlertCondition

I slightly prefer the and/or keyword idea in the original proposal, but I am curious to hear opinions on this.

nieomylnieja · 2024-03-22T13:05:22Z

Coming back to this I think that at this stage having such logic would be an overkill for the spec implementations and thus spec itself. The AND/OR logic can be still achieved with multiple Alert Policies (OR between them) and multiple conditions for a single policy (AND between them).
Maybe we could just start with removing the limit for 1 condition per policy in v2alpha1?

Simon-Boyer · 2024-03-25T20:22:14Z

So taking the initial example in this PR:

expr: (
        job:slo_errors_per_request:ratio_rate1h{job="myjob"} > (14.4*0.001)
      and
        job:slo_errors_per_request:ratio_rate5m{job="myjob"} > (14.4*0.001)
      )
    or
      (
        job:slo_errors_per_request:ratio_rate6h{job="myjob"} > (6*0.001)
      and
        job:slo_errors_per_request:ratio_rate30m{job="myjob"} > (6*0.001)
      )
severity: page

We would translate it like this? This now assumes that multiple conditions are AND between them, which is not the case in the current spec (but is documented to be maximum 1 at the moment, so in theory it should not be conflicting with current tools).

apiVersion: openslo/v1
kind: AlertPolicy
metadata:
  name: High-Burn-Rate-ShortWindow
  displayName: High burn rate
spec:
  conditions:
  - kind: AlertCondition
    spec:
      condition:
        kind: burnrate
        threshold: 14.4
        lookbackWindow: -1h
        alertAfter: 5m
  - kind: AlertCondition
    spec:
      condition:
        kind: burnrate
        threshold: 14.4
        lookbackWindow: -1h
        alertAfter: 1m
---
apiVersion: openslo/v1
kind: AlertPolicy
metadata:
  name: High-Burn-Rate-LongWindow
  displayName: High burn rate
spec:
  conditions:
  - kind: AlertCondition
    spec:
      condition:
        kind: burnrate
        threshold: 6
        lookbackWindow: -6h
        alertAfter: 5m
  - kind: AlertCondition
    spec:
      condition:
        kind: burnrate
        threshold: 6
        lookbackWindow: -30m
        alertAfter: 5m

I do believe this makes sense, and the implementation in go (and most languages) would much simpler than the initial proposal. The only problem I could see is that, if you link both Alert policies to a notification, it would send 2 notifications for the same problem in an OR condition. But I think working on that problem will be easier than trying to fit a full and/or logic as before

fourstepper · 2024-05-29T11:27:10Z

Do you have some proposal for how we would deal with multiple AlertPolicies bound to a single SLO?

I.e. how would we make that OR happen? 😁

fourstepper · 2024-05-29T11:31:34Z

Anyway, that's kind of irrelevant in form of this PR and as @nieomylnieja mentioned, just allowing multiple conditions is already an improvement.

LGTM

fourstepper · 2024-05-29T11:34:39Z

One idea that we have as we are discussing this at work with @Hy3n4 is that we could make a field in the AlertPolicy that would define the operation we are looking to create from some template in the order of the conditions or conditionRefs as defined in the policy.

Something like:

(0 and 1) or (2 and 3)

... using zero-based indexing (we could do index + 1, idc)

What do you think?

weyert · 2024-05-29T16:38:34Z

My original idea was to move the alert conditions to a separate files so they could be referenced in the alert policy. Allowing for reuse of the alert conditions when needed. And then in the policy itself support a limited set of operations, and be able to reference to an alert condition.

But never though you would want to multiple groups of group of conditions. Interesting.

Simon-Boyer added 2 commits December 7, 2023 10:03

multiple alertconditions proposal

b9c9272

use openslo/v2alpha1 apiversion

644d4bf

Simon-Boyer mentioned this pull request Dec 7, 2023

Add multiple conditions with logic for AlertPolicy #218

Open

Simon-Boyer and others added 2 commits December 7, 2023 10:56

yaml formatting

84f667c

Merge branch 'main' into multiple-alert-conditions-218

00147ff

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Multiple AlertConditions proposal #240

Multiple AlertConditions proposal #240

Simon-Boyer commented Dec 7, 2023

Simon-Boyer commented Dec 7, 2023

nieomylnieja commented Mar 22, 2024

Simon-Boyer commented Mar 25, 2024 •

edited

Loading

fourstepper commented May 29, 2024

fourstepper commented May 29, 2024

fourstepper commented May 29, 2024

weyert commented May 29, 2024 •

edited

Loading

Multiple AlertConditions proposal #240

Are you sure you want to change the base?

Multiple AlertConditions proposal #240

Conversation

Simon-Boyer commented Dec 7, 2023

Simon-Boyer commented Dec 7, 2023

nieomylnieja commented Mar 22, 2024

Simon-Boyer commented Mar 25, 2024 • edited Loading

fourstepper commented May 29, 2024

fourstepper commented May 29, 2024

fourstepper commented May 29, 2024

weyert commented May 29, 2024 • edited Loading

Simon-Boyer commented Mar 25, 2024 •

edited

Loading

weyert commented May 29, 2024 •

edited

Loading