Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Multiple AlertConditions proposal #240

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

Simon-Boyer
Copy link

Ref: #218

This proposal introduces the capability to compose AlertConditions in an AlertPolicy using and/or logic. This should allow users to define complex conditions composition, which in turn will enable using better SRE practices like multi-window burn-rate alerting

The change is designed to be backward compatible with the current specification.

@Simon-Boyer
Copy link
Author

Discussion question: while writing this proposal I was not sure if it was best to use and/or logic or if we would prefer a different type. For example something like this:

apiVersion: openslo/v2alpha1
kind: CompositeAlertCondition
metadata:
  name: composite-condition
  displayName: Composite Condition
spec:
  operation: and # can be and/or
  conditions: # inline or using refs
  - alertConditionRef: # Ref to AlertCondition
  - compositeAlertConditionRef: # Ref to another CompositeAlertCondition

I slightly prefer the and/or keyword idea in the original proposal, but I am curious to hear opinions on this.

@nieomylnieja
Copy link
Member

Coming back to this I think that at this stage having such logic would be an overkill for the spec implementations and thus spec itself. The AND/OR logic can be still achieved with multiple Alert Policies (OR between them) and multiple conditions for a single policy (AND between them).
Maybe we could just start with removing the limit for 1 condition per policy in v2alpha1?

@Simon-Boyer
Copy link
Author

Simon-Boyer commented Mar 25, 2024

So taking the initial example in this PR:

expr: (
        job:slo_errors_per_request:ratio_rate1h{job="myjob"} > (14.4*0.001)
      and
        job:slo_errors_per_request:ratio_rate5m{job="myjob"} > (14.4*0.001)
      )
    or
      (
        job:slo_errors_per_request:ratio_rate6h{job="myjob"} > (6*0.001)
      and
        job:slo_errors_per_request:ratio_rate30m{job="myjob"} > (6*0.001)
      )
severity: page

We would translate it like this? This now assumes that multiple conditions are AND between them, which is not the case in the current spec (but is documented to be maximum 1 at the moment, so in theory it should not be conflicting with current tools).

apiVersion: openslo/v1
kind: AlertPolicy
metadata:
  name: High-Burn-Rate-ShortWindow
  displayName: High burn rate
spec:
  conditions:
  - kind: AlertCondition
    spec:
      condition:
        kind: burnrate
        threshold: 14.4
        lookbackWindow: -1h
        alertAfter: 5m
  - kind: AlertCondition
    spec:
      condition:
        kind: burnrate
        threshold: 14.4
        lookbackWindow: -1h
        alertAfter: 1m
---
apiVersion: openslo/v1
kind: AlertPolicy
metadata:
  name: High-Burn-Rate-LongWindow
  displayName: High burn rate
spec:
  conditions:
  - kind: AlertCondition
    spec:
      condition:
        kind: burnrate
        threshold: 6
        lookbackWindow: -6h
        alertAfter: 5m
  - kind: AlertCondition
    spec:
      condition:
        kind: burnrate
        threshold: 6
        lookbackWindow: -30m
        alertAfter: 5m

I do believe this makes sense, and the implementation in go (and most languages) would much simpler than the initial proposal. The only problem I could see is that, if you link both Alert policies to a notification, it would send 2 notifications for the same problem in an OR condition. But I think working on that problem will be easier than trying to fit a full and/or logic as before

@fourstepper
Copy link
Collaborator

Do you have some proposal for how we would deal with multiple AlertPolicies bound to a single SLO?

I.e. how would we make that OR happen? 😁

@fourstepper
Copy link
Collaborator

Anyway, that's kind of irrelevant in form of this PR and as @nieomylnieja mentioned, just allowing multiple conditions is already an improvement.

LGTM

@fourstepper
Copy link
Collaborator

One idea that we have as we are discussing this at work with @Hy3n4 is that we could make a field in the AlertPolicy that would define the operation we are looking to create from some template in the order of the conditions or conditionRefs as defined in the policy.

Something like:

(0 and 1) or (2 and 3) 

... using zero-based indexing (we could do index + 1, idc)

What do you think?

@weyert
Copy link
Collaborator

weyert commented May 29, 2024

My original idea was to move the alert conditions to a separate files so they could be referenced in the alert policy. Allowing for reuse of the alert conditions when needed. And then in the policy itself support a limited set of operations, and be able to reference to an alert condition.

But never though you would want to multiple groups of group of conditions. Interesting.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants