Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

vmalert: detect alerting rules which don't match any series at all #4039

Closed
hagen1778 opened this issue Mar 30, 2023 · 3 comments
Closed

vmalert: detect alerting rules which don't match any series at all #4039

hagen1778 opened this issue Mar 30, 2023 · 3 comments
Labels
enhancement New feature or request vmalert

Comments

@hagen1778
Copy link
Collaborator

hagen1778 commented Mar 30, 2023

Is your feature request related to a problem? Please describe

Alerting rules, in most of the cases are defined with some condition in expression. For example:

  • increase(http_requests_errors_total) > 0
  • changes(app_uptime_seconds) > 2

If the expression returns something in response, the alert will be triggered. If no response is received, then the alert remains inactive - it is assumed conditions in expression aren't met.

However, there is no way to tell whether it is comparison condition wasn't met or it is series selector returned nothing. It is very easy to make a typo in http_reqeusts_errors_total and such an expression will always return nothing.

Describe the solution you'd like

It would be great if vmalert could differentiate between expressions that do not match any series at all and expressions that do not meet conditions. With recently enhanced response from VM HTTP API it becomes possible - #3925

What can be done:

  1. vmalert can start parsing new field seriesFetched in response for alerting rules. vmalert shouldn't complain if the field is missing to remain compliant with Prometheus or older versions of VM.
  2. The value from this field can be used for the following enhancements:
    2.1. expose seriesFetched as a metric per alerting rule, in the same fashion as vmalert_alerting_rules_last_evaluation_samples. Such a metric can then be used for alerting purposes. For example, vmalert_alerting_rules_last_evaluation_series_fetched == 0 will identify rules which do not match any series.
    2.2. expose seriesFetched in vmalert's UI next to alerting rule. Rules with seriesFetched==0 can be marked as yellow/warn rules which need to be taken care of. UI should contain an explanation why the rule is marked so and what can be done.
  3. Changes should be consistent with alert's state object and object representation via API.
  4. The improvement is relevant for alerting rules mostly. Recording rules are expected to always produce samples in response, so "bad" recording rules can be easily identified via vmalert_recording_rules_last_evaluation_samples metric.

Describe alternatives you've considered

Tools like https://github.com/cloudflare/pint already do such a validation by running alerting rules without conditions. However, such checks are done only periodically: before applying the rule or when making changes to the rule. If the metric changes in a runtime - the problem will remain undetected.
Using https://github.com/cloudflare/pint is also an extra step to do. It would be great if such a feature was built-in in the ruler.

Additional information

No response

@hagen1778 hagen1778 added enhancement New feature or request vmalert labels Mar 30, 2023
@Haleygo
Copy link
Collaborator

Haleygo commented Mar 31, 2023

@hagen1778 I would love to take this task if you want :)

Will add Stats when parsePrometheusResponse and store it in latest ruleStateEntry, is it ok with you?
image
image

hagen1778 added a commit that referenced this issue Apr 25, 2023
vmalert starts to understand /query responses which contain object:
```
"stats":{"seriesFetched": "42"}
```
If object is present, vmalert parses it and populates a new field
`SeriesFetched`. This field is then used to populate the new metric
`vmalert_alerting_rules_last_evaluation_series_fetched` and to
display warnings in the vmalert's UI.

If response doesn't contain the new object (Prometheus or
VictoriaMetrics earlier than v1.90), then `SeriesFetched=nil`.
In this case, UI will contain no additional warnings.
And `vmalert_alerting_rules_last_evaluation_series_fetched` will
be set to `-1`. Negative value of the metric will help to compile
correct alerting rule in follow-up.

Thanks for the initial implementation to @Haleygo
See #4056

See #4039

Signed-off-by: hagen1778 <roman@victoriametrics.com>
hagen1778 added a commit that referenced this issue Apr 26, 2023
vmalert starts to understand /query responses which contain object:
```
"stats":{"seriesFetched": "42"}
```
If object is present, vmalert parses it and populates a new field
`SeriesFetched`. This field is then used to populate the new metric
`vmalert_alerting_rules_last_evaluation_series_fetched` and to
display warnings in the vmalert's UI.

If response doesn't contain the new object (Prometheus or
VictoriaMetrics earlier than v1.90), then `SeriesFetched=nil`.
In this case, UI will contain no additional warnings.
And `vmalert_alerting_rules_last_evaluation_series_fetched` will
be set to `-1`. Negative value of the metric will help to compile
correct alerting rule in follow-up.

Thanks for the initial implementation to @Haleygo
See #4056

See #4039

Signed-off-by: hagen1778 <roman@victoriametrics.com>
hagen1778 added a commit that referenced this issue Apr 26, 2023
vmalert starts to understand /query responses which contain object:
```
"stats":{"seriesFetched": "42"}
```
If object is present, vmalert parses it and populates a new field
`SeriesFetched`. This field is then used to populate the new metric
`vmalert_alerting_rules_last_evaluation_series_fetched` and to
display warnings in the vmalert's UI.

If response doesn't contain the new object (Prometheus or
VictoriaMetrics earlier than v1.90), then `SeriesFetched=nil`.
In this case, UI will contain no additional warnings.
And `vmalert_alerting_rules_last_evaluation_series_fetched` will
be set to `-1`. Negative value of the metric will help to compile
correct alerting rule in follow-up.

Thanks for the initial implementation to @Haleygo
See #4056

See #4039

Signed-off-by: hagen1778 <roman@victoriametrics.com>
hagen1778 added a commit that referenced this issue May 5, 2023
vmalert starts to understand /query responses which contain object:
```
"stats":{"seriesFetched": "42"}
```
If object is present, vmalert parses it and populates a new field
`SeriesFetched`. This field is then used to populate the new metric
`vmalert_alerting_rules_last_evaluation_series_fetched` and to
display warnings in the vmalert's UI.

If response doesn't contain the new object (Prometheus or
VictoriaMetrics earlier than v1.90), then `SeriesFetched=nil`.
In this case, UI will contain no additional warnings.
And `vmalert_alerting_rules_last_evaluation_series_fetched` will
be set to `-1`. Negative value of the metric will help to compile
correct alerting rule in follow-up.

Thanks for the initial implementation to @Haleygo
See #4056

See #4039

Signed-off-by: hagen1778 <roman@victoriametrics.com>
hagen1778 added a commit that referenced this issue May 8, 2023
#4198)

app/vmalert: detect alerting rules which don't match any series at all

vmalert starts to understand /query responses which contain object:
```
"stats":{"seriesFetched": "42"}
```
If object is present, vmalert parses it and populates a new field
`SeriesFetched`. This field is then used to populate the new metric
`vmalert_alerting_rules_last_evaluation_series_fetched` and to
display warnings in the vmalert's UI.

If response doesn't contain the new object (Prometheus or
VictoriaMetrics earlier than v1.90), then `SeriesFetched=nil`.
In this case, UI will contain no additional warnings.
And `vmalert_alerting_rules_last_evaluation_series_fetched` will
be set to `-1`. Negative value of the metric will help to compile
correct alerting rule in follow-up.

Thanks for the initial implementation to @Haleygo
See #4056

See #4039

Signed-off-by: hagen1778 <roman@victoriametrics.com>
@valyala
Copy link
Collaborator

valyala commented May 10, 2023

The feature has been implemented in the commit 5b8450f . See these docs for details. This feature will be included in the next release. In the mean time it is possible to build vmalert from this commit according to these instructions and verify whether the feature works as intended.

valyala pushed a commit that referenced this issue May 10, 2023
#4198)

app/vmalert: detect alerting rules which don't match any series at all

vmalert starts to understand /query responses which contain object:
```
"stats":{"seriesFetched": "42"}
```
If object is present, vmalert parses it and populates a new field
`SeriesFetched`. This field is then used to populate the new metric
`vmalert_alerting_rules_last_evaluation_series_fetched` and to
display warnings in the vmalert's UI.

If response doesn't contain the new object (Prometheus or
VictoriaMetrics earlier than v1.90), then `SeriesFetched=nil`.
In this case, UI will contain no additional warnings.
And `vmalert_alerting_rules_last_evaluation_series_fetched` will
be set to `-1`. Negative value of the metric will help to compile
correct alerting rule in follow-up.

Thanks for the initial implementation to @Haleygo
See #4056

See #4039

Signed-off-by: hagen1778 <roman@victoriametrics.com>
@valyala
Copy link
Collaborator

valyala commented May 18, 2023

FYI, vmalert exposes vmalert_alerting_rules_last_evaluation_series_fetched metric for probably incorrect alerting queries, which match zero time series, starting from v1.91.0. See these docs for details.

Closing the feature request as done.

@valyala valyala closed this as completed May 18, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request vmalert
Projects
None yet
Development

No branches or pull requests

3 participants