feat. request: granular silence for <rule>.<value of query_key> #2777

Open
naseemkullah opened this issue May 1, 2020 · 11 comments

naseemkullah commented May 1, 2020

Hello,

We have been very happily using ElastAlert in our production Kubernetes environment for almost two years.

We've been very basic with our approach, and have a catch-all rule to alert whenever a structured log is emitted with the level: error key-value pair.

Recently, noise emitted by a certain workload has led us to look into silencing alerts.

As per https://elastalert.readthedocs.io/en/latest/elastalert_status.html#silence it does not appear that match_body.x.y.z can be used as a parameter to target silences.

Could this please be confirmed? Would the approach then be to break up our catch all level: error alert into many granular alerts based on other fields of the emitted structured logs?

For reference, our catch-all error rule, which requires granular silencing (via match_body fields), is:

    name: "Error"
    type: any
    index: "*"
    filter:
    - term:
        level: "error"
    alert:
    - "slack"

An example of how an alert could (ideally, if possible) be targeted for silence:

[image]

@JasperJuergensen

When you have a query_key in the rule, silencing can be done based on the values of this query_key. For example, if you have the query_key kubernetes.labels.app_kubernetes_io/component, you can silence the rule only for a specific value in this field. This happens for the realert timeframe. If you want to do this by hand, you have to add a document to the Elasticsearch silence index manually because this is not supported by the --silence option. The silence_key (in the Elasticsearch document saved in the rule_name field) is then <rule_name>.<query_key_value>.
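
A minimal sketch of the keying scheme described above, written as a toy in-memory model in Python. This is not ElastAlert's actual implementation: the class `SilenceCache` and its methods are hypothetical names used only for illustration, and the rule and service names are made up.

```python
from datetime import datetime, timedelta, timezone


class SilenceCache:
    """Toy model of per-key silences (hypothetical, not ElastAlert code)."""

    def __init__(self):
        # Maps a silence key to the time at which the silence expires.
        self._until = {}

    @staticmethod
    def key(rule_name, query_key_value=None):
        # Without a query_key value the whole rule is silenced;
        # with one, the key is <rule_name>.<query_key_value>.
        if query_key_value is None:
            return rule_name
        return f"{rule_name}.{query_key_value}"

    def silence(self, rule_name, query_key_value, until):
        self._until[self.key(rule_name, query_key_value)] = until

    def is_silenced(self, rule_name, query_key_value, now):
        until = self._until.get(self.key(rule_name, query_key_value))
        return until is not None and now < until


now = datetime(2020, 5, 4, 12, 0, tzinfo=timezone.utc)
cache = SilenceCache()
cache.silence("Error", "noisy-service", now + timedelta(hours=1))
```

With this model, `cache.is_silenced("Error", "noisy-service", now)` is true, while the same rule still alerts for any other query_key value until that value gets its own silence entry.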

naseemkullah commented May 4, 2020

Thanks @JasperJuergensen!

With regard to this, I realized our logger emits a field called name for the instrumented service (less verbose than a Kubernetes label such as kubernetes.labels.app_kubernetes_io/component), which I will use as the query_key in the catch-all rule:

    name: "Error"
    type: any
    index: "*"
    filter:
    - term:
        level: "error"
    query_key: name
    alert:
    - "slack"

I will then confirm that I can granularly silence alerts depending on the value of their name field, at which point I will resolve this issue.

naseemkullah commented May 4, 2020

> If you want to do this by hand, you have to add a document to the elasticsearch silence index manually because this is not supported by the --silence option. The silence_key (in the elasticsearch document saved in the rule_name field) is then <rule_name>.<query_key_value>.

How does one add a document manually?
Would it be feasible to allow --silence to work with <rule_name>.<query_key_value> ?

@JasperJuergensen

To add a document manually, you can use the Index API from Elasticsearch (https://www.elastic.co/guide/en/elasticsearch/reference/current/getting-started-index.html). The document should have the following format:

    {
        "exponent": 0,
        "rule_name": "<rule_name>.<query_key_value>",
        "@timestamp": "<current datetime>",
        "until": "<silence until timestamp>"
    }
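
For illustration, a document in the format above could be built and sent with Python's standard library roughly as follows. This is a sketch under assumptions, not a confirmed procedure: the helper names are hypothetical, the index name `elastalert_status_silence` and the `_doc` endpoint depend on your writeback index setting and Elasticsearch version, and the timestamps are assumed to be ISO 8601 strings.

```python
import json
import urllib.request
from datetime import datetime, timedelta, timezone


def build_silence_doc(rule_name, query_key_value, duration, now=None):
    """Build a silence document for <rule_name>.<query_key_value>."""
    now = now or datetime.now(timezone.utc)
    return {
        "exponent": 0,
        "rule_name": f"{rule_name}.{query_key_value}",
        "@timestamp": now.isoformat(),
        "until": (now + duration).isoformat(),
    }


def build_index_request(es_url, index, doc):
    """Prepare (but do not send) a POST request for the Index API."""
    return urllib.request.Request(
        f"{es_url}/{index}/_doc",  # assumed endpoint; older versions differ
        data=json.dumps(doc).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


doc = build_silence_doc(
    "Error", "noisy-service", timedelta(hours=4),
    now=datetime(2020, 5, 5, 12, 0, tzinfo=timezone.utc),
)
# Assumed host and index name; adjust to your deployment.
req = build_index_request("http://localhost:9200", "elastalert_status_silence", doc)
# To actually send it: urllib.request.urlopen(req)
```

Building the request separately from sending it makes it easy to inspect the URL and body before anything is written to the cluster.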

I think it would be a good feature to add the possibility to enter a query_key_value to the --silence option.

JasperJuergensen added a commit to JasperJuergensen/elastalert that referenced this issue May 5, 2020
This enables the user to silence a rule only for a specific query_key value and
not only the whole rule.
See elastalert Issue: Yelp#2777
@naseemkullah

Thanks @JasperJuergensen.
Realert + exponential realert on a per-qk basis is worlds better than what we had prior. I've also added the instructions for manual doc creation (to explicitly silence one rule.qk) to our wiki, just in case. If this is worth adding to the ElastAlert docs, please let me know.

I see that you are already working on qk based silences! 🚀

@naseemkullah

Oops, accidentally closed. Will reopen in case this is used for tracking the silence qk feature.

@naseemkullah naseemkullah reopened this May 5, 2020

naseemkullah commented May 5, 2020

Re: silence qk feature,

--silence_qk_value=foo will have to be used in combination with --rule which points to the rules yaml file?

Would this UX be feasible?

--silence_rule_qk(or a better name)=<rule>.<qk>

@naseemkullah naseemkullah changed the title Q: Silence based on a match_body field? feat. request: granular silence for <rule>.<value of query_key> May 5, 2020
@JasperJuergensen

> --silence_qk_value=foo will have to be used in combination with --rule which points to the rules yaml file?

Yes

> Would this UX be feasible?
>
> --silence_rule_qk(or a better name)=<rule>.<qk>

I think this would make it a bit more complicated, because internally the --rule option is needed to load the correct rule (and only this rule). Also, the rule name can be very long and contain spaces, while the filename can be autocompleted by the shell :D. And if there is only the --silence_rule_qk option, neither the rule name nor the query key value can contain a dot, because otherwise you can't tell where the rule ends and the qk begins.
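
The dot ambiguity can be shown with a small sketch. The function `possible_splits` is a hypothetical helper, not part of ElastAlert, and the example names are made up; it simply lists every way a combined `<rule>.<qk>` string could be divided.

```python
def possible_splits(combined):
    """Return every (rule, qk) pair that a dotted string could represent."""
    parts = combined.split(".")
    return [
        (".".join(parts[:i]), ".".join(parts[i:]))
        for i in range(1, len(parts))
    ]


# A single dot is unambiguous.
simple = possible_splits("Error.api")

# With a dot in either the rule name or the qk value, two readings exist.
ambiguous = possible_splits("Error.payments.api")
```

Here `ambiguous` contains both `("Error", "payments.api")` and `("Error.payments", "api")`, so a single combined option could not tell them apart, whereas separate --rule and --silence_qk_value options avoid the problem.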

naseemkullah commented May 7, 2020

Sounds good, thanks for explaining.
Another UX question:

If you have multiple rules in your yaml file, and want to silence only a subset (or even just one) of these rules for a given qk, but leave the other rules unsilenced for a given qk, would this be possible?

@JasperJuergensen

> If you have multiple rules in your yaml file, and want to silence only a subset (or even just one) of these rules for a given qk, but leave the other rules unsilenced for a given qk, would this be possible?

I don't think you can have multiple rules in one yaml file, at least not with the current FileRulesLoader implementation. In general, the --rule option is not only the rule file name but a rule identifier for the rules loader in use. So the --rule option should always identify exactly one rule, which should prevent this problem from arising in the first place.

@naseemkullah

Thanks @JasperJuergensen. My impression that rules were bunched into the same yaml was wrong, and due to my lack of knowledge of ElastAlert internals. I just confirmed this (I set up 2 rules):

    /opt/rules # ls
    error.yaml  fatal.yaml

Thanks. Looking forward to qk silence feature! 🚀

JasperJuergensen added a commit to JasperJuergensen/elastalert that referenced this issue May 10, 2020
This enables the user to silence a rule only for a specific query_key value and
not only the whole rule.
See Issue Yelp#2777