Policy to rate-limit action executions #3720

Open
armab opened this issue Sep 6, 2017 · 7 comments

Comments

@armab (Member) commented Sep 6, 2017

According to https://docs.stackstorm.com/reference/policies.html, StackStorm currently has concurrency and retry policies.

While the concurrency policy is useful, it isn't enough in some cases.

This example comes from a demo we ran at SREcon 17 Dublin: a Twitter sensor listened for all #srecon tweets and changed LED colors on every tweet.
The problem was that sometimes there were too many tweets per second, and we definitely didn't want to change the LED colors at such a high rate. Limiting executions per time period would have been very useful here.

The proposal is to implement a rate-limiting policy that limits action executions by rate (executions per second, or executions per time interval) and cancels every execution that exceeds the rate-limit window.
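A minimal sketch of the proposed semantics, assuming a sliding-window log algorithm (a token bucket or a fixed-window counter would be reasonable alternatives). `RateLimiter` and `decide()` are hypothetical names, not st2 APIs, and the state lives in a single process's memory, so this ignores deployments with multiple schedulers.

```python
import time
from collections import deque
from typing import Callable, Deque


class RateLimiter:
    """Allow at most `threshold` executions per `interval` seconds.

    Hypothetical sketch of an action.rate_limit policy: any execution
    beyond the limit is answered with "cancel".
    """

    def __init__(self, threshold: int, interval: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.threshold = threshold
        self.interval = interval
        self.clock = clock
        # Start times of the executions allowed inside the current window.
        self._timestamps: Deque[float] = deque()

    def decide(self) -> str:
        now = self.clock()
        # Drop timestamps that have slid out of the window.
        while self._timestamps and now - self._timestamps[0] >= self.interval:
            self._timestamps.popleft()
        if len(self._timestamps) < self.threshold:
            self._timestamps.append(now)
            return "run"
        return "cancel"
```

For example, with `threshold=3` and `interval=1.0`, five requests arriving at the same instant are answered run, run, run, cancel, cancel; once a full second has passed, executions are allowed again.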

Proposed policy example

---
name: rate-limit-http-downloads
description: >
    Run max 3 'core.http' actions/second
    and cancel everything that exceeds the limit
enabled: true
resource_ref: core.http
# alternative: 
# policy_type: action.limit
policy_type: action.rate_limit
parameters:
    # maximum number of executions per time interval window
    threshold: 3
    # 100ms, 1s, 315s, 15m, 1h
    interval: 1s
    # cancel actions exceeding the 3 actions/s limit
    action: cancel
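The `interval` values in the example carry a unit suffix (100ms, 1s, 315s, 15m, 1h). A minimal sketch of normalizing them to seconds follows; `parse_interval` is a hypothetical helper, and it accepts only the units listed in the example's comment.

```python
import re

# Milliseconds per unit, for the suffixes listed in the proposal's comment.
_MS_PER_UNIT = {"ms": 1, "s": 1_000, "m": 60_000, "h": 3_600_000}
_INTERVAL_RE = re.compile(r"^(\d+)(ms|s|m|h)$")


def parse_interval(value: str) -> float:
    """Convert an interval such as "100ms" or "15m" to seconds (hypothetical helper)."""
    match = _INTERVAL_RE.match(value.strip())
    if match is None:
        raise ValueError(f"invalid interval: {value!r}")
    amount, unit = match.groups()
    # Work in integer milliseconds, then divide once to get seconds.
    return int(amount) * _MS_PER_UNIT[unit] / 1000
```

Rejecting unknown suffixes with a `ValueError` lets a malformed policy fail at registration time instead of silently misbehaving at runtime.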

Additionally, we'll need to filter/limit executions by input arguments, the same way as https://docs.stackstorm.com/reference/policies.html#action-concurrency-attr
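Limiting by input arguments could work like the concurrency-by-attribute policy: executions that share the same values for the listed parameters count against their own window. A sketch under that assumption; `KeyedRateLimiter` and its `attributes` argument are hypothetical, modeled loosely on the concurrency policy's attributes list, and the attribute values are assumed to be hashable (e.g. strings).

```python
import time
from collections import defaultdict, deque
from typing import Any, Callable, Deque, Dict, Iterable, Mapping, Tuple


class KeyedRateLimiter:
    """Sliding-window rate limit applied separately to each combination
    of values of the chosen input parameters (hypothetical sketch)."""

    def __init__(self, threshold: int, interval: float, attributes: Iterable[str],
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.threshold = threshold
        self.interval = interval
        self.attributes = tuple(attributes)
        self.clock = clock
        # One window of start times per distinct attribute-value tuple.
        self._windows: Dict[Tuple[Any, ...], Deque[float]] = defaultdict(deque)

    def decide(self, parameters: Mapping[str, Any]) -> str:
        # Executions with equal values for `attributes` share a window;
        # a missing parameter is treated as None.
        key = tuple(parameters.get(name) for name in self.attributes)
        window = self._windows[key]
        now = self.clock()
        while window and now - window[0] >= self.interval:
            window.popleft()
        if len(window) < self.threshold:
            window.append(now)
            return "run"
        return "cancel"
```

With `attributes=["url"]`, a burst of downloads of one URL is throttled while requests for other URLs are unaffected.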

I believe this is a typical, pragmatic tool that could be very popular in the wild.

@nmaludy (Contributor) commented Sep 6, 2017

+1. I've fielded several questions on Slack regarding this feature request. Most of the use cases revolved around monitoring and auto-remediation. For example, a user wants to remedy a disk filling up, but the alert is flapping and injecting events into StackStorm at a high rate. In this case the programmer only wanted to react to the event if it was unique within the last X seconds. I'm probably over-simplifying the explanation here, but the general idea is the same.
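The flapping-alert case is closer to deduplication than to a plain rate limit. A minimal sketch, assuming each event can be reduced to a hashable key (for example host plus check name); `DuplicateSuppressor` and `should_react` are hypothetical names.

```python
import time
from typing import Callable, Dict, Hashable


class DuplicateSuppressor:
    """React to an event only if the same key hasn't been acted on
    within the last `window` seconds (hypothetical sketch)."""

    def __init__(self, window: float, clock: Callable[[], float] = time.monotonic) -> None:
        self.window = window
        self.clock = clock
        # Time of the last event we reacted to, per key.
        self._last_seen: Dict[Hashable, float] = {}

    def should_react(self, key: Hashable) -> bool:
        now = self.clock()
        last = self._last_seen.get(key)
        if last is not None and now - last < self.window:
            return False  # duplicate inside the window: suppress it
        self._last_seen[key] = now  # first occurrence, or the window expired
        return True
```

Here the window is measured from the last event that was acted on, so a continuously flapping alert still triggers at most once per window instead of being suppressed indefinitely. The `_last_seen` map grows without bound; a real implementation would need to expire old keys.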

@Kami (Member) commented Sep 6, 2017

I agree that something like that is needed, but that's actually quite a complex problem and a solution is non-trivial.

IMO, the best solution is probably event aggregation, with rules triggered on the aggregated events.

@lakshmi-kannan and I discussed something like that recently, and we don't really see a simple solution. Right now events are stateless, and introducing support for aggregate events would make them stateful, which introduces many unique challenges (we would need to persist a count- or time-based window of events in memory and in some persistent storage, etc.).
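To make the time-based window concrete, here is a minimal in-memory sketch of tumbling (fixed, non-overlapping) windows, where a rule would fire once per aggregate instead of once per raw event. `aggregate_tumbling` is a hypothetical helper; it deliberately ignores persistence and coordination between processes, which is exactly what makes the real problem hard.

```python
from typing import Any, Dict, Iterable, List, Tuple


def aggregate_tumbling(events: Iterable[Tuple[float, Any]],
                       window: float) -> List[Dict[str, Any]]:
    """Group (timestamp, payload) events into fixed windows of `window`
    seconds and return one aggregate per non-empty window, oldest first.

    In-memory illustration only: no persistence, no distribution.
    """
    buckets: Dict[int, List[Any]] = {}
    for ts, payload in events:
        index = int(ts // window)  # which window this event falls into
        buckets.setdefault(index, []).append(payload)
    return [
        {"window_start": index * window, "count": len(payloads), "events": payloads}
        for index, payloads in sorted(buckets.items())
    ]
```

A sliding window would react faster to bursts than a tumbling one, but it needs more state per key, which sharpens the persistence concern above.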

Having said that, it's quite a common problem in the monitoring world (only alert on state changes, aggregate alerts), so we should draw some inspiration from there.

In fact, @lakshmi-kannan and I previously worked on the Rackspace Monitoring product, which aimed to solve those problems. There we used a project called Esper, which handled windowing and event correlation for us.

We should also look into open-source projects we can leverage for windowing and event correlation, because implementing that correctly ourselves in a distributed system is non-trivial.

@Kami (Member) commented Sep 6, 2017

In short, I think we should move / expand this into a wider "Event correlation and aggregate events" issue / project.

@ananthaa-advisory commented Sep 13, 2017

Yes, I have Icinga monitoring set up with Slack, and the notifications to Slack are missing some alerts.

I just found out about Slack's rate limiting and haven't found a solution.

@safarsi commented Jul 24, 2019

I have this logic implemented. I'd like to run it by someone more experienced and get some help making a PR to the StackStorm project if the solution is acceptable.

@armab (Member, Author) commented Jul 24, 2019

@safarsi Are we talking about an implementation as a StackStorm policy within st2 core? https://docs.stackstorm.com/reference/policies.html

If yes, we'd be happy to accept a pull request.

@safarsi commented Jul 24, 2019

@armab I probably have to change the way it's implemented to be able to include it in the policy logic.

5 participants