Impose rate limits for task starts as pool feature #15082

Open
BeatlesMD opened this issue Mar 30, 2021 · 4 comments
Labels
area:Scheduler Scheduler or dag parsing Issues kind:feature Feature Requests

Comments

@BeatlesMD

Description

As a pool feature, queue task starts so that task initiation is distributed over time according to a sliding-window rate limit (this may be easier to implement as a cooldown between task starts within a pool).
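Here is a minimal sketch of the sliding-window idea. It assumes a hypothetical per-pool `SlidingWindowLimiter` that the scheduler would consult before starting a queued task (this is not an existing Airflow API). The limiter keeps a log of recent start times and admits a new start only if fewer than `limit` starts fall in the trailing `window` seconds. Time is passed in explicitly so the logic is easy to test.

```python
from collections import deque


class SlidingWindowLimiter:
    """Admit at most `limit` task starts in any trailing `window` seconds.

    Hypothetical helper for illustration only; not an Airflow API.
    Assumes `now` values passed to try_start never decrease.
    """

    def __init__(self, limit: int, window: float) -> None:
        if limit <= 0 or window <= 0:
            raise ValueError("limit and window must be positive")
        self.limit = limit
        self.window = window
        self._starts = deque()  # timestamps of admitted starts, oldest first

    def try_start(self, now: float) -> bool:
        """Return True and record the start if a slot is free at time `now`."""
        # Evict starts that are no longer inside the window (now - window, now].
        while self._starts and self._starts[0] <= now - self.window:
            self._starts.popleft()
        if len(self._starts) < self.limit:
            self._starts.append(now)
            return True
        return False  # the task stays queued; the caller retries later
```

For example, `SlidingWindowLimiter(limit=2, window=60)` admits starts at t=0 and t=1 and rejects one at t=2. It admits again at t=60, once the t=0 start has left the window.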

Use case / motivation

APIs commonly limit the rate of requests using techniques such as sliding window, fixed window, token bucket, or leaky bucket. Task retries may work around the issue, but these failures could potentially be avoided entirely if there were a feature that matched the endpoint's rate-limiting behavior (I suggest a sliding window, as it has other benefits).
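The token bucket technique mentioned above behaves differently. The hypothetical `TokenBucket` gate sketched below allows a burst of up to `capacity` starts and then settles to a steady `rate` of starts per second. Like the other sketches in this thread, it is illustrative only and not part of Airflow.

```python
class TokenBucket:
    """Hold up to `capacity` tokens, refilled continuously at `rate` tokens/second.

    Each task start consumes one token. Hypothetical sketch, not an Airflow API.
    """

    def __init__(self, capacity: float, rate: float, now: float = 0.0) -> None:
        self.capacity = capacity
        self.rate = rate
        self.tokens = capacity  # start full, so an initial burst of `capacity` is allowed
        self.last = now         # time of the last refill

    def try_start(self, now: float) -> bool:
        # Refill in proportion to elapsed time, capped at capacity.
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

With `capacity=2, rate=0.5`, two starts at t=0 succeed immediately. After that, one new start becomes possible every 2 seconds.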

A sliding-window rate limiter could also be used to stagger task/request initiation to a legacy system. There are a number of reasons why you may want to stagger requests to a legacy system. For example, the beginning portion of a request may be the most resource-intensive part of the work within the foreign system, or the legacy system may not provide its own rate-limiting signals.

It may be easier to implement as a cooldown between one pooled task's start and the next queued task's start, but I figured I'd frame the feature request to match commonly seen rate-limiting strategies and techniques.
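The simpler cooldown variant could look like the sketch below. The hypothetical `CooldownGate` admits a start only if at least `cooldown` seconds have passed since the previous admitted start. Setting `cooldown = window / limit` spaces starts evenly within a `limit`-per-`window` budget. Unlike a sliding window or token bucket, however, it never allows bursts.

```python
class CooldownGate:
    """Require at least `cooldown` seconds between consecutive task starts.

    Hypothetical sketch of a per-pool cooldown; not an Airflow API.
    """

    def __init__(self, cooldown: float) -> None:
        self.cooldown = cooldown
        self.last_start = None  # time of the most recent admitted start

    def try_start(self, now: float) -> bool:
        # The first start is always admitted; later ones must wait out the cooldown.
        if self.last_start is None or now - self.last_start >= self.cooldown:
            self.last_start = now
            return True
        return False
```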

Related Issues

Potentially #8789?

@BeatlesMD BeatlesMD added the kind:feature Feature Requests label Mar 30, 2021
@boring-cyborg

boring-cyborg bot commented Mar 30, 2021

Thanks for opening your first issue here! Be sure to follow the issue template!

@vikramkoka vikramkoka added the area:Scheduler Scheduler or dag parsing Issues label May 4, 2021
@rubenbriones
Contributor

That would be really nice. I have multiple ETLs that scrape data from free APIs that allow a maximum number of requests per minute, and I haven't figured out a way to implement this rate-limiting logic in Airflow.

I'm using Dynamic Task Mapping to generate 1000 tasks (to scrape data for 1000 different items from the same API), but I want to schedule them so that the rate limit is not exceeded.

Do you think it would be possible to implement this with a custom deferrable HttpOperator that, before sending the HTTP requests, checks (via XCom) how many requests have been sent to the requested http_conn_id over the last X minutes? The new custom operator would update the XCom cache after making new requests. Does that make sense?
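A deferrable operator would also need to know how long to defer. The sketch below assumes the timestamps of previously sent requests can be read from some shared store (XCom in the proposal above; reading and writing that store is not shown). Under that assumption, a hypothetical helper `seconds_until_allowed` could compute the wait using the same trailing-window rule. Reading the count and then sending is not atomic, so two tasks can still both see the same free slot.

```python
def seconds_until_allowed(sent_times, now, limit, window):
    """Return 0.0 if a request may be sent at `now`, else the seconds to wait.

    `sent_times` holds timestamps of requests already sent (any order), e.g.
    loaded from a shared store (hypothetical). A request counts against the
    limit while its timestamp t satisfies t > now - window.
    """
    recent = sorted(t for t in sent_times if t > now - window)
    if len(recent) < limit:
        return 0.0
    # To get below `limit`, the oldest len(recent) - limit + 1 requests must
    # expire; the last of those is recent[len(recent) - limit].
    return recent[len(recent) - limit] + window - now
```

For example, with `limit=3`, `window=60`, and requests already sent at t=0, 10, and 20, a task checking at t=30 would defer for 30 seconds.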

@potiuk
Member

potiuk commented Jan 7, 2023

The problem (and difficulty) with implementing this one is that you need some central service to coordinate the rate limits across multiple tasks running in parallel, potentially on multiple nodes running such client code.

Implementing a sliding time window with a request rate per minute (or another time period) is not something that can easily be done in a "generic" way.

There are some projects that implement generic "services" providing such capabilities (Global Distributed Client Side Rate Limiting), for example https://github.com/youtube/doorman, and you can see there the complexities involved in implementing such a solution.
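Here is an illustration of the coordination point. Within a single process, a lock is enough to make "check the window and record the start" one atomic step, so parallel workers can never over-admit. In the sketch below, threads stand in for parallel tasks, and `SharedLimiter` and `run_workers` are hypothetical. Airflow tasks typically run as separate processes, possibly on different nodes, where no such shared in-memory lock exists. That is why a central service is needed.

```python
import threading
import time
from collections import deque


class SharedLimiter:
    """Sliding-window limiter whose check-and-record step is atomic (hypothetical)."""

    def __init__(self, limit, window):
        self.limit = limit
        self.window = window
        self._starts = deque()
        self._lock = threading.Lock()

    def try_acquire(self):
        with self._lock:  # check and record happen as one step
            now = time.monotonic()
            while self._starts and self._starts[0] <= now - self.window:
                self._starts.popleft()
            if len(self._starts) < self.limit:
                self._starts.append(now)
                return True
            return False


def run_workers(limiter, n_workers):
    """Start n_workers threads that each try to acquire one slot; return outcomes."""
    results = []
    results_lock = threading.Lock()

    def worker():
        ok = limiter.try_acquire()
        with results_lock:
            results.append(ok)

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

With `SharedLimiter(limit=5, window=60.0)` and 50 concurrent workers, exactly 5 acquire a slot, because no two threads can interleave between the count check and the append.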

@potiuk
Member

potiuk commented Jan 7, 2023

But yeah, you can likely implement a "poor-man's version" as you described, though expect some problems that you will have to deal with (for example, potential starvation of some tasks). Those aren't easy things to implement.
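One way to mitigate the starvation problem is sketched below. It assumes that waiting tasks can register with a single coordinator and then admits them strictly in arrival order. A task that keeps polling at unlucky moments therefore cannot be overtaken indefinitely. `FairRateGate` is hypothetical and combines FIFO ordering with a simple cooldown. It is only one possible approach.

```python
from collections import deque


class FairRateGate:
    """FIFO admission on top of a minimum spacing between starts (hypothetical).

    Only the task at the head of the queue may start, so no registered task
    can be overtaken by tasks that registered after it.
    """

    def __init__(self, cooldown):
        self.cooldown = cooldown
        self.last_start = None
        self._waiting = deque()  # task ids in registration order

    def register(self, task_id):
        if task_id not in self._waiting:
            self._waiting.append(task_id)

    def poll(self, task_id, now):
        """Return True (and dequeue the task) if `task_id` may start at `now`."""
        if not self._waiting or self._waiting[0] != task_id:
            return False  # an earlier-registered task goes first
        if self.last_start is not None and now - self.last_start < self.cooldown:
            return False  # head of the queue, but the cooldown has not elapsed
        self._waiting.popleft()
        self.last_start = now
        return True
```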

Projects
None yet
Development

No branches or pull requests

4 participants