Concurrency control retry scheduling is very imprecise #1278

Open
bensheldon opened this issue Mar 9, 2024 · 0 comments


Concurrency Controls currently use retry_on's wait: :polynomially_longer, which calculates a longer and longer wait based on the number of previous execution attempts. This is probably not the best algorithm for the purpose, because it means successive retry attempts get flung farther and farther into the future.
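To give a sense of the growth, here is a rough sketch. It assumes the delay is approximately executions**4 + 2 seconds, which is my reading of Active Job's polynomially_longer with the random jitter left out; the method name is made up for illustration and is not part of any library.

```ruby
# Approximate polynomially_longer delay in seconds, with jitter omitted.
# Assumption: the delay grows roughly as executions**4 + 2.
def approximate_polynomial_delay(executions)
  (executions**4) + 2
end

# Print the approximate wait for a few attempt counts.
[1, 5, 10, 20, 30].each do |executions|
  seconds = approximate_polynomial_delay(executions)
  days = (seconds / 86_400.0).round(2)
  puts "attempt #{executions}: #{seconds}s (~#{days} days)"
end
```

Under that assumption, a job that has missed its concurrency slot around 30 times is pushed out by more than a week.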

I was recently helping a developer who had enqueued several thousand jobs with a concurrency control of 4. Even when using the reschedule option, the next concurrency miss/retry would still schedule jobs into next week rather than starting over.

I think it can be smarter, though at the cost of either more database queries (trying to make a more precise guess) or putting more pressure on the queue by retrying at a simpler fixed rate.
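As one possible shape for the simpler fixed-rate option, a minimal sketch follows. The method name, the 5-second base, and the 15% jitter are all placeholders I picked for illustration, not existing GoodJob settings.

```ruby
# Hypothetical fixed-rate retry delay: a constant base plus a little random
# jitter, independent of how many times the job has already been attempted.
def fixed_rate_delay(base_seconds: 5, jitter: 0.15, random: Random.new)
  base_seconds + (random.rand * base_seconds * jitter)
end

# The delay stays in the same narrow band no matter the attempt number.
3.times { puts fixed_rate_delay.round(2) }
```

The tradeoff is the one described above: the wait never balloons, but every blocked job comes back to the queue every few seconds to re-check the concurrency conditions.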

My other thought on this, which I'll note here though it maybe deserves its own issue, is that it would be nice if concurrency controls (and throttling too) maintained queue order. I'm thinking it could be accomplished by adding another column to distinguish between two times. The first is "when we want to attempt to run this again and check the concurrency conditions" (the current scheduled_at, which is what determines which job is dequeued next). The second is "when we desired/hoped this would run, at some point in the past, but the concurrency conditions weren't met" (a new column that wouldn't be updated when the job is retried for a concurrency miss). That new column would then be added to the concurrency checking logic, so the check becomes "there is enough concurrency, and also this job is at the head of the line to run next."
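A rough in-memory sketch of that check is below. Everything in it is hypothetical: the Job struct, the desired_at name for the new column, and the runnable? method are illustrations only, and a real version would be a database query rather than array filtering.

```ruby
# Hypothetical model: scheduled_at is when to re-check the job; desired_at is
# when it was originally meant to run and is not changed by concurrency misses.
Job = Struct.new(:id, :concurrency_key, :scheduled_at, :desired_at, :running,
                 keyword_init: true)

# A job may run only if there is a free slot for its key AND it is among the
# oldest waiting jobs (by desired_at) that would fit in the free slots.
def runnable?(job, jobs, limit:)
  same_key = jobs.select { |j| j.concurrency_key == job.concurrency_key }
  running = same_key.count(&:running)
  return false if running >= limit

  free_slots = limit - running
  waiting = same_key.reject(&:running).sort_by { |j| [j.desired_at, j.id] }
  waiting.first(free_slots).include?(job)
end
```

With this rule, a job whose scheduled_at happens to come up first still yields if another job with the same key has an older desired_at and is waiting.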
