Limit the number of concurrent dead link requests #4332
Labels
💻 aspect: code
🧰 goal: internal improvement
🟨 priority: medium
🧱 stack: api
Problem
The API has no upper limit on the number of concurrent dead link requests it will attempt to process. This is a flow control issue for the ASGI API: if a worker handles several large requests that each require many dead link checks, it can easily be overwhelmed and run out of capacity. See python/cpython#81407 for a similar problem.
Description
To solve this, we should limit the number of concurrent dead link requests per worker. We can do this with a semaphore that blocks new dead link requests when too many are already in flight.
The biggest question is what the limit should be, but that's something we'd probably need to experiment with in production.
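A minimal sketch of the per-worker limit, assuming the checks go through aiohttp; the limit value, the `DEAD_LINK_CHECK_CONCURRENCY` setting, and the `check_dead_link` helper are hypothetical names for illustration:

```python
import asyncio

import aiohttp

# Hypothetical setting; the right value needs experimentation in production.
DEAD_LINK_CHECK_CONCURRENCY = 100

# One semaphore per worker process, created at module level so every
# request served by this worker's event loop shares the same limit.
_dead_link_semaphore = asyncio.Semaphore(DEAD_LINK_CHECK_CONCURRENCY)


async def check_dead_link(session: aiohttp.ClientSession, url: str) -> bool:
    """Return True if the URL still resolves, False if it looks dead."""
    # Waits here when DEAD_LINK_CHECK_CONCURRENCY checks are already in
    # flight, instead of opening an unbounded number of connections.
    async with _dead_link_semaphore:
        try:
            async with session.head(url, allow_redirects=False) as response:
                return response.status < 400
        except (aiohttp.ClientError, asyncio.TimeoutError):
            return False
```

Because the semaphore lives at module level, the limit applies across all requests handled by a single worker, which is the scope of the flow control problem described above.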
If we could also prevent duplicate requests for the same URL at the same time, that would be incredible, but it would require coordination through a centralised queue (redis?) and would probably be harder to implement than a simple maximum at first pass. Perhaps this is a good candidate for a follow-up; see the sketch below.
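For that follow-up, a hedged sketch of the coordination primitive only: a Redis `SET NX` key can act as a short-lived claim on a URL so that only one worker checks it at a time. The key prefix and TTL are illustrative, and this suppresses duplicates without sharing the check's result, which is the harder part:

```python
import redis.asyncio as redis


async def try_claim_url(client: redis.Redis, url: str, ttl_seconds: int = 30) -> bool:
    # SET with nx=True succeeds only for the first claimant; the key
    # expires after ttl_seconds so a crashed worker cannot hold a claim
    # forever. Other workers can skip (or defer) the duplicate check.
    return bool(
        await client.set(f"dead-link-check:{url}", "1", nx=True, ex=ttl_seconds)
    )
```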
Additional context
This complements other API reliability work, such as the abuse prevention measures in #4321 and #4324.
We should consider whether the same limit is needed for thumbnail requests.