Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Limit the number of concurrent dead link requests #4332

Closed
sarayourfriend opened this issue May 15, 2024 · 1 comment
Closed

Limit the number of concurrent dead link requests #4332

sarayourfriend opened this issue May 15, 2024 · 1 comment
Assignees
Labels
💻 aspect: code Concerns the software code in the repository 🧰 goal: internal improvement Improvement that benefits maintainers, not users 🟨 priority: medium Not blocking but should be addressed soon 🧱 stack: api Related to the Django API

Comments

@sarayourfriend
Copy link
Collaborator

Problem

The API has no upper limit on the number of concurrent dead link requests it will attempt to process. This can introduce a flow control issue for the ASGI API because if the worker tries to handle multiple significant requests that require lots of dead link checks, it can easily get overwhelmed and run out of capacity. Check out this issue for a similar problem: python/cpython#81407

Description

To solve this, we should limit the number of concurrent dead link requests for a given worker. We can do this using a semaphore that blocks the dead link requests if there are too many happening at a given time.

The biggest question is what the limit should be, but that's something we'd probably need to experiment with in production.

If it's possible to prevent duplicate requests for the same URL at a single time, that would be incredible, but it would require coordination with a centralised queue (redis?) and probably harder to implement than a simple maximum at first pass. Perhaps this is a good candidate for a follow-up.

Additional context

This is a complement to other API reliability features like abuse prevention measures like #4321 and #4324.

We should consider whether the same must be done for the thumbnails.

@sarayourfriend sarayourfriend added 🟧 priority: high Stalls work on the project or its dependents 💻 aspect: code Concerns the software code in the repository 🧰 goal: internal improvement Improvement that benefits maintainers, not users 🧱 stack: api Related to the Django API 🟨 priority: medium Not blocking but should be addressed soon and removed 🟧 priority: high Stalls work on the project or its dependents labels May 15, 2024
@sarayourfriend
Copy link
Collaborator Author

Closing as won't do, for now. @krysal pointed out in #4333 that we already limit the number of concurrent outbound requests per worker to 100, which is lower than the limit I would have chosen in the PR anyway.

There's probably something else going on here, but #4598 might also address any issues coming from the dead link by reducing the overall possibility of many-concurrent long-running requests from the dead link code paths.

@sarayourfriend sarayourfriend closed this as not planned Won't fix, can't repro, duplicate, stale Jul 16, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
💻 aspect: code Concerns the software code in the repository 🧰 goal: internal improvement Improvement that benefits maintainers, not users 🟨 priority: medium Not blocking but should be addressed soon 🧱 stack: api Related to the Django API
Projects
Archived in project
Development

Successfully merging a pull request may close this issue.

1 participant