add middleware for request prioritization #33951
base: main
Conversation
Force-pushed from 65193d7 to f3096c1
This adds a middleware for overload protection that is intended to help protect against malicious scrapers. It does this via [`codel`](https://github.com/bohde/codel), which will perform the following:

1. Limit the number of in-flight requests to some user-defined maximum.
2. When in-flight requests have reached their maximum, begin queuing requests, with logged-in requests having priority over logged-out requests.
3. Once a request has been queued for too long, it has a percentage chance to be rejected based upon how overloaded the entire system is.

When a server experiences more traffic than it can handle, this has the effect of keeping latency low for logged-in users, while rejecting just enough requests from logged-out users to keep the service from being overloaded.

Below are benchmarks showing a system at 100% capacity and 200% capacity in a few different configurations. The 200% capacity case is shown to highlight an extreme scenario. I used [hey](https://github.com/rakyll/hey) to simulate the bot traffic:

```
hey -z 1m -c 10 "http://localhost:3000/rowan/demo/issues?state=open&type=all&labels=&milestone=0&project=0&assignee=0&poster=0&q=fix"
```

The concurrency of 10 was chosen from experiments where my local server began to experience higher latency.

Results

| Method | Concurrency | p95 latency | Successful RPS | Requests Dropped |
|--------|-------------|-------------|----------------|------------------|
| QoS Off | 10 | 0.2960s | 44 rps | 0% |
| QoS Off | 20 | 0.5667s | 44 rps | 0% |
| QoS On | 20 | 0.4409s | 48 rps | 10% |
| QoS On 50% Logged In* | 10 | 0.3891s | 33 rps | 7% |
| QoS On 50% Logged Out* | 10 | 2.0388s | 13 rps | 6% |

Logged-in users were given the additional parameter `-H "Cookie: i_like_gitea=<session>"`. Tests marked with `*` were run at the same time, representing a workload with mixed logged-in and logged-out users. Results are separated to show prioritization, and how logged-in users experience only a ~100ms latency increase under load, compared to the ~1.6 second increase logged-out users see.
TBH, according to your test results, I do not see how this is really useful .......
I think as a first step, just distinguishing between logged in and not is enough. And it looks like it could be added for the API router as well.
We've been running this patch for a while, and while it takes a bit of tuning to get the right settings for a given server, it has alleviated a lot of outage concerns from waves of scrapers that don't respect robots.txt.
This was in my initial version that we ran, but then users who are attempting to log in can still experience lower-priority traffic. Since the major source of traffic for public code hosting is scrapers trying to download content, deprioritizing those routes mitigates that concern.
So the original intention is to fight against crawlers/spiders/robots?
Yes, I mentioned this in my description.
Sorry, missed that part. The "perform the following" part and the "result" part attracted my attention ..... If the intention is to "protect against malicious scrapers", then, as https://github.com/go-gitea/gitea/pull/33951/files#r2006668195 suggests, mark the routes with priority.
I think I have some ideas about how to handle these "route paths" cleanly. Could I push some commits to your branch?
Could you send a PR to my branch, or sketch it out in another commit? That would prevent merge conflicts on my branch.
Sorry, I don't quite understand what that means exactly ....
We can use the chi router's "RoutePattern", and make the code testable.
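Marking routes with priorities could look something like the following sketch. In Gitea's chi-based router the pattern would come from `chi.RouteContext(r.Context()).RoutePattern()`; here the pattern is passed as a plain string so the lookup stays testable without the router. The specific patterns and priority assignments are illustrative only, not what the PR does:

```go
package main

import "fmt"

// RoutePriority marks how a route should be treated under load.
type RoutePriority int

const (
	LowPriority     RoutePriority = iota // expensive, scraper-attractive pages
	DefaultPriority                      // everything not explicitly marked
	HighPriority                         // e.g. login, usable even under load
)

// routePriorities is keyed by chi-style route patterns.
var routePriorities = map[string]RoutePriority{
	"/user/login":                   HighPriority,
	"/{username}/{reponame}/issues": LowPriority,
}

// priorityFor looks up a route pattern, falling back to the default.
func priorityFor(pattern string) RoutePriority {
	if p, ok := routePriorities[pattern]; ok {
		return p
	}
	return DefaultPriority
}

func main() {
	fmt.Println(priorityFor("/user/login") == HighPriority)                  // true
	fmt.Println(priorityFor("/{username}/{reponame}/issues") == LowPriority) // true
	fmt.Println(priorityFor("/api/v1/version") == DefaultPriority)           // true
}
```

Keeping the pattern-to-priority mapping in a plain map like this makes it trivial to unit-test without spinning up the full router.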