
Add max_connections limit to HTTP.listen() #647

Merged
merged 1 commit into JuliaWeb:master on Dec 30, 2020

Conversation

staticfloat
Contributor

It is possible for the Task scheduler to become overwhelmed by too many
small tasks; it provides a better experience to simply refuse to create
more tasks and apply back-pressure through an `accept()` backlog.
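
To illustrate the mechanism, here is a minimal sketch (not HTTP.jl's actual internals; `handle` is a hypothetical per-connection handler): once the task limit is reached, the loop simply stops calling `accept()`, so new clients queue up in the kernel's listen backlog.

```julia
using Sockets

const MAX_CONNECTIONS = 500
const slots = Base.Semaphore(MAX_CONNECTIONS)

server = Sockets.listen(IPv4(0), 8080)
while true
    # Blocks once MAX_CONNECTIONS tasks are live; pending clients
    # wait in the accept() backlog, giving us back-pressure for free.
    Base.acquire(slots)
    conn = Sockets.accept(server)
    @async try
        handle(conn)   # hypothetical per-connection handler
    finally
        close(conn)
        Base.release(slots)   # free a slot; the accept loop resumes
    end
end
```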
@codecov-io

codecov-io commented Dec 17, 2020

Codecov Report

Merging #647 (97ff836) into master (edc5e28) will decrease coverage by 0.24%.
The diff coverage is 100.00%.


@@            Coverage Diff             @@
##           master     #647      +/-   ##
==========================================
- Coverage   77.63%   77.39%   -0.25%     
==========================================
  Files          36       36              
  Lines        2325     2353      +28     
==========================================
+ Hits         1805     1821      +16     
- Misses        520      532      +12     
Impacted Files | Coverage Δ
--- | ---
src/Servers.jl | 66.87% <100.00%> (+0.64%) ⬆️
src/RetryRequest.jl | 55.00% <0.00%> (-5.00%) ⬇️
src/Messages.jl | 92.48% <0.00%> (-0.76%) ⬇️
src/ConnectionRequest.jl | 51.76% <0.00%> (+3.43%) ⬆️

Continue to review full report at Codecov.

Legend
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update edc5e28...97ff836.

@@ -229,6 +229,7 @@ function listen(f,
tcpisvalid::Function=tcp->true,
server::Union{Base.IOServer, Nothing}=nothing,
reuseaddr::Bool=false,
max_connections::Int=nolimit,
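
A usage sketch of the new keyword (the handler body, host, and port here are placeholders, not from the PR):

```julia
using HTTP

# Spawn at most 500 concurrent connection tasks; excess clients
# wait in the accept() backlog rather than overwhelming the scheduler.
HTTP.listen("0.0.0.0", 8080; max_connections=500) do http
    HTTP.setstatus(http, 200)
    HTTP.startwrite(http)
    write(http, "ok")
end
```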
Member
Do you think there's a reasonable default we could use here? 10K? 100K? Enough that "benchmarkers" don't see a hit, but one that helps others avoid whatever you ran into by default?

@staticfloat (Contributor, Author) commented Dec 19, 2020

I think it's very dependent on relative load; the issue is old tasks getting starved for work, so it all depends on how many requests per second your server can service. If you can serve 10k requests per second, then 10k tasks is probably fine, but if you can only serve 100 requests per second, then 100 is what you want.
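
One rough way to size it (a back-of-the-envelope sketch, assuming roughly a one-second budget for each request to queue and run; the numbers are hypothetical):

```julia
# limit ≈ sustainable throughput × time each request may spend queued + executing
sustainable_rps  = 100    # measured requests/second for this server
latency_budget_s = 1.0    # assumed budget per request, in seconds
max_connections  = round(Int, sustainable_rps * latency_budget_s)  # => 100
```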

I've tried creating more adaptive methods (like yield()'ing immediately to push new tasks onto the back of the scheduler's work queue), but nothing has performed quite as well as just limiting the number of tasks a priori.
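
For comparison, a rough sketch of that adaptive approach (same hypothetical `handle` as above):

```julia
using Sockets

server = Sockets.listen(IPv4(0), 8080)
while true
    conn = Sockets.accept(server)
    @async try
        handle(conn)   # hypothetical per-connection handler
    finally
        close(conn)
    end
    # Yield immediately so already-scheduled tasks get a chance to run
    # before we accept (and spawn a task for) the next connection.
    yield()
end
```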

I have a nice little benchmarking setup for PkgServer.jl that is based on k6; once I have it published, you can take a look, and maybe we can build some nice synthetic benchmarks for HTTP.jl that others can use to find realistic limits for their own applications.

It should be stated that the main use for `max_connections` is graceful degradation of service. Here's the k6 output for `max_connections=nolimit`:

[screenshot: 1thread_vanilla_http091]

And here's `max_connections=500` (an experimentally determined "good choice" for this hardware and workload):

[screenshot: 1thread_500conn_http091]

The first thing to notice is that the average requests per second (as eyeballed from the second-from-left graph in the top row) are about the same, which makes sense; we're not changing the amount of throughput available. The next thing to notice is that the overall duration of the connections is bounded at about 80ms in the second run, whereas it grows quite rapidly in the first. This is shown by the lower mean http_req_duration (leftmost text box), as well as the much "tighter" spread between the min/95%/max lines in the center-left graph. There are still some requests that get hung for some reason (haven't tracked those down yet), but they're a very small percentage of requests.

The tradeoff is that errors start to happen a little earlier in the process now, as you can see in the "Errors per second" graph. That being said, those errors fail much more quickly, and the rest of the clients maintain a much more responsive experience. But of course, the right limit will be completely application-dependent.

@staticfloat staticfloat reopened this Dec 27, 2020
@quinnj quinnj merged commit 31fcf05 into JuliaWeb:master Dec 30, 2020