Pathological performance when setting MAX_TOTAL_CONNECTIONS #2369
Okay. Can you spot the problem even without going to such large numbers? It would be helpful if we can get logging. Can you detect if there's a problem in connection reuse, or if there are any particular time gaps between individual requests, etc.?
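(A minimal sketch of how the application side could check connection reuse, not code from this thread; it assumes the test program still has each easy handle available when its transfer completes.)

#include <stdio.h>
#include <curl/curl.h>

/* After a transfer completes (e.g. when curl_multi_info_read() reports
   CURLMSG_DONE), CURLINFO_NUM_CONNECTS tells how many new connections
   that transfer needed; 0 means an existing connection was reused. */
static void report_reuse(CURL *easy)
{
  long new_conns = 0;
  if(curl_easy_getinfo(easy, CURLINFO_NUM_CONNECTS, &new_conns) == CURLE_OK)
    fprintf(stderr, "new connections used by this transfer: %ld\n", new_conns);
}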
Thanks. The issue basically disappears as the number of requests goes down. Activating verbose output,
I've uploaded the whole file here. So libcurl actually sends a burst of requests, then apparently spends (too much) time doing stuff by iterating over pending requests, then sends requests again, and so on. This is confirmed on the nginx side; here's an strace fragment to get the precise timing when requests are received (with huge gaps between bursts):
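(For client-side timing, a minimal sketch using libcurl's debug callback; the callback and helper names are made up here, and verbose/debug output itself adds overhead, so it skews the measurement somewhat.)

#include <stdio.h>
#include <time.h>
#include <curl/curl.h>

/* Print a monotonic timestamp each time libcurl sends request headers,
   so bursts and gaps between individual requests become visible. */
static int trace_cb(CURL *handle, curl_infotype type,
                    char *data, size_t size, void *userp)
{
  (void)handle; (void)data; (void)size; (void)userp;
  if(type == CURLINFO_HEADER_OUT) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    fprintf(stderr, "request headers sent at %ld.%06ld\n",
            (long)ts.tv_sec, ts.tv_nsec / 1000);
  }
  return 0;
}

/* Install on every easy handle in the test program. */
static void add_tracing(CURL *easy)
{
  curl_easy_setopt(easy, CURLOPT_DEBUGFUNCTION, trace_cb);
  curl_easy_setopt(easy, CURLOPT_VERBOSE, 1L); /* required for the callback */
}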
Clearly there's non-optimal handling of the connections that are "pending", waiting to be able to get sent. There's no separate list of pending connections; they're all in a single linked list, and the ones that are pending are simply set to a pending state.

Can you try some brute-force experiments to see how the performance changes if you, for example, only move a small set of connections from pending to active at a time? It doesn't really make sense to move a huge amount, but it's also not possible to give an easy fixed low number, so let's just take one at first to see if you can see any performance difference:

diff --git a/lib/multi.c b/lib/multi.c
index ca3a877eb..d3b2f0c1b 100644
--- a/lib/multi.c
+++ b/lib/multi.c
@@ -3072,10 +3072,11 @@ struct curl_llist *Curl_multi_pipelining_server_bl(struct Curl_multi *multi)
 }
 
 void Curl_multi_process_pending_handles(struct Curl_multi *multi)
 {
   struct curl_llist_element *e = multi->pending.head;
+  int i = 0;
 
   while(e) {
     struct Curl_easy *data = e->ptr;
     struct curl_llist_element *next = e->next;
@@ -3085,10 +3086,12 @@ void Curl_multi_process_pending_handles(struct Curl_multi *multi)
       /* Remove this node from the list */
       Curl_llist_remove(&multi->pending, e, NULL);
 
       /* Make sure that the handle will be processed soonish. */
       Curl_expire(data, 0, EXPIRE_RUN_NOW);
+      if(i++ > 5)
+        break;
     }
 
     e = next; /* operate on next handle */
   }
 }
The performance is now much better and, most importantly, linear with the number of requests. Doing 1000 requests with 5 max connections (verbose disabled) took ~700 ms before the patch and now takes ~200 ms. With 10k requests, it went from 70 seconds to 2100 ms. I still get hundreds of consecutive:
but there's no apparent gap between requests (as received by nginx). The total throughput is still much slower than h2load's.
libcurl is indeed much more flexible, so we could allow that to explain some performance loss compared against a tool like h2load, but I don't think such a huge difference can be explained by this. It is rather caused by a bug/inefficiency that we can and should fix. I think the plain fact that you could improve performance a lot with the hack I suggested shows that we could get a lot more speed by making the algorithm for moving connections from pending to active smarter. Some ideas for this:
I think we can get a pretty good boost with even simpler means. I'm working on a patch, stay tuned!
Sounds great!
My main concern is that it is "easy" to make a huge improvement for plain old HTTP/1 connections, but with h2 and multiplexing things are more complex, as then hundreds of transfers could become possible in one go. Your example code is awesome and I'm using it to run tests. I intend to make the test slightly more complicated by also getting files over HTTP/2, to make sure my changes don't inadvertently punish such transfers.
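(For reference, a sketch of how a multi-based test could also exercise HTTP/2 multiplexing; the helper name is made up, and it assumes libcurl is built with nghttp2 support and an h2-capable test server such as nghttpd.)

#include <curl/curl.h>

static void setup_h2(CURLM *multi, CURL *easy, const char *url)
{
  /* allow many transfers to be multiplexed over one connection */
  curl_multi_setopt(multi, CURLMOPT_PIPELINING, CURLPIPE_MULTIPLEX);

  /* request HTTP/2 on the easy handle (falls back if unavailable) */
  curl_easy_setopt(easy, CURLOPT_URL, url);
  curl_easy_setopt(easy, CURLOPT_HTTP_VERSION, CURL_HTTP_VERSION_2_0);
}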
When a transfer is requested to get done and it is put in the pending queue when limited by number of connections, total or per-host, libcurl would previously very aggressively retry *ALL* pending transfers to get them transferring. That was very time consuming. By reducing the aggressiveness in how pending transfers are retried, we waste MUCH less time on putting transfers back into pending again.

Reported-by: Cyril B
Fixes #2369
I ran your test code against my localhost with 5000 transfers, and #2383, as it currently works, changes the time it takes to complete from 30 seconds to less than one...
It improves performance significantly for 5000 transfers, even over HTTPS + HTTP/2, in my tests using the nghttpd server on localhost. h2load still outperforms libcurl by a notable margin, and it serves as an excellent "bar" to compare with, but we have a much more complicated setup with more conditions and more complex rules, which explains some or most of the remaining difference. This patch is really the low-hanging fruit. To further improve things, I think all transfers that are put in the PENDING state should each have a clear route for how/when they can be put back to CONNECT, so that we can move the right transfers, and the right amount, each time. And this needs to be done without looping through all currently added transfers. If others want to try this PR out before I merge it, please feel free. I'll hold off merging for a few days at least.
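(To illustrate the direction described above, here is a minimal, self-contained sketch of the idea of waking only as many pending transfers as there are free connection slots, rather than looping over all of them. The struct and function names are invented for this sketch; this is not curl's actual code.)

#include <stddef.h>

struct transfer {
  struct transfer *next;   /* singly linked pending list */
  int ready;               /* 0 = pending, 1 = scheduled to connect */
};

struct pool {
  struct transfer *pending_head;
  size_t max_connections;
  size_t active_connections;
};

/* Called when a connection frees up: move a bounded number of transfers
   from pending to "connect" instead of re-activating the whole list. */
static void wake_pending(struct pool *p)
{
  size_t free_slots = (p->max_connections > p->active_connections)
                      ? (p->max_connections - p->active_connections) : 0;

  while(free_slots-- && p->pending_head) {
    struct transfer *t = p->pending_head;
    p->pending_head = t->next;  /* unlink from the pending list */
    t->ready = 1;               /* schedule it to start connecting */
  }
}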
Using the same benchmarking program, I now get a segfault:
silly me, fixed now
Thanks, it works fine for me and the performance improvement is similar to your initial (crude) patch.
libcurl version: 7.58.0
Operating system: Archlinux (Linux 4.15.5).
I'm trying to use libcurl to do as many requests as possible to a local server (similar to what load-testing tools do, although that's not my purpose). I've used the multi-uv.c example and slightly simplified it to not save files to disk and to accept the number of requests to do as an argument (full source).
The local server is nginx serving a tiny static page. It's not the bottleneck.
Doing 1000 requests takes about 160 ms:
For comparison, h2load takes a similar amount of time, although using less CPU:
Obviously, reusing connections should improve performance. Confirmed with h2load:
However, adding a CURLMOPT_MAX_TOTAL_CONNECTIONS limit to multi-uv.c actually degrades performance a lot. Even worse, the performance degradation is not linear at all. Let's do more requests in total (still limiting to 10 connections):
Am I doing something wrong here, or missing an obvious optimization?
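(The exact lines added to multi-uv.c are not quoted above; based on the issue title they presumably set a total-connection cap on the multi handle, roughly like this reconstruction. The helper name is made up.)

#include <curl/curl.h>

static void limit_connections(CURLM *multi)
{
  /* cap the total number of simultaneous connections; transfers beyond
     the cap are queued as "pending" inside libcurl */
  curl_multi_setopt(multi, CURLMOPT_MAX_TOTAL_CONNECTIONS, 10L);
}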
Here's what profiling with perf returns (truncated):