Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Pathological performance when setting MAX_TOTAL_CONNECTIONS #2369

Closed
cbay opened this issue Mar 7, 2018 · 13 comments
Closed

Pathological performance when setting MAX_TOTAL_CONNECTIONS #2369

cbay opened this issue Mar 7, 2018 · 13 comments

Comments

@cbay
Copy link

cbay commented Mar 7, 2018

libcurl version: 7.58.0
Operating system: Archlinux (Linux 4.15.5).

I'm trying to use libcurl to do as many requests as possible to a local server (similarly to load testing tools, although that's not my purpose). I've used the multi-uv.c example and slightly simplified it to not save files to disk and accept the number of requests to do as argument (full source).

The local server is nginx serving a tiny static page. It's not the bottleneck.

Doing 1000 requests takes about ~160ms:

$ time ./multi-uv 1000
./multi-uv 1000  0.08s user 0.06s system 89% cpu 0.162 total

For comparison, h2load takes a similar amount of time, although using less CPU:

$ time h2load -c 1000 -n 1000 --h1 http://127.0.0.1:8080/hello.html >/dev/null
h2load -c 1000 -n 1000 --h1 http://127.0.0.1:8080/hello.html  0.02s user 0.07s system 51% cpu 0.166 total

Obviously, reusing connections should improve performance. Confirmed with h2load:

$ time h2load -c 10 -n 1000 --h1 http://127.0.0.1:8080/hello.html >/dev/null
h2load -c 10 -n 1000 --h1 http://127.0.0.1:8080/hello.html > /dev/null  0.01s user 0.02s system 70% cpu 0.037 total

However, adding:

curl_multi_setopt(curl_handle, CURLMOPT_MAX_TOTAL_CONNECTIONS, 10L);

to multi-uv.c actually degrades performance a lot:

$ time ./multi-uv 1000
./multi-uv 1000  0.35s user 0.09s system 90% cpu 0.485 total

Even worse, the performance degradation is not linear at all. Let's do more requests in total (still limiting to 10 connections):

$ time ./multi-uv 2000
./multi-uv 2000  1.38s user 0.18s system 95% cpu 1.629 total
$ time ./multi-uv 5000
./multi-uv 5000  8.28s user 0.99s system 98% cpu 9.367 total

Am I doing something wrong here? missing an obvious optimization?

Here's what profiling with perf returns (truncated):

    81.56%     0.00%  multi-uv     multi-uv                [.] curl_multi_socket_action
            |
            ---curl_multi_socket_action
               |          
                --81.51%--multi_socket
                          |          
                          |--73.87%--multi_runsingle
                          |          |          
                          |          |--65.54%--Curl_connect
                          |          |          |          
                          |          |           --65.10%--create_conn
                          |          |                     |          
                          |          |                     |--11.41%--ConnectionExists
                          |          |                     |          |          
                          |          |                     |          |--6.37%--Curl_conncache_find_bundle
                          |          |                     |          |          |          
                          |          |                     |          |          |--4.86%--hashkey
                          |          |                     |          |          |          |          
                          |          |                     |          |          |           --4.78%--curl_msnprintf
                          |          |                     |          |          |                     |          
                          |          |                     |          |          |                      --4.70%--curl_mvsnprintf
                          |          |                     |          |          |                                |          
                          |          |                     |          |          |                                 --4.44%--dprintf_formatf
                          |          |                     |          |          |                                           |          
                          |          |                     |          |          |                                           |--1.30%--dprintf_Pass1
                          |          |                     |          |          |                                           |          
                          |          |                     |          |          |                                            --1.10%--addbyter
                          |          |                     |          |          |          
                          |          |                     |          |           --1.43%--Curl_hash_pick
                          |          |                     |          |                     |          
                          |          |                     |          |                      --0.89%--Curl_hash_str
                          |          |                     |          |          
                          |          |                     |           --1.97%--extract_if_dead
                          |          |                     |          
                          |          |                     |--7.11%--conn_free
                          |          |                     |          |          
                          |          |                     |          |--2.31%--curl_dofree
                          |          |                     |          |          |          
                          |          |                     |          |           --1.07%--curl_memlog
                          |          |                     |          |          
                          |          |                     |          |--1.58%--Curl_free_primary_ssl_config
                          |          |                     |          |          |          
                          |          |                     |          |           --0.84%--curl_dofree
                          |          |                     |          |          
                          |          |                     |           --0.52%--Curl_ssl_close
                          |          |                     |          
                          |          |                     |--6.40%--Curl_conncache_extract_oldest
                          |          |                     |          |          
                          |          |                     |           --4.44%--Curl_hash_next_element
                          |          |                     |          
                          |          |                     |--5.98%--Curl_conncache_find_bundle
                          |          |                     |          |          
                          |          |                     |          |--4.51%--hashkey
                          |          |                     |          |          |          
                          |          |                     |          |           --4.41%--curl_msnprintf
                          |          |                     |          |                     |          
                          |          |                     |          |                      --4.33%--curl_mvsnprintf
                          |          |                     |          |                                |          
                          |          |                     |          |                                 --4.07%--dprintf_formatf
                          |          |                     |          |                                           |          
                          |          |                     |          |                                           |--1.19%--dprintf_Pass1
                          |          |                     |          |                                           |          
                          |          |                     |          |                                            --0.96%--addbyter
                          |          |                     |          |          
                          |          |                     |           --1.30%--Curl_hash_pick
                          |          |                     |                     |          
                          |          |                     |                      --0.92%--Curl_hash_str
                          |          |                     |          
                          |          |                     |--5.96%--parseurlandfillconn
                          |          |                     |          |          
                          |          |                     |          |--1.99%--findprotocol
                          |          |                     |          |          
                          |          |                     |           --1.08%--Curl_dedotdotify
                          |          |                     |          
                          |          |                     |--4.32%--allocate_conn
                          |          |                     |          
                          |          |                     |--4.01%--create_conn_helper_init_proxy
                          |          |                     |          |          
                          |          |                     |           --2.53%--detect_proxy
                          |          |                     |                     |          
                          |          |                     |                     |--1.06%--Curl_strcasecompare
                          |          |                     |                     |          |          
                          |          |                     |                     |           --0.51%--Curl_raw_toupper
                          |          |                     |                     |          
                          |          |                     |                      --0.77%--curl_getenv
                          |          |                     |          
                          |          |                     |--3.88%--setup_connection_internals
                          |          |                     |          |          
                          |          |                     |          |--2.26%--Curl_free_request_state
                          |          |                     |          |          |          
                          |          |                     |          |           --1.45%--cfree@GLIBC_2.2.5
                          |          |                     |          |          
                          |          |                     |           --1.40%--Curl_http_setup_conn
                          |          |                     |          
                          |          |                     |--3.03%--Curl_clone_primary_ssl_config
                          |          |                     |          |          
                          |          |                     |          |--2.05%--__strlen_avx2
                          |          |                     |          |          
                          |          |                     |           --0.53%--curl_dostrdup
                          |          |                     |          
                          |          |                     |--1.20%--parse_remote_port
                          |          |                     |          
                          |          |                     |--0.88%--set_login
                          |          |                     |          |          
                          |          |                     |           --0.54%--curl_dostrdup
                          |          |                     |          
                          |          |                     |--0.80%--curl_dostrdup
                          |          |                     |          
                          |          |                     |--0.64%--curl_dofree
                          |          |                     |          
                          |          |                     |--0.61%--fix_hostname
                          |          |                     |          
                          |          |                     |--0.60%--cfree@GLIBC_2.2.5
                          |          |                     |          
                          |          |                      --0.55%--__strlen_avx2
                          |          |          
                          |           --4.74%--Curl_multi_process_pending_handles
                          |                     |          
                          |                     |--2.32%--Curl_expire
                          |                     |          |          
                          |                     |           --1.37%--multi_deltimeout
                          |                     |          
                          |                      --0.99%--mstate
                          |                                |          
                          |                                 --0.61%--Curl_infof
                          |          
                          |--2.80%--Curl_splaygetbest
                          |          |          
                          |           --2.24%--Curl_splay
                          |          
                          |--1.85%--add_next_timeout
                          |          
                          |--0.74%--sigpipe_ignore
                          |          
                          |--0.74%--__libc_sigaction
                          |          
                           --0.62%--0xffffffffab800158
@bagder
Copy link
Member

bagder commented Mar 9, 2018

Okay. Can you spot the problem even without going to such a large numbers? It will be helpful if we can get locking (CURLOPT_VERBOSE) enabled so that we can figure out some of libcurl's decisions and flow but doing that for a thousand transfers might be a little extreme. Also, enabling verbose logging will slow down operations a bit so there's a risk it changes behavior somewhat.

Can you detect if there's a problem in connection reuse or if there's any particular time gaps between individual requests etc?

@cbay
Copy link
Author

cbay commented Mar 10, 2018

Thanks. The issue basically disappears as the number of requests goes down.

Activating CURLOPT_VERBOSE does shed some light on what happens. I did 1000 requests with MAX_TOTAL_CONNECTIONS set to 5, here's what I get (I obviously had to truncate the log to cut the snippet size):

* STATE: INIT => CONNECT handle 0x1383148; line 1392 (connection #-5000)
* Added connection 0. The cache now contains 1 members
*   Trying 127.0.0.1...
* TCP_NODELAY set
* STATE: CONNECT => WAITCONNECT handle 0x1383148; line 1444 (connection #0)
* Connected to 127.0.0.1 (127.0.0.1) port 8080 (#0)
* STATE: WAITCONNECT => SENDPROTOCONNECT handle 0x1383148; line 1561 (connection #0)
* Marked for [keep alive]: HTTP default
* STATE: SENDPROTOCONNECT => DO handle 0x1383148; line 1579 (connection #0)
> GET /hello.html HTTP/1.1
Host: 127.0.0.1:8080
Accept: */*

* STATE: DO => DO_DONE handle 0x1383148; line 1658 (connection #0)
* STATE: DO_DONE => WAITPERFORM handle 0x1383148; line 1783 (connection #0)
* STATE: WAITPERFORM => PERFORM handle 0x1383148; line 1799 (connection #0)
* STATE: INIT => CONNECT handle 0x138c418; line 1392 (connection #-5000)
* Found bundle for host 127.0.0.1: 0x1371ae8 [serially]
* Added connection 1. The cache now contains 2 members
* Hostname 127.0.0.1 was found in DNS cache

[4 more requests]



* STATE: INIT => CONNECT handle 0x13b0f58; line 1392 (connection #-5000)
* Found bundle for host 127.0.0.1: 0x1371ae8 [can pipeline]
* No connections available in cache
* No connections available.
* STATE: CONNECT => CONNECT_PEND handle 0x13b0f58; line 1411 (connection #-5000)
* STATE: INIT => CONNECT handle 0x236ad68; line 1392 (connection #-5000)
* Found bundle for host 127.0.0.1: 0x22fdae8 [can pipeline]
* No connections available in cache
* No connections available.
* STATE: CONNECT => CONNECT_PEND handle 0x236ad68; line 1411 (connection #-5000)

[998 more]



* Expire cleared
* Expire cleared
* Expire cleared
* Expire cleared
* HTTP 1.1 or later with persistent connection, pipelining supported
< HTTP/1.1 200 OK
* Server nginx/1.12.2 is not blacklisted
< Server: nginx/1.12.2
< Date: Sat, 10 Mar 2018 14:35:28 GMT
< Content-Type: text/html
< Content-Length: 12
< Last-Modified: Tue, 06 Mar 2018 09:59:36 GMT
< Connection: keep-alive
< ETag: "5a9e6688-c"
< Accept-Ranges: bytes
<
* STATE: CONNECT_PEND => CONNECT handle 0x2361a98; line 3044 (connection #-5000)
* STATE: CONNECT_PEND => CONNECT handle 0x236ad68; line 3044 (connection #-5000)

[988 more]



* STATE: PERFORM => DONE handle 0x230f148; line 1968 (connection #0)
* multi_done
* Connection #0 to host 127.0.0.1 left intact
* HTTP 1.1 or later with persistent connection, pipelining supported
< HTTP/1.1 200 OK
* Server nginx/1.12.2 is not blacklisted
< Server: nginx/1.12.2
< Date: Sat, 10 Mar 2018 14:35:28 GMT
< Content-Type: text/html
< Content-Length: 12
< Last-Modified: Tue, 06 Mar 2018 09:59:36 GMT
< Connection: keep-alive
< ETag: "5a9e6688-c"
< Accept-Ranges: bytes
< 
* STATE: PERFORM => DONE handle 0x2318418; line 1968 (connection #1)
* multi_done
* Connection #1 to host 127.0.0.1 left intact
* STATE: PERFORM => DONE handle 0x23216e8; line 1968 (connection #2)
* multi_done
* Connection #2 to host 127.0.0.1 left intact
* STATE: PERFORM => DONE handle 0x232a9b8; line 1968 (connection #3)
* multi_done
* Connection #3 to host 127.0.0.1 left intact
* STATE: PERFORM => DONE handle 0x23587c8; line 1968 (connection #4)
* multi_done
* Connection #4 to host 127.0.0.1 left intact
* Found bundle for host 127.0.0.1: 0x22fdae8 [can pipeline]
* Re-using existing connection! (#0) with host 127.0.0.1
* Connected to 127.0.0.1 (127.0.0.1) port 8080 (#0)
* STATE: CONNECT => DO handle 0x2361a98; line 1437 (connection #0)
> GET /hello.html HTTP/1.1
Host: 127.0.0.1:8080
Accept: */*

* STATE: DO => DO_DONE handle 0x2361a98; line 1658 (connection #0)
* STATE: DO_DONE => WAITPERFORM handle 0x2361a98; line 1783 (connection #0)
* STATE: WAITPERFORM => PERFORM handle 0x2361a98; line 1799 (connection #0)
* Found bundle for host 127.0.0.1: 0x22fdae8 [can pipeline]
* Re-using existing connection! (#1) with host 127.0.0.1
* Connected to 127.0.0.1 (127.0.0.1) port 8080 (#1)
* STATE: CONNECT => DO handle 0x236ad68; line 1437 (connection #1)
> GET /hello.html HTTP/1.1
Host: 127.0.0.1:8080
Accept: */*

[...]

* Found bundle for host 127.0.0.1: 0x22fdae8 [can pipeline]
* No connections available in cache
* No connections available.
* STATE: CONNECT => CONNECT_PEND handle 0x238f8a8; line 1411 (connection #-5000)
* Found bundle for host 127.0.0.1: 0x22fdae8 [can pipeline]
* No connections available in cache
* No connections available.
* STATE: CONNECT => CONNECT_PEND handle 0x2398b78; line 1411 (connection #-5000)
* Found bundle for host 127.0.0.1: 0x22fdae8 [can pipeline]
* No connections available in cache
* No connections available.

[...]

I've uploaded the whole file here.

So libcurl actually sends a burst of requests, then apparently spends (too much) time doing stuff by iterating over pending requests, sends requests again, etc. This is confirmed on the nginx side, here's a strace fragment to get the precise timing when requests are received (with huge gaps between bursts):

15:44:45.696664 recvfrom(5, "GET /hello.html HTTP/1.1\r\nHost: "..., 1024, 0, NULL, NULL) = 63
15:44:45.697168 recvfrom(10, "GET /hello.html HTTP/1.1\r\nHost: "..., 1024, 0, NULL, NULL) = 63
15:44:45.697487 recvfrom(11, "GET /hello.html HTTP/1.1\r\nHost: "..., 1024, 0, NULL, NULL) = 63
15:44:45.697813 recvfrom(12, "GET /hello.html HTTP/1.1\r\nHost: "..., 1024, 0, NULL, NULL) = 63
15:44:45.698077 recvfrom(13, "GET /hello.html HTTP/1.1\r\nHost: "..., 1024, 0, NULL, NULL) = 63
15:44:45.752294 recvfrom(5, "GET /hello.html HTTP/1.1\r\nHost: "..., 1024, 0, NULL, NULL) = 63
15:44:45.752638 recvfrom(10, "GET /hello.html HTTP/1.1\r\nHost: "..., 1024, 0, NULL, NULL) = 63
15:44:45.752893 recvfrom(11, "GET /hello.html HTTP/1.1\r\nHost: "..., 1024, 0, NULL, NULL) = 63
15:44:45.753184 recvfrom(12, "GET /hello.html HTTP/1.1\r\nHost: "..., 1024, 0, NULL, NULL) = 63
15:44:45.753415 recvfrom(13, "GET /hello.html HTTP/1.1\r\nHost: "..., 1024, 0, NULL, NULL) = 63
15:44:45.790672 recvfrom(5, "GET /hello.html HTTP/1.1\r\nHost: "..., 1024, 0, NULL, NULL) = 63
15:44:45.791051 recvfrom(10, "GET /hello.html HTTP/1.1\r\nHost: "..., 1024, 0, NULL, NULL) = 63
15:44:45.791276 recvfrom(11, "GET /hello.html HTTP/1.1\r\nHost: "..., 1024, 0, NULL, NULL) = 63
15:44:45.791492 recvfrom(12, "GET /hello.html HTTP/1.1\r\nHost: "..., 1024, 0, NULL, NULL) = 63
15:44:45.791714 recvfrom(13, "GET /hello.html HTTP/1.1\r\nHost: "..., 1024, 0, NULL, NULL) = 63
15:44:45.822852 recvfrom(5, "GET /hello.html HTTP/1.1\r\nHost: "..., 1024, 0, NULL, NULL) = 63
15:44:45.823178 recvfrom(10, "GET /hello.html HTTP/1.1\r\nHost: "..., 1024, 0, NULL, NULL) = 63
15:44:45.823371 recvfrom(11, "GET /hello.html HTTP/1.1\r\nHost: "..., 1024, 0, NULL, NULL) = 63
15:44:45.823563 recvfrom(12, "GET /hello.html HTTP/1.1\r\nHost: "..., 1024, 0, NULL, NULL) = 63
15:44:45.823882 recvfrom(13, "GET /hello.html HTTP/1.1\r\nHost: "..., 1024, 0, NULL, NULL) = 63

@bagder
Copy link
Member

bagder commented Mar 11, 2018

Clearly there's a non-optimal handling of the connections that are "pending", waiting to be able to get sent.

There's no separate list of pending connections, they're all in a single linked list and the ones that are pending are simply just set to state CURLM_STATE_CONNECT_PEND. The function Curl_multi_process_pending_handles is clearly not really effective when there's 5 active and 995 pending.

Can you try some brute-force experiments to see how the performance change if you for example only move a small set of connections from pending to active at a time (it doesn't really make sense to move a huge amount, but it's also not possible to give an easy fixed low number so let's just take one at first to see if you can see any performance difference):

diff --git a/lib/multi.c b/lib/multi.c
index ca3a877eb..d3b2f0c1b 100644
--- a/lib/multi.c
+++ b/lib/multi.c
@@ -3072,10 +3072,11 @@ struct curl_llist *Curl_multi_pipelining_server_bl(struct Curl_multi *multi)
 }
 
 void Curl_multi_process_pending_handles(struct Curl_multi *multi)
 {
   struct curl_llist_element *e = multi->pending.head;
+  int i = 0;
 
   while(e) {
     struct Curl_easy *data = e->ptr;
     struct curl_llist_element *next = e->next;
 
@@ -3085,10 +3086,12 @@ void Curl_multi_process_pending_handles(struct Curl_multi *multi)
       /* Remove this node from the list */
       Curl_llist_remove(&multi->pending, e, NULL);
 
       /* Make sure that the handle will be processed soonish. */
       Curl_expire(data, 0, EXPIRE_RUN_NOW);
+      if(i++ > 5)
+        break;
     }
 
     e = next; /* operate on next handle */
   }
 }

@cbay
Copy link
Author

cbay commented Mar 12, 2018

The performance is now much better, and most importantly, linear with the number of requests. Doing 1000 requests with 5 max connections (verbose disabled) took ~700ms (before the patch), now takes 200ms. With 10k requests, from 70 seconds to 2100ms.

I still get hundreds of consecutive:

* STATE: CONNECT => CONNECT_PEND handle 0x2e11e28; line 1411 (connection #-5000)
* Found bundle for host 127.0.0.1: 0x22faae8 [can pipeline]
* No connections available in cache
* No connections available.

but there's no apparent gap of requests (as received by nginx).

The total throughput is still much slower than h2load (which does 10k requests in 250ms), but maybe that's inherent to the flexibility of libcurl.

@bagder
Copy link
Member

bagder commented Mar 12, 2018

libcurl is indeed much more flexible so we could allow that to explain some performance loss compared against a tool like h2load, but I don't think such a huge difference can be explained by this. This is rather caused by a bug/inefficiency that we can and should fix.

I think the plain fact that you could improve performance a lot by the hack I suggested shows that we could get a lot more speed by making the algorithm for moving connections from CURLM_STATE_CONNECT_PEND to CURLM_STATE_CONNECT much smarter.

Some ideas for this:

  1. Avoid the scanning-through-all (wanted) transfers completely. It's dumb and doesn't scale.
  2. Keep a list of pending transfers per "bundle" (host) so that when one transfer to that host is done, one and just one transfer from the queue would be lifted from it and change state. The list should be in FIFO order.
  3. Keep another list for pending transfers that are limited to to "max total connections", and each time an existing connection gets disconnected, extract a transfer from this list and change state.

@bagder
Copy link
Member

bagder commented Mar 13, 2018

I think we can get a pretty good boost with even simpler means. I'm working on a patch, stay tuned!

@cbay
Copy link
Author

cbay commented Mar 13, 2018

Sounds great!

@bagder
Copy link
Member

bagder commented Mar 13, 2018

My main concern is that it is "easy" to make a huge improvement for plain old HTTP/1 connections but with h2 and multiplexing things are more complex as then hundreds of transfers could be come possible in one go. Your example code is awesome and I'm using that to run tests. I intend to make the test slightly more complicated by also getting files over HTTP/2 to make sure my changes don't inadvertently punish such transfers.

bagder added a commit that referenced this issue Mar 14, 2018
When a transfer is requested to get done and it is put in the pending
queue when limited by number of connections, total or per-host, libcurl
would previously very aggressively retry *ALL* pending transfers to get
them transferring. That was very time consuming.

By reducing the aggressiveness in how pending are being retried, we
waste MUCH less time on putting transfers back into pending again.

Reported-by: Cyril B
Fixes #2369
@bagder
Copy link
Member

bagder commented Mar 15, 2018

I ran your test code against my localhost with 5000 transfers, and #2383 as it works currently changes the time it takes to complete from 30 seconds to less than one...

@bagder
Copy link
Member

bagder commented Mar 15, 2018

It improves performance significantly for 5000 transfers even over HTTPS + HTTP/2 in my tests using the nghttpd server on localhost.

h2load still outperforms libcurl with a notable margin, and it serves as an excellent "bar" to compare with but we have a much more complicated setup with more conditions and more complex rules which explains some or most of the remaining differences.

This patch is really the low hanging fruit. To further improve things I think all transfers that are put in PENDING state should each have a clear route on how/when they can be put back to CONNECT so that we can move the right transfers and the right amount each time. And this needs to be done without looping through all currently added transfers.

If others want to try this PR out before I merge it, please feel free. I'll hold off merge a few days at least.

bagder added a commit that referenced this issue Mar 15, 2018
When a transfer is requested to get done and it is put in the pending
queue when limited by number of connections, total or per-host, libcurl
would previously very aggressively retry *ALL* pending transfers to get
them transferring. That was very time consuming.

By reducing the aggressiveness in how pending are being retried, we
waste MUCH less time on putting transfers back into pending again.

Reported-by: Cyril B
Fixes #2369
@cbay
Copy link
Author

cbay commented Mar 15, 2018

Using my same benchmarking program, I now get a segfault:

Program received signal SIGSEGV, Segmentation fault.
0x0000000000428bf5 in process_pending_handles (multi=0x6c7bd8) at multi.c:3074
3074      struct Curl_easy *data = e->ptr;

@bagder
Copy link
Member

bagder commented Mar 15, 2018

silly me, fixed now

@cbay
Copy link
Author

cbay commented Mar 16, 2018

Thanks, it works fine for me and the performance improvement is similar to your initial (crude) patch.

@bagder bagder closed this as completed in 7f9ce08 Mar 16, 2018
@lock lock bot locked as resolved and limited conversation to collaborators Jun 14, 2018
This issue was closed.
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Development

No branches or pull requests

2 participants