Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HTTP/2 connection reuse doesn't respect maximum concurrent streams #4779

Closed
kunalekawde opened this issue Jan 4, 2020 · 15 comments
Closed

HTTP/2 connection reuse doesn't respect maximum concurrent streams #4779

kunalekawde opened this issue Jan 4, 2020 · 15 comments

Comments

@kunalekawde
Copy link

@kunalekawde kunalekawde commented Jan 4, 2020

I did this

Ran HTTP/2 parallel transfers using this example: https://gist.github.com/kunalekawde/1c6494b0aa4bf818ab0c42bb42497f5d. More details in https://curl.haxx.se/mail/lib-2019-12/0073.html.
I tried with following fix: https://gist.github.com/kunalekawde/5cdfe5164b85d42c2b5ab5cdc9f70944 but after some calls run, it resulted in core dump.

(gdb) bt
#0  0x00007fbf16ff70fe in ConnectionExists (data=0x225f330, needle=0x22495e0, usethis=0x7ffd4e4e7ab8, force_reuse=0x7ffd4e4e7ab7, waitpipe=0x7ffd4e4e7ab6) at url.c:1198
#1  0x00007fbf16ffbf94 in create_conn (data=0x225f330, in_connect=0x7ffd4e4e7b30, async=0x7ffd4e4e7ba6) at url.c:3596
#2  0x00007fbf16ffc6e9 in Curl_connect (data=0x225f330, asyncp=0x7ffd4e4e7ba6, protocol_done=0x7ffd4e4e7ba5) at url.c:3847
#3  0x00007fbf16fc1c4b in multi_runsingle (multi=0x1d976c0, now=..., data=0x225f330) at multi.c:1511
#4  0x00007fbf16fc41e1 in multi_socket (multi=0x1d976c0, checkall=false, s=23, ev_bitmask=0, running_handles=0x7ffd4e4e7ed0) at multi.c:2691
#5  0x00007fbf16fc4964 in curl_multi_socket_action (multi=0x1d976c0, s=23, ev_bitmask=0, running_handles=0x7ffd4e4e7ed0) at multi.c:2814
(gdb) p check->data
$7 = (struct Curl_easy *) 0x22e8ec0
(gdb) p check->data->multi
Cannot access memory at address 0x22e8f80

url.c

1197       if(CONN_INUSE(check) && check->data &&
1198          (check->data->multi != needle->data->multi))

I expected the following

Connections to be re-used based on some upper limit - max concurrent streams.

curl/libcurl version

7.67.0

@bagder
Copy link
Member

@bagder bagder commented Jan 4, 2020

I'm confused. You seem to report a crash but why does the title mention "upper limit" ?

@bagder
Copy link
Member

@bagder bagder commented Jan 4, 2020

The crash is only because of your patch? So what exactly is the problem with the unmodified libcurl and your example?

@kunalekawde
Copy link
Author

@kunalekawde kunalekawde commented Jan 5, 2020

The crash is only because of your patch? So what exactly is the problem with the unmodified libcurl and your example?

Unmodified libcurl 7.67.0 has the issue for connection re-use. A fix for same was committed to master with #4732 . This fix doesn't work as same connection is being re-used continuously for parallel streams without hitting limit of max concurrent connections, indicated same here: https://curl.haxx.se/mail/lib-2019-12/0073.html.
Later, as I checked, the probable issue was in checking the limit as indicated here: https://curl.haxx.se/mail/lib-2019-12/0074.html and tried with above patch(yes, crash is only with the patch) but still there is issue of core dump.

@bagder bagder changed the title HTTP/2 connection reuse - upper limit HTTP/2 connection reuse doesn't respect maximum concurrent streams Jan 5, 2020
@bagder
Copy link
Member

@bagder bagder commented Jan 5, 2020

So this title is more appropriate?

@kunalekawde
Copy link
Author

@kunalekawde kunalekawde commented Jan 5, 2020

So this title is more appropriate?

yes.

bagder added a commit that referenced this issue Jan 5, 2020
Reported-by: Kunal Ekawde
Fixes #4779
@bagder bagder added regression and removed crash labels Jan 5, 2020
@bagder
Copy link
Member

@bagder bagder commented Jan 5, 2020

does #4784 make things better?

@kunalekawde
Copy link
Author

@kunalekawde kunalekawde commented Jan 5, 2020

does #4784 make things better?

Got this crash during one of the run. All call run model is same - parallel HTTP/2 transfers.

#0  0x00007f6fd72630e4 in ConnectionExists (data=0x2b403f0, needle=0x2b5bf40, usethis=0x7ffc393d0468, force_reuse=0x7ffc393d0467, waitpipe=0x7ffc393d0466) at url.c:1196
#1  0x00007f6fd7267f7a in create_conn (data=0x2b403f0, in_connect=0x7ffc393d04e0, async=0x7ffc393d0556) at url.c:3594
#2  0x00007f6fd72686cf in Curl_connect (data=0x2b403f0, asyncp=0x7ffc393d0556, protocol_done=0x7ffc393d0555) at url.c:3845
#3  0x00007f6fd722dc4b in multi_runsingle (multi=0x247a850, now=..., data=0x2b403f0) at multi.c:1511
#4  0x00007f6fd72301e1 in multi_socket (multi=0x247a850, checkall=false, s=-1, ev_bitmask=0, running_handles=0x7ffc393d076c) at multi.c:2691
#5  0x00007f6fd7230964 in curl_multi_socket_action (multi=0x247a850, s=-1, ev_bitmask=0, running_handles=0x7ffc393d076c) at multi.c:2814
(gdb) p check->data
$1 = (struct Curl_easy *) 0x2bb1f00
(gdb) p check->data->multi
Cannot access memory at address 0x2bb1fc0

I also got below once - but this should be different issue and shall check from application side first.

#0  0x00007f32e0587feb in raise () from /lib64/libc.so.6
#1  0x00007f32e05725c1 in abort () from /lib64/libc.so.6
#2  0x00007f32e0572491 in __assert_fail_base.cold.0 () from /lib64/libc.so.6
#3  0x00007f32e0580752 in __assert_fail () from /lib64/libc.so.6
#4  0x00007f32e836f292 in Curl_resolver_is_resolved (conn=0x2204210, entry=0x7fffa7bc3f50) at asyn-thread.c:600
#5  0x00007f32e8389e14 in Curl_resolv_check (conn=0x2204210, dns=0x7fffa7bc3f50) at hostip.c:1026
#6  0x00007f32e8364ee2 in multi_runsingle (multi=0x19134a0, now=..., data=0x2225810) at multi.c:1581
#7  0x00007f32e83671e1 in multi_socket (multi=0x19134a0, checkall=false, s=-1, ev_bitmask=0, running_handles=0x7fffa7bc417c) at multi.c:2691
#8  0x00007f32e8367964 in curl_multi_socket_action (multi=0x19134a0, s=-1, ev_bitmask=0, running_handles=0x7fffa7bc417c) at multi.c:2814
#4  0x00007f32e836f292 in Curl_resolver_is_resolved (conn=0x2204210, entry=0x7fffa7bc3f50) at asyn-thread.c:600
600	asyn-thread.c: No such file or directory.
(gdb) p conn->async
$7 = {
  hostname = 0x0,
  port = 80,
  dns = 0x0,
  status = 0,
  os_specific = 0x0,
  done = 1
}
@bagder
Copy link
Member

@bagder bagder commented Jan 5, 2020

Got this crash during one of the run.

Can you clarify what you ran then? What libcurl version and patch and what application code?

@kunalekawde
Copy link
Author

@kunalekawde kunalekawde commented Jan 6, 2020

Got this crash during one of the run.

Can you clarify what you ran then? What libcurl version and patch and what application code?

libcurl version: 7.67.0 with above fix and #4732
I've been trying to reproduce with simple example but seeing some other behavior.

Using the example: https://gist.github.com/kunalekawde/6a5d7646fca9d82c50e1e31147a85686

Case 1:
This is mix of HTTP/2 along with few HTTP/1.1 transfers. Here, I see my test server which handle HTTP/1.1 and 2.0 is not behaving properly. Can you try this ?

Case 2:
With a small change(comment line 168 to 171) in above example to have only HTTP/2.0 transfers, there are 2 things:

  1. Max concurrent connections limit considered to check is the server sent value for first set / limit hit, subsequently, its the client side value set.
  2. On slightly more number of calls --> ./http2-download 600 , I see client stops the http/2 transfers if its run back to back.

I'm still trying to reproduce the crash reported earlier using this example.

@bagder
Copy link
Member

@bagder bagder commented Jan 6, 2020

I ran the example with plain version 2 (not prior knowledge, I don't have any such servers around) and it seems to work for me. I can't see any crash or misbehavior.

As for what max concurrency to use in that check, I think maybe we should use the minimum value of CURLMOPT_MAX_CONCURRENT_STREAMS and the server's set MAX_CONCURRENT_STREAMS config value. What do think?

@kunalekawde
Copy link
Author

@kunalekawde kunalekawde commented Jan 6, 2020

I ran the example with plain version 2 (not prior knowledge, I don't have any such servers around) and it seems to work for me. I can't see any crash or misbehavior.

So, mix of HTTP/1.1 and plain HTTP/2.0 transfers to same server worked fine? or was this with only HTTP/2 plain transfers ?

As for what max concurrency to use in that check, I think maybe we should use the minimum value of CURLMOPT_MAX_CONCURRENT_STREAMS and the server's set MAX_CONCURRENT_STREAMS config value. What do think?

Yes, this sounds reasonable.

bagder added a commit that referenced this issue Jan 9, 2020
A regression made the code use 'multiplexed' as a boolean instead of the
counter it is intended to be.

Also, respect the CURLMOPT_MAX_CONCURRENT_STREAMS value in the same
check.

Reported-by: Kunal Ekawde
Fixes #4779
@bagder
Copy link
Member

@bagder bagder commented Jan 9, 2020

(lets discuss the PR in the PR, see #4784)

@kunalekawde
Copy link
Author

@kunalekawde kunalekawde commented Jan 9, 2020

On the trap reported earlier:
Its getting reproduced during integrations tests where it traps here:
1193 if(check->data && (check->data->multi != needle->data->multi))

(gdb) p check->data
$1 = (struct Curl_easy *) 0x2bb1f00
(gdb) p check->data->multi
Cannot access memory at address 0x2bb1fc0

Do you see anything of concern here ?

1366   if(chosen) {
1367     /* mark it as used before releasing the lock */
1368     chosen->data = data; /* own it! */
1369     Curl_conncache_unlock(data);
1370     *usethis = chosen;
1371     return TRUE; /* yes, we found one to use! */
1372   }

chosen->data is set here and again here:
In
3604 reuse_conn(conn, conn_temp);

data is updated
3208 conn->data = old_conn->data;

@bagder
Copy link
Member

@bagder bagder commented Jan 9, 2020

I would recommend you to base further tests on 7.68.0 before spending time on this. In particular ee263de changed some logic in this regard.

@bagder bagder closed this in 9607532 Jan 13, 2020
@kunalekawde
Copy link
Author

@kunalekawde kunalekawde commented Jan 13, 2020

yeah so far I didnt observe the trap on 7.68.0 there were application issues due to which couldn't proceed will the all the tests, probably #ee263de could have fixed. Thanks

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

2 participants
You can’t perform that action at this time.