Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

--http3 memory leaks when trying a non-listening service #5447

Closed
bagder opened this issue May 25, 2020 · 3 comments
Closed

--http3 memory leaks when trying a non-listening service #5447

bagder opened this issue May 25, 2020 · 3 comments
Assignees

Comments

@bagder
Copy link
Member

@bagder bagder commented May 25, 2020

  • Memory leaks detected by ASAN with either ngtcp2 or quiche on error paths (no listening service): curl -v --http3 https://localhost
$ valgrind --leak-check=full ./src/curl --http3 https://localhost
==660561== HEAP SUMMARY:
==660561==     in use at exit: 23,855 bytes in 79 blocks
==660561==   total heap usage: 1,628 allocs, 1,549 frees, 410,461 bytes allocated
==660561== 
==660561== 2,710 (184 direct, 2,526 indirect) bytes in 1 blocks are definitely lost in loss record 77 of 79
==660561==    at 0x483677F: malloc (vg_replace_malloc.c:309)
==660561==    by 0x4890CD0: alloc (alloc.rs:84)
==660561==    by 0x4890CD0: exchange_malloc (alloc.rs:206)
==660561==    by 0x4890CD0: new<quiche::Config> (boxed.rs:123)
==660561==    by 0x4890CD0: quiche_config_new (ffi.rs:86)
==660561==    by 0x1B7799: Curl_quic_connect (quiche.c:168)
==660561==    by 0x1BF7AC: singleipconnect (connect.c:1259)
==660561==    by 0x1BFA79: Curl_connecthost (connect.c:1347)
==660561==    by 0x15E546: Curl_setup_conn (url.c:3881)
==660561==    by 0x15E6AB: Curl_connect (url.c:3926)
==660561==    by 0x17F7AC: multi_runsingle (multi.c:1639)
==660561==    by 0x1810EF: curl_multi_perform (multi.c:2361)
==660561==    by 0x154853: easy_transfer (easy.c:603)
==660561==    by 0x154A7C: easy_perform (easy.c:693)
==660561==    by 0x154AC6: curl_easy_perform (easy.c:712)
==660561== 

Originally posted by @Lekensteyn in #5443 (comment)

@bagder bagder self-assigned this May 25, 2020
bagder added a commit that referenced this issue May 25, 2020
Addresses the quiche side of #5447
Reported-by: Peter Wu
bagder added a commit that referenced this issue May 25, 2020
Addresses the ngtcp2 side of #5447
bagder added a commit that referenced this issue May 25, 2020
Addresses the quiche side of #5447
Reported-by: Peter Wu
Closes #5450
@Lekensteyn
Copy link
Member

@Lekensteyn Lekensteyn commented May 25, 2020

I've been trying to understand why this leaks. Analysis for curl -v --http3 https://localhost follows.

Call stack (as reported by AddressSanitizer):

#19 0x7f167a9db6c5 in quiche_config_new src/ffi.rs:85
#20 0x7f167b171921 in Curl_quic_connect lib/vquic/quiche.c:168
#21 0x7f167af3db8b in singleipconnect lib/connect.c:1259
#22 0x7f167af3ebc3 in Curl_connecthost lib/connect.c:1347
#23 0x7f167afd3a9e in Curl_setup_conn lib/url.c:3871
#24 0x7f167aff6af5 in Curl_once_resolved lib/hostip.c:1083
#25 0x7f167b0693e8 in multi_runsingle lib/multi.c:1722
#26 0x7f167b06e30e in curl_multi_perform lib/multi.c:2361
#27 0x7f167af6d24e in easy_transfer lib/easy.c:603
#28 0x7f167af6db31 in easy_perform lib/easy.c:693
#29 0x7f167af6dc09 in curl_easy_perform lib/easy.c:712
  1. Curl_quic_connect allocates qs->cfg = quiche_config_new where qs = &conn->hequic[sockindex].
  2. Curl_quic_connect returns CURL_EOK (0), so the "Immediate connect fail" condition is not triggered and hence Curl_closesocket(conn, sockfd) is also not executed.
  3. singleipconnect propagates/returns success.
  4. Curl_connecthost propagates/returns success. Unrelated, but probably worth mentioning: it clears the happy eyeballs timer (Curl_expire) but that is probably not good because at that point it cannot know whether the connection succeeded or not!
  5. Curl_setup_conn propagates/returns success.
  6. Curl_once_resolved propagates/returns success.
  7. multi_runsingle changes state via multistate(data, CURLM_STATE_WAITCONNECT)
  8. multi_runsingle enters state CURLM_STATE_WAITCONNECT:

    curl/lib/multi.c

    Lines 1779 to 1804 in ad829b2

    case CURLM_STATE_WAITCONNECT:
    /* awaiting a completion of an asynch TCP connect */
    DEBUGASSERT(data->conn);
    result = Curl_is_connected(data->conn, FIRSTSOCKET, &connected);
    if(connected && !result) {
    #ifndef CURL_DISABLE_HTTP
    if((data->conn->http_proxy.proxytype == CURLPROXY_HTTPS &&
    !data->conn->bits.proxy_ssl_connected[FIRSTSOCKET]) ||
    Curl_connect_ongoing(data->conn)) {
    multistate(data, CURLM_STATE_WAITPROXYCONNECT);
    break;
    }
    #endif
    rc = CURLM_CALL_MULTI_PERFORM;
    multistate(data, data->conn->bits.tunnel_proxy?
    CURLM_STATE_WAITPROXYCONNECT:
    CURLM_STATE_SENDPROTOCONNECT);
    }
    else if(result) {
    /* failure detected */
    Curl_posttransfer(data);
    multi_done(data, result, TRUE);
    stream_error = TRUE;
    break;
    }
    break;
  9. Curl_is_connected is called, ends up calling Curl_quic_is_connected.
  10. progress_ingress prints * quiche: recv() unexpectedly returned -1 (errno: 111, socket 3) and Curl_quic_is_connected returns CURLE_RECV_ERROR. PR #5450 choses to clean qs here.
  11. Curl_is_connected reports * connect to ::1 port 443 failed: Connection refused and calls trynextip for 127.0.0.1. Another leak occurs (trynextip -> singleipconnect -> Curl_quic_connect, repeat step 1 again).
  12. trynextip ends up returning success. Even though we do not know whether the connection succeeded.
  13. Curl_is_connected returns, and we are back at multi_runsingle, and then return to curl_multi_perform, and then to easy_transfer.
  14. easy_transfer calls curl_multi_poll which returns CURLM_OK and then calls curl_multi_perform again.
  15. multi_runsingle is called again. We are in data->mstate = CURLM_STATE_WAITCONNECT, so again Curl_is_connected is called again (similar to step 7) which fails. Now it does not retry another connection, so it hits the error case (during this test I actually ran into CURLE_OPERATION_TIMEDOUT because I am cheating with a debugger, but normally I would probably end up with Connection refused on localhost):

    curl/lib/multi.c

    Lines 1797 to 1803 in ad829b2

    else if(result) {
    /* failure detected */
    Curl_posttransfer(data);
    multi_done(data, result, TRUE);
    stream_error = TRUE;
    break;
    }
  16. multi_done(data, result, TRUE) is called.
  17. Curl_disconnect is called:
    res2 = Curl_disconnect(data, conn, premature);
  18. At the end of Curl_disconnect, conn_shutdown(conn) is called. Wouldn't this be a more appropriate place to clean up conn->hequic[i] here (where i is 0 and 1 because it is apparently used for IPv4/IPv6 connection attempts)? QUIC is a transport, so this seems to be the perfect place to clean it up.

Other notes: ((struct connectdata *)conn)->handler is for HTTPS.

Forgive me if I made any mistake in my analysis, it has been very manual and I lack deep insights in the internals.

@bagder
Copy link
Member Author

@bagder bagder commented May 26, 2020

The leak happens because process_ingress returns error and when that happens, the conn->hequic[sockindex] struct contains two allocated pointers that aren't freed. #5450 fixes that.

At the end of Curl_disconnect, conn_shutdown(conn) is called. Wouldn't this be a more appropriate place to clean up conn->hequic[i] here (where i is 0 and 1 because it is apparently used for IPv4/IPv6 connection attempts)? QUIC is a transport, so this seems to be the perfect place to clean it up.

Sure, that would be another way to do it but that would require us to add another library-internal function for quic that we don't have now, so I figured it was simpler to fix this in the functions we already have.

bagder added a commit that referenced this issue May 26, 2020
Addresses the quiche side of #5447
Reported-by: Peter Wu
Closes #5450
@bagder bagder closed this in 96a822f May 26, 2020
@Lekensteyn
Copy link
Member

@Lekensteyn Lekensteyn commented May 26, 2020

Sure, that would be another way to do it but that would require us to add another library-internal function for quic that we don't have now, so I figured it was simpler to fix this in the functions we already have.

From an API design perspective it seems more logical to me to do it in conn_shutdown since it matches other transport-related functionality. It's quite hard at the moment to reason about the flow at the moment, if for some reason another call is added later it could run into a use-after-free. I hope it does not come back and bite us later.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

2 participants
You can’t perform that action at this time.