Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use SSL_poll to determine writeability of QUIC streams #15909

Closed
wants to merge 6 commits into from

Conversation

nhorman
Copy link
Contributor

@nhorman nhorman commented Jan 3, 2025

This discussion:
openssl/openssl#23339 (comment)

Specifically item number 2 (Send Blocking) was raised by the curl team, noting that SSL_want_write returning false was not a good indicator of when a stream is writeable. The suggestion in that discussion was to use SSL_poll with an SSL_POLL_EVENT_W flag instead, as that is a proper indication of when an SSL_object will allow writing without blocking.

While ssl_want_write updates its state based on the last error encountered (implying a need to retry an operation to update the last_error state again), SSL_poll checks stream buffer status during the call, giving it more up to date information on request. This is the method used by our guide demos (quic-hq-interop specifically), and it works well.

This change has been run through the curl test suite, and shown to pass all tests. However, given the initial problem description I'm not sure if there is a test case that explicitly checks for blocking and unblocking of streams. As such some additional testing may be warranted.

This discussion:
openssl/openssl#23339 (comment)

Specifically item number 2 (Send Blocking) was raised by the curl team,
noting that SSL_want_write returning false was not a good indicator of
when a stream is writeable.  The suggestion in that discussion was to
use SSL_poll with an SSL_POLL_EVENT_W flag instead, as that is a proper
indication of when an SSL_object will allow writing without blocking.

While ssl_want_write updates its state based on the last error
encountered (implying a need to retry an operation to update the last_error
state again), SSL_poll checks stream buffer status during the call,
giving it more up to date information on request.  This is the method
used by our guide demos (quic-hq-interop specifically), and it works
well.

This change has been run through the curl test suite, and shown to pass
all tests.  However, given  the initial problem description I'm not sure
if there is a test case that explicitly checks for blocking and
unblocking of streams.  As such some additional testing may be
warranted.
@github-actions github-actions bot added the HTTP/3 h3 or quic related label Jan 3, 2025
@bagder bagder requested a review from icing January 3, 2025 17:35
To avoid needless constant reallocation of heap when polling for write
blocking status, move the poll_list/curl_list to the context struct for
iterative reuse, and realloc it if the process list of Curl_easy items
is larger than the current size.  This minimizes the need for
re-allocation in the steady state
lib/vquic/curl_osslq.c Outdated Show resolved Hide resolved
lib/vquic/curl_osslq.c Outdated Show resolved Hide resolved
@testclutch
Copy link

Analysis of PR #15909 at b7df9a1f:

Test 526 failed, which has NOT been flaky recently, so there could be a real issue in this PR.

Generated by Testclutch

Copy link
Contributor

@icing icing left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tested the PR, the current curl master with OpenSSL 3.4.0 and ngtcp2+quictls in our scorecard with a Caddy server. I see no improvements in handling large uploads.

Upload 50 times 100MB

QUIC       Serial     Parallel
ossl+PR     50 MB/s    92 MB/s
ossl        49 MB/s    94 MB/s
ngtcp2     203 MB/s   206 MB/s

I see a slight negative trend on many, small transfers.

5000 GETs of 10KB resource, 1/25/300 in parallel

QUIC       P-1       P-25      P-300
ossl+PR    752 r/s   1371 r/s  1367 r/s
ossl       813 r/s   1478 r/s  1417 r/s
ngtcp2     903 r/s   1869 r/s  1956 r/s

Also, worthy of note: openssl quic uses a lot of memory when large transfers are done one after the other, but not so in parallel. This is unrelated to the PR, just a general problem with OpenSSL's QUIC:

Upload 50 times 100MB

Memory      Serial     Parallel
ossl+PR     291MB       24MB
ossl        286MB       23MB
ngtcp2       11MB       17MB

Conclusion: there might be scenarios where this PR is beneficial. In the tests measured, there is not a significant drop. So, I am neutral to adopting this PR or not as a first step into a better implementation.


for(idx_count = 0; idx_count < poll_count && result_count > 0;
idx_count++) {
if(ctx->poll_items[idx_count].revents && SSL_POLL_EVENT_W) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should probably be the bit test & instead of the logical and &&.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh, yes, it definitely should, thank you.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also, can you share the details of your setup, so I can try to reproduce here?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you use configure, add --with-test-caddy=<path-to-caddy>, build curl, then run:

>  (cd tests && python3 http/scorecard.py h3)

There might be some python packages you'd need, like pytest and friends.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the test seems somewhat busted (at least on my Fedora 41 installation). Building works fine, but running:

python3 http/scorecard.py h3
ERROR:testenv.httpd:failed to start httpd: ExecResult[code=1, exception=None, args=['/usr/sbin/apachectl', '-d', '/home/nhorman/git/curl/tests/http/gen/apache', '-f', '/home/nhorman/git/curl/tests/http/gen/apache/conf/test.conf', '-k', 'start'], stdout=['Passing arguments to httpd using apachectl is no longer supported.', 'You can only start/stop/restart httpd using this script.', 'To pass extra arguments to httpd, see the httpd.service(8)', 'man page.'], stderr=[]]
CRITICAL:testenv.httpd:stopping httpd failed: ExecResult[code=1, exception=None, args=['/usr/sbin/apachectl', '-d', '/home/nhorman/git/curl/tests/http/gen/apache', '-f', '/home/nhorman/git/curl/tests/http/gen/apache/conf/test.conf', '-k', 'stop'], stdout=['Passing arguments to httpd using apachectl is no longer supported.', 'You can only start/stop/restart httpd using this script.', 'To pass extra arguments to httpd, see the httpd.service(8)', 'man page.'], stderr=[]]
Traceback (most recent call last):
  File "/home/nhorman/git/curl/tests/http/scorecard.py", line 859, in <module>
    main()
    ~~~~^^
  File "/home/nhorman/git/curl/tests/http/scorecard.py", line 805, in main
    assert httpd.start()
           ~~~~~~~~~~~^^
AssertionError

looks like apache 2.4.62 doesn't support passing the options your python script wants to use. Suggestions?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, I have no suggestions of Fedora decides to change the apachectl scripts that the Apache httpd project ships. I have no Fedora installation. The scorecard.py works for me on debian, for example.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nhorman I made #16000 that no longer uses apachectldue to its borkedness on several distros. Instead it now uses httpd or apache2 directly. That should allow you to run scorecard on your system, I believe.

@bagder
Copy link
Member

bagder commented Jan 7, 2025

It is certainly unfortunate that this PR seems to rather make certain operations slower. I feared that, as we concluded the poll() API to scale badly already when we discussed the c10k problem, over twenty years ago.

I still believe this PR is a step forward in that it avoids busy-loop risks. Busy-loops that in some edge cases probably can be troublesome.

@bagder bagder closed this in 957eb24 Jan 8, 2025
@bagder
Copy link
Member

bagder commented Jan 8, 2025

Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
HTTP/3 h3 or quic related
Development

Successfully merging this pull request may close these issues.

5 participants