RequestServer: Spin even _faster_ #24513

alimpfard · 2024-06-01T18:18:50Z

Through the power of coroutines, we shall ~~rule the world~~ make RS (almost; t.hanks t.hakis!) as fast as firefox in handling requests! (🔜 stats)

In this PR:

Hacky garbage code!
Revert Revert!
Starring random LibCrypto optimisation!
Threading woes multipled by async woes!
Bad parenting practices! (adopt_coroutine to create orphans!)

NOT in this PR:

Bugs
Good syntax
CxByte's sanity.

(rough) perf test

Setup

49 small files, accessible through sub{0..9}.cxbyte.me/loadtest/N.bin.
50 fetch requests, all concurrent, to a preselected random subdomain (one failing to ensure clean failures).

Accessible on https://cxbyte.me/loadtest/test.html.

(rough) Results

WARNING: Not statistically valid, only individual tests have been done.

	Cold Cache (ms)	Hot Cache (ms)
RS (master)	1356	846
RS (this PR)	863	813
Firefox	625	423

TODO:

better statistics
commit cleanup
worse code

DanShaders

Only looked at "wip" commit. To be honest, I still think we should learn how to saturate one thread before doing multithreading—I hope we are not yet at the point where objective technical improvements have to have a user-facing impact to be merged.

Userland/Libraries/LibCore/EventLoop.h

DanShaders · 2024-06-01T18:59:30Z

Userland/Libraries/LibCore/EventLoop.h

@@ -116,4 +146,15 @@ auto run_async_in_new_event_loop(T&& function)
    return coro.await_resume();
 }

+template<typename T>
+requires(IsSpecializationOf<InvokeResult<T&>, Coroutine>)
+auto run_async_in_current_event_loop(T&& function)


Let's make this foot-gun to at least do a deferred invoke of the provided function. Or even better, steal "LibCore: Allow scheduling events for the current iteration of event loop" from my coro-decompress branch and schedule the callback for the current iteration.

hmm...its only user is Inside a deferred_invoke, sure, I'll make it do a deferred_invoke.

Nope, that can't be a deferred_invoke, the whole point is to wait for the damn thing.

To begin with, I don't see why you think this is any bigger a footgun than the other one...?

Userland/Libraries/LibTLS/Socket.cpp

Userland/Services/RequestServer/ConnectionCache.h

alimpfard · 2024-06-01T19:45:00Z

learn how to saturate one thread before doing multithreading

Further propagating coro is a significant paradigm shift for the code it modifies, imo; one I'm not sure if I want to spread into the codebase (yet?)

alimpfard · 2024-06-11T15:40:31Z

New: More fasterer, now down to ~660ms.

Userland/Libraries/LibCore/File.h

AK/Coroutine.h

DanShaders · 2024-06-14T02:39:35Z

Userland/Libraries/LibIPC/Message.cpp

@@ -75,8 +76,8 @@ ErrorOr<void> MessageBuffer::transfer_message(Core::LocalSocket& socket)
            if (auto error = maybe_nwritten.release_error(); error.is_errno()) {
                // FIXME: This is a hacky way to at least not crash on large messages
                // The limit of 100 writes is arbitrary, and there to prevent indefinite spinning on the EventLoop
-                if (error.code() == EAGAIN && writes_done < 100) {
-                    sched_yield();
+                if ((error.code() == EAGAIN || error.code() == EMSGSIZE) && writes_done < 100) {


unix(7) and/or pipe(7) doesn't document EMSGSIZE error. Do you have a reproducer for it?

Yeah, it happens on yakos

[EMSGSIZE] The socket requires that message be sent atomically, and the size of the message to be sent makes this impossible. IOV_MAX.

This is present on linux as well.

If the message is too long to pass atomically through the underlying protocol, the error EMSGSIZE is returned, and the message is not transmitted.

Oh, okay. I don't think retrying because of EMSGSIZE will be a good strategy but whatever 🤷

at least macos returns EMSGSIZE because of buffer limitations, in this case it was doing that for a message of size 16.

Maybe we should split the message up instead, but we might as well just drop the atomic message requirement (if we even have that) instead

These were showing up on profiles as quite hot (>5%), we can avoid all the unnecessary assertions by doing them once in advance and using pointers for the rest of the function. This makes AES-GCM about 6% faster.

This simple change makes AES-GCM 5% faster :^)

This formatter just prints the object out as a pointer.

Previously this just yielded the time slice, which blocked the event loop until the other side read some data from the socket. This "fixed" impl is still equally bad, but at least it doesn't block the event loop.

This allows RS to start connections in parallel without actively waiting for any possible handshakes. Doing so gives us a nearly-3x speedup on the average connection latency.

We no longer use blocking reads and writes, yielding a nice performance boost and a significantly cleaner (and more readable) implementation :^)

DanShaders reviewed Jun 1, 2024

View reviewed changes

alimpfard force-pushed the rs-coro branch 8 times, most recently from 4015c99 to dc88559 Compare June 5, 2024 08:51

alimpfard marked this pull request as ready for review June 5, 2024 10:11

alimpfard requested a review from timschumi as a code owner June 5, 2024 10:11

github-actions bot added the 👀 pr-needs-review PR needs review from a maintainer or community member label Jun 5, 2024

alimpfard force-pushed the rs-coro branch 4 times, most recently from f3c9e70 to a797df7 Compare June 11, 2024 15:39

alimpfard commented Jun 11, 2024

View reviewed changes

Userland/Libraries/LibCore/File.h Outdated Show resolved Hide resolved

alimpfard force-pushed the rs-coro branch 4 times, most recently from 10259b9 to 219f784 Compare June 13, 2024 15:49

DanShaders reviewed Jun 14, 2024

View reviewed changes

AK/Coroutine.h Outdated Show resolved Hide resolved

DanShaders reviewed Jun 14, 2024

View reviewed changes

alimpfard added 5 commits June 19, 2024 13:16

LibCrypto: Remove select individual Vector/Span::at() calls

dcc89d5

These were showing up on profiles as quite hot (>5%), we can avoid all the unnecessary assertions by doing them once in advance and using pointers for the rest of the function. This makes AES-GCM about 6% faster.

LibCrypto: Milk some more performance out of galois_multiply()

9dd626b

This simple change makes AES-GCM 5% faster :^)

AK: Add a formatter for OwnPtr<T>

9b96b3b

This formatter just prints the object out as a pointer.

LibCore: Add a couple coroutine-related helper functions

6ca344b

LibIPC: Actually yield to the event loop when EMSGSIZE or EAGAIN

c674ce9

Previously this just yielded the time slice, which blocked the event loop until the other side read some data from the socket. This "fixed" impl is still equally bad, but at least it doesn't block the event loop.

alimpfard added 6 commits June 19, 2024 13:16

RequestServer+LibCore+LibTLS: Use coroutines for socket connection

f7c2d88

This allows RS to start connections in parallel without actively waiting for any possible handshakes. Doing so gives us a nearly-3x speedup on the average connection latency.

Meta: Update CI emsdk to latest

37667ad

Contrib/VideoPlayerSDL: Disable on emscripten

43d392b

LibJS: Make emsdk's clang happy

3a72304

LibCore: Fix some thread-related memory/object leaks

2fa268a

LibHTTP+LibCore+RequestServer: Use async streams for HTTP

7e775a1

We no longer use blocking reads and writes, yielding a nice performance boost and a significantly cleaner (and more readable) implementation :^)

alimpfard force-pushed the rs-coro branch from 219f784 to 7e775a1 Compare June 19, 2024 11:18

alimpfard merged commit a711bb6 into SerenityOS:master Jun 19, 2024
11 checks passed

github-actions bot removed the 👀 pr-needs-review PR needs review from a maintainer or community member label Jun 19, 2024

ADKaster mentioned this pull request Jun 19, 2024

LibCore+LibIPC: Cherry-pick some LibCore and LibIPC fixes from serenity LadybirdBrowser/ladybird#213

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

RequestServer: Spin even _faster_ #24513

RequestServer: Spin even _faster_ #24513

alimpfard commented Jun 1, 2024 •

edited

Loading

DanShaders left a comment

DanShaders Jun 1, 2024

alimpfard Jun 1, 2024

alimpfard Jun 4, 2024

alimpfard commented Jun 1, 2024

alimpfard commented Jun 11, 2024

DanShaders Jun 14, 2024 •

edited

Loading

alimpfard Jun 14, 2024

alimpfard Jun 14, 2024

DanShaders Jun 14, 2024

alimpfard Jun 14, 2024

alimpfard Jun 14, 2024 •

edited

Loading

RequestServer: Spin even _faster_ #24513

RequestServer: Spin even _faster_ #24513

Conversation

alimpfard commented Jun 1, 2024 • edited Loading

(rough) perf test

Setup

(rough) Results

DanShaders left a comment

Choose a reason for hiding this comment

DanShaders Jun 1, 2024

Choose a reason for hiding this comment

alimpfard Jun 1, 2024

Choose a reason for hiding this comment

alimpfard Jun 4, 2024

Choose a reason for hiding this comment

alimpfard commented Jun 1, 2024

alimpfard commented Jun 11, 2024

DanShaders Jun 14, 2024 • edited Loading

Choose a reason for hiding this comment

alimpfard Jun 14, 2024

Choose a reason for hiding this comment

alimpfard Jun 14, 2024

Choose a reason for hiding this comment

DanShaders Jun 14, 2024

Choose a reason for hiding this comment

alimpfard Jun 14, 2024

Choose a reason for hiding this comment

alimpfard Jun 14, 2024 • edited Loading

Choose a reason for hiding this comment

alimpfard commented Jun 1, 2024 •

edited

Loading

DanShaders Jun 14, 2024 •

edited

Loading

alimpfard Jun 14, 2024 •

edited

Loading