Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RequestServer: Spin even _faster_ #24513

Merged
merged 11 commits into from
Jun 19, 2024
Merged

Conversation

alimpfard
Copy link
Member

@alimpfard alimpfard commented Jun 1, 2024

Through the power of coroutines, we shall rule the world make RS (almost; t.hanks t.hakis!) as fast as firefox in handling requests! (🔜 stats)

In this PR:

  • Hacky garbage code!
  • Revert Revert!
  • Starring random LibCrypto optimisation!
  • Threading woes multipled by async woes!
  • Bad parenting practices! (adopt_coroutine to create orphans!)

NOT in this PR:

  • Bugs
  • Good syntax
  • CxByte's sanity.

(rough) perf test

Setup

49 small files, accessible through sub{0..9}.cxbyte.me/loadtest/N.bin.
50 fetch requests, all concurrent, to a preselected random subdomain (one failing to ensure clean failures).

Accessible on https://cxbyte.me/loadtest/test.html.

(rough) Results

WARNING: Not statistically valid, only individual tests have been done.

Cold Cache (ms) Hot Cache (ms)
RS (master) 1356 846
RS (this PR) 863 813
Firefox 625 423

TODO:

  • better statistics
  • commit cleanup
  • worse code

Copy link
Contributor

@DanShaders DanShaders left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only looked at "wip" commit. To be honest, I still think we should learn how to saturate one thread before doing multithreading—I hope we are not yet at the point where objective technical improvements have to have a user-facing impact to be merged.

Userland/Libraries/LibCore/EventLoop.h Outdated Show resolved Hide resolved
@@ -116,4 +146,15 @@ auto run_async_in_new_event_loop(T&& function)
return coro.await_resume();
}

template<typename T>
requires(IsSpecializationOf<InvokeResult<T&>, Coroutine>)
auto run_async_in_current_event_loop(T&& function)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's make this foot-gun to at least do a deferred invoke of the provided function. Or even better, steal "LibCore: Allow scheduling events for the current iteration of event loop" from my coro-decompress branch and schedule the callback for the current iteration.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm...its only user is Inside a deferred_invoke, sure, I'll make it do a deferred_invoke.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nope, that can't be a deferred_invoke, the whole point is to wait for the damn thing.

To begin with, I don't see why you think this is any bigger a footgun than the other one...?

Userland/Libraries/LibTLS/Socket.cpp Outdated Show resolved Hide resolved
Userland/Services/RequestServer/ConnectionCache.h Outdated Show resolved Hide resolved
@alimpfard
Copy link
Member Author

learn how to saturate one thread before doing multithreading

Further propagating coro is a significant paradigm shift for the code it modifies, imo; one I'm not sure if I want to spread into the codebase (yet?)

@alimpfard alimpfard force-pushed the rs-coro branch 8 times, most recently from 4015c99 to dc88559 Compare June 5, 2024 08:51
@alimpfard alimpfard marked this pull request as ready for review June 5, 2024 10:11
@alimpfard alimpfard requested a review from timschumi as a code owner June 5, 2024 10:11
@github-actions github-actions bot added the 👀 pr-needs-review PR needs review from a maintainer or community member label Jun 5, 2024
@alimpfard alimpfard force-pushed the rs-coro branch 4 times, most recently from f3c9e70 to a797df7 Compare June 11, 2024 15:39
@alimpfard
Copy link
Member Author

New: More fasterer, now down to ~660ms.

@alimpfard alimpfard force-pushed the rs-coro branch 4 times, most recently from 10259b9 to 219f784 Compare June 13, 2024 15:49
AK/Coroutine.h Outdated Show resolved Hide resolved
@@ -75,8 +76,8 @@ ErrorOr<void> MessageBuffer::transfer_message(Core::LocalSocket& socket)
if (auto error = maybe_nwritten.release_error(); error.is_errno()) {
// FIXME: This is a hacky way to at least not crash on large messages
// The limit of 100 writes is arbitrary, and there to prevent indefinite spinning on the EventLoop
if (error.code() == EAGAIN && writes_done < 100) {
sched_yield();
if ((error.code() == EAGAIN || error.code() == EMSGSIZE) && writes_done < 100) {
Copy link
Contributor

@DanShaders DanShaders Jun 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unix(7) and/or pipe(7) doesn't document EMSGSIZE error. Do you have a reproducer for it?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, it happens on yakos

[EMSGSIZE]         The socket requires that message be sent atomically, and the size of the message to be sent makes this impossible.  IOV_MAX.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is present on linux as well.

If the message is too long to pass atomically through the underlying protocol, the error EMSGSIZE is returned, and the message is not transmitted.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, okay. I don't think retrying because of EMSGSIZE will be a good strategy but whatever 🤷

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

at least macos returns EMSGSIZE because of buffer limitations, in this case it was doing that for a message of size 16.

Copy link
Member Author

@alimpfard alimpfard Jun 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we should split the message up instead, but we might as well just drop the atomic message requirement (if we even have that) instead

These were showing up on profiles as quite hot (>5%), we can avoid all
the unnecessary assertions by doing them once in advance and using
pointers for the rest of the function.
This makes AES-GCM about 6% faster.
This simple change makes AES-GCM 5% faster :^)
This formatter just prints the object out as a pointer.
Previously this just yielded the time slice, which blocked the event
loop until the other side read some data from the socket.
This "fixed" impl is still equally bad, but at least it doesn't block
the event loop.
This allows RS to start connections in parallel without actively waiting
for any possible handshakes.
Doing so gives us a nearly-3x speedup on the average connection latency.
We no longer use blocking reads and writes, yielding a nice performance
boost and a significantly cleaner (and more readable) implementation :^)
@alimpfard alimpfard merged commit a711bb6 into SerenityOS:master Jun 19, 2024
11 checks passed
@github-actions github-actions bot removed the 👀 pr-needs-review PR needs review from a maintainer or community member label Jun 19, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants