RequestServer: Spin even _faster_ #24513
Conversation
Only looked at the "wip" commit. To be honest, I still think we should learn how to saturate one thread before doing multithreading; I hope we are not yet at the point where objective technical improvements have to have a user-facing impact to be merged.
```
@@ -116,4 +146,15 @@ auto run_async_in_new_event_loop(T&& function)
    return coro.await_resume();
}

template<typename T>
requires(IsSpecializationOf<InvokeResult<T&>, Coroutine>)
auto run_async_in_current_event_loop(T&& function)
```
Let's make this foot-gun at least do a deferred invoke of the provided function. Or even better, steal "LibCore: Allow scheduling events for the current iteration of event loop" from my coro-decompress branch and schedule the callback for the current iteration.
hmm... its only user is inside a deferred_invoke; sure, I'll make it do a deferred_invoke.
Nope, that can't be a deferred_invoke, the whole point is to wait for the damn thing.
To begin with, I don't see why you think this is any bigger a footgun than the other one...?
Further propagating coro is a significant paradigm shift for the code it modifies, imo; one I'm not sure I want to spread into the codebase (yet?)
Force-pushed 4015c99 to dc88559
Force-pushed f3c9e70 to a797df7
New: More fasterer, now down to ~660ms.
Force-pushed 10259b9 to 219f784
```
@@ -75,8 +76,8 @@ ErrorOr<void> MessageBuffer::transfer_message(Core::LocalSocket& socket)
    if (auto error = maybe_nwritten.release_error(); error.is_errno()) {
        // FIXME: This is a hacky way to at least not crash on large messages
        // The limit of 100 writes is arbitrary, and there to prevent indefinite spinning on the EventLoop
        if (error.code() == EAGAIN && writes_done < 100) {
            sched_yield();
        if ((error.code() == EAGAIN || error.code() == EMSGSIZE) && writes_done < 100) {
```
unix(7) and/or pipe(7) don't document an EMSGSIZE error. Do you have a reproducer for it?
Yeah, it happens on macOS:

> [EMSGSIZE] The socket requires that message be sent atomically, and the size of the message to be sent makes this impossible. IOV_MAX.
This is present on Linux as well:

> If the message is too long to pass atomically through the underlying protocol, the error EMSGSIZE is returned, and the message is not transmitted.
Oh, okay. I don't think retrying because of EMSGSIZE will be a good strategy but whatever 🤷
At least macOS returns EMSGSIZE because of buffer limitations; in this case it was doing that for a message of size 16.
Maybe we should split the message up, but we might as well just drop the atomic message requirement (if we even have one) instead.
These were showing up on profiles as quite hot (>5%); we can avoid all the unnecessary assertions by doing them once in advance and using pointers for the rest of the function. This makes AES-GCM about 6% faster.
This simple change makes AES-GCM 5% faster :^)
This formatter just prints the object out as a pointer.
Previously this just yielded the time slice, which blocked the event loop until the other side read some data from the socket. This "fixed" impl is still equally bad, but at least it doesn't block the event loop.
This allows RS to start connections in parallel without actively waiting for any possible handshakes. Doing so gives us a nearly-3x speedup on the average connection latency.
We no longer use blocking reads and writes, yielding a nice performance boost and a significantly cleaner (and more readable) implementation :^)
Through the power of coroutines, we shall ~~rule the world~~ make RS (almost; thanks thakis!) as fast as Firefox in handling requests! (🔜 stats)

In this PR:

- `adopt_coroutine` (… to create orphans!)

NOT in this PR:
(rough) perf test

Setup

49 small files, accessible through `sub{0..9}.cxbyte.me/loadtest/N.bin`. 50 `fetch` requests, all concurrent, to a preselected random subdomain (one failing to ensure clean failures). Accessible on https://cxbyte.me/loadtest/test.html.

(rough) Results
WARNING: Not statistically valid; only individual tests have been done.
TODO: