-
-
Notifications
You must be signed in to change notification settings - Fork 290
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
perf: improve PHP thread management #898
Conversation
I will give them a go. I haven't looked at the PR here, but from my experiments, most of the time was spent initializing PHP. I was looking into making "workers" that implement the most basic behavior (totally reset state after the request since nothing needs to be saved)—much like how FPM works. I have a rough PoC that doesn't quite work yet, but I don't feel like it is too far from working. |
@withinboredom if we speak about the same thing, it's what this patch does! |
83914ed
to
c5676af
Compare
Just got the latest of this branch: workers: 22,660 rps |
This is basically what I am running:
where |
Do you have the numbers without this patch? |
from workers: 22,535 rps |
IIRC, the main issue is that in non-worker mode, it spawns a totally new thread, inits PHP, inits the request, executes the request, shutsdown PHP full, then kills the thread. What we want: new request inits the request in PHP, executes the request, then just shutdown the request and fully reset state. |
I may be missing something, but it's already what FrankenPHP is doing (even without this patch). |
It's what it was originally doing, for sure, but it looks like an accidental change. |
I'm not sure to follow @withinboredom 😅 |
Ah, nope, that wasn't actually the problem! I remember that being something I originally thought was the issue... I just checked out my Example output: # we should expect to see interleaving here, but it is strictly sequential
2024/07/05 06:59:25.150 WARN Executing request {"total": 32, "elapsed": 0.000000697}
2024/07/05 06:59:25.150 WARN finished execution {"total": 31, "elapsed": 0.015291472}
2024/07/05 06:59:25.150 WARN Executing request {"total": 32, "elapsed": 0.000000786}
2024/07/05 06:59:25.151 WARN finished execution {"total": 31, "elapsed": 0.015279653}
2024/07/05 06:59:25.151 WARN Executing request {"total": 32, "elapsed": 0.000000712}
2024/07/05 06:59:25.151 WARN finished execution {"total": 31, "elapsed": 0.015311099} Essentially, in worker mode, request handling is multi-threaded. Each worker processes a request independently, so if you have 16 workers, there are 16 threads processing requests. With cgi-mode, there is only a single-threaded consumer of the request channel, which creates a bottleneck, no matter what you set In other words, IIRC, there is only a single manager thread calling
|
But this patch should change exactly that! It's not the case? I struggle to show CPU usage per thread on Mac (I'm not even sure that it's possible). I'll use a Linux box to see 😅 |
Added to my original comment above, which might help explain more about what I meant. |
@withinboredom In this patch, I moved the call to |
c5676af
to
a2908e9
Compare
Looks like it may do the trick... And it appears to. I threw it up on a wordpress site: nginx + fpm (badly tuned fwiw): 860 rps So, it seems to be g2g |
Thanks for the report! Looks great. Remaining issue: find why the tests are failing (they aren't locally)! |
Test failures look to affect only workers. We likely have a race condition somewhere. |
I can't get it to fail locally on AMD or INTEL x64. If I had to hazard a guess, there is likely a race condition with sending the worker script file to the C side (i.e., it never starts the workers). An easy way to test that would be to add some output to the worker script so you can see that it started up. |
1b1e096
to
284ee23
Compare
I finally found the problem: It's currently not a problem (even if a bad practice) because we used |
Alternative to #837.
The main idea is to start a fixed number of threads and handle requests in these threads using an infinite loop.
This fixes the backpressure problem identified in #836, simplifies the code, and removes the need for the
C-Thread-Pool
library.Fixes #836. Closes #837.
@withinboredom could you run your benchmarks on this PR? I'm curious about the results.