Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

perf: improve PHP thread management #898

Merged
merged 25 commits into from
Jul 9, 2024
Merged

perf: improve PHP thread management #898

merged 25 commits into from
Jul 9, 2024

Conversation

dunglas
Copy link
Owner

@dunglas dunglas commented Jul 3, 2024

Alternative to #837.

The main idea is to start a fixed number of threads and handle requests in these threads using an infinite loop.
This fixes the backpressure problem identified in #836, simplifies the code, and removes the need for the C-Thread-Pool library.

Fixes #836. Closes #837.

@withinboredom could you run your benchmarks on this PR? I'm curious about the results.

@dunglas dunglas requested a review from withinboredom July 3, 2024 13:11
@dunglas dunglas changed the title refactor: prevent a useless allocation perf: improve PHP thread management Jul 3, 2024
@withinboredom
Copy link
Collaborator

I will give them a go.

I haven't looked at the PR here, but from my experiments, most of the time was spent initializing PHP. I was looking into making "workers" that implement the most basic behavior (totally reset state after the request since nothing needs to be saved)—much like how FPM works. I have a rough PoC that doesn't quite work yet, but I don't feel like it is too far from working.

@dunglas
Copy link
Owner Author

dunglas commented Jul 4, 2024

@withinboredom if we speak about the same thing, it's what this patch does!

@dunglas dunglas force-pushed the refactor/php-threads branch from 83914ed to c5676af Compare July 4, 2024 08:03
@withinboredom
Copy link
Collaborator

Just got the latest of this branch:

workers: 22,660 rps
non-workers: 2,118 rps

@withinboredom
Copy link
Collaborator

This is basically what I am running:

k6 run script.js --no-connection-reuse -d 15s 

where script.js is essentially the load test in the repo and I configure frankenphp with/without workers and run it.

@dunglas
Copy link
Owner Author

dunglas commented Jul 4, 2024

Do you have the numbers without this patch?

@withinboredom
Copy link
Collaborator

from main:

workers: 22,535 rps
non-workers: 2,175 rps

@withinboredom
Copy link
Collaborator

IIRC, the main issue is that in non-worker mode, it spawns a totally new thread, inits PHP, inits the request, executes the request, shutsdown PHP full, then kills the thread. What we want: new request inits the request in PHP, executes the request, then just shutdown the request and fully reset state.

@dunglas
Copy link
Owner Author

dunglas commented Jul 4, 2024

I may be missing something, but it's already what FrankenPHP is doing (even without this patch).
This patch just moves the call to Go to fetch a pending request from the main manager thread to each thread handling PHP requests.

@withinboredom
Copy link
Collaborator

It's what it was originally doing, for sure, but it looks like an accidental change.

@dunglas
Copy link
Owner Author

dunglas commented Jul 4, 2024

I'm not sure to follow @withinboredom 😅

@withinboredom
Copy link
Collaborator

withinboredom commented Jul 5, 2024

Ah, nope, that wasn't actually the problem! I remember that being something I originally thought was the issue... I just checked out my fix/slow-cgi-mode branch and ran the load test. OK, so here's the actual problem that this branch shows.

Example output:

# we should expect to see interleaving here, but it is strictly sequential
2024/07/05 06:59:25.150 WARN    Executing request       {"total": 32, "elapsed": 0.000000697}
2024/07/05 06:59:25.150 WARN    finished execution      {"total": 31, "elapsed": 0.015291472}
2024/07/05 06:59:25.150 WARN    Executing request       {"total": 32, "elapsed": 0.000000786}
2024/07/05 06:59:25.151 WARN    finished execution      {"total": 31, "elapsed": 0.015279653}
2024/07/05 06:59:25.151 WARN    Executing request       {"total": 32, "elapsed": 0.000000712}
2024/07/05 06:59:25.151 WARN    finished execution      {"total": 31, "elapsed": 0.015311099}

Essentially, in worker mode, request handling is multi-threaded. Each worker processes a request independently, so if you have 16 workers, there are 16 threads processing requests. With cgi-mode, there is only a single-threaded consumer of the request channel, which creates a bottleneck, no matter what you set num_threads to.

In other words, IIRC, there is only a single manager thread calling go_fetch_request, then distributing that work to multiple threads. The way to fix the perf issue is to merge go_fetch_request with go_fetch_and_execute, add a method for "waiting until done" such that the C code looks more like this:

for (num treads)
  thpool_add_work(thpool, go_fetch_and_execute, NULL);

go_wait_for_done(); // only returns once we are shutting down

@dunglas
Copy link
Owner Author

dunglas commented Jul 5, 2024

But this patch should change exactly that! It's not the case? I struggle to show CPU usage per thread on Mac (I'm not even sure that it's possible). I'll use a Linux box to see 😅

@withinboredom
Copy link
Collaborator

Added to my original comment above, which might help explain more about what I meant.

@dunglas
Copy link
Owner Author

dunglas commented Jul 5, 2024

@withinboredom In this patch, I moved the call to go_fetch_request from the single manager thread to each consumer thread. This should fix this problem. Could you tell me if this works on your box?

@dunglas dunglas force-pushed the refactor/php-threads branch from c5676af to a2908e9 Compare July 5, 2024 09:44
@withinboredom
Copy link
Collaborator

withinboredom commented Jul 5, 2024

Looks like it may do the trick...

And it appears to. I threw it up on a wordpress site:

nginx + fpm (badly tuned fwiw): 860 rps
this branch: 2551 rps
main: 1831 rps

So, it seems to be g2g

@dunglas
Copy link
Owner Author

dunglas commented Jul 5, 2024

Thanks for the report! Looks great. Remaining issue: find why the tests are failing (they aren't locally)!

@dunglas
Copy link
Owner Author

dunglas commented Jul 6, 2024

Test failures look to affect only workers. We likely have a race condition somewhere.

@withinboredom
Copy link
Collaborator

I can't get it to fail locally on AMD or INTEL x64. If I had to hazard a guess, there is likely a race condition with sending the worker script file to the C side (i.e., it never starts the workers). An easy way to test that would be to add some output to the worker script so you can see that it started up.

@dunglas dunglas force-pushed the refactor/php-threads branch from 1b1e096 to 284ee23 Compare July 8, 2024 14:21
@dunglas
Copy link
Owner Author

dunglas commented Jul 8, 2024

I finally found the problem: frankenphp_server_context.worker_ready wasn't explicitly initialized.

It's currently not a problem (even if a bad practice) because we used calloc(). However, in this patch, I switched from calloc() to malloc() (micro-optiom that is maybe useless btw). And obviously, as worker_ready wasn't explicitly initialized, sometimes the default value was not 0.

@dunglas dunglas merged commit 0500ebc into main Jul 9, 2024
43 checks passed
@dunglas dunglas deleted the refactor/php-threads branch July 9, 2024 07:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Low performance in cgi-mode
2 participants