perf: improve PHP thread management #898

dunglas · 2024-07-03T13:11:11Z

Alternative to #837.

The main idea is to start a fixed number of threads and handle requests in these threads using an infinite loop.
This fixes the backpressure problem identified in #836, simplifies the code, and removes the need for the C-Thread-Pool library.

Fixes #836. Closes #837.

@withinboredom could you run your benchmarks on this PR? I'm curious about the results.

withinboredom · 2024-07-03T16:21:51Z

I will give them a go.

I haven't looked at the PR here, but from my experiments, most of the time was spent initializing PHP. I was looking into making "workers" that implement the most basic behavior (totally reset state after the request since nothing needs to be saved)—much like how FPM works. I have a rough PoC that doesn't quite work yet, but I don't feel like it is too far from working.

dunglas · 2024-07-04T07:53:21Z

@withinboredom if we speak about the same thing, it's what this patch does!

withinboredom · 2024-07-04T08:15:02Z

Just got the latest of this branch:

workers: 22,660 rps
non-workers: 2,118 rps

withinboredom · 2024-07-04T08:17:47Z

This is basically what I am running:

k6 run script.js --no-connection-reuse -d 15s

where script.js is essentially the load test in the repo and I configure frankenphp with/without workers and run it.

dunglas · 2024-07-04T08:20:25Z

Do you have the numbers without this patch?

withinboredom · 2024-07-04T08:36:33Z

from main:

workers: 22,535 rps
non-workers: 2,175 rps

withinboredom · 2024-07-04T08:39:40Z

IIRC, the main issue is that in non-worker mode, it spawns a totally new thread, inits PHP, inits the request, executes the request, shutsdown PHP full, then kills the thread. What we want: new request inits the request in PHP, executes the request, then just shutdown the request and fully reset state.

dunglas · 2024-07-04T08:51:35Z

I may be missing something, but it's already what FrankenPHP is doing (even without this patch).
This patch just moves the call to Go to fetch a pending request from the main manager thread to each thread handling PHP requests.

withinboredom · 2024-07-04T09:21:54Z

It's what it was originally doing, for sure, but it looks like an accidental change.

dunglas · 2024-07-04T13:16:20Z

I'm not sure to follow @withinboredom 😅

withinboredom · 2024-07-05T07:05:34Z

Ah, nope, that wasn't actually the problem! I remember that being something I originally thought was the issue... I just checked out my fix/slow-cgi-mode branch and ran the load test. OK, so here's the actual problem that this branch shows.

Example output:

# we should expect to see interleaving here, but it is strictly sequential
2024/07/05 06:59:25.150 WARN    Executing request       {"total": 32, "elapsed": 0.000000697}
2024/07/05 06:59:25.150 WARN    finished execution      {"total": 31, "elapsed": 0.015291472}
2024/07/05 06:59:25.150 WARN    Executing request       {"total": 32, "elapsed": 0.000000786}
2024/07/05 06:59:25.151 WARN    finished execution      {"total": 31, "elapsed": 0.015279653}
2024/07/05 06:59:25.151 WARN    Executing request       {"total": 32, "elapsed": 0.000000712}
2024/07/05 06:59:25.151 WARN    finished execution      {"total": 31, "elapsed": 0.015311099}

Essentially, in worker mode, request handling is multi-threaded. Each worker processes a request independently, so if you have 16 workers, there are 16 threads processing requests. With cgi-mode, there is only a single-threaded consumer of the request channel, which creates a bottleneck, no matter what you set num_threads to.

In other words, IIRC, there is only a single manager thread calling go_fetch_request, then distributing that work to multiple threads. The way to fix the perf issue is to merge go_fetch_request with go_fetch_and_execute, add a method for "waiting until done" such that the C code looks more like this:

for (num treads)
  thpool_add_work(thpool, go_fetch_and_execute, NULL);

go_wait_for_done(); // only returns once we are shutting down

dunglas · 2024-07-05T07:17:42Z

But this patch should change exactly that! It's not the case? I struggle to show CPU usage per thread on Mac (I'm not even sure that it's possible). I'll use a Linux box to see 😅

withinboredom · 2024-07-05T07:21:20Z

Added to my original comment above, which might help explain more about what I meant.

dunglas · 2024-07-05T09:11:08Z

@withinboredom In this patch, I moved the call to go_fetch_request from the single manager thread to each consumer thread. This should fix this problem. Could you tell me if this works on your box?

withinboredom · 2024-07-05T16:36:48Z

Looks like it may do the trick...

And it appears to. I threw it up on a wordpress site:

nginx + fpm (badly tuned fwiw): 860 rps
this branch: 2551 rps
main: 1831 rps

So, it seems to be g2g

dunglas · 2024-07-05T16:40:52Z

Thanks for the report! Looks great. Remaining issue: find why the tests are failing (they aren't locally)!

dunglas · 2024-07-06T20:08:06Z

Test failures look to affect only workers. We likely have a race condition somewhere.

withinboredom · 2024-07-06T22:15:13Z

I can't get it to fail locally on AMD or INTEL x64. If I had to hazard a guess, there is likely a race condition with sending the worker script file to the C side (i.e., it never starts the workers). An easy way to test that would be to add some output to the worker script so you can see that it started up.

testdata/_executor.php

worker.go

dunglas · 2024-07-08T22:25:01Z

I finally found the problem: frankenphp_server_context.worker_ready wasn't explicitly initialized.

It's currently not a problem (even if a bad practice) because we used calloc(). However, in this patch, I switched from calloc() to malloc() (micro-optiom that is maybe useless btw). And obviously, as worker_ready wasn't explicitly initialized, sometimes the default value was not 0.

dunglas requested a review from withinboredom July 3, 2024 13:11

dunglas changed the title ~~refactor: prevent a useless allocation~~ perf: improve PHP thread management Jul 3, 2024

dunglas force-pushed the refactor/php-threads branch from 83914ed to c5676af Compare July 4, 2024 08:03

dunglas force-pushed the refactor/php-threads branch from c5676af to a2908e9 Compare July 5, 2024 09:44

withinboredom approved these changes Jul 5, 2024

View reviewed changes

withinboredom reviewed Jul 7, 2024

View reviewed changes

testdata/_executor.php Outdated Show resolved Hide resolved

dunglas added 6 commits July 8, 2024 16:21

perf: improve PHP thread management

9ab0a55

cleanup

df95da5

free

e329e5f

malloc instead of calloc

d88f33e

cs

fc1c7b1

feat: set thread name

9343850

dunglas added 14 commits July 8, 2024 16:21

thread name

698a542

cs

af1f203

buffer

e325b25

more buffer

e8d8868

debug, wip

584c131

debug

7235fc7

push

7dff659

debug

0136657

debug

58febaf

test

eafa2f6

test

c47bfbe

test

4b15e1e

test

4fe529e

debug

284ee23

dunglas force-pushed the refactor/php-threads branch from 1b1e096 to 284ee23 Compare July 8, 2024 14:21

dunglas added 2 commits July 8, 2024 17:20

debug

062e0f8

debug

d477a68

dunglas commented Jul 8, 2024

View reviewed changes

worker.go Outdated Show resolved Hide resolved

dunglas added 3 commits July 8, 2024 19:32

debug

76ce476

debug again

71f3e0e

fix and clean

249a92a

dunglas merged commit 0500ebc into main Jul 9, 2024
43 checks passed

dunglas deleted the refactor/php-threads branch July 9, 2024 07:39

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

perf: improve PHP thread management #898

perf: improve PHP thread management #898

dunglas commented Jul 3, 2024

withinboredom commented Jul 3, 2024

dunglas commented Jul 4, 2024

withinboredom commented Jul 4, 2024

withinboredom commented Jul 4, 2024

dunglas commented Jul 4, 2024

withinboredom commented Jul 4, 2024

withinboredom commented Jul 4, 2024

dunglas commented Jul 4, 2024 •

edited

Loading

withinboredom commented Jul 4, 2024

dunglas commented Jul 4, 2024

withinboredom commented Jul 5, 2024 •

edited

Loading

dunglas commented Jul 5, 2024

withinboredom commented Jul 5, 2024

dunglas commented Jul 5, 2024

withinboredom commented Jul 5, 2024 •

edited

Loading

dunglas commented Jul 5, 2024

dunglas commented Jul 6, 2024

withinboredom commented Jul 6, 2024

dunglas commented Jul 8, 2024

perf: improve PHP thread management #898

perf: improve PHP thread management #898

Conversation

dunglas commented Jul 3, 2024

withinboredom commented Jul 3, 2024

dunglas commented Jul 4, 2024

withinboredom commented Jul 4, 2024

withinboredom commented Jul 4, 2024

dunglas commented Jul 4, 2024

withinboredom commented Jul 4, 2024

withinboredom commented Jul 4, 2024

dunglas commented Jul 4, 2024 • edited Loading

withinboredom commented Jul 4, 2024

dunglas commented Jul 4, 2024

withinboredom commented Jul 5, 2024 • edited Loading

dunglas commented Jul 5, 2024

withinboredom commented Jul 5, 2024

dunglas commented Jul 5, 2024

withinboredom commented Jul 5, 2024 • edited Loading

dunglas commented Jul 5, 2024

dunglas commented Jul 6, 2024

withinboredom commented Jul 6, 2024

dunglas commented Jul 8, 2024

dunglas commented Jul 4, 2024 •

edited

Loading

withinboredom commented Jul 5, 2024 •

edited

Loading

withinboredom commented Jul 5, 2024 •

edited

Loading