
Restart worker after configurable number of requests #280

Closed
mathieudz opened this issue Oct 30, 2023 · 25 comments · Fixed by #281

@mathieudz
Contributor

As PHP was not designed for long-running processes, memory leaks are not uncommon, since they usually aren't an issue. When long-running processes are introduced with tools like FrankenPHP, those leaks become problematic.
In my case, my application leaks around 1 GB per hour. As mitigation, I have scheduled a zero-downtime restart of the application every 8 hours. However, it would be nice if FrankenPHP provided such functionality out of the box. PHP-PM (from which I migrated) provides such a setting.
[screenshot of the corresponding php-pm setting]

@dunglas
Owner

dunglas commented Oct 30, 2023

This can usually be handled directly in the worker script. Here is an example with Laravel Octane: https://github.com/laravel/octane/pull/764/files#diff-5891b628d054a2240185344524af788b4e613e0d24b82ee3b5a52d174ac0daf7R30

As this can easily be done in userland, I would prefer to add this to the documentation instead of "bloating" the C or Go code. WDYT?
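
For illustration, here is a minimal sketch of such a worker script that exits after a configurable number of requests, after which FrankenPHP starts a fresh worker. The MAX_REQUESTS variable name is only an example, not an official setting:

```php
<?php
// Minimal worker-script sketch: stop after MAX_REQUESTS requests so that
// FrankenPHP spawns a fresh worker process (the variable name is illustrative).

ignore_user_abort(true);

$maxRequests = (int) ($_SERVER['MAX_REQUESTS'] ?? 0);

$handler = static function () {
    // Handle one HTTP request here, e.g. by dispatching to your framework's kernel.
    echo 'Hello from the worker';
};

for ($nbRequests = 0; !$maxRequests || $nbRequests < $maxRequests; ++$nbRequests) {
    $keepRunning = \frankenphp_handle_request($handler);

    // Run PHP's cycle collector between requests to limit leak growth.
    gc_collect_cycles();

    if (!$keepRunning) {
        break;
    }
}
```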

@mathieudz
Contributor Author

There's an argument to be made for either solution. On the one hand, you want to keep FrankenPHP minimal and most PHP developers don't know Go; on the other hand, most applications will probably need something to mitigate memory leaks.
Since you prefer deferring it to the worker scripts, I'll have a look at implementing a similar solution in https://github.com/php-runtime/runtime/blob/main/src/frankenphp-symfony/src/Runner.php .

@mathieudz
Contributor Author

I'm testing this change: php-runtime/frankenphp-symfony@main...mathieudz:frankenphp-symfony:max-requests
…but memory still seems to be growing.
Any idea how I can check whether the environment variable is read correctly and the workers are regularly restarted?

@dunglas
Owner

dunglas commented Oct 30, 2023

If the debug mode is enabled (set the global debug Caddy directive), you should see this in the logs every time the service restarts: https://github.com/dunglas/frankenphp/blob/main/worker.go#L101

@dunglas
Owner

dunglas commented Oct 30, 2023

@mathieudz I just tested your code locally and it appears to work (I added the env var in the .env file of my Symfony demo project). Keep in mind that by default FrankenPHP starts many workers, and the count is per worker.

@dunglas
Owner

dunglas commented Oct 30, 2023

By the way, you should probably use a runtime option instead of a raw env var.
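
For instance (a sketch of the idea, not the actual frankenphp-symfony API), the option can still be fed from the environment in the front controller, so it stays configurable without rebuilding the image; the max_requests option name and the FRANKENPHP_MAX_REQUESTS variable are hypothetical:

```php
<?php
// public/index.php — sketch: pass a runtime option (hypothetical "max_requests"
// name) instead of reading a raw env var inside the runner.

use App\Kernel;

$_SERVER['APP_RUNTIME_OPTIONS'] = [
    // Still driven by the environment, so no image rebuild is needed to tweak it.
    'max_requests' => (int) ($_SERVER['FRANKENPHP_MAX_REQUESTS'] ?? 0),
];

require_once dirname(__DIR__).'/vendor/autoload_runtime.php';

return function (array $context) {
    return new Kernel($context['APP_ENV'], (bool) $context['APP_DEBUG']);
};
```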

@mathieudz
Contributor Author

mathieudz commented Oct 30, 2023

  1. OK, my problem was that I provided an environment variable to my Docker container instead of adding a var to .env; it works fine now.
  2. These "runtime options" are located in index.php and are thus hardcoded? I'd like to keep this configurable, to avoid having to create new images when tweaking settings.
  3. I didn't have to enable debug mode, as the message is logged at the error level.
  4. It would be nice to be able to differentiate between intended worker termination (like this) and unintended termination. The current message, "unexpected termination", sounds like something went wrong.
  5. Should I create a merge request for the main runtime git repository?

@mathieudz
Contributor Author

The workers get restarted regularly (e.g. every 100 requests, running with 16 workers), but the memory usage is still increasing over time ☹️

@EvanEdwards

Here are two cents from a different perspective, setting technical considerations aside for a moment:

I realize that memory leaks are not the only reason to restart, but a maximum memory setting for the worker would be PHP-ish. It would be in line with the various similar settings for maximum POST size, memory usage, timeout, etc.

A timeout and reset-on-max-memory-use setting "feels PHP" to me. Ideally resetting after the current workers finish, which probably points to having a threshold to terminate and a second, higher threshold to hard kill.

@mathieudz
Contributor Author

@EvanEdwards I guess that we could check the PHP memory usage with memory_get_usage after the explicit garbage collection against a configurable threshold at https://github.com/php-runtime/runtime/blob/1c6ac3dad61b35e126541190232b497ed934c170/src/frankenphp-symfony/src/Runner.php#L45
Makes me think: what value should you expect here if there are no memory leaks? Almost zero?
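
Sketched out, such a check inside the worker loop could look roughly like this (the MAX_MEMORY_BYTES setting is hypothetical; this is not the php-runtime implementation):

```php
<?php
// Sketch of a memory-based restart in a FrankenPHP worker loop;
// the MAX_MEMORY_BYTES setting is hypothetical.

$maxMemory = (int) ($_SERVER['MAX_MEMORY_BYTES'] ?? 0);

do {
    $keepRunning = \frankenphp_handle_request(static function () {
        // Handle the request with your framework here.
    });

    gc_collect_cycles();

    // After garbage collection, memory_get_usage(false) reports what PHP still
    // holds; without leaks it should stay roughly flat (not zero, since the
    // booted framework itself keeps memory) from one request to the next.
    if ($maxMemory > 0 && memory_get_usage(false) > $maxMemory) {
        break; // exit so that FrankenPHP starts a fresh worker
    }
} while ($keepRunning);
```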

@dunglas
Owner

dunglas commented Oct 30, 2023

@mathieudz I'm not able to reproduce locally.

Could you try the following things:

  1. taking several profiles at different points in time would help a lot. To do so you can follow these steps: https://mercure.rocks/docs/hub/debug (the environment variable is CADDY_GLOBAL_OPTIONS instead of GLOBAL_OPTIONS, and if you're using Docker you'll also have to set CADDY_ADMIN to :2019)
  2. testing if the problem also occurs in non-worker mode

Also, do you get OOM errors? If that's not the case, it may be normal behavior: Go, like PHP, is a garbage-collected language, but unlike PHP it only gives memory back to the system when the system needs it (Chrome has a similar behavior, for instance). A profile will help identify whether it's Go or PHP that holds the memory, and whether that is intended or not.

If you need help profiling, I'll be glad to assist; don't hesitate to contact me privately (mail, Symfony Slack, Twitter DM...).

@dunglas dunglas reopened this Oct 30, 2023
@mathieudz
Contributor Author

@dunglas Thanks, I'll look into it when I have some time.
In the meantime, my FrankenPHP process, which runs 16 workers with at most 500 MB of memory each, uses almost 20 GB after running for 19 hours. If this is really the normal behavior for Go, then it is really bad, as it will end up hogging all the memory, leaving nothing for the disk cache, which is important for the PostgreSQL instance on the same server.
Also, I don't want to experiment with OOM, as this is running in production. I haven't reproduced this in other environments yet.
This is total memory usage over 19 hours, 18 hours since the last restart of FrankenPHP:
[graph: total memory usage over the last 19 hours]

@withinboredom
Collaborator

I'm running two instances. I restarted one to see what the memory difference was. Fresh-ish: 23 MB (already up to 178 MB serving very low traffic); running 4 days: 350 MB. I'm not sure how to debug an already running instance of frankenphp. From what I can tell by dumping memory from /proc/PID/mem, inspecting it, and looking at actual memory usage... I think we might have a memory leak caused by not cleaning something up properly somewhere.

Ran a simple script to look for non-zero bytes in /proc/PID/mem, based on non-file mapped allocations in /proc/PID/maps:

54/178 mb used (fresh)

106/485 mb used (old) -- yes it grew by over a hundred megs in the last 20 mins...

Something fishy is going on.
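
For reference, here is a rough sketch of that kind of /proc-based check (my own approximation, not the exact script used above): it walks the non file-backed mappings listed in /proc/PID/maps and counts non-zero bytes in /proc/PID/mem. Reading another process's memory requires root or ptrace permission.

```php
<?php
// Approximate sketch: count non-zero bytes in non file-backed mappings of a
// process. Usage: php memscan.php <pid> (needs root/ptrace access).

$pid = (int) ($argv[1] ?? 0);
$maps = file("/proc/$pid/maps", FILE_IGNORE_NEW_LINES) ?: [];
$mem = fopen("/proc/$pid/mem", 'rb');

$total = 0;
$used = 0;

foreach ($maps as $line) {
    $fields = preg_split('/\s+/', trim($line));
    $path = $fields[5] ?? '';

    // Skip file-backed mappings; keep anonymous ones plus [heap]/[stack].
    if ($path !== '' && $path[0] === '/') {
        continue;
    }

    [$start, $end] = array_map('hexdec', explode('-', $fields[0]));

    // hexdec() returns a float above PHP_INT_MAX (e.g. [vsyscall]); skip those,
    // as well as regions we cannot seek to.
    if (!is_int($start) || !is_int($end) || $end <= $start || fseek($mem, $start) !== 0) {
        continue;
    }

    // Read the region in 1 MiB chunks to keep our own memory usage bounded.
    for ($offset = $start; $offset < $end; ) {
        $chunk = fread($mem, min(1 << 20, $end - $offset));
        if ($chunk === false || $chunk === '') {
            break;
        }
        $offset += strlen($chunk);
        $total += strlen($chunk);
        $used += strlen($chunk) - substr_count($chunk, "\0");
    }
}

fclose($mem);
printf("%d/%d MB used\n", intdiv($used, 1 << 20), intdiv($total, 1 << 20));
```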

@dunglas
Owner

dunglas commented Nov 1, 2023

@withinboredom would you be able to do a profile (see instructions above)? Also, are you using the worker mode?

@dunglas
Owner

dunglas commented Nov 1, 2023

If the problem is in C, we can do something like this to see what's going on: https://kirshatrov.com/posts/finding-memory-leak-in-cgo

@mathieudz
Contributor Author

@dunglas See attached zip file for the results of http://localhost:2019/debug/pprof/heap of my production environment.
heap.zip

  • After the start
  • After 15 minutes
  • After 30 minutes

@KennedyTedesco
Contributor

KennedyTedesco commented Nov 1, 2023

Just a side note: I feel that cgi.populateEnv() might be a good place to look for a bottleneck. Sadly, I don't have any Go experience, but I'll see if I can find anything there.

@withinboredom
Collaborator

@dunglas -- these are docker containers, not worker mode. I'll see if I can get one running in debug mode. FWIW, I will probably be able to change the env while running. I assume Go knows that env vars are mutable and non-thread-safe... but there's one way to find out :evil-laugh:

@dunglas
Owner

dunglas commented Nov 16, 2023

We could try this to detect the leak, if it hasn't been fixed in RC3: https://clang.llvm.org/docs/LeakSanitizer.html

@mathieudz
Contributor Author

mathieudz commented Nov 18, 2023

Since the memory is still leaking with the memory patches from a few days ago (it crashes when hitting the ulimit limits), I'm attaching the pprof dumps plus a history of memory usage (vmem). It jumps up and down, but a very clear linear long-term trend is still visible.
FrankenPHP runs in worker mode with 16 workers. Workers are terminated after 10,000 requests.
heap20231118.zip
[graph: memory usage (vmem) history]

@dunglas
Owner

dunglas commented Nov 18, 2023

Thanks for the profiles. Could you also sample the C heap using https://github.com/iovisor/bcc/blob/master/tools/memleak.py please?

So we'll see where the memory is used.

@dunglas
Owner

dunglas commented Nov 19, 2023

@mathieudz According to the profiles, you're still running a version that is not the latest RC. The populateEnv() function doesn't exist anymore.
[profile excerpt showing populateEnv()]

@mathieudz
Contributor Author

mathieudz commented Nov 22, 2023

I have updated my docker image again.
Some observations after running for ~2 hours:
Memory (sys_bytes) according to the Prometheus metrics:
[graph: sys_bytes over ~2 hours]
It seems to correlate with the memory usage reported by top (vmem). The difference fluctuates around 3.6 GB (top shows on average 3.6 GB more than Prometheus), so there must be some memory that is not accounted for by the Go metrics in Prometheus, maybe the memory allocated by PHP. Anyway, since the difference is rather stable, it means that the memory leak is fully accounted for by the metrics shown above.
I have also run memleak.py for those two hours. The "leaks" go up and down, but never exceed 1 GB. A few minutes after starting I saw 800 MB, but the last time (see attachment) it didn't even exceed 400 MB. So no memory leak measured there?
Lastly, there is the pprof profile, which only shows a memory usage of ~800 MB, only half of what the Prometheus metrics show. This time there is no populateEnv anymore, so I should be on a more recent version.
memleak-20231122-2.log
pprof.adhoc.20231122.zip

@dunglas dunglas mentioned this issue Dec 11, 2023
Nyholm pushed a commit to php-runtime/runtime that referenced this issue Dec 12, 2023
Nyholm pushed a commit to php-runtime/frankenphp-symfony that referenced this issue Dec 12, 2023
@dunglas dunglas added the bug and documentation labels Dec 14, 2023
@mathieudz
Contributor Author

@dunglas It's OK for me to close this one, as the subject of the issue is now solved in the runtime, and to keep the discussion about the memory leaks in #366. Let me know if you want more info from my side.

@dunglas
Owner

dunglas commented Dec 15, 2023

Let's do that, then. Thank you very much for your help; it has been invaluable, and I hope that we'll soon fix the leak too.

@dunglas dunglas closed this as completed Dec 15, 2023