Flush FPM logs in case of timeouts #1133

mnapoli · 2022-01-04T16:15:34Z

When Lambda times out with the PHP-FPM layer, the logs written by the PHP script are never flushed to stderr by PHP-FPM. That means logs never reach CloudWatch, which makes timeouts really hard to debug.

With this change, Bref waits for the FPM response until 1 second before the actual Lambda timeout (via a connection timeout on the FastCGI connection).
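
As a rough illustration of the timing rule above, the FastCGI timeout can be derived from the time left in the invocation minus a one-second margin. The function name `fpmTimeoutMs` and its parameters are hypothetical and are not Bref's actual API; this is only a sketch of the arithmetic.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper (not part of Bref): how long to wait for PHP-FPM,
 * in milliseconds, given the time left before Lambda kills the invocation.
 */
function fpmTimeoutMs(int $remainingMs, int $safetyMarginMs = 1000): int
{
    // Keep the result positive so it is always usable as a timeout value.
    return max(1, $remainingMs - $safetyMarginMs);
}

// A 15-second Lambda timeout leaves 14 seconds for PHP-FPM to answer.
var_dump(fpmTimeoutMs(15000)); // int(14000)
```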

If Bref reaches that point, it will ask PHP-FPM to gracefully restart the PHP-FPM worker, which:

- flushes the logs (logs end up in CloudWatch, which is great)
- restarts a clean FPM worker, without doing a full FPM restart (which may take longer)
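
The flow described above could be sketched roughly as follows. Everything here is an assumption made for illustration: `FpmTimedOut`, `proxyToFpm`, and the two callables stand in for the FastCGI client and the PHP-FPM restart signal, and none of them are taken from Bref's code.

```php
<?php
declare(strict_types=1);

/** Hypothetical exception standing in for a FastCGI connection timeout. */
class FpmTimedOut extends RuntimeException
{
}

/**
 * Hypothetical sketch of the timeout handling.
 *
 * $sendToFpm     receives the timeout in ms, returns the response body,
 *                and throws FpmTimedOut when PHP-FPM does not answer in time.
 * $restartWorker asks PHP-FPM to gracefully restart its worker.
 */
function proxyToFpm(callable $sendToFpm, callable $restartWorker, int $timeoutMs): string
{
    try {
        return $sendToFpm($timeoutMs);
    } catch (FpmTimedOut $e) {
        // Restarting the worker is what gets the script's logs flushed to stderr.
        $restartWorker();

        // Fail the invocation with an explicit message, keeping the original cause.
        throw new FpmTimedOut(
            sprintf('PHP-FPM did not respond within %d ms; the worker was restarted.', $timeoutMs),
            0,
            $e
        );
    }
}
```

Passing the two operations in as callables keeps the sketch independent of any particular FastCGI client or signal mechanism.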

Follow-up to #770, #772, #895, #1106

May address some of #862

Note: this does not change anything for the Function layer (it only affects FPM). Also, this does not show a full stack trace of the place in the code where the timeout happens (#895 did). Still, it's an improvement over the current situation.

Here is an example of a timeout with the PHP logs correctly written to stderr:

I tried to make the error message as explicit as possible.

I would love for some of you to try this PR and confirm that it helps (that PHP logs end up in CloudWatch) and that there are no side effects.

You can do that via composer.json:

-    "bref/bref": "^1.0",
+    "bref/bref": "dev-timeouts-fpm",


shadowhand

Clever solution. I don't know that we can test it for you, because only our production instances see timeouts.

mnapoli · 2022-01-04T22:00:14Z

Testing in production? 😛

I'm also considering releasing that behind a feature flag (env variable for example), but would that change anything for you?

t-richard

It's simple and smart enough 👍 (having FPM in Lambda is more or less a hack already 😅)

Just have one comment

t-richard · 2022-01-05T09:13:20Z

tests/Handler/FpmHandlerTest.php

@@ -1071,7 +1071,7 @@ public function test FPM timeouts are recovered from()
                'httpMethod' => 'GET',
            ], $this->fakeContext);
            $this->fail('No exception was thrown');
-        } catch (FastCgiCommunicationFailed $e) {
+        } catch (Timeout $e) {


FastCgiCommunicationFailed is still thrown in FpmHandler.php

Shouldn't it be 👇 ?

Suggested change

-        } catch (Timeout $e) {
+        } catch (FastCgiCommunicationFailed|Timeout $e) {

mnapoli · 2022-01-05

In this test I specifically want to check for a timeout, so if I get any other exception, the test is broken (and the FastCgiCommunicationFailed exception should bubble up and fail the test).
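
To make that reasoning concrete, here is a minimal sketch without PHPUnit. The two classes are stand-ins for the exceptions named in the diff, assumed here to be unrelated types, and `throwsTimeout` is a hypothetical helper. Catching only `Timeout` means any other exception propagates and fails the caller.

```php
<?php
declare(strict_types=1);

// Stand-ins for the two exception types discussed above (assumed unrelated).
class FastCgiCommunicationFailed extends Exception
{
}

class Timeout extends Exception
{
}

/**
 * Hypothetical helper: returns true only if $call throws Timeout.
 * Any other exception is deliberately not caught, so it bubbles up.
 */
function throwsTimeout(callable $call): bool
{
    try {
        $call();
    } catch (Timeout $e) {
        return true;
    }

    return false;
}
```

With the union catch from the suggestion, a `FastCgiCommunicationFailed` would be swallowed and the test would pass for the wrong reason.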

allan-simon · 2022-01-05T16:22:42Z

Haha, funny, we're facing this issue right now. Trying your branch now :)

allan-simon · 2022-01-05T16:50:49Z

I tried your branch a few minutes ago, and unfortunately this is still all we get in CloudWatch:

2022-01-05T17:43:58.568+01:00 | START RequestId: f5bb07ad-7fb3-49cf-88a7-4e13a19735f2 Version: $LATEST
2022-01-05T17:44:13.586+01:00 | END RequestId: f5bb07ad-7fb3-49cf-88a7-4e13a19735f2
2022-01-05T17:44:13.586+01:00 | REPORT RequestId: f5bb07ad-7fb3-49cf-88a7-4e13a19735f2 Duration: 15015.50 ms Billed Duration: 15000 ms Memory Size: 1769 MB Max
2022-01-05T16:44:13.586Z f5bb07ad-7fb3-49cf-88a7-4e13a19735f2 Task timed out after 15.02 seconds

mnapoli · 2022-01-05T17:26:28Z

@allan-simon thanks for trying it! It seems in your case the new feature did not kick in at all 🤔

If you have a 15-second Lambda timeout, then FPM should stop at 14 seconds and throw an explicit exception (the "Task timed out" message shouldn't even appear, because the 15 seconds shouldn't be reached).

At the 14th second, the exception should be thrown no matter what, so that's confusing. Sorry to insist but could you double-check that the pull request was really deployed in your case?

allan-simon · 2022-01-05T19:39:36Z

@mnapoli No problem, I will double-check. You're totally right that I may have been too excited and forgotten something.

allan-simon · 2022-01-06T09:12:50Z

@mnapoli Indeed, I only ran "composer install" and forgot that it does not update composer.lock. I confirm your branch works like a charm and is a huge life saver for us ^^ (so we were very lucky that you opened the PR at that very moment).

mnapoli · 2022-01-06T09:29:12Z

@allan-simon AWESOME, thank you for testing!

shadowhand · 2022-01-06T12:55:18Z

Woohoo! 🎊

allan-simon · 2022-01-06T13:58:37Z

> I'm also considering releasing that behind a feature flag (env variable for example), but would that change anything for you?

If that allows this PR to be merged more quickly, yes :)

shadowhand · 2022-01-06T15:15:09Z

> I'm also considering releasing that behind a feature flag (env variable for example), but would that change anything for you?

I don't think that would change anything for us, but it would probably be better for release planning.

mnapoli · 2022-01-08T17:03:41Z

Let's get this out of the door!

Thanks for testing on your end too @allan-simon!

shadowhand · 2022-01-11T15:20:55Z

demo/http.php

@@ -3,6 +3,7 @@
 require __DIR__ . '/../vendor/autoload.php';

 if (isset($_GET['sleep'])) {
+    error_log('This is a log');


@mnapoli was this merged into 1.5.0? Seems like extraneous debugging.

t-richard · 2022-01-11

This is demo code, nothing to worry about IMHO.
It just lets us test that this feature works, but it will never be executed in your app 🙂

mnapoli · 2022-01-11

Yeah, this directory isn't even a demo really, just a sample app for me to play/develop with. It's not shipped to users.
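
For anyone wanting a similar reproduction endpoint in their own project, a sketch along the lines of the diff above might look like this. `demoHandler` and its parameters are hypothetical, and the response text and the sleep call are assumptions, since the rest of demo/http.php is not shown in this thread.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical reproduction endpoint: logs a line, then blocks for the
 * requested number of seconds so that the Lambda timeout can be triggered.
 * $sleep is injectable only so the sketch can be exercised without waiting.
 */
function demoHandler(array $query, ?callable $sleep = null): string
{
    $sleep = $sleep ?? 'sleep';

    if (isset($query['sleep'])) {
        // This line is the one that should still show up in the logs after a timeout.
        error_log('This is a log');
        $sleep((int) $query['sleep']);
    }

    return 'Hello world!';
}
```

Calling the endpoint with a `sleep` value longer than the Lambda timeout and then checking whether "This is a log" shows up in CloudWatch is the kind of check described earlier in the thread.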


mnapoli · 2022-01-27T15:42:46Z

For reference, follow-up: #1144

Follow-up to #1133. Sending SIGUSR2 to FPM (the previously implemented solution) did not stop the timed-out PHP script with 100% certainty. It merely interrupted the currently blocked call (e.g. a sleep, a DB call, etc.), flushed the logs, and carried on. My guess is that this could have let the PHP script keep running in some cases, possibly running into yet another timeout on a later line (e.g. another DB call).

This PR fixes the timeout test that wasn't really working (🤦) and restarts FPM completely in case of timeout. That is confirmed to completely stop the execution of the timed-out script and flush the logs to stderr.
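
The "interrupted the blocked call and carried on" behaviour can be observed in plain PHP, outside of FPM. The sketch below is a generic demonstration and not the FPM code path: it assumes the pcntl extension is loaded and uses SIGALRM rather than SIGUSR2, and `interruptedSleep` is a hypothetical name.

```php
<?php
declare(strict_types=1);

/**
 * Starts a sleep of $seconds and has SIGALRM delivered after $alarmAfter seconds.
 * Returns the number of seconds sleep() reports as left, or null when the
 * pcntl extension is not available.
 */
function interruptedSleep(int $seconds, int $alarmAfter): ?int
{
    if (!function_exists('pcntl_async_signals')) {
        return null; // pcntl not loaded: nothing to demonstrate
    }

    pcntl_async_signals(true);
    pcntl_signal(SIGALRM, static function (): void {
        // Handling the signal is enough to interrupt the blocking call.
    });
    pcntl_alarm($alarmAfter);

    // sleep() returns early with the seconds left, and the script keeps running.
    return sleep($seconds);
}
```

Whatever code follows the interrupted call still runs, which matches the concern that a timed-out script could go on to block on a later call.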

mnapoli added the bug label Jan 4, 2022

mnapoli mentioned this pull request Jan 4, 2022

Allow developer to control FpmHandler timeout #770

Closed

mnapoli force-pushed the timeouts-fpm branch from 1d9405e to 57cf99f Compare January 4, 2022 16:19

mnapoli mentioned this pull request Jan 4, 2022

Handle timeouts more gracefully by allowing the application to shutdown #895

Open

2 tasks

mnapoli force-pushed the timeouts-fpm branch from 57cf99f to e9fcb95 Compare January 4, 2022 16:21

mnapoli mentioned this pull request Jan 4, 2022

Handle Lambda timeouts and PHP-FPM crashes better #862

Closed

shadowhand reviewed Jan 4, 2022


t-richard approved these changes Jan 5, 2022


mnapoli mentioned this pull request Jan 8, 2022

Control FPM timeout to give time to flush out logs #1106

Closed

mnapoli merged commit 25c2415 into master Jan 8, 2022

mnapoli deleted the timeouts-fpm branch January 8, 2022 17:03

shadowhand reviewed Jan 11, 2022


mnapoli mentioned this pull request Jan 27, 2022

Restart FPM completely in case of timeouts #1144

Merged

mnapoli added a commit that referenced this pull request Feb 14, 2023

Merge pull request #1133 from brefphp/timeouts-fpm

f19f91f

Flush FPM logs in case of timeouts

mnapoli mentioned this pull request Jun 10, 2023

Perf & Timeout issue with PHP-FPM 8.1 #1564

Closed
