Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Error handling #713

Closed
Nyholm opened this issue Aug 11, 2020 · 14 comments
Closed

Error handling #713

Nyholm opened this issue Aug 11, 2020 · 14 comments
Labels

Comments

@Nyholm
Copy link
Contributor

Nyholm commented Aug 11, 2020

This is a follow up from #707 and is related to #709

After trying out #707 again I found something strange. I've got two routes, one that times out with sleep(40) and one normal route. My lambda function has a timeout of 16s and PHP-FPM of 15s.

T0: Access normal route,
T1: Get a response
T2: Access route with sleep
T12: (internal server error) Lambda times out with and my application logs are visible.
T20: Access normal route
T21: (internal server error) hollodotme\FastCGI\Exceptions\WriteFailedException Failed to write request to socket [broken pipe]" (hollodotme/fast-cgi-client/src/Sockets/Socket.php on line 385)

So at T12 I get a hollodotme\\FastCGI\\Exceptions\\ReadFailedException Stream got blocked, or terminated exception. After that I cannot write to PHP-FPM anymore..

fwrite(): send of 3512 bytes failed with errno=32 Broken pipe in /var/task/vendor/hollodotme/fast-cgi-client/src/Sockets/Socket.php on line 385

@hollodotme do you know if this is correct?


Another thing I find weird is that my "timout" happens after 10s. I would expect it to be 15 s.

Screenshot 2020-08-11 at 13 53 33

@mnapoli
Copy link
Member

mnapoli commented Aug 11, 2020

In those cases, Bref should restart PHP-FPM to restart from scratch (https://github.com/brefphp/bref/blob/master/src/Event/Http/FpmHandler.php#L58-L62). It seems it's not happening here, can you confirm?

@Nyholm
Copy link
Contributor Author

Nyholm commented Aug 11, 2020

The Lambda function is never crashing, Ie, we are never running FpmHandler::start() again.

I think the socket write error is because PHP-FPM's child died. PHP-FPM spins up a new one but we still try to write to the old child.

@Nyholm
Copy link
Contributor Author

Nyholm commented Aug 11, 2020

Im seeing strange effects...

When I use request_terminate_timeout = 15s it seams like it behaves like: "If the PHP-FPM child is idle for 15 seconds, then we will cause Socket write errors".

@Nyholm
Copy link
Contributor Author

Nyholm commented Aug 11, 2020

I use https://github.com/Nyholm/bref-overwrite-layer and I cannot come to any other conclusion than what I've stated above.

Because request_terminate_timeout is not acting as we expected then I suggest to revert it.

@mnapoli mnapoli added the bug label Aug 11, 2020
@hollodotme
Copy link

@Nyholm The code that causes the mentioned ReadFailedException above explicitly checks for a broken stream using stream_get_meta_data(), see https://github.com/hollodotme/fast-cgi-client/blob/v3.1.2/src/Sockets/Socket.php#L546:

$info = stream_get_meta_data( $this->resource );

# ...

if ( $info['unread_bytes'] === 0 && $info['blocked'] && $info['eof'] )
{
	throw new ReadFailedException( 'Stream got blocked, or terminated.' );
}

I also wrote a test to simulate this scenario by explicitly killing a php-fpm pool/worker process with the following signals and then try to access its stream, see https://github.com/hollodotme/fast-cgi-client/blob/v3.1.2/tests/Integration/SignaledWorkersTest.php#L45:

[
	[
		# SIGHUP
		'signal' => 1,
	],
	[
		# SIGINT
		'signal' => 2,
	],
	[
		# SIGKILL
		'signal' => 9,
	],
	[
		# SIGTERM
		'signal' => 15,
	],
]

So yes, I can confirm that most probably the underlying php-fpm process crashed or was killed when this exception pops up.

@Nyholm
Copy link
Contributor Author

Nyholm commented Aug 11, 2020

My questions are:

  1. Why is the PHP-FPM process killed when I use request_terminate_timeout=15s?
    T0: Request 1, response
    T16: Request 2, "Stream got blocked, or terminated."

Ie, the process is idle for 15s and I cannot write to it anymore.

  1. The Socket is never recovering. If I get a ReadFailedException, this socket is still in retured when I ask the SocketCollection for an idle socket. It might be a reason for this, but in this scenario it means that as soon as my server is idle for 15s, nobody will ever be able to use it.

@hollodotme
Copy link

@Nyholm regarding 2. The Client should always remove the broken Socket from SocketCollection when this exception occurrs, see:

Those are the only usages of Socket#fetchResponse() where the exception may occurr. You should not have to ask or interact with SocketCollection directly.

@hollodotme
Copy link

@Nyholm regarding 1. If I interpret this fpm-source-code correctly, request_terminate_timeout setting seems to behave like a hard idle timeout and really kills the child after it lived for request_terminate_timeout seconds. Disclaimer: I just had a quick look.

  1. https://github.com/php/php-src/blob/37a878b8c3fdd47ef5883a7a44a262c1c9909b0a/sapi/fpm/fpm/fpm_process_ctl.c#L306
  2. https://github.com/php/php-src/blob/ee0579350feb211f9715057cca0095d83b187408/sapi/fpm/fpm/fpm_request.c#L270

Maybe also the combination of pm = static and pm.max_children = 1 is problematic with request_terminate_timeout. But that's also a wild guess.

@Nyholm
Copy link
Contributor Author

Nyholm commented Aug 11, 2020

Regarding 2

I get an error when I write. See my stack trace.

{
    "errorType": "hollodotme\\FastCGI\\Exceptions\\WriteFailedException",
    "errorMessage": "Failed to write request to socket [broken pipe]",
    "stack": [
        "#0 /var/task/vendor/hollodotme/fast-cgi-client/src/Sockets/Socket.php(200): hollodotme\\FastCGI\\Sockets\\Socket->write()",
        "#1 /var/task/vendor/hollodotme/fast-cgi-client/src/Client.php(99): hollodotme\\FastCGI\\Sockets\\Socket->sendRequest()",
        "#2 /var/task/vendor/hollodotme/fast-cgi-client/src/Client.php(77): hollodotme\\FastCGI\\Client->sendAsyncRequest()",
        "#3 /var/task/vendor/bref/bref/src/Event/Http/FpmHandler.php(110): hollodotme\\FastCGI\\Client->sendRequest()",
        "#4 /var/task/vendor/bref/bref/src/Event/Http/HttpHandler.php(22): Bref\\Event\\Http\\FpmHandler->handleRequest()",
        "#5 /var/task/vendor/bref/bref/src/Runtime/LambdaRuntime.php(106): Bref\\Event\\Http\\HttpHandler->handle()",
        "#6 /opt/bootstrap(35): Bref\\Runtime\\LambdaRuntime->processNextEvent()",
        "#7 {main}"
    ]
}

Im running hollodotme/fast-cgi-client v3.1.2 with some added echo's for debugging on AWS Lambda.

@hollodotme
Copy link

@Nyholm

Regarding 2

I get an error when I write. See my stack trace.

Ah, I see. When writing to a broken Socket, the socket is not removed from the collection, indeed. 😞
I'll open an issue for that later this evening.

@Nyholm
Copy link
Contributor Author

Nyholm commented Aug 11, 2020

Regarding 1.

That is an interesting idea. I've tried pm = static and pm.max_children = 1 locally with no issues.

When testing with pm = dynamic on lambda still gives the same error.

Hm..

@Nyholm
Copy link
Contributor Author

Nyholm commented Aug 12, 2020

Hm... or maybe it is the fact that we run php-fpm in nondaemon... I'll try that next.

@hollodotme
Copy link

I released v3.1.3 which fixes the bug that broken sockets were not removed from the collection - https://github.com/hollodotme/fast-cgi-client/releases/tag/v3.1.3

@mnapoli
Copy link
Member

mnapoli commented Nov 14, 2020

I'll be closing this one because it seems resolved, let's reopen if I'm wrong or if there's any further issue.

Thanks all!

@mnapoli mnapoli closed this as completed Nov 14, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants