feat(runloop) execute plugins on Nginx-produced errors #3533

thibaultcha · 2018-06-12T23:26:10Z

Context

Before this patch, several scenarios would make Kong/Nginx produce
errors that did not execute plugins:

Nginx-produced client errors (HTTP 4xx) on invalid requests per the
Nginx settings (e.g. headers too large, URI too large, etc...)
Proxy TCP/HTTP errors on the last balancer try (e.g. connection
refused, connection timeout, read/write timeout, etc...)
Any other Nginx-produced server errors (HTTP 5xx), if any

Because plugins are not executed in such cases, logging plugins in
particular do not get a chance to report such errors, leaving Kong
administrator blind when relying on bundled or homegrown logging
plugins. Additionally, other plugins, e.g. transformation, do not get a
chance to run either and could lead to scenarios where those errors
produce headers or response bodies that are undesired by the
administrator.

This issue is similar to that fixed by #3079, which handled the
following case:

Requests short-circuited by Kong itself (vs Nginx in 1, 2, and 3.)

See also #490 & #892 for issues related to 4. This patches addresses 1,
2, and 3.

Problem

Nginx-produced errors (4xx and 5xx) are redirected to the
/kong_error_handler location by way of the error_page directive,
but the runloop is not given a chance to run, since the error handler
is only instructed to override the Nginx-defined HTML responses, and
produces Kong-friendly ones.

In short, a patch solving this issue must enable the runloop within the
error handler.

Proposed solution

Several (many) possibilities were evaluated and explored to fix this.
The proposed patch is the result of many attempts, observations, and
adjustments to initial proposals. In a gist, it proposes:

Executing the response part of the runloop in the error_page location.
Ensure that if the request part of the runloop did not have a chance
to run, we do build the list of plugins to execute (thus, global plugins
only since the request was not processed).
Ensure that if the request part of the runloop did run, we preserve
its ngx.ctx values.

A few catches:

Currently, ngx.req.get_method() in the error_page location block
will return GET even if the request method was different (only
HEAD is preserved). A follow up patch for Nginx is being prepared.
As such, several tests are currently marked as "Pending".
When the rewrite phase did not get a chance to run, some parts of
the request may not have been parsed by Nginx (e.g. on HTTP 414), thus
some variables, such as $request_uri or request headers may not have
been parsed. Code should be written defensively against such
possibilities, and logging plugins may report incomplete request data
(but will still report the produced error).
The uninitialized_variable_warn directive must be disabled in the
error_page location block for cases when the rewrite phase did not
get a chance to run (Nginx-produced 4xx clients errors).
The ngx.ctx variables must be copied by reference for plugins to be
able to access it in the response phases when the request was
short-circuited by Nginx (e.g. HTTP 502/504 from the upstream module).
Using a named location for the error_page handler is not possible,
since in some cases the URI might not be parsed by Nginx, and thus Nginx
refuses to call named locations in such cases (e.g. on HTTP 414).

Resulting behavior

The 03-plugins_triggering_spec.lua test suite is now a good reference
for the behavior observed upon Nginx-produced errors. This behavior can
be classified in 2 categories:

When the request is not parsed by Nginx (HTTP 4xx)
a. The error_page directive is called (rewrite and access are
skipped)
b. The runloop retrieves the list of global plugins to run (it
cannot retrieve more specific plugins since the request could not be
parsed)
c. The error content handler overrides the Nginx body with a Kong-friendly
one (unchanged from previous behavior)
d. It executes the header_filter phase of each global plugin.
e. It executes the body_filter phase of each global plugin.
f. It executes the log phase of each global plugin.
When the error was produced during upstream module execution (HTTP 5xx)
a. The rewrite and access phases ran normally (we stashed the
ngx.ctx reference)
b. The error_page directive is called
c. The error content handler applies the original ngx.ctx by
reference
d. It does not run the plugins loop, since ngx.ctx contains
the list of plugins to run (global and specific plugins in
this case)
e. The error content handler overrides the Nginx body with a Kong-friendly
one (unchanged from previous behavior)
f. It executes the header_filter phase of each global plugin.
g. It executes the body_filter phase of each global plugin.
h. It executes the log phase of each global plugin.

Fix #3193

thibaultcha · 2018-06-13T20:54:09Z

kong/templates/nginx_kong.lua

@@ -147,11 +148,25 @@ server {
        }
    }

-    location = /kong_error_handler {
+    location /kong_error_handler {


Need to keep the =

Report both connection timeouts at the `balancer` phase and upstream timeouts as `timeout` events when using passive healthchecks. Previously, connection timeouts were being misreported as `tcp_failures`. Upstream timeouts can now be caught due to recent runloop improvements for executing plugins on Nginx-produced errors (#3533). Includes regression tests for both cases.

Context ------- Before this patch, several scenarios would make Kong/Nginx produce errors that did not execute plugins: 1. Nginx-produced client errors (HTTP 4xx) on invalid requests per the Nginx settings (e.g. headers too large, URI too large, etc...) 2. Proxy TCP/HTTP errors on the last balancer try (e.g. connection refused, connection timeout, read/write timeout, etc...) 3. Any other Nginx-produced server errors (HTTP 5xx), if any Because plugins are not executed in such cases, logging plugins in particular do not get a chance to report such errors, leaving Kong administrator blind when relying on bundled or homegrown logging plugins. Additionally, other plugins, e.g. transformation, do not get a chance to run either and could lead to scenarios where those errors produce headers or response bodies that are undesired by the administrator. This issue is similar to that fixed by #3079, which handled the following case: 4. Requests short-circuited by Kong itself (vs Nginx in 1, 2, and 3.) See also #490 & #892 for issues related to 4. This patches addresses 1, 2, and 3. Problem ------- Nginx-produced errors (4xx and 5xx) are redirected to the `/kong_error_handler` location by way of the `error_page` directive, but the runloop is not given a chance to run, since the error handler is only instructed to override the Nginx-defined HTML responses, and produces Kong-friendly ones. In short, a patch solving this issue must enable the runloop within the error handler. Proposed solution ----------------- Several (many) possibilities were evaluated and explored to fix this. The proposed patch is the result of many attempts, observations, and adjustments to initial proposals. In a gist, it proposes: * Executing the response part of the runloop in the error_page location. * Ensure that if the request part of the runloop did not have a chance to run, we do build the list of plugins to execute (thus, global plugins only since the request was not processed). * Ensure that if the request part of the runloop did run, we preserve its `ngx.ctx` values. A few catches: * Currently, `ngx.req.get_method()` in the `error_page` location block will return `GET` even if the request method was different (only `HEAD` is preserved). A follow up patch for Nginx is being prepared. As such, several tests are currently marked as "Pending". * When the `rewrite` phase did not get a chance to run, some parts of the request may not have been parsed by Nginx (e.g. on HTTP 414), thus some variables, such as `$request_uri` or request headers may not have been parsed. Code should be written defensively against such possibilities, and logging plugins may report incomplete request data (but will still report the produced error). * The `uninitialized_variable_warn` directive must be disabled in the `error_page` location block for cases when the `rewrite` phase did not get a chance to run (Nginx-produced 4xx clients errors). * The `ngx.ctx` variables must be copied by reference for plugins to be able to access it in the response phases when the request was short-circuited by Nginx (e.g. HTTP 502/504 from the upstream module). * Using a named location for the `error_page` handler is not possible, since in some cases the URI might not be parsed by Nginx, and thus Nginx refuses to call named locations in such cases (e.g. on HTTP 414). Resulting behavior ------------------ The `03-plugins_triggering_spec.lua` test suite is now a good reference for the behavior observed upon Nginx-produced errors. This behavior can be classified in 2 categories: 1. When the request is not parsed by Nginx (HTTP 4xx) a. The `error_page` directive is called (`rewrite` and `access` are skipped) b. The runloop retrieves the list of **global** plugins to run (it cannot retrieve more specific plugins since the request could not be parsed) c. The error content handler overrides the Nginx body with a Kong-friendly one (unchanged from previous behavior) d. It executes the `header_filter` phase of each global plugin. e. It executes the `body_filter` phase of each global plugin. f. It executes the `log` phase of each global plugin. 2. When the error was produced during upstream module execution (HTTP 5xx) a. The `rewrite` and `access` phases ran normally (we stashed the `ngx.ctx` reference) b. The `error_page` directive is called c. The error content handler applies the original `ngx.ctx` by reference d. It does **not** run the plugins loop, since `ngx.ctx` contains the list of plugins to run (global **and** specific plugins in this case) e. The error content handler overrides the Nginx body with a Kong-friendly one (unchanged from previous behavior) f. It executes the `header_filter` phase of each global plugin. g. It executes the `body_filter` phase of each global plugin. h. It executes the `log` phase of each global plugin. Fix #3193

Report both connection timeouts at the `balancer` phase and upstream timeouts as `timeout` events when using passive healthchecks. Previously, connection timeouts were being misreported as `tcp_failures`. Upstream timeouts can now be caught due to recent runloop improvements for executing plugins on Nginx-produced errors (#3533). Includes regression tests for both cases. From #3539

thibaultcha mentioned this pull request Jun 12, 2018

Kong plugins cannot log read and connection timeouts #3193

Closed

thibaultcha commented Jun 13, 2018

View reviewed changes

thibaultcha force-pushed the feat/plugins-on-nginx-errors branch from 56c8d9c to 2f78f51 Compare June 14, 2018 00:36

jeremyjpj0916 mentioned this pull request Jun 14, 2018

kong 500 error not get logged #3535

Closed

hishamhm mentioned this pull request Jun 14, 2018

fix(healthchecks) report timeouts on passive healthchecks #3539

Merged

thibaultcha force-pushed the feat/plugins-on-nginx-errors branch from 2f78f51 to 774d326 Compare June 15, 2018 01:26

thibaultcha force-pushed the feat/plugins-on-nginx-errors branch from 774d326 to 3dc04a3 Compare June 15, 2018 02:14

thibaultcha merged commit e709c95 into next Jun 15, 2018

thibaultcha deleted the feat/plugins-on-nginx-errors branch June 19, 2018 18:47

thibaultcha mentioned this pull request Jul 26, 2018

why passive tcp_failure healthychecker need api retries greater than 0 #3645

Closed

thibaultcha mentioned this pull request Nov 5, 2018

Kong 1.0rc2 migrations from 0.14.1 to 1.0rc2 issues #3924

Closed

thibaultcha mentioned this pull request Nov 27, 2018

Incorrect HTTPMethod logged on ReadTimeout #4025

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(runloop) execute plugins on Nginx-produced errors #3533

feat(runloop) execute plugins on Nginx-produced errors #3533

thibaultcha commented Jun 12, 2018 •

edited

Loading

thibaultcha Jun 13, 2018

feat(runloop) execute plugins on Nginx-produced errors #3533

feat(runloop) execute plugins on Nginx-produced errors #3533

Conversation

thibaultcha commented Jun 12, 2018 • edited Loading

Context

Problem

Proposed solution

Resulting behavior

thibaultcha Jun 13, 2018

Choose a reason for hiding this comment

thibaultcha commented Jun 12, 2018 •

edited

Loading