-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(healthchecks) add active and passive health checks #3096
Conversation
end | ||
|
||
|
||
local function check_http_statuses(arg) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could we also apply check_positive_int
in this check as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why? The only logic I can see the check_positive_int()
validator enforcing would be checking if the argument is an integer. Is that what you are suggesting we should check in this validator? Otherwise, the check_http_statuses()
validator already has narrower acceptance rules regarding the allowed range.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not saying this isn't an valid check to perform, ofc, just making sure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes that is why I'm suggesting, just for completeness sake
if not ok then | ||
ngx_log(ngx_ERR, "failed to retry the dns/balancer resolver for ", | ||
tostring(addr.host), "' with: ", tostring(err)) | ||
|
||
return responses.send(500) | ||
return ngx.exit(errcode) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
curious about this change away from responses.send
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For 500 in particular it would work, but it's for consistency with the change above. Our responses
module sometimes does ngx.say which causes a crash in the balancer context.
@@ -32,6 +32,7 @@ lua_shared_dict kong 5m; | |||
lua_shared_dict kong_cache ${{MEM_CACHE_SIZE}}; | |||
lua_shared_dict kong_process_events 5m; | |||
lua_shared_dict kong_cluster_events 5m; | |||
lua_shared_dict kong_healthchecks 5m; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just doing due diligence, have we investigated that this is a large enough allocation? and that the healthcheck library handles and alerts on OOM/eviction conditions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Large enough is always relative, but yes, the library does alert on shm allocation failures. This PR logs whatever comes from the healthcheck library.
healthchecks = { | ||
type = "table", | ||
schema = gen_schema(healthchecks_defaults), | ||
default = healthchecks_defaults, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
from a UX perspective, what happens when I want to define a healthchecks
sub-schema with mostly default data, but a few changes? do i need to manually fill out the whole healthchecks_defaults
as part of my request?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The above gen_schema()
function generates fields that have the default
attribute set to the one defined in the above stub healthchecks_defaults
, so you should indeed be able to partially specify health check configuration and expect default values to be inserted when necessary.
About to push the following patch to this branch: diff --git a/kong/core/balancer.lua b/kong/core/balancer.lua
index 664c0e55..d45cc027 100644
--- a/kong/core/balancer.lua
+++ b/kong/core/balancer.lua
@@ -604,7 +604,7 @@ end
local function init()
local upstreams, err = get_all_upstreams()
if not upstreams then
- log(ERR, "failed loading initial list of upstreams: ", err)
+ log(ngx.STDERR, "failed loading initial list of upstreams: ", err)
return
end
@@ -615,7 +615,7 @@ local function init()
if ok ~= nil then
oks = oks + 1
else
- log(ERR, "error creating balancer for ", name, ": ", err)
+ log(ngx.STDERR, "failed creating balancer for ", name, ": ", err)
errs = errs + 1
end
end
diff --git a/kong/dao/schemas/upstreams.lua b/kong/dao/schemas/upstreams.lua
index de3444e8..cf409850 100644
--- a/kong/dao/schemas/upstreams.lua
+++ b/kong/dao/schemas/upstreams.lua
@@ -41,6 +41,10 @@ local function check_http_statuses(arg)
return false, "array element is not a number"
end
+ if math.floor(s) ~= s then
+ return false, "must be an integer"
+ end
+
-- Accept any three-digit status code,
-- applying Postel's law in case of nonstandard HTTP codes
if s < 100 or s > 999 then
diff --git a/spec/01-unit/007-entities_schemas_spec.lua b/spec/01-unit/007-entities_schemas_spec.lua
index ebc9f32a..05881ede 100644
--- a/spec/01-unit/007-entities_schemas_spec.lua
+++ b/spec/01-unit/007-entities_schemas_spec.lua
@@ -763,6 +763,7 @@ describe("Entities Schemas", function()
{{ active = { healthy = { http_statuses = { -1 }}}}, "status code" },
{{ active = { healthy = { http_statuses = { 99 }}}}, "status code" },
{{ active = { healthy = { http_statuses = { 1000 }}}}, "status code" },
+ {{ active = { healthy = { http_statuses = { 111.314 }}}}, "must be an integer" },
{{ active = { healthy = { successes = 0.5 }}}, "must be an integer" },
{{ active = { healthy = { successes = 0 }}}, "must be an integer" },
{{ active = { healthy = { successes = -1 }}}, "an integer between" }, |
348a2e6
to
764b522
Compare
Backport from commit fd5c8dc39127e9362975faefd30d915f0710fa77 by @kikito
Reworks the management of balancer objects from being on-demand, on each request, to be event-based, responding to DAO events. Balancers are now initialized on system startup, and updated based on propagated worker-events and cluster-events. These changes allow healthchecks to update properly. Attaches a healthchecker object to each balancer. The balancers report HTTP statuses and TCP failures to the healthchecker. Moves objects that were attached to the balancer object using fields such as `__target_history` into weakly-keyed local tables.
Use ctx.KONG_PROXIED to differentiate between responses produced by an upstream vs. responses produced by Kong plugins.
Allow the internal http_server to produce a series of success and failure responses (e.g. 10 successes, followed by 5 failures, followed by 10 successes).
Adds two new endpoints to Target entities: * `/upstreams/:id/targets/:target/healthy` * `/upstreams/:id/targets/:target/unhealthy` These post an event that is forwarded to all relevant healthcheckers, updating their immediate status. This is useful for manually re-enabling a target that has been disabled by passive health checks.
Add tests for: * manual recovery using /healthy endpoint: test scenario using passive healthchecks as circuit breaker, then using the endpoint to restore the target. * manual shutdown using the /unhealthy endpoint.
Remove hardcoded references to 127.0.0.1 in balancer integration tests. This sets the stage for IPv6 tests, pending an update of lua-resty-dns-client.
Adds a test that attests that a Kong-generated error (in this case a `401 Unauthorized` generated by the key-auth plugin) does not add up to the health checker event counter. This integration test: 1. configures healthchecks with a 1-error threshold 2. adds a key-auth plugin with no configured consumer 3. runs request, which fails with 401, but doesn't hit the 1-error threshold 4. deletes the plugin 5. starts proxied servers, and runs two full rounds of the balancer wheel 6. each server reports one full round of the wheel worth of successes, meaning they are both healthy and were not affected by the 401.
764b522
to
f87844e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Applied the above patch, and rebased on top of next
(itself including the recent master
work).
@hishamhm Please do merge at your convenience, but please do so with the "Create a merge commit" option in this particular case (large features). Thanks!
After install the version 0.12, how can I set the upstream to use the active health check? |
Kong Community Edition (CE) 0.12 is currently available only in a Release
Candidate form, which means that our documentation site GetKong.org hasn't
yet been updated for CE 0.12 features like health checks.
This pull request Kong/docs.konghq.com#566 includes
documentation of health checks - hopefully that'll suffice until we get
GetKong.org updated.
…On Sun, Dec 24, 2017 at 11:34 PM, xiaolongiii ***@***.***> wrote:
After install the version 0.12, how can I set the upstream to use the
active health check?
—
You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub
<#3096 (comment)>, or mute
the thread
<https://github.com/notifications/unsubscribe-auth/AAFyuMGDU2SrPSy-973c8ZExD3jb8pwhks5tD1AhgaJpZM4RCwrG>
.
|
I can't work out how to set the array of status for healthchecks.active.healthy.http_statuses. Could you provide an example curl command? No matter what i try, i just get "array element is not a number" in response. |
Sorted - requires JSON input. Doesn't seem to work with "key=var" input format. |
@craig-mld Thanks, I have the same problem with you,I try -d "healthchecks.active.healthy.http_statuses[]=200" -d "healthchecks.active.healthy.http_statuses=200" , etc , all get wrong messages. -d '{"healthchecks.active.healthy.http_statuses":[200]}' is ok |
Summary
Adds support for health checks in Kong.
Health checkers can perform active health checks, periodically pinging a configurable upstream URL and checking for the HTTP status code, TCP errors and timeouts, and also passive health checks, checking these same signals on ongoing traffic, allowing for circuit breaker functionality.
Major kudos to @Tieske for his work on lua-resty-dns-client and our joint work on lua-resty-healthcheck (on which this is based), and to @thibaultcha for his reviews while this PR was in the making.
Full changelog
healthchecks
configuration field for theupstreams
entity/upstream/:id/targets/:target/healthy
and/upstream/:id/targets/:target/unhealthy
Issues resolved
Fix #112
Fix #154