-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix(balancer) async initialization of balancer objects #3187
Conversation
this test runs on every request. Would it be possible to use a timer, started from |
@Tieske can one ensure that the timer will be serviced before the first access? I don't want to risk it ever running in an inconsistent state. |
if the timer is set at 0, it will run immediately |
b4efc00
to
096c7b5
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed with the approach in general, just left a few concerns!
kong/core/handler.lua
Outdated
if not balancer_initialized then | ||
local ok, err = balancer.init() | ||
if not ok then | ||
log(ngx.ERR, err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we should add some context to this error message, like "failed to initialize balancers: " ...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, it might be worth to bump this to an EMERG
message (instead of ERR
)? Error generally is used for something that impacts the execution of a particular request. An error such as this one has much wider implications to many of the subsequent requests that would be made.
kong/core/balancer.lua
Outdated
@@ -616,11 +637,16 @@ local function init() | |||
if ok ~= nil then | |||
oks = oks + 1 | |||
else | |||
log(ngx.STDERR, "failed creating balancer for ", name, ": ", err) | |||
log(ngx.ERR, "failed creating balancer for ", name, ": ", err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe my below comment was the reason why this was an STDERR
level (probably best to use CRIT
though)
kong/core/balancer.lua
Outdated
@@ -27,6 +28,7 @@ local _load_targets_into_memory | |||
|
|||
-- table holding our balancer objects, indexed by upstream name | |||
local balancers = {} | |||
local balancers_initialized = nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be sweet to declare this in a do end
block wrapping the below init
function?
kong/core/balancer.lua
Outdated
return true | ||
end | ||
|
||
local lock, err = resty_lock:new("kong", { timeout = 3600 }) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't this timeout a bit excessive? Something like 30
might be safer? Is there a particular intent why it is so high? Anyway, it doesn't make much sense for it to be higher than the default exptime
(which is 30s), and actually, it would be modified to 30
right now, see https://github.com/openresty/lua-resty-lock/blob/master/lib/resty/lock.lua#L104
4c672d5
to
b0346d6
Compare
f413715
to
7c65baa
Compare
7c65baa
to
4afa415
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The overall approach looks good to me - I am of the opinion that we propose this patch to @aaronhmiller so that he can give it a try! :)
@@ -470,7 +517,7 @@ local function get_balancer(target, no_create) | |||
return nil, "balancer not found" | |||
else | |||
log(ERR, "balancer not found for ", upstream.name, ", will create it") | |||
return create_balancer(upstream) | |||
return create_balancer(upstream), upstream |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
dammit, seems like this already was an issue?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it wasn't -- I mean, it was with regard to the function signature but it didn't happen because the get_balancer
-> create_balancer
code path was never hit when all balancers were created at startup. Now that it is async and get_balancer
can run before init
finishes running, then this code path can happen.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My one concern here is that this can be confusing for users of get_balancer and increase cognitive load: create_balancer
can return 1 or 2 values, making get_balancer
return 1, 2, or 3 values. Usually, we deal with this by catching errors like:
local balancer, err = create_balancer(upstream)
if not balancer then
return nil, err
end
return balancer, upstream -- still somewhat annoying to me, I prefer `balancer, nil, upstream` but up to you!
kong/core/balancer.lua
Outdated
if ok then | ||
return balancers[upstream.id] | ||
else | ||
return nil, "timeout waiting for balancer for " .. upstream.id |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
style: this goes against our code style which instructs to avoid subsequent branches when the previous one already returns :/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah, ok, I had read the style doc differently! my vague recollection is that it instructed to return early on the ground of performing validations and avoiding unneeded computation -- since there's no difference in computation here I thought it didn't make a difference here (I tend to find this more readable because it reads as "or else, fail!" instead of "the normal code flow is to fail" -- I'll just invert the test condition)
will change it, no probs! :)
@@ -289,9 +293,50 @@ do | |||
end | |||
end | |||
|
|||
local creating = {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we scope this and other create_balancer
dependencies under a do ... end
block? It'd be a little bit saner/easier to skim through imho
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is already inside a do-block and far down so it's only visible to the two relevant functions that need it. Can we leave this one as-is?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm I missed it; quite large for do
block, but that's fine!
kong/core/balancer.lua
Outdated
local upstreams, err = get_all_upstreams() | ||
if not upstreams then | ||
log(ngx.STDERR, "failed loading initial list of upstreams: ", err) | ||
return | ||
return nil, "failed loading initial list of upstreams: " .. err |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems we don't log this returned error in our balancer.init()
call in init_worker
? Maybe we should keep this as a CRIT
log + early return?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok! 👍
When upstreams include targets using hostnames instead of IPs, the initialization of balancer objects needs to perform DNS resolution, which involves opening a socket, which is not available in the init_worker phase. This patch performs balancer initialization in a timer, which makes initialization asynchronous to worker initialization. For this reason, it adds a locking system to create_balancer to avoid race conditions when requests happen while the balancers are being initialized. The locking system uses a simplified version of `resty.lock` which does not use the shm, since balancers are per-worker and we are concerned only with intra-worker concurrency control.
4afa415
to
1d5a10e
Compare
local upstreams, err = get_all_upstreams() | ||
if not upstreams then | ||
log(ngx.STDERR, "failed loading initial list of upstreams: ", err) | ||
log(CRIT, "failed loading initial list of upstreams: " .. err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor, but better not make use of Lua-land string concatenation in those error messages, and rely on the variadic arguments of ngx.log
instead
@@ -470,7 +517,7 @@ local function get_balancer(target, no_create) | |||
return nil, "balancer not found" | |||
else | |||
log(ERR, "balancer not found for ", upstream.name, ", will create it") | |||
return create_balancer(upstream) | |||
return create_balancer(upstream), upstream |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My one concern here is that this can be confusing for users of get_balancer and increase cognitive load: create_balancer
can return 1 or 2 values, making get_balancer
return 1, 2, or 3 values. Usually, we deal with this by catching errors like:
local balancer, err = create_balancer(upstream)
if not balancer then
return nil, err
end
return balancer, upstream -- still somewhat annoying to me, I prefer `balancer, nil, upstream` but up to you!
When upstreams include targets using hostnames instead of IPs, the initialization of balancer objects needs to perform DNS resolution, which involves opening a socket, which is not available in the init_worker phase.
This patch delays balancer initialization to the access phase.Updated: This patch performs balancer initialization in a timer, which makes initialization asynchronous to worker initialization. For this reason, it adds a locking system to create_balancer to avoid race conditions when requests happen while the balancers are being initialized.
The locking system uses a simplified version of the
resty.lock
logic which doesnot use the shm, since balancers are per-worker and we are concerned here only
with intra-worker concurrency control.