perf(runloop) warm up DNS records for Services on updates #4656
Conversation
Might be worth noting (or documenting internally) that the cache provided by lua-resty-dns-client is a per-worker LRU, so this action will have limited impact in multi-core environments.
Cool improvement nonetheless :)
I don't know the exact event order, or whether the event handler in this case is called in one worker (through post_local) or in all workers (through post), so I cannot confirm @p0pr0ck5's remark.
The usefulness of this change is mostly for scaling a Kong cluster, imo.
return
end

kong.dns.toip(host)
this call will fail if the configured hostname is not an actual hostname, but an upstream. toip does not resolve upstreams; it only does the DNS part. Resolving upstream names is done in balancer.lua.
btw: it could be that the error just gets ignored, in which case all is well, but better to test it first so we do not get unexpected error messages in the logs.
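For illustration, a minimal sketch of what a guarded warm-up callback could look like, assuming an ngx.timer.at callback signature and that a failed lookup should only be logged (this is not the PR's actual code, only a sketch reusing the names seen in the diff):

local utils = require "kong.tools.utils"

-- hypothetical timer callback: skip non-hostnames and only log DNS failures
local function warmup_hostname_dns_timer(premature, host)
  if premature then
    return
  end

  -- upstream names and plain IPs are not resolved by toip(); skip them
  if utils.hostname_type(host) ~= "name" then
    return
  end

  local ip, port_or_err = kong.dns.toip(host)
  if not ip then
    ngx.log(ngx.NOTICE, "DNS warmup for '", host, "' failed: ", tostring(port_or_err))
  end
end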
I tried this and yes (the same problem was in my original PR): @p0pr0ck5 is right, it currently only warms up a single worker.
Any reason not to rewrite the cache to use thibaultcha's awesome mlcache as a dependency, so the DNS cache could become global instead of a per-worker LRU? For resources that are used or updated often, such a rewrite seems to make sense. Is there a downside I am not seeing (besides a little extra memory)?
@jeremyjpj0916 if there are a large number of entries (say 20), the DNS server might respond with a random subset (say only 4). Say we have a 12-core machine, and hence also 12 workers. Kong would get 4 random results in each of the 12 workers, so 48 results in total, and would hence probably see all 20 entries; basic statistics will ensure we still proxy to all the backends. Now alter Kong so that it shares the DNS data: we do a single query, get 4 backends, and all 12 Kong workers start proxying to only those 4 backends. So 16 backends sit idle. That is most likely not what you want.
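To make the statistics argument concrete, here is a tiny standalone Lua simulation of that scenario (purely illustrative; the numbers 20/4/12 are taken from the comment above). With a single shared query, the "seen" set would contain only 4 entries.

math.randomseed(os.time())

local total_backends, per_query, workers = 20, 4, 12
local seen = {}

for _ = 1, workers do
  -- each worker performs its own query and gets its own random subset,
  -- as happens with per-worker DNS caches
  local pool = {}
  for i = 1, total_backends do pool[i] = i end
  for _ = 1, per_query do
    seen[table.remove(pool, math.random(#pool))] = true
  end
end

local covered = 0
for _ in pairs(seen) do covered = covered + 1 end
print(("workers collectively saw %d of %d backends"):format(covered, total_backends))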
@Tieske interesting. I'm not super familiar with DNS outside of the company I work for, where a hostname generally resolves to essentially one IP per data center in our internal network; when you only have 2-3 DCs (if the app is HA), that keeps the entries small. Maybe the situation you are describing is more common in public cloud. It's one of those engineering decisions where you want to make the shoe fit all cases as best you can, so now it makes a bit more sense why it works the way it does. Thanks for the insight. I suppose the alternative, if you still wanted a global cache, would be to track which workers had already queried a nameserver for an entry: a key (the hostname) -> value (an object containing the PIDs of worker processes and the IPs they resolved) pairing, and if the value does not yet contain entries for the current worker process, do a lookup, so your statistics could still play out and you'd get all those entries. A little more complicated than the current design, though, for sure 😄.
@kikito A valid perf test for this (even if manual) would be to start Kong, create a Service pointing to example.com and a Route pointing to it, wait a second, issue one first request, and check the latency headers. Do this with and without the patch (both times on a freshly started, empty-database Kong, to avoid cache warmup on each run) and verify the latency difference. Might want to bash-script this to do it a few times. Please report back if the difference is observed as expected!
A more robust way to avoid slow/flaky tests may be to point …

(@jeremyjpj0916 Just as an aside here, I do have plans for an mlcache-based DNS resolver library, but I'm really unsure if they will ever see the light of day, due to time constraints. I often mention such a DNS library as a typical usage example for mlcache in talks and such, since many mlcache features would benefit such a library and make it fairly simple to write. lua-resty-dns-client does a great job for us today already, and provides a nice load-balancing abstraction :) )
I'm not getting it. You mean each worker does its own query but shares the results, and then each worker uses the entries in the shared results? The technical solution to that would be to use a TCP query. This is tracked in Kong/lua-resty-dns-client#63.
Precisely.
Interesting, so DNS via UDP can generally return a subset, whereas with TCP it is always going to return the full list? Then yeah, the TCP approach is cleaner than my idea of using something like mlcache. If this were ever done, it might make sense to give people the option of UDP + per-worker LRU vs TCP + mlcache down the road, if the TCP approach gets hardened (seems like it just needs retries added, based on your raised issue). Or, if you were eager to implement it, TCP first with a fallback to UDP upon call failure (so no TCP retransmit) is doable now, if this is an itch you have been wanting to scratch, hah: on TCP success the result goes into the mlcache and is used by all workers, and if that call happens to fail it ends up in UDP + a local LRU. It would likely just be easier to fork the DNS dependency and add retransmit at that point 😄. Anyways, I will leave it at that; I don't wanna derail this PR too much, but interesting discussion.
TCP DNS queries do not imply that the server will return a “full list” of records for a name, and UDP queries do not imply that a “partial” list is returned. The underlying transport is unrelated to the number of records returned. If the authoritative server (or an intermediate resolver) is only returning a subset of the record set, that’s a problem with that resolver/auth server, not the transport or client. Forcing TCP queries to try to get a full list of records is incorrect behavior. For example, the behavior of Consul returning a subset of records is unrelated to the transport mechanism, as documented by Consul:
https://www.consul.io/docs/agent/dns.html

Consul's resolver also has some questionable behavior, like truncating responses without setting the truncate flag 😳, and this behavior is configurable anyhow: https://www.consul.io/docs/agent/options.html#enable_truncate

If anything, the approach of caching records at all is antithetical to supporting the edge case of "round-robining" multiple different answers from an authoritative DNS server. If Kong wants to support response record sets that would be truncated over UDP, it should follow the spec and fetch the record set over TCP (and Consul should be configured appropriately).
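For reference on the mechanics being discussed, lua-resty-dns (which lua-resty-dns-client builds on) exposes a TCP query method. A rough sketch with a placeholder nameserver and hostname; note the point above that TCP avoids UDP's size limit but does not by itself guarantee a complete record set:

local resolver = require "resty.dns.resolver"

local r, err = resolver:new{
  nameservers = { "127.0.0.1" },  -- placeholder nameserver
}
if not r then
  ngx.log(ngx.ERR, "failed to create resolver: ", err)
  return
end

-- tcp_query sends the question over TCP instead of UDP
local answers, err = r:tcp_query("example.net", { qtype = r.TYPE_A })
if not answers then
  ngx.log(ngx.ERR, "TCP DNS query failed: ", err)
  return
end

for _, ans in ipairs(answers) do
  ngx.log(ngx.NOTICE, ans.name, " -> ", ans.address or ans.cname)
end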
I tried the test suggested by @hisham. Here's the bash script I used:

for i in {1..20}
do
kong start &> test.log
http POST :8001/services name=serv host=example.net path=/ &> test.log
http POST :8001/services/serv/routes hosts:='["example.net"]' name=rout &> test.log
sleep 3
http :8000 Host:example.net -h 2>&1 | grep X-Kong-Proxy-Latency
http DELETE :8001/routes/rout &> test.log
http DELETE :8001/services/serv &> test.log
kong stop &> test.log
done

(Twenty times: start Kong, create a route and a service, wait 3 seconds, route traffic towards the service, print the X-Kong-Proxy-Latency header, delete the route and service, and stop Kong.)

With next:
With this PR, the latency fluctuates more, sometimes it gets to 0 or 1:
So for the default Kong case, the first request seems to benefit from this change, on average.
(Just to note, the default behavior for Kong is not to run one worker process, but to run a number of worker processes equal to the number of cores: https://github.com/Kong/kong/blob/master/kong/templates/kong_defaults.lua#L16)
@Tieske A bit off-topic, but would "keep asking for entries for a while" help solve the problem? Something like: …
@p0pr0ck5 Thanks, fixed the last phrase in my comment.
@kikito The suggested test case isn't robust enough to be asserting this behavior. Can we please follow the approach I suggested above, or something similar?
@kikito are you sure?

next:
PR:
which is what I would expect from this change. |
@thibaultcha your suggestion is good for a test case in the suite to check that the code is called; mine was meant as a manual check of the perf impact.
This change prewarms the DNS cache when a new Service is added or updated.
@thibaultcha I added a test which creates a service, waits a bit, and then checks that the DNS request was done, without involving any traffic.
if utils.hostname_type(data.entity.host) == "name" then
  timer_at(0, warmup_hostname_dns_timer, data.entity.host)
end
end
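For context, a hedged sketch of how a hunk like the above would typically be wired up through lua-resty-worker-events; the event source/name and the local references are assumptions for illustration, not necessarily the exact code in this PR:

local utils = require "kong.tools.utils"

-- assumed to be in scope in the runloop: worker_events, timer_at,
-- and the warmup_hostname_dns_timer callback
worker_events.register(function(data)
  if data.operation == "create" or data.operation == "update" then
    -- only plain hostnames need warming up; IPs resolve trivially and
    -- upstream names go through the balancer
    if utils.hostname_type(data.entity.host) == "name" then
      timer_at(0, warmup_hostname_dns_timer, data.entity.host)
    end
  end
end, "crud", "services")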
It is worth noting that this event will only trigger on the node handling the Service creation/update; those worker events are not propagated to other nodes in the cluster. The benefits of this early DNS query are very limited in production deployments imho, and it is mostly optimizing a development/testing use case.
if data.operation == "create" or | ||
data.operation == "update" then | ||
if utils.hostname_type(data.entity.host) == "name" then |
style: a single branch instead of a nested if would be simpler:

if (data.operation == "create" or data.operation == "update")
   and utils.hostname_type(data.entity.host) == "name"
then
end
if data.operation == "create" or | ||
data.operation == "update" then | ||
if utils.hostname_type(data.entity.host) == "name" then | ||
timer_at(0, warmup_hostname_dns_timer, data.entity.host) |
- Timers are a scarce resource. We need error handling when allocating one here, at least to prevent administrators from not realizing they've maxed out their number of pending timers.
- It'd be nice to have the ability to force an asyncQuery from lua-resty-dns-client, so we don't have to even allocate this timer in the first place, and directly rely on lua-resty-dns-client's asyncQuery time. Is that possible with today's lua-resty-dns-client, @Tieske?

As it stands, this can easily overwhelm timer resources when performing a large number of subsequent Service creations in a short period of time (e.g. from a script or from declarative config), to the point that, given the comment below (on this only warming up the local node), I'm not sure this approach is worth our time.
If we could group the DNS lookups in a single timer, we would be much better off already.
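A hedged sketch of that idea: batch pending hostnames and flush them from a single recurring timer per worker, with the timer allocation error surfaced. The helper names, the 5-second interval, and the use of ngx.timer.every are assumptions for illustration only:

local pending = {}  -- set of hostnames waiting for a warm-up lookup

local function enqueue_warmup(host)
  pending[host] = true  -- no timer allocated per Service event
end

local function flush_warmups(premature)
  if premature then
    return
  end

  local batch = pending
  pending = {}

  for host in pairs(batch) do
    local ip, err = kong.dns.toip(host)
    if not ip then
      ngx.log(ngx.NOTICE, "DNS warmup for '", host, "' failed: ", tostring(err))
    end
  end
end

-- a single recurring timer per worker, however many Services were touched
local ok, err = ngx.timer.every(5, flush_warmups)
if not ok then
  ngx.log(ngx.ERR, "could not create DNS warmup timer: ", err)
end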
Adding to the above:
- This also only warms up a single worker, which furthermore emphasizes that this is optimizing local/testing environments but doesn't do much to help a production deployment, as far as I can tell?
- We may also be evicting (oldest) entries from the worker's DNS lrucache in favor of warmup DNS lookups, without being guaranteed that the warmed-up records will be queried before the evicted ones, which somewhat defeats the purpose of our LRU cache.
It'd be nice to have the ability to force an asyncQuery from lua-resty-dns-client, so we don't have to even allocate this timer in the first place, and directly rely on lua-resty-dns-client's asyncQuery time. Is that possible with today's lua-resty-dns-client @Tieske?
No. The asyncQuery is related to the stale_ttl setting, where we return stale data but fire a refresh query in the background.
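To illustrate what that means in practice, here is a conceptual sketch of the stale_ttl behavior (not the actual lua-resty-dns-client internals): serve the stale record immediately and refresh it from a background timer. resolve_fresh and the cache entry layout are placeholders.

-- cache[host] = { expires = <unix ts>, record = <dns answers>, refreshing = <bool> }
local function get_record(cache, host, stale_ttl, resolve_fresh)
  local entry = cache[host]
  if not entry then
    return nil  -- nothing cached yet; the caller must do a blocking lookup
  end

  local now = ngx.now()
  if now < entry.expires then
    return entry.record  -- still fresh
  end

  if now < entry.expires + stale_ttl and not entry.refreshing then
    entry.refreshing = true
    -- refresh in the background while callers keep using stale data
    ngx.timer.at(0, function(premature)
      if premature then return end
      local fresh = resolve_fresh(host)  -- placeholder for the real DNS lookup
      if fresh then
        cache[host] = { expires = ngx.now() + fresh.ttl, record = fresh, refreshing = false }
      else
        entry.refreshing = false
      end
    end)
    return entry.record  -- stale, but no added request latency
  end

  return nil  -- past stale_ttl: caller must resolve synchronously
end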
That was my understanding as well. Alas.
This also only warms up a single worker, which furthermore emphasizes that this is optimizing local/testing environments but doesn't do much to help a production deployment, as far as I can tell?
The benefit would be (if the warm-up were done in each worker, not just one) for scaling a Kong cluster: a newly spun-up Kong node would be faster in dealing with its first requests. This might actually make a difference with "wall-of-traffic" events.
But as said: only if it is done for all the workers.
newly spun up Kong node would be faster in dealing with first requests
This is not what this PR does (at least in DB mode). Making a fresh node's requests faster is already handled by 8884973.
Thx @thibaultcha I missed that one. That will deal with it indeed. Considering that, imo, this PR is not worth the additional complexity.
Agreed. @kikito, I think this was a worthwhile experiment, but how about we close this and move on?
One comment here: …
OK, I have decided to close this PR without merging; I agree that the complexity is not worth the benefits.
This is an alternative approach to what was originally proposed in #4656. It also warms up all the workers on init (not just worker 0).