Conversation
Is there a way to test this is working the way you expect? What led you
to find this was wrong?
What you wrote seems to make sense for us; do we know why it wasn't that way
the whole time?
One note: I don't see the inet call in the old version; what makes that call
not need the lookup?
On Jun 11, 2017 9:27 AM, "JoshRagem" ***@***.***> wrote:
to avoid millions of lookups to an address that will almost never change.
DNS round robin should help the worker pool balance across all agents at
the hostname.
@capodilupo @jblanchette
------------------------------
You can view, comment on, or merge this pull request online at:
#27
Commit Summary
- resolve agent ip on init
File Changes
- *A* rebar.lock
<https://github.com/WhoopInc/dogstatsde/pull/27/files#diff-0> (8)
- *M* src/dogstatsd_worker.erl
<https://github.com/WhoopInc/dogstatsde/pull/27/files#diff-1> (3)
I think setting up just a regular test system with a dummy
I have not tested DNS TTL on k8s yet, but I was investigating TTL options in kube-dns and dnsmasq and realized that we want a very different TTL for the Datadog agents, because they don't change and they are the majority of lookups. Look at this line: https://github.com/WhoopInc/dogstatsde/blob/master/src/dogstatsd_worker.erl#L113. My change is effectively a TTL of infinity for the ddog agent address.
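For context, a minimal sketch of the resolve-once-at-init idea: do the DNS lookup a single time in `init/1`, cache the resolved IP in worker state, and pass the IP (not the hostname) to `gen_udp:send/4` on every metric. The `stillir` config keys and the state shape here are assumptions based on the snippets quoted in this thread, not the actual module.

```erlang
%% Sketch only: resolve the agent address once at worker startup and cache
%% it in state, so gen_udp:send/4 is given an already-resolved IP instead
%% of a hostname (which would trigger a DNS lookup on every send).
init([]) ->
    {ok, Socket} = gen_udp:open(0),
    %% Config keys assumed from the diff in this thread.
    Host = stillir:get_config(dogstatsd, agent_address),
    %% inet:getaddr/2 returns {ok, Address} | {error, Reason}.
    {ok, Ip} = inet:getaddr(Host, inet),
    {ok, #{socket => Socket, ip => Ip}}.

handle_cast({send, Line}, #{socket := Socket, ip := Ip} = State) ->
    Port = stillir:get_config(dogstatsd, agent_port),
    %% Ip is already resolved; no per-send DNS lookup here.
    ok = gen_udp:send(Socket, Ip, Port, Line),
    {noreply, State}.
```

The trade-off discussed below still applies: with a resolve-once scheme, if the address behind the hostname ever does change, the worker keeps sending to the stale IP until it restarts.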
src/dogstatsd_worker.erl
Outdated
@@ -33,9 +33,10 @@ init([]) ->
State = case stillir:get_config(dogstatsd, send_metrics) of
true ->
{ok, Socket} = gen_udp:open(0),
Ip = inet:getaddr(stillir:get_config(dogstatsd, agent_port), inet),
think you meant (agent_port -> agent_address)
Ip = inet:getaddr(stillir:get_config(dogstatsd, agent_address), inet),
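A side observation (mine, not raised in the thread): `inet:getaddr/2` returns a tagged tuple, `{ok, Address}` or `{error, Reason}`, so binding the result directly to `Ip` would store the tuple rather than the address. A sketch of the pattern match:

```erlang
%% inet:getaddr/2 returns {ok, Address} | {error, Reason}, so match on the
%% ok tuple. This crashes the worker if resolution fails, which surfaces
%% misconfiguration at startup rather than silently at send time.
{ok, Ip} = inet:getaddr(stillir:get_config(dogstatsd, agent_address), inet),
```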
fixed
If you've got the statsd agent as a DaemonSet, then I think what you actually want is to always send to your local instance. When I was setting things up, I couldn't figure out how to do that, so I created a Service as a workaround. If you resolve DNS only once and use a Service, then metrics may stop working any time a statsd agent goes down (which would happen on upgrades or scale-in events). Maybe there's a way a pod can look up which node it's running on? If so, you could stuff the appropriate node->ip mapping into etcd.
Support for getting the node IP from a pod is slated for 1.7, per kubernetes/kubernetes#42717. The statsd agents are running as a DaemonSet and the Service points at them, but it's the DNS service that crashes under high DNS load, not the statsd agents, so taking load off DNS seems very valuable. Losing metrics due to an agent crash is no big deal for services, since they send over UDP, but a DNS service failure is very bad. If I remember my k8s right, DNS returns the IP of the Service and the proxy forwards to the actual pods, so this branch effectively removes a whole step from that pathway. Having the host IP would be better once that is available.
(it looks like 1.7 will be released at the end of this month: https://github.com/kubernetes/features/blob/master/release-1.7/release-1.7.md)
Sounds reasonable, then. In general, it makes sense to me that you'd want to resolve up front, since the typical case is that you want to send to localhost, and it's just this slightly weird use of Kubernetes that might make it matter. I don't remember how Services work; I was just assuming they must be DNS-based if the DNS cache is so short. But maybe that's for other reasons.
Closing this because I'm tired of seeing it in my list and it's not really mergeable.