Conversation
Is there a way to test this is working the way you expect? What led you
to find this was wrong?
What you wrote seems to make sense for us; do we know why it wasn't that way
the whole time?
One note: I don't see the inet call in the old version; what makes that call
not need the lookup?
On Jun 11, 2017 9:27 AM, "JoshRagem" ***@***.***> wrote:
to avoid millions of lookups to an address that will almost never change.
DNS round robin should help the worker pool balance across all agents at
the hostname.
@capodilupo @jblanchette
------------------------------
You can view, comment on, or merge this pull request online at:
#27
Commit Summary
- resolve agent ip on init
File Changes
- *A* rebar.lock
<https://github.com/WhoopInc/dogstatsde/pull/27/files#diff-0> (8)
- *M* src/dogstatsd_worker.erl
<https://github.com/WhoopInc/dogstatsde/pull/27/files#diff-1> (3)
I think setting up just a regular test system with a dummy
I have not tested DNS TTL on k8s yet, but I was investigating TTL options in kube-dns and dnsmasq and realized that we want a very different TTL for the Datadog agents, because they don't change and they are the majority of lookups. Look at this line: https://github.com/WhoopInc/dogstatsde/blob/master/src/dogstatsd_worker.erl#L113. My change is effectively a TTL of infinity for the ddog agent address.
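For context, a minimal sketch of the resolve-once-at-init idea: do the DNS lookup a single time in `init/1`, cache the resolved IP in worker state, and pass the IP (not the hostname) to `gen_udp:send/4` on every metric. The `stillir` config keys and the state shape here are assumptions based on the snippets quoted in this thread, not the actual module.

```erlang
%% Sketch only: resolve the agent address once at worker startup and cache
%% it in state, so gen_udp:send/4 is given an already-resolved IP instead
%% of a hostname (which would trigger a DNS lookup on every send).
init([]) ->
    {ok, Socket} = gen_udp:open(0),
    %% Config keys assumed from the diff in this thread.
    Host = stillir:get_config(dogstatsd, agent_address),
    %% inet:getaddr/2 returns {ok, Address} | {error, Reason}.
    {ok, Ip} = inet:getaddr(Host, inet),
    {ok, #{socket => Socket, ip => Ip}}.

handle_cast({send, Line}, #{socket := Socket, ip := Ip} = State) ->
    Port = stillir:get_config(dogstatsd, agent_port),
    %% Ip is already resolved; no per-send DNS lookup here.
    ok = gen_udp:send(Socket, Ip, Port, Line),
    {noreply, State}.
```

The trade-off discussed below still applies: with a resolve-once scheme, if the address behind the hostname ever does change, the worker keeps sending to the stale IP until it restarts.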
src/dogstatsd_worker.erl
Outdated
@@ -33,9 +33,10 @@ init([]) ->
State = case stillir:get_config(dogstatsd, send_metrics) of
true ->
{ok, Socket} = gen_udp:open(0),
Ip = inet:getaddr(stillir:get_config(dogstatsd, agent_port), inet),
think you meant (agent_port -> agent_address)
Ip = inet:getaddr(stillir:get_config(dogstatsd, agent_address), inet),
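A side observation (mine, not raised in the thread): `inet:getaddr/2` returns a tagged tuple, `{ok, Address}` or `{error, Reason}`, so binding the result directly to `Ip` would store the tuple rather than the address. A sketch of the pattern match:

```erlang
%% inet:getaddr/2 returns {ok, Address} | {error, Reason}, so match on the
%% ok tuple. This crashes the worker if resolution fails, which surfaces
%% misconfiguration at startup rather than silently at send time.
{ok, Ip} = inet:getaddr(stillir:get_config(dogstatsd, agent_address), inet),
```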
fixed
If you've got the statsd agent as a DaemonSet, then I think what you actually want is to always send to your local instance. When I was setting things up, I couldn't figure out how to do that, so I created a Service as a workaround. If you resolve DNS only once and use a Service, then metrics may stop working any time a statsd agent goes down (which would happen on upgrades or scale-in events). Maybe there's a way a pod can look up which node it's running on? If so, you could stuff the appropriate node->ip mapping into etcd.
Support for getting the node IP from a pod is slated for 1.7, per kubernetes/kubernetes#42717. The statsd agents are running as a DaemonSet and the Service points at them, but it's the DNS service that crashes under high DNS load, not the statsd agents, so taking load off DNS seems very valuable. Losing metrics due to an agent crash is no big deal for services, since they send over UDP, but a DNS service failure is very bad. If I remember my k8s right, DNS returns the IP of the Service and the proxy forwards to the actual pods, so this branch effectively removes a whole step from that pathway. Having the host IP would be better once that is available.
(it looks like 1.7 will be released at the end of this month: https://github.com/kubernetes/features/blob/master/release-1.7/release-1.7.md)
Sounds reasonable, then. In general, it makes sense to me that you'd want to resolve up front, since the typical case is that you want to send to localhost, and it's just this slightly weird use of Kubernetes that might make it matter. I don't remember how Services work; I was just assuming they must be DNS-based if the DNS cache is so short. But maybe that's for other reasons.
Closing this because I'm tired of seeing it in my list and it's not really mergeable.