This repository has been archived by the owner on Mar 26, 2024. It is now read-only.

resolve agent ip on init #27

Closed
wants to merge 1 commit into from

Conversation

JoshRagem

to avoid millions of lookups to an address that will almost never change. DNS round robin should help the worker pool balance across all agents at the hostname.

@capodilupo @jblanchette

@jblanchette

jblanchette commented Jun 11, 2017 via email

@JoshRagem
Author

@jblanchette

Is there a way to test this is working the way you expect?

I think setting up just a regular test system with a dummy agent lets us verify that this library still works. We also need some testing on k8s to check the actual DNS TTL for services (I have read conflicting things about this).

What led you to find this was wrong? ... in the old way, what makes that call not need the lookup?

I have not tested DNS TTL on k8s yet, but I was investigating TTL options in kube-dns and dnsmasq, and I realized that we want a very different TTL for the datadog agents, because they don't change and they are the majority of lookups. Look at this line: https://github.com/WhoopInc/dogstatsde/blob/master/src/dogstatsd_worker.erl#L113
Since a hostname is passed into gen_udp:send, it will do a lookup on each message. That function also accepts an inet IP address, so it should work with my change.

My change is effectively a TTL of infinity for the datadog agent address.
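A minimal sketch of the idea (the module name and the dummy-agent setup here are mine, not part of the library): resolve the hostname once up front, then hand the resolved tuple to gen_udp:send so no per-message DNS lookup happens. A local UDP socket plays the part of a dummy agent, which is also roughly the "regular test system with a dummy agent" mentioned above. Note that inet:getaddr returns `{ok, Ip}`, so the result needs to be unwrapped before use.

```erlang
%% Sketch only: resolve the agent address once, reuse the tuple on every send.
-module(resolve_once_sketch).
-export([demo/0]).

demo() ->
    %% Stand-in for stillir:get_config(dogstatsd, agent_address).
    AgentHost = "localhost",
    %% One lookup at init; Ip is a tuple such as {127,0,0,1}.
    {ok, Ip} = inet:getaddr(AgentHost, inet),
    %% A local UDP listener plays the part of a dummy statsd agent.
    {ok, Agent} = gen_udp:open(0, [binary, {active, false}]),
    {ok, AgentPort} = inet:port(Agent),
    {ok, Sock} = gen_udp:open(0),
    %% gen_udp:send/4 accepts the already-resolved inet address,
    %% so no DNS lookup happens on this call.
    ok = gen_udp:send(Sock, Ip, AgentPort, <<"test.metric:1|c">>),
    {ok, {_Addr, _Port, Payload}} = gen_udp:recv(Agent, 0, 1000),
    gen_udp:close(Sock),
    gen_udp:close(Agent),
    Payload.
```

Running `resolve_once_sketch:demo()` returns the datagram the dummy agent received, confirming the resolved-tuple path works end to end.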

@@ -33,9 +33,10 @@ init([]) ->
State = case stillir:get_config(dogstatsd, send_metrics) of
true ->
{ok, Socket} = gen_udp:open(0),
Ip = inet:getaddr(stillir:get_config(dogstatsd, agent_port), inet),


think you meant (agent_port -> agent_address)
Ip = inet:getaddr(stillir:get_config(dogstatsd, agent_address), inet),

Author


fixed

@waisbrot
Contributor

If you've got the statsd agent as a daemon set, then I think what you actually want is to always send to your local instance.

When I was setting things up, I couldn't figure out how to do that and so I created a service as a work-around. If you resolve DNS only once and use a service then metrics may stop working any time a statsd agent goes down (which would happen on upgrades or scale-in events).

Maybe there's a way a pod can look up which node it's running on? If so, you could stuff the appropriate node->ip mapping into etcd.

@JoshRagem
Author

Support for getting the node IP from a pod is slated for 1.7, per kubernetes/kubernetes#42717.

The statsd agents are running as a daemonset and the service points at them, but it's the dns service that crashes under high dns load, not the statsd agents, so taking load off dns seems very valuable. Losing metrics due to an agent crash is no big deal for services since they send over udp, but a dns service failure is very bad.

If I remember my k8s right, the dns returns the ip of the service and the proxy forwards to the actual pods. This branch effectively removes a whole step from that pathway. Having the host ip would be better once that is available.

@JoshRagem
Author

(it looks like 1.7 will be released at the end of this month: https://github.com/kubernetes/features/blob/master/release-1.7/release-1.7.md)

@waisbrot
Contributor

Sounds reasonable, then.

In general, it makes sense to me that you'd want to resolve up front, since the typical case is that you want to send to localhost, and it's just this slightly-weird use of Kubernetes that makes it matter. I don't remember how services work; I was just assuming they must be DNS-based if the DNS cache is so short, but maybe that's for other reasons.

@JoshRagem
Author

Closing this because I'm tired of seeing it in my list and it's not really mergeable.

@JoshRagem JoshRagem closed this Jan 2, 2018