Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[haproxy] Count available/unavailable backends #1915

Merged
merged 2 commits into from
Sep 28, 2015

Conversation

walter-erquinigo
Copy link

The count_per_status currently counts both backends and frontends. An
important query is how many backends are available or unavailable
disregarding the big set of tags, which causes the typeaheads in the
datadog ui to never load.
In order to achieve this, a new counter 'haproxy.backend_hosts' is added.
It has two tags: service and available. The latter is either true or
false.

@igor47
Copy link
Contributor

igor47 commented Sep 15, 2015

i think another problem is that in busy environments, the number of tags on the count_per_status metric gets very large. adding additional tags is going to make things even worse.

in our environment, the count_per_status metric never loads because there are tons of services, but there are also tons of statuses. the statuses reported are not super-useful -- i rarely (read:never) want to see a count of backends in status down_1/3.

it might be nice to create a totally separate metric, like haproxy.backend_hosts; that metric would only include the hosts on a backend, and would not include the backend itself, or the frontend. then we can tag that metric using available: which would take on true or false.

@igor47
Copy link
Contributor

igor47 commented Sep 15, 2015

because the logic appears mostly lifted from the status counting function, it still suffers from the problem that we're going to be including the backend itself in the counts and not just the hosts in the backend. that is, if there are 2 hosts in a backend that are both up, we're going to end up reporting 3 hosts as up (2 hosts + the backend) instead of two hosts

not sure how to counter that; maybe for the backend the hostname is null or something? need to dig a little deeper.

@walter-erquinigo
Copy link
Author

I've read the code and the weirdest part I've found is this https://github.com/DataDog/dd-agent/blob/master/checks.d/haproxy.py#L212
supposedly hosts_statuses never has the BACKEND host, so very likely there is already a bug there, I'll take a deeper look

@walter-erquinigo walter-erquinigo changed the title [haproxy] Count available/unavaiable backends and frontends [haproxy] Count available/unavailable backends Sep 15, 2015
The count_per_status currently counts both backends and frontends. An
important query is how many backends are available or unavailable
disregarding the big set of tags, which causes the typeaheads in the
datadog ui to never load.
In order to achieve this, a new counter 'haproxy.backend_hosts' is added.
It has two tags: service and available. The latter is either true or
false.
@walter-erquinigo
Copy link
Author

Igor, it seems that the code is ok now. There are 3 main comments:

  • The new metric haproxy.backend_hosts does what it does, it only counts backends the way you proposed.
  • I wrote a test that covers the most common cases: remove the FRONTEND and BACKEND and divide the rest among available:true/false according to the statuses UP, DOWN and MAINT.
  • Our internal metric is wrong because we have an old version of the agent that doesn't discard the BACKEND:

ours =>
screen shot 2015-09-21 at 3 45 24 pm

upstream =>
screen shot 2015-09-21 at 3 45 38 pm

I previously thought that we had a problem with the case of the word BACKEND but after double checking this is not the case at the dd-agent level. We just need to update the version of the agent in our side. Furthermore, when we use the change of this PR internally we will be discarding correctly the BACKEND.

The tests pass on Travis

@walter-erquinigo
Copy link
Author

@remh, this is ready to be reviewed

@simnyc
Copy link
Contributor

simnyc commented Sep 24, 2015

@a20012251 the agent team will review this shortly, the goal is to release it with 5.6

@walter-erquinigo
Copy link
Author

thanks!

@remh
Copy link
Contributor

remh commented Sep 28, 2015

It looks good to me thnaks a lot @a20012251 and @igor47

remh added a commit that referenced this pull request Sep 28, 2015
[haproxy] Count available/unavailable backends
@remh remh merged commit ce600bf into DataDog:master Sep 28, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants