Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[docker] Add host tag for docker swarm node role. #3735

Merged
merged 1 commit into from May 15, 2018

Conversation

devonboyer
Copy link

@devonboyer devonboyer commented May 10, 2018

Note: Please remember to review the Datadog Contribution Guidelines
if you have not yet done so.

What does this PR do?

This adds a host tag to docker swarm nodes containing the node role (manager or worker).

Motivation

Testing Guidelines

screen shot 2018-05-10 at 10 14 54 am

screen shot 2018-05-10 at 10 15 43 am

Additional Notes

To determine if the node is a manager/worker this checks the ControlAvailable flag in the response from docker info the same it is done by the docker cli.

@devonboyer devonboyer requested a review from a team May 10, 2018 14:17
Copy link
Contributor

@CharlyF CharlyF left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great proposal.
I think the tags at the host level might be too redundant with the container meta @mfpierre is working on. Being able to tag the metrics with the node role is the goal here.
As you implemented the method to extract those, I think we can get the best out of the two worlds:
Use the container meta for the tag at the host level (i.e. infra list) and use the tag extractor in the check to add the tags to the docker metrics

@devonboyer
Copy link
Author

We would want people to be able to use this new docker_swarm_node_role tag to filter metrics as well so if its just added to metrics using the tag extractor in the check that will work.

I'm still not clear on why it might make more sense as host metadata vs. being a host tag? If a tag is included on all metrics anyway isn't that the same thing as being a host tag?

@hkaj
Copy link
Member

hkaj commented May 11, 2018

The core difference between host tags and host metadata is that host tags are part of the context of metrics coming from said host, whereas host metadata is not. So a change in host tags invalidates all metrics coming from the host and will require creating new time series for these metrics. This is why we avoid using host tags for things that can change frequently and/or all at once (or in a short time frame) across many hosts (like the docker version for example).

We also allow filtering by host tags in queries, but not by host metadata.

In this case it's fine to use tags as hosts don't flap between worker/manager.

Copy link
Member

@hkaj hkaj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

great first contrib 🎉
LGTM

@devonboyer
Copy link
Author

devonboyer commented May 11, 2018

@hkaj

I did some testing to determine what would happen with the tags if a worker node was promoted to a manager and/or manager demoted to a worker.

Since the host metadata (and host tags) is collected every 4 hours by default, it can take a long time for the tags to get updated for metrics (unless the agent is restarted).

If it makes sense to have the docker_swarm_node_role tag to be updated right away for metrics I think we would need to use the tag extractor @CharlyF mentioned and also maybe include this information in host metadata so it's easy for people to check in the infrastructure list (although the metadata would be slow to get updated for the same reason).

@xvello
Copy link
Contributor

xvello commented May 11, 2018

Please note that tags from the Agent5 orchestrator package / Agent6 Tagger are only applied to docker / kubernetes metrics and AutoDiscovery checks (but are constant once the check template is resolved).
They would not be added to system.cpu.usage for example.

@hkaj
Copy link
Member

hkaj commented May 11, 2018

This is a good point.

However host tags allow us to factor this info in a single payload instead of attaching it to each metric (it's common for hosts to have 10's of tags x thousands of metrics).

And promoting a worker to manager shouldn't happen often in prod. Immutability dictates that one should rather create a new node built and configured for that purpose. I don't think it's too bad to impose the 4h delay in this context.

edit: + what Xavier said

@devonboyer
Copy link
Author

@xvello Could you elaborate on what you mean by

but are constant once the check template is resolved

I am seeing metrics for system.cpu.usage in Datadog tagged with docker_swarm_node_role which is not a Docker/Kubernetes metric.

@devonboyer
Copy link
Author

@hkaj I'm not sure where we left off in the discussion for this PR. I am not sure I understand @xvello's last comment.

It sounds like we agree that keeping this as a host_tag is fine.

@xvello
Copy link
Contributor

xvello commented May 15, 2018

@devonboyer indeed, host tag will be better suited than container tag for this use case, let's go with that code

@devonboyer devonboyer merged commit c20819f into master May 15, 2018
@devonboyer devonboyer deleted the devon/swarm-node-role-tag branch May 15, 2018 14:55
@truthbk truthbk added this to the 5.25.0 milestone Jun 11, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants